General IPDPS Info

Sponsors


IN COOPERATION WITH

ACM

ACM SIGARCH

and

IEEE TCCA

IEEE TCDP

IPDPS 2024 Advance Program

Please visit the IPDPS website regularly for updates, including schedule revisions.

For workshop programs and schedules, click the links to the workshop websites below.

Authors who have corrections should send email to contact@ipdps.org with full details.

(Updated as of 15 May 2024)

MONDAY - 27 May 2024

DAYS • Monday • Tuesday • Wednesday • Thursday • Friday

MONDAY
Workshops

 

ALL DAY

 

See each individual
workshop program
for schedule details

1. HCW – Heterogeneity in Computing Workshop
2. RAW – Reconfigurable Architectures Workshop
3. APDCM – Advances in Parallel and Distributed Computational Models
4. AsHES – Accelerators and Hybrid Emerging Systems
5. EduPar – NSF/TCPP Workshop on Parallel and Distributed Computing Education
6. ESSA – Extreme-Scale Storage and Analysis
7. GrAPL – Graphs, Architectures, Programming, and Learning
8. HiCOMB – High Performance Computational Biology
9. PAISE – Parallel AI and Systems for the Edge

Reception
6:00 PM - 7:30 PM

IPDPS - TCPP Welcome Reception


TUESDAY - 28 May 2024


Opening Session
8:15 AM - 8:30 AM

Opening Session

Keynote Session
8:30 AM - 9:30 AM

KEYNOTE SPEECH

Session Chair: DK Panda

 

AuroraGPT: Exploring AI Assistant for Science

 

Franck Cappello *
Argonne National Laboratory

 

* Recipient of the 2024 IEEE Charles Babbage Award

 

Abstract: Innovative methods, new instruments, disruptive techniques, and groundbreaking technologies have led to significant leaps in scientific progress. The increasingly powerful Large Language Models (LLMs) released each…

 

Read more information

Morning Break 9:30 AM - 10:00 AM

All Day

Main Conference Poster-Accepted Papers

 

See listing here. Posters on Display in Ballroom Foyer

Parallel Technical
Sessions 1A & 1B

10:00 AM - 12:00 PM

Session 1A: Numerical Linear Algebra


Session Chair:
Grey Ballard

  • PckGNN: Optimizing Aggregation Operators with Packing Strategies in Graph Neural Networks
     Zhengding Hu, Jingwei Sun, Zhongyang Li, Guangzhong Sun (University of Science and Technology of China)
  • VNEC: A Vectorized Non-Empty Column Format for SpMV on CPUs
    Luhan Wang, Haipeng Jia, Lei Xu, Cunyang Wei (Institute of Computing Technology, Chinese Academy of Sciences); Kun Li (Microsoft Research); Xianmeng Jiang, Yunquan Zhang (Institute of Computing Technology, Chinese Academy of Sciences)
  • Two-Stage Block Orthogonalization to Improve Performance of s-step GMRES    
    Ichitaro Yamazaki (SNL); Andrew J. Higgins (Temple University); Erik G. Boman (SNL); Daniel B. Szyld (Temple University)
  • Alternative Basis Matrix Multiplication is Fast and Stable
    Oded Schwartz (The Hebrew University of Jerusalem); Sivan Toledo (Tel Aviv University); Noa Vaknin (The Hebrew University of Jerusalem); Gal Wiernik (Tel Aviv University)
  • Fast multiplication of random dense matrices with sparse matrices
    Tianyu Liang, Riley Murray, Aydin Buluc, James Demmel (UC Berkeley)
  • A Cholesky QR Type Algorithm for Computing Tall-Skinny QR Factorization with Column Pivoting
    Takeshi Fukaya (Hokkaido University); Yuji Nakatsukasa (University of Oxford); Yusaku Yamamoto (The University of Electro-Communications)


Session 1B: Containers and Serverless Computing


Session Chair:
Alfredo Goldman
  • CKSM: An Efficient Memory Deduplication Method for Container-based Cloud Computing Systems          
    Yunfei Gu, Yihui Lu, Chentao Wu, Jie Li, Minyi Guo (Shanghai Jiao Tong University)

  • Tackling Cold Start in Serverless Computing with Multi-Layer Container Reuse      
    Amelie Chi Zhou (Hong Kong Baptist University); Rongzheng Huang (Shenzhen University); Zhoubin Ke (Shenzhen University); Yusen Li (Nankai University); Yi Wang, Rui Mao (Shenzhen University)

  • PALDIA: Enabling SLO-Compliant and Cost-Effective Serverless Computing on Heterogeneous Hardware
    Vivek M. Bhasi, Aakash Sharma, Shruti Mohanty, Mahmut Taylan Kandemir, Chita R. Das (The Pennsylvania State University)

  • Application-Attuned Memory Management for Containerized HPC Workflows
    Moiz Arif, Avinash Maurya, M. Mustafa Rafique (Rochester Institute of Technology); Dimitrios S. Nikolopoulos (Virginia Tech); Ali R. Butt (Rochester Institute of Technology)

  • FEDGE: An Interference-Aware QoS Prediction Framework for Black-Box Scenario in IaaS Clouds with Domain Generalization
    Yunlong Cheng, Xiuqi Huang, Zifeng Liu, Jiadong Chen, Xiaofeng Gao (Shanghai Jiao Tong University); Zhen Fang, Yongqiang Yang (Huawei)

  • Software Resource Disaggregation for HPC with Serverless Computing
    Marcin Copik, Marcin Chrapek (ETH Zürich); Larissa Schmid (Karlsruhe Institute of Technology); Alexandru Calotoiu, Torsten Hoefler (ETH Zürich)                     

12:00 PM – 1:30 PM

Lunch & PhD Program

Parallel Technical
Sessions 2A & 2B

1:30 PM – 2:30 PM

Session 2A: Algorithms on Trees

 

Session Chair: Cynthia Phillips

  • AMST: Accelerating Large-Scale Graph Minimum Spanning Tree Computation on FPGA
    Haishuang Fan, Rui Meng, Qichu Sun, Jingya Wu, Xiaowei Li, Guihai Yan (State Key Laboratory of Processors, Institute of Computing Technology, Chinese Academy of Sciences)

  • Wait-free trees supporting asymptotically efficient range queries
    Ilya Kokorin (ITMO University); Victor Yudov (ITMO University); Vitaly Aksenov (City, University of London); Dan Alistarh (ISTA)

  • Low-Depth Spatial Tree Algorithms
    Yves Baumann, Tal Ben-Nun, Maciej Besta, Lukas Gianinazzi, Torsten Hoefler, Piotr Luczynski (ETH Zurich)


Session 2B: Federated and Distributed Learning

 

Session Chair: Amelie Zhou

  • QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices
    Juntao Zhao, Borui Wan (The University of Hong Kong); Yanghua Peng, Haibin Lin, Yibo Zhu (ByteDance Inc.); Chuan Wu (The University of Hong Kong)

  • Enhancing the Generalization of Personalized Federated Learning with Multi-head Model and Ensemble Voting
    Van An Le (National Institute of Advanced Industrial Science and Technology, Japan); Nam Duong Tran, Phuong Nam Nguyen, Thanh Hung Nguyen, Phi Le Nguyen (Hanoi University of Science and Technology, Vietnam); Truong Thao Nguyen (National Institute of Advanced Industrial Science and Technology, Japan); Yusheng Ji (National Institute of Informatics, Japan)

  • UniFaaS: Programming across Distributed Cyberinfrastructure with Federated Function Serving
    Yifei Li (Southern University of Science and Technology); Ryan Chard (Argonne National Laboratory); Yadu Babuji, Kyle Chard (University of Chicago); Ian Foster (Argonne National Laboratory); Zhuozhao Li (Southern University of Science and Technology)

Parallel Technical
Sessions 3A & 3B

2:30 PM – 4:10 PM

Session 3A: Applications I

 

Session Chair: Edgar Solomonik

  • Scalable and Differentiable Simulator for Quantum Computational Chemistry
    Zhiqian Xu (Institute of Computing Technology, Chinese Academy of Sciences); Honghui Shang, Yi Fan, Xiongzhi Zeng (University of Science and Technology of China); Yunquan Zhang (Institute of Computing Technology, Chinese Academy of Sciences); Chu Guo (Hunan Normal University)
  • Picasso: Memory-Efficient Graph Coloring Using Palettes With Applications in Quantum Computing
    S.M. Ferdous (Pacific Northwest National Laboratory); Reece Neff (North Carolina State University); Bo Peng, Salman Shuvo, Marco Minutoli, Sayak Mukherjee, Karol Kowalski (Pacific Northwest National Laboratory); Michela Becchi (North Carolina State University); Mahantesh Halappanavar (Pacific Northwest National Laboratory)
  • Optimizing and Scaling the 3D Reconstruction of Single-Particle Imaging
    Niteya Shah (Virginia Tech); Christine Sweeney, Vinay Ramakrishnaiah (Los Alamos National Laboratory); Jeffrey Donatelli (Lawrence Berkeley National Laboratory); Wu-chun Feng (Virginia Tech)
  • Parallel Approximations for High-Dimensional Multivariate Normal Probability Computation in Confidence Region Detection Applications
    Xiran Zhang, Sameh Abdulah (King Abdullah University of Science and Technology); Jian Cao (University of Houston); Hatem Ltaief,  Ying Sun, Marc G. Genton, David E. Keyes (King Abdullah University of Science and Technology)
  • Enabling High-Performance Physical Based Rendering on New Sunway Supercomputer
    Zeyu Song, Lin Gan, Shengye Xiang, Yinuo Wang (Tsinghua University); Xiaohui Duan (Shandong University); Guangwen Yang (Tsinghua University)


Session 3B: Scheduling I

 

Session Chair: Oguz Selvitopi

  • CoCG: Fine-grained Cloud Game Co-location on Heterogeneous Platform 
    Taolei Wang, Chao Li, Jing Wang, Cheng Xu, Xiaofeng Hou, Minyi Guo (Shanghai Jiao Tong University)
  • Adaptive Task-Oriented Resource Allocation for Large Dynamic Workflows on Opportunistic Resources
    Thanh Son Phung, Douglas Thain (University of Notre Dame)
  • nOS-V: Co-Executing HPC Applications Using System-Wide Task Scheduling
    David Álvarez, Kevin Sala, Vicenç Beltran (Barcelona Supercomputing Center)
  • SWEEP: Adaptive Task Scheduling for Exploring Energy Performance Trade-offs
    Jing Chen, Madhavan Manivannan, Bhavishya Goel, Miquel Pericàs (Chalmers University of Technology)
  • Interpretable Analysis of Production GPU Clusters Monitoring Data via Association Rule Mining
    Baolin Li (Northeastern University); Siddharth Samsi (MIT); Vijay Gadepally (MIT Lincoln Laboratory); Devesh Tiwari (Northeastern University)

Late Afternoon Break 4:10 PM – 4:40 PM

PLENARY Session:
Best Papers
4:40 PM - 6:40 PM

Best Paper Nominees

 

Session Chair: Umit Catalyurek

  • CloverLeaf on Intel Multi-Core CPUs: A Case Study in Write-Allocate Evasion
    Jan Laukemann, Thomas Gruber, Georg Hager (University of Erlangen-Nuremberg); Dossay Oryspayev (Brookhaven National Laboratory); Gerhard Wellein (Erlangen National High Performance Computing Center)

  • ARGO: An Auto-Tuning Runtime System for Scalable GNN Training on Multi-Core Processor
    Yi-chien Lin (University of Southern California); Yuyang Chen (Tsinghua University); Sameh Gobriel, Nilesh Jain, Gopi Krishna Jha (Intel); Viktor Prasanna (University of Southern California)

  • Accelerating Lossy and Lossless Compression on Emerging BlueField DPU Architectures
    Yuke Li, Arjun Kashyap, Weicong Chen (University of California, Merced); Yanfei Guo (Argonne National Laboratory); Xiaoyi Lu (University of California, Merced)

  • Performance-Portable Multiphase Flow Solutions with Discontinuous Galerkin Methods
    Tobias Flynn (University of Warwick); Robert Manson-Sawko (IBM-Research Europe); Gihan Mudalige (University of Warwick)


WEDNESDAY - 29 May 2024


All Day

Main Conference Poster-Accepted Papers

 

See listing here. Posters on Display in Ballroom Foyer

Parallel Technical
Sessions
4A & 4B

8:30 AM – 10:30 AM

Session 4A: Applications II

 

Session Chair: TBA

  • Optimized GPU Implementation of Grid Refinement in Lattice Boltzmann Method
    Ahmed H. Mahmoud (Autodesk Research and University of California, Davis); Hesam Salehipour, Massimiliano Meneghin (Autodesk Research)
  • Alya towards Exascale: Optimal OpenACC Performance of the Navier-Stokes Finite Element Assembly on GPUs
    Herbert Owen (Barcelona Supercomputing Center); Dominik Ernst (FAU Erlangen-Nürnberg); Thomas Gruber (FAU Erlangen-Nürnberg); Oriol Lemkuhl, Guillaume Houzeaux, Lucas Gasparino (Barcelona Supercomputing Center); Gerhard Wellein (FAU Erlangen-Nürnberg)
  • CliZ: Optimizing Lossy Compression for Climate Datasets with Adaptive Fine-tuned Data Prediction
    Zizhe Jian (University of California, Riverside); Sheng Di (Argonne National Laboratory);  Jinyang Liu (University of California, Riverside); Kai Zhao (Florida State University); Xin Liang (University of Kentucky); Haiying Xu (NCAR); Robert Underwood (Argonne National Laboratory);  Shixun Wu, Zizhong Chen (University of California, Riverside); Franck Cappello (Argonne National Laboratory)
  • Automating GPU Scalability for Complex Scientific Models: Phonon Boltzmann Transport Equation
    Eric Heisler (University of Utah); Siddharth Saurav (The Ohio State University); Aadesh Deshmukh (University of Utah); Sandip Mazumder (The Ohio State University); Hari Sundar (University of Utah)
  • An O(N) distributed-memory parallel direct solver for planar integral equations
    Tianyu Liang (The University of California, Berkeley); Chao Chen (North Carolina State University); Per-gunnar Martinsson, George Biros (The University of Texas at Austin)
  • Exploiting long vectors with a CFD code: a co-design show case
    Marc Blancafort, Roger Ferrer, Guillaume Houzeaux, Marta Garcia-Gasulla, Filippo Mantovani (Barcelona Supercomputing Center)


Session 4B: I/O and Storage Systems

 

Session Chair: Hari Subramoni

  • Capturing Periodic I/O Using Frequency Techniques
    Ahmad Tarraf (Technical University of Darmstadt); Alexis Bandet, Francieli Boito (Inria, University of Bordeaux); Guillaume Pallez (Inria); Felix Wolf (Technical University of Darmstadt)

  • To Store or Not to Store: a graph theoretical approach for Dataset Versioning
    Anxin Guo (Northwestern University); Jingwei Li (Columbia University); Pattara Sukprasert (Databricks); Samir Khuller (Northwestern University); Amol Deshpande (University of Maryland); Koyel Mukherjee (Adobe Research)

  • TunIO: An AI-powered Framework for Optimizing HPC I/O
    Neeraj Rajesh, Keith Bateman (Illinois Institute of Technology); Jean Luca Bez (Lawrence Berkeley National Laboratory); Suren Byna (Ohio State University); Anthony Kougkas, Xian-he Sun (Illinois Institute of Technology)

  • A2FL: Autonomous and Adaptive File Layout in HPC through Real-time Access Pattern Analysis
    Dong Kyu Sung (Seoul National University); Yongseok Son (Chung-Ang University); Alex Sim, Kesheng Wu (Lawrence Berkeley National Laboratory); Suren Byna (The Ohio State University); Houjun Tang (Lawrence Berkeley National Laboratory); Hyeonsang Eom (Seoul National University); Changjong Kim, Sunggon Kim (Seoul National University of Science and Technology)

  • NVMe-oPF: Designing Efficient Priority Schemes for NVMe-over-Fabrics with Multi-Tenancy Support
    Darren Ng, Andrew Lin, Arjun Kashyap (University of California, Merced); Guanpeng Li (University of Iowa); Xiaoyi Lu (University of California, Merced)

  • Drilling Down I/O Bottlenecks with Cross-layer I/O Profile Exploration
    Hammad Ather (University of Oregon); Jean Luca Bez (Lawrence Berkeley National Laboratory); Yankun Xia, Suren Byna (The Ohio State University)

Morning Break 10:30 AM - 11:00 AM

Keynote Session
11:00 AM – 12:00 PM

KEYNOTE SPEECH

Session Chair: Saday Sadayappan

 

PyTorch 2 and its Compiler Technologies

 

Peng Wu
Meta

 

Abstract: PyTorch 2.0 was unveiled in March 2023, bringing substantial performance enhancements across a diverse array of models, often with just a simple one-liner change. Do not mistake it as the end of the story. The…

 

Read more information

12:00 PM – 1:30 PM

Lunch & PhD Program

Parallel Technical
Sessions
5A & 5B

1:30 PM – 2:30 PM

Session 5A: Performance

 

Session Chair: Ali Butt

  • CachedArrays: Optimizing Data Movement for Heterogeneous Memory Systems
    Mark Hildebrand, Jason Lowe-Power, Venkatesh Akella (UC Davis)
  • Comparative Study of Large Language Model Architectures on Frontier
    Junqi Yin, Avishek Bose, Guojing Cong, Isaac Lyngaas (Oak Ridge National Laboratory), Quentin Anthony (Ohio State University)
  • Predicting Cross-Architecture Performance of Parallel Programs
    Daniel Nichols, Alexander Movsesyan (University of Maryland); Jae-seung Yeom, Abhik Sarkar, Daniel Milroy, Tapasya Patki (Lawrence Livermore National Laboratory); Abhinav Bhatele (University of Maryland)


Session 5B: Resilience

 

Session Chair: Jay Lofstead

  • DRUTO: Upper-Bounding Silent Data Corruption Vulnerability in GPU Applications
    Md Hasanur Rahman (University of Iowa); Sheng Di (Argonne National Laboratory); Shengjian Guo (Amazon Web Services); Xiaoyi Lu (University of California, Merced); Guanpeng Li (University of Iowa); Franck Cappello (Argonne National Laboratory)
  • MPI Errors Detection using GNN Embedding and Vector Embedding over LLVM IR
    Jad El Karchi (Inria); Hanze Chen, Ali TehraniJamsaz, Ali Jannesari (Iowa State University); Mihail Popov, Emmanuelle Saillard (Inria)
  • A Parallel Partial Merge Repair Algorithm for Multi-block Failures for Erasure Storage Systems
    Shuaipeng Zhang (Harbin Institute of Technology, Shenzhen); Shiyi Li (Harbin Institute of Technology, Shenzhen); Chentao Wu (Shanghai Jiao Tong University); Ruobin Wu (Harbin Institute of Technology, Shenzhen);  Saiqin Long (Jinan University); Wen Xia (Harbin Institute of Technology, Shenzhen)

Parallel Technical
Sessions
6A & 6B

2:30 PM – 4:10 PM

Session 6A: Accelerators

 

Session Chair: Davide Conficconi

  • Harmonica: Hybrid Accelerator to Overcome Imperfections of Mixed-signal DNN Accelerators
    Payman Behnam, Uday Kamal (Georgia Institute of Technology); Ali Shafiee (Meta); Alexey Tumanov, Saibal Mukhopadhyay (Georgia Institute of Technology)
  • IPU-EpiDet: Identifying Gene Interactions on Massively Parallel Graph-Based AI Accelerators
    Ricardo Nobre, Aleksandar Ilic (INESC-ID); Sergio Santander-Jiménez (University of Extremadura (UNEX)); Leonel Sousa (INESC-ID)
  • DEFCON: Deformable Convolutions Leveraging Interval Search and GPU Texture Hardware
    Malith Jayaweera, Yanyu Li, Yanzhi Wang (Northeastern University); Bin Ren (William & Mary); David Kaeli (Northeastern University)
  • Benchmarking and Dissecting the Nvidia Hopper GPU Architecture
    Weile Luo, Ruibo Fan, Zeyu Li, Dayou Du (The Hong Kong University of Science and Technology (Guangzhou)); Qiang Wang (Harbin Institute of Technology, Shenzhen); Xiaowen Chu (The Hong Kong University of Science and Technology (Guangzhou))
  • Exploration of Trade-offs Between General-Purpose and Specialized Processing Elements in HPC-Oriented CGRA
    Emanuele Del Sozzo (RIKEN Center for Computational Science); Xinyuan Wang (University of Toronto); Boma Adhi, Carlos Cortes (RIKEN Center for Computational Science); Jason Anderson (University of Toronto); Kentaro Sano (RIKEN Center for Computational Science)


Session 6B: Scheduling II

 

Session Chair: Suren Byna

  • Hadar: Heterogeneity-Aware Optimization-Based Online Scheduling for Deep Learning Clusters
    Abeda Sultana (University of Louisiana at Lafayette); Fei Xu (East China Normal University); Xu Yuan, Li Chen, Nian-feng Tzeng (University of Louisiana at Lafayette)
  • Fast Abort-freedom for Deterministic Transactions
    Chen Chen (University of Illinois at Chicago); Xingbo Wu (Microsoft Research); Wenshao Zhong, Jakob Eriksson (University of Illinois at Chicago)
  • SYNPA: SMT Performance Analysis and Allocation of Threads to Cores in ARM Processors
    Marta Navarro, Josué Feliu, Salvador Petit, María E. Gómez, Julio Sahuquillo (Universitat Politècnica de València)
  • Cross-System Analysis of Job Characterization and Scheduling in Large-Scale Computing Clusters
    Di Zhang, Monish Soundar Raj (University of North Carolina at Charlotte); Bing Xie (Microsoft); Sheng Di (ANL); Dong Dai (University of North Carolina at Charlotte)
  • Automatic Task Parallelization of Dataflow Graphs in ML/DL Models
    Srinjoy Das, Lawrence Rauchwerger (University of Illinois at Urbana Champaign)

Afternoon Break 4:10 PM - 4:40 PM

4:10 PM - 5:30 PM

Conference Poster Session

- Authors Available at Poster Boards

5:30 PM

PhD Forum - Students at Posters

6:30 PM - 7:30 PM

Pre-Banquet Reception

7:30 PM

Banquet (Paper and Poster Awards)


THURSDAY - 30 May 2024


All Day

Main Conference Poster-Accepted Papers

 

See listing here. Posters on Display in Ballroom Foyer

Parallel Technical
Sessions
7A & 7B

8:30 AM – 10:30 AM

Session 7A: Message Passing and Communication

 

Session Chair: Dip Sankar Banerjee

  • Adaptive Prefetching for Fine-grain Communication in PGAS Programs
    Thomas B. Rolinger (NVIDIA); Alan Sussman (University of Maryland)
  • An Optimized Error-controlled MPI Collective Framework Integrated with Lossy Compression
    Jiajun Huang (University of California, Riverside); Sheng Di (Argonne National Laboratory); Xiaodong Yu (Stevens Institute of Technology); Yujia Zhai (University of California, Riverside); Zhaorui Zhang (The Hong Kong Polytechnic University); Jinyang Liu (University of California, Riverside); Xiaoyi Lu (University of California, Merced); Ken Raffenetti, Hui Zhou (Argonne National Laboratory); Kai Zhao (Florida State University); Zizhong Chen (University of California, Riverside); Franck Cappello, Yanfei Guo (Argonne National Laboratory); Rajeev Thakur (Argonne National Laboratory)
  • MUSE: A Runtime Incrementally Reconfigurable Network Adapting to HPC Real-Time Traffic
    Zijian Li, Zixuan Chen, Yiying Tang, Xin Ai, Yuanyi Zhu, Zhigao Zhao, Jiang Shao (Fudan University); Guowei Liu (Tsinghua University); Sen Liu (Fudan University); Bin Liu (Tsinghua University); Yang Xu (Fudan University)
  • Fast Policy Convergence for Traffic Engineering with Proactive Distributed Message-Passing
    Zicheng Wang, Zirui Zhuang, Jingyu Wang, Qi Qi, Haifeng Sun, Jianxin Liao (Beijing University of Posts and Telecommunications)
  • The Self-adaptive and Topology-aware MPI_Bcast Leveraging Collective Offload on Tianhe Express Interconnect
    Chongshan Liang, Yi Dai (NUDT); Jun Xia (Nanhu Lab); Jinbo Xu, Jintao Peng, Weixia Xu, Ming Xie, Jie Liu, Zhiquan Lai, Sheng Ma, Qi Zhu (NUDT)
  • HINT: Designing Cache-Efficient MPI_Alltoall using Hybrid Memory Copy Ordering and Non-Temporal Instructions
    Bharath Ramesh, Nick Contini, Nawras Alnaasan, Kaushik Kandadi Suresh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda (The Ohio State University)

 

Session 7B: Communication Subsystems

 

Session Chair: Dip Sankar Banerjee

  • Flexible NVMe Request Routing for Virtual Machines
    Tu Dinh Ngoc, Boris Teabe, Georges Da Costa, Daniel Hagimont (IRIT, Université de Toulouse, CNRS, Toulouse INP, UT3)
  • HA-CSD: Host and SSD Coordinated Compression for Capacity and Performance
    Xiang Chen (Huazhong University of Science and Technology); Tao Lu, Jiapin Wang (DapuStor); Yu Zhong (Huazhong University of Science and Technology); Guangchun Xie (DapuStor); Xueming Cao, Yuanpeng Ma, Bing Si, Feng Ding, Ying Yang, Yunxing Huang  (DapuStor); Yafei Yang, You Zhou, Fei Wu (Huazhong University of Science and Technology)
  • Graph Analytics on Jellyfish Topology
    Md Nahid Newaz (Oakland University); Sayan Ghosh, Joshua Suetterlein, Nathan T. Tallent (Pacific Northwest National Laboratory); Md Atiqul Mollah (Cornelis Networks); Hua Ming (Oakland University)
  • TEEMO: Temperature Aware Energy Efficient Multi-Retention STT-RAM Cache Architecture
    Sukarn Agarwal (IIT Mandi); Shounak Chakraborty, Magnus Sjalander (Norwegian University of Science and Technology)
  • LockillerTM: Enhancing Performance Lower Bounds in Best-Effort Hardware Transactional Memory
    Li Wan, Fu Chao, Qiang Li, Jun Han (Fudan University)
  • Attention, Distillation, and Tabularization: Towards Practical Neural Network-Based Prefetching
    Pengmiao Zhang, Neelesh Gupta (University of Southern California); Rajgopal Kannan (DEVCOM Army Research Lab); Viktor Prasanna (University of Southern California)

Morning Break 10:30 AM - 11:00 AM

Keynote Session
11:00 AM – 12:00 PM

KEYNOTE SPEECH

Session Chair: Rich Vuduc

 

Computing Systems in the Foundation Model Era

 

Kunle Olukotun
Stanford University

 

Abstract: Generative AI applications with their ability to produce natural language, computer code and images are transforming all aspects of society. These applications are powered by huge foundation models such as GPT-4,…

 

Read more information

12:00 PM – 1:30 PM

Lunch & PhD Program

Parallel Technical
Sessions
8A & 8B

1:30 PM – 2:50 PM

Session 8A: Graph and MoE Learning

 

Session Chair: Ali Jannesari

  • Aurora: A Versatile and Flexible Accelerator for Generic Graph Neural Networks
    Jiaqi Yang (George Washington University); Hao Zheng (University of Central Florida); Ahmed Louri (George Washington University)

  • cuKE: An Efficient Code Generator for Score Function Computation in Knowledge Graph Embedding
    Lihan Hu (The University of Iowa); Jing Li (Nvidia); Peng Jiang (The University of Iowa)

  • Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference
    Jinghan Yao, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda (The Ohio State University)

  • TASER: Temporal Adaptive Sampling for Fast and Accurate Dynamic Graph Representation Learning
    Gangda Deng, Hongkuan Zhou (University of Southern California); Hanqing Zeng, Yinglong Xia, Christopher Leung, Jianbo Li (Meta); Rajgopal Kannan (DEVCOM US Army Research Lab); Viktor Prasanna (University of Southern California)

 

Session 8B: Performance Optimization

 

Session Chair: Sara Neuwirth

  • OpenFFT-SME: An Efficient Outer Product Pattern FFT Library on ARM SME CPUs
    Ruge Zhang, Haipeng Jia, Yunquan Zhang (Institute of Computing Technology, Chinese Academy of Sciences); Baicheng Yan, Penghao Ma, Long Wang (Huawei Technologies Co. Ltd); Wenxuan Zhao (Institute of Computing Technology, Chinese Academy of Sciences)
  • Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures
    Evangelos Georganas, Dhiraj Kalamkar, Kirill Voronin, Abhisek Kundu (Intel Corporation); Antonio Noack (Friedrich Schiller Universität Jena); Hans Pabst (Intel Corporation); Alexander Breuer (Friedrich Schiller Universität Jena); Alexander Heinecke (Intel Corporation) 
  • Optimizing General Matrix Multiplications on Modern Multi-core DSPs
    Kainan Yu, Xinxin Qi, Peng Zhang, Jianbin Fang, Dezun Dong, Ruibo Wang, Tao Tang, Chun Huang, Yonggang Che (National University of Defense Technology); Zheng Wang (Northwest University)
  • Machine-Learning-Driven Runtime Optimization of BLAS Level 3 on Modern Multi-Core Systems
    Yufan Xia (The Chinese University of Hong Kong); Giuseppe Maria Junior Barca (The University of Melbourne)

Afternoon Break 2:50 PM -3:30 PM

Parallel Technical
Sessions
9A & 9B

3:30 PM – 4:50 PM

Session 9A: Distributed Algorithms

 

Session Chair: Khaled Ibrahim

  • Time-Color Tradeoff on Uniform Circle Formation by Asynchronous Robots
    Debasish Pattanayak (Carleton University); Gokarna Sharma (Kent State University)
  • LightDAG: A Low-latency DAG-based BFT Consensus through Lightweight Broadcast
    Xiaohai Dai, Guanxiong Wang, Jiang Xiao, Zhengxuan Guo (Huazhong University of Science and Technology); Rui Hao (Nanjing University); Xia Xie (Hainan University); Hai Jin (Huazhong University of Science and Technology)
  • MAAD: A Distributed Anomaly Detection Architecture for Microservices Systems
    Rongyuan Tan, Zhuozhao Li (Southern University of Science and Technology)
  • OneShot: View-Adapting Streamlined BFT Protocols with Trusted Execution Environments
    Jeremie Decouchant (Delft University of Technology); David Kozhaya (ABB Research); Vincent Rahli (University of Birmingham); Jiangshan Yu (Monash University)


Session 9B: Graph Algorithms

 

Session Chair: Kishore Kothapalli

  • Practically Tackling Memory Bottlenecks of Graph-Processing Workloads
    Alexandre Valentin Jamet (Universitat Politecnica de Catalunya); Georgios Vavouliotis (Huawei Zurich Research Center); Daniel A. Jiménez (Texas A&M University); Lluc Alvarez (Barcelona Supercomputing Center); Marc Casas (Barcelona Supercomputing Center (BSC))
  • GCSM: GPU-Accelerated Continuous Subgraph Matching for Large Graphs
    Yihua Wei, Peng Jiang (The University of Iowa)
  • Parallel Derandomization for Coloring
    Sam Coy, Artur Czumaj (University of Warwick); Peter Davies-Peck (Durham University);  Gopinath Mishra (National University of Singapore)
  • A Comparative Study of Intersection-Based Triangle Counting Algorithms on GPUs
    Jiangbo Li, Zichen Xu (The Nanchang University); Minh Pham, Yicheng Tu (University of South Florida); Qihe Zhou (City University of Macau)

 

Main Conference Closing Session

 

Details to be announced


FRIDAY - 31 May 2024


FRIDAY
Workshops

 

ALL DAY


See each individual
workshop program
for schedule details

 

10. CGRA4HPC – Coarse-Grained Reconfigurable Architectures for High-Performance Computing
11. HIPS – High-level Parallel Programming Models and Supportive Environments
12. iWAPT – International Workshop on Automatic Performance Tuning
13. JSSPP – Job Scheduling Strategies for Parallel Processing
14. ParSocial – Parallel and Distributed Processing for Computational Social Systems
15. PDCO – Parallel / Distributed Combinatorics and Optimization
16. PDSEC – Parallel and Distributed Scientific and Engineering Computing
17. Q-CASA – Quantum Computing Algorithms, Systems, and Applications

 

 

IPDPS 2024: Keynote Speakers


 

IPDPS 2024 Tuesday KEYNOTE SPEAKER

 

Franck Cappello
Argonne National Laboratory

 

AuroraGPT: Exploring AI Assistant for Science

Abstract:
Innovative methods, new instruments, disruptive techniques, and groundbreaking technologies have led to significant leaps in scientific progress. The increasingly powerful Large Language Models (LLMs) released each month already speed up research activities such as concept explanation, literature search, and summarization. The transformative potential of AI in research activities, in particular foundation models, raises important questions about their performance in science activities, their potential application in different contexts, and their ethics. In this talk, I will introduce AuroraGPT, Argonne National Laboratory's effort to explore the notion of AI research assistants. To illustrate the gap between existing LLMs and an ideal AI research assistant, I will first share observations from using existing LLMs as early research assistants in three parallel and distributed computing experiments with experts in scheduling, distributed protocols, and PDE solvers. AuroraGPT is developed as an open foundation model trained specifically with scientific data to explore solutions toward the realization of effective AI research assistants. I will describe the activity, challenges, and progress of the different groups developing the key aspects of AuroraGPT. I will particularly focus on the task of conversational research assistant and discuss the evaluation of LLMs' scientific skills, their safety and trustworthiness, and the co-design of a scientific benchmark with domain experts.

Bio:
Franck Cappello received his Ph.D. in Computer Architecture from the University of Paris XI in 1994. He joined the French National Center for Scientific Research (CNRS), where he contributed to cluster and Grid computing, including desktop Grid and later hybrid parallel programming (MPI+OpenMP). In 2003, he moved to INRIA and led the R&D phase of Grid’5000 until 2008. Grid’5000 is a large-scale experimental platform for parallel and distributed computing research, which remains active and has produced over 2,500 scientific publications and supported hundreds of researchers and Ph.D. students. In 2009, as a visiting research professor at the University of Illinois, Cappello, alongside Prof. Marc Snir, established the Joint Laboratory on Petascale Computing (now the Joint Laboratory on Extreme Scale Computing). This collaboration is one of the largest and longest-lasting in high-performance computing, supporting numerous researchers and students in scientific computing, high-performance, and artificial intelligence. From 2009 to 2013, Cappello led an extensive research effort in parallel computing resilience, covering many aspects: failure characterization, checkpointing, fault tolerance protocols, silent data corruption detection, and failure prediction. As a member of the International Exascale Software Project, he led the roadmap efforts related to resilience at extreme scales. In 2016, Cappello became the director of two Exascale Computing Project (ECP) software projects: VeloC, for high-performance checkpointing of exascale applications, and SZ, for lossy compression of scientific data. Both software are now deployed in Exascale systems. He has become a leading figure in lossy compression for scientific data by leading the SZ project and developing key methodologies with the Z-checker compression error assessment tool and the SDRBench repository of reference scientific datasets. 
Throughout his career, Cappello has made significant contributions to parallel and distributed computing, high-performance computing resilience, and scientific data compression. He is an IEEE Fellow and the recipient of numerous awards, including the 2024 IEEE Charles Babbage Award, the 2024 Euro-Par Achievement Award, the 2022 ACM HPDC Achievement Award, two R&D 100 Awards (2019 and 2021), the 2018 IEEE TCPP Outstanding Service Award, and the 2021 IEEE Transactions on Computers Award for Editorial Service and Excellence.


 

IPDPS 2024 Wednesday KEYNOTE SPEAKER

 

Peng Wu
Meta


PyTorch 2 and its Compiler Technologies

Abstract:
PyTorch 2.0 was unveiled in March 2023, bringing substantial performance enhancements across a diverse array of models, often with just a simple one-line change. But this first release is not the end of the story: it marks the beginning of a long technical roadmap toward improving PyTorch execution efficiency via compiled mode. This talk will delve into the design and development of the PyTorch Compiler, examining key aspects through the lens of a three-year timeframe and highlighting our unique approach to creating a top-performing ML framework compiler in a highly competitive and rapidly evolving setting.

Bio:
Dr. Peng Wu is the engineering manager for the PyTorch Compiler team at Meta, bringing more than ten years of research expertise from IBM Research, where her work spanned a diverse array of topics in programming systems. After IBM, she founded the Programming Languages and Compiler Lab at Huawei and led its growth for six years. Since joining Meta, she has supported the team's pursuit of effective compiler solutions for PyTorch over the last three years, culminating in the groundbreaking release of PyTorch 2.0 in March 2023. She holds a Ph.D. in Computer Science from the University of Illinois Urbana-Champaign.


 

IPDPS 2024 Thursday KEYNOTE SPEAKER

 

Kunle Olukotun
Stanford University

 

Computing Systems in the Foundation Model Era

Abstract:
Generative AI applications, with their ability to produce natural language, computer code, and images, are transforming all aspects of society. These applications are powered by huge foundation models such as GPT-4, which have tens of billions of parameters, are trained on trillions of tokens, and have obtained state-of-the-art quality in natural language processing, vision, and speech applications. These models are computationally challenging because they require hundreds of petaFLOPS of computing capacity for training and inference. Future foundation models will have even greater capabilities, provided by more complex model architectures with longer sequence lengths, irregular data access (sparsity), and irregular control flow. In this talk I will describe how the evolving characteristics of foundation models will impact the design of the optimized computing systems required for training and serving these models. A key element of improving the performance and lowering the cost of deploying future foundation models will be optimizing data movement (dataflow) within the model using specialized hardware. In contrast to human-in-the-loop applications such as conversational AI, an emerging application of foundation models is real-time processing that operates without human supervision. I will describe how continuous real-time machine learning can be used to create an intelligent network data plane.

Bio:
Kunle Olukotun is the Cadence Design Professor of Electrical Engineering and Computer Science at Stanford University. Olukotun is a pioneer in multicore processor design and the leader of the Stanford Hydra chip multiprocessor (CMP) research project. He founded Afara Websystems to develop high-throughput, low-power multicore processors for server systems; the Afara multi-core, multi-thread processor, called Niagara, was acquired by Sun Microsystems and now powers Oracle's SPARC-based servers. Olukotun co-founded SambaNova Systems, a machine learning and artificial intelligence company, where he continues to serve as Chief Technologist. Olukotun is a member of the National Academy of Engineering, an ACM Fellow, and an IEEE Fellow for contributions to the design of multiprocessors on a chip and the commercialization of this technology. He received the 2023 ACM-IEEE CS Eckert-Mauchly Award.

