



IEEE Computer Society Technical Committees
Computer Architecture & Distributed Processing



Instituto de Computação / SGC Lab




2019 Advance Program

Please visit the IPDPS website regularly for updates, since there may be schedule revisions. Authors who have corrections should send email giving full details.

MONDAY - 20 May 2019




Workshops on MONDAY 20 MAY 2019

Heterogeneity in Computing Workshop



Reconfigurable Architectures Workshop



High Performance Computational Biology



Graph, Architectures, Programming and Learning



NSF/TCPP Workshop on Parallel and Distributed Computing Education



High Level Programming Models and Supportive Environments



High-Performance Big Data and Cloud Computing



Accelerators and Hybrid Exascale Systems



Parallel and Distributed Combinatorics and Optimization



Advances in Parallel and Distributed Computational Models


6:00 PM - 7:30 PM
IPDPS - TCPP Welcome Reception

TUESDAY - 21 May 2019


Opening Session
8:00 AM - 8:30 AM


Keynote Session
8:30 AM - 9:30 AM



Session Chair:
Viktor Prasanna


Coding the Continuum

Ian Foster

Argonne National Laboratory and University of Chicago


Abstract: In 2001, as early high-speed networks were deployed, George Gilder observed that “when the… Read more

Morning Break 9:30 AM - 10:00 AM

PhD Forum
All day

PhD Forum Posters

On Display All Day Tuesday and Wednesday

Parallel Technical Sessions 1, 2, 3, & 4

10:00 AM - 12:00 PM

SESSION 1: Graph Algorithms 1

Session Chair: Keshav Pingali

LACC: A Linear-Algebraic Algorithm for Finding Connected Components in Distributed Memory
Ariful Azad (Indiana University), Aydin Buluç (Lawrence Berkeley National Lab, UC Berkeley)


Shared-memory Exact Minimum Cuts
Monika Henzinger, Alexander Noe, and Christian Schulz (University of Vienna)


Distributed Weighted All Pairs Shortest Paths Through Pipelining
Udit Agarwal and Vijaya Ramachandran (UT Austin)


Local Distributed Algorithms in Highly Dynamic Networks
Philipp Bamberger, Fabian Kuhn, and Yannic Maus (University of Freiburg)


SESSION 2: HPC Systems

Session Chair: Michael Gerndt


Effects and benefits of node sharing strategies in HPC batch systems
Alvaro Frank, Tim Süss, and André Brinkmann (Johannes Gutenberg University Mainz)


Design Space Exploration of Next-Generation HPC Machines
Constantino Gomez, Francesc Martinez, Adria Armejach, Miquel Moreto, Filippo Mantovani, and Marc Casas (Barcelona Supercomputing Center)


A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning
Tal Ben-Nun, Maciej Besta, Simon Huber, Alexandros Nikolaos Ziogas, Daniel Peter, Torsten Hoefler (ETH Zurich)


Double-precision FPUs in High-Performance Computing: an Embarrassment of Riches?
Jens Domke and Kazuaki Matsumura (Tokyo Institute of Technology), Mohamed Wahib (AIST-TokyoTech Real World Big-Data Computation Open Innovation Laboratory), Haoyu Zhang, Keita Yashima, Toshiki Tsuchikawa, Yohei Tsuji, and Artur Podobas (Tokyo Institute of Technology), Satoshi Matsuoka (RIKEN Center for Computational Science/ R-CCS)


SESSION 3: Numerical Algorithms

Session Chair: Olivier Beaumont


Communication-avoiding CholeskyQR2 for rectangular matrices
Edward Hutter and Edgar Solomonik (University of Illinois at Urbana-Champaign)


Asynchronous Multigrid Methods
Jordi Wolfson-Pou and Edmond Chow (Georgia Institute of Technology)


Fast Batched Matrix Multiplication for Small Sizes using Half Precision Arithmetic on GPUs
Ahmad Abdelfattah, Stanimire Tomov, and Jack Dongarra (University of Tennessee) 

Load-Balanced Sparse MTTKRP on GPUs
Israt Nisa (The Ohio State University), Jiajia Li (Pacific Northwest National Laboratory), Aravind Sukumaran Rajam (The Ohio State University), Richard Vuduc (Georgia Institute of Technology), and P. (Saday) Sadayappan (The Ohio State University)

SESSION 4: Scheduling and Load Balancing I

Session Chair: Ioana Banicescu


Practically Efficient Scheduler for Minimizing Average Flow Time of Parallel Jobs
Kunal Agrawal and I-Ting Angelina Lee (Washington University in St. Louis), Jing Li (New Jersey Institute of Technology), Kefu Lu (Washington University in St. Louis), and Benjamin Moseley (Carnegie Mellon University)


Scheduling on (Un-)Related Machines with Setup Times
Klaus Jansen and Marten Maack (University of Kiel) and Alexander Mäcker (Paderborn University)


A scalable clustering-based task scheduler for homogeneous processors using DAG partitioning
M. Yusuf Özkaya (Georgia Institute of Technology), Anne Benoit (LIP, ENS Lyon), Bora Uçar (CNRS), Julien Herrmann and Ümit V. Çatalyürek (Georgia Institute of Technology)


Reservation Strategies for Stochastic Jobs
Guillaume Aupy (French Institute for Research in Computer Science and Automation (INRIA), University of Bordeaux), Ana Gainaru (Department of EECS, Vanderbilt University, Nashville; Labri, Univ. of Bordeaux), Valentin Honoré (Labri, Univ. of Bordeaux; Inria), Padma Raghavan (Department of EECS, Vanderbilt University, Nashville; University of Tennessee Knoxville), Yves Robert (Laboratoire LIP, ENS Lyon; University of Tennessee Knoxville), and Hongyang Sun (Department of EECS, Vanderbilt University, Nashville)

Parallel Technical Sessions 5, 6, 7, & 8
1:30 PM - 3:30 PM

SESSION 5: Accelerating Neural Networks

Session Chair: P. Sadayappan


Exploiting Flow Graph of System of ODEs to Accelerate the Simulation of Biologically-Detailed Neural Networks
Bruno Magalhaes (Blue Brain Project, École polytechnique fédérale de Lausanne), Michael Hines (Yale School of Medicine, Yale University), Thomas Sterling (Indiana university), and Felix Schuermann (Blue Brain Project, École polytechnique fédérale de Lausanne)


Runtime Concurrency Control and Operation Scheduling for High Performance Neural Network Training
Jiawen Liu and Dong Li (University of California, Merced), Gokcen Kestor (Pacific Northwest National Laboratory), and Jeffrey Vetter (Oak Ridge National Laboratory)


Dynamic Memory Management for GPU-based training of Deep Neural Networks
Shriram S B, Anshuj Garg, and Purushottam Kulkarni (Indian Institute of Technology Bombay)


Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism
Nikoli Dryden (University of Illinois at Urbana-Champaign, Lawrence Livermore National Laboratory), Naoya Maruyama, Tom Benson, and Tim Moon (Lawrence Livermore National Laboratory), Marc Snir (University of Illinois at Urbana-Champaign), and Brian Van Essen (Lawrence Livermore National Laboratory)



SESSION 6: GPU Computing I

Session Chair: Koji Nakano


Excavating the Potential of GPU for Accelerating Graph Traversal
Pengyu Wang, Lu Zhang, Chao Li, and Minyi Guo (Shanghai Jiao Tong University)


ParILUT - A Parallel Threshold ILU for GPUs
Hartwig Anzt (Karlsruhe Institute of Technology, University of Tennessee), Tobias Ribizel (Karlsruhe Institute of Technology), Goran Flegar (Universidad Jaume I), Edmond Chow (Georgia Institute of Technology, Oak Ridge National Lab), and Jack Dongarra (University of Tennessee, Oak Ridge National Lab)


C-GDR: High-Performance Container-aware GPUDirect MPI Communication Schemes on RDMA Networks
Jie Zhang, Xiaoyi Lu, Ching-Hsiang Chu, and Dhabaleswar K. (DK) Panda (The Ohio State University)


Slate: Enabling Workload-Aware Efficient Multiprocessing for Modern GPGPUs
Tyler Allen, Xizhou Feng, and Rong Ge (Clemson University)



SESSION 7: Learning and Prediction Systems

Session Chair: Felipe Franca


A Deep Recurrent Neural Network Based Predictive Control Framework for Reliable Distributed Stream Data Processing
Jielong Xu, Jian Tang, Zhiyuan Xu, and Chengxiang Yin (Syracuse University), Kevin Kwiat and Charles Kamhoua (ARL)


Architecting Racetrack Memory preshift through pattern-based prediction mechanisms
Adrian Colaso, Pablo Prieto, Pablo Abad, Valentin Puente, and Jose Angel Gregorio (University of Cantabria)


DLHub: Model and Data Serving for Science
Ryan Chard (Argonne National Laboratory), Zhuozhao Li, Kyle Chard, Logan Ward, Yadu Babuji, Anna Woodard, Steven Tuecke, Ben Blaiszik, and Michael Franklin (University of Chicago), Ian Foster (Argonne National Laboratory)


Identifying Latent Reduced Models to Precondition Lossy Compression
Huizhang Luo and Qing Liu (New Jersey Institute of Technology), Hong Jiang (University of Texas at Arlington), and Mengchu Zhou (New Jersey Institute of Technology)



SESSION 8: Multicore Computing

Session Chair: Manoj Kumar


QoS-Driven Coordinated Management of Resources to Save Energy in Multicore Systems
Mehrzad Nejat, Miquel Pericàs, and Per Stenström (Chalmers University of Technology)


Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems
Vasimuddin Md and Sanchit Misra (Intel Corporation, India), Heng Li (Dana-Farber Cancer Institute; Harvard Medical School), and Srinivas Aluru (Georgia Institute of Technology)


Power and Performance Tradeoffs for Visualization Algorithms
Stephanie Labasan (Lawrence Livermore National Laboratory, University of Oregon), Matthew Larsen (Lawrence Livermore National Laboratory), Hank Childs (University of Oregon), and Barry Rountree (Lawrence Livermore National Laboratory)


Northup: Divide-and-Conquer Programming in Systems with Heterogeneous Memories and Processors
Shuai Che (Alibaba) and Jieming Yin (AMD Research)

Afternoon Break 3:30 PM - 4:00 PM

Best Papers

4:00 PM - 6:00 PM

Best Paper Nominees - Plenary

Session Chair: Alba Melo


Distributed Approximate k-Core Decomposition and Min-Max Edge Orientation: Breaking the Diameter Barrier
T-H. Hubert Chan (University of Hong Kong), Mauro Sozio (Telecom ParisTech), and Bintao Sun (University of Hong Kong)


FALCON: Efficient Designs for Zero-copy MPI Datatype Processing on Emerging Architectures
Jahanzeb Maqbool Hashmi, Sourav Chakraborty, Mohammadreza Bayatpour, Hari Subramoni, and Dhabaleswar K. (DK) Panda (The Ohio State University)


Two Elementary Instructions make Compare-and-Swap
Pankaj Khanchandani and Roger Wattenhofer (ETH Zurich)


Robust Dynamic Resource Allocation via Probabilistic Task Pruning in Heterogeneous Computing Systems
James Gentry, Chavit Denninnart, and Mohsen Amini Salehi (University of Louisiana at Lafayette)


6:00 PM – 8:00 PM

NVIDIA Hands-On Lab:
Introduction to GPU Programming with OpenACC


Learn how to accelerate your C/C++ or Fortran application using OpenACC to harness the massively parallel power of NVIDIA GPUs.


Presented by Pedro Mário Cruz e Silva
Solution Architect Manager at NVIDIA
& Board Member of the Brazilian Geophysical Society


Read more

WEDNESDAY - 22 May 2019


Keynote Session
8:30 AM – 9:30 AM



Session Chair:
Vinod Rebello


Two Roads to Parallelism: From Serial Code to Programming with STAPL

Lawrence Rauchwerger

Texas A&M University


Abstract: Parallel computers have come of age and need parallel software to justify their usefulness… Read more

Morning Break 9:30 AM - 10:00 AM

PhD Forum
All day

PhD Forum Posters

On Display All Day Tuesday and Wednesday

Parallel Technical Sessions 9, 10, 11, & 12

10:00 AM - 12:00 PM

SESSION 9: Cloud Computing

Session Chair: Lucia Drummond


Z-Dedup: A Case for Deduplicating Compressed Contents in Cloud
Zhichao Yan and Hong Jiang (University of Texas at Arlington), Yujuan Tan (Chongqing University), Stan Skelton (NetApp), and Hao Luo (Twitter)


An Architecture and Stochastic Method for Database Container Placement in the Edge-Fog-Cloud Continuum
Petar Kochovski (University of Ljubljana), Rizos Sakellariou (University of Manchester), Marko Bajec (University of Ljubljana), Pavel Drobintsev (Peter the Great St. Petersburg Polytechnic University), and Vlado Stankovski (University of Ljubljana)


Online Live VM Migration Algorithms to Minimize Total Migration Time and Downtime
Nikos Tziritas (Shenzhen Institutes of Advanced Technology), Thanasis Loukopoulos (University of Thessaly), Samee U. Khan (North Dakota State University), Cheng-Zhong Xu (University of Macau), and Albert Y. Zomaya (University of Sydney)


Semantics-aware Virtual Machine Image Management in IaaS Clouds
Nishant Saurabh (University of Innsbruck, Klagenfurt University), Julian Remmers (University of Innsbruck), Dragi Kimovski (Klagenfurt University), Radu Prodan (Klagenfurt University, University of Innsbruck), and Jorge G. Barbosa (LIACC, Faculdade de Engenharia da Universidade do Porto)



SESSION 10: Graph Algorithms II

Session Chair: Lawrence Rauchwerger


Composing Optimization Techniques for Vertex-Centric Graph Processing via Communication Channels
Yongzhe Zhang (National Institute of Informatics, SOKENDAI) and Zhenjiang Hu (National Institute of Informatics, University of Tokyo)


CuSP: A Customizable Streaming Edge Partitioner for Distributed Graph Analytics
Loc D. Hoang, Roshan Dathathri, Gurbinder Gill, and Keshav Pingali (The University of Texas at Austin)


Accelerating Sequence Alignment to Graphs
Chirag Jain (Georgia Institute of Technology), Sanchit Misra (Intel Corporation), Haowen Zhang (Georgia Institute of Technology), Alexander Dilthey (University Hospital of Dusseldorf), and Srinivas Aluru (Georgia Institute of Technology)


Accurate, Efficient and Scalable Graph Embedding
Hanqing Zeng, Hongkuan Zhou, and Ajitesh Srivastava (University of Southern California), Rajgopal Kannan (US Army Research Lab), and Viktor Prasanna (University of Southern California)



SESSION 11: Linear Algebra

Session Chair: Hartwig Anzt


Matrix Powers Kernels for Thick-restart Lanczos with Explicit External Deflation
Zhaojun Bai (University of California, Davis), Jack Dongarra (University of Tennessee), Ding Lu (University of California, Davis), and Ichitaro Yamazaki (University of Tennessee)


Revisiting the I/O Complexity of Fast Matrix Multiplication with Recomputations
Roy Nissim and Oded Schwartz (The Hebrew University of Jerusalem)


Computation of Matrix Chain Products on Parallel Machines
Oded Schwartz and Elad Weiss (The Hebrew University Jerusalem)


Overlapping Communications with Other Communications and its Application to Distributed Dense Matrix Computations
Hua Huang (Georgia Institute of Technology) and Edmond Chow (Georgia Institute of Technology)



SESSION 12: Storage Systems

Session Chair: Tze Sing Eugene Ng


Data Jockey: Automatic Data Management for HPC Multi-Tiered Storage Systems
Woong Shin, Christopher D. Brumgard, Bing Xie, and Sudharshan S. Vazhkudai (Oak Ridge National Laboratory), Devarshi Ghoshal (Lawrence Berkeley National Laboratory), Sarp Oral (Oak Ridge National Laboratory), and Lavanya Ramakrishnan (Lawrence Berkeley National Laboratory)


NCQ-Aware I/O Scheduling for Conventional Solid State Drives
Hao Fan and Song Wu (HUST), Shadi Ibrahim (INRIA), Ximing Chen, Hai Jin, and Jiang Xiao (HUST)


Optimizing the Parity-Check Matrix for Efficient Decoding of RS-based Cloud Storage Systems
Junqing Gu, Chentao Wu, Xin Xie, Han Qiu, Jie Li, and Minyi Guo (Shanghai Jiao Tong University), Xubin He (Temple University), Yuanyuan Dong and Yafei Zhao (Alibaba Group)


D3: Deterministic Data Distribution for Efficient Data Reconstruction in Erasure-Coded Distributed Storage Systems
Zhipeng Li, Min Lv, Yinlong Xu, Yongkun Li, and Liangliang Xu (University of Science and Technology of China)

Parallel Technical Sessions 13, 14, 15, & 16
1:30 PM – 3:30 PM

SESSION 13: Applications I

Session Chair: Srinivas Aluru


SunwayLB: Enabling Extreme-Scale Lattice Boltzmann Method Based Computing Fluid Dynamics Simulations on Sunway TaihuLight
Zhao Liu (Tsinghua University Beijing), Xuesen Chu (China Ship Scientific Research Center), Xiaojing Lv, Hongsong Meng, and Shupeng Shi (National Supercomputing Center in Wuxi), Wenji Han (China Ship Scientific Research Center), Jingheng Xu, Haohuan Fu and Guangwen Yang (Tsinghua University Beijing)


Containers in HPC: A Scalability and Portability Study in Production Biological Simulations
Oleksandr Rudyy, Marta Garcia-Gasulla, Filippo Mantovani, Alfonso Santiago, Raül Sirvent, and Mariano Vazquez (Barcelona Supercomputing Center)


PaKman: Scalable Assembly of Large Genomes on Distributed Memory Machines
Priyanka Ghosh (Washington State University), Sriram Krishnamoorthy (Pacific Northwest National Laboratory), and Ananth Kalyanaraman (Washington State University)


Language Modeling at Scale
Mostofa Patwary (Baidu), Milind Chabbi (Unaffiliated), Heewoo Jun, Jiaji Huang, Greg Diamos, and Kenneth Church (Baidu)



SESSION 14: File Systems

Session Chair: Ian Foster


DYRS: Bandwidth-Aware Disk-to-Memory Migration of Cold Data in Big-Data File Systems
Simbarashe Dzinamarira (Rice University), Florin Dinu (University of Sydney), and T. S. Eugene Ng (Rice University)


iez: Resource Contention Aware Load Balancing for Large-Scale Parallel File Systems
Bharti Wadhwa and Arnab K. Paul (Virginia Tech), Sarah Neuwirth (University of Heidelberg), Feiyi Wang and Sarp Oral (Oak Ridge National Laboratory), Ali R. Butt, Jon Bernard, and Kirk W. Cameron (Virginia Tech)


SimFS: A Simulation Data Virtualizing File System Interface
Salvatore Di Girolamo, Pirmin Schmid, Thomas Shulthess, and Torsten Hoefler (ETH Zurich)


Sizing and Partitioning Strategies for Burst-Buffers to Reduce IO Contention
Guillaume Aupy, Olivier Beaumont, and Lionel Eyraud-Dubois (Inria, University of Bordeaux)


SESSION 15: GPU Computing II

Session Chair: Ozcan Ozturk


On Optimizing Complex Stencils on GPUs
Prashant Singh Rawat, Miheer Vaidya, Aravind Sukumaran-Rajam, and Atanas Rountev (The Ohio State University), Louis-Noël Pouchet (Colorado State University), and P. Sadayappan (The Ohio State University)


Themis: Predicting and Reining in Application-Level Slowdown on Spatial Multitasking GPUs
Wenyi Zhao and Quan Chen (Shanghai Jiao Tong University), Hao Lin and Jianfeng Zhang (Alibaba Group), Jingwen Leng, Chao Li, Wenli Zheng, Li Li, and Minyi Guo (Shanghai Jiao Tong University)


Exploiting Adaptive Data Compression to Improve Performance and Energy-efficiency of Compute Workloads in Multi-GPU Systems
Mohammad Khavari, Yifan Sun, Nicolas Bohm Agostini, and David Kaeli (Northeastern University)


Dual Pattern Compression Using Data-Preprocessing for Large-Scale GPU Architectures
Kyung Hoon Kim, Priyank Devpura, Abhishek Nayyar, Andrew Doolittle, Kihwan Yum, and Eun Jung Kim (Texas A&M University)



SESSION 16: Scheduling & Load Balancing II

Session Chair: Radu Prodan


Adapting Batch Scheduling to Workload Characteristics: What can we Expect From Online Learning?
Arnaud Legrand (CNRS), Denis Trystram (Grenoble INP), and Salah Zrigui (Université Grenoble Alpes)


Aladdin: Optimized Maximum Flow Management for Shared Production Clusters
Heng Wu, Wenbo Zhang, Yuanjia Xu, Hao Xiang, and Tao Huang (Institute of Software, Chinese Academy of Sciences), Haiyang Ding and Zheng Zhang (Alibaba Group, Hangzhou, China)


mmWave Wireless Backhaul Scheduling of Stochastic Packet Arrivals
Pawel Garncarek and Tomasz Jurdzinski (University of Wroclaw), Dariusz R. Kowalski (University of Liverpool), and Miguel A. Mosteiro (Pace University)


Tight & Simple Load Balancing
Petra Berenbrink (Universität Hamburg), Tom Friedetzky (Durham University), Dominik Kaaser and Peter Kling (Universität Hamburg)

Afternoon Break 3:30 PM - 4:00 PM

Plenary Panel

4:00 PM – 5:15 PM

Parallel Processing: Challenges for the Next Quarter Century


Panel chair: Marc Snir (UIUC)

Panelists: Tim Mattson (Intel), Manoj Kumar (IBM), Umit Catalyurek (Georgia Tech), Henry Tufo (University of Colorado)


Description: In 1994, parallel processing was still cutting its baby teeth. Most supercomputers had 1 to 32 processor… Read more





BANQUET – Details to be announced

THURSDAY - 23 May 2019


Keynote Session
8:30 AM - 9:30 AM



Session Chair:
Jose Moreira


The Path to Delivering Programmable Exascale Systems

Luiz DeRose
Cray Inc.


Abstract: The trends in hardware architecture are paving the road towards Exascale. However, these… Read more


Morning Break 9:30 AM - 10:00 AM

Parallel Technical Sessions 17, 18, 19, & 20
10:00 AM - 12:00 PM

SESSION 17: Managing Data

Session Chair: Ali Butt


An Error-Reflective Consistency Model for Distributed Data Stores
Philip Dexter and Kenneth Chiu (SUNY Binghamton), Bedri Sendir (IBM Research)


A High-Performance Distributed Relational Database System for Scalable OLAP Processing
Jason Arnold, Boris Glavic, and Ioan Raicu (Illinois Institute of Technology)


An Approach for Parallel Loading and Pre-Processing of Unstructured Meshes Stored in Spatially Scattered Fashion
Ondřej Meca, Lubomír Říha, and Tomáš Brzobohatý (IT4Innovations, Technical University of Ostrava, Czech Republic)



SESSION 18: Message Passing

Session Chair: Bora Ucar


Exploring MPI Communication Models for Graph Applications Using Graph Matching as a Case Study
Sayan Ghosh (Washington State University), Mahantesh Halappanavar (Pacific Northwest National Laboratory),
Ananth Kalyanaraman (Washington State University), Arif Khan (Pacific Northwest National Laboratory), and Assefaw Gebremedhin (Washington State University)


BigSpa: An Efficient Interprocedural Static Analysis Engine in the Cloud
Zhiqiang Zuo, Rong Gu, Xi Jiang, Zhaokang Wang, Yihua Huang, Linzhang Wang, and Xuandong Li (Nanjing University)


An Efficient Collaborative Communication Mechanism for MPI Neighborhood Collectives
S. Mahdieh Ghazimirsaeed, Seyed H. Mirsadeghi, and Ahmad Afsahi (Queen's University)


SESSION 19: Managing Power and Energy

Session Chair: Sudharshan Vazhkudai


Understanding the Impact of Dynamic Power-Capping on Application Progress
Srinivasan Ramesh (University of Oregon), Swann Perarnau and Sridutt Bhalachandra (Argonne National Laboratory), Allen Malony (University of Oregon), and Pete Beckman (Argonne National Laboratory)


Modelling DVFS and UFS for Region-Based Energy Aware Tuning of HPC Applications
Mohak Chadha and Michael Gerndt (Technische Universität München)


SprintCon: Controllable and Efficient Computational Sprinting for Data Center Servers
Wenli Zheng (Shanghai Jiao Tong University), Xiaorui Wang (The Ohio State University), Yue Ma and Chao Li (Shanghai Jiao Tong University), Hao Lin (Alibaba Group), Bin Yao (Shanghai Jiao Tong University), Jianfeng Zhang (Alibaba Group), Minyi Guo (Shanghai Jiao Tong University)


Drowsy-DC: Data center power management system
Mathieu Bacou (IRIT, Université de Toulouse, CNRS, Toulouse; Atos Intégration, Toulouse), Grégoire Todeschi (IRIT, Université de Toulouse, CNRS, Toulouse, France), Alain Tchana (I3S), Daniel Hagimont (IRIT, Université de Toulouse, CNRS, Toulouse, France), Baptiste Lepers and Willy Zwaenepoel (EPFL)



SESSION 20: Networks

Session Chair: Jacir Bordim


Distributed Dominating Set and Connected Dominating Set Construction in the Dynamic SINR Model
Dongxiao Yu (Institute of Intelligent Computing, School of Computer Science and Technology, Shandong University), Yifei Zou (Department of Computer Science, The University of Hong Kong), Yong Zhang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), Feng Li (Institute of Intelligent Computing, School of Computer Science and Technology, Shandong University), Jiguo Yu (Qilu University of Technology, Shandong Computer Science Center), Yu Wu (School of Computer Science and Technology, Dongguan University of Technology), Xiuzhen Cheng (Institute of Intelligent Computing, School of Computer Science and Technology, Shandong University), and Francis C.M. Lau (Department of Computer Science, The University of Hong Kong)


MULTISKIPGRAPH: A Self-stabilizing Overlay Network that Maintains Monotonic Searchability
Linghui Luo (Paderborn University, Heinz Nixdorf Institut), Christian Scheideler and Thim Strothmann (Paderborn University)


Network Size Estimation in Small-World Networks under Byzantine Faults
Soumyottam Chatterjee and Gopal Pandurangan (University of Houston), Peter Robinson (McMaster University)


MD-GAN: Multi-Discriminator Generative Adversarial Networks for Distributed Datasets
Corentin Hardy (Technicolor, Inria), Erwan Le Merrer (Technicolor), and Bruno Sericola (Inria)


Atos Bull
Tech Talk

12:00-1:00 PM

Reducing the Gap among Science, Value proposition and Supercomputing


Presented by Genaro Costa of Atos Bull


Abstract: Within the data science buzzword comes the reality that data is everywhere, and so are the computational needs. On-demand data analysis and fast response to business operations open the… Read more

Parallel Technical Sessions 21, 22, 23
2:00 PM - 4:00 PM

SESSION 21: Dealing with Faults

Session Chair: Ignacio Laguna


MOARD: Modeling Application Resilience to Transient Faults on Data Objects
Luanzheng Guo and Dong Li (University of California, Merced)


SAFIRE: Scalable and Accurate Fault Injection for Parallel Multithreaded Applications
Giorgis Georgakoudis (Lawrence Livermore National Laboratory, Queen's University Belfast), Ignacio Laguna (Lawrence Livermore National Laboratory), Hans Vandierendonck and Dimitrios S. Nikolopoulos (Queen's University Belfast), Martin Schulz (Technische Universität München)


Optimal Placement of In-Memory Checkpoints Under Heterogeneous Failure Likelihoods
Zaeem Hussain, Taieb Znati, and Rami Melhem (University of Pittsburgh)


VeloC: Towards High Performance Adaptive Asynchronous Checkpointing at Large Scale
Bogdan Nicolae (Argonne National Laboratory), Adam Moody, Elsa Gonsiorowski, and Kathryn Mohror (Lawrence Livermore National Laboratory), Franck Cappello (Argonne National Laboratory)



SESSION 22: Optimizing Memory Behavior

Session Chair: Hans-Ulrich Heiss


HART: A Concurrent Hash-Assisted Radix Tree for DRAM-PM Hybrid Memory Systems
Wen Pan, Tao Xie, and Xiaojia Song (San Diego State University)


LLC-guided Data Migration in Hybrid Memory Systems
Evangelos Vasilakis (Chalmers University of Technology, CSE Dept.), Vassilis Papaefstathiou (Foundation for Research and Technology – Hellas/FORTH), Pedro Trancoso and Ioannis Sourdis (Chalmers University of Technology, CSE Dept)


Software-based Buffering of Associative Operations on Random Memory Addresses
Matthias Hauck (Heidelberg University/SAP, SAP SE), Marcus Paradies (Deutsches Zentrum für Luft- und Raumfahrt), and Holger Fröning (Heidelberg University)


Combining Prefetch Control and Cache Partitioning to Improve Multicore Performance
Gongjin Sun, Junjie Shen, and Alexander V. Veidenbaum (University of California, Irvine)



SESSION 23: Programming Languages

Session Chair: Jose Nelson Amaral


UPC++: A High-Performance Communication Framework for Asynchronous Computation
John Bachan, Scott B. Baden, Steven Hofmeyr, Mathias Jacquelin, Amir Kamil, Dan Bonachea, Paul H. Hargrove, and Hadia Ahmed (Lawrence Berkeley National Laboratory)


Cpp-Taskflow: Fast Task-based Parallel Programming using Modern C++
Tsung-Wei Huang, Chun-Xun Lin, Guannan Guo, and Martin D. F. Wong (University of Illinois Urbana-Champaign)


Portal: A High-Performance Language and Compiler for Parallel N-body Problems
Laleh Aghababaie Beni, Saikiran Ramanan, and Aparna Chandramowlishwaran (University of California, Irvine)


SAC Goes Cluster: Fully Implicit Distributed Computing
Thomas Macht (University of Amsterdam, VU University Amsterdam) and Clemens Grelck (University of Amsterdam)

Afternoon Break 4:00 PM - 4:30 PM

Parallel Technical Sessions 24, 25, & 26
4:30 PM - 6:00 PM

SESSION 24: Accelerating Graph Processing

Session Chair: Luiz DeRose


Incremental Graph Processing for On-Line Analytics
Scott Sallinen (University of British Columbia), Roger Pearce (Lawrence Livermore National Laboratory), and Matei Ripeanu (University of British Columbia)


Incrementalization of Vertex-Centric Programs
Timothy A. K. Zakian (University of Oxford), Ludovic A. R. Capelli (University of Edinburgh), and Zhenjiang Hu (National Institute of Informatics, University of Tokyo)


GraphTinker: A High Performance Data Structure for Dynamic Graph Processing
Wole Jaiyeoba and Kevin Skadron (University of Virginia)



SESSION 25: Applications II

Session Chair: Ananth Kalyanaraman


FastJoin: A Skewness-Aware Distributed Stream Join System
Shunjie Zhou, Fan Zhang, Hanhua Chen, and Hai Jin (Huazhong University of Science and Technology), Bing Bing Zhou (The University of Sydney)


A Bin-Based Bitstream Partitioning Approach for Parallel CABAC Decoding in Next Generation Video Coding
Philipp Habermann (Technische Universität Berlin), Chi Ching Chi and Mauricio Alvarez-Mesa (Spin Digital Video Technologies GmbH), Ben Juurlink (Technische Universität Berlin)


Stochastic Gradient Descent on Modern Hardware: Multi-core CPU or GPU? Synchronous or Asynchronous?
Yujing Ma, Florin Rusu, and Martin Torres (University of California Merced)



SESSION 26: Security and Reliability

Session Chair: Taieb Znati


Always be Two Steps Ahead of Your Enemy
Thorsten Götte (Paderborn University), Vipin Ravindran Vijayalakshmi (RWTH Aachen), and Christian Scheideler (Paderborn University)


Peace Through Superior Puzzling: An Asymmetric Sybil Defense
Diksha Gupta and Jared Saia (University of New Mexico), Maxwell Young (Mississippi State University)


Rethinking Support for Region Conflict Exceptions
Swarnendu Biswas (IIT Kanpur), Rui Zhang and Michael D. Bond (Ohio State University), Brandon Lucia (Carnegie Mellon University)

FRIDAY - 24 May 2019



Workshops on FRIDAY 24 MAY 2019


Parallel and Distributed Scientific and Engineering Computing



International Workshop on Automatic Performance Tuning



Parallel Programming Model: Special Edition on Edge/Fog/In-Situ Computing



Scalable Networks for Advanced Computing Systems Workshop



Parallel AI and Systems for the Edge



Workshop on Resource Arbitration



Scalable Deep Learning over Parallel and Distributed Infrastructure


8:30 AM

ShowCase Brazil@IPDPS 2019


An all-day workshop to promote Brazilian research in PDC, give visibility to the work of companies, universities, and research centers in the country, and help stimulate cooperation in the application of cutting-edge technology for the benefit of all.


See more…

2019 Keynote Speakers

Ian Foster
Argonne and University of Chicago

Tuesday, May 21st

Title: Coding the Continuum

Abstract: In 2001, as early high-speed networks were deployed, George Gilder observed that “when the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances.” Two decades later, our networks are 1,000 times faster, our appliances are increasingly specialized, and our computer systems are indeed disintegrating. As hardware acceleration overcomes speed-of-light delays, time and space merge into a computing continuum. Familiar questions like “where should I compute,” “for what workloads should I design computers,” and “where should I place my computers” seem to allow for a myriad of new answers that are exhilarating but also daunting. Are there concepts that can help guide us as we design applications and computer systems in a world that is untethered from familiar landmarks like center, cloud, edge? I propose some ideas and report on experiments in coding the continuum.

Bio: Ian Foster has been working to code the continuum for more than 30 years, first as a PhD student at Imperial College, London, and then as a scientist at Argonne National Laboratory and a professor of computer science at the University of Chicago. He is currently Senior Scientist and Distinguished Fellow, and also director of the Data Science and Learning Division, at Argonne, and the Arthur Holly Compton Distinguished Service Professor of Computer Science at Chicago. Ian received a BSc (Hons I) degree from the University of Canterbury, New Zealand, and a PhD from Imperial College, United Kingdom, both in computer science. His research deals with distributed, parallel, and data-intensive computing technologies, and innovative applications of those technologies to scientific problems in such domains as materials science, climate change, and biomedicine. His Globus software is widely used in national and international cyberinfrastructures. Foster is a fellow of the American Association for the Advancement of Science, Association for Computing Machinery, and British Computer Society. His awards include the Global Information Infrastructure Next Generation award, British Computer Society Lovelace Medal, and IEEE Kanai award. He has received honorary doctorates from the University of Canterbury, New Zealand, and the Mexican Center for Research and Advanced Studies of the National Polytechnic Institute. He co-founded Univa, Inc., a company established to deliver grid and cloud computing solutions, and Praedictus Climate Solutions, which combines data science and high performance computing for quantitative agricultural forecasting.

Lawrence Rauchwerger
Texas A&M University

Wednesday, May 22nd

Title: Two Roads to Parallelism: From Serial Code to Programming with STAPL

Abstract: Parallel computers have come of age and need parallel software to justify their usefulness. There are two major avenues to get programs to run in parallel: parallelizing compilers and parallel languages and/or libraries. In this talk we present our latest results using both approaches and draw some conclusions about their relative effectiveness and potential.

In the first part we introduce the Hybrid Analysis (HA) compiler framework, which seamlessly integrates static and run-time analysis of memory references into a single framework capable of fully automatic loop-level parallelization. Experimental results on 26 benchmarks show full-program speedups superior to those obtained by the Intel Fortran compilers.
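To make the motivation for hybrid static/run-time analysis concrete, here is a minimal C sketch (names and structure are illustrative, not taken from the HA framework): a scatter loop whose iterations are independent only if the run-time index vector contains no duplicates, so a purely static compiler must be conservative, while a cheap run-time test can unlock the parallel version.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Run-time check: are all idx[0..n-1] distinct? If so, the scatter loop
   below has no output dependences and its iterations may run in parallel. */
bool indices_are_distinct(const int *idx, int n, int range) {
    bool *seen = calloc((size_t)range, sizeof *seen);
    bool ok = true;
    for (int i = 0; i < n && ok; ++i) {
        if (seen[idx[i]]) ok = false;
        seen[idx[i]] = true;
    }
    free(seen);
    return ok;
}

/* a[idx[i]] += b[i]: statically unanalyzable, since idx is only known at
   run time. A hybrid framework would emit the check and dispatch. */
void scatter_add(double *a, const int *idx, const double *b, int n, int range) {
    if (indices_are_distinct(idx, n, range)) {
        /* safe: iterations independent (a parallel version, e.g. under
           OpenMP, could run here; shown serially for brevity) */
        for (int i = 0; i < n; ++i) a[idx[i]] += b[i];
    } else {
        /* potential output dependence: fall back to serial execution */
        for (int i = 0; i < n; ++i) a[idx[i]] += b[i];
    }
}
```

The point of the hybrid approach is that the check is far cheaper than the loop it guards, so the cost of deciding at run time is amortized by the parallel speedup when the test passes.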

In the second part of this talk we present the Standard Template Adaptive Parallel Library (STAPL) approach to parallelizing code. STAPL is a collection of generic data structures and algorithms that provides a high-productivity parallel programming infrastructure analogous to the C++ Standard Template Library (STL). In this talk, we provide an overview of the major STAPL components, with particular emphasis on graph algorithms. We then present scalability results for real codes on petascale machines such as the IBM BG/Q and Cray systems. Finally, we present some of our ideas for future work in this area.

Bio: Lawrence Rauchwerger is the Eppright Professor of Computer Science and Engineering at Texas A&M University and the co-Director of the Parasol Lab. He is currently a visiting researcher at Google Brain and will be joining the University of Illinois at Urbana-Champaign in the Fall of 2019. He received a Dipl. Engineer degree from the Polytechnic Institute Bucharest, an M.S. in Electrical Engineering from Stanford University, and a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign. He has held Visiting Faculty positions at the University of Illinois, Bell Labs, IBM T.J. Watson, INRIA Paris, and ETH Zurich. Rauchwerger's approach to auto-parallelization, thread-level speculation, and parallel code development has influenced industrial products at corporations such as IBM, Intel, and Sun. Rauchwerger is an IEEE Fellow, an AAAS Fellow, and an NSF CAREER award recipient, and has chaired various IEEE and ACM conferences, most recently serving as Program Chair of PACT 2016 and PPoPP 2017.


Luiz DeRose
Cray Inc.

Thursday, May 23rd

Title: The Path to Delivering Programmable Exascale Systems

Abstract: The trends in hardware architecture are paving the road towards Exascale. However, these trends are also increasing the complexity of design and development of the software developer environment that is deployed on modern supercomputers. Moreover, the scale and complexity of high-end systems creates a new set of challenges for application developers. Computational scientists are facing system characteristics that will significantly impact the programmability and scalability of applications. In order to address these issues, software architects need to take a holistic view of the entire system and deliver a high-level programming environment that can help maximize programmability, while not losing sight of performance portability. In this talk, I will discuss the current trends in computer architecture and their implications in application development and will present Cray’s high level parallel programming environment for performance and programmability on current and future supercomputers. I will also discuss some of the challenges and open research problems that need to be addressed in order to build a software developer environment for extreme-scale systems that helps users solve multi-disciplinary and multi-scale problems with high levels of performance, programmability, and scalability.

Bio: Luiz DeRose is a Senior Principal Engineer and the Programming Environments Director at Cray Inc., where he is responsible for the programming environment strategy for all Cray systems. Before joining Cray in 2004, he was a research staff member and the Tools Group Leader at the Advanced Computing Technology Center at IBM Research. Dr. DeRose has a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign. With more than 25 years of high performance computing experience and a deep knowledge of its programming environments, he has published more than 50 peer-reviewed articles in scientific journals, conferences, and book chapters, primarily on the topics of compilers and tools for high performance computing.

2019 Tutorial Presenter

Pedro Mário Cruz e Silva
Tuesday, May 21st

Abstract: Learn how to accelerate your C/C++ or Fortran application using OpenACC to harness the massively parallel power of NVIDIA GPUs. OpenACC is a directive-based approach to computing in which you provide compiler hints to accelerate your code, instead of writing the accelerator code yourself. In 2 hours, you will participate in a four-step process for accelerating applications using OpenACC: characterize and profile your application; add compute directives; add directives to optimize data movement; and optimize your application using kernel scheduling.

Bio: Pedro Mário Cruz e Silva received his BSc (1995) and MSc (1998) from the Federal University of Pernambuco (UFPE), and he completed his DSc in 2004 at PUC-Rio. For 15 years, he worked as Manager of the Computational Geophysics Group at PUC-Rio, and during this period he was responsible for several software development and R&D projects for geophysics with a strong focus on innovation. He also finished an MBA in 2015 at the Getúlio Vargas Foundation (FGV/RJ). He is a member of the main board of the Brazilian Geophysical Society (SBGf) and currently serves as the Solution Architect Manager at NVIDIA responsible for all technologies in the Latin America Region.


IPDPS 2019 Panel 

Wednesday, May 22nd

Title: The Path to Delivering Programmable Exascale Systems

Description: In 1994, parallel processing was still cutting its baby teeth. Most supercomputers had 1 to 32 processor cores, and a few "massively parallel systems" had a few thousand. But most computers in the world (scientific, commercial, or personal) were single-core machines. Perhaps even more significant, programming these parallel systems was nothing short of a heroic effort, despite the availability of some languages and libraries. Fast forward 25 years to the present day and things have changed. Parallel processing is ubiquitous, from multi-core cell phones to million-core supercomputers. Personal computers have graphics and computing accelerators with thousands of processing elements. There are new languages, frameworks, and libraries to help program these systems. What can we expect from the next 25 years? Are we going to see simply a growth in scale (more is better), with another 1000-fold increase in the number of cores? Will MPI still be a major programming environment, just as it has been for the last 25 years? Or are we going to see more radical changes, pushed by new technologies and pulled by new demands? These are some of the questions that this panel will address. The panelists will present their views and engage the conference participants in a healthy and productive discussion about the future of parallel processing.


2019 Tech Talk Presenter

Genaro Costa
Atos Bull
Thursday, May 23rd

Title: Reducing the Gap among Science, Value proposition and Supercomputing

Abstract: Behind the data science buzzword lies a reality: data is everywhere, and so are the computational needs. On-demand data analysis and fast response to business operations open an opportunity for HPC in the enterprise market, scaling from research to operations. Atos has invested in many fields, from processor development and machine integration to software stacks, quantum computing, and consulting services. Communicating the value of HPC is hard, and there are many initiatives to overcome this, such as scientific applications, SaaS-like access, and multidisciplinary teams. Most applications are not ready for the latest hardware advances, so there is rising demand to hide HPC complexity from experts outside computer science. In this presentation we will show possibilities for integration between academia and industry, propose a model of R&D as a service, and discuss the new demands of the HPC market and the challenges we have faced in the Brazilian market.

Bio: Genaro Costa works as an HPC Expert / Project Manager at Atos Bull, coordinating the Atos R&D Labs at the SENAI-CIMATEC center. He was an Adjunct Professor at UFBA, coordinator of the Interdisciplinary Bachelor's in Science and Technology course, and vice-director of the Institute of Humanities, Arts and Sciences Professor Milton Santos. He graduated in Computer Science from UCSal and received his Master's and PhD in Informatics from the Universitat Autònoma de Barcelona (UAB). He has managed several innovation projects through partnerships between academia and industry. His research interests are in High Performance Computing, Machine Learning, Big Data, Performance Models, and Prescriptive Analytics.


March 22nd Deadline
for Advance Registration

Extended to March 29th

Registration Details


IPDPS 2018 Report

32nd IEEE International Parallel &
Distributed Processing Symposium 
May 21 – May 25, 2018
JW Marriott Parq Vancouver
Vancouver, British Columbia CANADA