


2018 Advance Program

Please check the IPDPS website regularly for updates, as the schedule may be revised. Authors with corrections should send them by email, giving full details.

MONDAY - 21 May 2018


* See each individual workshop program for schedule details




Heterogeneity in Computing Workshop



Reconfigurable Architectures Workshop



High Performance Computational Biology



Graph Algorithms Building Blocks



NSF/TCPP W. on Parallel and Distributed Computing Education



High Level Programming Models and Supportive Environments



High-Performance Big Data, Deep Learning, and Cloud Computing



Accelerators and Hybrid Exascale Systems



Parallel / Distributed Computing and Optimization



High-Performance, Power-Aware Computing



Advances in Parallel and Distributed Computational Models



Parallel and Distributed Computing for Large-Scale Machine Learning and Big Data Analytics


TUESDAY - 22 May 2018


Opening Session
8:00 AM - 8:30 AM


Keynote Session
8:30 AM - 9:30 AM


Michael Bender

Stony Brook University


Morning Break 9:30 AM - 10:00 AM

PhD Forum
All day

PhD Forum Posters

On Display All Day Tuesday and Wednesday

Parallel Technical Sessions 1, 2, 3, & 4

10:00 AM - 12:00 PM

SESSION 1: Graph Algorithms 1


MIDAS: Multilinear Detection at Scale
Saliya Ekanayake (Virginia Tech), Jose Cadena (Virginia Tech), Udayanga Wickramasinghe (Indiana University Bloomington), Anil Kumar Vullikanti (Virginia Tech)


Optimizing Parallel Graph Connectivity Computation via Subgraph Sampling
Michael Sutton (The Hebrew University of Jerusalem), Tal Ben-Nun (ETH Zurich), Amnon Barak (The Hebrew University of Jerusalem)

Parallel Algorithms through Approximation: b-Edge Cover
Alex Pothen (Purdue University), Arif Khan (Pacific Northwest National Lab), S. M. Ferdous (Purdue University)


A Parallel Algorithm for Bayesian Network Inference using Arithmetic Circuits
Md Vasimuddin (Indian Institute of Technology Bombay), Sriram P. Chockalingam (Georgia Institute of Technology), Srinivas Aluru (Georgia Institute of Technology)



SESSION 2: Large-Scale Applications 1


Cataloging the Visible Universe through Bayesian Inference at Petascale

Jeffrey Regier (University of California, Berkeley), Kiran Pamnany (Intel), Keno Fischer (Julia Computing), Andreas Noack (Massachusetts Institute of Technology), Maximilian Lam (University of California, Berkeley), Jarrett Revels (Massachusetts Institute of Technology), Steve Howard (University of California, Berkeley), Ryan Giordano (University of California, Berkeley), David Schlegel (Lawrence Berkeley National Laboratory), Jon McAuliffe (University of California, Berkeley), Rollin Thomas (Lawrence Berkeley National Laboratory), Prabhat (Lawrence Berkeley National Laboratory)


Efficient, Parallel At-Scale Correlation Analysis for Atom Probe Tomography on Hybrid Architectures
Hao Lu (Oak Ridge National Laboratory), Sudip Seal (Oak Ridge National Laboratory), Wei Guo (Oak Ridge National Laboratory), Jonathan Poplawsky (Oak Ridge National Laboratory)


A Fast and Massively-Parallel Solver for Nonlinear Tomographic Image Reconstruction
Mert Hidayetoglu (University of Illinois Urbana-Champaign), Carl Pearson (University of Illinois Urbana-Champaign), Izzat El Hajj (University of Illinois Urbana-Champaign), Levent Gurel (University of Illinois Urbana-Champaign), Weng Cho Chew (University of Illinois Urbana-Champaign), Wen-Mei Hwu (University of Illinois Urbana-Champaign)


Real-Time Massively Distributed Multi-Object Adaptive Optics Simulations for the European Extremely Large Telescope
Hatem Ltaief (KAUST), Ali Charara (KAUST), Damien Gratadour (LESIA - Observatoire de Paris), Nicolas Doucet (LESIA - Observatoire de Paris), Bilel Hadri (KAUST Supercomputing Lab), Eric Gendron (LESIA - Observatoire de Paris), Saber Feki (KAUST), David Keyes (KAUST)



SESSION 3: Performance / QoS / Resilience


Performance Isolation of Data-Intensive Scale-out Applications in a Multi-tenant Cloud
Palden Lama (University of Texas at San Antonio), Shaoqi Wang (University of Colorado, Colorado Springs), Xiaobo Zhou (University of Colorado, Colorado Springs), Dazhao Cheng (UNC Charlotte)


QoS Support for Scientific Workflows using Software-Defined Storage Resource Enclaves
Suman Karki (Washington State University), Bao Nguyen (Washington State University), Xuechen Zhang (Washington State University)


Scalable Data Resilience for In-Memory Data Staging
Shaohua Duan (Rutgers Discovery Informatics Institute), Pradeep Subedi (Rutgers Discovery Informatics Institute), Keita Teranishi (Sandia National Laboratories), Philip Davis (Rutgers Discovery Informatics Institute), Hemanth Kolla (Sandia National Laboratories), Marc Gamell (Intel), Manish Parashar (Rutgers Discovery Informatics Institute)


Performance and Scalability of Lightweight Multi-Kernel based Operating Systems
Balazs Gerofi (RIKEN Advanced Institute For Computational Science), Rolf Riesen (Intel), Masamichi Takagi (RIKEN Advanced Institute For Computational Science), Taisuke Boku (University of Tsukuba), Yutaka Ishikawa (RIKEN Advanced Institute For Computational Science), Robert W. Wisniewski (Intel)


SESSION 4: Memory Designs and Optimizations


Architectural support for unlimited memory versioning and renaming
Eran Gilad (Technion), Tehila Mayzels (Technion - Israel Institute of Technology), Elazar Raab (Technion - Israel Institute of Technology), Mark Oskin (University of Washington), Yoav Etsion (Technion - Israel Institute of Technology)


CTA-Aware Prefetching and Scheduling for GPU
Gunjae Koo (University of Southern California), Hyeran Jeon (San Jose State University), Zhenhong Liu (University of Illinois at Urbana–Champaign), Nam Sung Kim (University of Illinois at Urbana–Champaign), Murali Annavaram (University of Southern California)


CIAO: Cache Interference-Aware Throughput-Oriented Architecture and Scheduling
Jie Zhang (Yonsei University), Shuwen Gao (Intel), Nam Sung Kim (University of Illinois at Urbana-Champaign), Myoungsoo Jung (Yonsei University)


Millipede: Memory Optimizations in Die-Stacked Architectures for Big Data Machine Learning
Nitin (NVIDIA), Mithuna Thottethodi (Purdue University), T. N. Vijaykumar (Purdue University)

Parallel Technical Sessions 5, 6, 7, & 8
1:30 PM - 3:30 PM

SESSION 5: Scheduling


Scheduling Monotone Moldable Jobs in Linear Time
Klaus Jansen (University of Kiel), Felix Land (Christian-Albrechts-Universität zu Kiel)


The Power to Schedule a Parallel Program
Kunal Agrawal (Washington University in Saint Louis), Seth Gilbert (National University of Singapore)


Scheduling Parallel Tasks under Multiple Resources: List Scheduling vs. Pack Scheduling
Hongyang Sun (Vanderbilt University), Redouane Elghazi (ENS Lyon), Ana Gainaru (Vanderbilt University), Guillaume Aupy (INRIA), Padma Raghavan (Vanderbilt University)


Parallel scheduling of DAGs under memory constraints
Loris Marchal (CNRS), Hanna Nagy (Technical University of Cluj-Napoca), Bertrand Simon (ENS Lyon), Frédéric Vivien (INRIA)


SESSION 6: Learning


Evaluating Active Learning with Cost and Memory Awareness
Dmitry Duplyakin (University of Utah), Jed Brown (University of Colorado Boulder), Donna Calhoun (Boise State University)


Semantics-Preserving Parallelization of Stochastic Gradient Descent
Saeed Maleki (Microsoft), Madanlal Musuvathi (Microsoft), Todd Mytkowicz (Microsoft)


Efficient Gradient Boosted Decision Tree Training on GPUs
Zeyi Wen (National University of Singapore), Bingsheng He (National University of Singapore), Ramamohanarao Kotagiri (The University of Melbourne), Shengliang Lu (National University of Singapore), Jiashuai Shi (National University of Singapore)


BitFlow: Exploiting Vector Parallelism for Binary Neural Networks on CPU
Yuwei Hu (TuSimple Inc), Jidong Zhai (Tsinghua University), Dinghua Li (TuSimple Inc), Yifan Gong (TuSimple Inc), Yuhao Zhu (University of Rochester), Wei Liu (TuSimple Inc), Lei Su (TuSimple Inc), Jiangming Jin (TuSimple Inc)


SESSION 7: Compilers and Libraries


Lightweight MPI Communicators with Applications to Perfectly Balanced Quicksort
Michael Axtmann (Karlsruhe Institute of Technology), Armin Wiebigke (Karlsruhe Institute of Technology), Peter Sanders (Karlsruhe Institute of Technology)


Improving Network Throughput with Global Communication Reordering
Wim Lavrijsen (LBNL), Costin Iancu (LBNL), Xing Pan (NC State)


Highly Efficient Compensation-based Parallelism for Wavefront Loops on GPUs
Kaixi Hou (Virginia Tech), Hao Wang (Virginia Tech), Wu-Chun Feng (Virginia Tech), Jeffrey Vetter (Oak Ridge National Lab), Seyong Lee (Oak Ridge National Lab)


Development and application of a hybrid programming environment on an ARM/DSP system for High Performance Computing
Gaurav Mitra (Texas Instruments Inc.), Jonathan Bohmann (Southwest Research Institute), Ian Lintault (nCore HPC), Alistair Rendell (Australian National University)


SESSION 8: Storage Systems


GC-aware Request Steering with Improved Performance and Reliability for SSD-based RAIDs
Suzhen Wu (Xiamen University), Weidong Zhu (Xiamen University), Guixin Liu (Xiamen University), Hong Jiang (University of Texas-Arlington), Bo Mao (Xiamen University)


A Set-aware Key-Value Store on Shingled Magnetic Recording Drives with Dynamic Band
Ting Yao (Wuhan National Laboratory For Optoelectronics, Huazhong University of Science & Technology), Jiguang Wan (Wuhan National Laboratory For Optoelectronics, Huazhong University of Science & Technology), Ping Huang (Temple University), Xubin He (Temple University), Yiwen Zhang (Wuhan National Laboratory For Optoelectronics, Huazhong University of Science & Technology), Zhihu Tan (Huazhong University of Science and Technology), Changsheng Xie (Huazhong University of Science and Technology)


Software-Hardware Managed Last-level Cache Allocation Scheme for Large-Scale NVRAM-based Multicores Executing Parallel Data Analytics Applications
Masab Ahmad (University of Connecticut), Halit Dogan (University of Connecticut), Fabio Checconi (IBM), Xinyu Que (IBM), Daniele Buono (IBM), Omer Khan (University of Connecticut)


MOCA: Memory Object Classification and Allocation in Heterogeneous Memory Systems
Aditya Narayan (Boston University), Tiansheng Zhang (Boston University), Shaizeen Aga (University of Michigan), Satish Narayanasamy (University of Michigan), Ayse Coskun (Boston University)

Afternoon Break 3:30 PM - 4:00 PM

Best Papers

4:00 PM - 6:00 PM

Best Paper Nominees - Plenary


Communication-free Massively Parallel Graph Generation
Daniel Funke (Karlsruhe Institute of Technology), Sebastian Lamm (Karlsruhe Institute of Technology), Peter Sanders (Karlsruhe Institute of Technology), Christian Schulz (University of Vienna), Darren Strash (Colgate University), Moritz von Looz (Karlsruhe Institute of Technology)


Understanding and Modeling Lossy Compression Schemes on HPC Scientific Data
Tao Lu (New Jersey Institute of Technology), Qing Liu (New Jersey Institute of Technology), Xubin He (Temple University), Huizhang Luo (New Jersey Institute of Technology), Eric Suchyta (Oak Ridge National Laboratory), Norbert Podhorszki (Oak Ridge National Laboratory), Scott Klasky (Oak Ridge National Laboratory), Matthew Wolf (Oak Ridge National Laboratory), Tong Liu (Temple University)


UBIS: Utilization-aware cluster scheduling
Karthik Kambatla (Facebook), Vamsee Yarlagadda (Cloudera), Íñigo Goiri (Microsoft), Ananth Grama (Purdue University)


Hardware Transactional Memory meets Persistent Memory
Daniel Castro (Instituto Superior Técnico & INESC-ID), Paolo Romano (Instituto Superior Técnico & INESC-ID), João Barreto (Instituto Superior Técnico & INESC-ID)

Industry Event
7:00 PM
To be announced

WEDNESDAY - 23 May 2018


Keynote Session
8:30 AM – 9:30 AM



Keren Bergman
Columbia University

Morning Break 9:30 AM - 10:00 AM

PhD Forum
All day

PhD Forum Posters

On Display All Day Tuesday and Wednesday

Parallel Technical Sessions 9, 10, 11, & 12

10:00 AM - 12:00 PM

SESSION 9: Numerical Algorithms


Large Bandwidth-Efficient FFTs on Multicore and Multi-Socket Systems
Doru Thom Popovici (Carnegie Mellon University), Tze Meng Low (Carnegie Mellon University), Franz Franchetti (Carnegie Mellon University)


Lattice H-Matrices on Distributed-Memory Systems
Akihiro Ida (The University of Tokyo)


Evaluating the Performance and Cost of Accelerating Seismic Processing with CUDA, OpenCL, OpenACC, and OpenMP
Tiago Lobato Gimenes (HPG Lab), Flávia Pisani (LMCAD-Unicamp), Edson Borin (Unicamp)


Avoiding Synchronization in First-Order Methods for Sparse Convex Optimization
Aditya Devarakonda (University of California, Berkeley), Kimon Fountoulakis (University of California, Berkeley), James Demmel (University of California, Berkeley), Michael Mahoney (University of California, Berkeley)


SESSION 10: GPU Hashing and Searching


A Dynamic Hash Table for the GPU
Saman Ashkiani (University of California, Davis), Martin Farach-Colton (Rutgers University), John D. Owens (University of California, Davis)


GPU LSM: A Dynamic Dictionary Data Structure for the GPU
Saman Ashkiani (University of California, Davis), Shengren Li (University of California, Davis), Martin Farach-Colton (Rutgers University), Nina Amenta (University of California, Davis), John D. Owens (University of California, Davis)


WarpDrive: Massively Parallel Hashing on Multi-GPU Nodes
Daniel Jünger (University of Mainz), Christian Hundt (University of Mainz), Bertil Schmidt (University of Mainz)


Quotient Filters: Approximate Membership Queries on the GPU
Afton Geil (University of California, Davis), Martin Farach-Colton (Rutgers University), John Owens (University of California, Davis)


SESSION 11: Domain-Specific, Runtime and Autotuning


BabelFlow: An Embedded Domain Specific Language for Parallel Analysis and Visualization
Steve Petruzza (SCI Institute - University of Utah), Sean Treichler (Stanford University), Valerio Pascucci (SCI Institute - University of Utah), Peer-Timo Bremer (Lawrence Livermore National Lab)


Online Tuning of Parallelism Degree in Parallel Nesting Transactional Memory
Jingna Zeng (IST), Paolo Romano (INESC-ID/IST), Joao Barreto (INESC-ID/Technical University Lisbon), Luis Rodrigues (IST/INESC-ID), Seif Haridi (SICS)


Work-Stealing, Locality-Aware Actor Scheduling
Saman Barghi (University of Waterloo), Martin Karsten (University of Waterloo)


Indigo: A Domain-Specific Language for Fast, Portable Image Reconstruction
Michael Driscoll (University of California, Berkeley), Benjamin Brock (University of California, Berkeley), Frank Ong (University of California, Berkeley), Jonathan Tamir (University of California, Berkeley), Hsiou-Yuan Liu (University of California, Berkeley), Michael Lustig (University of California, Berkeley), Armando Fox (University of California, Berkeley), Katherine Yelick (University of California, Berkeley and Lawrence Berkeley National Laboratory)


SESSION 12: Resource Management


Swallow: Joint Online Scheduling and Coflow Compression in Datacenter Networks
Qihua Zhou (Nanjing University of Posts and Telecommunications), Peng Li (The University of Aizu), Kun Wang (Nanjing University of Posts and Telecommunications), Deze Zeng (China University of Geoscience), Song Guo (The Hong Kong Polytechnic University), Minyi Guo (Shanghai Jiao Tong University)


Auto-tuning Streamed Applications on Intel Xeon Phi
Peng Zhang (National University of Defense Technology), Jianbin Fang (National University of Defense Technology), Tao Tang (College of computer science, National University of Defense Technology, China), Canqun Yang (NUDT), Zheng Wang (Lancaster University)


Analyzing Resource Trade-offs in Hardware-overprovisioned Supercomputers
Ryuichi Sakamoto (The University of Tokyo), Tapasya Patki (Lawrence Livermore National Laboratory), Thang Cao (The University of Tokyo), Masaaki Kondo (The University of Tokyo), Koji Inoue (Kyushu University), Masatsugu Ueda (Kyushu University), Daniel Ellsworth (Lawrence Livermore National Laboratory), Barry Rountree (Lawrence Livermore National Laboratory), Martin Schulz (Lawrence Livermore National Laboratory)


Harnessing the Power of Many: Extensible Toolkit for Scalable Ensemble Applications
Vivek Balasubramanian (Rutgers University), Matteo Turilli (Rutgers University), Weiming Hu (The Pennsylvania State University), Matthieu Lefebvre (Princeton University), Wenjie Lei (Princeton University), Guido Cervone (The Pennsylvania State University), Jeroen Tromp (Princeton University), Shantenu Jha (Rutgers University)

Parallel Technical Sessions 13, 14, 15, & 16
1:30 PM – 3:30 PM

SESSION 13: Tensors


A Fill Estimation Algorithm for Sparse Matrices and Tensors in Blocked Formats
Peter Ahrens (Massachusetts Institute of Technology), Helen Xu (Massachusetts Institute of Technology), Nicholas Schiefer (Massachusetts Institute of Technology)


Communication Lower Bounds for Matricized Tensor Times Khatri-Rao Product
Grey Ballard (Wake Forest University), Nicholas Knight (New York University), Kathryn Rouse (Wake Forest University)


Blocking Optimization Techniques for Sparse Tensor Computation
Jee Choi (IBM), Xing Liu (Intel), Shaden Smith (University of Minnesota), Tyler Simon (University of Maryland, Baltimore County)


Efficient tensor transposition library for GPUs
Jyothi Vedurada (Indian Institute of Technology Madras), Arjun Suresh (The Ohio State University), Aravind Sukumaran Rajam (The Ohio State University), Jinsung Kim (The Ohio State University), Changwan Hong (The Ohio State University), Sriram Krishnamoorthy (Pacific Northwest National Lab), V. Krishna Nandivada (IIT Madras), Ajay Panyala (Pacific Northwest National Lab), Rohit Srivastava (The Ohio State University), P Sadayappan (The Ohio State University)


SESSION 14: Large Scale Applications 2


Do Developers Understand Floating Point?
Peter Dinda (Northwestern University), Conor Hetland (Northwestern University)


sDPF-RSA: Utilizing Floating-point Computing Power of GPUs for Massive Digital Signature Computations
Jiankuo Dong (School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China), Fangyu Zheng (State Key Laboratory of Information Security, Institute of Information Engineering, CAS, Beijing, China), Niall Emmart (School of Computer Science, University of Massachusetts, Amherst), Jingqiang Lin (State Key Laboratory of Information Security, Institute of Information Engineering, CAS, Beijing, China), Charles Weems (School of Computer Science, University of Massachusetts, Amherst)


Rethinking large-scale economic modeling for efficiency: optimizations for GPU and Xeon Phi clusters
Simon Scheidegger (University of Zurich), Dmitry Mikushin (University of Zurich), Felix Kübler (University of Zurich), Olaf Schenk (Institute of Computational Science, Faculty of Informatics, Università della Svizzera italiana)


A Fast Scalable Implicit Solver with Concentrated Computation for Nonlinear Time-evolution Problems on Low-order Unstructured Finite Elements
Tsuyoshi Ichimura (The University of Tokyo), Kohei Fujita (The University of Tokyo), Masashi Horikoshi (Software and Solutions Group, Intel K.K.), Larry Meadows (Data Center Group, Intel Corporation), Kengo Nakajima (The University of Tokyo), Takuma Yamaguchi (The University of Tokyo), Kentaro Koyama (Frontier Computing Center, Fujitsu Limited), Hikaru Inoue (Frontier Computing Center, Fujitsu Limited), Akira Naruse (NVIDIA Corporation), Keisuke Katsushima (The University of Tokyo), Muneo Hori (The University of Tokyo), Maddegedara Lalith (The University of Tokyo)


SESSION 15: Data Operations


Characterizing Scheduling Delay for Low-latency Data Analytics Workloads
Wei Chen (University of Colorado, Colorado Springs), Aidi Pi (University of Colorado, Colorado Springs), Shaoqi Wang (University of Colorado, Colorado Springs), Xiaobo Zhou (University of Colorado, Colorado Springs)


Runtime Scheduling Policies for Distributed Graph Algorithms
Jesun Firoz (Pacific Northwest National Lab), Marcin Zalewski (Pacific Northwest National Lab), Martina Barnas (Indiana University Bloomington), Andrew Lumsdaine (Pacific Northwest National Lab and University of Washington)


Communication Efficient Checking of Big Data Operations
Lorenz Hübschle-Schneider (Karlsruhe Institute of Technology), Peter Sanders (Karlsruhe Institute of Technology)


What size should your burst buffers be?
Guillaume Aupy (INRIA), Olivier Beaumont (INRIA), Lionel Eyraud-Dubois (INRIA)


SESSION 16: Power and Temperature


THOR: THermal-aware Optimizations for extending ReRAM lifetime
Majed Valad Beigi (Northwestern University), Gokhan Memik (Northwestern University)


CoolPIM: Thermal-Aware Source Throttling for Efficient PIM Instruction Offloading
Lifeng Nai (Google), Ramyad Hadidi (Georgia Institute of Technology), He Xiao (Georgia Institute of Technology), Hyojong Kim (Georgia Institute of Technology), Jaewoong Sim (Intel), Hyesoon Kim (Georgia Institute of Technology)


GreenSprint: Effective Computational Sprinting in Green Data Centers
Haoran Cai (HUST), Qiang Cao (HUST), Hong Jiang (University of Texas at Arlington)


Joint Server and Network Energy Saving in Data Centers for Latency-Sensitive Applications
Liang Zhou (University of California Riverside), Chih-Hsun Chou (University of California Riverside), Laxmi Bhuyan (University of California Riverside), K. K. Ramakrishnan (University of California Riverside), Daniel Wong (University of California Riverside)

Afternoon Break 3:30 PM - 4:00 PM

Plenary Program

4:00 PM – 6:00 PM

Details To Be Announced

PhD Forum
Special Session

6:00 PM

Posters on Display


6:30 PM – 7:30 PM

Details To Be Announced

Symposium Banquet

7:30 PM

Details To Be Announced


THURSDAY - 24 May 2018


Keynote Session
8:30 AM - 9:30 AM



Bruce Hendrickson
Lawrence Livermore National Laboratory

Morning Break 9:30 AM - 10:00 AM

Parallel Technical Sessions 17, 18, 19, & 20
10:00 AM - 12:00 PM

SESSION 17: Graph Algorithms 2


Implicit Decomposition for Write-Efficient Connectivity Algorithms
Naama Ben-David (Carnegie Mellon University), Guy Blelloch (Carnegie Mellon University), Jeremy Fineman (Georgetown University), Phillip B. Gibbons (Carnegie Mellon University), Yan Gu (Carnegie Mellon University), Charles McGuffey (Carnegie Mellon University), Julian Shun (Massachusetts Institute of Technology)


Distributed Symmetry Breaking in Graphs with Bounded Diversity and a Separation between MM and MIS
Leonid Barenboim (Open University of Israel), Tzalik Maimon (Open University of Israel)


Complete Visitability for Autonomous Robots on Graphs
Aisha Aljohani (Kent State University), Pavan Poudel (Kent State University), Gokarna Sharma (Kent State University)


Local Mixing Time: Distributed Computation and Applications
Anisur Rahaman Molla (NISER, Bhubaneswar), Gopal Pandurangan (University of Houston)


SESSION 18: Performance Modeling and Analysis


Roofline Guided Design and Analysis of a Multi-stencil CFD Solver for Multicore Performance
Bahareh Mostafazadeh Davani (University of California, Irvine), Ferran Marti (University of California, Irvine), Feng Liu (University of California, Irvine), Aparna Chandramowlishwaran (University of California, Irvine)


Taming the "Monster": Overcoming Program Optimization Challenges on SW26010 Through Precise Performance Modeling
Shizhen Xu (Tsinghua University), Yuanchao Xu (Tsinghua University), Wei Xue (Tsinghua University), Xipeng Shen (North Carolina State University), Xiaomeng Huang (Tsinghua University), Guangwen Yang (Tsinghua University)


Performance and Accuracy Trade-offs of HPC Application Modeling and Simulation
Zhou Tong (Florida State University), Scott Pakin (Los Alamos National Lab), Mike Lang (Los Alamos National Lab), Xin Yuan (Florida State University)


PADDLE: Performance Analysis using a Data-driven Learning Environment
Jayaraman J. Thiagarajan (Lawrence Livermore National Laboratory), Rushil Anirudh (Lawrence Livermore National Laboratory), Bhavya Kailkhura (Lawrence Livermore National Laboratory), Nikhil Jain (Lawrence Livermore National Laboratory), Tanzima Islam (Western Washington University), Abhinav Bhatele (Lawrence Livermore National Laboratory), Jae-Seung Yeom (Lawrence Livermore National Laboratory), Todd Gamblin (Lawrence Livermore National Laboratory)


SESSION 19: Memory and Data Access


Efficient Solving of Scan Primitive on Multi-GPU Systems
Adrian Perez Dieguez (University of Coruña), Margarita Amor (University of Coruña), Ramón Doallo (University of Coruña), Akira Nukada (Tokyo Institute of Technology), Satoshi Matsuoka (Tokyo Institute of Technology)


Quantifying the Performance and Energy-Efficiency Impact of Hardware Transactional Memory on Scientific Applications on Large-Scale NUMA Systems
Jinsu Park (UNIST), Woongki Baek (UNIST)


GPU-Accelerated Large-Scale Genome Assembly
Sayan Goswami (Louisiana State University), Kisung Lee (Louisiana State University), Shayan Shams (Louisiana State University), Seung-Jong Park (Louisiana State University)


GPU Data Access on Complex Geometries for D3Q19 Lattice Boltzmann Method
Gregory Herschlag (Duke University), Seyong Lee (Oak Ridge National Laboratory), Jeffrey Vetter (Oak Ridge National Laboratory), Amanda Randles (Duke University)


SESSION 20: Exception Handling & Error Detection


SlimFast: Reducing Metadata Redundancy in Sound and Complete Dynamic Data Race Detection
Yuanfeng Peng (University of Pennsylvania), Christian Delozier (University of Pennsylvania), Ariel Eizenberg (University of Pennsylvania), William Mansky (Princeton University), Joseph Devietti (University of Pennsylvania)


Sword: A Bounded Memory-Overhead Detector of OpenMP Data Races in Production Runs
Simone Atzeni (University of Utah), Ganesh Gopalakrishnan (University of Utah), Zvonimir Rakamaric (University of Utah), Ignacio Laguna (Lawrence Livermore National Laboratory), Gregory Lee (Lawrence Livermore National Laboratory), Dong Ahn (Lawrence Livermore National Laboratory)


Unobtrusive Asynchronous Exception Handling with Standard Java Try/Catch Blocks
Mostafa Mehrabi (The University of Auckland), Nasser Giacaman (The University of Auckland), Oliver Sinnen (The University of Auckland)


COMPI: Concolic Testing for MPI Applications
Hongbo Li (University of California, Riverside), Sihuan Li (University of California, Riverside), Zachary Benavides (University of California, Riverside), Zizhong Chen (University of California, Riverside), Rajiv Gupta (University of California, Riverside)

Parallel Technical Sessions 21, 22, 23 & 24
1:30 PM - 3:30 PM

SESSION 21: Graph Algorithms 3


Experimental Design of Work Chunking for Graph Algorithms on High Bandwidth Memory Architectures
George M Slota (Rensselaer Polytechnic Institute), Siva Rajamanickam (Sandia National Labs)


Distributed Louvain Algorithm for Graph Community Detection
Sayan Ghosh (Washington State University), Mahantesh Halappanavar (PNNL), Antonino Tumeo (PNNL), Ananth Kalyanaraman (Washington State University), Hao Lu (ORNL), Daniel Chavarria-Miranda (PNNL), Arif Khan (PNNL), Assefaw Gebremedhin (Washington State University)


Application Codesign of Near-Data Processing for Similarity Search
Vincent T. Lee (University of Washington), Amrita Mazumdar (University of Washington), Carlo C. Del Mundo (University of Washington), Armin Alaghi (University of Washington), Luis Ceze (University of Washington), Mark Oskin (University of Washington)


SESSION 22: Linear Solvers


A 3D distributed sparse LU factorization algorithm
Piyush Sao (Georgia Institute of Technology), Sherry Li (Lawrence Berkeley National Laboratory), Richard Vuduc (Georgia Institute of Technology)


A new GPU algorithm to compute a level set-based analysis for the parallel solution of sparse triangular systems
Ernesto Dufrechou (Facultad de Ingeniería), Pablo Ezzatti (Udelar)


Performance of Hierarchical-matrix BiCGStab Solver on GPU clusters
Ichitaro Yamazaki (University of Tennessee), Ahmad Abdelfattah (University of Tennessee), Akihiro Ida (The University of Tokyo), Satoshi Ohshima (Kyushu University), Stanimire Tomov (University of Tennessee), Rio Yokota (The University of Tokyo), Jack Dongarra (University of Tennessee)


Convergence Models and Surprising Results for the Asynchronous Jacobi Method
Jordi Wolfson-Pou (Georgia Institute of Technology), Edmond Chow (Georgia Institute of Technology)


SESSION 23: Runtime Systems and Libraries


Overhead-Conscious Format Selection for SpMV-Based Applications
Yue Zhao (North Carolina State University), Weijie Zhou (North Carolina State University), Xipeng Shen (North Carolina State University), Graham Yiu (IBM Toronto Software Lab)


Cudele: An API and Framework for Programmable Consistency and Durability in a Global Namespace
Michael Sevilla (University of California, Santa Cruz), Ivo Jimenez (University of California, Santa Cruz), Noah Watkins (University of California, Santa Cruz), Jeff Lefevre (University of California, Santa Cruz), Shel Finkelstein (University of California, Santa Cruz), Peter Alvaro (University of California, Santa Cruz), Patrick Donnelly (Red Hat, Inc.), Carlos Maltzahn (University of California, Santa Cruz)


SELECT: A Distributed Publish/Subscribe Notification System for Online Social Networks
Nuno Apolónia (Universitat Politècnica de Catalunya), Stefanos Antaris (University of Cyprus, Cyprus), Sarunas Girdzijauskas (KTH Royal Institute of Technology), George Pallis (University of Cyprus, Cyprus), Mario Dikaiakos (University of Cyprus, Cyprus)


A Lightweight Communication Runtime for Distributed Graph Analytics
Hoang-Vu Dang (University of Illinois at Urbana-Champaign), Roshan Dathathri (The University of Texas at Austin), Gurbinder Gill (The University of Texas at Austin), Alex Brooks (University of Illinois at Urbana-Champaign), Nikoli Dryden (University of Illinois at Urbana-Champaign), Andrew Lenharth (Microsoft), Loc Hoang (The University of Texas at Austin), Keshav Pingali (The University of Texas at Austin), Marc Snir (University of Illinois at Urbana-Champaign)


SESSION 24: Networks and Communication


Intra-Cluster Coalescing and CTA Scheduling to Reduce GPU NoC Pressure
Lu Wang (Ghent University), Xia Zhao (Ghent University), David Kaeli (Northeastern University), Lieven Eeckhout (Ghent University)


HybridPass: Hybrid Scheduling for Mixed Flows in Datacenter Networks
Bo Peng (Shanghai Jiao Tong University), Jianguo Yao (Shanghai Jiao Tong University), Zhengwei Qi (Shanghai Jiao Tong University), Haibing Guan (Shanghai Jiao Tong University)


Scalable Power-Efficient Kilo-Core Photonic-Wireless NoC Architectures
Avinash Kodi (Ohio University), Kyle Shiflett (Ohio University), Savas Kaya (Ohio University), Ahmed Louri (George Washington University), Soumyasanta Laha (Ohio University)


Designing Efficient Shared Address Space Reduction Collectives for Multi-/Many-cores
Jahanzeb Maqbool Hashmi (The Ohio State University), Sourav Chakraborty (The Ohio State University), Mohammadreza Bayatpour (The Ohio State University), Hari Subramoni (The Ohio State University), Dhabaleswar Panda (The Ohio State University)

Afternoon Break 3:30 PM - 4:00 PM

Parallel Technical Sessions 25, 26, 27 & 28
4:00 PM - 6:00 PM

SESSION 25: Distributed Computing


Tiny Groups Tackle Byzantine Adversaries
Mercy Jaiyeola (Mississippi State University), Kyle Patron (Palantir Technologies), Jared Saia (University of New Mexico), Qian Zhou (Mississippi State University), Maxwell Young (Mississippi State University)


Skueue: A Scalable and Sequentially Consistent Distributed Queue
Michael Feldmann (Paderborn University), Christian Scheideler (Paderborn University), Alexander Setzer (Paderborn University)


Self-Stabilizing Supervised Publish-Subscribe Systems
Michael Feldmann (Paderborn University), Christina Kolb (Paderborn University), Christian Scheideler (Paderborn University), Thim Strothmann (Paderborn University)


Spartan: A Framework For Sparse Robust Addressable Networks
John Augustine (Indian Institute of Technology Madras), Sumathi Sivasubramanian (Indian Institute of Technology Madras)


SESSION 26: Graph Algorithms 4


Beyond Binary Search: Parallel In-Place Construction of Implicit Search Trees
Kyle Berney (University of Hawaii at Manoa), Henri Casanova (University of Hawaii at Manoa), Alyssa Higuchi (University of Hawaii at Manoa), Ben Karsin (University of Hawaii at Manoa), Nodari Sitchinava (University of Hawaii at Manoa)


A Power-Tunable Single-Source Shortest Path Algorithm
Sara Karamati (Georgia Institute of Technology), Jeffrey Young (Georgia Institute of Technology), Richard Vuduc (Georgia Institute of Technology)


Scalable Breadth-First Search on a GPU Cluster
Yuechao Pan (University of California, Davis), Roger Pearce (Lawrence Livermore National Laboratory), John Owens (University of California, Davis)


SESSION 27: Communication Performance


Chameleon: Online Clustering of MPI Program Traces
Amir Bahmani (North Carolina State University), Frank Mueller (North Carolina State University)


Trade-off Study of Localizing Communication and Balancing Network Traffic on Dragonfly System
Xin Wang (Illinois Institute of Technology), Misbah Mubarak (Argonne National Laboratory), Robert Ross (Argonne National Laboratory), Zhiling Lan (Illinois Institute of Technology)


Level-Spread: A New Job Allocation Policy for Dragonfly Networks
Yijia Zhang (Boston University), Ozan Tuncer (Boston University), Fulya Kaplan (Boston University), Katzalin Olcoz (Universidad Complutense de Madrid), Vitus J. Leung (Sandia National Laboratories), Ayse K. Coskun (Boston University)


SESSION 28: Storage & FileSystem


A Migratory Heterogeneity-Aware Data Layout Scheme for Parallel File Systems
Shuibing He (Wuhan University), Xian-He Sun (Illinois Institute of Technology), Yang Wang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), Chenzhong Xu (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)


LALCA: Locality-Aware Lock Contention Avoidance for NVMe-based Scale-out Storage System
Myoungwon Oh (SK Telecom), Sejin Park (SK Telecom), Jugwan Eom (SK Telecom), Seungmin Kim (SK Telecom), Sangjae Kim (SK Telecom), Kang-Won Lee (SK Telecom), Heon Y. Yeom (Seoul National University)


Mitigating Traffic-based Side Channel Attacks in Bandwidth-efficient Cloud Storage
Pengfei Zuo (Huazhong University of Science and Technology), Yu Hua (Huazhong University of Science and Technology), Cong Wang (City University of Hong Kong), Wen Xia (Huazhong University of Science and Technology), Shunde Cao (Huazhong University of Science and Technology), Yukun Zhou (Huazhong University of Science and Technology), Yuanyuan Sun (Huazhong University of Science and Technology)


Chameleon: An Adaptive Wear Balancer for Flash Clusters
Nannan Zhao (Virginia Tech), Ali Anwar (Virginia Tech), Ali Butt (Virginia Tech), Yue Cheng (Virginia Tech)

FRIDAY - 25 May 2018


* See each individual workshop program for schedule details



Chapel Implementers and Users Workshop



Parallel and Distributed Scientific and Engineering Computing



Job Scheduling Strategies for Parallel Processing



International Workshop on Automatic Performance Tuning



Parallel and Distributed Processing for Computational Social Systems



Graph Algorithms and Machine Learning



Convergence of Extreme Scale Computing and Big Data Analysis



Parallel Programming Model: Special Edition on Edge/Fog/In-Situ Computing



Parallel Symbolic Computation



Programming Models and Algorithms Workshop



Runtime and Operating Systems for the Many-core Era



2018 Registration

March 20th Deadline for Advance Registration

Registration Details
