										 DAYS • Monday • Tuesday • Wednesday • Thursday • Friday 
This page lists all 21 workshops that are part of the IPDPS 2020 program. Click on the workshop of interest (Monday workshops at the top of the page and Friday workshops at the bottom) and the link will take you to the workshop's landing page, which provides detailed information on the workshop's papers and any other program material and events. Check individual workshop pages to see what events are planned.
The Main Conference program that follows shows the papers accepted for the conference, organized in Technical Sessions originally scheduled for Tuesday, Wednesday, and Thursday. Those papers, as well as all of the workshop papers, are published in the proceedings and accompanied by presentation slides from the authors.
The proceedings will be released by May 15th and will be available to all registrants.
IPDPS will hold virtual events to coincide with the conference dates of 18-22 May. Participation details are available here and in the links in the program that follows.
										
- Tuesday, May 19: Best paper presentations and Q&A session.
- Wednesday, May 20: Best paper announcement and TCPP public meeting.
- Thursday, May 21: IPDPS Town Hall meeting.

Events on these three days will take place from 9:00 AM to 10:00 AM US Central Daylight Time / 2:00 PM UTC. Check individual workshops for any scheduled events.
											
												MONDAY - 18 May 2020
											 
											
MONDAY WORKSHOPS
Visit individual websites at links shown
											 
											
												TUESDAY - 19 May 2020
											 
											
												Virtual Session 
9:00 to 10:00 AM US Central Daylight Time / 2:00 PM UTC
												Best Paper Presentations and Q&A Session 
												  
See this page for details and link to join session.
											 
											
Parallel Technical Sessions 1, 2, 3, & 4
												SESSION 1: Communication & NoCs  
														 
													 
DozzNoC: Reducing Static and Dynamic Energy in NoCs with Low-latency Voltage Regulators using Machine Learning
													Mark Clark, Yingping Chen, Avinash Karanth, Brian Ma, and Ahmed Louri  
													  
													Neksus: An Interconnect for Heterogeneous System-In-Package Architectures 
													Vidushi Goyal, Xiaowei Wang, Valeria Bertacco, and Reetu Das  
													   
													Accelerated Reply Injection for Removing NoC Bottleneck in GPGPUs 
													Yunfan Li and Lizhong Chen  
													  
													Machine-agnostic and Communication-aware Designs for MPI on Emerging Architectures 
													Jahanzeb Maqbool Hashmi, Shulei Xu, Bharath Ramesh, Hari Subramoni, Mohammadreza Bayatpour, and Dhabaleswar K. (DK) Panda 
													  
													   
													SESSION 2: Storage & IO 
													  
ClusterSR: Cluster-Aware Scattered Repair in Erasure-Coded Storage  
													Zhirong Shen, Jiwu Shu, Zhijie Huang, and Yingxun Fu  
													  
													Stitch It Up: Using Progressive Data Storage to Scale Science 
													Jay Lofstead, John Mitchel, and Enze Chen   
													  
													HFetch: Hierarchical Data Prefetching for Scientific Workflows in Multi-Tiered Storage Environments 
Hariharan Devarajan, Anthony Kougkas, and Xian-He Sun
													  
													CanarIO: Sounding the Alarm on IO-Related Performance Degradation 
													Michael Wyatt, Stephen Herbein, Kathleen Shoga, Todd Gamblin, and Michela Taufer  
  
													  
													SESSION 3: Applications													 
													  
													
													A Study of Graph Analytics for Massive Datasets on Large-Scale Distributed GPUs 
													Vishwesh Jatala, Roshan Dathathri, Gurbinder Gill, Loc Hoang, V. Krishna Nandivada, and Keshav Pingali  
													  
													A Highly Efficient Dynamical Core of Atmospheric General Circulation Model based on Leap-Format 
													Hang Cao, Liang Yuan, He Zhang, Baodong Wu, Shigang Li, Pengqi Lu, Yunquan Zhang, Yongjun Xu, and Minghua Zhang  
													  
													Understanding GPU-Based Lossy Compression for Extreme-Scale Cosmological Simulations 
													Sian Jin, Pascal Grosset, Christopher M. Biwer, Jesus Pulido, Jiannan Tian, Dingwen Tao, and James P. Ahrens  
													  
													Optimizing High Performance Markov Clustering for Pre-Exascale Architectures 
													Oguz Selvitopi, Md Taufique Hussain, Ariful Azad, and Aydin Buluc  
 
														 
													SESSION 4: Distributed Algorithms													 
  
												Tightening Up the Incentive Ratio for Resource Sharing Over the Rings 
Yukun Cheng, Xiaotie Deng, and Yuhao Li
												  
												Communication-Efficient String Sorting 
												Timo Bingmann, Peter Sanders, and Matthias Schimek  
												  
												SCSL: Optimizing Matching Algorithms to Improve Real-time for Content-based Pub/Sub Systems 
												Tianchen Ding, Shiyou  Qian, Jian Cao, Guangtao Xue, and Minglu Li  
												  
												Distributed Graph Realizations 
John Augustine, Keerti Choudhary, Avi Cohen, David Peleg, Sumathi Sivasubramaniam, and Suman Sourav
											 
											
Parallel Technical Sessions 5, 6, 7, & 8
												SESSION 5: Reliability and QoS
													 
													  
													Transaction-Based Core Reliability 
													Sang Wook Stephen Do and Michel Dubois  
													  
													Understanding the Interplay between Hardware Errors and User Job Characteristics on the Titan Supercomputer 
Seung-Hwan Lim, Ross Miller, and Sudharshan Vazhkudai
													  
													EC-Fusion: An Efficient Hybrid Erasure Coding Framework to Improve Both Application and Recovery Performance in Cloud Storage Systems 
													Han Qiu, Chentao Wu, Jie Li, Minyi Guo,  Tong Liu, Xubin He, Yuanyuan Dong, and Yafei Zhao  
													  
													  
													SESSION 6: Learning Algorithms													 
													  
													Learning an Effective Charging Scheme for Mobile Devices 
Tang Liu, Baijun Wu, Wenzheng Xu, Xiaobo Cao, Jian Peng, and Hongyi Wu
													  
													Optimize Scheduling of Federated Learning on Battery-powered Mobile Devices 
													Cong Wang, Xin Wei, and Pengzhan Zhou  
													  
													Harnessing Deep Learning via a Single Building Block 
													Kunal Banerjee, Michael J. Anderson, Sasikanth Avancha, Anand Venkat, Gregory M. Henry, Evangelos Georganas, Hans Pabst, Alexander Heinecke, and Dhiraj D. Kalamkar  
													  
													Experience-Driven Computational Resource Allocation of Federated Learning by Deep Reinforcement Learning 
													Yufeng Zhan, Peng Li, and Song Guo  
  
													  
													SESSION 7: Data Analysis and Management													 
													  
													An Active Learning Method for Empirical Modeling in Performance Tuning 
													Jiepeng Zhang, Jingwei Sun, Wenju Zhou, and Guangzhong Sun  
													  
													DASSA: Parallel DAS Data Storage and Analysis for Subsurface Event Detection 
													Bin Dong, Veronica Rodriguez, Xin Xing, Suren Byna, Jonathan Ajo-Franklin, and Kesheng Wu  
													  
													Scaling of Union of Intersections for Inference of Granger Causal Networks from Observational Data 
													Mahesh Balasubramanian, Trevor Ruiz, Brandon Cook, Mr Prabhat, Sharmodeep Bhattacharyya, Aviral Shrivastava, and Kristofer Bouchard  
													  
													GPU-Based Static Data-Flow Analysis for Fast and Scalable Android App Vetting 
Xiaodong Yu, Fengguo Wei, Xinming Ou, Michela Becchi, Tekin Bicer, and Danfeng (Daphne) Yao
  
													  
													SESSION 8: Edge Computing													 
													  
													Robust Server Placement for Edge Computing 
													Dongyu Lu, Yuben Qu, Fan Wu, Haipeng Dai, Chao Dong, and Guihai Chen  
													  
													EdgeIso: Effective Performance Isolation for Edge Devices 
													Yoonsung Nam, Yongjun Choi, Byeonghun Yoo, Yongseok Son, and Hyeonsang Eom 
  
													Busy-Time Scheduling on Heterogeneous Machines 
													Runtian Ren and Xueyan Tang  
													  
													Scheduling Malleable Jobs Under Topological Constraints  
Evripidis Bampis, Konstantinos Dogeas, Alexander Kononov, Giorgio Lucarelli, and Fanny Pascual
											 
											
PLENARY SESSION: Best Papers
													 
													  
XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs
													Cheng Li, Abdul Dakkak, Jinjun  Xiong, Wei Wei, Lingjie Xu, and Wen-mei Hwu 
													  
													  
													 
													  
													Exploring the Binary Precision  Capabilities of Tensor Cores for Epistasis Detection  
													Ricardo Nobre, Aleksandar Ilic,  Sergio Santander-Jiménez, and Leonel Sousa  
  
													  
													 
													  
													Understanding and Improving  Persistent Transactions on Optane DC Memory  
													Pantea Zardoshti, Michael Spear,  Aida Vosoughi, and Garret Swart 
  
													  
													 
													  
CycLedger: A Scalable and Secure  Parallel Protocol for Distributed Ledger via Sharding  
													Mengqian Zhang, JiChen Li, Zhaohua  Chen, Hongyin Chen, and Xiaotie Deng 
  
											 
											
												WEDNESDAY - 20 May 2020
											 
											
												Virtual Session 
9:00 to 10:00 AM US Central Daylight Time / 2:00 PM UTC
												Best Paper Announcement and TCPP Public Meeting 
  
See this page for details and link to join session.
											 
											
Parallel Technical Sessions 9, 10, 11, & 12
												SESSION 9: Cloud Technology													 
													  
													Mitigating Large Response Time Fluctuations through Fast Concurrency Adapting in the Cloud 
													Jianshu Liu, Shungeng Zhang, Qingyang Wang, and Jinpeng Wei  
													  
													DAG-Aware Joint Task Scheduling and Cache Management in Spark Clusters 
													Yinggen Xu, Liu Liu, and Zhijun Ding  
													  
													Solving the Container Explosion Problem for Distributed High Throughput Computing 
Tim Shaffer, Nicholas Hazekamp, Jakob Blomer, and Douglas Thain
													  
													Amoeba: QoS-Awareness and Reduced Resource Usage of Microservices with Serverless Computing 
													Zijun Li, Quan Chen, Shuai Xue, Tao Ma, Yong Yang, Zhuo Song, and Minyi Guo  
													  
  
													SESSION 10: Machine Learning													 
													  
													Efficient I/O for Neural Network Training with Compressed Data 
													Zhao Zhang, Lei Huang, J. Gregory Pauloski, and Ian T. Foster  
													  
													Not All Explorations Are Equal: Harnessing Heterogeneous Profiling Cost for Efficient MLaaS Training 
													Jun Yi, Chengliang Zhang, Wei Wang, Cheng Li, and Feng Yan  
													  
													ASYNC: A Cloud Engine with Asynchrony and History for Distributed Machine Learning 
													Saeed Soori, Bugra Can, Mert Gurbuzbalaban, and Maryam Dehnavi  
													  
Benanza: Automatic μBenchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs
													Cheng Li, Abdul Dakkak, Jinjun Xiong, and Wen-mei Hwu  
  
													  
													SESSION 11: GPUs 
													  
Adaptive Page Migration for Irregular Data-intensive Applications under GPU Memory Oversubscription 
													Debashis Ganguly, Ziyu Zhang, Jun Yang, and Rami Melhem  
													  
													LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment  
													Alberto Zeni, Giulia Guidi, Marquita Ellis, Nan Ding, Marco D. Santambrogio, Steven Hofmeyr, Aydin Buluç, Leonid Oliker, and Katherine Yelick  
  
													Coordinated Page Prefetch and Eviction for Memory Oversubscription Management in GPUs 
													Qi Yu, Bruce R. Childers, Libo Huang, Cheng Qian, Hui Guo, and Zhiying Wang  
													  
													A Study of Single and Multi-device Synchronization Methods in Nvidia GPUs 
													Lingqi Zhang, Mohamed Wahib, Haoyu Zhang, and Satoshi Matsuoka  
  
													  
SESSION 12: Applications
													  
												DPF-ECC: Accelerating Elliptic Curve Cryptography with Floating-point Computing Power of GPUs 
												Lili Gao, Fangyu Zheng, Niall Emmart, Jiankuo Dong, Jingqiang Lin, and Charles Weems  
												  
												Scalability Challenges of an Industrial Implicit Finite Element Code 
												Francois-Henry Rouet, Cleve Ashcraft, Jef Dawson, Roger Grimes, Erman Guleryuz, Seid Koric, Robert F. Lucas, James S. Ong, Todd Simons, and Ting-Ting Zhu  
												  
												ETH: An Architecture for Exploring the Design Space of In-Situ Scientific Visualization 
												Greg Abram, Vignesh Adhinarayanan, Wu-chun Feng, David H. Rogers, and James P. Ahrens 
												  
												Scaling Betweenness Approximation to Billions of Edges by MPI-based Adaptive Sampling 
Alexander van der Grinten and Henning Meyerhenke
											 
											
Parallel Technical Sessions 13, 14, 15, & 16
												SESSION 13: Data Management
													 
													  
													Improved Intermediate Data Management for MapReduce Frameworks 
													Haoyu Wang, Haiying Shen, Charles Reiss, Arnim Jain, and Yunqiao Zhang  
													  
													Bandwidth-Aware Page Placement in NUMA  
													David Gureya, João Neto, Reza Karimi, João Barreto, Pramod Bhatotia, Vivien Quema, Rodrigo Rodrigues, Paolo Romano, and Vladimir Vlassov 
  
													HCompress: Hierarchical Data Compression for Multi-Tiered Storage Environments 
Hariharan Devarajan, Anthony Kougkas, Luke Logan, and Xian-He Sun
													  
													FRaZ: A Generic High-Fidelity Fixed-Ratio Lossy Compression Framework for Scientific Floating-point Data 
													Robert R. Underwood, Sheng Di, Jon Calhoun, and Franck Cappello  
  
													  
													SESSION 14: Storage & Caching													 
													  
													DELTA: Distributed Locality-Aware Cache Partitioning for Tile-based Chip Multiprocessors  
													Nadja Holtryd, Madhavan Manivannan, Per Stenström, and Miquel Pericas 
  
													Coordinated Management of Processor Configuration and Cache Partitioning to Optimize Energy under QoS Constraints  
													Mehrzad Nejat, Madhavan Manivannan, Miquel Pericas, and Per Stenström 
  
													StragglerHelper: Alleviating Straggling in Computing Clusters via Sharing Memory Access Patterns 
													Wenjie Liu, Ping Huang, and Xubin He  
  
													  
													SESSION 15: Numerics													 
													  
													Evaluating the Numerical Stability of Posit Floating Point Arithmetic 
													Nicholas Buoncristiani, Sanjana Shah, David Donofrio, and John Shalf  
													  
													Varity: Quantifying Floating-Point Variations in HPC Systems Through Randomized Testing 
													Ignacio Laguna  
													  
													Demystifying Tensor Cores to Optimize Half-Precision Matrix Multiply 
													Da Yan, Wei Wang, and Xiaowen Chu  
  
													  
													SESSION 16: IoT and Consensus													 
													  
													Data Collection of IoT Devices Using an Energy-Constrained UAV 
													Yuchen Li, Weifa Liang, Wenzheng Xu, and Xiaohua Jia  
													  
													Argus: Multi-Level Service Visibility Scoping for Internet-of-Things in Enterprise Environments 
													Qian Zhou, Omkant Pandey, and Fan Ye 
													  
													G-PBFT: A Location-based and Scalable Consensus Protocol for IoT-Blockchain Applications 
													LapHou Lao, Xiaohai Dai, Bin Xiao, and Songtao Guo  
													  
													Byzantine Generalized Lattice Agreement 
													Giuseppe Antonio Di Luna, Emmanuelle Anceaume, and Leonardo Querzoni   | 
											 
											
												THURSDAY - 21 May 2020
											 
											
												Virtual Session 
9:00 to 10:00 AM US Central Daylight Time / 2:00 PM UTC
												IPDPS Town Hall Meeting  
													  
See this page for details and link to join session.
											 
											
Parallel Technical Sessions 17, 18, 19, & 20
												SESSION 17: Graph Processing & Coding													 
													  
													A Heterogeneous PIM Hardware-Software Co-Design for Energy-Efficient Graph Processing 
													Yu Huang, Long Zheng, Pengcheng Yao, Jieshan Zhao, Xiaofei Liao, Hai Jin, and Jingling Xue  
													  
													Spara: An Energy-Efficient ReRAM-based Accelerator for Sparse Graph Analytics Applications 
													Long Zheng, Jieshan Zhao, Yu Huang, Qinggang Wang, Zhen Zeng, Jingling Xue, Xiaofei Liao, and Hai Jin  
													  
													Optimal Encoding and Decoding Algorithms for the RAID-6 Liberation Codes 
													Zhijie Huang, Hong Jiang, Zhirong Shen, Hao Che, Nong Xiao, and Ning Li  
													  
													Sturgeon: Preference-aware Co-location for Improving Utilization of Power Constrained Computers 
													Pu Pang, Quan Chen, Deze Zeng, Chao Li, Jingwen Leng, Wenli Zheng, and Minyi Guo  
 
  
													SESSION 18: Parallel Algorithms													 
													  
													A High-Throughput Solver for Marginalized Graph Kernels on GPU 
													Yu-Hang Tang, Oguz Selvitopi, Doru Thom Popovici, and Aydin Buluc  
													  
													Dynamic Graphs on the GPU 
													Muhammad A. Awad, Saman Ashkiani, Serban D. Porumbescu, and John D. Owens  
													  
													Accelerating Parallel Hierarchical Matrix-Vector Products via Data Driven Sampling 
													Lucas Erlandson, Difeng Cai, Yuanzhe Xi, and Edmond Chow  
													  
													NC Algorithms for Popular Matchings in One-Sided Preference Systems and Related Problems 
													Changyong Hu and Vijay Garg													 
													 
												 
													  
													SESSION 19: Performance, Power, and Energy													 
													  
													Smartly Handling Renewable Energy Instability in Supporting A Cloud Datacenter 
													Jiechao Gao, Haoyu Wang, and Haiying Shen  
													  
													A Self-Optimized Generic Workload Prediction Framework for Cloud Computing 
													Vinodh Kumaran Jayakumar, Jaewoo Lee, In Kee Kim, and Wei Wang  
													  
													SeeSAw: Optimizing Performance of In-Situ Analytics Applications under Power Constraints 
													Ivana Marincic, Venkatram Vishwanath, and Henry Hoffmann  
  
													  
													SESSION 20: Resource Management													 
													  
													What does Power Consumption Behavior of HPC Jobs Reveal?  
													Tirthak Patel, Adam Wagenhäuser, Christopher Eibel, Timo Hönig, Thomas Zeiser, and Devesh Tiwari 
  
													Efficient Parallel Adaptive Partitioning for Load-balancing in Spatial Join 
													Jie Yang and Satish Puri  
													  
													Union: An Automatic Workload Manager for Accelerating Network Simulation 
													Xin Wang, Misbah Mubarak, Yao Kang, Robert B. Ross, and Zhiling Lan  
													  
													Auto-Tuning Parameter Choices using Bayesian Optimization 
Harshitha Menon, Abhinav Bhatele, and Todd Gamblin
											 
											
Parallel Technical Sessions 21, 22, 23, & 24
												SESSION 21: Runtime Systems
													 
													  
													Inter-Job Scheduling of High-Throughput Material Screening Applications 
Zhihui Du, Xining Hui, Yurui Wang, Jun Jiang, Jason Liu, Baokun Lu, and Chongyu Wang
													  
													Reservation and Checkpointing Strategies for Stochastic Jobs 
													Ana Gainaru, Brice Goglin, Valentin Honore, Guillaume Pallez, Padma Raghavan, Yves Robert, and Hongyang Sun  
													  
													A Scheduling Approach to Incremental Maintenance of Datalog Programs 
													Shikha Singh, Sergey Madaminov, Michael Bender, Michael Ferdman, Ryan Johnson, Benjamin Moseley, Hung Ngo, Dung Nguyen, Soeren Olesen, Kurt Stirewalt, and Geoffrey Washburn  
													  
													Dynamic Scheduling in Distributed Transactional Memory 
													Costas Busch, Maurice Herlihy, Miroslav Popovic, and Gokarna Sharma  
  
													  
													SESSION 22: Performance Analysis													 
													  
													Learning Cost-Effective Sampling Strategies for Empirical Performance Modeling  
													Marcus Ritter, Alexandru Calotoiu, Sebastian Rinke, Thorsten Reimann, Torsten Hoefler, and Felix Wolf 
  
													The Case of Performance Variability on Dragonfly-based Systems 
													Abhinav Bhatele, Jayaraman J. Thiagarajan, Taylor Groves, Rushil Anirudh, Staci A. Smith, Brandon Cook, and David Lowenthal  
													  
													Predicting and Comparing the Performance of Array Management Libraries 
													Donghe Kang, Oliver Ruebel, Suren Byna, and Spyros Blanas  
													  
													Demystifying the Performance of HPC Scientific Applications on NVM-based Memory 
													Ivy B. Peng, Kai Wu, Jie Ren, Dong Li, and Maya Gokhale  
  
													SESSION 23: Communication													 
													  
													Packet-in Request Redirection for Minimizing Control Plane Response Time 
													Rui Xia, Haipeng Dai, Jiaqi Zheng, Hong Xu, Meng Li, and Guihai Chen  
													  
													PCGCN: Partition-Centric Processing for Accelerating Graph Convolutional Network 
													Chao Tian, Lingxiao Ma, Zhi Yang, and Yafei Dai  
													  
													ConMidbox: Consolidated Middleboxes Selection and Routing in SDN/NFV-Enabled Networks 
													Guiyan Liu, Songtao Guo, Pan Li, and Liang Liu  
													  
Scalable and Memory-Efficient Kernel Ridge Regression
													Gustavo Chávez, Yang Liu, Pieter Ghysels, Xiaoye Sherry Li, and Elizaveta Rebrova 
  
													  
													SESSION 24: Storage													 
													  
													SSDKeeper: Self-Adapting Channel Allocation to Improve the Performance of SSD Devices 
													Renping Liu, Xianzhang Chen, Yujuan Tan, Runyu Zhang, Liang Liang, and Duo Liu  
													  
FlashKey: A High-Performance Flash Friendly Key-Value Store
													Madhurima Ray, Krishna Kant, Peng Li, and Sanjeev Trika  
													  
Pacon: Improving Scalability and Efficiency of Metadata Service through Partial Consistency
Yubo Liu, Yutong Lu, Zhiguang Chen, and Ming Zhao
													 
												
											 
											
Parallel Technical Sessions 25, 26, 27, & 28
												SESSION 25: Program Analysis and Runtime Library
													 
													  
													XPlacer: Automatic Analysis of Data Access Patterns on Heterogeneous CPU/GPU Systems  
Peter Pirkelbauer, Pei-Hung Lin, Tristan Vanderbruggen, and Chunhua Liao  
													  
													Improving Transactional Code Generation via Variable Annotation and Barrier Elision  
													João P.L. de Carvalho, Bruno C. Honorio, Alexandro Baldassin, and Guido Araujo 
													  
													Evaluating Thread Coarsening and Low-cost Synchronization on Intel Xeon Phi  
													Hancheng Wu and Michela Becchi  
													  
													AnySeq: A High Performance Sequence Alignment Library based on Partial Evaluation  
														André Müller, Bertil Schmidt, Andreas Hildebrandt, Richard Membarth, Roland Leißa, Matthis Kruse, and Sebastian Hack 
  
													  
													SESSION 26: Scheduling													 
													  
													Analysis of a List Scheduling Algorithm for Task Graphs on Two Types of Resources 
													Lionel Eyraud-Dubois and Suraj Kumar  
													  
													Optimal Convex Hull Formation on a Grid by Asynchronous Robots with Lights 
													Rory Hector, Ramachandran Vaidyanathan, Gokarna Sharma, and Jerry L. Trahan  
													  
													On the Complexity of Conditional DAG Scheduling in Multiprocessor Systems  
													Alberto Marchetti-Spaccamela, Nicole Megow, Jens Schlöter, Martin Skutella, and Leen Stougie 
  
Weaver: Efficient Coflow Scheduling in Heterogeneous Parallel Networks
Xin Sunny Huang, Yiting Xia, and T. S. Eugene Ng
  
													  
													SESSION 27: Fault Tolerance													 
													  
													Fault-Tolerant Containers Using NiLiCon  
													Diyu Zhou and Yuval Tamir 
													  
													Aarohi: Making Real-Time Node Failure Prediction Feasible 
													Anwesha Das, Frank Mueller, and Barry Rountree  
													  
													FP4S: Fragment-based Parallel State Recovery for Stateful Stream Applications 
													Pinchao Liu, Hailu Xu, Dilma Da Silva, Qingyang Wang, Sarker Tanzir Ahmed, and Liting Hu  
  
													  
													SESSION 28: Multidisciplinary													 
													  
													Implementation and Evaluation of a Hardware Decentralized Synchronization Lock for MPSoCs  
													Maxime France-Pillois, Jérôme Martin, and Frederic Rousseau 
  
Communication-Efficient Jaccard Similarity for High-Performance Distributed Genome Comparisons
													Maciej Besta, Raghavendra Kanakagiri, Harun Mustafa, Mikhail Karasikov, Gunnar Ratsch, Torsten Hoefler, and Edgar Solomonik 
  
													Engineering Worst-Case Inputs for Pairwise Merge Sort on GPUs 
													Kyle Berney and Nodari Sitchinava  
													  
													The Impossibility of Fast Transactions 
Karolos Antoniadis, Diego Didona, Rachid Guerraoui, and Willy Zwaenepoel
											 
											
												FRIDAY - 22 May 2020
											 
											
FRIDAY WORKSHOPS
Visit individual websites at links shown
											 
										 
										
										 
										  
IPDPS 2020 BEST PAPERS
										 
										  
XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs
											Cheng Li, Abdul Dakkak, Jinjun  Xiong, Wei Wei, Lingjie Xu, and Wen-mei Hwu 
Abstract—There has been a rapid proliferation of machine learning/deep learning (ML) models and wide adoption of them in many application domains. This has made profiling and characterization of ML model performance an increasingly pressing task for both hardware designers and system providers, as they would like to offer the best possible system to serve ML models with the target latency, throughput, cost, and energy requirements while maximizing resource utilization. Such an endeavor is challenging as the characteristics of an ML model depend on the interplay between the model, framework, system libraries, and the hardware (or the HW/SW stack). Existing profiling tools are disjoint, however, and only focus on profiling within a particular level of the stack, which limits the thoroughness and usefulness of the profiling results.
This paper proposes XSP, an across-stack profiling design that gives a holistic and hierarchical view of ML model execution. XSP leverages distributed tracing to aggregate and correlate profile data from different sources. XSP introduces a leveled and iterative measurement approach that accurately captures the latencies at all levels of the HW/SW stack in spite of the profiling overhead. We couple the profiling design with an automated analysis pipeline to systematically analyze 65 state-of-the-art ML models. We demonstrate that XSP provides insights which would be difficult to discern otherwise.
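The across-stack idea sketched in the abstract (latencies captured at every level of the HW/SW stack and correlated hierarchically) can be illustrated with a toy profiler. This is an assumption-laden sketch, not XSP's actual design; the names `Span` and `profile` are invented here for illustration.

```python
import time
from contextlib import contextmanager

# Hypothetical names (Span, profile): a toy sketch, not XSP's API.
class Span:
    def __init__(self, name, level):
        self.name, self.level = name, level
        self.start = self.end = 0.0
        self.children = []

spans = []   # completed spans (innermost finish first)
_stack = []  # currently open spans

@contextmanager
def profile(name, level):
    span = Span(name, level)
    if _stack:                       # nest under the enclosing span
        _stack[-1].children.append(span)
    _stack.append(span)
    span.start = time.perf_counter()
    try:
        yield span
    finally:
        span.end = time.perf_counter()
        _stack.pop()
        spans.append(span)

# Nesting correlates the levels: the model-level span encloses the
# framework- and library-level spans, so latency can be attributed
# per level of the stack.
with profile("model.predict", "model"):
    with profile("conv2d", "framework"):
        with profile("cudnnConvolutionForward", "library"):
            time.sleep(0.001)  # stand-in for real library work
```

The nesting relation plays the role that distributed-trace correlation plays in the paper: each enclosing span's latency bounds the latencies of the spans it contains.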
										 
										 
										  
											Exploring the Binary Precision  Capabilities of Tensor Cores for Epistasis Detection  
											Ricardo Nobre, Aleksandar Ilic,  Sergio Santander-Jiménez, and Leonel Sousa 
Abstract—Genome-wide association studies are performed to correlate a number of diseases and other physical or even psychological conditions (phenotype) with substitutions of nucleotides at specific positions in the human genome, mainly single-nucleotide polymorphisms (SNPs). Some conditions, possibly because of the complexity of the mechanisms that give rise to them, have been identified to be more statistically correlated with genotype when multiple SNPs are jointly taken into account. However, the discovery of new associations between genotype and phenotype is exponentially slowed down by the increase of computational power required when epistasis, i.e., interactions between SNPs, is considered. This paper proposes a novel graphics processing unit (GPU)-based approach for epistasis detection that combines the use of modern tensor cores with native support for processing binarized inputs with algorithmic and target-focused optimizations. Using only a single mid-range Turing-based GPU, the proposed approach is able to evaluate 64.8 × 10¹² and 25.4 × 10¹² sets of SNPs per second, normalized to the number of patients, when considering 2-way and 3-way epistasis detection, respectively. This proposal is able to surpass the state-of-the-art approach by 6× and 8.2× in terms of the number of pairs and triplets of SNP allelic patient data evaluated per unit of time per GPU.
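The binarized-input idea above can be illustrated with a minimal sketch: encode each genotype class of a SNP as a bitmask over patients, so the joint genotype counts for a SNP pair reduce to bitwise AND plus popcount, the kind of binary operation that tensor cores with binary-precision support evaluate in bulk. The encoding and function names are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch, not the paper's code: 2-way epistasis contingency
# counting over binarized genotype data. Each SNP stores one bitmask per
# genotype class (0, 1, 2); bit i is set if patient i has that genotype.

def popcount(x: int) -> int:
    """Number of set bits (patients) in a mask."""
    return bin(x).count("1")

def pair_counts(snp_a: dict, snp_b: dict) -> dict:
    """3x3 contingency counts for one SNP pair via AND + popcount."""
    return {(ga, gb): popcount(ma & mb)
            for ga, ma in snp_a.items()
            for gb, mb in snp_b.items()}

# Toy data: 4 patients, bit i = patient i.
snp_a = {0: 0b0011, 1: 0b0100, 2: 0b1000}   # patients 0,1 / 2 / 3
snp_b = {0: 0b0101, 1: 0b1010, 2: 0b0000}   # patients 0,2 / 1,3 / none
counts = pair_counts(snp_a, snp_b)          # e.g. counts[(0, 0)] == 1
```

At scale, the per-pair AND/popcount work becomes a binary matrix product over patient bit-vectors, which is what makes the workload a fit for binary tensor-core arithmetic.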
										 
										 
										  
											Understanding and Improving  Persistent Transactions on Optane DC Memory  
											Pantea Zardoshti, Michael Spear,  Aida Vosoughi, and Garret Swart 
Abstract—Storing data structures in high-capacity byte-addressable persistent memory instead of DRAM or a storage device offers the opportunity to (1) reduce cost and power consumption compared with DRAM, (2) decrease the latency and CPU resources needed for an I/O operation compared with storage, and (3) allow for fast recovery as the data structure remains in memory after a machine failure. The first commercial offering in this space is Intel® Optane™ Direct Connect (Optane™ DC) Persistent Memory. Optane™ DC promises access time within a constant factor of DRAM, with larger capacity, lower energy consumption, and persistence. We present an experimental evaluation of persistent transactional memory performance, and explore how Optane™ DC durability domains affect the overall results. Given that neither of the two available durability domains can deliver performance competitive with DRAM, we introduce and emulate a new durability domain, called PDRAM, in which the memory controller tracks enough information (and has enough reserve power) to make DRAM behave like a persistent cache of Optane™ DC memory.
In this paper we compare the performance of these durability domains on several configurations of five persistent transactional memory applications. We find a large throughput difference, which emphasizes the importance of choosing the best durability domain for each application and system. At the same time, our results confirm that recently published persistent transactional memory algorithms are able to scale, and that recent optimizations for these algorithms lead to strong performance, with speedups as high as 6× at 16 threads.
										 
										 
										  
											CycLedger: A Scalable and Secure  Parallel Protocol for Distributed Ledger via Sharding  
											Mengqian Zhang, JiChen Li, Zhaohua  Chen, Hongyin Chen, and Xiaotie Deng										 
Abstract—Traditional public distributed ledgers have not been able to scale out well and work efficiently. Sharding is deemed a promising way to solve this problem. By partitioning all nodes into small committees and letting them work in parallel, we can significantly lower the amount of communication and computation, reduce the overhead on each node's storage, and enhance the throughput of the distributed ledger. Existing sharding-based protocols still suffer from several serious drawbacks. The first is that all non-faulty nodes must be well connected with each other, which demands a huge number of communication channels in the network. Moreover, previous protocols face a great loss in efficiency when the honesty of each committee's leader is in question. At the same time, no explicit incentive is provided for nodes to actively participate in the protocol.
We present CycLedger, a scalable and secure parallel protocol for distributed ledger via sharding. Our protocol selects a leader and a partial set for each committee, who are in charge of maintaining intra-shard consensus and communicating with other committees, to reduce the amortized complexity of communication, computation, and storage on all nodes. We introduce a novel semi-commitment scheme between committees and a recovery procedure to prevent the system from crashing even when leaders of committees are malicious. To add incentive for the network, we use the concept of reputation, which measures each node's trusty computing power. As nodes with a higher reputation receive more rewards, there is an encouragement for nodes with strong computing ability to work honestly to gain reputation. In this way, we strike out a new path to establish scalability, security, and incentive for the sharding-based distributed ledger.
									 
										 