HOSTED BY

SHANGHAI JIAO TONG UNIVERSITY

IPDPS 2012 Advance Program

Please visit the IPDPS website regularly for updates, since there may be schedule revisions. Authors who have corrections should send email to contact@ipdps.org giving full details. Note that paper numbers are listed for easy reference.

Abstracts of Contributed Papers

Abstracts for regular conference papers have been compiled so that authors can check their accuracy and visitors to this website can preview the papers to be presented at the conference. Full proceedings of the conference will be published on a CD-ROM pocketed in the program book distributed to registrants at the conference.

View contributed paper abstracts in advance (pdf)

MONDAY - 21 May 2012

DAYS • Monday • Tuesday • Wednesday • Thursday • Friday

WORKSHOPS
all day*

* See each individual workshop program for schedule details

WORKSHOP
HCW	Heterogeneity in Computing Workshop
RAW	Reconfigurable Architectures Workshop
HIPS	Workshop on High-Level Parallel Programming Models & Supportive Environments
NIDISC	Workshop on Nature Inspired Distributed Computing
HiCOMB	Workshop on High Performance Computational Biology
APDCM	Advances in Parallel and Distributed Computing Models
CASS	Communication Architecture for Scalable Systems
HPPAC	High-Performance, Power-Aware Computing
HPGC	High-Performance Grid and Cloud Computing Workshop
SMTPS	Workshop on System Management Techniques, Processes, and Services
STDN	International Workshop on Security and Trust of Distributed Networking Systems
EduPar	NSF/TCPP Workshop on Parallel and Distributed Computing Education

TUESDAY - 22 May 2012

Opening Session
8:00 AM - 8:30 AM

Opening Session

Keynote Session
8:30 AM - 9:30 AM

Keynote Speech: Large-Scale Visual Data Analysis

Speaker: Chris Johnson
Director, Scientific Computing and Imaging Institute
University of Utah

Abstract: Modern high performance computers have speeds measured in petaflops and handle data set sizes measured in terabytes and petabytes. Although these machines offer enormous potential for solving very large-scale realistic computational problems, their effectiveness will hinge upon the ability of human experts to interact with their simulation results and extract useful information. One of the greatest scientific challenges of the 21st century is to effectively understand and make use of the vast amount of information being produced. Visual data analysis will be among our most important tools in helping to understand such large-scale information. Our research at the Scientific Computing and Imaging (SCI) Institute at the University of Utah has focused on innovative, scalable techniques for large-scale 3D visual data analysis. In this talk, I will present state-of-the-art visualization techniques, including scalable visualization algorithms and software, cluster-based visualization methods and innovative visualization techniques applied to problems in computational science, engineering, and medicine. I will conclude with an outline for future high performance visualization research challenges and opportunities.

Morning Break 9:30 AM - 10:30 AM

Parallel Technical
Sessions 1, 2, 3, & 4
10:30 AM - 12:30 PM

Session 1
Parallel Linear Algebra Algorithms I
Chair: Zhaojun Bai

A Predictive Model for Solving Small Linear Algebra Problems in GPU Registers
Michael Anderson (University of California, Berkeley, USA); David Sheffield (University of California, Berkeley, USA); Kurt Keutzer (University of California, Berkeley, USA)

A parallel tiled solver for dense symmetric indefinite systems on multicore architectures
Marc Baboulin (INRIA, France); Dulceneia Becker (University of Tennessee, Knoxville, USA); Jack Dongarra (University of Tennessee, Knoxville, USA)

A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction
Azzam Haidar (University of Tennessee, USA); Hatem Ltaief (KAUST Supercomputing Laboratory, Saudi Arabia); Piotr Luszczek (University of Tennessee, USA); Jack Dongarra (University of Tennessee, Knoxville, USA)

Improving the performance of dynamical simulations via multiple right-hand sides
Xing Liu (Georgia Institute of Technology, USA); Edmond Chow (Georgia Institute of Technology, USA); Karthikeyan Vaidyanathan (Intel Corporation, USA); Mikhail Smelyanskiy (Intel, USA)

Session 2
Bioinformatics and Performance Modeling
Chair: Mark Clement

High-Performance Interaction-Based Simulation of Gut Immunopathologies with ENISI
Keith R Bisset (Virginia Tech, USA); Josep Bassaganya-Riera (Virginia Tech, USA); Adria Carbo (Virginia Tech, USA); Stephen Eubank (Virginia Tech, USA); Raquel Hontecillas (Virginia Tech, USA); Stefan Hoops (Virginia Tech, USA); Madhav Marathe (Virginia Tech, USA); Yongguo Mei (Virginia Tech, USA); Katherine Wendelsdorf (Virginia Tech, USA); Dawen Xie (Virginia Tech, USA); Jae-seung Yeom (Virginia Tech, USA)

A Parallel Algorithm for Spectrum-based Short Read Error Correction
Ankit Shah (Indian Institute of Technology, Bombay, India); Sriram Chockalingam (Indian Institute of Technology, Bombay, India); Srinivas Aluru (Iowa State University, USA)

Enhancing the scalability of consistency-based progressive multiple sequences alignment applications
Miquel Orobitg (Universitat de Lleida, Spain); Fernando Cores (University of Lleida, Spain); Fernando Guirado (Universitat de Lleida, Spain); Carsten Kemena (Bioinformatics and Genomics Programme Centre de Regulació Genòmica, Spain); Cedric Notredame (Centre de Regulació Genòmica, Spain); Ana Ripoll (Universitat Autònoma de Barcelona, Spain)

An Accurate GPU Performance Model for Effective Control Flow Divergence Optimization
Zheng Cui (Advanced Digital Sciences Center, Singapore); Yun Liang (Advanced Digital Sciences Center, Singapore); Kyle Rupnow (Advanced Digital Sciences Center, Singapore); Deming Chen (University of Illinois, USA)

Session 3
Dynamic Pipeline and Transactional Memory Optimizations
Chair: Dhabaleswar Panda

SEL-TM: Selective Eager-Lazy Management for Improved Concurrency in Transactional Memory
Lihang Zhao (University of Southern California/Information Sciences Institute, USA); Woojin Choi (University of Southern California/Information Sciences Institute, USA); Jeffrey Draper (University of Southern California/Information Sciences Institute, USA)

Robust SIMD: Dynamically Adapted SIMD Width and Multi-Threading Depth
Jiayuan Meng (Argonne National Laboratory, USA); Jeremy Sheaffer (University of Virginia, USA); Kevin Skadron (University of Virginia, USA)

Dynamic Operands Insertion for VLIW Architecture with a Reduced Bit-width Instruction Set
Jongwon Lee (Seoul National University, Korea); Jonghee Youn (Gangneung-wonju National University, Korea); Yunheung Paek (Seoul National University, Korea); Jihoon Lee (Seoul National University, Korea)

SUV: A Novel Single-Update Version-Management Scheme for Hardware Transactional Memory Systems
Zhichao Yan (Huazhong University of Science and Technology, P.R. China); Hong Jiang (University of Nebraska at Lincoln, USA); Dan Feng (Huazhong University of Science and Technology, P.R. China); Lei Tian (Huazhong University of Science and Technology, P.R. China); Yujuan Tan (Huazhong University of Science and Technology, P.R. China)

Session 4
Software Scheduling
Chair: Cho-li Wang

Heterogeneous Task Scheduling for Accelerated OpenMP
Tom Scogland (Virginia Tech, USA); Barry Rountree (Lawrence Livermore National Laboratory, USA); Wu-chun Feng (Virginia Tech, USA); Bronis R. de Supinski (Lawrence Livermore National Laboratory, USA)

A Source-aware Interrupt Scheduling for Modern Parallel I/O Systems
Hongbo Zou (Illinois Institute of Technology, USA); Xian-He Sun (Illinois Institute of Technology, USA); Siyuan Ma (Illinois Institute of Technology, USA); Xi Duan (Illinois Institute of Technology, USA)

ExPERT: Pareto-Efficient Task Replication on Grids and a Cloud
Orna Agmon Ben-Yehuda (Technion - Israel Institute of Technology, Israel); Assaf Schuster (Technion - Israel Institute of Technology, Israel); Artyom Sharov (Technion, Israel); Mark Silberstein (Technion - Israel Institute of Technology, Israel); Alexandru Iosup (Delft University of Technology, The Netherlands)

Scheduling Closed-Nested Transactions in Distributed Transactional Memory
Junwhan Kim (Virginia Tech, USA); Binoy Ravindran (Virginia Tech, USA)

Parallel Technical
Sessions 5, 6, 7, & 8
1:30 PM - 3:30 PM

Session 5
Multicore Algorithms
Chair: Bo Hong

Power-aware Manhattan routing on chip multiprocessors
Anne Benoit (ENS Lyon, France); Rami Melhem (University of Pittsburgh, USA); Paul Renaud-Goud (LIP, ENS Lyon, France); Yves Robert (ENS Lyon, France)

Efficient Resource Oblivious Algorithms for Multicores with False Sharing
Richard Cole (New York University, USA); Vijaya Ramachandran (University of Texas at Austin, USA)

Competitive Cache Replacement Strategies for Shared Cache Environments
Anil Kumar Katti (University of Texas at Austin, USA); Vijaya Ramachandran (University of Texas at Austin, USA)

A novel sorting algorithm for many-core architectures based on adaptive bitonic sort
Hagen Peters (Christian-Albrechts-University of Kiel, Germany); Ole Schulz-Hildebrandt (CAU Kiel, Germany); Norbert Luttenberger (Christian-Albrechts-University in Kiel, Germany)

Session 6
Scheduling and Load Balancing Algorithms I
Chair: Denis Trystram

Optimizing Busy Time on Parallel Machines
George Mertzios (Durham University, United Kingdom); Mordechai Shalom (Tel-Hai College, Israel); Ariella Voloshin (Technion, Israel); Prudence Wong (University of Liverpool, United Kingdom); Shmuel Zaks (Technion, Israel)

WATS: Workload-Aware Task Scheduling in Asymmetric Multi-core Architectures
Quan Chen (Shanghai Jiao Tong University, P.R. China); Yawen Chen (University of Otago, New Zealand); Zhiyi Huang (University of Otago, New Zealand); Minyi Guo (Shanghai Jiao Tong University, P.R. China)

Parametric Utilization Bounds for Fixed-Priority Multiprocessor Scheduling
Guan Nan (Uppsala University, Sweden); Martin Stigge (Uppsala University, Sweden); Wang Yi (Uppsala University, Sweden); Ge Yu (Northeastern University, P.R. China)

Minimizing Weighted Mean Completion Time for Malleable Tasks Scheduling
Olivier Beaumont (INRIA, France); Nicolas Bonichon (LaBRI -- INRIA Bordeaux Sud-Ouest, France); Lionel Eyraud-Dubois (INRIA Bordeaux Sud-Ouest, France); Loris Marchal (CNRS, France)

Session 7
Scientific Applications
Chair: Rich Vuduc

Load Balancing of Dynamical Nucleation Theory Monte Carlo Simulations through Resource Sharing Barriers
Humayun Arafat (The Ohio State University, USA); James Dinan (Argonne National Laboratory, USA); Sriram Krishnamoorthy (Pacific Northwest National Laboratory, USA); Theresa Windus (Iowa State University, USA); Ponnuswamy Sadayappan (Ohio State University, USA)

Highly Efficient Performance Portable Tracking of Evolving Surfaces
Wei Yu (Citadel Investment Group, USA); Franz Franchetti (Carnegie Mellon University, USA); James C. Hoe (Carnegie Mellon University, USA); Tsuhan Chen (Cornell University, USA)

Advancing Large Scale Many-Body QMC Simulations on GPU accelerated Multicore Systems
Andres Tomas (University of California, Davis, USA); Chia-Chen Chang (University of California, Davis, USA); Richard Scalettar (University of California, Davis, USA); Zhaojun Bai (University of California, Davis, USA)

Reducing Data Movement Costs: Scalable Seismic Imaging on Blue Gene
Michael Perrone (IBM, USA); Lurng-Kuo Liu (IBM T.J. Watson Research Center, USA); Ligang Lu (IBM T. J. Watson Research Center, USA); Karen Magerlein (IBM, USA); Changhoan Kim (IBM, USA); Irina Fedulova (IBM, Russia); Artyom Semenikhin (IBM, Russia)

Session 8
MPI Debugging and Performance Optimization
Chair: Xiaosong Ma

Opportunistic Data-driven Execution of Parallel Programs for Efficient I/O Services
Xuechen Zhang (Wayne State University, USA); Kei Davis (Los Alamos National Laboratory, USA); Song Jiang (Wayne State University, USA)

SyncChecker: Detecting Synchronization Errors Between MPI Applications and Libraries
Zhezhe Chen (The Ohio State University, USA); Xinyu Li (The Ohio State University, USA); Jau-Yuan Chen (The Ohio State University, USA); Hua Zhong (Institute of Software, Chinese Academy of Sciences, P.R. China); Feng Qin (Ohio State University, USA)

Holistic Debugging of MPI Derived Datatypes
Joachim Protze (Technische Universitaet Dresden, Germany); Tobias Hilbrich (Technische Universität Dresden, Germany); Andreas Knüpfer (Technische Universität Dresden, Germany); Bronis R. de Supinski (Lawrence Livermore National Laboratory, USA); Matthias S. Müller (Technische Universität Dresden, Germany)

Hierarchical Local Storage: Exploiting Flexible User-Data Sharing Between MPI Tasks
Marc Tchiboukdjian (Exascale Computing Research, France); Patrick Carribault (CEA/DAM Ile de France, France); Marc Pérache (CEA/DAM Ile de France, France)

Afternoon Break 3:30 PM - 4:00 PM

Parallel Technical
Sessions 9, 10, 11, 12 & 13
4:00 PM - 6:00 PM

Session 9
Parallel Graph Algorithms I
Chair: Fredrik Manne

Fast and Efficient Graph Traversal Algorithm for CPUs: Maximizing Single-Node Efficiency
Jatin Chhugani (Intel Corporation, USA); Nadathur Satish (Intel Corporation, USA); Changkyu Kim (Intel Corporation, USA); Jason Sewall (Intel Corporation, USA); Pradeep Dubey (Intel Corporation, USA)

SAHAD: Subgraph Analysis in Massive Networks Using Hadoop
Zhao Zhao (Virginia Tech, USA); Guanying Wang (Virginia Tech, USA); Ali R Butt (Virginia Tech, USA); Maleq Khan (Virginia Tech, USA); Vullikanti S Anil Kumar (Virginia Tech, USA); Madhav Marathe (Virginia Tech, USA)

Accelerating nearest neighbor search on manycore systems
Lawrence Cayton (Max Planck Institute for Intelligent Systems, Germany)

Optimizing large-scale graph analysis on multithreaded, multicore platforms
Guojing Cong (IBM T.J. Watson Research Center, USA)

Session 10
High Performance Computing Algorithms
Chair: Che-Rung Lee

Low-Cost Parallel Algorithms for 2:1 Octree Balance
Tobin Isaac (University of Texas at Austin, USA); Carsten Burstedde (Universitaet Bonn, Germany); Omar Ghattas (The University of Texas at Austin, USA)

A Case Study of Designing Efficient Algorithm-based Fault Tolerant Application for Exascale Parallelism
Erlin Yao (Institute of Computing Technology, Chinese Academy of Sciences, P.R. China); Rui Wang (Institute of Computing Technology, Chinese Academy of Sciences, P.R. China); Mingyu Chen (Institute of Computing Technology, Chinese Academy of Sciences, P.R. China); Guangming Tan (Institute of Computing Technology, Chinese Academy of Sciences, P.R. China)

High Performance Non-uniform FFT on Modern x86-based Multi-core Systems
Dhiraj D Kalamkar (Intel Corporation, India); Joshua Trzasko (Mayo Clinic, USA); Srinivas Sridharan (Intel Corporation, India); Mikhail Smelyanskiy (Intel, USA); Daehyun Kim (Intel Corporation, USA); Armando Manduca (Mayo Clinic, USA); Yunhong Shu (Mayo Clinic, USA); Matt Bernstein (Mayo Clinic, USA); Bharat Kaul (Intel Corporation, India); Pradeep Dubey (Intel Corporation, USA)

NUMA Aware Iterative Stencil Computations on Many-Core Systems
Mohammed Shaheen (Max Planck Informatik, Germany); Robert Strzodka (Max Planck Informatik, Germany)

Session 11
Parallel Numerical Computation
Chair: Xiaoye Li

Algebraic block multi-color ordering method for parallel multi-threaded sparse triangular solver in ICCG method
Takeshi Iwashita (Kyoto University, Japan); Hiroshi Nakashima (Kyoto University, Japan); Yasuhito Takahashi (Doshisha University, Japan)

Parallel Computation of Morse-Smale Complexes
Attila Gyulassy (University of Utah, USA); Valerio Pascucci (University of Utah, USA); Tom Peterka (Argonne National Laboratory, USA); Robert Ross (Argonne National Laboratory, USA)

Hybrid static/dynamic scheduling for already optimized dense matrix factorization
Simplice Donfack (INRIA, France); Laura Grigori (INRIA, France); William D Gropp (University of Illinois at Urbana-Champaign, USA); Vivek Kale (University of Illinois at Urbana-Champaign, USA)

Session 12
Architecture Modeling and Scheduling
Chair: Hiroshi Nakashima

Understanding Cache Hierarchy Contention in CMPs to Improve Job Scheduling
Josue Feliu (Universidad Politecnica de Valencia, Spain); Julio Sahuquillo (Universidad Politecnica de Valencia, Spain); Salvador Petit (Polytechnic University of Valencia, Spain); Jose Duato (Universidad Politecnica de Valencia, Spain)

Optimization of Parallel Discrete Event Simulator for Multi-core Systems
Deepak A Jagtap (State University of New York at Binghamton, USA); Nael Abu-Ghazaleh (State University of New York at Binghamton, USA); Dmitry Ponomarev (State University of New York at Binghamton, USA)

Using the Translation Lookaside Buffer to Map Threads in Parallel Applications Based on Shared Memory
Eduardo Cruz (UFRGS, Brazil); Matthias Diener (Universidade Federal do Rio Grande do Sul, Brazil); Philippe O. A. Navaux (Universidade Federal do Rio Grande do Sul, Brazil)

Session 13
GPU-Based Computing
Chair: Olivier Beaumont

Automatic Resource Scheduling with Latency Hiding for Parallel Stencil Applications on GPGPU Clusters
Kumiko Maeda (IBM Japan, Ltd, Japan); Masana Murase (IBM Japan, Japan); Munehiro Doi (IBM Japan, Japan); Hideaki Komatsu (IBM Tokyo Research Laboratory, Japan); Shigeho Noda (Advanced Center for Computing and Communication, RIKEN, Japan); Ryutaro Himeno (Riken, Japan)

Productive Programming of GPU Clusters with OmpSs
Javier Bueno (Universitat Politècnica de Catalunya, Spain); Judit Planas (Barcelona Supercomputing Center, Spain); Alejandro Duran (Barcelona Supercomputing Center, Spain); Rosa M. Badia (Barcelona Supercomputing Center, Spain); Xavier Martorell (Universitat Politècnica de Catalunya, Spain); Eduard Ayguade (Universitat Politècnica de Catalunya, Spain); Jesús Labarta (Barcelona Supercomputing Center, Spain)

Generating Device-specific GPU code for Local Operators in Medical Imaging
Richard Membarth (University of Erlangen-Nuremberg, Germany); Frank Hannig (University of Erlangen-Nuremberg, Germany); Jürgen Teich (University of Erlangen-Nuremberg, Germany); Mario Körner (Siemens Healthcare Sector, Germany); Wieland Eckert (Siemens Healthcare Sector, Germany)

Performance Portability with the Chapel Language
Albert Sidelnik (University of Illinois at Urbana-Champaign, USA); Saeed Maleki (University of Illinois at Urbana-Champaign, USA); Maria Garzaran (University of Illinois at Urbana Champaign, USA); Brad Chamberlain (Cray Inc, USA); David Padua (University of Illinois at Urbana-Champaign, USA)

IEEE-TPDS Meeting
6:00 PM - 7:00 PM

IEEE Transactions on Parallel and Distributed Systems

Editorial Board Meeting

Hosted by:
Ivan Stojmenovic, University of Ottawa, Canada
Editor-in-Chief, IEEE TPDS

WEDNESDAY - 23 May 2012

Special Plenary Session
8:30 AM - 9:00 AM

Greetings From IEEE Computer Society President John Walz

Keynote Session
9:00 AM – 10:00 AM

Keynote Speech: Exascale System Software for the Year of the Dragon

Speaker: Pete Beckman
Director, Exascale Technology and Computing Institute
Argonne National Laboratory

Abstract: As we look to exascale systems and a new generation of computing hardware begins to take shape, new software challenges have also emerged. It is therefore an exciting year for computer scientists. We must not fear the challenges ahead, but we must be willing to break the rules to achieve our exascale goals. Node architectures are rapidly changing. Every hardware company is looking for ways to squeeze out more performance per Watt. System architects are also working on ways to integrate fast networking and memory, increase parallelism, and manage heterogeneous computing elements. Building special-purpose exascale systems from this new technology will fundamentally change many parts of our system software stack. While it may be years before disruptive and emerging technology paths become clear and architectures converge on fundamental design patterns, there are many exciting areas of advanced research that can be addressed today. Other areas are yet to be explored. This presentation will focus on the areas of system software, the code that sits between the application and the hardware, that must either evolve or be reinvented to reach our computing goals.

Morning Break 10:00 AM - 10:30 AM

PLENARY SESSION:
Panel Discussion
10:30 AM - 12:30 PM

PANEL DISCUSSION: Will exascale computing really require new algorithms and programming models?

Moderator:
Katherine Yelick, Lawrence Berkeley National Lab & University of California at Berkeley, USA

Panel Members:
Zhaojun Bai, University of California, Davis, USA
Zhihui Du, Tsinghua University, Beijing, China
Dhabaleswar K. Panda, The Ohio State University, USA
Bronis R. de Supinski, Lawrence Livermore National Laboratory, USA
Richard Vuduc, Georgia Institute of Technology, USA

Parallel Technical
Sessions 14, 15, 16 & 17
1:30 PM - 3:30 PM

Session 14
Parallel Matrix Factorizations
Chair: Laura Grigori

Dense LU Factorization on Multicore Supercomputer Nodes
Jonathan Lifflander (University of Illinois, USA); Phil Miller (University of Illinois, USA); Ramprasad Venkataraman (University of Illinois, USA); Anshu Arya (University of Illinois, USA); Laxmikant V. Kale (University of Illinois at Urbana-Champaign, USA); Terry Jones (Oak Ridge National Lab, USA)

Hierarchical QR factorization algorithms for multi-core cluster systems
Jack Dongarra (University of Tennessee, Knoxville, USA); Mathieu Faverge (University of Tennessee, USA); Thomas Herault (University of Tennessee, USA); Julien Langou (University of Colorado Denver, USA); Yves Robert (ENS Lyon, France)

New Scheduling Strategies for a Parallel Right-looking Sparse LU Factorization Algorithm on Multicore Clusters
Xiaoye Sherry Li (Lawrence Berkeley National Laboratory, USA); Ichitaro Yamazaki (Lawrence Berkeley National Laboratory, USA)

ShyLU: A Hybrid-Hybrid Solver for Multicore Platforms
Sivasankaran Rajamanickam (Sandia National Laboratories, USA); Erik G. Boman (Sandia National Laboratories, USA); Michael A Heroux (Sandia National Laboratories, USA)

Session 15
Distributed Computing and Programming Models
Chair: Matthias Muller

MATE-CG: A MapReduce-Like Framework for Accelerating Data-Intensive Computations on Heterogeneous Clusters
Wei Jiang (The Ohio State University, USA); Gagan Agrawal (The Ohio State University, USA)

Automated and Agile Server Parameter Tuning with Learning and Control
Yanfei Guo (University of Colorado at Colorado Springs, USA); Palden Lama (University of Colorado at Colorado Springs, USA); Xiaobo Zhou (University of Colorado at Colorado Springs, USA)

A Class of Practical Self-tuning Failure Detection Schemes for Cloud Computing Networks
Naixue Xiong (Georgia State University, USA); Athanasios V. Vasilakos (National Technical University of Athens, Greece); Jie Wu (Temple University, USA); Y. Richard Yang (Yale University, USA); Andrew J Rindos (IBM, USA); Yi Pan (Georgia State University, USA)

PGAS for Distributed Numerical Python Targeting Multi-core Clusters
Mads R. B. Kristensen (University of Copenhagen, Denmark); Yili Zheng (Lawrence Berkeley National Laboratory, USA); Brian Vinter (University of Copenhagen, Denmark)

Session 16
Memory Architectures
Chair: Lixin Zhang

Miss-Correlation Folding: Encoding Per-Block Miss Correlations in Compressed DRAM for Data Prefetching
Gang Liu (University of Florida, USA); Jih-Kwon Peir (University of Florida, USA); Victor Lee (Intel, USA)

On the role of NVRAM in data-intensive architectures: an evaluation
Brian Van Essen (Lawrence Livermore National Laboratory, USA); Roger Pearce (Texas A&M University, USA); Sasha Ames (University of California, Santa Cruz, USA); Maya Gokhale (Lawrence Livermore National Laboratory, USA)

iTransformer: Using SSD to Improve Disk Scheduling for High-performance I/O
Xuechen Zhang (Wayne State University, USA); Kei Davis (Los Alamos National Laboratory, USA); Song Jiang (Wayne State University, USA)

Switching Optically-Connected Memories in a Large Scale System
Dilma Da Silva (IBM T.J. Watson Research Center, USA); Abhirup Chakraborty (IBM T. J. Watson Research Center, USA); Eugen Schenfeld (IBM T. J. Watson Research Center, USA)

Session 17
High Performance Communication and Networking
Chair: Manish Parashar

Supporting the Global Arrays PGAS Model Using MPI One-Sided Communication
James Dinan (Argonne National Laboratory, USA); Pavan Balaji (Argonne National Laboratory, USA); Jeff Hammond (Argonne National Laboratory, USA); Sriram Krishnamoorthy (Pacific Northwest National Laboratory, USA); Vinod Tipparaju (Oak Ridge National Laboratory, USA)

A uGNI-Based Asynchronous Message-driven Runtime System for Cray Supercomputers with Gemini Interconnect
Yanhua Sun (University of Illinois at Urbana-Champaign, USA); Gengbin Zheng (University of Illinois at Urbana-Champaign, USA); Ryan Olson (Cray Inc., USA); Terry Jones (Oak Ridge National Lab, USA); Laxmikant V. Kale (University of Illinois at Urbana-Champaign, USA)

PAMI: A Parallel Active Message Interface for the BlueGene/Q Supercomputer
Sameer Kumar (IBM Research, USA); Amith Mamidala (IBM, USA); Daniel Faraj (IBM, USA); Brian Smith (IBM Rochester, USA); Michael Blocksome (IBM, USA); Bob Cernohous (IBM, USA); Douglas Miller (IBM, USA); Jeff Parker (IBM, USA); Joseph Ratterman (IBM, USA); Philip Heidelberger (IBM Research, USA); Dong Chen (IBM Research, USA); Burkhard Steinmacher-Burow (IBM, Germany)

High-Performance Design of HBase with RDMA over InfiniBand
Jian Huang (The Ohio State University, USA); Xiangyong Ouyang (The Ohio State University, USA); Jithin Jose (The Ohio State University, USA); Md. Wasi-ur-Rahman (The Ohio State University, USA); Hao Wang (The Ohio State University, USA); Miao Luo (The Ohio State University, USA); Hari Subramoni (The Ohio State University, USA); Chet Murthy (IBM T. J. Watson Research Center, USA); Dhabaleswar Panda (The Ohio State University, USA)

Afternoon Break 3:30 PM - 4:00 PM

Parallel Technical
Sessions 18, 19, 20 & 21
4:00 PM - 6:00 PM

Session 18
Scheduling and Load Balancing Algorithms II
Chair: Luc Bougé

Virtual Machine Resource Allocation for Service Hosting on Heterogeneous Distributed Platforms
Mark Lee Stillwell (Cranfield University, United Kingdom); Frederic Vivien (INRIA, France); Henri Casanova (University of Hawaii at Manoa, USA)

Consistency-aware Partitioning Algorithm in Multi-server Distributed Virtual Environments
Yusen Li (Nanyang Technological University, Singapore); Wentong Cai (Nanyang Technological University, Singapore)

Optimal Resource Rental Planning for Elastic Applications in Cloud Market
Han Zhao (University of Florida, USA); Miao Pan (University of Florida, USA); Xinxin Liu (University of Florida, USA); Xiaolin Li (University of Florida, USA); Yuguang Fang (University of Florida, USA)

Improved Bounds for Discrete Diffusive Load Balancing
Clemens P. J. Adolphs (University of British Columbia, Canada); Petra Berenbrink (Simon Fraser University, Canada)

Session 19
Parallel Graph Algorithms II
Chair: Edmond Chow

Multi-core spanning tree algorithms using the disjoint-set data structure
Fredrik Manne (University of Bergen, Norway); Md. Mostofa Ali Patwary (Northwestern University, USA); Peder Refsnes (University of Bergen, Norway)

Graph Partitioning for Reconfigurable Topology
Deepak Ajwani (University College Cork, Ireland); Shoukat Ali (Dublin Research Lab, IBM, USA); John P. Morrison (University College Cork, Ireland)

Multithreaded Clustering for Multi-level Hypergraph Partitioning
Umit V. Catalyurek (The Ohio State University, USA); Mehmet Deveci (The Ohio State University, USA); Kamer Kaya (The Ohio State University, USA); Bora Ucar (CNRS, France)

Multithreaded Algorithms for Maximum Matching in Bipartite Graphs
Ariful Azad (Purdue University, USA); Mahantesh Halappanavar (Pacific Northwest National Laboratory, USA); Sivasankaran Rajamanickam (Sandia National Laboratories, USA); Erik G. Boman (Sandia National Laboratories, USA); Arif Khan (Purdue University, USA); Alex Pothen (Purdue University, USA)

Session 20
Data Intensive and Peer-to-Peer Computing
Chair: Alexey Lastovetsky

Multi-level Layout Optimization for Efficient Spatio-temporal Queries on ISABELA-compressed Data
Zhenhuan Gong (North Carolina State University, USA); Sriram Lakshminarasimhan (North Carolina State University, USA); John Jenkins (North Carolina State University, USA); Hemanth Kolla (Sandia National Laboratory, USA); Stephane Ethier (Princeton Plasma Physics Laboratory, USA); Jackie Chen (Sandia National Laboratory, USA); Robert Ross (Argonne National Laboratory, USA); Scott Klasky (Oak Ridge National Laboratory, USA); Nagiza Samatova (North Carolina State University, USA)

Evaluating Mesh-based P2P Video-on-Demand Systems
Yingwu Zhu (Seattle University, USA)

Query optimization and execution in a parallel analytics DBMS
Todd Eavis (Concordia University, Canada); Ahmad Taleb (Najran University, Saudi Arabia)

Dynamic Message Ordering for Topic-Based Publish/Subscribe Systems
Roberto Baldoni (La Sapienza Roma, Italy); Silvia Bonomi (University of Roma, Italy); Marco Platania (University of Rome "La Sapienza", Italy); Leonardo Querzoni (University of Rome "La Sapienza", Italy)

Session 21
Disk and Memory Software Optimization
Chair: Hai Jin

iHarmonizer: Improving the Disk Efficiency of I/O-intensive Multithreaded Codes
Yizhe Wang (VMware, Inc., USA); Kei Davis (Los Alamos National Laboratory, USA); Yuehai Xu (Wayne State University, USA); Song Jiang (Wayne State University, USA)

Improving Parallel IO Performance of Cell-based AMR Applications
Yongen Yu (Illinois Institute of Technology, USA); Douglas Rudd (Yale University, USA); Zhiling Lan (Illinois Institute of Technology, USA); Nickolay Gnedin (The University of Chicago, USA); Andrey Kravtsov (The University of Chicago, USA); Jingjin Wu (Illinois Institute of Technology, USA)

Identifying Opportunities for Byte-Addressable Non-Volatile Memory in Extreme-Scale Scientific Applications
Dong Li (Oak Ridge National Laboratory, USA); Jeffrey Vetter (Oak Ridge National Laboratory, USA); Gabriel Marin (Oak Ridge National Laboratory, USA); Collin McCurdy (Oak Ridge National Laboratory, USA); Cristian Cira (Auburn University, USA); Zhuo Liu (Auburn University, USA); Weikuan Yu (Auburn University, USA)

NVMalloc: Exposing an Aggregate SSD Store as a Memory Partition in Extreme-Scale Machines
Chao Wang (ORNL, USA); Sudharshan S Vazhkudai (Oak Ridge National Laboratory, USA); Xiaosong Ma (NC State University, USA); Fei Meng (North Carolina State University, USA); Youngjae Kim (Oak Ridge National Laboratory, USA); Christian Engelmann (Oak Ridge National Laboratory, USA)

THURSDAY - 24 May 2012

PhD Forum
8:00 AM – 6:00 PM

PhD Forum Posters

POSTERS WILL BE ON DISPLAY ALL DAY THURSDAY

See PhD Forum page for list of student authors.

Special Plenary Session
8:30 AM – 9:00 AM

TCPP Annual Report & Awards

IEEE Computer Society Technical Committee on Parallel Processing Chair Ajay Gupta

Keynote Session
9:00 AM - 10:00 AM

Keynote Speech: Building Billion-Threads Computer and Elastic Processor

Speaker: Guo-Jie Li
Institute of Computing Technology
Chinese Academy of Sciences

Abstract: IT applications in the coming decades will be characterized by computing for the masses. Datacenters will have to deal with a high number of active users, a high number of applications, a high number of parallel requests, massive amounts of data, etc. This challenge is especially serious for China, given its huge population. The emergence of the Internet of Things keeps expanding the classes of applications, and an ossified computer architecture cannot suit the various niche applications. To address these big issues, the Chinese Academy of Sciences (CAS) has started the Future Information Technology (FIT) Initiative, a 10-year frontier research project targeting the applications and markets of 2020-2030. The State Key Lab on Computer Architecture (CARCH), which is located at the Institute of Computing Technology (ICT) and is the only SKL in the area of computer architecture in China, is one of the major undertakings of the FIT project. The research directions of CARCH include building a billion-threads computer, an elastic processor, cloud-sea computing, etc. In this talk, we will survey the motivations and basic ideas of these projects. Moreover, we will briefly introduce another forward-looking research effort under way at ICT: a service-oriented future Internet architecture.

Read more information

Morning Break 10:00 AM - 10:30 AM

PLENARY SESSION:
Best Papers
10:30 AM - 12:30 PM

SESSION: Best Papers
Chair: Leonid Oliker

HierKNEM: An Adaptive Framework for Kernel-Assisted and Topology-Aware Collective Communications on Many-core Clusters
Teng Ma (University of Tennessee, USA); George Bosilca (The University of Tennessee, USA); Aurelien Bouteiller (University of Tennessee Knoxville, USA); Jack Dongarra (University of Tennessee, Knoxville, USA)

BRISA: Combining Efficiency and Reliability in Epidemic Data Dissemination
Miguel Matos (Universidade do Minho, Portugal); Valerio Schiavoni (University of Neuchatel, Switzerland); Pascal Felber (University of Neuchatel, Switzerland); Rui Oliveira (Universidade do Minho, Portugal); Etienne Riviere (University of Neuchatel, Switzerland)

Locality Principle Revisited: A Probability-Based Quantitative Approach
Saurabh Gupta (North Carolina State University, USA); Ping Xiang (North Carolina State University, USA); Yi Yang (North Carolina State University, USA); Huiyang Zhou (North Carolina State University, USA)

Evaluating the Impact of TLB Misses on Future HPC Systems
Alessandro Morari (Barcelona Supercomputing Center, Spain); Roberto Gioiosa (Barcelona Supercomputing Center, Spain); Robert Wisniewski (IBM Research, USA); Bryan Rosenburg (IBM Research, USA); Todd Inglett (International Business Machines, USA); Mateo Valero (Universidad Politécnica de Cataluña, Spain)

Parallel Technical
Sessions 22, 23, 24 & 25
1:30 PM - 3:30 PM

Session 22
Network Algorithms
Chair: Jian Cao

Optimal algorithms and approximation algorithms for replica placement with distance constraints in tree networks
Anne Benoit (ENS Lyon, France); Hubert Larchevêque (Institut Polytechnique de Bordeaux, France); Paul Renaud-Goud (LIP, ENS Lyon, France)

On Nonblocking Multirate Multicast Fat-tree Data Center Networks with Server Redundancy
Zhiyang Guo (Stony Brook University, USA); Yuanyuan Yang (Stony Brook University, USA)

Distributed Transactional Memory for General Networks
Gokarna Sharma (Louisiana State University, USA); Costas Busch (Louisiana State University, USA); Srivathsan Srinivasagopalan (Louisiana State University, USA)

On Lambda-Alert Problem
Marek Klonowski (TU Wroclaw, Poland); Dominik Pajak (WUT, Poland)

Session 23
GPU Acceleration
Chair: Frank Mueller

Efficient Quality Threshold Clustering for Parallel Architectures
Anthony Danalis (University of Tennessee Knoxville, USA); Collin McCurdy (Oak Ridge National Laboratory, USA); Jeffrey Vetter (Oak Ridge National Laboratory, USA)

A Highly Parallel Reuse Distance Analysis Algorithm on GPUs
Huimin Cui (Institute of Computing Technology, Chinese Academy of Sciences, P.R. China); Qing Yi (University of Texas at San Antonio, USA); Jingling Xue (University of New South Wales, Australia); Lei Wang (Institute of Computing Technology, Chinese Academy of Sciences, P.R. China); Yang Yang (Institute of Computing Technology, Chinese Academy of Sciences, P.R. China); Xiaobing Feng (Institute of Computing Technology, Chinese Academy of Sciences, P.R. China)

Accelerating Large Scale Image Analyses on Parallel CPU-GPU Equipped Systems
George Teodoro (Emory University, USA); Tahsin Kurc (Emory University, USA); Tony Pan (Emory University, USA); Lee Cooper (Emory University, USA); Jun Kong (Emory University, USA); Patrick Widener (Emory University, USA); Joel Saltz (Emory University, USA)

Radio Astronomy Beam Forming on Many-Core Architectures
Alessio Sclocco (Vrije Universiteit Amsterdam, The Netherlands); Ana Lucia Varbanescu (Delft University of Technology, The Netherlands); Jan David Mol (ASTRON, The Netherlands); Rob V van Nieuwpoort (ASTRON, The Netherlands)

Session 24
Interconnection Networks
Chair: Tarek El-Ghazawi

Designing Non-blocking Allreduce with Collective Offload on InfiniBand Clusters: A Case Study with Hypre Solvers
Krishna Chaitanya Kandalla (The Ohio State University, USA); Ulrike Yang (Lawrence Livermore National Laboratory, USA); Jeff Keasler (Lawrence Livermore National Laboratory, USA); Tzanio Kolev (Lawrence Livermore National Lab, USA); Adam Moody (Lawrence Livermore National Lab, USA); Hari Subramoni (The Ohio State University, USA); Karen Tomko (Ohio Supercomputer Center, USA); Jerome Vienne (The Ohio State University, USA); B. R. de Supinski (Lawrence Livermore National Lab, USA); Dhabaleswar Panda (The Ohio State University, USA)

Exploring the Scope of the InfiniBand Congestion Control Mechanism
Ernst G. Gran (Simula Research Laboratory, Norway); Sven-Arne Reinemo (Simula Research Laboratory, Norway); Olav Lysne (Simula Research Laboratory, Norway); Tor Skeie (Simula Research Lab, Norway)

DCAF - A Directly Connected Arbitration-Free Photonic Crossbar For Energy-Efficient High Performance Computing
Christopher Nitta (University of California, Davis, USA); Matthew Farrens (University of California, Davis, USA); Venkatesh Akella (University of California, Davis, USA)

Cross-layer Energy and Performance Evaluation of a Nanophotonic Manycore Processor System using Real Application Workloads
George Kurian (Massachusetts Institute of Technology, USA); Chen Sun (Massachusetts Institute of Technology, USA); Chia-Hsin Owen Chen (Massachusetts Institute of Technology, USA); Jason E Miller (Massachusetts Institute of Technology, USA); Lan Wei (Massachusetts Institute of Technology, USA); Jurgen Michel (MIT, USA); Dimitri Antoniadis (Massachusetts Institute of Technology, USA); Li-Shiuan Peh (MIT, USA); Lionel C. Kimerling (Massachusetts Institute of Technology, USA); Vladimir Marko Stojanovic (Massachusetts Institute of Technology, USA); Anant Agarwal (MIT CSAIL, USA)

Session 25
Software Reliability
Chair: Bronis de Supinski

Taming of the Shrew: Modeling the Normal and Faulty Behavior of Large-scale HPC Systems
Ana Gainaru (University of Illinois at Urbana-Champaign, USA); Franck Cappello (INRIA and University of Illinois at Urbana Champaign, France); William Kramer (National Center for Supercomputing Applications, USA)

Meteor Shower: A Reliable Stream Processing System for Commodity Data Centers
Huayong Wang (MIT, USA); Li-Shiuan Peh (MIT, USA); Emmanouil Koukoumidis (Princeton University, USA); Shao Tao (National University of Singapore, Singapore); Mun Choon Chan (National University of Singapore, Singapore)

Hybrid Transactions: Lock Allocation and Assignment for Irrevocability
Jaswanth Sreeram (Georgia Institute of Technology, USA); Santosh Pande (Georgia Institute of Technology, USA)

Profiling-based Adaptive Contention Management for Software Transactional Memory
Zhengyu He (Georgia Institute of Technology, USA); Xiao Yu (Georgia Institute of Technology, USA); Bo Hong (Georgia Institute of Technology, USA)

Afternoon Break 3:30 PM - 4:00 PM

Parallel Technical
Sessions 26, 27, 28 & 29
4:00 PM - 6:00 PM

Session 26
Communication Protocols and Benchmarking Algorithms
Chair: Julien Langou

HydEE: Failure Containment without Event Logging for Large Scale Send-Deterministic MPI Applications
Amina Guermouche (INRIA Saclay, France); Thomas Ropars (EPFL, Switzerland); Marc Snir (University of Illinois at Urbana Champaign, USA); Franck Cappello (INRIA and University of Illinois at Urbana Champaign, France)

Distributed Demand and Response Algorithm for Optimizing Social-Welfare in Smart Grid
Qifen Dong (Zhejiang University of Technology, P.R. China); Li Yu (Zhejiang University of Technology, P.R. China); Wen-Zhan Song (Georgia State University, USA); Lang Tong (Cornell University, USA); Shaojie Tang (Illinois Institute of Technology, USA)

Scalable Distributed Consensus to Support MPI Fault Tolerance
Darius Buntinas (Argonne National Laboratory, USA)

ScalaBenchGen: Auto-Generation of Communication Benchmarks Traces
Xing Wu (NCSU, USA); Vivek Deshpande (NCSU, USA); Frank Mueller (NCSU, USA)

Session 27
Parallel Algorithms
Chair: Umit Catalyurek

A Self-Stabilization Process for Small-World Algorithms
Sebastian Kniesburges (University of Paderborn, Germany); Andreas Koutsopoulos (University of Paderborn, Germany); Christian Scheideler (Paderborn University, Germany)

Self-organizing Particle Systems
Max Drees (University of Paderborn, Germany); Martina Hüllmann (University of Paderborn, Germany); Andreas Koutsopoulos (University of Paderborn, Germany); Christian Scheideler (Paderborn University, Germany)

PARDA: A Fast Parallel Reuse Distance Analysis Algorithm
Qingpeng Niu (The Ohio State University, USA); James Dinan (Argonne National Laboratory, USA); Qingda Lu (Intel Corp, USA); Ponnuswamy Sadayappan (Ohio State University, USA)

A Lower Bound On Proximity Preservation by Space Filling Curves
Pan Xu (Iowa State University, USA); Srikanta Tirthapura (Iowa State University, USA)

Session 28
Software Performance Analysis and Optimization
Chair: Pavan Balaji

Analyzing Key Performance Factors of Shared Memory MapReduce
Devesh Tiwari (NC State University, USA); Yan Solihin (North Carolina State University, USA)

Predicting Potential Speedup of Serial Code via Lightweight Profiling and Emulations with Memory Performance Model
Minjang Kim (Georgia Institute of Technology, USA); Pranith Kumar (Georgia Institute of Technology, USA); Hyesoon Kim (Georgia Tech, USA); Bevin Brett (Intel Corporation, USA)

Scalable Critical-Path Based Performance Analysis
David Böhme (German Research School for Simulation Sciences, Germany); Bronis R. de Supinski (Lawrence Livermore National Laboratory, USA); Markus Geimer (Forschungszentrum Jülich, Germany); Martin Schulz (Lawrence Livermore National Laboratory, USA); Felix Wolf (German Research School for Simulation Sciences, Germany)

FractalMRC: An Online Cache Miss Rate Curves Generating Method On Commodity Systems
Lulu He (Huazhong University of Science and Technology, P.R. China); Zhibin Yu (Huazhong University of Science and Technology, P.R. China); Hai Jin (Huazhong University of Science and Technology, P.R. China)

Session 29
Performance Optimization Frameworks and Methods
Chair: Junwei Cao

Enabling In-situ Execution of Coupled Scientific Workflow on Multi-core Platform
Fan Zhang (Rutgers, The State University of New Jersey, USA); Manish Parashar (Rutgers, The State University of New Jersey, USA); Ciprian Docan (Rutgers, The State University of New Jersey, USA); Scott Klasky (Oak Ridge National Laboratory, USA); Norbert Podhorszki (Oak Ridge National Laboratory, USA); Hasan Abbasi (Oak Ridge National Laboratory, USA)

GTI: A Generic Tools Infrastructure for Event Based Tools in Parallel Systems
Tobias Hilbrich (Technische Universität Dresden, Germany); Matthias S. Müller (Technische Universität Dresden, Germany); Bronis R. de Supinski (Lawrence Livermore National Laboratory, USA); Martin Schulz (Lawrence Livermore National Laboratory, USA); Wolfgang E. Nagel (Technische Universitaet Dresden, Germany)

An Efficient Framework for Multi-dimensional Tuning of High Performance Computing Applications
Guojing Cong (IBM T.J. Watson Research Center, USA)

An SMT-Selection Metric to Improve Multithreaded Applications' Performance
Justin Funston (Simon Fraser University, Canada); Kaoutar El Maghraoui (IBM T. J. Watson Research Center, USA); Joefon Jann (IBM T. J. Watson Research Center, USA); Pratap Pattnaik (IBM T. J. Watson Research Center, USA); Alexandra Fedorova (Simon Fraser University, Canada)

FRIDAY - 25 May 2012

WORKSHOPS
all day*

* See each individual workshop's program for schedule details

WORKSHOP
PDSEC	Workshop on Parallel and Distributed Scientific and Engineering Computing
DPDNS	Dependable Parallel, Distributed and Network-Centric Systems
MTAAP	Workshop on Multi-Threaded Architectures and Applications
LSPP	Workshop on Large-Scale Parallel Processing
PCO	Parallel Computing and Optimization
ASHES	Accelerators and Hybrid Exascale Systems
ParLearning	Parallel and Distributed Computing for Machine Learning and Inference Problems
HPDIC	High Performance Data Intensive Computing
CloudFlow	Workflow Models, Systems, Services and Applications in the Cloud
JSSPP	Workshop on Job Scheduling Strategies for Parallel Processing
LSDSS	Large Scale Distributed Service-oriented Systems
PLC	Multicore and GPU Programming Models, Languages and Compilers Workshop

IPDPS 2012 Tuesday
KEYNOTE SPEAKER

Chris Johnson
Director, Scientific Computing and Imaging Institute
University of Utah

Title: Large-Scale Visual Data Analysis

Abstract: Modern high performance computers have speeds measured in petaflops and handle data set sizes measured in terabytes and petabytes. Although these machines offer enormous potential for solving very large-scale realistic computational problems, their effectiveness will hinge upon the ability of human experts to interact with their simulation results and extract useful information. One of the greatest scientific challenges of the 21st century is to effectively understand and make use of the vast amount of information being produced. Visual data analysis will be among our most important tools in helping to understand such large-scale information. Our research at the Scientific Computing and Imaging (SCI) Institute at the University of Utah has focused on innovative, scalable techniques for large-scale 3D visual data analysis. In this talk, I will present state-of-the-art visualization techniques, including scalable visualization algorithms and software, cluster-based visualization methods and innovative visualization techniques applied to problems in computational science, engineering, and medicine. I will conclude with an outline for future high performance visualization research challenges and opportunities.

Bio: Chris Johnson directs the Scientific Computing and Imaging (SCI) Institute at the University of Utah where he is a Distinguished Professor of Computer Science and holds faculty appointments in the Departments of Physics and Bioengineering. His research interests are in the areas of scientific computing and scientific visualization. Dr. Johnson founded the SCI research group in 1992, which has since grown to become the SCI Institute employing over 200 faculty, staff and students. Professor Johnson serves on several international journal editorial boards, as well as on advisory boards to several national research centers. Professor Johnson has received several awards, including the NSF Presidential Faculty Fellow (PFF) award from President Clinton in 1995 and the Governor's Medal for Science and Technology from Governor Michael Leavitt in 1999. He is a Fellow of the American Institute for Medical and Biological Engineering, a Fellow of the American Association for the Advancement of Science, and in 2009 he was elected a Fellow of the Society for Industrial and Applied Mathematics (SIAM) and received the Utah Cyber Pioneer Award. In 2010 Professor Johnson received the Rosenblatt Award from the University of Utah and the IEEE Visualization Career Award.

IPDPS 2012 Wednesday
KEYNOTE SPEAKER

Pete Beckman
Director, Exascale Technology and Computing Institute
Argonne National Laboratory

Title: Exascale System Software for the Year of the Dragon

Bio: Pete Beckman is a recognized global expert in high-end computing systems. During the past 25 years, he has designed and built software and architectures for large-scale parallel and distributed computing systems. Peter helped found Indiana University’s Extreme Computing Laboratory. He also founded the Linux cluster team at the Advanced Computing Laboratory, Los Alamos National Laboratory and a Turbolinux-sponsored research laboratory that developed the world’s first dynamic provisioning system for cloud computing and HPC clusters. Furthermore, he acted as vice president of Turbolinux’s worldwide engineering efforts. Pete joined Argonne National Laboratory in 2002. As director of engineering and chief architect for the TeraGrid, he designed and deployed the world’s most powerful Grid computing system for linking production high performance computing centers for the National Science Foundation. He served as director of the Argonne Leadership Computing Facility from 2008 to 2010. He is currently the director of the Exascale Technology and Computing Institute, where he leads Argonne’s exascale computing strategic initiative. He is co-founder of the International Exascale Software Project (IESP).

IPDPS 2012 Thursday
KEYNOTE SPEAKER

Guo-Jie Li
Institute of Computing Technology
Chinese Academy of Sciences

Title: Building Billion-Threads Computer and Elastic Processor

Bio: Guo-Jie Li obtained his Ph.D. degree from Purdue University, USA in 1985. In 1987, he returned to China and joined the Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS). He was the director of the National Research Center for Intelligent Computing Systems (NCIC) between 1990 and 2000. From 2000 to 2011, he served as the director of ICT, CAS. Over the past decades, he has conducted research in parallel algorithms, computer architecture, networks, and artificial intelligence, and has published over 150 papers and 5 books. Under his leadership, NCIC and ICT successfully developed the Dawning series of high-performance computers and the Loongson (Godson) general-purpose CPU chips. Guo-Jie Li currently serves as a professor and the chief scientist of ICT-CAS, the honorary president of the China Computer Federation, and the editor-in-chief of the Journal of Computer Science and Technology. He was elected as a member of the Chinese Academy of Engineering (CAE) in 1995 and a fellow of the Academy of Sciences for the Developing World (TWAS) in 2002.

IPDPS 2012 Wednesday
PANEL DISCUSSION

PANEL MODERATOR
Katherine Yelick
Lawrence Berkeley National Lab & University of California at Berkeley, USA

Professor Katherine Yelick is the co-author of two books and more than 100 refereed technical papers on parallel languages, compilers, algorithms, libraries, architecture, and storage. She co-invented the UPC and Titanium languages and demonstrated their applicability across computer architectures through the use of novel runtime and compilation methods. She also co-developed techniques for self-tuning numerical libraries, including the first self-tuned library for sparse matrix kernels, which automatically adapts the code to properties of the matrix structure and machine. Her work includes performance analysis and modeling as well as optimization techniques for memory hierarchies, multicore processors, communication libraries, and processor accelerators. She has worked with interdisciplinary teams on application scaling, and her own applications work includes parallelization of a model for blood flow in the heart. She earned her Ph.D. in Electrical Engineering and Computer Science from MIT and has been a professor of Electrical Engineering and Computer Sciences at UC Berkeley since 1991, with a joint research appointment at Berkeley Lab since 1996. She has received multiple research and teaching awards and is a member of the California Council on Science and Technology, a member of the Computer Science and Telecommunications Board and a member of the National Academies Committee on Sustaining Growth in Computing Performance.

PANEL MEMBERS

Zhaojun Bai
University of California, Davis, USA
Zhaojun Bai is a Professor in the Department of Computer Science and Department of Mathematics, University of California, Davis. His research interests are in the areas of scientific computing, numerical linear algebra design and analysis and mathematical software engineering. He participated in a number of large scale synergistic computational science and engineering projects, such as LAPACK for solving the most common problems in numerical linear algebra and DOE SciDAC project PETAMAT on next generation multi-scale quantum simulation software on multicore systems with GPU acceleration.

Zhihui Du
Tsinghua University, Beijing, China
Zhihui Du received the BE degree in 1992 from the computer department of Tianjin University. He received the MS and PhD degrees in computer science from Peking University in 1995 and 1998, respectively. From 1998 to 2000, he held a post-doctoral position at Tsinghua University. Since 2001, he has worked at Tsinghua University as an associate professor in the Department of Computer Science and Technology. His research areas include high performance computing and grid computing. Currently his research focuses on parallel algorithm design on multicore or many-core systems (such as GPUs) and energy-efficient scheduling algorithm design.

Dhabaleswar K. Panda
The Ohio State University, USA
Dr. Dhabaleswar K. (DK) Panda is a Professor of Computer Science at The Ohio State University. He obtained his Ph.D. in computer engineering from the University of Southern California. His research interests include parallel computer architecture, high performance computing, communication protocols, file systems, network-based computing, and Quality of Service, and he has published over 300 papers in major journals and international conferences related to these research areas. Dr. Panda has been associated with IPDPS since its beginning and was a founding member of TCPP. He is currently serving as the Program Chair for HiPC '12 and as Vice Chair for CCGrid '12. He has served as an Associate Editor of IEEE TPDS and currently serves as an Associate Editor of IEEE Transactions on Computers (IEEE TC) and Journal of Parallel and Distributed Computing (JPDC). Dr. Panda and his research group members have been doing extensive research on modern networking technologies including InfiniBand and 10GE/iWARP; they are collaborating with National Laboratories and leading InfiniBand and 10GE/iWARP companies on designing various subsystems of next generation high-end systems. The MVAPICH/MVAPICH2 (High Performance MPI over InfiniBand and iWARP) open-source software programs, developed by his research group, are currently being used by more than 1,840 organizations worldwide (in 65 countries). Dr. Panda is a Fellow of IEEE and a member of ACM.

Bronis R. de Supinski
Lawrence Livermore National Laboratory, USA
Bronis R. de Supinski is the principal investigator and leader of the Exascale Computing Technologies (ExaCT) project and the co-leader of the Advanced Simulation and Computing (ASC) program's Application Development Environment and Performance Team (ADEPT) at Lawrence Livermore National Laboratory (LLNL). He is also an Adjunct Associate Professor in the Department of Computer Science and Engineering at Texas A&M University. His research interests include high performance computer architectures, performance modeling and analysis, message passing implementations and tools, large-scale debugging, memory performance improvement, cache coherence and distributed shared memory, consistency semantics and programming models. Bronis earned his Ph.D. in Computer Science from the University of Virginia in 1998 and he joined LLNL's Center for Applied Scientific Computing (CASC) in July 1998. Currently, his projects include scalable debugging methods, investigations into mechanisms and tools to improve memory performance, applications of data mining techniques to tools for large-scale systems, resiliency techniques, a variety of optimization techniques and tools for MPI and several issues with OpenMP, including its memory model and tool support. He pursues the last set of topics as the Chair of the OpenMP Language Committee. Throughout his career, Bronis has won several awards, including the prestigious Gordon Bell Prize in 2005 and 2006. He is a member of the ACM and the IEEE Computer Society.

Richard Vuduc
Georgia Institute of Technology, USA
Richard (Rich) Vuduc is an assistant professor in the School of Computational Science and Engineering at Georgia Tech. His research lab, the HPC Garage (hpcgarage.org), is interested in high-performance computing with focus areas in parallel algorithms, performance analysis, tuning, and debugging. He is a recipient of the U.S. National Science Foundation's CAREER award and an invited member of the U.S. DARPA's Computer Science Study Group. His lab was also part of the Georgia Tech team that won the Gordon Bell Prize in 2010.