IEEE International Parallel & Distributed Processing Symposium

Technical Committee on
Parallel Processing

IPDPS 2009 Advance Program


Please visit the IPDPS website regularly for updates, since there may be schedule revisions. Authors who have corrections should contact the conference organizers. Note that paper numbers are listed for easy reference.

IPDPS 2009 Advance Program Abstracts
Abstracts for contributed papers and all workshops have been compiled so that authors can check them for accuracy and visitors to this website can preview the papers to be presented at the conference. Full proceedings of the conference will be published on a CD-ROM pocketed in a program book to be distributed to registrants at the conference.


MONDAY - 25 May 2009
all day*

* See each individual workshop program for schedule details

HCW Heterogeneity in Computing Workshop
RAW Reconfigurable Architectures Workshop
HIPS Workshop on High-Level Parallel Programming Models & Supportive Environments
JAVAPDC Workshop on Java and Components for Parallelism, Distribution and Concurrency
NIDISC Workshop on Nature Inspired Distributed Computing
HiCOMB Workshop on High Performance Computational Biology
APDCM Advances in Parallel and Distributed Computing Models
CAC Communication Architecture for Clusters
HPPAC High-Performance, Power-Aware Computing
HPGC High Performance Grid Computing
SMTPS Workshop on System Management Techniques, Processes, and Services
Commercial Tutorial
6:30 PM

John Goodhue - SiCortex CTO
Topic: SiCortex High-Productivity, Low-Power Computers

• Read the abstract for this talk

TUESDAY - 26 May 2009
Opening & Keynote
8:00 AM - 9:30 AM

Chair: Horst Simon

Wen-mei W. Hwu
University of Illinois, Urbana-Champaign, USA
Topic: Many-core Parallel Computing: Can compilers and tools do the heavy lifting?

• Read the abstract for this keynote

Morning Break 9:30 AM - 10:00 AM
Parallel Sessions
1, 2, 3 & 4

10:00 AM - 12:00 PM

Algorithms - Scheduling I

Chair: Cynthia Phillips

On Scheduling Dags to Maximize Area
Gennaro Cordasco (University of Salerno, IT); Arnold Rosenberg (Colorado State University, US)

Efficient Scheduling of Task Graph Collections on Heterogeneous Resources
Matthieu Gallet (École normale supérieure de Lyon, FR); Loris Marchal (CNRS, FR); Frédéric Vivien (INRIA, FR)

Static Strategies for Worksharing with Unrecoverable Interruptions
Anne Benoit (ENS Lyon, FR); Yves Robert (École Normale Supérieure de Lyon, FR); Arnold Rosenberg (Colorado State University, US); Frédéric Vivien (INRIA, FR)

On the Complexity of Mapping Pipelined Filtering Services on Heterogeneous Platforms
Anne Benoit (ENS Lyon, FR); Fanny Dufossé (LIP, ENS Lyon, FR); Yves Robert (École Normale Supérieure de Lyon, FR)

Applications - Biological Applications

Chair: David Konerding

Sequence Alignment with GPU: Performance and Design Challenges
Gregory Striemer (University of Arizona, US); Ali Akoglu (University of Arizona, US)

Evaluating the Use of GPUs for Life Science Applications
John Paul Walters (University at Buffalo, US); Vidyananth Balu (University at Buffalo, US); Suryaprakash Kompalli (University at Buffalo, US); Vipin Chaudhary (University at Buffalo, SUNY, US)

Improving MPI-HMMER's Scalability With Parallel I/O
John Paul Walters (University at Buffalo, US); Rohan Darole (University at Buffalo, US); Vipin Chaudhary (University at Buffalo, SUNY, US)

Accelerating Leukocyte Tracking using CUDA: A Case Study in Leveraging Manycore Coprocessors
Michael Boyer (University of Virginia, US); David Tarjan (University of Virginia, US); Scott Acton (University of Virginia, US); Kevin Skadron (University of Virginia, US)

Architecture - Memory Hierarchy and Transactional Memory

Chair: Per Stenstrom

Efficient Shared Cache Management through Sharing-Aware Replacement and Streaming-Aware Insertion Policy
Yu Chen (Tsinghua University, CN); Wenlong Li (Intel Corp, CN); Changkyu Kim (Intel Corp, US); Zhizhong Tang (Tsinghua University, CN)

Core-aware Memory Access Scheduling Schemes
Zhibin Fang (Illinois Institute of Technology, US); Xian-He Sun (Illinois Institute of Technology, US); Yong Chen (Illinois Institute of Technology, US); Surendra Byna (Illinois Institute of Technology, US)

Using Hardware Transactional Memory for Data Race Detection
Shantanu Gupta (University of Michigan, US); Florin Sultan (NEC Laboratories America, US); Srihari Cadambi (NEC Laboratories America, Inc, US); Franjo Ivancic (NEC Laboratories America, Inc., US); Martin Roetteler (NEC Labs America, Inc., US)

Speculation-Based Conflict Resolution in Hardware Transactional Memory
Rubén Titos (University of Murcia, ES); Manuel Acacio (Universidad de Murcia, ES); José M. García (University of Murcia, ES)

Software - Fault Tolerance and Runtime Systems

Chair: DK Panda

Compiler-Enhanced Incremental Checkpointing for OpenMP Applications
Greg Bronevetsky (Lawrence Livermore National Laboratory, US); Keshav Pingali (U. Texas at Austin, US); Daniel Marques (University of Texas, US); Radu Rugina (Cornell University, US); Sally McKee (Chalmers University of Technology, SE)

DMTCP: Transparent Checkpointing for Cluster Computations and the Desktop
Jason Ansel (MIT, US); Kapil Arya (Northeastern University, US); Gene Cooperman (Northeastern University, US)

Elastic Scaling of Data Parallel Operators in Stream Processing
Scott Schneider (Virginia Tech, US); Henrique Andrade (IBM T. J. Watson Research Center, US); Bugra Gedik (IBM T. J. Watson Research Center, US); Alian Biem (IBM Research, US); Kun-Lung Wu (IBM T. J. Watson Research Center, US)

Scalable RDMA performance in PGAS languages
Montse Farreras (Universitat Politècnica de Catalunya (UPC), ES); George Almasi (IBM T.J. Watson Research Center, US); Calin Cascaval (IBM T.J. Watson Research Center, US); Toni Cortes (Technical University of Catalonia, ES)

Parallel Sessions
5, 6, 7 & 8

2:00 PM - 4:00 PM

Algorithms - Resource Management

Chair: Michele Flammini

Singular Value Decomposition on GPU using CUDA
Sheetal Lahabar (International Institute of Information Technology, IN)

Coupled Placement in Modern Data Centers
Madhukar Korupolu (IBM Almaden Research Center, US); Aameek Singh (IBM Almaden Research Center, US); Bhuvan Bamba (Georgia Institute of Technology, US)

An Upload Bandwidth Threshold for Peer-to-Peer Video-on-Demand Scalability
Yacine Boufkhad (Paris Diderot University, FR); Fabien Mathieu (Orange Labs, FR); Fabien de Montgolfier (Université Paris 7, FR); Diego Perino (Orange Labs, FR); Laurent Viennot (INRIA, FR)

Competitive Buffer Management with Packet Dependencies
Alex Kesselman (Google, US); Boaz Patt-Shamir (Tel Aviv University, IL); Gabriel Scalosub (University of Toronto, CA)

Applications - System Software and Applications

Chair: Leonid Oliker

Annotation-Based Empirical Performance Tuning Using Orio
Albert Hartono (Ohio State University, US); Boyana Norris (Argonne National Laboratory, US); Ponnuswamy Sadayappan (Ohio State University, US)

Automatic detection of parallel applications computation phases
Juan Gonzalez Garcia (Universitat Politecnica de Catalunya, ES); Judit Gimenez (Universitat Politecnica de Catalunya, ES); Jesus Labarta (Technical University of Catalonia, ES)

Handling OS Jitter in Multicore Multithreaded Systems
Pradipta De (IBM Research, New Delhi, India, IN); Vijay Mann (IBM Research, New Delhi, India, IN); Umang Mittal (Indian Institute of Technology, New Delhi, India, IN)

Building A Parallel Pipelined External Memory Algorithm Library
Andreas Beckmann (Goethe-Universität Frankfurt am Main, DE); Roman Dementiev (Universität Karlsruhe, DE); Johannes Singler (Universität Karlsruhe, DE)

Architecture - Power Efficiency and Process Variability

Chair: Grigorios Magklis

On reducing misspeculations on a pipelined scheduler
Ruben Gran Tejero (University of Zaragoza, ES); Enric Morancho (Universitat Politècnica de Catalunya, ES); Angel Olive (UPC-dac, ES); Jose Maria Llaberia (Universidad Politecnica de Cataluña, ES)

Efficient Microarchitecture Policies for Accurately Adapting to Power Constraints
Juan Cebrián (University of Murcia, ES); Juan Aragón (University of Murcia, SPAIN, ES); José M. García (University of Murcia, ES); Pavlos Petoumenos (University of Patras, GR); Stefanos Kaxiras (University of Patras, GR)

An On/Off Link Activation Method for Low-Power Ethernet in PC Clusters
Michihiro Koibuchi (National Institute of Informatics, JP); Tomohiro Otsuka (Keio University, JP); Hiroki Matsutani (Keio University, JP); Hideharu Amano (Keio University, JP)

A new mechanism to deal with process variability in NoC links
Carles Hernandez (Technical University of Valencia, ES); Federico Silla (Technical University of Valencia, ES); Vicente Santonja (Universidad Politecnica de Valencia, ES); Jose Duato (Universidad Politecnica de Valencia, ES)

Software - Data Parallel Programming Frameworks

Chair: Michael Gerndt

A framework for efficient and scalable execution of domain-specific templates on GPUs
Narayanan Sundaram (University of California, Berkeley, US); Anand Raghunathan (NEC-Labs America, US); Srimat Chakradhar (NEC Research Labs, US)

CellMR: A Framework for Supporting MapReduce on Asymmetric Cell-Based Clusters
M. Mustafa Rafique (Virginia Tech, US); Benjamin Rose (Virginia Tech, US); Ali Butt (Virginia Tech., US); Dimitrios Nikolopoulos (Virginia Tech, US)

A Cross-Input Adaptive Framework for GPU Programs Optimizations
Yixun Liu (The College of William and Mary, US); Eddy Zhang (College of William and Mary, US); Xipeng Shen (The College of William and Mary, US)

Message Passing on Data-Parallel Architectures
Jeffery Stuart (University of California, Davis, US); John Owens (University of California, Davis, US)

Afternoon Break 4:00 PM - 4:30 PM
Parallel Sessions
9, 10, 11 & 12

4:30 PM - 6:30 PM

Algorithms - Scheduling II

Chair: Boaz Patt-Shamir

Online time constrained scheduling with penalties
Nicolas Thibault (University of Evry, FR)

Minimizing Total Busy Time in Parallel Scheduling with Application to Optical Networks
Michele Flammini (University of L'Aquila, IT); Tami Tamir (Efi Arazi School of Computer Science and Engineering, IL); Gianpiero Monaco (Università di L'Aquila, IT); Luca Moscardelli (Università di L'Aquila, IT); Hadas Shachnai (Technion, IL); Mordechai Shalom (Technion, Israel Institute of Technology, IL); Shmuel Zaks (Technion, IL)

Energy Minimization for Periodic Real-Time Tasks on Heterogeneous Processing Units
Jian-Jia Chen (ETH Zurich, CH); Andreas Schranzhofer (TIK ETH Zurich, CH); Lothar Thiele (ETH Zurich, CH)

Multi-Users Scheduling in Parallel Systems
Erik Saule (Institut Polytechnique Grenoble, FR); Denis Trystram (Univ. of Grenoble, FR)

Applications - Graph and String Applications

Chair: Robert Farber

Input-independent, Scalable and Fast String Matching on the Cray XMT
Oreste Villa (PNNL, US); Daniel Chavarria (Pacific Northwest National Laboratory, US); Kristyn Maschhoff (Cray, Inc., US)

Compact Graph Representations and Parallel Connectivity Algorithms for Massive Dynamic Network Analysis
Kamesh Madduri (Lawrence Berkeley National Laboratory, US); David Bader (Georgia Institute of Technology, US)

Transitive Closure on the Cell Broadband Engine: A study on Self-Scheduling in a Multicore Processor
Sudhir Vinjamuri (University of Southern California, US); Viktor Prasanna (University of Southern California, US)

Parallel Short Sequence Mapping for High Throughput Genome Sequencing
Doruk Bozdag (The Ohio State University, US); Catalin Barbacioru (Applied Biosystems, US); Umit Catalyurek (The Ohio State University, US)

Architecture - Networks and Interconnects

Chair: Jose Manuel Garcia

TupleQ: Fully-Asynchronous and Zero-Copy MPI over InfiniBand
Matthew Koop (The Ohio State University, US); Jaidev Sridhar (The Ohio State University, US); Dhabaleswar Panda (The Ohio State University, US)

Disjoint-Path Routing: Efficient Communication for Streaming Applications
DaeHo Seo (Purdue University, US); Mithuna Thottethodi (Purdue University, US)

Performance Analysis of Optical Packet Switches Enhanced with Electronic Buffering
Zhenghao Zhang (Florida State University, US); Yuanyuan Yang (Stony Brook University, US)

An Approach for Matching Communication Patterns in Parallel Applications
Yong-Meng Teo (National University of Singapore, SG)

Software - I/O and File Systems

Chair: Dimitrios Nikolopoulos

Adaptable, Metadata Rich IO Methods for Portable High Performance IO
Jay Lofstead (Georgia Institute of Technology, US); Fang Zheng (Georgia Tech, US); Scott Klasky (Oak Ridge National Laboratory, US); Karsten Schwan (Georgia Tech, US)

Small File Access in Parallel File Systems
Philip Carns (Argonne National Laboratory, US); Sam Lang (Argonne National Laboratory, US); Robert Ross (Argonne National Laboratory, US); Murali Vilayannur (Vmware Inc., US); Julian Kunkel (University of Heidelberg, DE); Thomas Ludwig (University of Heidelberg, DE)

Making Resonance a Common Case: A High-Performance Implementation of Collective I/O on Parallel File System
Xuechen Zhang (Wayne State University, US); Song Jiang (Wayne State University, US); Kei Davis (Los Alamos National Laboratory, US)

Design, Implementation, and Evaluation of Transparent pNFS on Lustre
Weikuan Yu (Oak Ridge National Laboratory, US); Oleg Drokin (Sun Microsystems Inc., US); Jeffrey Vetter (Oak Ridge National Laboratory, US)

Symposium Tutorial
6:30 PM - 10:00 PM

Title: Tools for Scalable Performance Analysis on Petascale Systems

Presenters: I-Hsin Chung, S.R. Seelam (IBM T.J. Watson, USA) B. Mohr (Research Center Juelich, Germany) J. Labarta (UPC Barcelona, Spain)

• Read the abstract for this talk

WEDNESDAY - 27 May 2009
Keynote Session
8:30 AM - 9:30 AM

Chair: Christian Scheideler, TU Munich

Nir Shavit
Tel Aviv University, Israel
Topic: Software Transactional Memory: Where do we come from? What are we? Where are we going?

• Read the abstract for this keynote

Morning Break 9:30 AM - 10:00 AM
Best Papers - Plenary
10:00 AM - 12:00 PM

Chair: Per Stenstrom

Crash Fault Detection in Celerating Environments
Srikanth Sastry (Texas A&M University, US); Scott Pike (Texas A&M University, US); Jennifer Welch (Texas A&M University, US)

HPCC RandomAccess Benchmark for Next Generation Supercomputers
Vikas Aggarwal (IBM India Research Lab, IN); Yogish Sabharwal (IBM India Research Lab, IN); Rahul Garg (IBM India Research Lab, IN); Philip Heidelberger (IBM Research, US)

Exploring the Multi-GPU Design Space
Dana Schaa (Northeastern University, US); David Kaeli (Northeastern University, US)

Accommodating Bursts in Distributed Stream Processing Systems
Yannis Drougas (University of California, Riverside, US); Vana Kalogeraki (University of California, Riverside, US)

Parallel Sessions
13, 14, 15 & 16
2:00 PM - 4:00 PM

Algorithms - General Theory

Chair: Jennifer Welch

Combinatorial Properties for Efficient Communication in Distributed Networks with Local Interactions
Sotiris Nikoletseas (University of Patras and Computer Technology Institute, GR); Christoforos Raptopoulos (U. of Patras, GR); Paul Spirakis (University of Patras, GR)

Remote-Spanners: What to Know beyond Neighbors
Laurent Viennot (INRIA, FR); Philippe Jacquet (INRIA, FR)

A Fusion-based Approach for Tolerating Faults in Finite State Machines
Vinit Ogale (University of Texas at Austin, US); Bharath Balasubramanian (University of Texas at Austin, US); Vijay Garg (IBM India Research Lab., IN)

The Weak Mutual Exclusion Problem
Paolo Romano (Inesc-ID, PT); Luis Rodrigues (Inesc-ID/IST, PT); Nuno Carvalho (Inesc-ID/IST, PT)

Applications - Data Intensive Applications

Chair: Wolfgang Nagel

Best-Effort Parallel Execution for Recognition and Mining Applications
Jiayuan Meng (University of Virginia, US); Anand Raghunathan (NEC-Labs America, US); Srimat Chakradhar (NEC Research Labs, US)

Multi-Dimensional Characterization of Temporal Data Mining on Graphics Processors
Jeremy Archuleta (Virginia Tech, US); Yong Cao (Virginia Tech, US); Wu-chun Feng (Virginia Tech, US); Tom Scogland (Virginia Tech, US)

A Partition-based Approach to Support Streaming Updates over Persistent Data in an Active Data Warehouse
Abhirup Chakraborty (University of Waterloo, CA); Ajit Singh (University of Waterloo, CA)

Architectural Implications for Spatial Object Association Algorithms
Vijay Kumar (The Ohio State University, US); Tahsin Kurc (Emory University, US); Joel Saltz (Emory University, US); Ghaleb Abdulla (Lawrence Livermore National Laboratory, US); Scott Kohn (Lawrence Livermore National Laboratory, US); Celeste Matarazzo (Lawrence Livermore National Laboratory, US)

Architecture - Emerging Architectures and Performance Modeling

Chair: Josep Torrellas

vCUDA: GPU Accelerated High Performance Computing in Virtual Machines
Hao Chen (Hunan University, CN)

Understanding the Design Trade-offs among Current Multicore Systems for Numerical Computations
Seunghwa Kang (Georgia Institute of Technology, US); David Bader (Georgia Institute of Technology, US); Richard Vuduc (Georgia Institute of Technology, US)

Parallel Data-Locality Aware Stencil Computations on Modern Micro-Architectures
Matthias Christen (University of Basel, CH); Olaf Schenk (University of Basel, CH); Esra Neufeld (IT'IS Foundation, ETH Zurich, CH); Peter Messmer (Tech-X Corporation, US); Helmar Burkhart (University of Basel, CH)

Performance Projection of HPC Applications Using SPEC CFP2006 Benchmarks
Sameh Sharkawi (Texas A&M University, US); Don DeSota (IBM, US); Raj Panda (IBM, US); Rajeev Indukuru (IBM, US); Stephen Stevens (IBM, US); Valerie Taylor (Texas A&M University, US); Xingfu Wu (Texas A&M University, US)

Software - Distributed Systems, Scheduling and Memory Management

Chair: Greg Bronevetsky

Work-First and Help-First Scheduling Policies for Async-Finish Task Parallelism
Yi Guo (Rice University, US); Rajkishore Barik (Rice University, US); Raghavan Raman (Rice University, US); Vivek Sarkar (Rice University, US)

Autonomic management of non-functional concerns in distributed and parallel application programming
Marco Aldinucci (University of Pisa, IT); Marco Danelutto (University of Pisa, IT); Peter Kilpatrick (Queen's University of Belfast, UK)

Scheduling Resizable Parallel Applications
Rajesh Sudarsan (Virginia Tech, US); Calvin Ribbens (Virginia Tech, US)

Helgrind+: An Efficient Dynamic Race Detector
Ali Jannesari (University of Karlsruhe, DE); Kaibin Bao (University of Karlsruhe, DE); Victor Pankratius (University of Karlsruhe, DE); Walter Tichy (University of Karlsruhe, DE)

Symposium Panel
4:30 PM - 6:30 PM


Topic: How to Build a Useful Thousand-Core System?
Moderator: Josep Torrellas, University of Illinois, Urbana-Champaign

• Read the abstract for this talk

• Laxmikant Kale, University of Illinois at Urbana-Champaign
• Jesus Labarta, Supercomputing Center, Universitat Politecnica de Catalunya, Barcelona
• Keshav Pingali, University of Texas at Austin
• Per Stenstrom, Chalmers University


Symposium Buffet Dinner

Details to be announced

THURSDAY - 28 May 2009
Keynote Session
8:30 AM - 9:30 AM

Chair: Frank Mueller

Leonid Oliker
Lawrence Berkeley National Laboratory, USA
Title: Green Flash: Designing an energy efficient climate supercomputer

• Read the abstract for this keynote

Morning Break 9:30 AM - 10:00 AM
1:00 PM - 6:00 PM


Selected Poster Presentations
PhD Forum presenters will be available during the Forum session to discuss their work. Posters may also be viewed during the TCPP Reception Thursday evening.

Parallel Sessions
17, 18, 19 & 20
10:00 AM - 12:00 PM

Algorithms - Wireless Networks

Chair: Geppino Pucci

Sensor Network Connectivity with Multiple Directional Antennae of a Given Angular Sum
Evangelos Kranakis (Carleton University, CA); Danny Krizanc (Wesleyan University, US); Binay Bhattacharya (Professor, CA); Yuzhuang Hu (Simon Fraser University, CA); Qiaosheng Shi (Simon Fraser University, CA)

Unit Disk Graph and Physical Interference Model: Putting Pieces Together
Emmanuelle Lebhar (CNRS (France) and CMM-University of Chile (Chile), FR); Zvi Lotker (Ben Gurion University, Beer Sheva, IL)

Path-Robust Multi-Channel Wireless Networks
Arnold Rosenberg (Colorado State University, US)

Information Spreading in Stationary Markovian Evolving Graphs
Andrea Clementi (Universita' di Roma "Tor Vergata", IT); Angelo Monti (University of Rome, "La Sapienza", IT); Francesco Pasquale (University of Rome "Tor Vergata", IT); Riccardo Silvestri (University of Rome, "La Sapienza", IT)

Applications I - Cluster/Grid/P2P Computing

Chair: Anne Elster

Multiple Priority Customer Service Guarantees in Cluster Computing
Kaiqi Xiong (North Carolina State University, US)

Treat-Before-Trick: Free-riding Prevention for BitTorrent-like Peer-to-Peer Networks
Kyuyong Shin (North Carolina State University, US); Douglas S. Reeves (North Carolina State University, US); Injong Rhee (North Carolina State University, US)

A Resource Allocation Approach for Supporting Time-Critical Applications in Grid Environments
Qian Zhu (The Ohio State University, US); Gagan Agrawal (The Ohio State University, US)

Applications II - Multicore

Chair: Dan Katz

High-Order Stencil Computations on Multicore Clusters
Liu Peng; Richard Seymour; Ken-ichi Nomura; Rajiv K. Kalia; Aiichiro Nakano; Priya Vashishta (University of Southern California, US); Alexander Loddoch; Michael Netzband; William R. Volz; Chap C. Wong (Chevron ETC, US)

Dynamic Iterations for the Solution of Ordinary Differential Equations on Multicore Processors
Ashok Srinivasan (Florida State University, US); Yanan Yu (Florida State University, US)

Efficient Large-Scale Model Checking
Kees Verstoep (Vrije Universiteit, NL); Henri Bal (Vrije Universiteit, NL); Jiri Barnat (Masaryk University, CZ); Lubos Brim (Masaryk University, CZ)

Software - Parallel Compilers and Languages

Chair: Frank Mueller

Scalable Autotuning Framework for Compiler Optimization
Ananta Tiwari (University of Maryland at College Park, US); Chun Chen (USC/ISI, US); Jacqueline Chame (USC/ISI, US); Mary Hall (USC/ISI, US); Jeffrey Hollingsworth (University of Maryland, US)

Taking the heat off transactions: dynamic selection of pessimistic concurrency control
Nehir Sonmez (Universitat Politècnica de Catalunya, ES); Adrian Cristal (Barcelona Supercomputing Center, ES); Tim Harris (Microsoft Research, UK); Osman Unsal (Barcelona Supercomputing Center, ES); Mateo Valero (Universidad Politécnica de Cataluña, ES)

Packer: an Innovative Space-Time-Efficient Parallel Garbage Collection Algorithm Based on Virtual Spaces
Shaoshan Liu (University of California, Irvine, US); Ligang Wang (Intel China Research Center, CN); Jean-Luc Gaudiot (University of California, US); Xiao-Feng Li (Middleware Products Division, Software and Solutions Group, Intel Corp, CN)

Concurrent SSA for General Barrier-Synchronized Parallel Programs
Harshit Shah (Tata Institute of Fundamental Research, IN); R. K. Shyamasundar (Tata Institute of Fundamental Research, IN); Pradeep Varma (IBM India Research Laboratory, IN)

Parallel Sessions
21, 22 & 23

2:00 PM - 4:00 PM

Algorithms - Self-Stabilization

Chair: Christian Scheideler

Optimal Deterministic Self-stabilizing Vertex Coloring in Unidirectional Anonymous Networks
Samuel Bernard (Universite Paris 6, FR); Stephane Devismes (VERIMAG Grenoble, FR); Maria Gradinariu (University Paris 6, FR); Sebastien Tixeuil (Univ. Pierre & Marie Curie, FR)

Self-stabilizing minimum-degree spanning tree within one from the optimal degree
Lelia Blin (IBISC-University of Evry Val d'Essonne, FR); Maria Gradinariu (University Paris 6, FR); Stephane Rovedakis (Université d'Evry (Laboratoire IBISC), FR)

A snap-stabilizing point-to-point communication protocol in message-switched networks
Alain Cournier (Université de Picardie Jules Verne, FR); Swan Dubois (Université Pierre et Marie Curie, INRIA Rocquencourt, FR); Vincent Villain (University of Picardie Jules Verne, FR)

An Asynchronous Leader Election Algorithm for Dynamic Networks
Jennifer Welch (Texas A&M University, US); Jennifer Walter (Vassar College, US)

Applications - Scientific Applications

Chair: Kuang Jin Oh

A Metascalable Computing Framework for Large Spatiotemporal-Scale Atomistic Simulations
Ken-ichi Nomura (University of Southern California, US); Richard Seymour (University of Southern California, US); Weiqiang Wang (University of Southern California, US); Rajiv Kalia (University of Southern California, US); Aiichiro Nakano (University of Southern California, US); Priya Vashishta (University of Southern California, US); Fuyuki Shimojo (Kumamoto University, JP); Lin Yang (Lawrence Livermore National Laboratory, US)

Scaling Challenges for Massively Parallel AMR Applications
Brian Van Straalen (Lawrence Berkeley National Laboratory, US); Terry Ligocki (Lawrence Berkeley National Laboratory, US); John Shalf (Lawrence Berkeley National Laboratory, US); Noel Keen (Lawrence Berkeley National Laboratory, US); Woo-Sun Yang (Cray Inc., US)

Parallel Accelerated Cartesian Expansions for Particle Dynamics Simulations
Melapudi Vikram (Michigan State University, US); Andrew Baczewski (Michigan State University, US); B. Shanker (Michigan State University, US); Srinivas Aluru (Iowa State University, US)

Parallel Implementation of Irregular Terrain Model on IBM Cell Broadband Engine
Yang Song (University of Arizona, US); Jeffrey Rudin (Mercury Computer Systems, US); Ali Akoglu (University of Arizona, US)

Software - Communications Systems

Chair: Jeff Hollingsworth

Phaser Accumulators: a New Reduction Construct for Dynamic Parallelism
Jun Shirako (Rice University, US); David Peixotto (Rice University, US); Vivek Sarkar (Rice University, US); William Scherer III (Rice University, US)

NewMadeleine: An Efficient Support for High-Performance Networks in MPICH2
Guillaume Mercier (INRIA-Labri, Université Bordeaux 1, FR); Francois Trahay (Université Bordeaux 1, FR); Darius Buntinas (Argonne National Laboratory, US); Elisabeth Brunet (Université Bordeaux 1, FR)

Scaling Communication Intensive Applications on BlueGene/P Using One-Sided Communication and Overlap
Rajesh Nishtala (UC Berkeley, US); Paul Hargrove (Lawrence Berkeley National Laboratory, US); Dan Bonachea (UC Berkeley, US); Katherine Yelick (UC Berkeley, US)

Dynamic High-Level Scripting in Parallel Applications
Filippo Gioachin (University of Illinois at Urbana-Champaign, US); Laxmikant Kale (University of Illinois at Urbana-Champaign, US)

Afternoon Break 4:00 PM - 4:30 PM
Parallel Sessions
24 & 25

4:30 PM - 6:30 PM

Algorithms - Network Algorithms

Chair: Leszek Gasieniec

Map Construction and Exploration by Mobile Agents Scattered in a Dangerous Network
Paola Flocchini (University of Ottawa, CA); Matthew Kellett (Defence R&D Canada - Ottawa, CA); Peter Mason (Defence Research & Development Canada, CA); Nicola Santoro (Carleton University, CA)

A General Approach to Toroidal Mesh Decontamination with Local Immunity
Fabrizio Luccio (University of Pisa, IT); Linda Pagli (Universita' di Pisa, IT)

On the Tradeoff Between Playback Delay and Buffer Space in Streaming
Alix L. H. Chow (University of Southern California, US); Leana Golubchik (USC, US); Samir Khuller (University of Maryland at College Park, US); Yuan Yao (University of Southern California, US)

Applications - Sorting and FFTs

Chair: Terry Ligocki

A Performance Model for Fast Fourier Transform on Multi-core Architecture
Yan Li (IBM China Research Lab, CN); Li Zhao (Academy of Mathematics and Systems Science, Chinese Academy of Sciences, CN); Haibo Lin (IBM China Research Lab, CN); Alex Chow (IBM Corp., US); Jeffery R. Diamond (IBM, US)

Designing Efficient Sorting Algorithms for Manycore GPUs
Nadathur Satish (University of California, Berkeley, US); Mark Harris (NVIDIA, AU); Michael Garland (NVIDIA Corporation, US)

Minimizing Startup Costs for Performance-Critical Threading
Clint Whaley (University of Texas at San Antonio, US); Anthony Castaldo (University of Texas at San Antonio, US)

TCPP Membership Meeting & Reception
6:30 PM - 8:30 PM

TCPP Invited Speaker


Michael Garland
Topic: Parallel Computing on Manycore GPUs

• Read the abstract for this talk

FRIDAY - 29 May 2009
all day*

* See each individual workshop program for schedule details

PDSEC Workshop on Parallel and Distributed Scientific and Engineering Computing
PMEO Performance Modeling, Evaluation, and Optimisation of Ubiquitous Computing and Networked Systems
DPDNS Dependable Parallel, Distributed and Network-Centric Systems
SSN International Workshop on Security in Systems and Networks
HOTP2P International Workshop on Hot Topics in Peer-to-Peer Systems
PCGRID Workshop on Large-Scale, Volatile Desktop Grids
MTAAP Workshop on Multi-Threaded Architectures and Applications
PDCoF Workshop on Parallel and Distributed Computing in Finance
LSPP Workshop on Large-Scale Parallel Processing
JSSPP Workshop on Job Scheduling Strategies for Parallel Processing

Monday, May 25, 2009, 6:30 PM

John Goodhue, CTO

SiCortex High-Productivity, Low-Power Computers

To work efficiently, clusters for high performance computing require a balance among compute, memory, inter-node communication, and I/O. Fast communication among one thousand multicore nodes requires short wire paths and power-efficient CPUs tightly integrated with memory, communication, and I/O controllers. The tutorial describes the characteristics of a six-thousand-core cluster that puts all of these elements on a single chip, dramatically reducing cost and power consumption while increasing reliability and performance compared to commodity clusters.

Mr. Goodhue has worked in the computer and communications technology business for more than 25 years. He began his career as an engineer at BBN, where he held positions ranging from software engineer to Vice President of Engineering in both its high performance computing businesses. Mr. Goodhue is also co-founder of several startup businesses, including Dash Strauss and Goodhue, a compliance consulting firm, and Lightstream Inc., a networking company that was acquired by Cisco Systems in 1995. At Cisco, Mr. Goodhue served as Director of Engineering in the ATM and Core Router business units and as General Manager of Cisco's broadband aggregation business unit. Mr. Goodhue has a B.S. in Computer Science from the Massachusetts Institute of Technology (MIT).


Tuesday, May 26, 2009, 8:30 AM - 9:30 AM

Wen-mei W. Hwu, University of Illinois at Urbana-Champaign

Many-core Parallel Computing – Can compilers and tools do the heavy lifting?

Modern GPUs such as the NVIDIA GeForce GTX280, ATI Radeon 4860, and the upcoming Intel Larrabee are massively parallel, many-core processors. Today, application developers for these many-core chips are reporting 10X-100X speedup over sequential code on traditional microprocessors. According to the semiconductor industry roadmap, these processors could scale up to over 1,000X speedup over single cores by the end of the year 2016. Such a dramatic performance difference between parallel and sequential execution will motivate an increasing number of developers to parallelize their applications. Today, an application programmer has to understand the desirable parallel programming idioms, manually work around potential hardware performance pitfalls, and restructure their application design in order to achieve their performance objectives on many-core processors. Although many researchers have given up on parallelizing compilers, I will show evidence that by systematically incorporating high-level application design knowledge into the source code, a new generation of compilers and tools can take over the heavy lifting in developing and tuning parallel applications. I will also discuss roadblocks whose removal will require innovations from the entire research community.

Wen-mei W. Hwu is a Professor and holds the Sanders-AMD Endowed Chair in the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign. His research interests are in the area of architecture, implementation, and software for high performance computer systems. He is the director of the IMPACT research group. For his contributions in research and teaching, he received the ACM SIGARCH Maurice Wilkes Award, the ACM Grace Murray Hopper Award, the Tau Beta Pi Daniel C. Drucker Eminent Faculty Award, and the ISCA Most Influential Paper Award. He is a fellow of IEEE and ACM. Hwu serves on the Executive Committee of the MARCO/DARPA C2S2 and GSRC Focus Research Centers. He leads the GSRC Concurrent Systems Theme. He co-directs the new $18M UIUC Intel/Microsoft Universal Parallel Computing Research Center with Marc Snir and serves as one of the principal investigators of the $208M NSF Blue Waters Petascale computer project. Dr. Hwu received his Ph.D. degree in Computer Science from the University of California, Berkeley.

Tuesday, May 26, 2009, 6:30 PM

Tools for Scalable Performance Analysis on Petascale Systems

I-Hsin Chung, S.R. Seelam (IBM T.J. Watson, USA)
B. Mohr (Research Center Juelich, Germany)
J. Labarta (UPC Barcelona, Spain)

Tools are becoming increasingly important to efficiently utilize the computing power available in contemporary large scale systems. The drastic increase in the size and complexity of systems requires tools to be scalable while producing meaningful and easily digestible information that may help the user pinpoint problems at scale. The goal of this tutorial is to introduce some state-of-the-art performance tools from three different organizations to a diverse audience. Together these tools provide a broad spectrum of capabilities necessary to analyze the performance of scientific and engineering applications on a variety of large and small scale systems. The tutorial will consist of one-hour presentations on three tools.

Presenters will demonstrate real-world examples and, depending on available hardware, will allow users to gain hands-on experience with demonstration codes on large scale systems. This tutorial should appeal to a large community, as its content is suited for performance analysis on small scale server systems as well as Petascale systems.


Wednesday, May 27, 2009, 8:30 AM - 9:30 AM

Nir Shavit
Computer Science Department
Tel-Aviv University, Israel

Software Transactional Memory: Where do we come from? What are we? Where are we going?

The transactional memory programming paradigm is gaining momentum as the approach of choice for replacing locks in concurrent programming. Combining sequences of concurrent operations into atomic transactions seems to promise a great reduction in the complexity of both programming and verification, by making parts of the code appear to be sequential without the need to program fine-grained locks. Software transactional memory offers to deliver a transactional programming environment without the need for costly modifications in processor design. However, the story of software transactional memory reminds one of garbage collection in its time: performance is improving, and the semantics are becoming clearer, yet there is still a long road ahead, a road strewn with stones below and crows hovering above, predicting its demise. This talk will try to take a sober look at software transactional memory, its history, the state of research today, and what we can expect to achieve with it in the foreseeable future.

Nir Shavit received a B.A. and M.Sc. from the Technion and a Ph.D. from the Hebrew University, all in Computer Science. He was a Postdoctoral Researcher at IBM Almaden Research Center, Stanford University, and MIT, and a Visiting Professor at MIT. He joined the Computer Science Department at Tel-Aviv University in 1992 and was at various times a Member of Technical Staff at Sun Microsystems Laboratories. Prof. Shavit is the recipient of the Israeli Industry Research Prize in 1993 and the ACM/EATCS Goedel Prize in Theoretical Computer Science in 2004. His research interests include software aspects of Multiprocessor Synchronization, the design and implementation of Concurrent Data-Structures, and the Theoretical Foundations of Asynchronous Computability. He designed (together with his students) the first Software Transactional Memory system, and has been involved in the design of several of today's state-of-the-art STMs.


Wednesday, May 27, 2009, 4:30 PM - 6:30 PM

How to Build a Useful Thousand-Core System?

Josep Torrellas, University of Illinois, Urbana-Champaign

• Laxmikant Kale, University of Illinois at Urbana-Champaign
• Jesus Labarta, Supercomputing Center, Universitat Politecnica de Catalunya, Barcelona
• Keshav Pingali, University of Texas at Austin
• Per Stenstrom, Chalmers University

Current hardware roadmaps call for doubling the number of on-chip cores approximately every two years. If this trend materializes, in at most a decade and a half, we will reach one thousand cores. This scenario has mind-boggling consequences for the IPDPS research community. There are many questions to answer. For example, at the architecture level, how are we going to power these chips and provide the required bandwidth? At the software level, how are we going to manage possibly-heterogeneous resources with low overhead, efficiently compile for these machines, and provide programmer-friendly programming models? At the application level, what kinds of applications and algorithms will we use? This panel will provide an opportunity for the conference attendees to discuss all of these topics.
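
The "decade and a half" estimate above can be checked with a short calculation. The sketch below is illustrative only: the abstract does not state a starting core count, so the values 4, 8, and 16 are assumptions, and the function name years_to_reach is hypothetical. Under a strict two-year doubling period, the estimate holds when the starting chip has roughly six or more cores.

```python
import math


def years_to_reach(target_cores: int, start_cores: int,
                   doubling_period_years: float = 2.0) -> float:
    """Years needed to grow from start_cores to target_cores when the
    core count doubles once every doubling_period_years."""
    doublings = math.log2(target_cores / start_cores)
    return doublings * doubling_period_years


# Assumed starting points (not given in the abstract): 4, 8, and 16 cores.
for start in (4, 8, 16):
    years = years_to_reach(1000, start)
    print(f"{start} cores -> 1000 cores in about {years:.1f} years")
```

Starting from 8 or 16 cores, the thousand-core mark falls inside the 15-year window; starting from 4 cores, it falls slightly outside it.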


Thursday, May 28, 2009, 8:30 AM - 9:30 AM

Leonid Oliker
Computational Research Division
Lawrence Berkeley National Laboratory, Berkeley, USA

Green Flash: Designing an energy efficient climate supercomputer

It is clear from both the cooling demands and the electricity costs that the growth in scientific computing capabilities of the last few decades is not sustainable unless fundamentally new ideas are brought to bear. In this talk we propose a novel approach to supercomputing design that leverages the sophisticated tool chains of the consumer electronics marketplace. We analyze our framework in the context of high-resolution global climate change simulations – an application with multi-trillion dollar ramifications for the world's economies. A key aspect of our methodology is hardware-software co-tuning, which utilizes fast and accurate FPGA-based architectural emulation. This enables the design of future exaflop-class supercomputing systems to be defined by scientific requirements instead of constraining science to the machine configurations. Our talk will provide detailed design requirements for kilometer-scale global cloud-system-resolving climate models and point the way toward Green Flash: an application-targeted exascale machine that could be efficiently implemented using mainstream embedded design processes. Overall, we believe that our proposed approach can provide a quantum leap in hardware and energy utilization, and may significantly impact the design of the next generation of HPC systems.

Lenny Oliker is a Computer Scientist in the Future Technologies Group at Lawrence Berkeley National Laboratory. He received bachelor's degrees in Computer Engineering and Finance from the University of Pennsylvania, and performed both his doctoral and postdoctoral work at NASA Ames Research Center. Lenny has co-authored over 60 technical articles, and has received four best paper awards, including at IPDPS 2007 and 2008. His research interests include HPC characterization, multi-core auto-tuning, and power-efficient computing.


THURSDAY TCPP Invited Speaker
Thursday, May 28, 2009, 6:30 PM - 8:30 PM

Invited Speaker:
Michael Garland

Parallel Computing on Manycore GPUs

The ongoing evolution of single-core sequential processors into manycore parallel processors is the most significant trend in modern chip architecture. Parallelism, rather than improved single-thread performance, has become the primary force driving higher computational throughput. At the leading edge of this class of massively parallel chip architectures is the modern GPU (graphics processing unit). Modern NVIDIA GPUs are fully programmable processors, delivering a peak computational throughput of up to 1 TFLOPS across 30K co-resident threads, which is a level of parallel computation that was once the preserve of supercomputers. Programming such massively parallel processors presents many interesting challenges. In this talk, I will explore the essential architectural characteristics of manycore processors in general, and the GPU in particular. I will introduce CUDA, NVIDIA's architecture for scalable parallel programming. Finally, I will examine the impact these architectures have on algorithm design, sketching some techniques for implementing common parallel algorithms for CUDA-capable processors.

Michael Garland is a research scientist at NVIDIA. Dr. Garland holds B.S. and Ph.D. degrees in Computer Science from Carnegie Mellon University, and is an adjunct professor in the Department of Computer Science of the University of Illinois at Urbana-Champaign. He has published numerous articles in leading conferences and journals on a range of topics including surface simplification, remeshing, texture synthesis, novice-friendly modeling, free-form animation, scientific visualization, graph mining, and visualizing complex graphs. His current research interests include computer graphics and visualization, geometric algorithms, and parallel algorithms and programming models.