IEEE International Parallel & Distributed Processing Symposium


IPDPS 2010



IPDPS 2010 Advance Program

Please visit the IPDPS website regularly for updates, since the schedule may be revised. Authors who have corrections should contact info@ipdps.org. Note that paper numbers are listed for easy reference.

Abstracts of Contributed Papers
Abstracts for regular conference papers have been compiled so that authors can check them for accuracy and so that visitors to this Website can preview the papers to be presented at the conference. Abstracts for all workshops and the PhD Forum will be posted to their respective Web pages by the end of February. Full proceedings of the conference will be published on a CD-ROM, tucked into a pocket of the program book distributed to registrants at the conference.


MONDAY - 19 April 2010


WORKSHOPS
all day*

* See the individual workshop programs for schedule details

1	HCW	Heterogeneity in Computing Workshop
2	RAW	Reconfigurable Architectures Workshop
3	HIPS	Workshop on High-Level Parallel Programming Models & Supportive Environments
4	NIDISC	Workshop on Nature Inspired Distributed Computing
5	HiCOMB	Workshop on High Performance Computational Biology
6	APDCM	Advances in Parallel and Distributed Computing Models
7	CAC	Communication Architecture for Clusters
8	HPPAC	High-Performance, Power-Aware Computing
9	HPGC	High Performance Grid Computing
10	SMTPS	Workshop on System Management Techniques, Processes, and Services

TCPP Reception, Meeting & Invited Talk
6:00 PM

IEEE Computer Society Technical Committee on Parallel Processing

Membership Meeting
TCPP Chair: Sushil K. Prasad

7:00 PM Invited Talk
Craig Stunkel, IBM T. J. Watson Research Center
Title: Exascale: Parallelism Gone Wild!


TUESDAY - 20 April 2010


Opening Session
8:00 AM - 8:30 AM

Opening Session
Chair: David Bader

Keynote Session
8:30 AM - 9:30 AM

Chair: Andrew Lumsdaine

Keynote Speech: Operating System Resource Management
Speaker: Burton Smith
Technical Fellow, Microsoft Corporation

Abstract: Resource management is the dynamic allocation and de-allocation by an operating system of processor cores, memory pages, and various types of bandwidth to computations that compete for those resources. The objective is to allocate resources so as to optimize responsiveness subject to the finite resources available. Historically, resource management solutions have been relatively unsystematic, and now the very assumptions underlying the traditional strategies fail to hold. First, applications increasingly differ in their ability to exploit resources, especially processor cores. Second, application responsiveness is approximately two-valued for Quality-of-Service (QoS) applications, depending on whether deadlines are met. Third, power and battery energy have become constrained. This talk will propose a scheme for addressing the operating system resource management problem.


Morning Break 9:30 AM - 10:00 AM

Commercial Book Exhibits - All Day

Parallel Technical Sessions 1, 2, 3, 4 & 5
10:00 AM - 12:00 PM

Session 1
Algorithms for Network Management
Chair: Anne Benoit

Distributed Advance Network Reservation with Delay Guarantees
Niloofar Fazlollahi (Boston University, USA); David Starobinski (Boston University, USA)

A General Algorithm for Detecting Faults under the Comparison Diagnosis Model
Iain A Stewart (Durham University, UK)

On the Importance of Bandwidth Control Mechanisms for Scheduling on Large Scale Heterogeneous Platforms
Olivier Beaumont (INRIA, FR); Hejer Rejeb (LaBRI-INRIA, FR)

Broadcasting on Large Scale Heterogeneous Platforms under the Bounded Multi-Port Model
Olivier Beaumont (INRIA, FR); Lionel Eyraud-Dubois (INRIA Bordeaux Sud-Ouest, FR); Shailesh Kumar Agrawal (INRIA, FR)

Session 2
Scientific Computing with GPUs
Chair: Ling Zhou

Improving Numerical Reproducibility and Stability in Large-Scale Numerical Simulations on GPUs
Michela Taufer (University of Delaware, US); Philip Saponaro (University of Delaware, US); Omar Padron (Kean University, US); Sandeep Patel (University of Delaware, US)

Implementing the Himeno Benchmark with CUDA on GPU Clusters
Everett Phillips (NVIDIA, US); Massimiliano Fatica (NVIDIA, US)

Direct Self-Consistent Field Computations on GPU Clusters
Guochun Shi, Volodymyr Kindratenko (National Center for Supercomputing Applications, US); Ivan Ufimtsev, Todd Martinez (Stanford University, US)

Parallelization of Tau-Leap Coarse-Grained Monte Carlo Simulations on GPUs
Lifan Xu (University of Delaware, US); Michela Taufer (University of Delaware, US); Stuart Collins (University of Delaware, US); Dionisios Vlachos (University of Delaware, US)

Session 3
Data Storage and Memory Systems
Chair: Bradley Kuszmaul

DEBAR: A Scalable High-Performance De-duplication Storage System for Backup and Archiving
Tianming Yang (Huazhong University of Science and Technology, PRC); Hong Jiang (University of Nebraska, US); Dan Feng (Huazhong University of Science and Technology, PRC); Zhongying Niu (Huazhong University of Science and Technology, PRC); Ke Zhou (Huazhong University of Science and Technology, PRC); Yaping Wan (Huazhong University of Science and Technology, PRC)

HPDA: A Hybrid Parity-based Disk Array for Enhanced Performance and Reliability
Bo Mao (Huazhong University of Science and Technology, PRC); Hong Jiang (University of Nebraska, US); Dan Feng (Huazhong University of Science and Technology, PRC); Suzhen Wu (Huazhong University of Science and Technology, PRC); Jianxi Chen (Huazhong University of Science and Technology, PRC); Lingfang Zeng (Huazhong University of Science and Technology, PRC); Lei Tian (Huazhong University of Science and Technology, PRC)

Fine-Grained QoS Scheduling for PCM-based Main Memory Systems
Ping Zhou (University of Pittsburgh, US); Yu Du (University of Pittsburgh, US); Youtao Zhang (University of Pittsburgh, US); Jun Yang (University of Pittsburgh, US)

Performance Impact of Resource Contention in Multicore Systems
Robert Hood (CSC-NASA Ames, US); Haoqiang Jin (NASA Ames Research Center, US); Piyush Mehrotra (NASA Ames Research Center, US); Johnny Chang (CSC-NASA Ames Research Center, US); Jahed Djomehri (NASA Ames Research Center, US); Sharad Gavali (NASA Ames Research Center, US); Dennis Jespersen (NASA Ames Research Center, US); Kenichi Taylor (Silicon Graphics International, US); and Rupak Biswas (NASA Ames Research Center, US)

Session 4
Fault Tolerance
Chair: Almadena Chtchelkanova

Improving the Performance of Hypervisor-Based Fault Tolerance
Jun Zhu (Peking University, PRC); Wei Dong (Peking University, PRC); ZheFu Jiang (Peking University, PRC); Xiaogang Shi (Peking University, PRC); Zhen Xiao (Peking University, PRC); XiaoMing Li (Peking University, PRC)

Supporting Fault Tolerance in a Data-Intensive Computing Middleware
Tekin Bicer (The Ohio State University, US); Wei Jiang (The Ohio State University, US); Gagan Agrawal (The Ohio State University, US)

A High-Performance Fault-Tolerant Software Framework for Memory on Commodity GPUs
Naoya Maruyama (Tokyo Institute of Technology, JPN); Akira Nukada (Tokyo Institute of Technology, JPN); Satoshi Matsuoka (Tokyo Institute of Technology, JPN)

Scalable Failure Recovery for High-performance Data Aggregation
Dorian Arnold (University of New Mexico, US); Barton Miller (University of Wisconsin, US)

Session 5
Sorting
Chair: George Biros

High Performance Comparison-Based Sorting Algorithm on Many-Core GPUs
Xiaochun Ye (Chinese Academy of Sciences, PRC); Dongrui Fan (Chinese Academy of Sciences, PRC); Wei Lin (Chinese Academy of Sciences, PRC); Nan Yuan (Chinese Academy of Sciences, PRC); Paolo Ienne (EPFL, Switzerland)

GPU Sample Sort
Vitaly Osipov (Karlsruhe Institute of Technology, Germany); Peter Sanders (University of Karlsruhe, Germany); Nikolaj Leischner (University of Karlsruhe, Germany)

Highly Scalable Parallel Sorting
Edgar Solomonik (University of Illinois at Urbana-Champaign, US); Laxmikant Kale (University of Illinois at Urbana-Champaign, US)

Lunch 12 Noon – 1:30 PM (on your own)

PhD Forum Posters
12 Noon
(on display until Wednesday evening)

PhD Forum Posters
Posters will be on display from noon Tuesday until the end of the day on Wednesday. See the PhD Forum page for the list of student authors.

Parallel Technical Sessions 6, 7, 8, 9 & 10
1:30 PM - 3:30 PM

Session 6
Scheduling
Chair: David Bunde

A Scheduling Framework for Large-Scale, Parallel, and Topology-Aware Applications
Valentin Kravtsov (Technion - Israel Institute of Technology, Israel); Pavel Bar (Technion - Israel Institute of Technology, Israel); David Carmeli (Technion - Israel Institute of Technology, Israel); Assaf Schuster (Technion - Israel Institute of Technology, Israel); Martin Swain (Technion - Israel Institute of Technology, Israel)

Load Regulating Algorithm for Static-Priority Task Scheduling on Multiprocessors
Risat Pathan (Chalmers University of Technology, Sweden); Jan Jonsson (Chalmers University of Technology, Sweden)

Scheduling Algorithms for Linear Workflow Optimization
Kunal Agrawal (Washington University in St. Louis, US); Anne Benoit (Ecole Normale Superieure de Lyon, FR); Loic Magnan (Ecole Normale Superieure de Lyon, FR); Yves Robert (Ecole Normale Superieure de Lyon, FR)

Hypergraph-Based Task-Bundle Scheduling Towards Efficiency and Fairness in Heterogeneous Distributed Systems
Han Zhao (Oklahoma State University, US); Xinxin Liu (Oklahoma State University, US); Xiaolin (Andy) Li (Oklahoma State University, US)

Session 7
Performance/Scalability Improvement for Scientific Applications
Chair: Srinivas Aluru

Improving the Performance of Uintah: A Large-Scale Adaptive Meshing Computational Framework
Justin Luitjens (University of Utah, US); Martin Berzins (University of Utah, US)

Optimizing and Tuning the Fast Multipole Method for State-of-the-Art Multicore Architectures
Aparna Chandramowlishwaran (Georgia Institute of Technology, US); Samuel W Williams (Lawrence Berkeley National Laboratory, US); Leonid Oliker (Lawrence Berkeley National Laboratory, US); Ilya Lashuk (Georgia Institute of Technology, US); George Biros (Georgia Institute of Technology, US); Richard Vuduc (Georgia Institute of Technology, US)

Parallelization of DQMC Simulation for Strongly Correlated Electron Systems
Che-Rung Lee (National Tsing Hua University, Taiwan); I-Hsin Chung (IBM T.J. Watson Research Center, US); Zhaojun Bai (University of California, Davis, US)

Parallel I/O Performance: From Events to Ensembles
Andrew Uselton (Lawrence Berkeley National Laboratory, US); Mark Howison (Lawrence Berkeley National Laboratory, US); Nicholas J. Wright (Lawrence Berkeley National Laboratory, US); David Skinner (Lawrence Berkeley National Laboratory, US); Noel Keen (Lawrence Berkeley National Laboratory, US); John Shalf (Lawrence Berkeley National Laboratory, US); Karen L Karavanic (Portland State University, US); Leonid Oliker (Lawrence Berkeley National Laboratory, US)

Session 8
Network Architecture and Algorithms
Chair: Neeraj Mittal

Achieve Constant Performance Guarantees using Asynchronous Crossbar Scheduling without Speedup
Deng Pan (Florida International University, US); Kia Makki (Florida International University, US); Niki Pissinou (Florida International University, US)

Distributive Waveband Assignment in Multi-granular Optical Networks
Yang Wang (Georgia State University, US); Xiaojun Cao (Georgia State University, US)

QoS Aware BiNoC Architecture
Shih-Hsin Lo (National Taiwan University, Taiwan); Ying-Cherng Lan (National Taiwan University, Taiwan); Hsin-Hsien Yeh (National Taiwan University, Taiwan); Wen-Chung Tsai (National Taiwan University, Taiwan); Yu Hen Hu (National Taiwan University, Taiwan); Sao-Jie Chen (National Taiwan University, Taiwan)

First Experiences with Congestion Control in InfiniBand Hardware
Ernst Gran (Simula Research Laboratory, Norway); Magne Eimot (Simula Research Laboratory, Norway); Sven-Arne Reinemo (Simula Research Laboratory, Norway); Tor Skeie (Simula Research Laboratory, Norway); Olav Lysne (Simula Research Laboratory, Norway); Lars Paul Huse (Simula Research Laboratory, Norway)

Session 9
Software Support for Using GPUs
Chair: Anne Elster

Object-Oriented Stream Programming using Aspects
Mingliang Wang (Rutgers University, US); Manish Parashar (Rutgers University, US)

Optimal Loop Unrolling for GPGPU Programs
Giridhar Sreenivasa Murthy (The Ohio State University, US); Mahesh Ravishankar (The Ohio State University, US); Muthu Manikandan Baskaran (The Ohio State University, US); Ponnuswamy Sadayappan (The Ohio State University, US)

Speculative Execution on Multi-GPU Systems
Gregory Diamos (Georgia Institute of Technology, US); Sudhakar Yalamanchili (Georgia Institute of Technology, US)

Dynamic Load Balancing on Single- and Multi-GPU Systems
Long Chen (University of Delaware, US); Oreste Villa (Pacific Northwest National Laboratory, US); Sriram Krishnamoorthy (Pacific Northwest National Laboratory, US); Guang Gao (University of Delaware, US)

Session 10
Performance Prediction and Benchmarking Tools
Chair: George Bosilca

Servet: A Benchmark Suite for Autotuning on Multicore Clusters
Jorge González-Domínguez (University of A Coruna, Spain); Guillermo Lopez Taboada (University of A Coruna, Spain); Basilio Fraguela (University of A Coruna, Spain); María J. Martín (University of A Coruna, Spain); Juan Tourino (University of A Coruna, Spain)

KRASH: Reproducible CPU Load Generation on Many-Core Machines
Swann Perarnau (INRIA Moais Research Team, FR); Guillaume Huard (ID Laboratory, FR)

Power-aware MPI Task Aggregation Prediction for High-End Computing Systems
Dong Li (Virginia Tech, US); Dimitrios Nikolopoulos (Foundation for Research and Technology Hellas, Greece); Kirk Cameron (Virginia Tech, US); Bronis R. de Supinski (Lawrence Livermore National Laboratory, US); Martin Schulz (Lawrence Livermore National Laboratory, US)

Afternoon Break 3:30 PM - 4:00 PM

Parallel Technical Sessions 11, 12, 13, 14 & 15
4:00 PM - 6:00 PM

Session 11
Resource Allocation
Chair: Anne Benoit

Varying Bandwidth Resource Allocation Problem with Bag Constraints
Venkatesan Chakaravarthy (IBM Research, India); Vinayaka Pandit (IBM Research, India); Yogish Sabharwal (IBM Research, India); Deva Seetharam (IBM Research, India)

Decentralized Resource Management for Multi-core Desktop Grids
Jaehwan Lee (University of Maryland, College Park, US); Pete Keleher (University of Maryland, US); Alan Sussman (University of Maryland, US)

Dynamic Fractional Resource Scheduling for HPC Workloads
Mark Lee Stillwell (University of Hawaii at Manoa, US); Frédéric Vivien (INRIA, FR); Henri Casanova (University of Hawaii at Manoa)

ADEPT Scalability Predictor in Support of Adaptive Resource Allocation
Arash Deshmeh (University of Windsor, Canada); Jacob Machina (University of Windsor, Canada); Angela Sodan (University of Windsor, Canada)

Session 12
Image Processing and Data Mining
Chair: David Konerding

Exploiting the Forgiving Nature of Applications for Scalable Parallel Execution
Jiayuan Meng (University of Virginia); Anand Raghunathan (NEC Research Labs, US); Srimat Chakradhar (NEC Research Labs, US); Surendra Byna (NEC Research Labs, US)

Fisheye Lens Distortion Correction on Multicore and Hardware Accelerator Platforms
Konstantis Daloukas (University of Thessaly, Greece); Christos Antonopoulos (University of Thessaly, Greece); Nikos Bellas (University of Thessaly, Greece); Sek Chai (Motorola, US)

Large-Scale Multi-Dimensional Document Clustering on GPU Clusters
Yongpeng Zhang (North Carolina State University, US); Frank Mueller (North Carolina State University, US); Xiaohui Cui (Oak Ridge National Laboratory, US); Thomas Potok (Oak Ridge National Laboratory, US)

eScience in the Cloud: A MODIS Satellite Data Reprojection and Reduction Pipeline in Windows Azure Platform
Jie Li (University of Virginia, US); Deb Agarwal (Lawrence Berkeley National Laboratory, US); Marty Humphrey (University of Virginia, Charlottesville, US); Catharine van Ingen (Microsoft Research); Keith Jackson (Lawrence Berkeley National Laboratory, US); Youngryel Ryu (University of California at Berkeley, US)

Session 13
Transactional Memory
Chair: Anne Elster

Locality-Aware Adaptive Grain Signatures for Transactional Memories
Woojin Choi (University of Southern California, US); Jeffrey Draper (University of Southern California, US)

Dynamic Analysis of the Relay Cache-Coherence Protocol for Distributed Transactional Memory
Bo Zhang (Virginia Tech, US); Binoy Ravindran (Virginia Tech, US)

Runtime Checking of Serializability in Software Transactional Memory
Arnab Sinha (Princeton University, US); Sharad Malik (Princeton University, US)

Consistency in Hindsight, A Fully Decentralized STM Algorithm
Annette Bieniusa (University of Freiburg, Germany); Thomas Fuhrmann (Technische Universitat Munchen, Germany)

Session 14
Tools for Performance and Correctness Analysis
Chair: Almadena Chtchelkanova

Identifying Ad-hoc Synchronization for Enhanced Race Detection
Ali Jannesari (University of Karlsruhe, Germany); Walter F. Tichy (University of Karlsruhe, Germany)

Improving the Performance of Program Monitors with Compiler Support in Multi-Core Environment
Guojin He (University of Minnesota, US); Antonia Zhai (University of Minnesota, US)

On-Line Detection of Large-Scale Parallel Application's Structure
German Llort (Barcelona Supercomputing Center, Spain); Juan Gonzalez Garcia (Universitat Politècnica de Catalunya, Spain); Harald Servat (Barcelona Supercomputing Center, Spain); Judit Gimenez (Barcelona Supercomputing Center, Spain); Jesus Labarta (Barcelona Supercomputing Center, Spain)

Adaptive Sampling-Based Profiling Techniques for Optimizing the Distributed JVM Runtime
King Tim Lam (The University of Hong Kong, Hong Kong); Yang Luo (The University of Hong Kong, Hong Kong); Cho-Li Wang (The University of Hong Kong, Hong Kong)

Session 15
Parallel Linear Algebra I
Chair: Esmond Ng

Algorithmic Cholesky Factorization Fault Recovery
Douglas Hakkarinen (Colorado School of Mines, US); Zizhong Chen (Colorado School of Mines, US)

Analyzing the Soft-Error Resilience of Linear Solvers on Multicore Multiprocessors
Konrad Malkowski (The Pennsylvania State University); Padma Raghavan (The Pennsylvania State University); Mahmut Taylan Kandemir (The Pennsylvania State University)

A Parallel Architecture for Meaning Comparison
Suneil Mohan (Texas A&M University, US); Amitava Biswas (Texas A&M University, US); Aalap Tripathy (Texas A&M University, US); Jagannath Panigrahy (Texas A&M University, US); Rabi Mahapatra (Texas A&M University, US)

IEEE-TPDS Meeting
6:00 PM - 7:00 PM

IEEE Transactions on Parallel and Distributed Systems
All IPDPS attendees are invited to attend this open meeting
Hosted by: Ivan Stojmenovic, University of Ottawa, Canada
Editor-in-Chief, IEEE TPDS

All Symposium Tutorial
7:00 PM - 10:00 PM

MapReduce Programming with Apache Hadoop
Presenter: Milind Bhandarkar
Yahoo! Inc. (Hadoop Solutions Architect)

WEDNESDAY - 21 April 2010


Keynote Session
8:30 AM - 9:30 AM

Chair: Bradley Kuszmaul

Keynote Speech: Chip Multiprocessor Architecture: A Programmability-Driven Approach
Speaker: Kunle Olukotun
Stanford University

Abstract: Chip multiprocessors (CMPs) are now the dominant architecture in microprocessor design. However, in many software environments, due to the difficulty of writing correct and high performing parallel programs, the capability of CMPs is underutilized. In this talk, I will argue that to enable more parallel programs to be written more easily, the design of CMPs should be driven by the needs of programmability. To demonstrate the benefits of this approach, I will describe example CMP designs where a focus on programmability has resulted in a simpler programming model and excellent performance. I will also describe a programming environment where programmability can be used to drive the development of future CMP architectures.


Morning Break 9:30 AM - 10:00 AM

Commercial Book Exhibits - All Day

Plenary Session - Best Papers
10:00 AM - 12:00 PM

Plenary Session - Best Papers
Chair: Cynthia Phillips

Extreme Scale Computing: Modeling the Impact of System Noise in Multicore Clustered Systems
Seetharami R Seelam (IBM Research, US); Liana Fong (IBM T.J. Watson Research Center, US); Asser Tantawi (IBM T.J. Watson Research Center, US); John Lewars (IBM Systems and Technology Group, US); John Divirgilio (IBM, US); Kevin Gildea (IBM, US)

Oblivious Algorithms for Multicores and Network of Processors
Rezaul Chowdhury (University of Texas at Austin, US); Francesco Silvestri (University of Padova, Italy); Brandon Blakeley (University of Texas, US); Vijaya Ramachandran (University of Texas at Austin, US)

Analyzing and Adjusting User Runtime Estimates to Improve Job Scheduling on the Blue Gene/P
Wei Tang (Illinois Institute of Technology, US); Narayan Desai (Argonne National Laboratory, US); Daniel Buettner (Argonne National Laboratory, US); Zhiling Lan (Illinois Institute of Technology, US)

Performance Evaluation of Concurrent Collections on High-Performance Multicore Computing Systems
Aparna Chandramowlishwaran (Georgia Institute of Technology, US); Kathleen Knobe (Intel, US); Richard W. Vuduc (Georgia Institute of Technology, US)

Lunch 12 Noon – 1:30 PM (on your own)

Parallel Sessions 16, 17, 18 & 19
1:30 PM - 3:30 PM

Session 16
P2P Algorithms
Chair: Amitabha Bagchi

A Hybrid Interest Management Mechanism for Peer-to-Peer Networked Virtual Environments
Ke Pan (Nanyang Technological University, Singapore); Wentong Cai (Nanyang Technological University, Singapore); Xueyan Tang (Nanyang Technological University, Singapore); Suiping Zhou (Nanyang Technological University, Singapore); Stephen John Turner (Nanyang Technological University, Singapore)

Attack-Resistant Frequency Counting
Bo Wu (University of New Mexico, US); Valerie King (University of Victoria, Canada); Jared Saia (University of New Mexico, US)

Overlays with preferences: Approximation algorithms for matching with preference lists
Giorgos Georgiadis (Chalmers University of Technology, Sweden); Marina Papatriantafilou (Chalmers University of Technology, Sweden)

Analysis of Durability in Replicated Distributed Storage Systems
Joseph Pasquale (University of California, San Diego, US); Sriram Ramabhadran (University of California, San Diego, US)

Session 17
Parallel Solutions for String and Sequence Problems
Chair: Ruppa Thulasiram

Scalable Multi-Pipeline Architecture for High Performance Multi-Pattern String Matching
Weirong Jiang (University of Southern California, US); Yi-Hua Yang (University of Southern California, US); Viktor K. Prasanna (University of Southern California, US)

Head-Body Partitioned String Matching for Deep Packet Inspection with Scalable and Attack-Resilient Performance
Yi-Hua Yang (University of Southern California, US); Viktor K. Prasanna (University of Southern California, US); Chenqian Jiang (University of Southern California, US)

Parallel de novo Assembly of Large Genomes from High-Throughput Short Reads
Benjamin G. Jackson (AOL, US); Matthew Regennitter (Iowa State University, US); Xiao Yang (Iowa State University, US); Patrick Schnable (Iowa State University, US); Srinivas Aluru (Iowa State University, US)

Efficient Parallel Algorithms for Maximum-Density Segment Problem
Xue Wang (Georgia State University, US); Fasheng Qiu (Georgia State University, US); Sushil Prasad (Georgia State University, US); Guantao Chen (Georgia State University, US)

Session 18
Energy-aware Task Management
Chair: David Bunde

Hybrid MPI/OpenMP Power-aware Computing
Dong Li (Virginia Tech, US); Bronis R. de Supinski (Lawrence Livermore National Laboratory, US); Martin Schulz (Lawrence Livermore National Laboratory, US); Kirk Cameron (Virginia Tech, US); Dimitrios S. Nikolopoulos (Foundation for Research and Technology Hellas, Greece)

Performance and Energy Optimization of Concurrent Pipelined Applications
Anne Benoit (Ecole Normale Supérieure de Lyon, FR); Paul Renaud-Goud (Ecole Normale Supérieure de Lyon, FR); Yves Robert (Ecole Normale Supérieure de Lyon, FR)

Robust Control-theoretic Thermal Balancing for Server Clusters
Yong Fu (Washington University in St. Louis, US); Chenyang Lu (Washington University in St. Louis, US); Hongan Wang (Washington University in St. Louis, US)

A Simple Thermal Model for Multi-core Processors and Its Application to Slack Allocation
Zhe Wang (University of Florida, US); Sanjay Ranka (University of Florida, US)

Session 19
Parallel Operating Systems and System Software
Chair: George Bosilca

GenerOS: An Asymmetric Operating System Kernel for Multi-core Systems
Qingbo Yuan (Institute of Computing Technology, PRC); Jianbo Zhao (Institute of Computing Technology, PRC); Mingyu Chen (Institute of Computing Technology, PRC); Ninghui Sun (Institute of Computing Technology, PRC)

Palacios and Kitten: New High Performance Operating Systems for Scalable Virtualized and Native Supercomputing
John Lange (Northwestern University, US); Kevin Pedretti (Sandia National Laboratories, US); Trammell Hudson (Sandia National Laboratories, US); Peter Dinda (Northwestern University, US); Zheng Cui (University of New Mexico, US); Lei Xia (Northwestern University, US); Patrick Bridges (University of New Mexico, US); Andy Gocke (Northwestern University, US); Steven Jaconette (Northwestern University, US); Michael Levenhagen (Sandia National Laboratories, US); and Ron Brightwell (Sandia National Laboratories, US)

MMT: Exploiting Fine-Grained Parallelism in Dynamic Memory Management
Devesh Tiwari (North Carolina State University, US); Sanghoon Lee (North Carolina State University, US); James Tuck (North Carolina State University, US); Yan Solihin (North Carolina State University, US)

Optimization of Applications with Non-blocking Neighborhood Collectives via Multisends on the Blue Gene/P Supercomputer
Sameer Kumar (IBM Research, US); Philip Heidelberger (IBM Research, USA); Dong Chen (IBM Research, US); Michael Hines (IBM Research, US)

Afternoon Break 3:30 PM - 4:00 PM

Parallel Sessions 20, 21, 22, & 23
4:00 PM – 6:00 PM

Session 20
Parallel Graph Algorithms I
Chair: Cynthia Phillips

A Multi-Source Label-Correcting Algorithm for the All-Pairs Shortest Paths Problem
Hiroki Yanagisawa (IBM, Japan)

Parallel Computation of Best Connections in Public Transportation Networks
Daniel Delling (Microsoft Research, Germany); Bastian Katz (Karlsruhe Institute of Technology, Germany); Thomas Pajor (Universitat Karlsruhe)

Dynamically Tuned Push-Relabel Algorithm for the Maximum Flow Problem on CPU-GPU-Hybrid Platforms
Zhengyu He (Georgia Institute of Technology, US); Bo Hong (Georgia Institute of Technology, US)

A Novel Application of Parallel Betweenness Centrality to Power Grid Contingency Analysis
Shuangshuang Jin (Pacific Northwest National Laboratory, US); Zhenyu Huang (Pacific Northwest National Laboratory, US); Yousu Chen (Pacific Northwest National Laboratory, US); Daniel Gerardo Chavarria (Pacific Northwest National Laboratory, US); John Feo (Pacific Northwest National Laboratory, US); Pak Wong (Pacific Northwest National Laboratory, US)

Session 21
Parallel Linear Algebra II
Chair: Esmond Ng

Adapting Communication-Avoiding LU and QR Factorizations to Multicore Architectures
Laura Grigori (INRIA, FR); Simplice Donfack (INRIA, FR); Alok Kumar Gupta (BCCS, Norway)

QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment
Emmanuel Agullo (University of Tennessee, US); Camille Coti (INRIA, Saclay-Ile de France, FR); Jack Dongarra (University of Tennessee, Knoxville, US); Thomas Herault (Universite Paris Sud (LRI), FR); Julien Langou (University of Colorado Denver, US)

Tile QR Factorization with Parallel Panel Processing for Multicore Architectures
Bilel Hadri (University of Tennessee, US); Hatem Ltaief (University of Tennessee, US); Emmanuel Agullo (University of Tennessee, US); Jack Dongarra (University of Tennessee, Knoxville, US)

Linpack Evaluation on a Supercomputer with Heterogeneous Accelerators
Toshio Endo (Tokyo Institute of Technology, Japan); Akira Nukada (Tokyo Institute of Technology, Japan); Satoshi Matsuoka (Tokyo Institute of Technology, Japan); Naoya Maruyama (Tokyo Institute of Technology, Japan)

Session 22
Caches and Caching
Chair: Richard Murphy

Adapting Cache Partitioning Algorithms to Pseudo-LRU Replacement Policies
Kamil Kedzierski (Technical University of Catalonia, UPC, Spain); Miquel Moreto (Universitat Politecnica de Catalunya, Spain); Francisco Cazorla (Barcelona Supercomputing Center); Mateo Valero (Technical University of Catalonia, Spain)

Exploiting Set-Level Non-Uniformity of Capacity Demand to Enhance CMP Cooperative Caching
Dongyuan Zhan (University of Nebraska at Lincoln, US); Hong Jiang (University of Nebraska at Lincoln, US); Sharad Seth (University of Nebraska at Lincoln, US)

Masking I/O Latency using Application Level I/O Caching and Prefetching on Blue Gene System
Seetharami Seelam (IBM T.J. Watson Research Center); I-Hsin Chung (IBM T.J. Watson Research Center); John Bauer (IBM T.J. Watson Research Center); Hui-Fang Wen (IBM T.J. Watson Research Center)

Intra-Application Cache Partitioning
Sai Prashanth Muralidhara (The Pennsylvania State University, US); Mahmut Taylan Kandemir (The Pennsylvania State University, US); Padma Raghavan (The Pennsylvania State University, US)

Session 23
Thread Scheduling
Chair: Guang Gao

SLAW: a Scalable Locality-aware Adaptive Work-stealing Scheduler
Yi Guo (Rice University); Jisheng Zhao (Rice University); Vincent Cave (Rice University); Vivek Sarkar (Rice University)

Executing Task Graphs Using Work-Stealing
Kunal Agrawal (Washington University in St. Louis, US); Charles Leiserson (Massachusetts Institute of Technology, US); Jim Sukha (Massachusetts Institute of Technology, US)

Structuring Execution of OpenMP Applications for Multicore Architectures
François Broquedis (University of Bordeaux, FR); Olivier Aumage (University of Bordeaux, FR); Brice Goglin (INRIA Bordeaux - Sud Ouest, FR); Samuel Thibault (University of Bordeaux, FR); Pierre-Andre Wacrenier (University of Bordeaux, FR); Raymond Namyst (University of Bordeaux, FR)

Oversubscription on Multicore Processors
Costin Iancu (Lawrence Berkeley National Laboratory); Steven Hofmeyr (Lawrence Berkeley National Laboratory); Yili Zheng (Lawrence Berkeley National Laboratory); Filip Blagojevic (Lawrence Berkeley National Laboratory)

PhD Forum
6:00 - 7:30 PM

Student Authors Available to Discuss Poster Research with Conference Attendees

Pre-Banquet Reception
6:30 PM – 7:30 PM

Pre-Banquet Reception
(in Forum area)

Symposium Banquet
7:30 PM

Symposium Banquet

THURSDAY - 22 April 2010


Keynote Session
8:30 AM - 9:30 AM

Chair: Rajmohan Rajaraman

Keynote Speech: Engineering an Algorithm for Parallel External Sorting of Massive Data Sets
Speaker: Peter Sanders
Universität Karlsruhe

Abstract: This talk describes algorithm engineering (AE) as a methodology for algorithmic research where design, analysis, implementation and experimental evaluation of algorithms form a feedback cycle driving the development of efficient algorithms. Additional important components of the methodology include realistic models, algorithm libraries, and collections of realistic benchmark instances. We use one main example throughout this talk: sorting huge data sets using many multi-core processors and disks. The described system is the current record holder for the GraySort and MinuteSort sorting benchmarks.


Morning Break 9:30 AM - 10:00 AM

Commercial Book Exhibits - All Day

Parallel Sessions 24, 25, 26, 27, & 28
10:00 AM – 12:00 PM

Session 24
Distributed Algorithms
Chair: Amitabha Bagchi

A Scalable Algorithm for Maintaining Perpetual System Connectivity in Dynamic Distributed Systems
Tarun Bansal (The Ohio State University, US); Neeraj Mittal (The University of Texas at Dallas, US)

Algorithmic Mechanisms for Internet-based Master-Worker Computing with Untrusted and Selfish Workers
Antonio Fernández Anta (Universidad Rey Juan Carlos, Spain); Chryssis Georgiou (University of Cyprus, Cyprus); Miguel Mosteiro (Rutgers University, US and Universidad Rey Juan Carlos, Spain)

Stabilizing Pipelines for Streaming Applications
Andrew Berns (The University of Iowa, US); Anurag Dasgupta (The University of Iowa, US); Sukumar Ghosh (The University of Iowa, US)

A Dynamic Approach for Characterizing Collusion in Desktop Grids
Louis-Claude Canon (Nancy University); Emmanuel Jeannot (INRIA Bordeaux Sud-Ouest, FR); Jon Weissman (University of Minnesota, Twin Cities, US)

Session 25
Automatic Tuning and Automatic Parallelization
Chair: Guang Gao

Offline Library Adaptation Using Automatically Generated Heuristics
Frédéric de Mesmay (Carnegie Mellon University, US); Yevgen Voronenko (Carnegie Mellon University, US); Markus Pueschel (Carnegie Mellon University, US)

An Auto-Tuning Framework for Parallel Multicore Stencil Computations
Shoaib Kamil (Lawrence Berkeley National Laboratory, US); Cy Chan (Massachusetts Institute of Technology, US); Leonid Oliker (Lawrence Berkeley National Laboratory, US); John Shalf (Lawrence Berkeley National Laboratory, US); Samuel Williams (Lawrence Berkeley National Laboratory, US)

DynTile: Parametric Tiled Loop Generation for Parallel Execution on Multicore Processors
Albert Hartono (Ohio State University, US); Muthu Manikandan Baskaran (Ohio State University, US); J. Ram Ramanujan (Louisiana State University); Ponnuswamy Sadayappan (Ohio State University, US)

Using Focused Regression For Accurate Time-Constrained Scaling of Scientific Applications
Bradley Barnes (University of Georgia, US); Jeonifer Garren (University of Georgia, US); David Lowenthal (University of Arizona, US); Jaxk Reeves (University of Georgia, US); Bronis R. de Supinski (Lawrence Livermore National Laboratory, US); Martin Schulz (Lawrence Livermore National Laboratory, US); Barry Rountree (University of Georgia, US)

Session 26
Architectural Support for Runtime Systems
Chair: Arun Rodrigues

A Low Cost Split-Issue Technique to Improve Performance of SMT Clustered VLIW Processors
Manoj Gupta (Universitat Politècnica de Catalunya, Spain); Fermín Sánchez (Universitat Politècnica de Catalunya, Spain); Josep Llosa (Universitat Politècnica de Catalunya, Spain)

Exploiting Inter-thread Temporal Locality for Chip Multithreading
Jiayuan Meng (University of Virginia, US); Jeremy Sheaffer (NVIDIA, US); Kevin Skadron (University of Virginia, US)

Profitability-Based Power Allocation for Speculative Multithreaded Systems
Polychronis Xekalakis (University of Edinburgh, UK); Nikolas Ioannou (University of Edinburgh, UK); Salman Khan (University of Edinburgh, UK); Marcelo Cintra (University of Edinburgh, UK)

Evaluating Standard-Based Self-Virtualizing Devices: A Performance Study on 10 GbE NICs with SR-IOV Support
Jiuxing Liu (IBM T.J. Watson Research Center, US)

Session 27
Client-Server System Management and Analysis
Chair: Chen Ding

QoS Assessment of WS-BPEL Processes through non-Markovian Stochastic Petri Nets
Dario Bruneo (Università di Messina, Italy); Salvatore Distefano (Università di Messina, Italy); Francesco Longo (Università di Messina, Italy); Marco Scarpa (Università di Messina, Italy)

Power-aware Resource Provisioning in Cluster Computing
Kaiqi Xiong (North Carolina State University, US)

Using the Middle Tier to Understand Cross-Tier Delay in a Multi-tier Application
Haichuan Wang (IBM Research, PRC); Qiming Teng (IBM Research, PRC); Xiao Zhong (IBM Research, PRC); Peter Sweeney (IBM T.J. Watson Research Center, US)

Service and Resource Discovery in Cycle-Sharing Environments with a Utility Algebra
João Nuno Silva (Technical University of Lisbon, Portugal); Paulo Ferreira (Technical University of Lisbon, Portugal); Luís Veiga (Technical University of Lisbon, Portugal)

Session 28
Parallel Graph Algorithms II
Chair: Padma Raghavan

Optimization of Linked List Prefix Computations on Multithreaded GPUs Using CUDA
Zheng Wei (University of Maryland, US); Joseph Jaja (University of Maryland, College Park, US)

Parallel External Memory Graph Algorithms
Lars Arge (Aarhus University, Denmark); Michael Goodrich (University of California, Irvine, US); Nodari Sitchinava (Aarhus University, Denmark)

Engineering a Scalable High Quality Graph Partitioner
Manuel Holtgrewe (University of Karlsruhe, Germany); Peter Sanders (University of Karlsruhe, Germany); Christian Schulz (University of Karlsruhe, Germany)

Lunch 12 Noon – 1:30 PM (on your own)

Parallel Sessions 29, 30, 31, & 32
1:30 PM – 3:30 PM

Session 29
Algorithms for Wireless Networks
Chair: Neeraj Mittal

Sparse Power-Efficient Topologies for Wireless Ad Hoc Sensor Networks
Amitabha Bagchi (Indian Institute of Technology, Delhi, India)

Contention-based Georouting with Guaranteed Delivery, Minimal Communication Overhead, and Shorter Paths in Wireless Sensor Networks
Stefan Rührup (OFFIS - Institute for Information Technology, Germany); Ivan Stojmenovic (University of Ottawa, Canada)

Midpoint Routing Algorithms for Delaunay Triangulations
Albert Zomaya (University of Sydney, Australia); Weisheng Si (University of Sydney, Australia)

A Local, Distributed Constant-Factor Approximation Algorithm for the Dynamic Facility Location Problem
Bastian Degener (University of Paderborn, Germany); Barbara Kempkes (University of Paderborn, Germany); Peter Pietrzyk (University of Paderborn, Germany)

Session 30
Analysis of Heterogeneity and Future Platforms
Chair: Richard Murphy

Toward Understanding Heterogeneity in Computing
Arnold Rosenberg (Colorado State University, US); Ron Chi-Lung Chiang (Colorado State University, US)

Balls into Non-uniform Bins
Petra Berenbrink (Simon Fraser University, Canada); André Brinkmann (University of Paderborn, Germany); Tom Friedetzky (Durham University, UK); Lars Nagel (Durham University, UK)

An Introductory Exascale Feasibility Study for FFTs and Multigrid
Hormozd Gahvari (University of Illinois at Urbana-Champaign, US); William Gropp (Argonne National Laboratory, US)

Session 31
Data Management
Chair: Zhihui Du

A Cost-Effective Strategy for Intermediate Data Storage in Scientific Cloud Workflow Systems
Dong Yuan (Swinburne University of Technology, Australia); Yun Yang (Swinburne University of Technology, Australia); Xiao Liu (Swinburne University of Technology, Australia); Jinjun Chen (Swinburne University of Technology, Australia)

BlobSeer: Bringing High Throughput under Heavy Concurrency to Hadoop Map/Reduce Applications
Bogdan Nicolae (University of Rennes, FR); Diana Moise (INRIA, Rennes, FR); Gabriel Antoniu (INRIA Rennes-Bretagne, FR); Luc Bougé (IRISA/Ecole Normale Superieure Cachan Brittany, FR); Matthieu Dorier (Ecole Normale Superieure Cachan, FR)

PreDatA - Preparatory Data Analytics on Peta-Scale Machines
Fang Zheng (Georgia Institute of Technology, US); Hasan Abbasi (University of Sydney, Australia); Ciprian Docan (Rutgers University, US); Jay Lofstead (Georgia Institute of Technology, US); Qing Liu (Oak Ridge National Laboratory, US); Scott Klasky (Oak Ridge National Laboratory, US); Manish Parashar (Rutgers University, US); Norbert Podhorszki (Oak Ridge National Laboratory, US); Karsten Schwan (Georgia Institute of Technology, US); Matt Wolf (Georgia Institute of Technology, US)

Reconciling Scratch Space Consumption, Exposure, and Volatility to Achieve Timely Staging of Job Input Data
Henry Monti (Virginia Tech, US); Ali R. Butt (Virginia Tech, US); Sudharshan S. Vazhkudai (Oak Ridge National Laboratory, US)

Session 32
Synchronization
Chair: Chen Ding

Hierarchical Phasers for Scalable Synchronization and Reductions in Dynamic Parallelism
Jun Shirako (Rice University, US); Vivek Sarkar (Rice University, US)

Clustering JVMs with Software Transactional Memory Support
Christos Kotselidis (University of Manchester, UK); Mikel Luján (University of Manchester, UK); Behram Khan (University of Manchester, UK); Mohammad Ansari (University of Manchester, UK); Konstantinos Malakasis (University of Manchester, UK); Chris Kirkham (University of Manchester, UK); Ian Watson (University of Manchester, UK)

Inter-Block GPU Communication via Fast Barrier Synchronization
Shucai Xiao (Virginia Tech, US); Wu-chun Feng (Virginia Tech, US)

A Lock-Free, Cache-Efficient Multi-Core Synchronization Mechanism for Line-Rate Network Traffic Monitoring
Patrick Pak-Ching Lee (The Chinese University of Hong Kong, Hong Kong); Tian Bu (Bell Labs, Lucent, US); Girish Chandranmenon (Lucent Technologies, US)

Afternoon Break 3:30 PM - 4:00 PM

Symposium Panel Discussion
Plenary Session
4:00 PM – 6:00 PM

Topic: Unconventional Wisdom in Multicore Computing

Panel Organizer:
Richard (Rich) Vuduc, Georgia Institute of Technology

Panelists:
Thomas Cormen, Dartmouth College
Markus Pueschel, Carnegie Mellon University
Karthikeyan (Karu) Sankaralingam, University of Wisconsin
Vivek Sarkar, Rice University
Jeffrey Vetter, Oak Ridge National Laboratory / Georgia Tech

Description:
The aim of this panel is to assess the prevailing assumptions in today's multicore software R&D and computing education, and ask in particular whether the parallel computing community's current trajectory will produce the "right" algorithmic and software infrastructure for tomorrow's applications.

All Symposium Tutorial
7:00 PM – 10:00 PM

(Open to all IPDPS 2010 attendees)

Title: Parallel Computing with CUDA

Presenter: Michael Garland, NVIDIA

Abstract: NVIDIA's CUDA architecture provides a powerful platform for writing highly parallel programs. By providing simple abstractions for hierarchical thread organization, memories, and synchronization, the CUDA programming model allows programmers to write scalable programs without the burden of learning a multitude of new programming constructs. The CUDA architecture can support many languages and programming environments, including C, Fortran, OpenCL, and DirectX Compute. In this tutorial, I will provide an overview of modern GPU processor design and its implications for successful parallel programming models. I will present the programming model defined by the CUDA architecture, and demonstrate how this is exposed in the C/C++ language. Finally, I will sketch some techniques for implementing common data-parallel algorithms in the CUDA model.


FRIDAY - 23 April 2010


WORKSHOPS
all day*

* See each individual workshop programs for schedule details

11	PDSEC	Workshop on Parallel and Distributed Scientific and Engineering Computing
12	PMEO	Performance Modeling, Evaluation, and Optimisation of Ubiquitous Computing and Networked Systems
13	DPDNS	Dependable Parallel, Distributed and Network-Centric Systems
14	HOTP2P	International Workshop on Hot Topics in Peer-to-Peer Systems
15	MTAAP	Workshop on Multi-Threaded Architectures and Applications
16	PDCoF	Workshop on Parallel and Distributed Computing in Finance
17	LSPP	Workshop on Large-Scale Parallel Processing
18	JSSPP	Workshop on Job Scheduling Strategies for Parallel Processing

IPDPS 2010
Information on Keynote Speakers & Tutorial Presenters

IPDPS 2010 Monday
TCPP INVITED SPEAKER

Craig Stunkel
IBM T.J. Watson Research Center
Topic: Exascale: Parallelism gone wild!

Abstract: Although Petaflop systems have only recently become a reality, scientists and governments are eagerly anticipating Exascale capabilities. There is a significant effort to achieve an Exaflop within the decade. However, unlike Petascale systems, Exascale systems will not be straightforward extrapolations from predecessors, and success is not a foregone conclusion. The chief obstacle is power consumption: systems must become enormously more energy-efficient to make Exascale practical. General-purpose cores are too power-hungry, and thus more specialized forms of parallelism must be exploited. However, it is highly likely that this will require changes in applications and programming models, especially considering that these systems will require tens of millions of cores. Complicating matters further, error detection and resiliency are already major issues for Petascale systems, and improving fault tolerance typically requires more energy. We will discuss innovations in technology, architecture, and software that attack these challenges.

Bio: Craig Stunkel is a senior manager at IBM's T. J. Watson Research Center in Yorktown Heights, NY. He received the B.S. and M.S. degrees from Oklahoma State University in 1982 and 1983, and the Ph.D. degree in electrical engineering from the University of Illinois, Urbana in 1990. Craig joined IBM Research in 1990. He contributed extensively to the interconnection networks of several generations of IBM supercomputing systems. Craig currently leads the Exploratory Scalable Architectures department, which is designing a workload-optimized supercomputer. His research interests include parallel architectures, applications, algorithms, and performance analysis.

IPDPS 2010 Tuesday
KEYNOTE SPEAKER

Burton Smith
Microsoft Corp.
Topic: Operating System Resource Management

Abstract: Resource management is the dynamic allocation and de-allocation by an operating system of processor cores, memory pages, and various types of bandwidth to computations that compete for those resources. The objective is to allocate resources so as to optimize responsiveness subject to the finite resources available. Historically, resource management solutions have been relatively unsystematic, and now the very assumptions underlying the traditional strategies no longer hold. First, applications increasingly differ in their ability to exploit resources, especially processor cores. Second, application responsiveness is approximately two-valued for Quality-of-Service (QoS) applications, depending on whether deadlines are met. Third, power and battery energy have become constrained. This talk will propose a scheme for addressing the operating system resource management problem.

Bio: Burton J. Smith, Technical Fellow for Microsoft Corporation, works with various groups within the company to help define and expand efforts in the areas of parallel and high performance computing. He reports to Dan Reed, head of the eXtreme Computing Group at Microsoft Research. He received the Seymour Cray Computer Engineering Award from the IEEE Computer Society and was elected to the National Academy of Engineering in 2003. He received the Eckert-Mauchly Award in 1991, given jointly by the Institute of Electrical and Electronics Engineers and the Association for Computing Machinery, and was elected a fellow of each organization in 1994. Smith attended the University of New Mexico, where he earned a BSEE degree, and the Massachusetts Institute of Technology, where he earned SM, EE, and Sc.D. degrees.

IPDPS 2010 Wednesday
KEYNOTE SPEAKER

Kunle Olukotun
Stanford University
Title: Chip Multiprocessor Architecture: A Programmability-Driven Approach

Abstract: Chip multiprocessors (CMPs) are now the dominant architecture in microprocessor design. However, in many software environments, due to the difficulty of writing correct and high-performing parallel programs, the capability of CMPs is underutilized. In this talk, I will argue that to enable more parallel programs to be written more easily, the design of CMPs should be driven by the needs of programmability. To demonstrate the benefits of this approach, I will describe example CMP designs where a focus on programmability has resulted in a simpler programming model and excellent performance. I will also describe a programming environment where programmability can be used to drive the development of future CMP architectures.

Bio: Kunle Olukotun is a Professor of Electrical Engineering and Computer Science at Stanford University, where he has been on the faculty since 1991. Olukotun is a pioneer in multicore processor design and is well known for leading the Stanford Hydra research project, which developed one of the first chip multiprocessors (CMPs) with support for thread-level speculation (TLS). Olukotun founded Afara Websystems to develop high-throughput, low-power server systems with chip multiprocessor technology. The Afara microprocessor technology, called Niagara, was acquired by Sun Microsystems. Niagara-based systems are one of Sun's fastest-ramping server products ever. Olukotun continues to advise and develop computer start-up companies. Olukotun is actively involved in research in computer architecture, parallel programming environments, and scalable parallel systems. Olukotun co-led the Transactional Coherence and Consistency (TCC) project, whose goal was to simplify parallel programming for average programmers. Olukotun currently directs the Stanford Pervasive Parallelism Lab (PPL), which seeks to proliferate the use of parallelism in all application areas using Domain Specific Languages (DSLs) and heterogeneous parallel architectures. Olukotun is an ACM Fellow (2006) and IEEE Fellow (2007) for contributions to multiprocessors on a chip and multithreaded processor design. He has authored many papers on CMP design and parallel software and recently completed a book on CMP architecture. Olukotun received his Ph.D. in Computer Engineering from The University of Michigan.

IPDPS 2010 Thursday
KEYNOTE SPEAKER

Peter Sanders
Universität Karlsruhe
Title: Engineering an Algorithm for Parallel External Sorting of Massive Data Sets

Abstract: This talk describes algorithm engineering (AE) as a methodology for algorithmic research where design, analysis, implementation and experimental evaluation of algorithms form a feedback cycle driving the development of efficient algorithms. Additional important components of the methodology include realistic models, algorithm libraries, and collections of realistic benchmark instances. We use one main example throughout this talk: sorting huge data sets using many multi-core processors and disks. The described system is the current record holder for the GraySort and MinuteSort sorting benchmarks.

Bio: Peter Sanders received his PhD in computer science from Universität Karlsruhe, Germany, in 1996. After seven years at the Max Planck Institute for Informatics in Saarbrücken, he returned to Karlsruhe as a full professor in 2004. In 2004 he was also awarded the Alcatel SEL Research Prize. He has more than 130 publications, mostly on algorithms for large data sets. These include parallel algorithms (load balancing), memory hierarchies, graph algorithms (route planning, graph partitioning, ...), randomized algorithms, full-text indices, and more. He is very active in promoting the methodology of algorithm engineering, which integrates design, analysis, implementation, and experimental evaluation of algorithms. For example, he currently heads a focus project on AE in Germany.

IPDPS 2010 Symposium Tutorial 1
Tuesday Evening

MapReduce Programming with Apache Hadoop
Presenter: Milind Bhandarkar, Yahoo! Inc.
Hadoop Solutions Architect

Abstract: Apache Hadoop has become the platform of choice for developing large-scale data-intensive applications. In this tutorial, we will discuss the design philosophy of Hadoop and describe how to design and develop Hadoop applications and higher-level application frameworks to crunch several terabytes of data, using anywhere from four to 4,000 computers. We will discuss solutions to common problems encountered in maximizing Hadoop application performance. We will also describe several frameworks and utilities developed using Hadoop that increase programmer productivity and application performance.

Bio: Milind Bhandarkar has been contributing to and working with Hadoop since version 0.1.0. He started the Yahoo! Grid solutions team focused on training, consulting, and supporting hundreds of new migrants to Hadoop. He has been focused on parallel programming languages and paradigms for over 20 years. He worked at the Center for Development of Advanced Computing (C-DAC), Center for Simulation of Advanced Rockets, Siebel Systems, and PathScale Inc. (acquired by QLogic) before settling at Yahoo! in 2005.

IPDPS 2010 Symposium Tutorial 2
Thursday Evening

Parallel Computing with CUDA
Presenter: Michael Garland, NVIDIA

Abstract: NVIDIA's CUDA architecture provides a powerful platform for writing highly parallel programs. By providing simple abstractions for hierarchical thread organization, memories, and synchronization, the CUDA programming model allows programmers to write scalable programs without the burden of learning a multitude of new programming constructs. The CUDA architecture can support many languages and programming environments, including C, Fortran, OpenCL, and DirectX Compute. In this tutorial, I will provide an overview of modern GPU processor design and its implications for successful parallel programming models. I will present the programming model defined by the CUDA architecture, and demonstrate how this is exposed in the C/C++ language. Finally, I will sketch some techniques for implementing common data-parallel algorithms in the CUDA model.

Bio: Michael Garland is a research scientist at NVIDIA and one of the founding members of NVIDIA Research. Dr. Garland holds B.S. and Ph.D. degrees in Computer Science from Carnegie Mellon University, and is an adjunct professor in the Department of Computer Science of the University of Illinois at Urbana-Champaign. He has published numerous articles in leading conferences and journals on a range of topics including surface simplification, remeshing, texture synthesis, novice-friendly modeling, free-form animation, scientific visualization, graph mining, and visualizing complex graphs. His current research interests include computer graphics and visualization, geometric algorithms, and parallel algorithms and programming models.