IPDPS 2010 Presentations

1 IPDPS 2010 Presentations

Home for IPDPS 2010 presentations, including some screencasts and audio files used to work around the volcano-related travel disruptions. The tutorials and plenary sessions were recorded and are available online.

2 Keynotes

  • Exascale: Parallelism gone wild! Craig Stunkel (IBM T.J. Watson Research Center); TCPP INVITED SPEAKER
  • Operating System Resource Management. Burton Smith (Microsoft)
  • The new era in genomics: Opportunities and challenges for high performance computing. Srinivas Aluru (Iowa State University)
  • Where Is Your Dog's Belly Button? or IC-Scheduling Theory: A New Scheduling Paradigm for Task-Hungry Platforms. Arnold L. Rosenberg (Colorado State University and University of Massachusetts Amherst)

3 Multicore Panel

4 Tutorials

  • Milind Bhandarkar, MapReduce Programming with Apache Hadoop
  • Michael Garland, Parallel Computing with CUDA

5 Workshops

5.1 HCW: Heterogeneity in Computing Workshop

5.2 RAW: Reconfigurable Architectures Workshop

5.3 HIPS: Workshop on High-Level Parallel Programming Models & Supportive Environments

5.4 NIDISC: Workshop on Nature Inspired Distributed Computing

5.5 HiCOMB: Workshop on High Performance Computational Biology

5.6 APDCM: Advances in Parallel and Distributed Computing Models

5.7 CAC: Communication Architecture for Clusters

5.8 HPPAC: High-Performance, Power-Aware Computing

5.9 HPGC: High Performance Grid Computing

5.10 SMTPS: Workshop on System Management Techniques, Processes, and Services

5.11 PDSEC: Workshop on Parallel and Distributed Scientific and Engineering Computing

5.12 PMEO: Performance Modeling, Evaluation, and Optimisation of Ubiquitous Computing and Networked Systems

5.13 DPDNS: Dependable Parallel, Distributed and Network-Centric Systems

5.14 HOTP2P: International Workshop on Hot Topics in Peer-to-Peer Systems

5.15 MTAAP: Workshop on Multi-Threaded Architectures and Applications

5.16 PDCoF: Workshop on Parallel and Distributed Computing in Finance

5.17 LSPP: Workshop on Large-Scale Parallel Processing

5.18 JSSPP: Workshop on Job Scheduling Strategies for Parallel Processing

6 Sessions

6.1 Session 1: Algorithms for Network Management

Chair: Anne Benoit

6.2 Session 2: Scientific Computing with GPUs

Chair: Ling Zhou

  • Improving Numerical Reproducibility and Stability in Large-Scale Numerical Simulations on GPUs. Michela Taufer (University of Delaware, US); Philip Saponaro (University of Delaware, US); Omar Padron (Kean University, US); Sandeep Patel (University of Delaware, US)
  • Implementing the Himeno Benchmark with CUDA on GPU Clusters. Everett Phillips (NVIDIA, US); Massimiliano Fatica (NVIDIA, US)
  • Direct Self-Consistent Field Computations on GPU Clusters. (pptx) Guochun Shi, Volodymyr Kindratenko (National Center for Supercomputing Applications, US); Ivan Ufimtsev, Todd Martinez (Stanford University, US)
  • Parallelization of Tau-Leap Coarse-Grained Monte Carlo Simulations on GPUs. Lifan Xu (University of Delaware, US); Michela Taufer (University of Delaware, US); Stuart Collins (University of Delaware, US); Dionisios Vlachos (University of Delaware, US)

6.3 Session 3: Data Storage and Memory Systems

Chair: Bradley Kuszmaul

  • DEBAR: A Scalable High-Performance De-duplication Storage System for Backup and Archiving. Tianming Yang (Huazhong University of Science and Technology, PRC); Hong Jiang (University of Nebraska, US); Dan Feng (Huazhong University of Science and Technology, PRC); Zhongying Niu (Huazhong University of Science and Technology, PRC); Ke Zhou (Huazhong University of Science and Technology, PRC); Yaping Wan (Huazhong University of Science and Technology, PRC)
  • HPDA: A Hybrid Parity-based Disk Array for Enhanced Performance and Reliability. Bo Mao (Huazhong University of Science and Technology, PRC); Hong Jiang (University of Nebraska, US); Dan Feng (Huazhong University of Science and Technology, PRC); Suzhen Wu (Huazhong University of Science and Technology, PRC); Jianxi Chen (Huazhong University of Science and Technology, PRC); Lingfang Zeng (Huazhong University of Science and Technology, PRC); Lei Tian (Huazhong University of Science and Technology, PRC)
  • Fine-Grained QoS Scheduling for PCM-based Main Memory Systems. Ping Zhou (University of Pittsburgh, US); Yu Du (University of Pittsburgh, US); Youtao Zhang (University of Pittsburgh, US); Jun Yang (University of Pittsburgh, US)
  • Performance Impact of Resource Contention in Multicore Systems. Robert Hood (CSC-NASA Ames, US); Haoqiang Jin (NASA Ames Research Center, US); Piyush Mehrotra (NASA Ames Research Center, US); Johnny Chang (CSC-NASA Ames Research Center, US); Jahed Djomehri (NASA Ames Research Center, US); Sharad Gavali (NASA Ames Research Center, US); Dennis Jespersen (NASA Ames Research Center, US); Kenichi Taylor (Silicon Graphics International, US); Rupak Biswas (NASA Ames Research Center, US)

6.4 Session 4: Fault Tolerance

Chair: Almadena Chtchelkanova

  • Improving the Performance of Hypervisor-Based Fault Tolerance. Jun Zhu (Peking University, PRC); Wei Dong (Peking University, PRC); ZheFu Jiang (Peking University, PRC); Xiaogang Shi (Peking University, PRC); Zhen Xiao (Peking University, PRC); XiaoMing Li (Peking University, PRC)
  • Supporting Fault Tolerance in a Data-Intensive Computing Middleware. (ppt) Tekin Bicer (The Ohio State University, US); Wei Jiang (The Ohio State University, US); Gagan Agrawal (The Ohio State University, US)
  • A High-Performance Fault-Tolerant Software Framework for Memory on Commodity GPUs. Naoya Maruyama (Tokyo Institute of Technology, JPN); Akira Nukada (Tokyo Institute of Technology, JPN); Satoshi Matsuoka (Tokyo Institute of Technology, JPN)
  • Scalable Failure Recovery for High-performance Data Aggregation. Dorian Arnold (University of New Mexico, US); Barton Miller (University of Wisconsin, US)

6.5 Session 5: Sorting

Chair: George Biros

  • High Performance Comparison-Based Sorting Algorithm on Many-Core GPUs. (ppt) Xiaochun Ye (Chinese Academy of Sciences, PRC); Dongrui Fan (Chinese Academy of Sciences, PRC); Wei Lin (Chinese Academy of Sciences, PRC); Nan Yuan (Chinese Academy of Sciences, PRC); Paolo Ienne (EPFL, Switzerland)
  • GPU Sample Sort. (notes) Vitaly Osipov (Karlsruhe Institute of Technology, Germany); Peter Sanders (University of Karlsruhe, Germany); Nikolaj Leischner (University of Karlsruhe, Germany)
  • Highly Scalable Parallel Sorting. Edgar Solomonik (University of Illinois at Urbana-Champaign, US); Laxmikant Kale (University of Illinois at Urbana-Champaign, US)

6.6 Session 6: Scheduling

Chair: David Bunde

  • A Scheduling Framework for Large-Scale, Parallel, and Topology-Aware Applications. Valentin Kravtsov (Technion - Israel Institute of Technology, Israel); Pavel Bar (Technion - Israel Institute of Technology, Israel); David Carmeli (Technion - Israel Institute of Technology, Israel); Assaf Schuster (Technion - Israel Institute of Technology, Israel); Martin Swain (Technion - Israel Institute of Technology, Israel)
  • Load Regulating Algorithm for Static-Priority Task Scheduling on Multiprocessors. Risat Pathan (Chalmers University of Technology, Sweden); Jan Jonsson (Chalmers University of Technology, Sweden)
  • Scheduling Algorithms for Linear Workflow Optimization. Kunal Agrawal (Washington University in St. Louis, US); Anne Benoit (Ecole Normale Superieure de Lyon, FR); Loic Magnan (Ecole Normale Superieure de Lyon, FR); Yves Robert (Ecole Normale Superieure de Lyon, FR)
  • Hypergraph-Based Task-Bundle Scheduling Towards Efficiency and Fairness in Heterogeneous Distributed Systems. Han Zhao (Oklahoma State University, US); Xinxin Liu (Oklahoma State University, US); Xiaolin (Andy) Li (Oklahoma State University, US)

6.7 Session 7: Performance/Scalability Improvement for Scientific Applications

Chair: Srinivas Aluru

6.8 Session 8: Network Architecture and Algorithms

Chair: Neeraj Mittal

  • Achieve Constant Performance Guarantees using Asynchronous Crossbar Scheduling without Speedup. Deng Pan (Florida International University, US); Kia Makki (Florida International University, US); Niki Pissinou (Florida International University, US)
  • Distributive Waveband Assignment in Multi-granular Optical Networks. Yang Wang (Georgia State University, US); Xiaojun Cao (Georgia State University, US)
  • QoS Aware BiNoC Architecture. Shih-Hsin Lo (National Taiwan University, Taiwan); Ying-Cherng Lan (National Taiwan University, Taiwan); Hsin-Hsien Yeh (National Taiwan University, Taiwan); Wen-Chung Tsai (National Taiwan University, Taiwan); Yu Hen Hu (National Taiwan University, Taiwan); Sao-Jie Chen (National Taiwan University, Taiwan)
  • First Experiences with Congestion Control in InfiniBand Hardware. (ppt, wmv) Ernst Gran (Simula Research Laboratory, Norway); Magne Eimot (Simula Research Laboratory, Norway); Sven-Arne Reinemo (Simula Research Laboratory, Norway); Tor Skeie (Simula Research Laboratory, Norway); Olav Lysne (Simula Research Laboratory, Norway); Lars Paul Huse (Simula Research Laboratory, Norway)

6.9 Session 9: Software Support for Using GPUs

Chair: Anne Elster

  • Object-Oriented Stream Programming using Aspects. Mingliang Wang (Rutgers University, US); Manish Parashar (Rutgers University, US)
  • Optimal Loop Unrolling for GPGPU Programs. Giridhar Sreenivasa Murthy (The Ohio State University, US); Muthu Ravishankar (The Ohio State University, US); Muthu Manikandan Baskaran (The Ohio State University, US); Ponnuswamy Sadayappan (The Ohio State University, US)
  • Speculative Execution on Multi-GPU Systems. Gregory Diamos (Georgia Institute of Technology, US); Sudhakar Yalamanchili (Georgia Institute of Technology, US)
  • Dynamic Load Balancing on Single- and Multi-GPU Systems. Long Chen (University of Delaware, US); Oreste Villa (Pacific Northwest National Laboratory, US); Sriram Krishnamoorthy (Pacific Northwest National Laboratory, US); Guang Gao (University of Delaware, US)

6.10 Session 10: Performance Prediction and Benchmarking Tools

Chair: George Bosilca

  • Servet: A Benchmark Suite for Autotuning on Multicore Clusters. Jorge González-Domínguez (University of A Coruna, Spain); Guillermo Lopez Taboada (University of A Coruna, Spain); Basilio Fraguela (University of A Coruna, Spain); María J. Martín (University of A Coruna, Spain); Juan Tourino (University of A Coruna, Spain)
  • KRASH: Reproducible CPU Load Generation on Many-Cores Machines. Swann Perarnau (INRIA Moais Research Team, FR); Guillaume Huard (ID Laboratory, FR)
  • Power-aware MPI Task Aggregation Prediction for High-End Computing Systems. Dong Li (Virginia Tech, US); Dimitrios Nikolopoulos (Foundation for Research and Technology Hellas, Greece); Kirk Cameron (Virginia Tech, US); Bronis R. de Supinski (Lawrence Livermore National Laboratory, US); Martin Schulz (Lawrence Livermore National Laboratory, US)

6.11 Session 11: Resource Allocation

Chair: Anne Benoit

  • Varying Bandwidth Resource Allocation Problem with Bag Constraints. Venkatesan Chakaravarthy (IBM Research, India); Vinayaka Pandit (IBM Research, India); Yogish Sabharwal (IBM Research, India); Deva Seetharam (IBM Research, India)
  • Decentralized Resource Management for Multi-core Desktop Grids. Jaehwan Lee (University of Maryland, College Park, US); Pete Keleher (University of Maryland, US); Alan Sussman (University of Maryland, US)
  • Dynamic Fractional Resource Scheduling for HPC Workloads. Mark Lee Stillwell (University of Hawaii at Manoa, US); Frédéric Vivien (INRIA, FR); Henri Casanova (University of Hawaii at Manoa)
  • ADEPT Scalability Predictor in Support of Adaptive Resource Allocation. Arash Deshmeh (University of Windsor, Canada); Jacob Machina (University of Windsor, Canada); Angela Sodan (University of Windsor, Canada)

6.12 Session 12: Image Processing and Data Mining

Chair: David Konerding

  • Exploiting the Forgiving Nature of Applications for Scalable Parallel Execution. Jiayuan Meng (University of Virginia); Anand Raghunathan (NEC Research Labs, US); Srimat Chakradhar (NEC Research Labs, US); Surendra Byna (NEC Research Labs, US)
  • Fisheye Lens Distortion Correction on Multicore and Hardware Accelerator Platforms. Konstantis Daloukas (University of Thessaly, Greece); Christos Antonopoulos (University of Thessaly, Greece); Nikos Bellas (University of Thessaly, Greece); Sek Chai (Motorola, US)
  • Large-Scale Multi-Dimensional Document Clustering on GPU Clusters. Yongpeng Zhang (North Carolina State University, US); Frank Mueller (North Carolina State University, US); Xiaohui Cui (Oak Ridge National Laboratory, US); Thomas Potok (Oak Ridge National Laboratory, US)
  • eScience in the Cloud: A MODIS Satellite Data Reprojection and Reduction Pipeline in Windows Azure Platform. (pptx) Jie Li (University of Virginia, US); Deb Agarwal (Lawrence Berkeley National Laboratory, US); Marty Humphrey (University of Virginia, Charlottesville, US); Catharine van Ingen (Microsoft Research); Keith Jackson (Lawrence Berkeley National Laboratory, US); Youngryel Ryu (University of California at Berkeley, US)

6.13 Session 13: Transactional Memory

Chair: Anne Elster

  • Locality-Aware Adaptive Grain Signatures for Transactional Memories. Woojin Choi (University of Southern California, US); Jeffrey Draper (University of Southern California, US)
  • Dynamic Analysis of the Relay Cache-Coherence Protocol for Distributed Transactional Memory. Bo Zhang (Virginia Tech, US); Binoy Ravindran (Virginia Tech, US)
  • Runtime Checking of Serializability in Software Transactional Memory. Arnab Sinha (Princeton University, US); Sharad Malik (Princeton University, US)
  • Consistency in Hindsight, A Fully Decentralized STM Algorithm. (slidecast) Annette Bieniusa (University of Freiburg, Germany); Thomas Fuhrmann (Technische Universitat Munchen, Germany)

6.14 Session 14: Tools for Performance and Correctness Analysis

Chair: Almadena Chtchelkanova

  • Identifying Ad-hoc Synchronization for Enhanced Race Detection. Ali Jannesari (University of Karlsruhe, Germany); Walter F. Tichy (University of Karlsruhe, Germany)
  • Improving the Performance of Program Monitors with Compiler Support in Multi-Core Environment. Guojin He (University of Minnesota, US); Antonia Zhai (University of Minnesota, US)
  • On-Line Detection of Large-Scale Parallel Application's Structure. (proposed transcript doc) German Llort (Barcelona Supercomputing Center, Spain); Juan Gonzalez Garcia (Universitat Politècnica de Catalunya, Spain); Harald Servat (Barcelona Supercomputing Center, Spain); Judit Gimenez (Barcelona Supercomputing Center, Spain); Jesus Labarta (Barcelona Supercomputing Center, Spain)
  • Adaptive Sampling-Based Profiling Techniques for Optimizing the Distributed JVM Runtime. King Tim Lam (The University of Hong Kong, Hong Kong); Yang Luo (The University of Hong Kong, Hong Kong); Cho-Li Wang (The University of Hong Kong, Hong Kong)

6.15 Session 15: Parallel Linear Algebra I

Chair: Esmond Ng

  • Algorithmic Cholesky Factorization Fault Recovery. Douglas Hakkarinen (Colorado School of Mines, US); Zizhong Chen (Colorado School of Mines, US)
  • Analyzing the Soft-Error Resilience of Linear Solvers on Multicore Multiprocessors. Konrad Malkowski (The Pennsylvania State University); Padma Raghavan (The Pennsylvania State University); Mahmut Taylan Kandemir (The Pennsylvania State University)
  • A Parallel Architecture for Meaning Comparison. Suneil Mohan (Texas A&M University, US); Amitava Biswas (Texas A&M University, US); Aalap Tripathy (Texas A&M University, US); Jagannath Panigrahy (Texas A&M University, US); Rabi Mahapatra (Texas A&M University, US)

6.16 Plenary Session - Best Papers

Chair: Cynthia Phillips

  • Extreme Scale Computing: Modeling the Impact of System Noise in Multicore Clustered Systems. Seetharami R Seelam (IBM Research, US); Liana Fong (IBM T.J. Watson Research Center, US); Asser Tantawi (IBM T.J. Watson Research Center, US); John Lewars (IBM Systems and Technology Group, US); John Divirgilio (IBM, US); Kevin Gildea (IBM, US)
  • Oblivious Algorithms for Multicores and Network of Processors. Rezaul Chowdhury (University of Texas at Austin, US); Francesco Silvestri (University of Padova, Italy); Brandon Blakeley (University of Texas, US); Vijaya Ramachandran (University of Texas at Austin, US)
  • Analyzing and Adjusting User Runtime Estimates to Improve Job Scheduling on the Blue Gene/P. Wei Tang (Illinois Institute of Technology, US); Narayan Desai (Argonne National Laboratory, US); Daniel Buettner (Argonne National Laboratory, US); Zhiling Lan (Illinois Institute of Technology, US)
  • Performance Evaluation of Concurrent Collections on High-Performance Multicore Computing Systems. Aparna Chandramowlishwaran (Georgia Institute of Technology, US); Kathleen Knobe (Intel, US); Richard W. Vuduc (Georgia Institute of Technology, US)

6.17 Session 16: P2P Algorithms

Chair: Amitabha Bagchi

  • A Hybrid Interest Management Mechanism for Peer-to-Peer Networked Virtual Environments. Ke Pan (Nanyang Technological University, Singapore); Wentong Cai (Nanyang Technological University, Singapore); Xueyan Tang (Nanyang Technological University, Singapore); Suiping Zhou (Nanyang Technological University, Singapore); Stephen John Turner (Nanyang Technological University, Singapore)
  • Attack-Resistant Frequency Counting. Bo Wu (University of New Mexico, US); Valerie King (University of Victoria, Canada); Jared Saia (University of New Mexico, US)
  • Overlays with preferences: Approximation algorithms for matching with preference lists. Giorgos Georgiadis (Chalmers University of Technology, Sweden); Marina Papatriantafilou (Chalmers University of Technology, Sweden)
  • Analysis of Durability in Replicated Distributed Storage Systems. Joseph Pasquale (University of California, San Diego, US); Sriram Ramabhadran (University of California, San Diego, US)

6.18 Session 17: Parallel Solutions for String and Sequence Problems

Chair: Ruppa Thulasiram

  • Scalable Multi-Pipeline Architecture for High Performance Multi-Pattern String Matching. Weirong Jiang (University of Southern California, US); Yi-Hua Yang (University of Southern California, US); Viktor K. Prasanna (University of Southern California, US)
  • Head-Body Partitioned String Matching for Deep Packet Inspection with Scalable and Attack-Resilient Performance. Yi-Hua Yang (University of Southern California, US); Viktor K. Prasanna (University of Southern California, US); Chenqian Jiang (University of Southern California, US)
  • Parallel de novo Assembly of Large Genomes from High-Throughput Short Reads. Benjamin G. Jackson (AOL, US); Matthew Regennitter (Iowa State University, US); Xiao Yang (Iowa State University, US); Patrick Schnable (Iowa State University, US); Srinivas Aluru (Iowa State University, US)
  • Efficient Parallel Algorithms for Maximum-Density Segment Problem. Xue Wang (Georgia State University, US); Fasheng Qiu (Georgia State University, US); Sushil Prasad (Georgia State University, US); Guantao Chen (Georgia State University, US)

6.19 Session 18: Energy-aware Task Management

Chair: David Bunde

  • Hybrid MPI/OpenMP Power-aware Computing. Dong Li (Virginia Tech, US); Bronis R. de Supinski (Lawrence Livermore National Laboratory, US); Martin Schulz (Lawrence Livermore National Laboratory, US); Kirk Cameron (Virginia Tech, US); Dimitrios S. Nikolopoulos (Foundation for Research and Technology Hellas, Greece)
  • Performance and Energy Optimization of Concurrent Pipelined Applications. Anne Benoit (Ecole Normale Supérieure de Lyon, FR); Paul Renaud-Goud (Ecole Normale Supérieure de Lyon, FR); Yves Robert (Ecole Normale Supérieure de Lyon, FR)
  • Robust Control-theoretic Thermal Balancing for Server Clusters. Yong Fu (Washington University in St. Louis, US); Chenyang Lu (Washington University in St. Louis, US); Hongan Wang (Washington University in St. Louis, US)
  • A Simple Thermal Model for Multi-core Processors and Its Application to Slack Allocation. Zhe Wang (University of Florida, US); Sanjay Ranka (University of Florida, US)

6.20 Session 19: Parallel Operating Systems and System Software

Chair: George Bosilca

  • GenerOS: An Asymmetric Operating System Kernel for Multi-core Systems. Qingbo Yuan (Institute of Computing Technology, PRC); Jianbo Zhao (Institute of Computing Technology, PRC); Mingyu Chen (Institute of Computing Technology, PRC); Ninghui Sun (Institute of Computing Technology, PRC)
  • Palacios and Kitten: New High Performance Operating Systems for Scalable Virtualized and Native Supercomputing. John Lange (Northwestern University, US); Kevin Pedretti (Sandia National Laboratories, US); Trammell Hudson (Sandia National Laboratories, US); Peter Dinda (Northwestern University, US); Zheng Cui (University of New Mexico, US); Lei Xia (Northwestern University, US); Patrick Bridges (University of New Mexico, US); Andy Gocke (Northwestern University, US); Steven Jaconette (Northwestern University, US); Michael Levenhagen (Sandia National Laboratories, US); Ron Brightwell (Sandia National Laboratories, US)
  • MMT: Exploiting Fine-Grained Parallelism in Dynamic Memory Management. Devesh Tiwari (North Carolina State University, US); Sanghoon Lee (North Carolina State University, US); James Tuck (North Carolina State University, US); Yan Solihin (North Carolina State University, US)
  • Optimization of Applications with Non-blocking Neighborhood Collectives via Multisends on the Blue Gene/P Supercomputer. Sameer Kumar (IBM Research, US); Philip Heidelberger (IBM Research, US); Dong Chen (IBM Research, US); Michael Hines (IBM Research, US)

6.21 Session 20: Parallel Graph Algorithms I

Chair: Cynthia Phillips

  • A Multi-Source Label-Correcting Algorithm for the All-Pairs Shortest Paths Problem. Hiroki Yanagisawa (IBM, Japan)
  • Parallel Computation of Best Connections in Public Transportation Networks. Daniel Delling (Microsoft Research, Germany); Bastian Katz (Karlsruhe Institute of Technology, Germany); Thomas Pajor (Universitat Karlsruhe)
  • Dynamically Tuned Push-Relabel Algorithm for the Maximum Flow Problem on CPU-GPU-Hybrid Platforms. Zhengyu He (Georgia Institute of Technology, US); Bo Hong (Georgia Institute of Technology, US)
  • A Novel Application of Parallel Betweenness Centrality to Power Grid Contingency Analysis. Shuangshuang Jin (Pacific Northwest National Laboratory, US); Zhenyu Huang (Pacific Northwest National Laboratory, US); Yousu Chen (Pacific Northwest National Laboratory, US); Daniel Gerardo Chavarria (Pacific Northwest National Laboratory, US); John Feo (Pacific Northwest National Laboratory, US); Pak Wong (Pacific Northwest National Laboratory, US)

6.22 Session 21: Parallel Linear Algebra II

Chair: Esmond Ng

  • Adapting Communication-Avoiding LU and QR Factorizations to Multicore Architectures. Laura Grigori (INRIA, FR); Simplice Donfack (INRIA, FR); Alok Kumar Gupta (BCCS, Norway)
  • QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment. Emmanuel Agullo (University of Tennessee, US); Camille Coti (INRIA, Saclay-Ile de France, FR); Jack Dongarra (University of Tennessee, Knoxville, US); Thomas Herault (Universite Paris Sud (LRI), FR); Julien Langou (University of Colorado Denver, US)
  • Tile QR Factorization with Parallel Panel Processing for Multicore Architectures. Bilel Hadri (University of Tennessee, US); Hatem Ltaief (University of Tennessee, US); Emmanuel Agullo (University of Tennessee, US); Jack Dongarra (University of Tennessee, Knoxville, US)
  • Linpack Evaluation on a Supercomputer with Heterogeneous Accelerators. Toshio Endo (Tokyo Institute of Technology, Japan); Akira Nukada (Tokyo Institute of Technology, Japan); Satoshi Matsuoka (Tokyo Institute of Technology, Japan); Naoya Maruyama (Tokyo Institute of Technology, Japan)

6.23 Session 22: Caches and Caching

Chair: Richard Murphy

  • Adapting Cache Partitioning Algorithms to Pseudo-LRU Replacement Policies. Kamil Kedzierski (Technical University of Catalonia, UPC, Spain); Miquel Moreto (Universitat Politecnica de Catalunya, Spain); Francisco Cazorla (Barcelona Supercomputing Center); Mateo Valero (Technical University of Catalonia, Spain)
  • Exploiting Set-Level Non-Uniformity of Capacity Demand to Enhance CMP Cooperative Caching. Dongyuan Zhan (University of Nebraska at Lincoln, US); Hong Jiang (University of Nebraska at Lincoln, US); Sharad Seth (University of Nebraska at Lincoln, US)
  • Masking I/O Latency using Application Level I/O Caching and Prefetching on Blue Gene System. Seetharami Seelam (IBM T.J. Watson Research Center); I-Hsin Chung (IBM T.J. Watson Research Center); John Bauer (IBM T.J. Watson Research Center); Hui-Fang Wen (IBM T.J. Watson Research Center)
  • Intra-Application Cache Partitioning. Sai Prashanth Muralidhara (The Pennsylvania State University, US); Mahmut Taylan Kandemir (The Pennsylvania State University, US); Padma Raghavan (The Pennsylvania State University, US)

6.24 Session 23: Thread Scheduling

Chair: Guang Gao

  • SLAW: a Scalable Locality-aware Adaptive Work-stealing Scheduler. Yi Guo (Rice University); Jisheng Zhao (Rice University); Vincent Cave (Rice University); Vivek Sarkar (Rice University)
  • Executing Task Graphs Using Work-Stealing. Kunal Agrawal (Washington University in St. Louis, US); Charles Leiserson (Massachusetts Institute of Technology, US); Jim Sukha (Massachusetts Institute of Technology, US)
  • Structuring Execution of OpenMP Applications for Multicore Architectures. François Broquedis (University of Bordeaux, FR); Olivier Aumage (University of Bordeaux, FR); Brice Goglin (INRIA Bordeaux - Sud Ouest, FR); Samuel Thibault (University of Bordeaux, FR); Pierre-Andre Wacrenier (University of Bordeaux, FR); Raymond Namyst (University of Bordeaux, FR)
  • Oversubscription on Multicore Processors. Costin Iancu (Lawrence Berkeley National Laboratory); Steven Hofmeyr (Lawrence Berkeley National Laboratory); Yili Zheng (Lawrence Berkeley National Laboratory); Filip Blagojevic (Lawrence Berkeley National Laboratory)

6.25 Session 24: Distributed Algorithms

Chair: Amitabha Bagchi

  • A Scalable Algorithm for Maintaining Perpetual System Connectivity in Dynamic Distributed Systems. Tarun Bansal (The Ohio State University, US); Neeraj Mittal (The University of Texas at Dallas, US)
  • Algorithmic Mechanisms for Internet-based Master-Worker Computing with Untrusted and Selfish Workers. Antonio Fernández Anta (Universidad Rey Juan Carlos, Spain); Chryssis Georgiou (University of Cyprus, Cyprus); Miguel Mosteiro (Rutgers University, US and Universidad Rey Juan Carlos, Spain)
  • Stabilizing Pipelines for Streaming Applications. Andrew Berns (The University of Iowa, US); Anurag Dasgupta (The University of Iowa, US); Sukumar Ghosh (The University of Iowa, US)
  • A Dynamic Approach for Characterizing Collusion in Desktop Grids. Louis-Claude Canon (Nancy University); Emmanuel Jeannot (INRIA Bordeaux Sud-Ouest, FR); Jon Weissman (University of Minnesota, Twin Cities, US)

6.26 Session 25: Automatic Tuning and Automatic Parallelization

Chair: Guang Gao

  • Offline Library Adaptation Using Automatically Generated Heuristics. Frédéric de Mesmay (Carnegie Mellon University, US); Yevgen Voronenko (Carnegie Mellon University, US); Markus Pueschel (Carnegie Mellon University, US)
  • An Auto-Tuning Framework for Parallel Multicore Stencil Computations. Shoaib Kamil (Lawrence Berkeley National Laboratory, US); Cy Chan (Massachusetts Institute of Technology, US); Leonid Oliker (Lawrence Berkeley National Laboratory, US); John Shalf (Lawrence Berkeley National Laboratory, US); Samuel Williams (Lawrence Berkeley National Laboratory, US)
  • DynTile: Parametric Tiled Loop Generation for Parallel Execution on Multicore Processors. Albert Hartono (Ohio State University, US); Muthu Manikandan Baskaran (Ohio State University, US); J. Ram Ramanujan (Louisiana State University); Ponnuswamy Sadayappan (Ohio State University, US)
  • Using Focused Regression For Accurate Time-Constrained Scaling of Scientific Applications. Bradley Barnes (University of Georgia, US); Jeonifer Garren (University of Georgia, US); David Lowenthal (University of Arizona, US); Jaxk Reeves (University of Georgia, US); Bronis R. de Supinski (Lawrence Livermore National Laboratory, US); Martin Schulz (Lawrence Livermore National Laboratory, US); Barry Rountree (University of Georgia, US)

6.27 Session 26: Architectural Support for Runtime Systems

Chair: Arun Rodrigues

  • A Low Cost Split-Issue Technique to Improve Performance of SMT Clustered VLIW Processors. Manoj Gupta (Universitat Politècnica de Catalunya, Spain); Fermín Sánchez (Universitat Politècnica de Catalunya, Spain); Josep Llosa (Universitat Politècnica de Catalunya, Spain)
  • Exploiting Inter-thread Temporal Locality for Chip Multithreading. Jiayuan Meng (University of Virginia, US); Jeremy Sheaffer (NVIDIA, US); Kevin Skadron (University of Virginia, US)
  • Profitability-Based Power Allocation for Speculative Multithreaded Systems. Polychronis Xekalakis (University of Edinburgh, UK); Nikolas Ioannou (University of Edinburgh, UK); Salman Khan (University of Edinburgh, UK); Marcelo Cintra (University of Edinburgh, UK)
  • Evaluating Standard-Based Self-Virtualizing Devices: A Performance Study on 10 GbE NICs with SR-IOV Support. Jiuxing Liu (IBM T.J. Watson Research Center, US)

6.28 Session 27: Client-Server System Management and Analysis

Chair: Chen Ding

  • QoS Assessment of WS-BPEL Processes through non-Markovian Stochastic Petri Nets. Dario Bruneo (Universita di Messina, Italy); Salvatore Distefano (Universita di Messina, Italy); Francesco Longo (Universita di Messina, Italy); Marco Scarpa (Universita di Messina, Italy)
  • Power-aware Resource Provisioning in Cluster Computing. Kaiqi Xiong (North Carolina State University, US)
  • Using the Middle Tier to Understand Cross-Tier Delay in a Multi-tier Application. Haichuan Wang (IBM Research, PRC); Qiming Teng (IBM Research, PRC); Xiao Zhong (IBM Research, PRC); Peter Sweeney (IBM T.J. Watson Research Center, US)
  • Service and Resource Discovery in Cycle-Sharing Environments with a Utility Algebra. João Nuno Silva (Technical University of Lisbon, Portugal); Paulo Ferreira (Technical University of Lisbon, Portugal); Luís Veiga (Technical University of Lisbon, Portugal)

6.29 Session 28: Parallel Graph Algorithms II

Chair: Padma Raghavan

  • Optimization of Linked List Prefix Computations on Multithreaded GPUs Using CUDA. Zheng Wei (University of Maryland, US); Joseph Jaja (University of Maryland, College Park, US)
  • Parallel External Memory Graph Algorithms. Lars Arge (Aarhus University, Denmark); Michael Goodrich (University of California, Irvine, US); Nodari Sitchinava (Aarhus University, Denmark)
  • Engineering a Scalable High Quality Graph Partitioner. Manuel Holtgrewe (University of Karlsruhe, Germany); Peter Sanders (University of Karlsruhe, Germany); Christian Schulz (University of Karlsruhe, Germany)

6.30 Session 29: Algorithms for Wireless Networks

Chair: Neeraj Mittal

  • Sparse Power-Efficient Topologies for Wireless Ad Hoc Sensor Networks. Amitabha Bagchi (Indian Institute of Technology, Delhi, India)
  • Contention-based Georouting with Guaranteed Delivery, Minimal Communication Overhead, and Shorter Paths in Wireless Sensor Networks. Stefan Rührup (OFFIS - Institute for Information Technology, Germany); Ivan Stojmenovic (University of Ottawa, Canada)
  • Midpoint Routing Algorithms for Delaunay Triangulations. (ppt) Albert Zomaya (University of Sydney, Australia); Weisheng Si (University of Sydney, Australia)
  • A Local, Distributed Constant-Factor Approximation Algorithm for the Dynamic Facility Location Problem. Bastian Degener (University of Paderborn, Germany); Barbara Kempkes (University of Paderborn, Germany); Peter Pietrzyk (University of Paderborn, Germany)

6.31 Session 30: Analysis of heterogeneity and future platforms

Chair: Richard Murphy

6.32 Session 31: Data Management

Chair: Zhihui Du

  • A Cost-Effective Strategy for Intermediate Data Storage in Scientific Cloud Workflow Systems. Dong Yuan (Swinburne University of Technology, Australia); Yun Yang (Swinburne University of Technology, Australia); Xiao Liu (Swinburne University of Technology, Australia); Jinjun Chen (Swinburne University of Technology, Australia)
  • BlobSeer: Bringing High Throughput under Heavy Concurrency to Hadoop Map/Reduce Applications. Bogdan Nicolae (University of Rennes, FR); Diana Moise (INRIA, Rennes, FR); Gabriel Antoniu (INRIA Rennes-Bretagne, FR); Luc Bougé (IRISA/Ecole Normale Superieure Cachan Brittany, FR); Matthieu Dorier (Ecole Normale Superieure Cachan, FR)
  • PreDatA - Preparatory Data Analytics on Peta-Scale Machines. Fang Zheng (Georgia Institute of Technology, US); Hasan Abbasi (University of Sydney, Australia); Ciprian Docan (Rutgers University, US); Jay Lofstead (Georgia Institute of Technology, US); Qing Liu (Oak Ridge National Laboratory, US); Scott Klasky (Oak Ridge National Laboratory, US); Manish Parashar (Rutgers University, US); Norbert Podhorszki (Oak Ridge National Laboratory, US); Karsten Schwan (Georgia Institute of Technology, US); Matt Wolf (Georgia Institute of Technology, US)
  • Reconciling Scratch Space Consumption, Exposure, and Volatility to Achieve Timely Staging of Job Input Data. Henry Monti (Virginia Tech, US); Ali R Butt (Virginia Tech, US); Sudharshan S Vazhkudai (Oak Ridge National Laboratory, US)

6.33 Session 32: Synchronization

Chair: Chen Ding

  • Hierarchical Phasers for Scalable Synchronization and Reductions in Dynamic Parallelism. (pptx) Jun Shirako (Rice University, US); Vivek Sarkar (Rice University, US)
  • Clustering JVMs with Software Transactional Memory Support. Christos Kotselidis (University of Manchester, UK); Mikel Luján (University of Manchester, UK); Behram Khan (University of Manchester, UK); Mohammad Ansari (University of Manchester, UK); Konstantinos Malakasis (University of Manchester, UK); Chris Kirkham (University of Manchester, UK); Ian Watson (University of Manchester, UK)
  • Inter-Block GPU Communication via Fast Barrier Synchronization. Shucai Xiao (Virginia Tech, US); Wu-chun Feng (Virginia Tech, US)
  • A Lock-Free, Cache-Efficient Multi-Core Synchronization Mechanism for Line-Rate Network Traffic Monitoring. Patrick Pak-Ching Lee (The Chinese University of Hong Kong, Hong Kong); Tian Bu (Bell Labs, Lucent, US); Girish Chandranmenon (Lucent Technologies, US)

7 Posters

