IPDPS 2010 Presentations

1 IPDPS 2010 Presentations

Home for IPDPS 2010 presentations, including some screencasts and audio files used to work around the travel disruptions caused by the volcanic eruption.

2 Keynotes

  • Exascale: Parallelism gone wild! Craig Stunkel (IBM T.J. Watson Research Center); TCPP INVITED SPEAKER
  • Operating System Resource Management. Burton Smith (Microsoft)
  • The new era in genomics: Opportunities and challenges for high performance computing. Srinivas Aluru (Iowa State University)
  • Where Is Your Dog's Belly Button? or IC-Scheduling Theory: A New Scheduling Paradigm for Task-Hungry Platforms. Arnold L. Rosenberg (Colorado State University and University of Massachusetts Amherst)

3 Multicore Panel

4 Workshops

4.1 HCW: Heterogeneity in Computing Workshop

4.2 RAW: Reconfigurable Architectures Workshop

4.3 HIPS: Workshop on High-Level Parallel Programming Models & Supportive Environments

4.4 NIDISC: Workshop on Nature Inspired Distributed Computing

4.5 HiCOMB: Workshop on High Performance Computational Biology

4.6 APDCM: Advances in Parallel and Distributed Computing Models

4.7 CAC: Communication Architecture for Clusters

4.8 HPPAC: High-Performance, Power-Aware Computing

4.9 HPGC: High Performance Grid Computing

4.10 SMTPS: Workshop on System Management Techniques, Processes, and Services

4.11 PDSEC: Workshop on Parallel and Distributed Scientific and Engineering Computing

4.12 PMEO: Performance Modeling, Evaluation, and Optimisation of Ubiquitous Computing and Networked Systems

4.13 DPDNS: Dependable Parallel, Distributed and Network-Centric Systems

4.14 HOTP2P: International Workshop on Hot Topics in Peer-to-Peer Systems

4.15 MTAAP: Workshop on Multi-Threaded Architectures and Applications

4.16 PDCoF: Workshop on Parallel and Distributed Computing in Finance

4.17 LSPP: Workshop on Large-Scale Parallel Processing

4.18 JSSPP: Workshop on Job Scheduling Strategies for Parallel Processing

5 Sessions

5.1 Session 1: Algorithms for Network Management

Chair: Anne Benoit

5.2 Session 2: Scientific Computing with GPUs

Chair: Ling Zhou

  • Improving Numerical Reproducibility and Stability in Large-Scale Numerical Simulations on GPUs. Michela Taufer (University of Delaware, US); Philip Saponaro (University of Delaware, US); Omar Padron (Kean University, US); Sandeep Patel (University of Delaware, US)
  • Implementing the Himeno Benchmark with CUDA on GPU Clusters. Everett Phillips (NVIDIA, US); Massimiliano Fatica (NVIDIA, US)
  • Direct Self-Consistent Field Computations on GPU Clusters. (pptx) Guochun Shi, Volodymyr Kindratenko (National Center for Supercomputing Applications, US); Ivan Ufimtsev, Todd Martinez (Stanford University, US)
  • Parallelization of Tau-Leap Coarse-Grained Monte Carlo Simulations on GPUs. Lifan Xu (University of Delaware, US); Michela Taufer (University of Delaware, US); Stuart Collins (University of Delaware, US); Dionisios Vlachos (University of Delaware, US)

5.3 Session 3: Data Storage and Memory Systems

Chair: Bradley Kuszmaul

  • DEBAR: A Scalable High-Performance De-duplication Storage System for Backup and Archiving. Tianming Yang (Huazhong University of Science and Technology, PRC); Hong Jiang (University of Nebraska, US); Dan Feng (Huazhong University of Science and Technology, PRC); Zhongying Niu (Huazhong University of Science and Technology, PRC); Ke Zhou (Huazhong University of Science and Technology, PRC); Yaping Wan (Huazhong University of Science and Technology, PRC)
  • HPDA: A Hybrid Parity-based Disk Array for Enhanced Performance and Reliability. Bo Mao (Huazhong University of Science and Technology, PRC); Hong Jiang (University of Nebraska, US); Dan Feng (Huazhong University of Science and Technology, PRC); Suzhen Wu (Huazhong University of Science and Technology, PRC); Jianxi Chen (Huazhong University of Science and Technology, PRC); Lingfang Zeng (Huazhong University of Science and Technology, PRC); Lei Tian (Huazhong University of Science and Technology, PRC)
  • Fine-Grained QoS Scheduling for PCM-based Main Memory Systems. Ping Zhou (University of Pittsburgh, US); Yu Du (University of Pittsburgh, US); Youtao Zhang (University of Pittsburgh, US); Jun Yang (University of Pittsburgh, US)
  • Performance Impact of Resource Contention in Multicore Systems. Robert Hood (CSC-NASA Ames, US); Haoqiang Jin (NASA Ames Research Center, US); Piyush Mehrotra (NASA Ames Research Center, US); Johnny Chang (CSC-NASA Ames Research Center, US); Jahed Djomehri (NASA Ames Research Center, US); Sharad Gavali (NASA Ames Research Center, US); Dennis Jespersen (NASA Ames Research Center, US); Kenichi Taylor (Silicon Graphics International, US); Rupak Biswas (NASA Ames Research Center, US)

5.4 Session 4: Fault Tolerance

Chair: Almadena Chtchelkanova

  • Improving the Performance of Hypervisor-Based Fault Tolerance. Jun Zhu (Peking University, PRC); Wei Dong (Peking University, PRC); ZheFu Jiang (Peking University, PRC); Xiaogang Shi (Peking University, PRC); Zhen Xiao (Peking University, PRC); XiaoMing Li (Peking University, PRC)
  • Supporting Fault Tolerance in a Data-Intensive Computing Middleware. (ppt) Tekin Bicer (The Ohio State University, US); Wei Jiang (The Ohio State University, US); Gagan Agrawal (The Ohio State University, US)
  • A High-Performance Fault-Tolerant Software Framework for Memory on Commodity GPUs. Naoya Maruyama (Tokyo Institute of Technology, JPN); Akira Nukada (Tokyo Institute of Technology, JPN); Satoshi Matsuoka (Tokyo Institute of Technology, JPN)
  • Scalable Failure Recovery for High-performance Data Aggregation. Dorian Arnold (University of New Mexico, US); Barton Miller (University of Wisconsin, US)

5.5 Session 5: Sorting

Chair: George Biros

  • High Performance Comparison-Based Sorting Algorithm on Many-Core GPUs. (ppt) Xiaochun Ye (Chinese Academy of Sciences, PRC); Dongrui Fan (Chinese Academy of Sciences, PRC); Wei Lin (Chinese Academy of Sciences, PRC); Nan Yuan (Chinese Academy of Sciences, PRC); Paolo Ienne (EPFL, Switzerland)
  • GPU Sample Sort. (notes) Vitaly Osipov (Karlsruhe Institute of Technology, Germany); Peter Sanders (University of Karlsruhe, Germany); Nikolaj Leischner (University of Karlsruhe, Germany)
  • Highly Scalable Parallel Sorting. Edgar Solomonik (University of Illinois at Urbana-Champaign, US); Laxmikant Kale (University of Illinois at Urbana-Champaign, US)

5.6 Session 6: Scheduling

Chair: David Bunde

  • A Scheduling Framework for Large-Scale, Parallel, and Topology-Aware Applications. Valentin Kravtsov (Technion - Israel Institute of Technology, Israel); Pavel Bar (Technion - Israel Institute of Technology, Israel); David Carmeli (Technion - Israel Institute of Technology, Israel); Assaf Schuster (Technion - Israel Institute of Technology, Israel); Martin Swain (Technion - Israel Institute of Technology, Israel)
  • Load Regulating Algorithm for Static-Priority Task Scheduling on Multiprocessors. Risat Pathan (Chalmers University of Technology, Sweden); Jan Jonsson (Chalmers University of Technology, Sweden)
  • Scheduling Algorithms for Linear Workflow Optimization. Kunal Agrawal (Washington University in St. Louis, US); Anne Benoit (Ecole Normale Superieure de Lyon, FR); Loic Magnan (Ecole Normale Superieure de Lyon, FR); Yves Robert (Ecole Normale Superieure de Lyon, FR)
  • Hypergraph-Based Task-Bundle Scheduling Towards Efficiency and Fairness in Heterogeneous Distributed Systems. Han Zhao (Oklahoma State University, US); Xinxin Liu (Oklahoma State University, US); Xiaolin (Andy) Li (Oklahoma State University, US)

5.7 Session 7: Performance/Scalability Improvement for Scientific Applications

Chair: Srinivas Aluru

5.8 Session 8: Network Architecture and Algorithms

Chair: Neeraj Mittal

  • Achieve Constant Performance Guarantees using Asynchronous Crossbar Scheduling without Speedup. Deng Pan (Florida International University, US); Kia Makki (Florida International University, US); Niki Pissinou (Florida International University, US)
  • Distributive Waveband Assignment in Multi-granular Optical Networks. Yang Wang (Georgia State University, US); Xiaojun Cao (Georgia State University, US)
  • QoS Aware BiNoC Architecture. Shih-Hsin Lo (National Taiwan University, Taiwan); Ying-Cherng Lan (National Taiwan University, Taiwan); Hsin-Hsien Yeh (National Taiwan University, Taiwan); Wen-Chung Tsai (National Taiwan University, Taiwan); Yu Hen Hu (National Taiwan University, Taiwan); Sao-Jie Chen (National Taiwan University, Taiwan)
  • First Experiences with Congestion Control in InfiniBand Hardware. (ppt, wmv) Ernst Gran (Simula Research Laboratory, Norway); Magne Eimot (Simula Research Laboratory, Norway); Sven-Arne Reinemo (Simula Research Laboratory, Norway); Tor Skeie (Simula Research Laboratory, Norway); Olav Lysne (Simula Research Laboratory, Norway); Lars Paul Huse (Simula Research Laboratory, Norway)

5.9 Session 9: Software Support for Using GPUs

Chair: Anne Elster

  • Object-Oriented Stream Programming using Aspects. Mingliang Wang (Rutgers University, US); Manish Parashar (Rutgers University, US)
  • Optimal Loop Unrolling for GPGPU Programs. Giridhar Sreenivasa Murthy (The Ohio State University, US); Muthu Ravishankar (The Ohio State University, US); Muthu Manikandan Baskaran (The Ohio State University, US); Ponnuswamy Sadayappan (The Ohio State University, US)
  • Speculative Execution on Multi-GPU Systems. Gregory Diamos (Georgia Institute of Technology, US); Sudhakar Yalamanchili (Georgia Institute of Technology, US)
  • Dynamic Load Balancing on Single- and Multi-GPU Systems. Long Chen (University of Delaware, US); Oreste Villa (Pacific Northwest National Laboratory, US); Sriram Krishnamoorthy (Pacific Northwest National Laboratory, US); Guang Gao (University of Delaware, US)

5.10 Session 10: Performance Prediction and Benchmarking Tools

Chair: George Bosilca

  • Servet: A Benchmark Suite for Autotuning on Multicore Clusters. Jorge González-Domínguez (University of A Coruna, Spain); Guillermo Lopez Taboada (University of A Coruna, Spain); Basilio Fraguela (University of A Coruna, Spain); María J. Martín (University of A Coruna, Spain); Juan Tourino (University of A Coruna, Spain)
  • KRASH: Reproducible CPU Load Generation on Many-Cores Machines. Swann Perarnau (INRIA Moais Research Team, FR); Guillaume Huard (ID Laboratory, FR)
  • Power-aware MPI Task Aggregation Prediction for High-End Computing Systems. Dong Li (Virginia Tech, US); Dimitrios Nikolopoulos (Foundation for Research and Technology Hellas, Greece); Kirk Cameron (Virginia Tech, US); Bronis R. de Supinski (Lawrence Livermore National Laboratory, US); Martin Schulz (Lawrence Livermore National Laboratory, US)

5.11 Session 11: Resource Allocation

Chair: Anne Benoit

  • Varying Bandwidth Resource Allocation Problem with Bag Constraints. Venkatesan Chakaravarthy (IBM Research, India); Vinayaka Pandit (IBM Research, India); Yogish Sabharwal (IBM Research, India); Deva Seetharam (IBM Research, India)
  • Decentralized Resource Management for Multi-core Desktop Grids. Jaehwan Lee (University of Maryland, College Park, US); Pete Keleher (University of Maryland, US); Alan Sussman (University of Maryland, US)
  • Dynamic Fractional Resource Scheduling for HPC Workloads. Mark Lee Stillwell (University of Hawaii at Manoa, US); Frédéric Vivien (INRIA, FR); Henri Casanova (University of Hawaii at Manoa)
  • ADEPT Scalability Predictor in Support of Adaptive Resource Allocation. Arash Deshmeh (University of Windsor, Canada); Jacob Machina (University of Windsor, Canada); Angela Sodan (University of Windsor, Canada)

5.12 Session 12: Image Processing and Data Mining

Chair: David Konerding

  • Exploiting the Forgiving Nature of Applications for Scalable Parallel Execution. Jiayuan Meng (University of Virginia); Anand Raghunathan (NEC Research Labs, US); Srimat Chakradhar (NEC Research Labs, US); Surendra Byna (NEC Research Labs, US)
  • Fisheye Lens Distortion Correction on Multicore and Hardware Accelerator Platforms. Konstantis Daloukas (University of Thessaly, Greece); Christos Antonopoulos (University of Thessaly, Greece); Nikos Bellas (University of Thessaly, Greece); Sek Chai (Motorola, US)
  • Large-Scale Multi-Dimensional Document Clustering on GPU Clusters. Yongpeng Zhang (North Carolina State University, US); Frank Mueller (North Carolina State University, US); Xiaohui Cui (Oak Ridge National Laboratory, US); Thomas Potok (Oak Ridge National Laboratory, US)
  • eScience in the Cloud: A MODIS Satellite Data Reprojection and Reduction Pipeline in Windows Azure Platform. (pptx) Jie Li (University of Virginia, US); Deb Agarwal (Lawrence Berkeley National Laboratory, US); Marty Humphrey (University of Virginia, Charlottesville, US); Catharine van Ingen (Microsoft Research); Keith Jackson (Lawrence Berkeley National Laboratory, US); Youngryel Ryu (University of California at Berkeley, US)

5.13 Session 13: Transactional Memory

Chair: Anne Elster

  • Locality-Aware Adaptive Grain Signatures for Transactional Memories. Woojin Choi (University of Southern California, US); Jeffrey Draper (University of Southern California, US)
  • Dynamic Analysis of the Relay Cache-Coherence Protocol for Distributed Transactional Memory. Bo Zhang (Virginia Tech, US); Binoy Ravindran (Virginia Tech, US)
  • Runtime Checking of Serializability in Software Transactional Memory. Arnab Sinha (Princeton University, US); Sharad Malik (Princeton University, US)
  • Consistency in Hindsight, A Fully Decentralized STM Algorithm. (slidecast) Annette Bieniusa (University of Freiburg, Germany); Thomas Fuhrmann (Technische Universitat Munchen, Germany)

5.14 Session 14: Tools for Performance and Correctness Analysis

Chair: Almadena Chtchelkanova

  • Identifying Ad-hoc Synchronization for Enhanced Race Detection. Ali Jannesari (University of Karlsruhe, Germany); Walter F. Tichy (University of Karlsruhe, Germany)
  • Improving the Performance of Program Monitors with Compiler Support in Multi-Core Environment. Guojin He (University of Minnesota, US); Antonia Zhai (University of Minnesota, US)
  • On-Line Detection of Large-Scale Parallel Application's Structure. (proposed transcript doc) German Llort (Barcelona Supercomputing Center, Spain); Juan Gonzalez Garcia (Universitat Politècnica de Catalunya, Spain); Harald Servat (Barcelona Supercomputing Center, Spain); Judit Gimenez (Barcelona Supercomputing Center, Spain); Jesus Labarta (Barcelona Supercomputing Center, Spain)
  • Adaptive Sampling-Based Profiling Techniques for Optimizing the Distributed JVM Runtime. King Tim Lam (The University of Hong Kong, Hong Kong); Yang Luo (The University of Hong Kong, Hong Kong); Cho-Li Wang (The University of Hong Kong, Hong Kong)

5.15 Session 15: Parallel Linear Algebra I

Chair: Esmond Ng

  • Algorithmic Cholesky Factorization Fault Recovery. Douglas Hakkarinen (Colorado School of Mines, US); Zizhong Chen (Colorado School of Mines, US)
  • Analyzing the Soft-Error Resilience of Linear Solvers on Multicore Multiprocessors. Konrad Malkowski (The Pennsylvania State University); Padma Raghavan (The Pennsylvania State University); Mahmut Taylan Kandemir (The Pennsylvania State University)
  • A Parallel Architecture for Meaning Comparison. Suneil Mohan (Texas A&M University, US); Amitava Biswas (Texas A&M University, US); Aalap Tripathy (Texas A&M University, US); Jagannath Panigraphy (Texas A&M University, US); Rabi Mahapatra (Texas A&M University, US)

5.16 Plenary Session - Best Papers

Chair: Cynthia Phillips

  • Extreme Scale Computing: Modeling the Impact of System Noise in Multicore Clustered Systems. Seetharami R Seelam (IBM Research, US); Liana Fong (IBM T.J. Watson Research Center, US); Asser Tantawi (IBM T.J. Watson Research Center, US); John Lewars (IBM Systems and Technology Group, US); John Divirgilio (IBM, US); Kevin Gildea (IBM, US)
  • Oblivious Algorithms for Multicores and Network of Processors. Rezaul Chowdhury (University of Texas at Austin, US); Francesco Silvestri (University of Padova, Italy); Brandon Blakeley (University of Texas, US); Vijaya Ramachandran (University of Texas at Austin, US)
  • Analyzing and Adjusting User Runtime Estimates to Improve Job Scheduling on the Blue Gene/P. Wei Tang (Illinois Institute of Technology, US); Narayan Desai (Argonne National Laboratory, US); Daniel Buettner (Argonne National Laboratory, US); Zhiling Lan (Illinois Institute of Technology, US)
  • Performance Evaluation of Concurrent Collections on High-Performance Multicore Computing Systems. Aparna Chandramowlishwaran (Georgia Institute of Technology, US); Kathleen Knobe (Intel, US); Richard W. Vuduc (Georgia Institute of Technology, US)

5.17 Session 16: P2P Algorithms

Chair: Amitabha Bagchi

  • A Hybrid Interest Management Mechanism for Peer-to-Peer Networked Virtual Environments. Ke Pan (Nanyang Technological University, Singapore); Wentong Cai (Nanyang Technological University, Singapore); Xueyan Tang (Nanyang Technological University, Singapore); Suiping Zhou (Nanyang Technological University, Singapore); Stephen John Turner (Nanyang Technological University, Singapore)
  • Attack-Resistant Frequency Counting. Bo Wu (University of New Mexico, US); Valerie King (University of Victoria, Canada); Jared Saia (University of New Mexico, US)
  • Overlays with preferences: Approximation algorithms for matching with preference lists. Giorgos Georgiadis (Chalmers University of Technology, Sweden); Marina Papatriantafilou (Chalmers University of Technology, Sweden)
  • Analysis of Durability in Replicated Distributed Storage Systems. Joseph Pasquale (University of California, San Diego, US); Sriram Ramabhadran (University of California, San Diego, US)

5.18 Session 17: Parallel Solutions for String and Sequence Problems

Chair: Ruppa Thulasiram

  • Scalable Multi-Pipeline Architecture for High Performance Multi-Pattern String Matching. Weirong Jiang (University of Southern California, US); Yi-Hua Yang (University of Southern California, US); Viktor K. Prasanna (University of Southern California, US)
  • Head-Body Partitioned String Matching for Deep Packet Inspection with Scalable and Attack-Resilient Performance. Yi-Hua Yang (University of Southern California, US); Viktor K. Prasanna (University of Southern California, US); Chenqian Jiang (University of Southern California, US)
  • Parallel de novo Assembly of Large Genomes from High-Throughput Short Reads. Benjamin G. Jackson (AOL, US); Matthew Regennitter (Iowa State University, US); Xiao Yang (Iowa State University, US); Patrick Schnable (Iowa State University, US); Srinivas Aluru (Iowa State University, US)
  • Efficient Parallel Algorithms for Maximum-Density Segment Problem. Xue Wang (Georgia State University, US); Fasheng Qiu (Georgia State University, US); Sushil Prasad (Georgia State University, US); Guantao Chen (Georgia State University, US)

5.19 Session 18: Energy-aware Task Management

Chair: David Bunde

  • Hybrid MPI/OpenMP Power-aware Computing. Dong Li (Virginia Tech, US); Bronis R. de Supinski (Lawrence Livermore National Laboratory, US); Martin Schulz (Lawrence Livermore National Laboratory, US); Kirk Cameron (Virginia Tech, US); Dimitrios S. Nikolopoulos (Foundation for Research and Technology Hellas, Greece)
  • Performance and Energy Optimization of Concurrent Pipelined Applications. Anne Benoit (Ecole Normale Supérieure de Lyon, FR); Paul Renaud-Goud (Ecole Normale Supérieure de Lyon, FR); Yves Robert (Ecole Normale Supérieure de Lyon, FR)
  • Robust Control-theoretic Thermal Balancing for Server Clusters. Yong Fu (Washington University in St. Louis, US); Chenyang Lu (Washington University in St. Louis, US); Hongan Wang (Washington University in St. Louis, US)
  • A Simple Thermal Model for Multi-core Processors and Its Application to Slack Allocation. Zhe Wang (University of Florida, US); Sanjay Ranka (University of Florida, US)

5.20 Session 19: Parallel Operating Systems and System Software

Chair: George Bosilca

  • GenerOS: An Asymmetric Operating System Kernel for Multi-core Systems. Qingbo Yuan (Institute of Computing Technology, PRC); Jianbo Zhao (Institute of Computing Technology, PRC); Mingyu Chen (Institute of Computing Technology, PRC); Ninghui Sun (Institute of Computing Technology, PRC)
  • Palacios and Kitten: New High Performance Operating Systems for Scalable Virtualized and Native Supercomputing. John Lange (Northwestern University, US); Kevin Pedretti (Sandia National Laboratories, US); Trammell Hudson (Sandia National Laboratories, US); Peter Dinda (Northwestern University, US); Zheng Cui (University of New Mexico, US); Lei Xia (Northwestern University, US); Patrick Bridges (University of New Mexico, US); Andy Gocke (Northwestern University, US); Steven Jaconette (Northwestern University, US); Michael Levenhagen (Sandia National Laboratories, US); Ron Brightwell (Sandia National Laboratories, US)
  • MMT: Exploiting Fine-Grained Parallelism in Dynamic Memory Management. Devesh Tiwari (North Carolina State University, US); Sanghoon Lee (North Carolina State University, US); James Tuck (North Carolina State University, US); Yan Solihin (North Carolina State University, US)
  • Optimization of Applications with Non-blocking Neighborhood Collectives via Multisends on the Blue Gene/P Supercomputer. Sameer Kumar (IBM Research, US); Philip Heidelberger (IBM Research, US); Dong Chen (IBM Research, US); Michael Hines (IBM Research, US)

5.21 Session 20: Parallel Graph Algorithms I

Chair: Cynthia Phillips

  • A Multi-Source Label-Correcting Algorithm for the All-Pairs Shortest Paths Problem. Hiroki Yanagisawa (IBM, Japan)
  • Parallel Computation of Best Connections in Public Transportation Networks. Daniel Delling (Microsoft Research, Germany); Bastian Katz (Karlsruhe Institute of Technology, Germany); Thomas Pajor (Universitat Karlsruhe)
  • Dynamically Tuned Push-Relabel Algorithm for the Maximum Flow Problem on CPU-GPU-Hybrid Platforms. Zhengyu He (Georgia Institute of Technology, US); Bo Hong (Georgia Institute of Technology, US)
  • A Novel Application of Parallel Betweenness Centrality to Power Grid Contingency Analysis. Shuangshuang Jin (Pacific Northwest National Laboratory, US); Zhenyu Huang (Pacific Northwest National Laboratory, US); Yousu Chen (Pacific Northwest National Laboratory, US); Daniel Gerardo Chavarria (Pacific Northwest National Laboratory, US); John Feo (Pacific Northwest National Laboratory, US); Pak Wong (Pacific Northwest National Laboratory, US)

5.22 Session 21: Parallel Linear Algebra II

Chair: Esmond Ng

  • Adapting Communication-Avoiding LU and QR Factorizations to Multicore Architectures. Laura Grigori (INRIA, FR); Simplice Donfack (INRIA, FR); Alok Kumar Gupta (BCCS, Norway)
  • QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment. Emmanuel Agullo (University of Tennessee, US); Camille Coti (INRIA, Saclay-Ile de France, FR); Jack Dongarra (University of Tennessee, Knoxville, US); Thomas Herault (Universite Paris Sud (LRI), FR); Julien Langou (University of Colorado Denver, US)
  • Tile QR Factorization with Parallel Panel Processing for Multicore Architectures. Bilel Hadri (University of Tennessee, US); Hatem Ltaief (University of Tennessee, US); Emmanuel Agullo (University of Tennessee, US); Jack Dongarra (University of Tennessee, Knoxville, US)
  • Linpack Evaluation on a Supercomputer with Heterogeneous Accelerators. Toshio Endo (Tokyo Institute of Technology, Japan); Akira Nukada (Tokyo Institute of Technology, Japan); Satoshi Matsuoka (Tokyo Institute of Technology, Japan); Naoya Maruyama (Tokyo Institute of Technology, Japan)

5.23 Session 22: Caches and Caching

Chair: Richard Murphy

  • Adapting Cache Partitioning Algorithms to Pseudo-LRU Replacement Policies. Kamil Kedzierski (Technical University of Catalonia, UPC, Spain); Miquel Moreto (Universitat Politecnica de Catalunya, Spain); Francisco Cazorla (Barcelona Supercomputing Center); Mateo Valero (Technical University of Catalonia, Spain)
  • Exploiting Set-Level Non-Uniformity of Capacity Demand to Enhance CMP Cooperative Caching. Dongyuan Zhan (University of Nebraska at Lincoln, US); Hong Jiang (University of Nebraska at Lincoln, US); Sharad Seth (University of Nebraska at Lincoln, US)
  • Masking I/O Latency using Application Level I/O Caching and Prefetching on Blue Gene System. Seetharami Seelam (IBM T.J. Watson Research Center); I-Hsin Chung (IBM T.J. Watson Research Center); John Bauer (IBM T.J. Watson Research Center); Hui-Fang Wen (IBM T.J. Watson Research Center)
  • Intra-Application Cache Partitioning. Sai Prashanth Muralidhara (The Pennsylvania State University, US); Mahmut Taylan Kandemir (The Pennsylvania State University, US); Padma Raghavan (The Pennsylvania State University, US)

5.24 Session 23: Thread Scheduling

Chair: Guang Gao

  • SLAW: a Scalable Locality-aware Adaptive Work-stealing Scheduler. Yi Guo (Rice University); Jisheng Zhao (Rice University); Vincent Cave (Rice University); Vivek Sarkar (Rice University)
  • Executing Task Graphs Using Work-Stealing. Kunal Agrawal (Washington University in St. Louis, US); Charles Leiserson (Massachusetts Institute of Technology, US); Jim Sukha (Massachusetts Institute of Technology, US)
  • Structuring Execution of OpenMP Applications for Multicore Architectures. François Broquedis (University of Bordeaux, FR); Olivier Aumage (University of Bordeaux, FR); Brice Goglin (INRIA Bordeaux - Sud Ouest, FR); Samuel Thibault (University of Bordeaux, FR); Pierre-Andre Wacrenier (University of Bordeaux, FR); Raymond Namyst (University of Bordeaux, FR)
  • Oversubscription on Multicore Processors. Costin Iancu (Lawrence Berkeley National Laboratory); Steven Hofmeyr (Lawrence Berkeley National Laboratory); Yili Zheng (Lawrence Berkeley National Laboratory); Filip Blagojevic (Lawrence Berkeley National Laboratory)

5.25 Session 24: Distributed Algorithms

Chair: Amitabha Bagchi

  • A Scalable Algorithm for Maintaining Perpetual System Connectivity in Dynamic Distributed Systems. Tarun Bansal (The Ohio State University, US); Neeraj Mittal (The University of Texas at Dallas, US)
  • Algorithmic Mechanisms for Internet-based Master-Worker Computing with Untrusted and Selfish Workers. Antonio Fernández Anta (Universidad Rey Juan Carlos, Spain); Chryssis Georgiou (University of Cyprus, Cyprus); Miguel Mosteiro (Rutgers University, US and Universidad Rey Juan Carlos, Spain)
  • Stabilizing Pipelines for Streaming Applications. Andrew Berns (The University of Iowa, US); Anurag Dasgupta (The University of Iowa, US); Sukumar Ghosh (The University of Iowa, US)
  • A Dynamic Approach for Characterizing Collusion in Desktop Grids. Louis-Claude Canon (Nancy University); Emmanuel Jeannot (INRIA Bordeaux Sud-Ouest, FR); Jon Weissman (University of Minnesota, Twin Cities, US)

5.26 Session 25: Automatic Tuning and Automatic Parallelization

Chair: Guang Gao

  • Offline Library Adaptation Using Automatically Generated Heuristics. Frédéric de Mesmay (Carnegie Mellon University, US); Yevgen Voronenko (Carnegie Mellon University, US); Markus Pueschel (Carnegie Mellon University, US)
  • An Auto-Tuning Framework for Parallel Multicore Stencil Computations. Shoaib Kamil (Lawrence Berkeley National Laboratory, US); Cy Chan (Massachusetts Institute of Technology, US); Leonid Oliker (Lawrence Berkeley National Laboratory, US); John Shalf (Lawrence Berkeley National Laboratory, US); Samuel Williams (Lawrence Berkeley National Laboratory, US)
  • DynTile: Parametric Tiled Loop Generation for Parallel Execution on Multicore Processors. Albert Hartono (Ohio State University, US); Muthu Manikandan Baskaran (Ohio State University, US); J. Ram Ramanujan (Louisiana State University); Ponnuswamy Sadayappan (Ohio State University, US)
  • Using Focused Regression For Accurate Time-Constrained Scaling of Scientific Applications. Bradley Barnes (University of Georgia, US); Jeonifer Garren (University of Georgia, US); David Lowenthal (University of Arizona, US); Jaxk Reeves (University of Georgia, US); Bronis R. de Supinski (Lawrence Livermore National Laboratory, US); Martin Schulz (Lawrence Livermore National Laboratory, US); Barry Rountree (University of Georgia, US)

5.27 Session 26: Architectural Support for Runtime Systems

Chair: Arun Rodrigues

  • A Low Cost Split-Issue Technique to Improve Performance of SMT Clustered VLIW Processors. Manoj Gupta (Universitat Politècnica de Catalunya, Spain); Fermín Sánchez (Universitat Politècnica de Catalunya, Spain); Josep Llosa (Universitat Politècnica de Catalunya, Spain)
  • Exploiting Inter-thread Temporal Locality for Chip Multithreading. Jiayuan Meng (University of Virginia, US); Jeremy Sheaffer (NVIDIA, US); Kevin Skadron (University of Virginia, US)
  • Profitability-Based Power Allocation for Speculative Multithreaded Systems. Polychronis Xekalakis (University of Edinburgh, UK); Nikolas Ioannou (University of Edinburgh, UK); Salman Khan (University of Edinburgh, UK); Marcelo Cintra (University of Edinburgh, UK)
  • Evaluating Standard-Based Self-Virtualizing Devices: A Performance Study on 10 GbE NICs with SR-IOV Support. Jiuxing Liu (IBM T.J. Watson Research Center, US)

5.28 Session 27: Client-Server System Management and Analysis

Chair: Chen Ding

  • QoS Assessment of WS-BPEL Processes through non-Markovian Stochastic Petri Nets. Dario Bruneo (Universita di Messina, Italy); Salvatore Distefano (Universita di Messina, Italy); Francesco Longo (Universita di Messina, Italy); Marco Scarpa (Universita di Messina, Italy)
  • Power-aware Resource Provisioning in Cluster Computing. Kaiqi Xiong (North Carolina State University, US)
  • Using the Middle Tier to Understand Cross-Tier Delay in a Multi-tier Application. Haichuan Wang (IBM Research, PRC); Qiming Teng (IBM Research, PRC); Xiao Zhong (IBM Research, PRC); Peter Sweeney (IBM T.J. Watson Research Center, US)
  • Service and Resource Discovery in Cycle-Sharing Environments with a Utility Algebra. João Nuno Silva (Technical University of Lisbon, Portugal); Paulo Ferreira (Technical University of Lisbon, Portugal); Luís Veiga (Technical University of Lisbon, Portugal)

5.29 Session 28: Parallel Graph Algorithms II

Chair: Padma Raghavan

  • Optimization of Linked List Prefix Computations on Multithreaded GPUs Using CUDA. Zheng Wei (University of Maryland, US); Joseph Jaja (University of Maryland, College Park, US)
  • Parallel External Memory Graph Algorithms. Lars Arge (Aarhus University, Denmark); Michael Goodrich (University of California, Irvine, US); Nodari Sitchinava (Aarhus University, Denmark)
  • Engineering a Scalable High Quality Graph Partitioner. Manuel Holtgrewe (University of Karlsruhe, Germany); Peter Sanders (University of Karlsruhe, Germany); Christian Schulz (University of Karlsruhe, Germany)

5.30 Session 29: Algorithms for Wireless Networks

Chair: Neeraj Mittal

  • Sparse Power-Efficient Topologies for Wireless Ad Hoc Sensor Networks. Amitabha Bagchi (Indian Institute of Technology, Delhi, India)
  • Contention-based Georouting with Guaranteed Delivery, Minimal Communication Overhead, and Shorter Paths in Wireless Sensor Networks. Stefan Rührup (OFFIS - Institute for Information Technology, Germany); Ivan Stojmenovic (University of Ottawa, Canada)
  • Midpoint Routing Algorithms for Delaunay Triangulations. (ppt) Albert Zomaya (University of Sydney, Australia); Weisheng Si (University of Sydney, Australia)
  • A Local, Distributed Constant-Factor Approximation Algorithm for the Dynamic Facility Location Problem. Bastian Degener (University of Paderborn, Germany); Barbara Kempkes (University of Paderborn, Germany); Peter Pietrzyk (University of Paderborn, Germany)

5.31 Session 30: Analysis of heterogeneity and future platforms

Chair: Richard Murphy

5.32 Session 31: Data Management

Chair: Zhihui Du

  • A Cost-Effective Strategy for Intermediate Data Storage in Scientific Cloud Workflow Systems. Dong Yuan (Swinburne University of Technology, Australia); Yun Yang (Swinburne University of Technology, Australia); Xiao Liu (Swinburne University of Technology, Australia); Jinjun Chen (Swinburne University of Technology, Australia)
  • BlobSeer: Bringing High Throughput under Heavy Concurrency to Hadoop Map/Reduce Applications. Bogdan Nicolae (University of Rennes, FR); Diana Moise (INRIA, Rennes, FR); Gabriel Antoniu (INRIA Rennes-Bretagne, FR); Luc Bougé (IRISA/Ecole Normale Superieure Cachan Brittany, FR); Matthieu Dorier (Ecole Normale Superieure Cachan, FR)
  • PreDatA - Preparatory Data Analytics on Peta-Scale Machines. Fang Zheng (Georgia Institute of Technology, US); Hasan Abbasi (University of Sydney, Australia); Ciprian Docan (Rutgers University, US); Jay Lofstead (Georgia Institute of Technology, US); Qing Liu (Oak Ridge National Laboratory, US); Scott Klasky (Oak Ridge National Laboratory, US); Manish Parashar (Rutgers University, US); Norbert Podhorszki (Oak Ridge National Laboratory, US); Karsten Schwan (Georgia Institute of Technology, US); Matt Wolf (Georgia Institute of Technology, US)
  • Reconciling Scratch Space Consumption, Exposure, and Volatility to Achieve Timely Staging of Job Input Data. Henry Monti (Virginia Tech, US); Ali R Butt (Virginia Tech, US); Sudharshan S Vazhkudai (Oak Ridge National Laboratory, US)

5.33 Session 32: Synchronization

Chair: Chen Ding

  • Hierarchical Phasers for Scalable Synchronization and Reductions in Dynamic Parallelism. (pptx) Jun Shirako (Rice University, US); Vivek Sarkar (Rice University, US)
  • Clustering JVMs with Software Transactional Memory Support. Christos Kotselidis (University of Manchester, UK); Mikel Luján (University of Manchester, UK); Behram Khan (University of Manchester, UK); Mohammad Ansari (University of Manchester, UK); Konstantinos Malakasis (University of Manchester, UK); Chris Kirkham (University of Manchester, UK); Ian Watson (University of Manchester, UK)
  • Inter-Block GPU Communication via Fast Barrier Synchronization. Shucai Xiao (Virginia Tech, US); Wu-chun Feng (Virginia Tech, US)
  • A Lock-Free, Cache-Efficient Multi-Core Synchronization Mechanism for Line-Rate Network Traffic Monitoring. Patrick Pak-Ching Lee (The Chinese University of Hong Kong, Hong Kong); Tian Bu (Bell Labs, Lucent, US); Girish Chandranmenon (Lucent Technologies, US)

6 Posters

