Preface |
|
v | |
|
|
|
|
|
Conference Organisation |
|
vii | |
|
|
|
The Future of High Performance Computing in Europe |
|
|
3 | (4) |
|
|
|
PRACE: Europe's Supercomputing Research Infrastructure |
|
|
7 | (12) |
|
|
|
|
Comparison of Admission Control Policies for Service Provision in Public Clouds |
|
|
19 | (10) |
|
|
|
|
Program Execution Models for Massively Parallel Computing |
|
|
29 | (12) |
|
|
Advances in Physarum Machines: Gates, Hulls, Mazes and Routing with Slime Mould |
|
|
41 | (16) |
|
|
|
|
Parallel Remeshing in Tree Codes for Vortex Particle Methods |
|
|
57 | (8) |
|
|
|
|
A Case Study of the Task-Based Parallel Wavefront Pattern |
|
|
65 | (8) |
|
|
|
|
|
|
Design and Evaluation of a Parallel Execution Framework for the CLEVER Clustering Algorithm |
|
|
73 | (8) |
|
|
|
Panitee Charoenrattanaruk |
|
|
|
|
|
The BL-Octree: An Efficient Data Structure for Discretized Block-Based Adaptive Mesh Refinement |
|
|
81 | (10) |
|
|
|
|
Automatic Parallelisation |
|
|
|
Towards Parallelizing Object-Oriented Programs Automatically |
|
|
91 | (8) |
|
|
|
Heap Dependence Analysis for Sequential Programs |
|
|
99 | (10) |
|
|
|
|
|
|
Energy Aware Consolidation Policies |
|
|
109 | (8) |
|
|
|
|
MapReduce for Scientific Computing - Viability for Non-Embarrassingly Parallel Algorithms |
|
|
117 | (8) |
|
|
|
|
An Autonomic Management System for Choreography-Based Workflows on Grids and Clouds |
|
|
125 | (8) |
|
|
|
Remote Utilization of OpenCL for Flexible Computation Offloading Using Embedded ECUs, CE Devices and Cloud Servers |
|
|
133 | (10) |
|
|
|
|
|
Monte Carlo Option Pricing with Graphics Processing Units |
|
|
143 | (8) |
|
|
|
Speeding-Up the Discrete Wavelet Transform Computation with Multicore and GPU-Based Algorithms |
|
|
151 | (8) |
|
|
|
|
|
Flexible Runtime Support for Efficient Skeleton Programming on Heterogeneous GPU-Based Systems |
|
|
159 | (8) |
|
|
|
|
Lattice Boltzmann for Large-Scale GPU Systems |
|
|
167 | (8) |
|
|
|
|
|
High-Fidelity Real-Time Antiship Cruise Missile Modeling on the GPU |
|
|
175 | (8) |
|
|
|
|
|
Egomotion Compensation and Moving Objects Detection Algorithm on GPU |
|
|
183 | (8) |
|
|
|
|
Jose Maria Gonzalez-Linares |
|
|
|
|
Performance Model for a Cellular Automata Implementation on a GPU Cluster |
|
|
191 | (8) |
|
|
|
|
GPU-Based Image Processing Use Cases: A High-Level Approach |
|
|
199 | (10) |
|
|
|
|
|
|
|
|
|
Parallel Likelihood Function Evaluation on Heterogeneous Many-Core Systems |
|
|
209 | (8) |
|
|
|
|
|
|
A Model-Based Software Generation Approach Qualified for Heterogeneous GPGPU-Enabled Platforms |
|
|
217 | (10) |
|
|
|
|
|
High Performance Applications |
|
|
|
Trajectory-Search on ScaleMP's vSMP Architecture |
|
|
227 | (8) |
|
|
|
|
|
|
|
|
Towards an Application of High-Performance Computer Systems to 3D Simulations of High Energy Density Plasmas in Z-Pinches |
|
|
235 | (8) |
|
|
|
|
|
|
|
|
|
|
|
|
|
On-the-Fly Singular Value Decomposition for Aitken's Acceleration of the Schwarz Domain Decomposition Method |
|
|
243 | (8) |
|
|
|
|
|
A Software Concept for Cache-Efficient Simulation on Dynamically Adaptive Structured Triangular Grids |
|
|
251 | (10) |
|
|
|
|
Performance Analysis of an Ultrasound Reconstruction Algorithm for Non Destructive Testing |
|
|
261 | (10) |
|
|
|
|
|
|
|
|
|
Corento - SIMD Parallelism from Portable High-Level Code |
|
|
271 | (10) |
|
|
|
|
A Parallel Benchmark Suite for Fortran Coarrays |
|
|
281 | (8) |
|
|
SAC on a Niagara T3-4 Server: Lessons and Experiences |
|
|
289 | (8) |
|
|
|
Declarative Parallel Programming for GPUs |
|
|
297 | (10) |
|
|
|
|
|
|
|
|
|
Balancing CPU Load for Irregular MPI Applications |
|
|
307 | (10) |
|
|
|
|
Reactive Rebalancing for Scientific Simulations Running on ExaScale High Performance Computers |
|
|
317 | (10) |
|
|
|
|
|
|
Processing with a Million Cores |
|
|
327 | (8) |
|
|
|
|
|
The Fresh Breeze Program Execution Model |
|
|
335 | (8) |
|
|
|
|
|
|
Using Fast and Accurate Simulation to Explore Hardware/Software Trade-Offs in the Multi-Core Era |
|
|
343 | (8) |
|
|
|
|
|
|
|
A Massive Data Parallel Computational Framework for Petascale/Exascale Hybrid Computer Systems |
|
|
351 | (10) |
|
|
|
|
|
|
|
|
|
|
|
The PEPPHER Approach to Programmability and Performance Portability for Heterogeneous Many-Core Architectures |
|
|
361 | (8) |
|
|
|
|
|
|
|
|
|
|
|
An Efficient Parallel Set Container for Multicore Architectures |
|
|
369 | (8) |
|
|
|
|
Use of High Accuracy and Interval Arithmetic on Multicore Processors |
|
|
377 | (8) |
|
|
Andriele Busatto do Carmo |
|
|
|
|
Engineering Concurrent Software Guided by Statistical Performance Analysis |
|
|
385 | (12) |
|
|
|
|
|
|
|
|
|
|
|
|
|
Solving the Generalized Symmetric Eigenvalue Problem Using Tile Algorithms on Multicore Architectures |
|
|
397 | (8) |
|
|
|
|
|
Improving Performance of Triangular Matrix-Vector BLAS Routines on GPUs |
|
|
405 | (8) |
|
|
|
Accelerating Grid Kernels for Virtual Screening on Graphics Processing Units |
|
|
413 | (8) |
|
|
|
|
Parallelism on the Nonnegative Matrix Factorization |
|
|
421 | (8) |
|
|
|
|
|
|
|
|
Exploiting Fine-Grain Parallelism in Recursive LU Factorization |
|
|
429 | (8) |
|
|
|
|
|
Parareal Acceleration of Matrix Multiplication |
|
|
437 | (10) |
|
|
|
|
|
A First Implementation of Parallel IO in Chapel for Block Data Distribution |
|
|
447 | (8) |
|
|
|
|
|
Optimizations for Two-Phase Collective I/O |
|
|
455 | (10) |
|
|
|
|
|
|
Performance Modelling and Analysis |
|
|
|
JuBE-Based Automatic Testing and Performance Measurement System for Fusion Codes |
|
|
465 | (8) |
|
|
|
|
|
|
Visualization of MPI(-IO) Datatypes |
|
|
473 | (8) |
|
|
|
Open Trace Format 2: The Next Generation of Scalable Trace Formats and Support Libraries |
|
|
481 | (10) |
|
|
|
|
|
|
|
Tools for Analyzing the Behavior and Performance of Parallel Applications |
|
|
491 | (8) |
|
|
Benchmarks Based on Anti-Parallel Patterns for the Evaluation of GPUs |
|
|
499 | (10) |
|
|
|
|
|
Data Parallel Skeletons for GPU Clusters and Multi-GPU Systems |
|
|
509 | (10) |
|
|
|
Network Monitoring on Multicores with Algorithmic Skeletons |
|
|
519 | (10) |
|
|
|
|
|
|
Experience Using Lazy Task Creation in OpenMP Task for the UTS Benchmark |
|
|
529 | (8) |
|
|
|
Folding Applications into High Dimensional Torus Networks |
|
|
537 | (8) |
|
|
Composable Parallelism Foundations in the Intel® Threading Building Blocks Task Scheduler |
|
|
545 | (12) |
|
|
|
|
|
Cray's Approach to Heterogeneous Computing |
|
|
557 | (8) |
|
|
|
Integrated Simulation Workflows in Computer Aided Engineering on HPC Resources |
|
|
565 | (10) |
|
|
|
|
Mini-Symposium "ParaFPGA" |
|
|
|
ParaFPGA 2011 - High Performance Computing with Multiple FPGAs: Design, Methodology and Applications |
|
|
575 | (4) |
|
|
|
|
A Framework for Self-Adaptive Collaborative Computing on Reconfigurable Platforms |
|
|
579 | (8) |
|
|
|
|
Accelerating HMMER Search Using FPGA Grid |
|
|
587 | (8) |
|
|
|
Reconfigurable Computing Cluster - A Five-Year Perspective of the Project |
|
|
595 | (8) |
|
|
|
|
From Mono-FPGA to Multi-FPGA Emulation Platform for NoC Performance Evaluations |
|
|
603 | (8) |
|
|
|
|
A Dynamically Reconfigurable Pattern Matcher for Regular Expressions on FPGA |
|
|
611 | (10) |
|
|
|
|
|
Mini-Symposium "Exascale" |
|
|
|
Hybrid Parallel Programming with MPI/StarSs |
|
|
621 | (8) |
|
|
|
|
|
|
GPI - Global Address Space Programming Interface - Experiences on Scalability |
|
|
629 | (10) |
|
|
TEMANEJO - A Debugger for Task Based Parallel Programming Models |
|
|
639 | (8) |
|
|
|
|
|
Characterizing I/O Performance Using the TAU Performance System |
|
|
647 | (10) |
|
|
|
|
|
Symmetric Rank-k Update on Clusters of Multicore Processors with SMPSs |
|
|
657 | (8) |
|
|
|
|
|
|
|
Author Index |
|
665 | |