Preface | v
Conference Organisation | vii
Extreme Data Science at the National Energy Research Scientific Computing (NERSC) Center | 3 | (16)
Performance Analysis Techniques for the Exascale Co-Design Process | 19 | (16)
Parallel Programming Models
XMP-IO Function and Its Application to MapReduce on the K Computer | 35 | (8)
POLCA -- A Programming Model for Large Scale, Strongly Heterogeneous Infrastructures | 43 | (10)
Exploitation of Quality/Throughput Tradeoffs in Image Processing Through Invasive Computing | 53 | (10)
An Efficient Thread Mapping Strategy for Multiprogramming on Manycore Processors | 63 | (9)
A Scalable Farm Skeleton for Heterogeneous Parallel Programming | 72 | (10)
Towards Truly Boolean Arrays in Data-Parallel Array Processing | 82 | (10)
Deep Packet Inspection on Commodity Hardware Using FastFlow | 92 | (11)
Performance Analysis and Tools
Formalizing Bottlenecks in Task-Based OpenMP Applications | 103 | (10)
Characterizing Performance of Applications on Blue Gene/Q | 113 | (10)
Specification of Periscope Tuning Framework Plugins | 123 | (12)
Parallel Numerical Linear Algebra
On Using Speculative Computations for Parallel Reduction to Tridiagonal Form | 135 | (8)
Fast Approximate Solution of the Non-Symmetric Generalized Eigenvalue Problem on Multicore Architectures | 143 | (10)
Locality Optimization on a NUMA Architecture for Hybrid LU Factorization | 153 | (10)
Variable Block Algebraic Recursive Multilevel Solver (VBARMS) for Sparse Linear Systems | 163 | (10)
A Proposal of a Single-Synchronized Solver Suited to Large Scale Linear Systems on Parallel Computers with Distributed Memory | 173 | (10)
Approximate Inverse Preconditioners for Krylov Methods on Heterogeneous Parallel Computers | 183 | (10)
Cache and Energy Efficiency of Sparse Matrix-Vector Multiplication for Different BLAS Numerical Types with the RSB Format | 193 | (10)
Heterogeneous Sparse Matrix Computations on Hybrid GPU/CPU Platforms | 203 | (12)
MapReduce Streaming Algorithms for Laplace Relaxation on the Cloud | 215 | (10)
Space Exploration Using Parallel Orbits: A Study in Parallel Symbolic Computing | 225 | (8)
SFC-Based Communication Metadata Encoding for Adaptive Mesh Refinement | 233 | (10)
Graph Repartitioning with Both Dynamic Load and Dynamic Processor Allocation | 243 | (10)
ForestClaw: Hybrid Forest-of-Octrees AMR for Hyperbolic Conservation Laws | 253 | (10)
A Space-Time Parallel Solver for the Three-Dimensional Heat Equation | 263 | (10)
An Efficient Pipelined Implementation of Space-Time Parallel Applications | 273 | (12)
GPU Computing and Applications
Efficient GPU-Based Optimization of Volume Meshes | 285 | (10)
Fast Uniform Grid Construction on GPGPUs Using Atomic Operations | 295 | (10)
Porting Large HPC Applications to GPU Clusters: The Codes GENE and VERTEX | 305 | (10)
Numerical Simulation of the Low Compressible Viscous Gas Flows on GPU-Based Hybrid Supercomputers | 315 | (9)
Simulation of Multiphase Flows in the Subsurface on GPU-Based Supercomputers | 324 | (10)
Atomic Computing -- A Different Perspective on Massively Parallel Problems | 334 | (13)
Parallelisation and Optimisation of Large-Scale Applications
Accelerating SeisSol by Generating Vectorized Code for Sparse Matrix Operators | 347 | (10)
Experience with the MPI/StarSs Programming Model on a Large Production Code | 357 | (10)
Exploiting Data- and Task-Parallelism in the Solution of Riccati Equations on Multicore Servers and GPUs | 367 | (8)
Testing and Implementing Some New Algorithms Using the FFTW Library on Massively Parallel Supercomputers | 375 | (12)
Performance Measurements of MHD Simulation for Planetary Magnetosphere on Peta-Scale Computer FX10 | 387 | (8)
Parallel Simulations of Self-Propelled Microorganisms | 395 | (10)
Improving Communication Performance of Sparse Linear Algebra for an Atomistic Simulation Application | 405 | (10)
Nemorb's Fourier Filter and Distributed Matrix Transposition on Petaflop Systems | 415 | (12)
Parallel Computing Design for Exact Diagonalization Scheme on Multi-Band Hubbard Cluster Models | 427 | (12)
| 439 | (2)
Numerical Experiments with New Algorithms for Parallel Decomposition of Large Computational Meshes | 441 | (10)
A Distributed Algorithm for the Permutation Flow Shop Problem -- An Empirical Analysis | 451 | (10)
GPI2 for GPUs: A PGAS Framework for Efficient Communication in Hybrid Clusters | 461 | (10)
A Fault Tolerant Implementation of Multi-Level Monte Carlo Methods | 471 | (10)
High Performance CPU/GPU Multiresolution Poisson Solver | 481 | (12)
Mini-Symposium "Parallel Computing with FPGAs (ParaFPGA2013)"
ParaFPGA 2013: Harnessing Programs, Power and Performance in Parallel FPGA Applications | 493 | (4)
High-Level Synthesis Revised: Generation of FPGA Accelerators from a Domain-Specific Language Using the Polyhedron Model | 497 | (10)
Compiling a Dataflow-Based Language Abstraction onto an FPGA | 507 | (8)
Timing Driven C-Slow Retiming on RTL for MultiCores on FPGAs | 515 | (8)
Performance and Resource Modeling for FPGAs Using High-Level Synthesis Tools | 523 | (9)
Interactive Graph Cuts Using FPGA | 532 | (8)
An Image Filter System Based on Dynamic Partial Reconfiguration on FPGA | 540 | (8)
Investigating Energy Consumption of an SRAM-Based FPGA for Duty-Cycle Applications | 548 | (15)
Mini-Symposium "High-Dimensional Meets Parallel -- Algorithms and Applications"
High-Dimensional Meets Parallel: Algorithms and Applications | 563 | (1)
Global Communication Schemes for the Sparse Grid Combination Technique | 564 | (10)
Load Balancing for Massively Parallel Computations with the Sparse Grid Combination Technique | 574 | (10)
A Parallel Fault Tolerant Combination Technique | 584 | (9)
Managing Complexity in the Parallel Sparse Grid Combination Technique | 593 | (10)
Scalability and Fault Tolerance of the Alternating Direction Method of Multipliers for Sparse Grids | 603 | (12)
Mini-Symposium "Application Autotuning for HPC (Architectures)"
Mini-Symposium on Application Autotuning for HPC | 615 | (1)
Investigating Performance Benefits from OpenACC Kernel Directives | 616 | (10)
Application-Independent Autotuning for GPUs | 626 | (10)
Autotuning of Pattern Runtimes for Accelerated Parallel Systems | 636 | (10)
Empirical Performance Modeling of GPU Kernels Using Active Learning | 646 | (10)
Crowdtuning: Systematizing Auto-Tuning Using Predictive Modeling and Crowdsourcing | 656 | (12)
Autotuning the Energy Consumption | 668 | (10)
Potentials and Limitations for Energy Efficiency Auto-Tuning | 678 | (13)
Mini-Symposium "Extreme Scaling on SuperMUC"
Extreme Scaling Workshop at the LRZ | 691 | (7)
Extreme Scaling of Lattice Quantum Chromodynamics | 698 | (5)
End-to-End Parallel Simulations with APES | 703 | (9)
Towards Petaflops Capability of the VERTEX Supernova Code | 712 | (10)
Scaling of the GROMACS 4.6 Molecular Dynamics Code on SuperMUC | 722 | (9)
Mini-Symposium "Parallel Programming for Heterogeneous Architectures"
Parallel Programming for Heterogeneous Architectures | 731 | (2)
Execution Schemes for the NPB-MZ Benchmarks on Hybrid Architectures: A Comparative Study | 733 | (10)
Scilab on a Hybrid Platform | 743 | (10)
Divide and Conquer Parallelization of Finite Element Method Assembly | 753 | (10)
Cudagrind: A Valgrind Extension for CUDA | 763 | (10)
Profiling Hybrid HMPP Applications with Score-P on Heterogeneous Hardware | 773 | (10)
Binary Instrumentation for Scalable Performance Measurement of OpenMP Applications | 783 | (10)
A Case Study: Holistic Performance Analysis on Heterogeneous Architectures Using the Vampir Toolchain | 793 | (12)
Further Mini-Symposium Contributions
PRACE DECI (Distributed European Computing Initiative) Minisymposium | 805 | (8)
A Generic Prototype to Benchmark Algorithms and Data Structures for Hierarchical Hybrid Grids | 813 | (10)
Towards a Performance Engineering Workflow for OpenMP 4.0 | 823 | (10)
Theoretical Measures of Cache Efficiency for Tetrahedral Adaptive Meshes. A Case Study with a Quasi Space-Filling Curve Order | 833 | (10)
Author Index | 843