Preface |
|
v | |
|
|
|
|
|
Conference Organisation |
|
vii | |
Invited Talks |
|
|
Bio-Inspired Massively-Parallel Computation |
|
|
3 | (8) |
|
|
Automatic Tuning of Task Scheduling Policies on Multicore Architectures |
|
|
11 | (14) |
|
|
|
|
|
Architectures and Performance Algorithms |
|
|
Algorithmic Scheme for Hybrid Computing with CPU, Xeon-Phi/MIC and GPU Devices on a Single Machine |
|
|
25 | (10) |
|
|
|
A Many-Core Machine Model for Designing Algorithms with Minimum Parallelism Overheads |
|
|
35 | (10) |
|
|
|
|
|
|
CPU Performance Analysis Using Score-P on PRIMEHPC FX100 Supercomputer |
|
|
45 | (8) |
|
|
Performance Improvements of Polydisperse DEM Simulations Using a Loose Octree Approach |
|
|
53 | (10) |
|
|
|
|
Execution Performance Analysis of the ABySS Genome Sequence Assembler Using Scalasca on the K Computer |
|
|
63 | (10) |
|
|
|
|
Performance Model Based on Memory Footprint for OpenMP Memory Bound Applications |
|
|
73 | (10) |
|
|
|
|
|
Evaluating OpenMP Performance on Thousands of Cores on the Numascale Architecture |
|
|
83 | (10) |
|
|
|
|
Acceleration of Large Scale OpenFOAM Simulations on Distributed Systems with Multicore CPUs and GPUs |
|
|
93 | (10) |
|
|
|
Optimized Variant-Selection Code Generation for Loops on Heterogeneous Multicore Systems |
|
|
103 | (10) |
|
|
|
MPI Communication on MPPA Many-Core NoC: Design, Modeling and Performance Issues |
|
|
113 | (10) |
|
|
|
|
Benoit Dupont de Dinechin |
|
|
|
Drivers for Device to Device Streaming |
|
|
123 | (12) |
|
|
Programming Models and Methods |
|
|
|
|
Portable Parallelization of the EDGE CFD Application for GPU-Based Systems Using the SkePU Skeleton Programming Library |
|
|
135 | (10) |
|
|
|
|
|
|
Structured Parallel Implementation of Tree Echo State Network Model Selection |
|
|
145 | (10) |
|
|
|
|
|
|
Java Implementation of Data Parallel Skeletons on GPUs |
|
|
155 | (10) |
|
|
|
Data Parallel Patterns in Erlang/OpenCL |
|
|
165 | (10) |
|
|
|
Hybrid Coarrays: A PGAS Feature for Many-Core Architectures |
|
|
175 | (10) |
|
|
|
|
|
Lapedo: Hybrid Skeletons for Programming Heterogeneous Multicore Machines in Erlang |
|
|
185 | (12) |
|
|
|
|
|
|
Evaluation of 3-D Stencil Codes on the Intel Xeon Phi Coprocessor |
|
|
197 | (10) |
|
|
|
|
|
Hierarchical Parallelism in a Physical Modelling Synthesis Code |
|
|
207 | (10) |
|
|
|
|
Harnessing CUDA Dynamic Parallelism for the Solution of Sparse Linear Systems |
|
|
217 | (10) |
|
|
|
|
|
Model-Driven Development of GPU Applications |
|
|
227 | (10) |
|
|
|
Exploring the Offload Execution Model in the Intel Xeon Phi via Matrix Inversion |
|
|
237 | (10) |
|
|
|
|
|
Programming GPUs with C++14 and Just-In-Time Compilation |
|
|
247 | (10) |
|
|
|
|
|
|
Active Packet Pacing as a Congestion Avoidance Technique in Interconnection Network |
|
|
257 | (8) |
|
|
Hybrid Parallelization of Hyper-Dimensional Vlasov Code with OpenMP Loop Collapse Directive |
|
|
265 | (10) |
|
|
|
Active Resource Management for Multi-Core Runtime Systems Serving Malleable Applications |
|
|
275 | (10) |
|
|
Improving Energy-Efficiency of Static Schedules by Core Consolidation and Switching Off Unused Cores |
|
|
285 | (10) |
|
|
|
|
Efficient Parallel Linked List Processing |
|
|
295 | (10) |
|
|
|
|
|
Streams as an Alternative to Halo Exchange |
|
|
305 | (12) |
|
|
|
An Embedded C++ Domain-Specific Language for Stream Parallelism |
|
|
317 | (10) |
|
|
|
|
|
Pipeline Template for Streaming Applications on Heterogeneous Chips |
|
|
327 | (12) |
|
|
|
|
|
|
Applications |
|
|
|
|
Efficient and Scalable Distributed-Memory Hierarchization Algorithms for the Sparse Grid Combination Technique |
|
|
339 | (10) |
|
|
|
Adapting a Finite-Element Type Solver for Bioelectromagnetics to the DEEP-ER Platform |
|
|
349 | (12) |
|
|
|
|
|
High Performance Eigenvalue Solver in Exact-Diagonalization Method for Hubbard Model on CUDA GPU |
|
|
361 | (10) |
|
|
|
|
A General Tridiagonal Solver for Coprocessors: Adapting g-Spike for the Intel Xeon Phi |
|
|
371 | (10) |
|
|
|
|
|
|
|
CAHTR: Communication-Avoiding Householder TRidiagonalization |
|
|
381 | (10) |
|
|
|
|
|
|
Simulation of External Aerodynamics of the DrivAer Model with the LBM on GPGPUs |
|
|
391 | (10) |
|
|
|
|
|
|
|
A Parallel Algorithm for Decomposition of Finite Languages |
|
|
401 | (10) |
|
|
|
|
Exploiting the Space Filling Curve Ordering of Particles in the Neighbour Search of Gadget3 |
|
|
411 | (10) |
|
|
|
|
|
|
On-the-Fly Memory Compression for Multibody Algorithms |
|
|
421 | (10) |
|
|
|
|
|
|
Flexible and Generic Workflow Management |
|
|
431 | (8) |
|
|
|
|
|
|
A Massively Parallel Barnes-Hut Tree Code with Dual Tree Traversal |
|
|
439 | (10) |
|
|
|
|
|
|
|
Performance Modeling of a Compressible Hydrodynamics Solver on Multicore CPUs |
|
|
449 | (10) |
|
|
|
|
|
Developing a Scalable and Flexible High-Resolution DNS Code for Two-Phase Flows |
|
|
459 | (10) |
|
|
|
|
|
|
FPGA Port of a Large Scientific Model from Legacy Code: The Emanuel Convection Scheme |
|
|
469 | (10) |
|
Kristian Thorin Hentschel |
|
|
|
|
|
|
How to Keep a Geographic Map Up-To-Date |
|
|
479 | (10) |
|
|
|
|
Static and Dynamic Big Data Partitioning on Apache Spark |
|
|
489 | (12) |
|
|
|
|
|
Mini-Symposium: ParaFPGA-2015: Parallel Computing with FPGAs |
|
|
ParaFPGA15: Exploring Threads and Trends in Programmable Hardware |
|
|
501 | (4) |
|
|
|
|
FPGAs as Components in Heterogeneous High-Performance Computing Systems: Raising the Abstraction Level |
|
|
505 | (10) |
|
|
|
FPGA Acceleration of SAT Preprocessor |
|
|
515 | (10) |
|
|
|
Leveraging FPGA Clusters for SAT Computations |
|
|
525 | (8) |
|
|
High-Speed Calculation of Convex Hull in 2D Images Using FPGA |
|
|
533 | (10) |
|
|
|
|
|
|
Workload Distribution and Balancing in FPGAs and CPUs with OpenCL and TBB |
|
|
543 | (10) |
|
|
|
|
|
A Run-Time System for Partially Reconfigurable FPGAs: The Case of STMicroelectronics SPEAr Board |
|
|
553 | (10) |
|
|
|
|
|
|
Exploring Automatically Generated Platforms in High Performance FPGAs |
|
|
563 | (10) |
|
|
|
|
|
|
Mini-Symposium: Experiences of Porting and Optimising Code for Xeon Phi Processors |
|
|
Symposium on Experiences of Porting and Optimising Code for Xeon Phi Processors |
|
|
573 | (2) |
|
|
|
|
|
Experiences Porting Production Codes to Xeon Phi Processors |
|
|
575 | (10) |
|
|
|
|
|
|
Preparing a Seismic Imaging Code for the Intel Knights Landing Xeon Phi Processor |
|
|
585 | (6) |
|
|
|
|
LU Factorisation on Xeon and Xeon Phi Processors |
|
|
591 | (12) |
|
|
Mini-Symposium: Coordination Programming |
|
|
Mini-Symposium on Coordination Programming - Preface |
|
|
603 | (2) |
|
|
|
Claud: Coordination, Locality and Universal Distribution |
|
|
605 | (10) |
|
|
|
|
|
|
Coordination with Structured Composition for Cyber-Physical Systems |
|
|
615 | (12) |
|
|
Mini-Symposium: Symposium on Parallel Solvers for Very Large PDE Based Systems in the Earth- and Atmospheric Sciences |
|
|
On Efficient Time Stepping Using the Discontinuous Galerkin Method for Numerical Weather Prediction |
|
|
627 | (10) |
|
|
|
Porting the COSMO Dynamical Core to Heterogeneous Platforms Using STELLA Library |
|
|
637 | (10) |
|
|
|
|
|
Towards Compiler-Agnostic Performance in Finite-Difference Codes |
|
|
647 | (14) |
|
|
|
|
|
Mini-Symposium: Is the Programming Environment Ready for Hybrid Supercomputers? |
|
|
Is the Programming Environment Ready for Hybrid Supercomputers? |
|
|
661 | (2) |
|
|
|
Utilizing Hybrid Programming Environments: CSCS Case Studies |
|
|
663 | (10) |
|
|
|
|
SYCL: Single-Source C++ Accelerator Programming |
|
|
673 | (10) |
|
|
|
Using Task-Based Parallelism Directly on the GPU for Automated Asynchronous Data Transfer |
|
|
683 | (14) |
|
|
|
|
A Strategy for Developing a Performance Portable Highly Scalable Application |
|
|
697 | (12) |
|
|
|
|
Mini-Symposium: Symposium on Energy and Resilience in Parallel Programming |
|
|
Mini-Symposium on Energy and Resilience in Parallel Programming |
|
|
709 | (2) |
|
Dimitrios S. Nikolopoulos |
|
|
|
Performance and Fault Tolerance of Preconditioned Iterative Solvers on Low-Power ARM Architectures |
|
|
711 | (10) |
|
|
|
|
Dimitrios S. Nikolopoulos |
|
|
|
Compiling for Resilience: The Performance Gap |
|
|
721 | (10) |
|
|
|
|
|
Automation of Significance Analyses with Interval Splitting |
|
|
731 | (10) |
|
|
|
|
Energy Minimization on Heterogeneous Systems Through Approximate Computing |
|
|
741 | (12) |
|
|
|
|
|
|
|
|
Landing Containment Domains on SWARM: Toward a Robust Resiliency Solution on a Dynamic Adaptive Runtime Machine |
|
|
753 | (12) |
|
|
|
|
Mini-Symposium: Symposium on Multi-System Application Extreme-Scaling Imperative |
|
|
MAXI - Multi-System Application Extreme-Scaling Imperative |
|
|
765 | (2) |
|
|
|
|
High Throughput Simulations of Two-Phase Flows on Blue Gene/Q |
|
|
767 | (10) |
|
|
|
|
|
|
|
|
|
Direct Numerical Simulation of Fluid Turbulence at Extreme Scale with psOpen |
|
|
777 | (10) |
|
|
|
|
|
|
|
Simulating Morphologically Detailed Neuronal Networks at Extreme Scale |
|
|
787 | (10) |
|
|
|
|
|
|
|
|
|
FE2TI: Computational Scale Bridging for Dual-Phase Steels |
|
|
797 | (10) |
|
|
|
|
Performance Evaluation of the LBM Solver Musubi on Various HPC Architectures |
|
|
807 | (10) |
|
|
|
|
|
Extreme-Scaling Applications 24/7 on JUQUEEN Blue Gene/Q |
|
|
817 | (10) |
|
|
|
|
Extreme Scale-Out SuperMUC Phase 2 - Lessons Learned |
|
|
827 | (10) |
|
"K-scale" Applications on the K Computer and Co-Design Effort for the Development of "post-K" |
|
|
837 | (10) |
|
Author Index |
|
847 | |