Autotuning and Thread Mapping
An Analytical Model-Based Auto-tuning Framework for Locality-Aware Loop Scheduling | 3 | (18)
Performance, Design, and Autotuning of Batched GEMM for GPUs | 21 | (18)
TCU: A Multi-Objective Hardware Thread Mapping Unit for HPC Clusters | 39 | (22)

Data Locality and Decomposition
Dynamic Sparse-Matrix Allocation on GPUs | 61 | (20)
An Efficient Parallel Load-Balancing Framework for Orthogonal Decomposition of Geometrical Data | 81 | (17)
Parallel Community Detection Algorithm Using a Data Partitioning Strategy with Pairwise Subdomain Duplication | 98 | (18)
TiDA: High-Level Programming Abstractions for Data Locality Management | 116 | (23)
OpenAtom: Scalable Ab-Initio Molecular Dynamics with Diverse Capabilities | 139 | (20)
SPRITE: A Fast Parallel SNP Detection Pipeline | 159 | (22)
Predictive Modeling for Job Power Consumption in HPC Systems | 181 | (19)
Towards Machine Learning on the Automata Processor | 200 | (19)
AutoMOMML: Automatic Multi-objective Modeling with Machine Learning | 219 | (24)
Supercomputing Centers and Electricity Service Providers: A Geographically Distributed Perspective on Demand Management in Europe and the United States | 243 | (18)
Resource Management for Running HPC Applications in Container Clouds | 261 | (20)
Mitigating MPI Message Matching Misery | 281 | (19)
INAM2: InfiniBand Network Analysis and Monitoring with MPI | 300 | (21)
Comparing Runtime Systems with Exascale Ambitions Using the Parallel Research Kernels | 321 | (22)
High Order Seismic Simulations on the Intel Xeon Phi Processor (Knights Landing) | 343 | (20)
Leveraging a Cluster-Booster Architecture for Brain-Scale Simulations | 363 | (20)
Efficient and Predictable Group Communication for Manycore NoCs | 383 | (21)
Distributed Job Allocation for Large-Scale Manycores | 404 | (25)

Extreme-Scale Computations
Many-Core Acceleration of a Discrete Ordinates Transport Mini-App at Extreme Scale | 429 | (20)
Efficiency of High Order Spectral Element Methods on Petascale Architectures | 449 | (20)
Scalability of Partial Differential Equations Preconditioner Resilient to Soft and Hard Faults | 469 | (17)
Multi-versioning Performance Opportunities in BGAS System for Resilience | 486 | (19)

Author Index | 505