Evaluation of an efficient stack-RLE clustering concept for dynamically adaptive grids
Schreiber, M; Neckel, TN; Bungartz, HJB
Date: 29 November 2016
Journal
SIAM Journal on Scientific Computing
Publisher
Society for Industrial and Applied Mathematics
Publisher DOI
Abstract
Abstract.
One approach to tackle the challenge of efficient implementations for parallel PDE simulations
on dynamically changing grids is the usage of space-filling curves (SFC). While SFC algorithms
possess advantageous properties such as low memory requirements and close-to-optimal partitioning
approaches with linear complexity, they require efficient communication strategies for keeping and
utilizing the connectivity information, in particular for dynamically changing grids. Our approach
is to use a sparse communication graph to store the connectivity information and to transfer data
block-wise. This permits efficient generation of multiple partitions per memory context (denoted
by clustering) which, in combination with a run-length encoding (RLE), directly leads to elegant
solutions for shared, distributed and hybrid parallelization and allows cluster-based optimizations.
While previous work focused on specific aspects, we present in this paper an overall compact
summary of the stack-RLE clustering approach completed by aspects on the vertex-based communication
that ease understanding of the approach. The central contribution of this work is the proof
of suitability of the stack-RLE clustering approach for an efficient realization of different, relevant
building blocks of Scientific Computing methodology and real-life CSE applications: We show 95%
strong scalability for small-scale scalability benchmarks on 512 cores and weak scalability of over 90%
on 8192 cores for finite-volume solvers and changing grid structure in every time step; optimizations
of simulation data backends by writer tasks; comparisons of analytical benchmarks to analyze the
adaptivity criteria; and a tsunami simulation as a representative real-world showcase of wave propagation
for our approach which reduces the overall workload by 95% for parallel fully-adaptive mesh
refinement and, based on a comparison with SFC-ordered regular grid cells, reduces the computation
time by a factor of 7.6 with improved results and a factor of 62.2 with results of similar accuracy of
buoy station data.
Computer Science
Faculty of Environment, Science and Economy