Evaluation of an efficient etack-RLE clustering concept for dynamically adaptive grids
SIAM Journal on Scientific Computing
Society for Industrial and Applied Mathematics
Abstract. One approach to tackle the challenge of efficient implementations for parallel PDE simulations on dynamically changing grids is the usage of space-filling curves (SFC). While SFC algorithms possess advantageous properties such as low memory requirements and close-to-optimal partitioning approaches with linear complexity, they require efficient communication strategies for keeping and utilizing the connectivity information, in particular for dynamically changing grids. Our approach is to use a sparse communication graph to store the connectivity information and to transfer data block-wise. This permits efficient generation of multiple partitions per memory context (denoted by clustering) which - in combination with a run-length encoding (RLE) - directly leads to elegant solutions for shared, distributed and hybrid parallelization and allows cluster-based optimizations. While previous work focused on specific aspects, we present in this paper an overall compact summary of the stack-RLE clustering approach completed by aspects on the vertex-based communication that ease up understanding the approach. The central contribution of this work is the proof of suitability of the stack-RLE clustering approach for an efficient realization of different, relevant building blocks of Scientific Computing methodology and real-life CSE applications: We show 95% strong scalability for small-scale scalability benchmarks on 512 cores and weak scalability of over 90% on 8192 cores for finite-volume solvers and changing grid structure in every time step; optimizations of simulation data backends by writer tasks; comparisons of analytical benchmarks to analyze the adaptivity criteria; and a Tsunami simulation as a representative real-world showcase of a wave propagation for our approach which reduces the overall workload by 95% for parallel fully-adaptive mesh refinement and, based on a comparison with SFC-ordered regular grid cells, reduces the computation time by a factor of 7.6 with improved results and a factor of 62.2 with results of similar accuracy of buoy station data
This work was partly supported by the German Research Foundation (DFG) as part of the Transregional Collaborative Research Centre “Invasive Computing” (SFB/TR 89).
This is the author accepted manuscript. The final version is available from the Society for Industrial and Applied Mathematics via the DOI in this record.
Vol. 38 (6), pp. C678–C712