dc.contributor.author | Riesinger, CR | |
dc.contributor.author | Bakhtiari, AB | |
dc.contributor.author | Schreiber, M | |
dc.contributor.author | Neumann, PN | |
dc.contributor.author | Bungartz, HJB | |
dc.date.accessioned | 2017-11-27T08:31:15Z | |
dc.date.issued | 2017-11-30 | |
dc.description.abstract | Heterogeneous clusters are a widely utilized class of supercomputers assembled from
different types of computing devices, for instance CPUs and GPUs, providing a huge computational
potential. Programming them in a scalable way exploiting the maximal performance introduces
numerous challenges such as optimizations for different computing devices, dealing with multiple
levels of parallelism, the application of different programming models, work distribution, and hiding
of communication with computation. We utilize the lattice Boltzmann method for fluid flow as
a representative of a scientific computing application and develop a holistic implementation for
large-scale CPU/GPU heterogeneous clusters. We review and combine a set of best practices and
techniques ranging from optimizations for the particular computing devices to the orchestration
of tens of thousands of CPU cores and thousands of GPUs. Eventually, we come up with
an implementation using all the available computational resources for the lattice Boltzmann
method operators. Our approach shows excellent scalability behavior making it future-proof for
heterogeneous clusters of the upcoming architectures on the exaFLOPS scale. Parallel efficiencies of
more than 90% are achieved leading to 2,604.72 GLUPS utilizing 24,576 CPU cores and 2,048 GPUs of
the CPU/GPU heterogeneous cluster Piz Daint and computing more than 6.8 · 10^9
lattice cells. | en_GB |
dc.description.sponsorship | This work was supported by the German Research Foundation (DFG) as part of the
Transregional Collaborative Research Centre “Invasive Computing” (SFB/TR 89). In addition, this work was
supported by a grant from the Swiss National Supercomputing Centre (CSCS) under project ID d68. We further
thank the Max Planck Computing & Data Facility (MPCDF) and the Global Scientific Information and Computing
Center (GSIC) for providing computational resources. | en_GB |
dc.identifier.citation | Vol. 5 (4), article 48 | en_GB |
dc.identifier.doi | 10.3390/computation5040048 | |
dc.identifier.uri | http://hdl.handle.net/10871/30463 | |
dc.language.iso | en | en_GB |
dc.publisher | MDPI | en_GB |
dc.rights | © 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/) | en_GB |
dc.subject | GPU clusters | en_GB |
dc.subject | heterogeneous clusters | en_GB |
dc.subject | hybrid implementation | en_GB |
dc.subject | lattice Boltzmann method | en_GB |
dc.subject | multilevel parallelism | en_GB |
dc.subject | petascale | en_GB |
dc.subject | resource assignment | en_GB |
dc.subject | scalability | en_GB |
dc.title | A holistic scalable implementation approach of the lattice Boltzmann method for CPU/GPU heterogeneous clusters | en_GB |
dc.type | Article | en_GB |
dc.identifier.issn | 2079-3197 | |
dc.description | This is the author accepted manuscript. The final version is available from MDPI via the DOI in this record. | en_GB |
dc.identifier.journal | Computation | en_GB |