
dc.contributor.author: Riesinger, CR
dc.contributor.author: Bakhtiari, AB
dc.contributor.author: Schreiber, M
dc.contributor.author: Neumann, PN
dc.contributor.author: Bungartz, HJB
dc.date.accessioned: 2017-11-27T08:31:15Z
dc.date.issued: 2017-11-30
dc.description.abstract: Heterogeneous clusters are a widely used class of supercomputers assembled from different types of computing devices, for instance CPUs and GPUs, and they offer huge computational potential. Programming them in a scalable way that exploits their maximal performance introduces numerous challenges, such as optimizing for different computing devices, dealing with multiple levels of parallelism, applying different programming models, distributing work, and hiding communication behind computation. We use the lattice Boltzmann method for fluid flow as a representative scientific computing application and develop a holistic implementation for large-scale CPU/GPU heterogeneous clusters. We review and combine a set of best practices and techniques ranging from optimizations for the particular computing devices to the orchestration of tens of thousands of CPU cores and thousands of GPUs. The result is an implementation that uses all available computational resources for the lattice Boltzmann method operators. Our approach shows excellent scalability, which makes it future-proof for upcoming exaFLOPS-scale heterogeneous cluster architectures. Parallel efficiencies of more than 90% are achieved, reaching 2,604.72 GLUPS (giga lattice updates per second) on 24,576 CPU cores and 2,048 GPUs of the CPU/GPU heterogeneous cluster Piz Daint for a domain of more than 6.8 · 10⁹ lattice cells. [en_GB]
dc.description.sponsorship: This work was supported by the German Research Foundation (DFG) as part of the Transregional Collaborative Research Centre “Invasive Computing” (SFB/TR 89). In addition, this work was supported by a grant from the Swiss National Supercomputing Centre (CSCS) under project ID d68. We further thank the Max Planck Computing & Data Facility (MPCDF) and the Global Scientific Information and Computing Center (GSIC) for providing computational resources. [en_GB]
dc.identifier.citation: Vol. 5 (4), article 48 [en_GB]
dc.identifier.doi: 10.3390/computation5040048
dc.identifier.uri: http://hdl.handle.net/10871/30463
dc.language.iso: en [en_GB]
dc.publisher: MDPI [en_GB]
dc.rights: © 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/) [en_GB]
dc.subject: GPU clusters [en_GB]
dc.subject: heterogeneous clusters [en_GB]
dc.subject: hybrid implementation [en_GB]
dc.subject: lattice Boltzmann method [en_GB]
dc.subject: multilevel parallelism [en_GB]
dc.subject: petascale [en_GB]
dc.subject: resource assignment [en_GB]
dc.subject: scalability [en_GB]
dc.title: A holistic scalable implementation approach of the lattice Boltzmann method for CPU/GPU heterogeneous clusters [en_GB]
dc.type: Article [en_GB]
dc.identifier.issn: 2079-3197
dc.description: This is the author accepted manuscript. The final version is available from MDPI via the DOI in this record. [en_GB]
dc.identifier.journal: Computation [en_GB]
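As a rough consistency check on the throughput figure quoted in the abstract, the sketch below assumes the usual definition of GLUPS, i.e. lattice cell updates per second divided by 10⁹; the record itself does not spell the definition out. The helper names `glups` and `seconds_per_step` are hypothetical and not taken from the paper. It also assumes that the quoted 2,604.72 GLUPS refers to the run on more than 6.8 · 10⁹ cells. Because the cell count is only a lower bound, the implied duration of one time step (about 2.6 ms) is a lower bound as well.

```python
def glups(cells: float, steps: int, seconds: float) -> float:
    """Giga lattice updates per second: total cell updates per second / 1e9."""
    return cells * steps / seconds / 1e9


def seconds_per_step(cells: float, glups_rate: float) -> float:
    """Invert the GLUPS definition to get the wall time of a single time step."""
    return cells / (glups_rate * 1e9)


if __name__ == "__main__":
    cells = 6.8e9     # lower bound on the domain size quoted in the abstract
    rate = 2604.72    # GLUPS quoted in the abstract (assumed to apply to this domain)
    t = seconds_per_step(cells, rate)
    print(f"~{t * 1e3:.2f} ms per time step")  # prints ~2.61 ms per time step
    # Round trip: one step in time t over the same domain gives back the quoted rate.
    assert abs(glups(cells, 1, t) - rate) < 1e-6
```

A larger domain at the same rate would take proportionally longer per step. This is why the figure is best read as "at least roughly 2.6 ms" and not as a measured timing.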

