A holistic scalable implementation approach of the lattice Boltzmann method for CPU/GPU heterogeneous clusters

Riesinger, CR; Bakhtiari, AB; Schreiber, M; Neumann, PN; Bungartz, HJB

dc.contributor.author	Riesinger, CR
dc.contributor.author	Bakhtiari, AB
dc.contributor.author	Schreiber, M
dc.contributor.author	Neumann, PN
dc.contributor.author	Bungartz, HJB
dc.date.accessioned	2017-11-27T08:31:15Z
dc.date.issued	2017-11-30
dc.description.abstract	Heterogeneous clusters are a widely utilized class of supercomputers assembled from different types of computing devices, for instance CPUs and GPUs, providing a huge computational potential. Programming them in a scalable way exploiting the maximal performance introduces numerous challenges such as optimizations for different computing devices, dealing with multiple levels of parallelism, the application of different programming models, work distribution, and hiding of communication with computation. We utilize the lattice Boltzmann method for fluid flow as a representative of a scientific computing application and develop a holistic implementation for large-scale CPU/GPU heterogeneous clusters. We review and combine a set of best practices and techniques ranging from optimizations for the particular computing devices to the orchestration of tens of thousands of CPU cores and thousands of GPUs. Eventually, we come up with an implementation using all the available computational resources for the lattice Boltzmann method operators. Our approach shows excellent scalability behavior making it future-proof for heterogeneous clusters of the upcoming architectures on the exaFLOPS scale. Parallel efficiencies of more than 90% are achieved leading to 2,604.72 GLUPS utilizing 24,576 CPU cores and 2,048 GPUs of the CPU/GPU heterogeneous cluster Piz Daint and computing more than 6.8 · 10^9 lattice cells.	en_GB
dc.description.sponsorship	This work was supported by the German Research Foundation (DFG) as part of the Transregional Collaborative Research Centre “Invasive Computing” (SFB/TR 89). In addition, this work was supported by a grant from the Swiss National Supercomputing Centre (CSCS) under project ID d68. We further thank the Max Planck Computing & Data Facility (MPCDF) and the Global Scientific Information and Computing Center (GSIC) for providing computational resources.	en_GB
dc.identifier.citation	Vol. 5 (4), article 48	en_GB
dc.identifier.doi	10.3390/computation5040048
dc.identifier.uri	http://hdl.handle.net/10871/30463
dc.language.iso	en	en_GB
dc.publisher	MDPI	en_GB
dc.rights	© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/)	en_GB
dc.subject	GPU clusters	en_GB
dc.subject	heterogeneous clusters	en_GB
dc.subject	hybrid implementation	en_GB
dc.subject	lattice Boltzmann method	en_GB
dc.subject	multilevel parallelism	en_GB
dc.subject	petascale	en_GB
dc.subject	resource assignment	en_GB
dc.subject	scalability	en_GB
dc.title	A holistic scalable implementation approach of the lattice Boltzmann method for CPU/GPU heterogeneous clusters	en_GB
dc.type	Article	en_GB
dc.identifier.issn	2079-3197
dc.description	This is the author accepted manuscript. The final version is available from MDPI via the DOI in this record.	en_GB
dc.identifier.journal	Computation	en_GB

Files in this item

Name: 2017 - A holistic scalable ...
Size: 807.1Kb
Format: PDF

This item appears in the following Collection(s)

Computer Science
