A case study for a new invasive extension of Intel’s Threading Building Blocks
Association for Computing Machinery (ACM)
Reason for embargo: currently under an indefinite embargo pending publication by the publisher.
We study codes that deploy multiple MPI ranks to one node, where each rank is parallelised with TBB. A static assignment of cores to ranks is disadvantageous here if the load is not perfectly balanced, if the runtime is subject to fluctuations, or if one MPI rank passes through phases of low concurrency. We propose an extension to TBB in which developers manually annotate which code regions could exploit additional cores. The cores are then dynamically associated with ranks. Our approach is decentralised, lightweight and minimally invasive with respect to code modifications. Brief performance studies suggest that a flexible, continuously changing assignment of cores to compute ranks can outperform a static distribution, while greedily haggling over cores throughout a simulation might perform even better.
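The idea of lending cores to whichever rank currently has an annotated, highly concurrent region can be illustrated with a small sketch. It is not the authors' TBB extension and rests on our own assumptions: the names `IdleCorePool` and `CoreLoan` are hypothetical, all ranks share a single atomic count of idle cores (in a real multi-process setting this counter would have to live in node-shared memory, for instance an MPI shared-memory window), and a rank entering an annotated region greedily borrows whatever idle cores it can and returns them on exit.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>

// Hypothetical node-wide count of idle cores that ranks may borrow.
// Illustrative only: here all "ranks" live in one process; a real multi-rank
// version would need this counter in memory shared between the MPI processes.
class IdleCorePool {
 public:
  explicit IdleCorePool(int idle) : idle_(idle) {}

  // Greedily borrow up to `wanted` cores; returns how many were granted (possibly 0).
  // Lock-free: if another rank changes the count concurrently, the CAS fails,
  // `current` is refreshed with the latest value, and we retry.
  int try_acquire(int wanted) {
    int current = idle_.load();
    while (true) {
      const int granted = std::min(wanted, current);
      if (granted <= 0) return 0;
      if (idle_.compare_exchange_weak(current, current - granted)) return granted;
    }
  }

  // Return previously borrowed cores to the pool.
  void release(int n) {
    assert(n >= 0);
    idle_.fetch_add(n);
  }

  int idle() const { return idle_.load(); }

 private:
  std::atomic<int> idle_;
};

// Hypothetical RAII guard for a developer-annotated region that could exploit
// further cores: borrow on entry, give back on exit. Actually enlarging the
// rank's TBB worker pool with the granted cores is not shown here.
class CoreLoan {
 public:
  CoreLoan(IdleCorePool& pool, int wanted)
      : pool_(pool), granted_(pool.try_acquire(wanted)) {}
  ~CoreLoan() { pool_.release(granted_); }
  CoreLoan(const CoreLoan&) = delete;
  CoreLoan& operator=(const CoreLoan&) = delete;

  int granted() const { return granted_; }

 private:
  IdleCorePool& pool_;
  int granted_;
};
```

Because borrowing is a single compare-and-swap loop on a shared counter, no rank has to consult a central scheduler, which mirrors the decentralised character described in the abstract. The greedy policy (take as many idle cores as requested, up to what is free) is one simple choice among many; a rank that asks too late simply receives zero cores and continues with its baseline allocation.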
The authors appreciate support received from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 671698 (ExaHyPE). Thanks are due to all members of the ExaHyPE consortium who made this research possible, notably Dominic E. Charrier and Benjamin Hazelwood. This work made use of the facilities of the Hamilton HPC Service of Durham University. Both authors gratefully acknowledge earlier funding through the Transregional Collaborative Research Centre 89, Invasive Computing (funded by the DFG).
3rd COSH Workshop on Co-Scheduling of HPC Applications, 23 January 2018, Manchester, UK
This is the author accepted manuscript.
Awaiting citation and DOI in the ACM Digital Library.