Inc-part: incremental partitioning for load balancing in large-scale behavioral simulations
IEEE Transactions on Parallel and Distributed Systems
Institute of Electrical and Electronics Engineers (IEEE)
This is the author accepted manuscript. The final version is available from Institute of Electrical and Electronics Engineers (IEEE) via the DOI in this record.
© 2014 IEEE. Large-scale behavioral simulations are widely used to study real-world multi-agent systems. Such programs normally run in discrete time-steps or ticks, with simulated space decomposed into domains that are distributed over a set of workers to achieve parallelism. A distinguishing feature of behavioral simulations is their frequent and high-volume group migration, the phenomenon in which simulated objects traverse domains in groups at massive scale in each tick. This results in continual and significant load imbalance among domains. To tackle this problem, traditional load balancing approaches either require excessive load re-profiling and redistribution, which lead to high computation/communication costs, or perform poorly because their statically partitioned data domains cannot reflect load changes brought by group migration. In this paper, we propose an effective and low-cost load balancing scheme, named Inc-part, based on a key observation that an object is unlikely to move a long distance (across many domains) within a single tick. This localized mobility property allows one to efficiently estimate the load of a dynamic domain incrementally, based on merely the load changes occurring in its neighborhood. The domains experiencing significant load changes are then partitioned or merged, and redistributed to redress load imbalance among the workers. Experiments on a 64-node (1,024-core) platform show that Inc-part can attain excellent load balance with dramatically lowered costs compared to state-of-the-art solutions.
Vol. 26, pp. 1900 - 1909