University of Exeter
MAD3PG: A Framework for Multi-Agent Deep Denoising Diffusion Policy Gradient Optimization

journal contribution
posted on 2025-12-02, 14:41 authored by Shan Zhong, Gang Wang, Jingkui Zhang, Xiaoyang Wang, Kah Chan Teh, He Diao, Ping Zhang, Jiacheng He, Zhi Zeng, Tee Hiang Cheng, Bei Peng
<p dir="ltr">Distributional reinforcement learning enhances policy robustness and learning efficiency by modeling the full value distribution rather than merely estimating its expectation. However, most existing methods are designed for single-agent discrete tasks and typically rely on prior assumptions about the form of the value distribution, such as fixed discrete atoms or quantile regression. These assumptions become restrictive in multi-agent scenarios, where information from different agents must be fused and the resulting joint value distribution often exhibits complex multimodal characteristics that predefined distributional forms cannot capture effectively. To address these limitations, this paper proposes Multi-Agent Deep Denoising Diffusion Policy Gradient Optimization (MAD3PG), a novel framework that uses diffusion models to model the value distribution of joint state-action pairs. Unlike existing distributional methods, diffusion models can capture the value distribution without prior assumptions about its form, which gives them clear advantages in multi-agent environments. Since true value samples are unavailable during online training, we introduce a K-repeated sampling strategy combined with temporal-difference targets to efficiently optimize the conditional variational evidence lower bound. We also provide a theoretical convergence analysis of MAD3PG. We demonstrate MAD3PG’s ability to fit multimodal distributions through toy examples, and we validate its effectiveness through comprehensive experiments in the Multi-Agent Particle Environment (MPE) and in the MuJoCo environment. The experimental results indicate that MAD3PG significantly outperforms traditional algorithms based on deterministic value estimation.</p>

Funding

Study on Mechanism and Method of Microfluidic Nano-electroporation for Efficient and Controllable Cell Transfection

National Natural Science Foundation of China

China Scholarship Council CSC202406070141

History

Rights

© [2025]. This version is made available under the CC-BY-NC-ND licence: https://creativecommons.org/licenses/by-nc-nd/4.0/

Rights Retention Status

  • No

Submission date

2025-08-24

Notes

This is the author accepted manuscript.

Journal

Information Fusion: An International Journal on Multi-Sensor, Multi-Source Information Fusion

Publisher

Elsevier

Version

  • Accepted Manuscript

Language

en

Department

  • Computer Science