Show simple item record

dc.contributor.author: Huang, T
dc.contributor.author: Zhu, Z
dc.contributor.author: Jin, G
dc.contributor.author: Liu, L
dc.contributor.author: Wang, Z
dc.contributor.author: Liu, S
dc.date.accessioned: 2025-02-24T14:51:28Z
dc.date.issued: 2025
dc.date.updated: 2025-02-24T13:23:30Z
dc.description.abstract: Large Language Models (LLMs) have demonstrated exceptional performance across diverse tasks, yet their training remains highly resource-intensive and susceptible to critical challenges such as training instability. A predominant source of this instability stems from gradient and loss spikes, which disrupt the learning process, often leading to costly interventions like checkpoint recovery and experiment restarts, further amplifying inefficiencies. This paper presents a comprehensive investigation into gradient spikes observed during LLM training, revealing their prevalence across multiple architectures and datasets. Our analysis shows that these spikes can be up to 1000× larger than typical gradients, substantially deteriorating model performance. To address this issue, we propose Spike-Aware Adam with Momentum Reset (SPAM), a novel optimizer designed to counteract gradient spikes through momentum reset and spike-aware gradient clipping. Extensive experiments, including both pre-training and fine-tuning, demonstrate that SPAM consistently surpasses Adam and its variants across a range of model scales. Additionally, SPAM facilitates memory-efficient training by enabling sparse momentum, where only a subset of momentum terms are maintained and updated. When operating under memory constraints, SPAM outperforms state-of-the-art memory-efficient optimizers such as GaLore and Adam-Mini. Our work underscores the importance of mitigating gradient spikes in LLM training and introduces an effective optimization strategy that enhances both training stability and resource efficiency at scale. Code is available at https://github.com/TianjinYellow/SPAM-Optimizer.git
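The two mechanisms the abstract names — periodic momentum reset and spike-aware gradient clipping — can be sketched as a minimal Adam-style update step. This is an illustrative reading of the abstract only, not the paper's implementation: the function name, hyperparameter values, and the exact clipping rule (clip an entry whose squared gradient far exceeds its running second-moment estimate) are all assumptions; consult the linked repository for the authors' code.

```python
import math

def spam_like_step(params, grads, state, lr=1e-3, b1=0.9, b2=0.999,
                   eps=1e-8, spike_threshold=50.0, reset_interval=500):
    """One Adam-style update with the two mechanisms named in the abstract.

    All hyperparameter names and values here are illustrative, not the
    paper's. `state` is a dict persisted across calls.
    """
    n = len(params)
    state.setdefault("step", 0)
    state.setdefault("m", [0.0] * n)  # first-moment estimate
    state.setdefault("v", [0.0] * n)  # second-moment estimate
    state["step"] += 1
    t = state["step"]

    # Momentum reset: periodically zero both moment estimates so a past
    # gradient spike cannot keep contaminating future updates.
    if t > 1 and (t - 1) % reset_interval == 0:
        state["m"] = [0.0] * n
        state["v"] = [0.0] * n

    out = []
    for i, (p, g) in enumerate(zip(params, grads)):
        v = state["v"][i]
        # Spike-aware clipping (assumed form): if g_i^2 dwarfs the running
        # second-moment estimate, rescale g_i down to the threshold level
        # instead of letting the spike enter the moment estimates raw.
        if v > 0 and g * g > spike_threshold * v:
            g = math.copysign(math.sqrt(spike_threshold * v), g)
        m = b1 * state["m"][i] + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        state["m"][i], state["v"][i] = m, v
        # Standard bias-corrected Adam step.
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        out.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return out
```

Under this sketch, a gradient 1000× larger than typical is rescaled before it reaches the moment estimates, so the resulting parameter update stays on the order of the learning rate rather than exploding.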
dc.identifier.citation: ICLR 2025 - The Thirteenth International Conference on Learning Representations, 24-28 April 2025, Singapore. Awaiting full citation and link
dc.identifier.uri: http://hdl.handle.net/10871/140194
dc.identifier: ORCID: 0000-0002-7740-8843 (Huang, Tianjin)
dc.language.iso: en
dc.publisher: International Conference on Learning Representations
dc.relation.url: https://iclr.cc/Conferences/2025
dc.relation.url: https://iclr.cc/virtual/2025/papers.html
dc.relation.url: https://iclr.cc/virtual/2025/poster/30015
dc.relation.url: https://github.com/TianjinYellow/SPAM-Optimizer.git
dc.rights.embargoreason: Under embargo until close of conference
dc.rights: © 2025 The author(s)
dc.title: SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
dc.type: Conference paper
dc.date.available: 2025-02-24T14:51:28Z
dc.description: This is the final version.
dc.rights.uri: http://www.rioxx.net/licenses/all-rights-reserved
rioxxterms.version: VoR
rioxxterms.licenseref.startdate: 2025-01-12
rioxxterms.type: Conference Paper/Proceeding/Abstract
refterms.dateFCD: 2025-02-24T14:31:50Z
refterms.versionFCD: VoR
refterms.panel: B

