dc.contributor.author | Mills, J | |
dc.contributor.author | Hu, J | |
dc.contributor.author | Min, G | |
dc.date.accessioned | 2023-05-17T08:03:35Z | |
dc.date.issued | 2023-05-17 | |
dc.date.updated | 2023-05-16T14:33:12Z | |
dc.description.abstract | In Federated Learning (FL), client devices connected over the internet collaboratively train a machine learning model without sharing their private data with a central server or with other clients. The seminal Federated Averaging (FedAvg) algorithm trains a single global model by performing rounds of local training on clients followed by model averaging. FedAvg can improve the communication efficiency of training by performing more steps of Stochastic Gradient Descent (SGD) on clients in each round. However, client data in real-world FL is highly heterogeneous, which has been extensively shown to slow model convergence and harm final performance when K > 1 steps of SGD are performed on clients per round. In this work, we propose decaying K as training progresses, which can jointly improve the final performance of the FL model whilst reducing the wall-clock time and the total computational cost of training compared to using a fixed K. We analyse the convergence of FedAvg with decaying K for strongly-convex objectives, providing novel insights into the convergence properties, and derive three theoretically-motivated decay schedules for K. We then perform thorough experiments on four benchmark FL datasets (FEMNIST, CIFAR100, Sentiment140, Shakespeare) to show the real-world benefit of our approaches in terms of convergence time, computational cost, and generalisation performance. | en_GB |
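To make the idea in the abstract concrete, below is a minimal sketch of FedAvg with a decaying number of local SGD steps K. It is not the authors' implementation: the synthetic least-squares clients, the learning rate, and the halve-K-every-10-rounds schedule are illustrative assumptions standing in for the paper's theoretically-motivated schedules.

```python
# Minimal sketch (assumed, not the paper's code) of FedAvg where the number of
# local SGD steps K decays over communication rounds.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic heterogeneous client data: each client gets a shifted linear target (non-IID).
NUM_CLIENTS, DIM, SAMPLES = 10, 5, 50
clients = []
for c in range(NUM_CLIENTS):
    w_true = rng.normal(size=DIM) + 0.1 * c
    X = rng.normal(size=(SAMPLES, DIM))
    y = X @ w_true + 0.01 * rng.normal(size=SAMPLES)
    clients.append((X, y))

def local_sgd(w, X, y, steps, lr=0.05, batch=10):
    """Run `steps` of mini-batch SGD on one client's least-squares objective."""
    w = w.copy()
    for _ in range(steps):
        idx = rng.choice(len(y), size=batch, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch
        w -= lr * grad
    return w

ROUNDS, K0 = 30, 16
w_global = np.zeros(DIM)
for t in range(ROUNDS):
    # Hypothetical decay schedule: halve K every 10 rounds, never below 1 step.
    K = max(1, K0 // (2 ** (t // 10)))
    # Each client starts from the current global model and runs K local steps.
    local_models = [local_sgd(w_global, X, y, K) for X, y in clients]
    # FedAvg aggregation: the server averages the returned client models.
    w_global = np.mean(local_models, axis=0)
    loss = np.mean([np.mean((X @ w_global - y) ** 2) for X, y in clients])
    print(f"round {t:2d}  K={K:2d}  avg client loss {loss:.4f}")
```

Under this sketch, early rounds use many local steps to cut communication, while later rounds use fewer steps so heterogeneous client updates drift less from the global optimum, which is the trade-off the abstract describes.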
dc.description.sponsorship | Engineering and Physical Sciences Research Council (EPSRC) | en_GB |
dc.description.sponsorship | UK Research and Innovation | en_GB |
dc.description.sponsorship | European Union Horizon 2020 | en_GB |
dc.identifier.citation | Published online 17 May 2023 | en_GB |
dc.identifier.doi | 10.1109/TPDS.2023.3277367 | |
dc.identifier.grantnumber | EP/X019160/1 | en_GB |
dc.identifier.grantnumber | EP/X038866/1 | en_GB |
dc.identifier.grantnumber | 101086159 | en_GB |
dc.identifier.uri | http://hdl.handle.net/10871/133155 | |
dc.identifier | ORCID: 0000-0001-5406-8420 (Hu, Jia) | |
dc.language.iso | en | en_GB |
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) | en_GB |
dc.rights | © 2023 IEEE | |
dc.subject | Federated Learning | en_GB |
dc.subject | Deep Learning | en_GB |
dc.subject | Edge Computing | en_GB |
dc.subject | Computational Efficiency | en_GB |
dc.title | Faster Federated Learning with decaying number of local SGD steps | en_GB |
dc.type | Article | en_GB |
dc.date.available | 2023-05-17T08:03:35Z | |
dc.identifier.issn | 1045-9219 | |
dc.description | This is the author accepted manuscript. The final version is available from IEEE via the DOI in this record | en_GB |
dc.identifier.eissn | 1558-2183 | |
dc.identifier.journal | IEEE Transactions on Parallel and Distributed Systems | en_GB |
dc.rights.uri | http://www.rioxx.net/licenses/all-rights-reserved | en_GB |
dcterms.dateAccepted | 2023-05-09 | |
dcterms.dateSubmitted | 2021-10-27 | |
rioxxterms.version | AM | en_GB |
rioxxterms.licenseref.startdate | 2023-05-09 | |
rioxxterms.type | Journal Article/Review | en_GB |
refterms.dateFCD | 2023-05-16T14:33:14Z | |
refterms.versionFCD | AM | |
refterms.dateFOA | 2023-05-26T13:56:54Z | |
refterms.panel | B | en_GB |