dc.contributor.author: Maier, HR
dc.contributor.author: Zheng, F
dc.contributor.author: Gupta, H
dc.contributor.author: Chen, J
dc.contributor.author: Mai, J
dc.contributor.author: Savic, D
dc.contributor.author: Loritz, R
dc.contributor.author: Wu, W
dc.contributor.author: Guo, D
dc.contributor.author: Bennett, A
dc.contributor.author: Jakeman, A
dc.contributor.author: Razavi, S
dc.contributor.author: Zhao, J
dc.date.accessioned: 2023-10-05T09:04:37Z
dc.date.issued: 2023-07-31
dc.date.updated: 2023-10-05T06:45:46Z
dc.description.abstract: Models play a pivotal role in advancing our understanding of Earth's physical nature and environmental systems, aiding in their efficient planning and management. The accuracy and reliability of these models heavily rely on data, which are generally partitioned into subsets for model development and evaluation. Surprisingly, how this partitioning is done is often not justified, even though it determines what model we end up with, how we assess its performance and what decisions we make based on the resulting model outputs. In this study, we shed light on the paramount importance of meticulously considering data partitioning in the model development and evaluation process, and its significant impact on model generalization. We identify flaws in existing data-splitting approaches and propose a forward-looking strategy to effectively confront the “elephant in the room”, leading to improved model generalization capabilities. [en_GB]
dc.description.sponsorship: National Natural Science Foundation of China [en_GB]
dc.description.sponsorship: Australian Research Council (ARC) [en_GB]
dc.format.extent: 105779-
dc.identifier.citation: Vol. 167, article 105779 [en_GB]
dc.identifier.doi: https://doi.org/10.1016/j.envsoft.2023.105779
dc.identifier.grantnumber: 52261160379 [en_GB]
dc.identifier.grantnumber: DE210100117 [en_GB]
dc.identifier.uri: http://hdl.handle.net/10871/134168
dc.identifier: ORCID: 0000-0001-9567-9041 (Savic, Dragan)
dc.identifier: ScopusID: 35580202000 (Savic, Dragan)
dc.identifier: ResearcherID: G-2071-2012 | L-8559-2019 (Savic, Dragan)
dc.language.iso: en [en_GB]
dc.publisher: Elsevier [en_GB]
dc.rights: © 2023 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). [en_GB]
dc.subject: Model development [en_GB]
dc.subject: Model evaluation [en_GB]
dc.subject: Data partitioning [en_GB]
dc.subject: Data splitting [en_GB]
dc.subject: Calibration [en_GB]
dc.subject: Validation [en_GB]
dc.subject: Uncertainty [en_GB]
dc.subject: Earth systems [en_GB]
dc.title: On how data are partitioned in model development and evaluation: Confronting the elephant in the room to enhance model generalization [en_GB]
dc.type: Article [en_GB]
dc.date.available: 2023-10-05T09:04:37Z
dc.identifier.issn: 1364-8152
exeter.article-number: 105779
dc.description: This is the final version. Available on open access from Elsevier via the DOI in this record [en_GB]
dc.description: Data availability: No data was used for the research described in the article. [en_GB]
dc.identifier.eissn: 1873-6726
dc.identifier.journal: Environmental Modelling and Software [en_GB]
dc.relation.ispartof: Environmental Modelling & Software, 167
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_GB]
dcterms.dateAccepted: 2023-07-27
rioxxterms.version: VoR [en_GB]
rioxxterms.licenseref.startdate: 2023-07-31
rioxxterms.type: Journal Article/Review [en_GB]
refterms.dateFCD: 2023-10-05T08:47:36Z
refterms.versionFCD: VoR
refterms.dateFOA: 2023-10-05T09:04:38Z
refterms.panel: B [en_GB]