University of Exeter

On how data are partitioned in model development and evaluation: Confronting the elephant in the room to enhance model generalization

journal contribution
posted on 2025-08-02, 10:41 authored by HR Maier, F Zheng, H Gupta, J Chen, J Mai, D Savic, R Loritz, W Wu, D Guo, A Bennett, A Jakeman, S Razavi, J Zhao
Models play a pivotal role in advancing our understanding of Earth's physical nature and environmental systems, aiding in their efficient planning and management. The accuracy and reliability of these models heavily rely on data, which are generally partitioned into subsets for model development and evaluation. Surprisingly, how this partitioning is done is often not justified, even though it determines what model we end up with, how we assess its performance and what decisions we make based on the resulting model outputs. In this study, we shed light on the paramount importance of meticulously considering data partitioning in the model development and evaluation process, and its significant impact on model generalization. We identify flaws in existing data-splitting approaches and propose a forward-looking strategy to effectively confront the “elephant in the room”, leading to improved model generalization capabilities.

Funding

National Natural Science Foundation of China: 52261160379

Australian Research Council (ARC): DE210100117

Rights

© 2023 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Notes

This is the final version. Available on open access from Elsevier via the DOI in this record.

Data availability: No data was used for the research described in the article.

Journal

Environmental Modelling and Software

Pagination

105779

Publisher

Elsevier

Version

  • Version of Record

Language

en

FCD date

2023-10-05T08:47:36Z

FOA date

2023-10-05T09:04:38Z

Citation

Vol. 167, article 105779

Department

  • Engineering
