Dirichlet process mixture models to impute missing predictor data in counterfactual prediction models: an application to predict optimal type 2 diabetes therapy
dc.contributor.author | Cardoso, P | |
dc.contributor.author | Dennis, JM | |
dc.contributor.author | Bowden, J | |
dc.contributor.author | Shields, BM | |
dc.contributor.author | McKinley, TJ | |
dc.contributor.author | MASTERMIND Consortium | |
dc.date.accessioned | 2023-12-11T11:04:06Z | |
dc.date.issued | 2024-01-08 | |
dc.date.updated | 2023-12-11T09:57:44Z | |
dc.description.abstract | Background: The handling of missing data is a challenge for inference and regression modelling. Dealing with missing predictor information is especially difficult when building and making predictions from models for use in clinical practice. Methods: We use a flexible Bayesian approach for handling missing predictor information in regression models. This provides practitioners with full posterior predictive distributions for both the missing predictor information (conditional on the observed predictors) and the outcome of interest. We apply this approach to a previously proposed counterfactual treatment selection model for type 2 diabetes second-line therapies. Our approach combines a regression model and a Dirichlet process mixture model (DPMM), where the former defines the treatment selection model, and the latter provides a flexible way to model the joint distribution of the predictors. Results: We show that DPMMs can model complex relationships between predictor variables and can provide a powerful means of fitting models to incomplete data (under missing-completely-at-random and missing-at-random assumptions). Because of the hierarchical model structure, this framework ensures that the posterior distributions for the parameters and the conditional average treatment effect estimates automatically reflect the additional uncertainty associated with missing data. We also demonstrate that, in the presence of multiple missing predictors, the DPMM can be used to explore which variable(s), if collected, could provide the most additional information about the likely outcome. Conclusions: When developing clinical prediction models, DPMMs offer a flexible way to model complex covariate structures and handle missing predictor information. DPMM-based counterfactual prediction models can also provide additional information to support clinical decision-making, including allowing predictions with appropriate uncertainty to be made for individuals with incomplete predictor data. | en_GB |
dc.description.sponsorship | Medical Research Council (MRC) | en_GB |
dc.description.sponsorship | Research England | en_GB |
dc.identifier.citation | Vol. 24, article 12 | en_GB |
dc.identifier.doi | 10.1186/s12911-023-02400-3 | |
dc.identifier.grantnumber | MR/N00633X/1 | en_GB |
dc.identifier.uri | http://hdl.handle.net/10871/134768 | |
dc.identifier | ORCID: 0000-0002-9485-3236 (McKinley, Trevelyan) | |
dc.language.iso | en | en_GB |
dc.publisher | BMC | en_GB |
dc.relation.url | https://cprd.com/research-applications | en_GB |
dc.rights | © The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. | |
dc.subject | Dirichlet process mixture model | en_GB |
dc.subject | treatment selection model | en_GB |
dc.subject | precision medicine | en_GB |
dc.subject | type 2 diabetes | en_GB |
dc.subject | Bayesian modelling | en_GB |
dc.title | Dirichlet process mixture models to impute missing predictor data in counterfactual prediction models: an application to predict optimal type 2 diabetes therapy | en_GB |
dc.type | Article | en_GB |
dc.date.available | 2023-12-11T11:04:06Z | |
dc.identifier.issn | 1472-6947 | |
dc.description | This is the final version. Available on open access from BMC via the DOI in this record | en_GB |
dc.description | Availability of data and materials: The routine clinical data analysed during the current study are available in the CPRD repository (CPRD; https://cprd.com/research-applications), but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. To re-use these data, an application must be made directly to CPRD. A synthetic sample dataset is available on GitHub in the repository “PM-Cardoso/DPMM-tsm”. | en_GB |
dc.identifier.eissn | 1472-6947 | |
dc.identifier.journal | BMC Medical Informatics and Decision Making | en_GB |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | en_GB |
dcterms.dateAccepted | 2023-12-11 | |
dcterms.dateSubmitted | 2023-01-25 | |
rioxxterms.version | VoR | en_GB |
rioxxterms.licenseref.startdate | 2023-12-11 | |
rioxxterms.type | Journal Article/Review | en_GB |
refterms.dateFCD | 2023-12-11T09:57:50Z | |
refterms.versionFCD | AM | |
refterms.dateFOA | 2024-02-02T16:27:05Z | |
refterms.panel | A | en_GB |