Estimating disease prevalence in large datasets using genetic risk scores
dc.contributor.author | Evans, BD | |
dc.contributor.author | Słowiński, P | |
dc.contributor.author | Hattersley, AT | |
dc.contributor.author | Jones, SE | |
dc.contributor.author | Sharp, S | |
dc.contributor.author | Kimmitt, RA | |
dc.contributor.author | Weedon, MN | |
dc.contributor.author | Oram, RA | |
dc.contributor.author | Tsaneva-Atanasova, K | |
dc.contributor.author | Thomas, NJ | |
dc.date.accessioned | 2021-11-09T14:58:59Z | |
dc.date.issued | 2021-11-08 | |
dc.date.updated | 2021-11-08T12:35:42Z | |
dc.description.abstract | Clinical classification is essential for estimating disease prevalence but is difficult, often requiring complex investigations. The widespread availability of population level genetic data makes novel genetic stratification techniques a highly attractive alternative. We propose a generalizable mathematical framework for determining disease prevalence within a cohort using genetic risk scores. We compare and evaluate methods based on the means of genetic risk scores’ distributions; the Earth Mover’s Distance between distributions; a linear combination of kernel density estimates of distributions; and an Excess method. We demonstrate the performance of genetic stratification to produce robust prevalence estimates. Specifically, we show that robust estimates of prevalence are still possible even with rarer diseases, smaller cohort sizes and less discriminative genetic risk scores, highlighting the general utility of these approaches. Genetic stratification techniques offer exciting new research tools, enabling unbiased insights into disease prevalence and clinical characteristics unhampered by clinical classification criteria. | en_GB |
dc.description.sponsorship | Wellcome Trust | en_GB |
dc.description.sponsorship | Engineering and Physical Sciences Research Council (EPSRC) | en_GB |
dc.description.sponsorship | National Institute for Health Research (NIHR) | en_GB |
dc.description.sponsorship | Diabetes UK | en_GB |
dc.description.sponsorship | Medical Research Council (MRC) | en_GB |
dc.identifier.citation | Vol. 12, article 6441 | en_GB |
dc.identifier.doi | https://doi.org/10.1038/s41467-021-26501-7 | |
dc.identifier.grantnumber | WT204909MA | en_GB |
dc.identifier.grantnumber | 204909/Z/16/Z | en_GB |
dc.identifier.grantnumber | EP/N014391/1 | en_GB |
dc.identifier.grantnumber | EP/T017856/1 | en_GB |
dc.identifier.grantnumber | 17/000575 | en_GB |
dc.identifier.grantnumber | 16/0005529 | en_GB |
dc.identifier.grantnumber | WT097835MF | en_GB |
dc.identifier.uri | http://hdl.handle.net/10871/127703 | |
dc.identifier | ORCID: 0000-0002-6612-9902 (Słowiński, Piotr) | |
dc.identifier | ORCID: 0000-0001-5620-473X (Hattersley, Andrew T) | |
dc.identifier | ORCID: 0000-0003-3581-8980 (Oram, Richard A) | |
dc.language.iso | en | en_GB |
dc.publisher | Nature Research | en_GB |
dc.relation.url | https://doi.org/10.5281/zenodo.5512651 | en_GB |
dc.relation.url | https://github.com/bdevans/DPE | en_GB |
dc.rights | © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. | en_GB |
dc.title | Estimating disease prevalence in large datasets using genetic risk scores | en_GB |
dc.type | Article | en_GB |
dc.date.available | 2021-11-09T14:58:59Z | |
exeter.article-number | 6441 | |
dc.description | This is the final version. Available on open access from Nature Research via the DOI in this record | en_GB |
dc.description | Data availability: UK Biobank data can be obtained after completing an online application; see details at http://www.ukbiobank.ac.uk/using-the-resource/. Wellcome Trust Case Control Consortium genotype data can be obtained by application to the Wellcome Trust Case Control Consortium Data Access Committee. The procedure is described in more detail at https://www.wtccc.org.uk/info/access_to_data_samples.html. | en_GB |
dc.description | Code availability: The Distribution Proportion Estimation software (v1.0.0) used to analyse the data was developed and tested in Python 3.8.2 and Matlab release 2020b (which includes other algorithms mentioned in the manuscript). The software is archived at https://doi.org/10.5281/zenodo.5512651. The code is open source and available under version control at https://github.com/bdevans/DPE. | en_GB |
dc.identifier.eissn | 2041-1723 | |
dc.identifier.journal | Nature Communications | en_GB |
dc.relation.ispartof | Nature Communications, 12(1) | |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | en_GB |
dcterms.dateAccepted | 2021-09-30 | |
rioxxterms.version | VoR | en_GB |
rioxxterms.licenseref.startdate | 2021-11-08 | |
rioxxterms.type | Journal Article/Review | en_GB |
refterms.dateFCD | 2021-11-08T14:57:43Z | |
refterms.versionFCD | VoR | |
refterms.dateFOA | 2021-11-09T14:59:06Z | |
refterms.panel | A | en_GB |
refterms.dateFirstOnline | 2021-11-08 | |
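
Illustrative note on the abstract: of the four approaches it names, the one based on the means of the genetic risk score distributions is the simplest to show. The sketch below is not taken from the DPE software and does not use its API. It assumes only that the mean score of a mixed cohort is a weighted average of the mean scores of a reference group of cases and a reference group of non-cases, so that the weight (the prevalence) can be solved for directly. The function name, the Gaussian score distributions and every numeric value are hypothetical and chosen for illustration.

```python
import random
from statistics import fmean


def estimate_prevalence_by_means(mixture, ref_cases, ref_non_cases):
    """Estimate the proportion of cases in `mixture` from three sample means.

    Assumption (not from the DPE code): the mixture mean is a weighted average
        mean(mixture) = p * mean(ref_cases) + (1 - p) * mean(ref_non_cases)
    which is solved for p.
    """
    mean_cases = fmean(ref_cases)
    mean_non_cases = fmean(ref_non_cases)
    if mean_cases == mean_non_cases:
        raise ValueError("reference means are identical; the score cannot discriminate")
    p = (fmean(mixture) - mean_non_cases) / (mean_cases - mean_non_cases)
    # Sampling noise can push the raw estimate outside [0, 1], so clip it.
    return min(1.0, max(0.0, p))


# Synthetic scores for illustration only (hypothetical values, not real data).
rng = random.Random(0)
ref_cases = [rng.gauss(0.27, 0.03) for _ in range(5000)]
ref_non_cases = [rng.gauss(0.23, 0.03) for _ in range(5000)]
true_prevalence = 0.10
mixture = [
    rng.gauss(0.27, 0.03) if rng.random() < true_prevalence else rng.gauss(0.23, 0.03)
    for _ in range(20000)
]
estimate = estimate_prevalence_by_means(mixture, ref_cases, ref_non_cases)
print(f"estimated prevalence: {estimate:.3f}")
```

A point estimate of this kind says nothing about its own uncertainty. The other approaches listed in the abstract (Earth Mover's Distance, kernel density estimates and the Excess method) are not shown here.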