dc.contributor.author: Evans, BD
dc.contributor.author: Słowiński, P
dc.contributor.author: Hattersley, AT
dc.contributor.author: Jones, SE
dc.contributor.author: Sharp, S
dc.contributor.author: Kimmitt, RA
dc.contributor.author: Weedon, MN
dc.contributor.author: Oram, RA
dc.contributor.author: Tsaneva-Atanasova, K
dc.contributor.author: Thomas, NJ
dc.date.accessioned: 2021-11-09T14:58:59Z
dc.date.issued: 2021-11-08
dc.date.updated: 2021-11-08T12:35:42Z
dc.description.abstract: Clinical classification is essential for estimating disease prevalence but is difficult, often requiring complex investigations. The widespread availability of population level genetic data makes novel genetic stratification techniques a highly attractive alternative. We propose a generalizable mathematical framework for determining disease prevalence within a cohort using genetic risk scores. We compare and evaluate methods based on the means of genetic risk scores’ distributions; the Earth Mover’s Distance between distributions; a linear combination of kernel density estimates of distributions; and an Excess method. We demonstrate the performance of genetic stratification to produce robust prevalence estimates. Specifically, we show that robust estimates of prevalence are still possible even with rarer diseases, smaller cohort sizes and less discriminative genetic risk scores, highlighting the general utility of these approaches. Genetic stratification techniques offer exciting new research tools, enabling unbiased insights into disease prevalence and clinical characteristics unhampered by clinical classification criteria. [en_GB]
dc.description.sponsorship: Wellcome Trust [en_GB]
dc.description.sponsorship: Engineering and Physical Sciences Research Council (EPSRC) [en_GB]
dc.description.sponsorship: National Institute for Health Research (NIHR) [en_GB]
dc.description.sponsorship: Diabetes UK [en_GB]
dc.description.sponsorship: Medical Research Council (MRC) [en_GB]
dc.identifier.citation: Vol. 12, article 6441 [en_GB]
dc.identifier.doi: https://doi.org/10.1038/s41467-021-26501-7
dc.identifier.grantnumber: WT204909MA [en_GB]
dc.identifier.grantnumber: 204909/Z/16/Z [en_GB]
dc.identifier.grantnumber: EP/N014391/1 [en_GB]
dc.identifier.grantnumber: EP/T017856/1 [en_GB]
dc.identifier.grantnumber: 17/000575 [en_GB]
dc.identifier.grantnumber: 16/0005529 [en_GB]
dc.identifier.grantnumber: WT097835MF [en_GB]
dc.identifier.uri: http://hdl.handle.net/10871/127703
dc.identifier: ORCID: 0000-0002-6612-9902 (Słowiński, Piotr)
dc.identifier: ORCID: 0000-0001-5620-473X (Hattersley, Andrew T)
dc.identifier: ORCID: 0000-0003-3581-8980 (Oram, Richard A)
dc.language.iso: en [en_GB]
dc.publisher: Nature Research [en_GB]
dc.relation.url: https://doi.org/10.5281/zenodo.5512651 [en_GB]
dc.relation.url: https://github.com/bdevans/DPE [en_GB]
dc.rights: © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. [en_GB]
dc.title: Estimating disease prevalence in large datasets using genetic risk scores [en_GB]
dc.type: Article [en_GB]
dc.date.available: 2021-11-09T14:58:59Z
exeter.article-number: 6441
dc.description: This is the final version. Available on open access from Nature Research via the DOI in this record. [en_GB]
dc.description: Data availability: UK Biobank data can be obtained after completing an online application; see details at http://www.ukbiobank.ac.uk/using-the-resource/. Wellcome Trust Case Control Consortium genotype data can be obtained by application to the Wellcome Trust Case Control Consortium Data Access Committee. The procedure is described in more detail at https://www.wtccc.org.uk/info/access_to_data_samples.html. [en_GB]
dc.description: Code availability: The Distribution Proportion Estimation software (v1.0.0) used to analyse the data was developed and tested in Python 3.8.2 and Matlab release 2020b (which includes other algorithms mentioned in the manuscript). The software (v1.0.0) implementing these methods is archived at https://doi.org/10.5281/zenodo.5512651. The code is open source and available under version control at https://github.com/bdevans/DPE. [en_GB]
dc.identifier.eissn: 2041-1723
dc.identifier.journal: Nature Communications [en_GB]
dc.relation.ispartof: Nature Communications, 12(1)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_GB]
dcterms.dateAccepted: 2021-09-30
rioxxterms.version: VoR [en_GB]
rioxxterms.licenseref.startdate: 2021-11-08
rioxxterms.type: Journal Article/Review [en_GB]
refterms.dateFCD: 2021-11-08T14:57:43Z
refterms.versionFCD: VoR
refterms.dateFOA: 2021-11-09T14:59:06Z
refterms.panel: A [en_GB]
refterms.dateFirstOnline: 2021-11-08
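
Illustrative note on the abstract: the first method it lists is based on the means of the genetic risk score distributions. The sketch below is a minimal illustration of that general idea, not the authors' DPE implementation. It assumes the cohort is a two-component mixture of a case-like and a control-like reference group, so the mixture mean lies between the two reference means in proportion to prevalence. The function name, the clipping step, and the synthetic Gaussian scores are all hypothetical choices made for this example.

```python
import random
import statistics


def estimate_prevalence_from_means(mixture, reference_cases, reference_controls):
    """Estimate the proportion of cases in `mixture` from the three group means.

    Assumes mean(mixture) = p * mean(cases) + (1 - p) * mean(controls).
    """
    mean_cases = statistics.fmean(reference_cases)
    mean_controls = statistics.fmean(reference_controls)
    if mean_cases == mean_controls:
        raise ValueError("reference means are identical; scores are not discriminative")
    proportion = (statistics.fmean(mixture) - mean_controls) / (mean_cases - mean_controls)
    # Clip to [0, 1]: sampling noise can push the raw estimate outside the valid range.
    return min(1.0, max(0.0, proportion))


# Synthetic scores only (not real genetic data): cases ~ N(1, 1), controls ~ N(0, 1).
rng = random.Random(0)
cases = [rng.gauss(1.0, 1.0) for _ in range(5000)]
controls = [rng.gauss(0.0, 1.0) for _ in range(5000)]
true_prevalence = 0.25
mixture = [
    rng.gauss(1.0, 1.0) if rng.random() < true_prevalence else rng.gauss(0.0, 1.0)
    for _ in range(5000)
]
estimate = estimate_prevalence_from_means(mixture, cases, controls)
print(f"estimated prevalence: {estimate:.3f}")
```

With less discriminative scores (reference means closer together) the denominator shrinks and the estimate becomes noisier, which is the kind of sensitivity the abstract says the study evaluates.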

