
dc.contributor.author: Vellame, DS
dc.contributor.author: Shireby, G
dc.contributor.author: MacCalman, A
dc.contributor.author: Dempster, EL
dc.contributor.author: Burrage, J
dc.contributor.author: Gorrie-Stone, T
dc.contributor.author: Schalkwyk, LS
dc.contributor.author: Mill, J
dc.contributor.author: Hannon, E
dc.date.accessioned: 2023-01-10T13:46:45Z
dc.date.issued: 2022-12-20
dc.date.updated: 2023-01-10T13:16:19Z
dc.description.abstract: The majority of epigenetic epidemiology studies to date have generated genome-wide profiles from bulk tissues (e.g., whole blood); however, these are vulnerable to confounding from variation in cellular composition. Proxies for cellular composition can be mathematically derived from the bulk tissue profiles using a deconvolution algorithm; however, there is no method to assess the validity of these estimates for a dataset where the true cellular proportions are unknown. In this study, we describe, validate and characterize a sample-level accuracy metric for derived cellular heterogeneity variables. The CETYGO score captures the deviation between a sample's DNA methylation profile and its expected profile given the estimated cellular proportions and cell type reference profiles. We demonstrate that the CETYGO score consistently distinguishes inaccurate and incomplete deconvolutions when applied to reconstructed whole blood profiles. By applying our novel metric to >6,300 empirical whole blood profiles, we find that estimating accurate cellular composition is influenced by both technical and biological variation. In particular, we show that when using a common reference panel for whole blood, less accurate estimates are generated for females, neonates, older individuals and smokers. Our results highlight the utility of a metric to assess the accuracy of cellular deconvolution, and describe how it can enhance studies of DNA methylation that are reliant on statistical proxies for cellular heterogeneity. To facilitate incorporating our methodology into existing pipelines, we have made it freely available as an R package (https://github.com/ds420/CETYGO). [en_GB]
dc.description.sponsorship: Biotechnology and Biological Sciences Research Council (BBSRC) [en_GB]
dc.description.sponsorship: Engineering and Physical Sciences Research Council (EPSRC) [en_GB]
dc.description.sponsorship: Medical Research Council (MRC) [en_GB]
dc.description.sponsorship: Alzheimer’s Society [en_GB]
dc.identifier.citation: Published online 20 December 2022 [en_GB]
dc.identifier.doi: https://doi.org/10.1080/15592294.2022.2137659
dc.identifier.grantnumber: EP/V052527/1 [en_GB]
dc.identifier.grantnumber: MR/R005176/1 [en_GB]
dc.identifier.grantnumber: MR/K013807/1 [en_GB]
dc.identifier.uri: http://hdl.handle.net/10871/132210
dc.identifier: ORCID: 0000-0003-1257-5314 (Dempster, Emma L)
dc.identifier: ORCID: 0000-0003-1115-3224 (Mill, Jonathan)
dc.identifier: ORCID: 0000-0001-6840-072X (Hannon, Eilis)
dc.language.iso: en [en_GB]
dc.publisher: Routledge [en_GB]
dc.relation.url: https://www.ncbi.nlm.nih.gov/pubmed/36539387 [en_GB]
dc.relation.url: https://github.com/ds420/CETYGO [en_GB]
dc.relation.url: https://github.com/ejh243/CETYGOAnalyses [en_GB]
dc.rights: © 2022 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. [en_GB]
dc.subject: DNA methylation [en_GB]
dc.subject: Illumina EPIC array [en_GB]
dc.subject: cellular heterogeneity [en_GB]
dc.subject: epigenetic epidemiology [en_GB]
dc.subject: illumina 450K array [en_GB]
dc.title: Uncertainty quantification of reference-based cellular deconvolution algorithms [en_GB]
dc.type: Article [en_GB]
dc.date.available: 2023-01-10T13:46:45Z
dc.identifier.issn: 1559-2294
exeter.place-of-publication: United States
dc.description: This is the final version. Available on open access from Routledge via the DOI in this record [en_GB]
dc.description: Data and code availability: The DNAm data used in this study are available as R packages or via GEO (see Supplementary Table 2 for details). We have provided the code for calculating the CETYGO score as an R package available via GitHub (https://github.com/ds420/CETYGO). The code to reproduce the analyses in this manuscript using our R package is also available via GitHub (https://github.com/ejh243/CETYGOAnalyses). [en_GB]
dc.identifier.eissn: 1559-2308
dc.identifier.journal: Epigenetics [en_GB]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_GB]
dcterms.dateAccepted: 2022-10-12
rioxxterms.version: VoR [en_GB]
rioxxterms.licenseref.startdate: 2022-12-20
rioxxterms.type: Journal Article/Review [en_GB]
refterms.dateFCD: 2023-01-10T13:43:46Z
refterms.versionFCD: VoR
refterms.dateFOA: 2023-01-10T13:46:51Z
refterms.panel: A [en_GB]
refterms.dateFirstOnline: 2022-12-20
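
The abstract in this record describes the CETYGO score as the deviation between a sample's observed DNA methylation profile and the profile expected from its estimated cellular proportions and the cell type reference profiles. The sketch below illustrates that idea only; it is not the CETYGO R package and does not use its API. It assumes the deviation is summarised as a root mean square error across sites, and every name and number in it (cell_a, cell_b, the beta values, the function names) is hypothetical toy data chosen for illustration.

```python
from math import sqrt


def expected_profile(reference, proportions):
    """Weighted sum of reference profiles: one expected value per site."""
    n_sites = len(next(iter(reference.values())))
    expected = [0.0] * n_sites
    for cell_type, weight in proportions.items():
        for i, beta in enumerate(reference[cell_type]):
            expected[i] += weight * beta
    return expected


def deviation_score(observed, reference, proportions):
    """Root mean square error between observed and expected profiles.

    Assumption: RMSE is used as the summary of the deviation; the record
    itself only states that the score captures a deviation.
    """
    expected = expected_profile(reference, proportions)
    squared = [(o - e) ** 2 for o, e in zip(observed, expected)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical reference panel: methylation (beta) values at four sites
# for two made-up cell types.
reference = {
    "cell_a": [0.9, 0.1, 0.5, 0.2],
    "cell_b": [0.1, 0.8, 0.5, 0.6],
}

# Build a noise-free "bulk" profile from a known 70/30 mixture.
true_mix = {"cell_a": 0.7, "cell_b": 0.3}
observed = expected_profile(reference, true_mix)

# Score an accurate and an inaccurate set of estimated proportions.
good = deviation_score(observed, reference, {"cell_a": 0.7, "cell_b": 0.3})
poor = deviation_score(observed, reference, {"cell_a": 0.2, "cell_b": 0.8})

print("accurate estimate:", good)
print("inaccurate estimate:", poor)
```

In this toy setting the accurate proportions reproduce the bulk profile and give a lower score than the inaccurate ones, which is the behaviour the abstract relies on when it says the score distinguishes inaccurate deconvolutions. Real data would also include measurement noise and cell types missing from the reference panel, neither of which is modelled here.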


