dc.contributor.authorDoherty, T
dc.contributor.authorDempster, E
dc.contributor.authorHannon, E
dc.contributor.authorMill, J
dc.contributor.authorPoulton, R
dc.contributor.authorCorcoran, D
dc.contributor.authorSugden, K
dc.contributor.authorWilliams, B
dc.contributor.authorCaspi, A
dc.contributor.authorMoffitt, TE
dc.contributor.authorDelany, SJ
dc.contributor.authorMurphy, TM
dc.date.accessioned2023-07-07T13:04:09Z
dc.date.issued2023-05-01
dc.date.updated2023-07-07T11:43:50Z
dc.description.abstractBACKGROUND: The field of epigenomics holds great promise for understanding and treating disease, and advances in machine learning (ML) and artificial intelligence are vitally important in this pursuit. Increasingly, research utilises DNA methylation measures at cytosine-guanine dinucleotides (CpG) to detect disease and estimate biological traits such as aging. Given the high dimensionality of DNA methylation data, feature-selection techniques are commonly employed to reduce dimensionality and identify the most important subset of features. In this study, our aim was to test and compare a range of feature-selection methods and ML algorithms in the development of a novel DNA methylation-based telomere length (TL) estimator. We utilised both nested cross-validation and two independent test sets for the comparisons. RESULTS: We found that principal component analysis in advance of elastic net regression led to the overall best performing estimator when evaluated using a nested cross-validation analysis and two independent test cohorts. This approach achieved a correlation between estimated and actual TL of 0.295 (83.4% CI [0.201, 0.384]) on the EXTEND test data set. In contrast, the baseline model of elastic net regression with no prior feature reduction stage performed less well in general, suggesting that a prior feature-selection stage may have important utility. A previously developed TL estimator, DNAmTL, achieved a correlation of 0.216 (83.4% CI [0.118, 0.310]) on the EXTEND data. Additionally, we observed that different DNA methylation-based TL estimators, which have few CpGs in common, are associated with many of the same biological entities. CONCLUSIONS: The variance in performance across the tested approaches shows that estimators are sensitive to data set heterogeneity, and the development of an optimal DNA methylation-based estimator should benefit from the robust methodological approach used in this study. Moreover, our methodology, which utilises a range of feature-selection approaches and ML algorithms, could be applied to other biological markers and disease phenotypes to examine their relationship with DNA methylation and their predictive value.en_GB
dc.description.sponsorshipScience Foundation Irelanden_GB
dc.description.sponsorshipBrain and Behaviour Research Foundation (BBF)en_GB
dc.description.sponsorshipNational Institute for Health and Care Research (NIHR)en_GB
dc.description.sponsorshipNew Zealand Health Research Councilen_GB
dc.description.sponsorshipNew Zealand Ministry of Business, Innovation and Employmenten_GB
dc.description.sponsorshipNational Institutes of Health National Institute of Agingen_GB
dc.description.sponsorshipMedical Research Council (MRC)en_GB
dc.description.sponsorshipJacobs Foundationen_GB
dc.identifier.citationVol. 24(1), article 178en_GB
dc.identifier.doihttps://doi.org/10.1186/s12859-023-05282-4
dc.identifier.grantnumber18/CRT/6183en_GB
dc.identifier.grantnumberR01AG032282en_GB
dc.identifier.grantnumberMR/P005918/1en_GB
dc.identifier.urihttp://hdl.handle.net/10871/133566
dc.identifierORCID: 0000-0001-6840-072X (Hannon, Eilis)
dc.language.isoenen_GB
dc.publisherBMCen_GB
dc.relation.urlhttps://www.ncbi.nlm.nih.gov/pubmed/37127563en_GB
dc.relation.urlhttps://github.com/trevordoherty/DNA-methylation-based-Telomere-Length-estimatoren_GB
dc.relation.urlhttps://moffittcaspi.trinity.duke.edu/research-topics/dunedinen_GB
dc.rights© The Author(s) 2023, corrected publication 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the dataen_GB
dc.subjectAgingen_GB
dc.subjectDNA Methylationen_GB
dc.subjectFeature Reductionen_GB
dc.subjectFeature Selectionen_GB
dc.subjectMachine Learningen_GB
dc.subjectTelomere Lengthen_GB
dc.titleA comparison of feature selection methodologies and learning algorithms in the development of a DNA methylation-based telomere length estimatoren_GB
dc.typeArticleen_GB
dc.date.available2023-07-07T13:04:09Z
dc.identifier.issn1471-2105
exeter.article-number178
exeter.place-of-publicationEngland
dc.descriptionThis is the final version. Available on open access from BMC via the DOI in this recorden_GB
dc.descriptionAvailability of data and materials: Source code and scripts are available in the GitHub repository https://github.com/trevordoherty/DNA-methylation-based-Telomere-Length-estimator. The Dunedin Study datasets reported in the current article are not publicly available due to a lack of informed consent and ethical approval for public data sharing. The Dunedin study datasets are available on request by qualified scientists. Requests require a concept paper describing the purpose of data access, ethical approval at the applicant’s university and provision for secure data access (https://moffittcaspi.trinity.duke.edu/research-topics/dunedin). We offer secure access on the Duke, Otago and King’s College campuses. For the TWIN study, data is freely available in the supplemental files of the previously published article [51]. The EXTEND study data is deposited in the Gene Expression Omnibus (GEO) database (accession number: GSE113725). For further information on data availability, please contact the corresponding author.en_GB
dc.identifier.eissn1471-2105
dc.identifier.journalBMC Bioinformaticsen_GB
dc.rights.urihttps://creativecommons.org/licenses/by/4.0/en_GB
dcterms.dateAccepted2023-04-11
rioxxterms.versionVoRen_GB
rioxxterms.licenseref.startdate2023-05-01
rioxxterms.typeJournal Article/Reviewen_GB
refterms.dateFCD2023-07-07T12:59:39Z
refterms.versionFCDVoR
refterms.dateFOA2023-07-07T13:04:10Z
refterms.panelAen_GB
refterms.dateFirstOnline2023-05-01
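
The abstract reports each estimator's performance as a correlation between estimated and actual TL with an 83.4% confidence interval. The record does not say how those intervals were computed, so the following is only a minimal sketch of one common approach, the Fisher z-transformation, and not necessarily the authors' method. The function names `pearson_r` and `fisher_ci` and the sample size in the usage lines are hypothetical and chosen for illustration.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.834):
    """Confidence interval for a correlation r from n samples.

    Uses the Fisher z-transformation (an assumption; the source does not
    state which interval method was used). Requires n > 3 and |r| < 1.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)                      # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)            # standard error on the z scale
    q = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided normal quantile
    return math.tanh(z - q * se), math.tanh(z + q * se)


# Usage: r is taken from the abstract; n is a HYPOTHETICAL sample size,
# because the test-set size is not given in this record.
lower, upper = fisher_ci(0.295, n=200)
print(f"83.4% CI: [{lower:.3f}, {upper:.3f}]")
```

Because the sample size above is a placeholder, the printed interval is not expected to match the published one; substituting the real test-set size would show how close this approximation comes.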

