Guidance for DNA methylation studies: Statistical insights from the Illumina EPIC array
dc.contributor.author | Mansell, G | |
dc.contributor.author | Gorrie-Stone, TJ | |
dc.contributor.author | Bao, Y | |
dc.contributor.author | Kumari, M | |
dc.contributor.author | Schalkwyk, LS | |
dc.contributor.author | Mill, J | |
dc.contributor.author | Hannon, E | |
dc.date.accessioned | 2019-09-27T10:50:57Z | |
dc.date.issued | 2019-05-14 | |
dc.description.abstract | Background: There has been a steady increase in the number of studies aiming to identify DNA methylation differences associated with complex phenotypes. Many of the challenges of epigenetic epidemiology regarding study design and interpretation have been discussed in detail; however, some analytical concerns remain outstanding and require further exploration. In this study we address three analytical issues. First, we quantify the multiple testing burden and propose a standard statistical significance threshold for identifying DNA methylation sites that are associated with an outcome. Second, we establish whether linear regression, the statistical tool chosen by the majority of studies, is appropriate and whether it is biased by the underlying distribution of DNA methylation data. Finally, we assess the sample size required for adequately powered DNA methylation association studies. Results: We quantified DNA methylation in the Understanding Society cohort (n = 1175), a large population-based study, using the Illumina EPIC array to assess the statistical properties of DNA methylation association analyses. By simulating null DNA methylation studies, we generated the distribution of p-values expected by chance and calculated the 5% family-wise error threshold for EPIC array studies to be 9 × 10⁻⁸. Next, we tested whether DNA methylation data violate the assumptions of linear regression and found that the majority of sites do not satisfy the assumption of normally distributed residuals. Nevertheless, we found no evidence that this violation biases analyses by increasing the likelihood that affected sites are false positives. Finally, we performed power calculations for EPIC-based DNA methylation studies, demonstrating that existing studies with data on ~1000 samples are adequately powered to detect small differences at the majority of sites. 
Conclusion: We propose that a significance threshold of P < 9 × 10⁻⁸ adequately controls the false positive rate for EPIC array DNA methylation studies. Moreover, our results indicate that linear regression is a valid statistical methodology for DNA methylation studies, even though the data do not always satisfy the assumptions of this test. These findings have implications for epidemiological studies of DNA methylation and provide a framework for interpreting findings from current and future studies. | en_GB |
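The threshold derivation summarised in the abstract (simulate null studies, record the smallest p-value in each, then take the 5% quantile of those minima) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes a small toy data set in which a shared latent factor mimics inter-site correlation, a Gaussian null phenotype drawn independently of methylation, and a Pearson correlation test with a Fisher z approximation in place of covariate-adjusted linear regression. The names simulate_toy_methylation, pearson_p_value and empirical_fwer_threshold are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

_STD_NORMAL = NormalDist()  # standard normal, used for Fisher z p-values


def simulate_toy_methylation(n_sites, n_samples, seed=0):
    """Hypothetical toy data: one list of values per site, correlated via a shared factor."""
    rng = random.Random(seed)
    shared = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    data = []
    for _ in range(n_sites):
        weight = rng.uniform(0.0, 0.8)  # how strongly this site tracks the shared factor
        data.append([weight * s + rng.gauss(0.0, 1.0) for s in shared])
    return data


def pearson_p_value(x, y):
    """Two-sided p-value for Pearson's r via the Fisher z normal approximation."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite if |r| == 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * _STD_NORMAL.cdf(-abs(z))


def empirical_fwer_threshold(methylation, n_sims=200, alpha=0.05, seed=1):
    """alpha-quantile of the per-study minimum p-value across simulated null studies."""
    rng = random.Random(seed)
    n_samples = len(methylation[0])
    min_ps = []
    for _ in range(n_sims):
        # Null phenotype drawn independently of methylation: any hit is a false positive.
        phenotype = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        min_ps.append(min(pearson_p_value(site, phenotype) for site in methylation))
    min_ps.sort()
    # About alpha * n_sims null studies have a minimum p-value at or below this value,
    # so using it as the cut-off gives a family-wise error rate of roughly alpha.
    k = max(math.floor(alpha * n_sims) - 1, 0)
    return min_ps[k]


if __name__ == "__main__":
    data = simulate_toy_methylation(n_sites=100, n_samples=60)
    threshold = empirical_fwer_threshold(data, n_sims=200)
    print(f"empirical 5% FWER threshold: {threshold:.2e}")
    print(f"Bonferroni threshold:        {0.05 / len(data):.2e}")
```

Comparing the empirical value with a Bonferroni cut-off (0.05 divided by the number of sites) shows why an array-specific threshold is useful: correlated sites behave like fewer independent tests, so a simulation-based threshold is typically somewhat less stringent than Bonferroni. A realistic version would use the measured EPIC data and the study's regression model rather than this toy setup.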
dc.description.sponsorship | Medical Research Council (MRC) | en_GB |
dc.identifier.citation | Vol. 20, 366 | en_GB |
dc.identifier.doi | 10.1186/s12864-019-5761-7 | |
dc.identifier.grantnumber | K013807 | en_GB |
dc.identifier.uri | http://hdl.handle.net/10871/38927 | |
dc.language.iso | en | en_GB |
dc.publisher | BMC (Springer Nature) | en_GB |
dc.rights | This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. | en_GB |
dc.subject | DNA methylation | en_GB |
dc.subject | Epigenome-wide association study (EWAS) | en_GB |
dc.subject | Multiple testing | en_GB |
dc.subject | Illumina EPIC array | en_GB |
dc.subject | Power | en_GB |
dc.title | Guidance for DNA methylation studies: Statistical insights from the Illumina EPIC array | en_GB |
dc.type | Article | en_GB |
dc.date.available | 2019-09-27T10:50:57Z | |
dc.identifier.issn | 1471-2164 | |
dc.description | This is the author accepted manuscript. The final version is available from BMC via the DOI in this record. | en_GB |
dc.identifier.journal | BMC Genomics | en_GB |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | en_GB |
dcterms.dateAccepted | 2019-05-02 | |
exeter.funder | Medical Research Council (MRC) | en_GB |
rioxxterms.version | AM | en_GB |
rioxxterms.licenseref.startdate | 2019-05-14 | |
rioxxterms.type | Journal Article/Review | en_GB |
refterms.dateFCD | 2019-09-27T10:47:11Z | |
refterms.versionFCD | AM | |
refterms.dateFOA | 2019-09-27T10:51:00Z | |
refterms.panel | A | en_GB |