Posted on 2025-08-02, 11:05. Authored by F Zhou, O Soremekun, T Chikowore, S Fatumo, I Barroso, AP Morris, JL Asimit
Statistical fine-mapping helps to pinpoint likely causal variants underlying genetic association signals. Its resolution can be improved by (i) leveraging information between traits; and (ii) exploiting differences in linkage disequilibrium structure between diverse population groups. Using association summary statistics, MGflashfm jointly fine-maps signals from multiple traits and population groups; MGfm uses an analogous framework to analyse each trait separately. We also provide a practical approach to fine-mapping with out-of-sample reference panels. In simulation studies we show that MGflashfm and MGfm are well-calibrated and that the mean proportion of causal variants with posterior probability (PP) > 0.80 is above 0.75 (MGflashfm) and 0.70 (MGfm). In our analysis of four lipid traits across five population groups, MGflashfm gives a median reduction in 99% credible set size of 10.5% over MGfm. MGflashfm and MGfm require only summary-level data, making them very useful fine-mapping tools in consortia efforts where individual-level data cannot be shared.
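The credible-set comparison in the abstract can be illustrated with a minimal sketch. It assumes the usual construction of a credible set: rank variants by posterior probability and keep the smallest group whose cumulative probability reaches the target coverage. The function names, variant IDs and probabilities below are hypothetical toy values, not output from MGflashfm or MGfm, and the code does not use the API of the R library.

```python
from typing import Dict, List


def credible_set(pp: Dict[str, float], coverage: float = 0.99) -> List[str]:
    """Return the smallest set of variants whose posterior probabilities
    (renormalised to sum to 1) reach the requested coverage.

    Hypothetical helper for illustration; not part of MGflashfm.
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    total = sum(pp.values())
    if total <= 0:
        raise ValueError("posterior probabilities must sum to a positive value")
    # Rank variants from most to least probable.
    ranked = sorted(pp.items(), key=lambda kv: kv[1], reverse=True)
    chosen: List[str] = []
    cumulative = 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        cumulative += prob / total
        if cumulative >= coverage:
            break
    return chosen


def size_reduction(size_before: int, size_after: int) -> float:
    """Percentage reduction in credible set size (assumed definition)."""
    return 100.0 * (size_before - size_after) / size_before


# Toy posterior probabilities for one signal (assumed values).
single_trait_pp = {"rs1": 0.60, "rs2": 0.25, "rs3": 0.10, "rs4": 0.045, "rs5": 0.005}
multi_trait_pp = {"rs1": 0.93, "rs2": 0.065, "rs3": 0.003, "rs4": 0.0015, "rs5": 0.0005}

cs_single = credible_set(single_trait_pp)  # four variants are needed for 99% coverage
cs_multi = credible_set(multi_trait_pp)    # two variants are needed for 99% coverage
reduction = size_reduction(len(cs_single), len(cs_multi))
```

In this toy case, sharing information concentrates the posterior probability on fewer variants, so the 99% credible set shrinks from four variants to two. The figure reported in the abstract is a median of such reductions across signals.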
Funding
214205/Z/18/Z
MC_UU_00002/4
MR/R021368/1
MR/W029626/1
Medical Research Council (MRC)
National Institute for Health and Care Research (NIHR)
This is the final version. Available on open access from Nature Research via the DOI in this record.
Data availability:
The GLGC lipid traits GWAS summary statistics from five genetically similar groups are freely available from http://csg.sph.umich.edu/willer/public/glgc-lipids2021/results/ancestry_specific/. Reference panels for LD and LD scores were generated from the 1000 Genomes data available at https://ctg.cncr.nl/software/MAGMA/ref_data/. The detailed results of our multi-group multi-trait fine-mapping of the GLGC data are given in Supplementary Data 1. For ease of access, they are also deposited in a FigShare public data repository (https://doi.org/10.6084/m9.figshare.23266703; ref. 32). Positions are given according to hg19/build 37.
Code availability:
Our proposed multi-group fine-mapping methods, MGflashfm and MGfm, are freely available as an R library at https://jennasimit.github.io/MGflashfm/ (https://doi.org/10.5281/zenodo.7974535; ref. 33). This library also includes updated versions of expanded JAM and flashfm that dynamically select the maximum number of causal variants, as learned from the data. Custom code for the analysis of the GLGC data is available at https://github.com/fz-cambridge/MGflashfm-GLGC-analysis (https://doi.org/10.5281/zenodo.10034536; ref. 34). Trait genetic correlations were estimated using LD scores (v1.0.1, https://github.com/bulik/ldsc) together with MTAR (http://www.github.com/baolinwu/MTAR). We simulated genotype data with hapgen2 (http://mathgen.stats.ox.ac.uk/genetics_software/hapgen/hapgen2.html). The annotation tool we used is Ensembl VEP GRCh37 (https://grch37.ensembl.org/info/docs/tools/vep/index.html).