Drainage-structuring of ancestral variation and a common functional pathway shape limited genomic convergence in natural high- and low-predation guppies

Studies of convergence in wild populations have been instrumental in understanding adaptation by providing strong evidence for natural selection. At the genetic level, we are beginning to appreciate that the re-use of the same genes in adaptation occurs through different mechanisms and can be constrained by underlying trait architectures and demographic characteristics of natural populations. Here, we explore these processes in naturally adapted high- (HP) and low-predation (LP) populations of the Trinidadian guppy, Poecilia reticulata. As a model for phenotypic change this system provided some of the earliest evidence of rapid and repeatable evolution in vertebrates; the genetic basis of which has yet to be studied at the whole-genome level. We collected whole-genome sequencing data from ten populations (176 individuals) representing five independent HP-LP river pairs across the three main drainages in Northern Trinidad. We evaluate population structure, uncovering several LP bottlenecks and variable between-river introgression that can lead to constraints on the sharing of adaptive variation between populations. Consequently, we found limited selection on common genes or loci across all drainages. Using a pathway type analysis, however, we find evidence of repeated selection on different genes involved in cadherin signaling. Finally, we found a large repeatedly selected haplotype on chromosome 20 in three rivers from the same drainage. Taken together, despite limited sharing of adaptive variation among rivers, we found evidence of convergent evolution associated with HP-LP environments in pathways across divergent drainages and at a previously unreported candidate haplotype within a drainage.


INTRODUCTION
the replicated adaptation of low-predation (LP) phenotypes from high-predation (HP) sources. 74 For approaching 50 years, this system has provided valuable insights into phenotypic evolution 75 in natural populations, including some of the first evidence of rapid phenotypic evolution in 76 vertebrates across ecological, rather than evolutionary, timescales [38,39]. The guppy has since 77 become a prominent model of phenotypic evolution in nature, but accompanying genomic 78 work has only recently begun to emerge. out Caroni rivers, highlighting population structure within this drainage is stronger than 120 structure between Madamas and Oropouche, despite these rivers being in separate drainages. 121 Similarly, admixture proportions inferred by fineSTRUCTURE were lower on average between 122 rivers within the Caroni drainage than between Oropouche and Madamas ( Figure 1C). This  To quantify genome-wide introgression, we calculated D-statistics for trios with Dsuite (version 148 0.3; [56]). These were used primarily to infer introgression between populations in different 149 rivers, therefore we focussed on trios where P1 and P2 were HP-LP pairs from the same river.
We recovered signals of significant genome-wide introgression in the form of D-statistics 151 (elevated sharing of derived sites between rather than within rivers) between APHP and both  To assess within river population demography (i.e. LP population bottlenecks, HP-LP migration), 167 we performed demographic modelling based on two-dimensional site frequency spectra (2dSFS) 168 using fastsimcoal2 [57]( Figure 1F; Table 1). All demographic models performed better with the 169 addition of migration, which in every case was higher downstream from LP to HP. For three rivers 170 (Aripo, Madamas, Tacarigua) a historic LP bottleneck was detected alongside stable current 171 population size. Guanapo was better supported by a model with no HP population growth, and  Figure S2). Using an intersect of all three may be over-conservative, for example, we 213 would miss instances where divergent selection within a river fixes alternate haplotypes and so 214 both HP and LP population have similarly low heterozygosity (i.e. no XP-EHH outlier but an 215 outlier in XtX and AFD).

217
Comparing the intersecting list of candidates of XtX, AFD and XP-EHH within each river revealed 218 little overlap among rivers, with only a single 10kb window overlapping in more than two rivers 219 ( Figure 2A; for genome-wide plots see Figure S3-5). We then scanned the genome further with 220 100kb sliding windows (50kb increments) to assess potential clustering of outlier windows in 221 larger regions, but this approach similarly revealed little overlap among rivers. We then 222 explored whether outlier regions (10kb windows overlapping in >1 selection scans) were 223 enriched for genes in common biological pathways between rivers using one-to-one zebrafish 224 orthologues, which may suggest repeated pathway modification, albeit through different 225 genes. Using the outlier regions defined above, no pathway was significantly overrepresented 226 in any river. We did however notice cases in which the same pathways exhibited fold-227 enrichments >1 in multiple rivers ( Figure 2B), albeit non-significant within rivers in each case. 228 We used permutations to explore the likelihood of observing fold-enrichment >1 in our five 229 independently-derived outlier sets. This analysis identified that Cadherin-signalling pathway 230 genes were overrepresented across all five rivers relative to by-chance expectations (p = 0.013)

231
( Figure 2B). In total, 20 genes from the Cadherin-signalling pathway were recovered from all 232 five river outlier sets, with some overlap between them (Table S1). This analysis may be over-genes associated with cadherin-signalling were detected by our selection scans, including 235 Cadherin-1 and B-Cadherin in a differentiated region on chromosome 15 (~ 5 Mb) in Oropouche 236 and Tacarigua, but these genes exhibited a many-to-many orthology with zebrafish genes so 237 were omitted. We also examined pathways with fold-enrichment >1 in any four rivers, but 238 these were not significant (p > 0.05) according to permutation tests ( Figure 2B). We next associated allele frequency changes with HP-LP status using BayPass' auxiliary 252 covariate model. This latter approach has the advantage of using all populations together in a 253 single analysis, whilst controlling for genetic covariance. Scans for regions associated with HP-LP 254 classification identified two major clusters of associated 10kb windows on chromosomes 8 and 255 20 ( Figure 2C and 2D). In total, we highlighted 70 10kb windows corresponding to 24 annotated 256 genes (and a number of novel, uncharacterised genes) (Table S2). Intersecting these windows 257 with within-river candidate regions highlighted that most HP-LP associated candidates reflected 258 within-river selection scan outliers in one to three rivers (Table S3). Selection scans may 259 overlook some of our association outlier windows because differentiation at these loci may be   Interestingly, two of the three Caroni LP populations (GLP and TACLP) were fixed or nearly fixed 281 for homozygous ALT haplotypes across the region ( Figure 3A). We will refer to this entire region 282 (~0-2.5 Mb) as the 'CL haplotype' (Caroni LP haplotype). The other haplotype we will refer to as 283 the 'REF haplotype' due to its closer similarity to the reference genome. Moreover, a subset of 284 the candidate region was nearly fixed in APLP (between black lines, between ~1.53 -2.13 Mb, 285 referred to as the CL-AP region Figure 3A). This CL-AP region corresponds to both the region of  from the larger CL haplotype in Aripo only. We therefore explored the potential for inversions 325 and SVs with our aligned read data using smoove [63] (v0.2.5) and Breakdancer [64] (v1.4.5). 326 We did not find evidence for an inversion or an alternative SV spanning either the full CL  It is particularly interesting that given the CL haplotype may be reasonably old, it has been 347 broken down into smaller regions only in the Aripo river, whereas Tacarigua and Guanapo 348 maintain the full haplotype. One unique attribute of Aripo is evidence of introgression from 349 beyond its drainage. We therefore explored introgression signals (D-statistics) along for the REF haplotype (upper cluster, Figure 3C). We looked only at these individuals to remove 352 effects of haplotype heterozygosity in GH and APHP but not OH. This analysis strongly  Using a whole genome sequencing approach, we found a strong candidate haplotype for HP-LP 385 convergence within the Caroni drainage of Northern Trinidad (the only drainage where we have 386 multiple rivers sequenced), but we also found molecular convergence at specific loci is limited 387 among rivers from different drainages. Further, we find evidence that convergence at the level 388 of functional pathways among rivers may facilitate phenotypic convergence across all rivers. phenotypes. This haplotype has largely been maintained as a tightly linked unit, with the 393 exception of Aripo, and likely evolved in these LP populations from shared variation common and demographic histories across Northern Trinidad suggest that the reduced re-use of the 396 same alleles among drainages may be due to strict structuring of genetic variation between 397 some rivers and recurrent bottlenecks during the founding of LP populations from HP sources. 398 Combined, these processes limit shared ancestral genetic variation from which convergent 399 genetic adaptation may occur. This is not true for all rivers however, with some gene flow   Within the CL-AP region we highlighted the plppr5 and plppr4 genes as strong candidates for 435 HP-LP adaptation. These genes correspond to the strongest signals of HP-LP association within 436 the CL-AP region, and in particular the plppr5 allele on the CL haplotype is associated with an exon-subsuming deletion ( Figure S13B). There is limited functional evidence for these genes,    were omitted from our pathway analysis due to many-to-one orthology with zebrafish genes.  (Table S1) is beyond the scope of this work, but in identifying this pathway across rivers we 514 provide evidence for a potentially shared mechanism by which HP-LP phenotypes may similarly 515 evolve across Northern Trinidad.

584
We also estimated D-statistics for all 120 possible trios of the 10 populations using Dsuite [56]. 585 Here, sequencing data from 10 P. picta individuals (5 males, 5 females) was aligned to the male 586 guppy genome using the protocol above to be used as an outgroup. bcftools consensus was 587 used to create a P. picta fasta file based on the final SNP variant calls and ancestral alleles were 588 inserted into the P. reticulata VCF files using vcftools fill-aa. We highlighted introgression 589 candidates as trios with significant D values (Bonferroni-corrected p-value < 0.05) between 590 populations from different rivers i.e. we retained trios for which P1 and P2 were assigned to the 591 same river by including the tree structure in Figure 1B   approximation to estimate demographic parameters from the site frequency spectrum (SFS). As 599 the SFS is sensitive to missing data, a --max-missing filter of 80% was applied to each population 600 vcf containing both monomorphic and polymorphic variants, and to remove LD, the vcf was 601 thinned at an interval of 20kb using vcftools [93]. For each HP-LP population pair, we generated 602 a folded two-dimensional frequency spectrum (2dSFS) using the minor allele frequency, which 603 were generated via projections that maximised the number of segregating sites using easySFS 604 (https://github.com/isaacovercast/easySFS). for each population pair and the run with the lowest delta likelihood was used as input for 620 bootstrapping by simulating 100 SFS. We report the median and 95% confidence intervals for 621 NE and probabilities of migration as provided by bootstrapping.

623
Scans for selection 624 We used three approaches to scan the genome for signatures of selection between HP-LP 625 populations. We first estimated XtX (a Bayesian approximation of FST) within each river using 626 Baypass [58], which has the advantage of including a genetic covariance matrix to account for 627 some demographic variation. Genetic covariance matrices were estimated using LD-pruned 628 (plink --indep-pairwise 50 5 0.2) VCFs for each river, and averaged over 10 independent runs. 629 We determined a significance threshold within each river by simulating neutral XtX of a POD VCF for SVs marked as imprecise; <1kb in size; less read pair support (SU) than the per-river 685 median; without both paired-end and split-read support (PE | SR == 0). Breakdancer was run 686 per river, per chromosome, with results filtered for SVs <1kb in size; less support than per river, 687 per chromosome median support; quality < 99. We calculated FST using smoove VCFs within 688 each river to explore structural variants that may have diverged within rivers. To validate SVs of 689 interest, we plotted all bam files within a river using samplot [99] and visualised regions in igv.

691
Phylogenetic relationships of haplotypes 692 To examine phylogenetic relationships of the CL haplotype, we calculated maximum likelihood 693 trees among homozygote haplogroups using RAxML-NG (v0.9.0; [100]). We limited this analysis 694 to the CL-AP region, which was subsumed within the larger CL haplotype in Guanapo and 695 Tacarigua and included the strongest region of HP-LP association (CL-AP region). Haplogroups 696 were defined on the basis of PC1 clusters ( Figure S9). We retained a random haplotype from 697 homozygote individuals from all populations and constructed a tree using the GTR + Gamma 698 model with bootstrap support added (500 trees) with a cut-off of 3%.

700
To compare against other regions of genome, we randomly sampled 50 regions of the genome 701 of 100kb for the same individuals, repeated the above method for tree calculations, and