Natural Selection and Genetic Diversity in the Butterfly Heliconius melpomene

A combination of selective and neutral evolutionary forces shape patterns of genetic diversity in nature. Among the insects, most previous analyses of the roles of drift and selection in shaping variation across the genome have focused on the genus Drosophila. A more complete understanding of these forces will come from analyzing other taxa that differ in population demography and other aspects of biology. We have analyzed diversity and signatures of selection in the neotropical Heliconius butterflies using resequenced genomes from 58 wild-caught individuals of Heliconius melpomene and another 21 resequenced genomes representing 11 related species. By comparing intraspecific diversity and interspecific divergence, we estimate that 31% of amino acid substitutions between Heliconius species are adaptive. Diversity at putatively neutral sites is negatively correlated with the local density of coding sites as well as nonsynonymous substitutions and positively correlated with recombination rate, indicating widespread linked selection. This process also manifests in significantly reduced diversity on longer chromosomes, consistent with lower recombination rates. Although hitchhiking around beneficial nonsynonymous mutations has significantly shaped genetic variation in H. melpomene, evidence for strong selective sweeps is limited overall. We did however identify two regions where distinct haplotypes have swept in different populations, leading to increased population differentiation. On the whole, our study suggests that positive selection is less pervasive in these butterflies as compared to fruit flies, a fact that curiously results in very similar levels of neutral diversity in these very different insects.

The solution to Lewontin's paradox appears to lie in the influence of natural selection on linked sites. Selection acting on one locus can cause the removal of genetic variation at physically linked, neutral loci. This can occur either through fixation of beneficial alleles ("hitchhiking") (Maynard Smith and Haigh 1974) or by purging of deleterious alleles ("background selection") (Charlesworth et al. 1993). Both of these processes have more pronounced effects in genomic regions of lower recombination rate. The importance of linked selection is supported by a positive correlation between recombination rate and neutral genetic diversity, first and most thoroughly studied in Drosophila melanogaster (Begun and Aquadro 1992;Langley et al. 2012;Mackay et al. 2012;McGaugh et al. 2012;Campos et al. 2014), and subsequently observed in other taxa, including humans (Nachman et al. 1998;Payseur and Nachman 2002;McVicker et al. 2009;Lohmueller et al. 2011), yeast (Cutter and Moses 2011), mice, rabbits (Nachman and Payseur 2012), and chickens . A major recent advance has come from a population genomic analysis of 40 species, which showed not only that this phenomenon is widespread in plants and animals, but importantly, that the effectiveness of selection at removing variation at linked sites is correlated with population size . The increased effectiveness of natural selection in larger populations reduces genetic diversity to a much greater extent than in smaller populations, countering to some degree the reduced influence of genetic drift.
However, this correlative evidence fails to capture the complexities of how natural selection acts in different species. A range of factors will affect the efficiency of natural selection and how strongly it influences linked sites, including the recombinational landscape across the genome, the frequency of adaptive change, and historical population demography (Cutter and Payseur 2013). These factors vary enormously between species, and in-depth analyses of an increasing number of taxa have revealed that not all conform to the same general trends. For example, certain plant species do not show a correlation between recombination and neutral polymorphism (see Cutter and Payseur 2013 for a thorough review). In the insects, most of what we have learned about the action of selection and drift in natural populations comes from studies of the genus Drosophila (Andolfatto 2007;Sella et al. 2009;McGaugh et al. 2012;Campos et al. 2014;Comeron 2014;Lee et al. 2014). For example, in Drosophila simulans, genetic diversity is strongly reduced in the vicinity of recent nonsynonymous substitutions, indicative of strong hitchhiking around beneficial mutations Lee et al. 2014). This contrasts with a more subtle pattern in humans, where a reduction in diversity around functional substitutions is only detectable after accounting for background selection (Hernandez et al. 2011;Enard et al. 2014). It remains to be seen whether the rampant selection seen in Drosophila spp. is typical of insects.
Here we investigate the action of selection and other evolutionary forces in Heliconius butterflies, focusing in par-ticular on H. melpomene. This species differs from D. melanogaster in a number of ways that might influence patterns of selection across the genome. Populations of Heliconius live in tropical rainforests and are characterized by long life spans and stable populations (Ehrlich and Gilbert 1973). In addition, H. melpomene has a similar per base recombination rate to D. melanogaster, but more chromosomes (21 compared to 4), potentially allowing higher overall recombination rates. Although the ecology and evolution of this genus has been the subject of much research (reviewed by Merrill et al. 2015), including recent genomic studies of adaptation and speciation (Arias et al. 2012;Nadeau et al. 2012Nadeau et al. , 2013Kronforst et al. 2013;Supple et al. 2013), whole-genome studies of selection and within-species genetic diversity have been lacking. Data from H. melpomene was included in the recent comparative study of . However, only four individuals from a single population were considered. Here we examine in detail the action and influence of natural selection within and between populations and species using whole genome resequencing data from 59 H. melpomene individuals and an additional 21 samples from 11 related species. We first identify four large but cohesive populations and then explore genetic variation within and between populations and species, describing the footprints of various selective and neutral processes.

Materials and Methods
Mapping, genotyping, and estimation of error rates The analyzed genome sequences from 80 butterflies included both published and new data. Sample information and accession numbers are given in Supplemental Material , Table S1. The 58 wild-caught H. melpomene samples cover much of the species range and included 13 wing pattern races. We also reanalyzed sequence data from a single individual from the inbred H. melpomene reference strain (Heliconius Genome Consortium 2012). For sequences generated in this study, methods were as described by . All sequences analyzed here consisted of paired-end reads obtained by shotgun sequencing using either Illumina's Genome Analyzer IIx system or Illumina's HiSeq 2000 system, according to the manufacturer's protocol (Illumina).
Quality-filtered, paired-end sequence reads were mapped to the H. melpomene genome scaffolds (version 1.1) (Heliconius Genome Consortium 2012) using Stampy version 1.0 . Genotypes were called using the GATK version 2.7 UnifiedGenotyper . See File S1 for detailed methods. Only "high quality" genotype calls (Phred-scaled mapping quality and genotype quality $30) were used in downstream analyses. We optimized our genotype calling procedure by examining the total numbers of genotype calls and estimated error rates produced by different pipelines. The rate of false positive heterozygous genotype calls was estimated by analyzing a homozygous region in the inbred reference sample. We further examined how error rates change with increasing divergence and different read depths using simulated sequence reads generated using seq-gen  and ART . See File S1 for further details.

Analysis of phylogeny and population structure
A maximum-likelihood tree for all 80 samples was generated using only fourfold degenerate (4D) sites that had high-quality genotype calls in at least 60 samples, giving an alignment of 1.7 million bases. RAxML (Stamatakis 2006;Ott et al. 2007;Stamatakis et al. 2008) was used with the GTRGAMMA model, and 100 bootstrap replicates were performed.
We then used two approaches to identify populations that would be considered separately in downstream analyses of diversity: STRUCTURE (Pritchard et al. 2000;Falush et al. 2003), a model-based clustering method that infers the proportion of each individual's genotype made up by each of a defined number of clusters; and principle components analysis (PCA), performed using Eigenstrat SmartPCA . To minimize the influence of selection, both analyses considered only fourfold degenerate sites. Detailed methods are provided in File S1.

Site frequency spectra
We generated unfolded site frequency spectra for each H. melpomene population by counting the number of derived alleles at biallelic sites. Sites were polarized by comparison with the "silvaniform" clade species: H. hecale, H. ethilla, and H. pardalinus. To allow comparison among populations, and account for missing data, each site was randomly down-sampled to the same number of individuals. See File S1 for details.
Inference of historical population size change using pairwise sequentially Markovian coalescent program To infer changes in ancestral population sizes, we used the pairwise sequentially Markovian coalescent (PSMC) program (Li and Durbin 2011). This method fits a model of fluctuating population size by estimating the distribution of times to most recent common ancestor across a diploid genome. Twelve samples were selected a priori for PSMC analysis. These 12 were chosen because they all had similar sequencing depth, similar numbers of genotyped sites, were all male (homogametic, ZZ), and provided a good representation across the species range. Detailed methods are provided in File S1.

Window-based population parameters
Various population parameters were calculated for nonoverlapping 100-kb windows across the genome. Only windows with a sufficient number of sites genotyped in at least 50% of samples were considered. See File S1 for details. We used 100kb windows because linkage disequilibrium (LD) tends to break down almost completely within 10 kb and reaches background levels within 100 kb ( Figure S1), meaning that measures from adjacent windows would be largely free of linkage effects.
Nucleotide diversity (p) and absolute divergence (d XY ) were calculated as the average proportion of differences be-tween all pairs of sequences, either within a sample (p) or between two samples (d XY ). Sites with missing data were excluded in a pairwise manner to maximize the amount of data being considered. Tajima's D  and F ST (as in equation 9 of  were calculated using the EggLib Python module .

Estimating the rate of adaptive substitution
We estimated the genome-wide rate of adaptive substitution (a) using Messer and Petrov's asymptotic method , comparing synonymous and nonsynonymous SNPs covering 11,804 polymorphic genes (11,638 autosomal and 166 Z-linked). Polymorphism was measured in the Western population of H. melpomene, and divergence was measured between the Western population and H. erato. We calculated confidence intervals around the estimated a by performing 1000 bootstraps, in each of which 11,804 genes were resampled, with replacement. Detailed methods are described in File S1.

Multiple regression
We used multiple linear regression to model nucleotide diversity at 4D sites (p 4D ) in 100-kb windows (calculated for each population separately and then averaged). The aim was to assess the influence of selection at linked sites on diversity at neutral sites. Since linked selection is largely modulated by the number of selected sites and the extent of linkage, we included as explanatory variables the local gene density (the proportion of coding sequence per window) as proxy for the density of nearby selected sites  and local recombination rate (r), calculated from the linkage map. To account for genetic hitchhiking, we also included the number of recent nonsynonymous substitutions (D n ) in the H. melpomene lineage per window as an explanatory variable. As an alternative, and a potentially more direct indicator of adaptive substitutions, we also tested a model using summed gene-by-gene estimates of the number of adaptive nonsynonymous substitutions (a), estimated by maximum likelihood using the McDonald-Kreitman test framework . To account for mutation rate variation, the rate of synonymous substitutions per synonymous site (d S ) was also included as an explanatory variable. Lastly, we also included GC content at third codon positions to account for any effects of DNA composition. Detailed methods for the estimation of various explanatory variables and data processing for this analysis are provided in File S1.
To further investigate the interrelationships between the explanatory variables, we used principal component regression (PCR) . This approach can help to tease apart the effects of the various explanatory variables by summarizing the explanatory variables into orthogonal components, thereby accounting for multicollinearity. Regression analyses were performed with the R version 3.0.3 (https://www.R-project.org) using the pls package (Mevik and Wehrens 2007).
To assess the robustness of our findings, we investigated the influence of various modifications to the model, such as restricting the analysis to genes showing minimal codon usage bias, the exclusion of chromosome ends, and use of a different outgroup. Details are provided in File S1.
Multiple linear regression was also performed for whole chromosomes, where the response variable was the mean 4D site diversity per chromosome (p̅ 4D ). Here, rather than using recombination rate estimated from the linkage map, we used chromosome length as a proxy for recombination rate (Kaback et al. 1992;Lander et al. 2001). Thus, the five explanatory variables were as follows: chromosome length, average gene density, average synonymous substitution rate (d ̅ S ), average number of nonsynonymous substitutions per 100 kb (D ̅ n ), and average GC content. As above, p̅ 4D was square-root transformed, but in this model, none of the explanatory variables required transformation to correct for skewness. As above, all explanatory variables were Z-transformed so that their effects could be compared.

Scanning for selective sweeps
To identify candidate selective sweep locations in the Eastern and Western populations, we used SweeD (Pavlidis et al. 2013). This program is based on Sweepfinder  and uses a composite likelihood ratio (CLR) to identify loci showing a strong deviation in the site frequency spectrum toward rare variants. See File S1 for detailed methods.

Data availability
All raw sequence reads are available from the Sequence Read Archive from the National Center for Biotechnology Information. Accession numbers are provided in Table S1. Processed genotype data, along with data files underlying all results, figures and tables, and code used for model fitting are available from Data Dryad (http://dx.doi.org/10.5061/dryad. g0874). The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article.

Genotyping
The median depth of coverage across all samples was 283. A median of 78% of sites genome-wide, and 96% of coding sites, had high-quality genotype calls for samples of H. melpomene, its two close relatives H. cydno and H. timareta, and the "silvaniform" clade species H. hecale, H. ethilla, and H. pardalinus, which diverged from H. melpomene 3.8 million years ago (MYA) ) ( Figure S4). A few samples had considerably fewer sites genotyped, owing to poor sequence coverage (Table S1). More distant species, including H. wallacei (8.8 MYA), H. doris (9.7 MYA), and H. erato (10.5 MYA) all showed strongly reduced numbers of genotype calls (median 33%), suggesting that many reads from these species were too divergent to be mapped reliably to the H. melpomene reference. However, the number of calls obtained in coding regions showed very little drop-off with phylogenetic distance, with a median of 90% of coding sites genotyped in the most divergent outgroup, H. erato ( Figure  S4). This implies that coding regions are sufficiently conserved to allow read mapping and genotyping across all Heliconius species. Our analyses of the more distant species therefore focused only on coding regions.
We selected a genotyping pipeline that gave a false positive SNP rate of 0.03% per site (three errors in 10,000 calls), when comparing the inbred reference sample to itself (Table S2). Using simulated reads, we found that our pipeline produced higher error rates for more divergent taxa, especially when sequencing depths were low ( Figure S5). Nevertheless, for divergences below 6%, which is typical for coding sequences in this genus, and with appreciable sequencing depth, estimated error rates were still well under 0.05% (five in 10,000). As we are concerned primarily with large-scale genomic trends, with all analyses considering large numbers of sites, rare genotyping errors are unlikely to influence our conclusions.

Population structure and phylogenetic relationships
In order to focus our analyses on biologically meaningful populations, we first investigated population structure among our samples. Analysis of population structure based on 4D sites, using both PCA and STRUCTURE (Pritchard et al. 2000;Falush et al. 2003) gave largely congruent results. Both analyses identified three distinct H. melpomene clusters that were largely partitioned geographically (Figure 1, B and C). Consistent with previous studies using smaller datasets, H. melpomene samples from the eastern and western slopes of the Andes formed two strongly differentiated populations, separated by a deep phylogenetic split ( Figure 1B). The third population was made up of the samples from French Guiana. These three populations will be referred to as the "Eastern," "Western," and "Guianan" populations. The only exception to this geographic clustering was a group of five samples of H. m. melpomene from the eastern slopes of the Andes in Colombia, which formed a monophyletic clade most closely allied with the Western population. However, the STRUCTURE results suggested admixture between these and both the Eastern and Guianan populations ( Figure 1B). Although not differentiated by principal components 1 and 2 ( Figure 1C), these five Colombian samples were differentiated from the Western population by principal component 3 ( Figure  S9). Given the distinct geography and genomic composition of these samples, we made the conservative decision to consider this group as a fourth distinct population ("Colombian").
sizes, but the Eastern population showed evidence of a recent expansion. Tajima's D was consistently negative in the Eastern population, but close to zero in the Western, Colombian, and Guianan populations ( Figure 2A). Negative Tajima's D is indicative of an excess of rare variants, consistent with recent population growth. This finding was substantiated by direct examination of the unfolded site frequency spectrum (SFS). The SFS at 4D sites showed a strong excess of rare variants in the Eastern population compared to the other populations ( Figure 2B). This skew remained when only geographically proximate samples, with high sequencing coverage, were considered ( Figure S10), indicating that it was not an artifact of sampling design. The same trend was also observed at intronic and intergenic sites ( Figure S10). Compared to neutral expectations with constant population size, the other three populations displayed a weak excess of rare variants, but to a much lesser degree than the Eastern population. For the Eastern and Western populations, which were more densely sampled, we were able to compare the SFS down sampling to 20 individuals for each SNP position. This deeper  Table S1 for coordinates). Symbols indicate country of sampling; sizes indicate the number of samples from each location. Colors correspond to major clustering on the STRUCTURE plot (B). Gray shading indicates the approximate location of the Andes Mountains. (B) Compressed RaxML phylogeny based on fourfold degenerate (4D) sites. See Figure S6 for an uncompressed version. Colored bars indicate genotype cluster proportions for each sample inferred by STRUCTURE with k = 6. STRUCTURE plots for k = 5-8 are given in Figure S7, and Ln probabilities for different k values are given in Figure S8. (C) Principal component 1 plotted against principal component 2, which explained 17 and 7.7% of the variance, respectively. Colors and symbols are as in A. In A-C, the Colombian samples discussed in the text are circled in orange.
sampling reproduced the pattern, further showing that the skew was not limited to singleton variants, but also doubletons (derived alleles present twice in the sample) ( Figure  S10). While genotyping error could explain some of the excess of singleton SNPs, it is unlikely to cause the observed excess of doubletons, nor the dramatic skew seen in the Eastern population.
We further verified our hypothesis of a recent expansion in the Eastern population using the PSMC method of Li and Durbin (2011). All four populations showed similar popula-tion size histories up until 200,000 years ago, with a gradual increase in population size beginning 1 MYA and leveling off 300,000 years ago. However, while the Western, Colombian, and Guianan samples showed a subsequent decrease in the inferred N e , that of the Eastern samples rose again, roughly doubling between 100,000 and 30,000 years ago ( Figure 2C). Closer to the present (,30,000 years ago) the inferred individual histories diverged considerably, as may be expected given the dearth of information about recent demography to be gained from analysis of single genomes (Li and Durbin 2011). Nevertheless, there appears to be a tendency for the more southern samples to show greater expansion (e.g., H. m. amandus from Bolivia and H. m. aglaope from Peru). Methods more sensitive to demographic change in the recent past may be necessary to confirm this pattern. This analysis was repeated several times, varying the PSMC input parameters for block size and recombination rate. While absolute population size estimates tended to be lower at smaller block sizes, the observed trend of a population size expansion in each of the Eastern samples was consistent throughout (data not shown). As it is based on heterozygosity in single genomes, this analysis is independent of the site frequency spectrum and therefore provides an additional line of evidence for a recent expansion of the H. melpomene population to the east of the Andes. One potential caveat in this conclusion is that hybridization and gene flow may also produce patterns consistent with population growth, and gene flow is known to occur between the eastern population and H. timareta . However, there is also significant gene flow between the Western population and H. cydno, implying that hybridization alone is unlikely to explain the distinct pattern seen in the Eastern population.
Diversity and divergence across the genome Estimated neutral diversity in H. melpomene was found to be high, and comparable with that in Drosophila spp. Estimates of within-population nucleotide diversity (p) in H. melpomene made use of only those samples with at least 253 depth of coverage, because we found that levels of within-sample heterozygosity tended to be underestimated at sequencing depths below this threshold ( Figure S11). Genome-wide p, averaged over all 100-kb windows across the four populations was 1.9%, and similar when only intergenic (2.0%) or intronic (1.9%) sites were considered (Table 1; Table S3). As expected, diversity was strongly reduced at first and second codon positions (0.6%) and higher at third codon positions (1.5%). Diversity was highest at 4D sites (2.5%).
The peripheral Western and Guianan populations had significantly lower diversity than those at the center of the range ( Figure 3A) (paired Wilcoxon signed rank test, P , 2e-16). To ensure that this trend was not simply driven by population substructure among sampled individuals, we examined levels of heterozygosity within each sample. As mentioned above, this revealed that sequencing depth affected estimates of heterozygosity, but that above a depth of roughly 253, heterozygosity was consistent within each population. Considering only samples with depth of at least 253, we found that average 4D site heterozygosity in the Western samples (2.67%) was only marginally lower than that in the Colombian (2.79%) and Eastern (2.82%) samples, whereas that of the Guianan samples (2.16%) remained considerably lower than the other populations ( Figure S11).
To estimate as closely as possible the value of u = 4N e m, we recalculated average diversity at 4D sites considering only autosomal genes showing minimal codon usage bias. This gave a slightly higher value of 2.7%, ranging from 2.1 to 2.9% across the four populations (Table 2). These values are in the same range as estimated neutral diversity for D. melanogaster in Southern Africa (2%) and D. simulans (3.5%) (Begun et al. 2007;Langley et al. 2012).
Mean divergence at 4D sites between H. melpomene and H. erato, its most distant relative in the genus, was 15.8% [16% when only minimal codon usage bias (CUB) genes were considered]. A calibrated phylogeny places the split between these two species at 10.5 MYA . Assuming four generations per year, this corresponds to a neutral mutation rate of 1.9 3 10 29 per site per generation. This is about two-thirds of the spontaneous mutation rate recently estimated using whole genome sequencing of parents and offspring in H. melpomene (2.9 3 10 29 ) (Keightley et al. 2014). Using these two rates, we estimated N e (u/4m) for the four populations, which ranged from 1.8-2.8 million for the Guianan population to 2.5-3.8 million for the Colombian population (Table 2). We note that these do not represent instantaneous values, but rather aggregates over the course of the coalescent time scale. They are also based on a small subset of putatively neutral sites, making them difficult to compare with the PSMC results ( Figure 2C), as the latter are based on the whole genome data, for which the level of polymorphism is 30% lower ( Table 1). Levels of diversity at 4D sites varied considerably across individual chromosomes ( Figure 3C). This heterogeneity was strongly conserved between the three populations. Interspe-cific divergence also varied across the chromosomes, but to a lesser extent ( Figure 3D). Diversity was also reduced on the Z chromosome relative to autosomes, as expected, given its lower effective population size (N e ). This discrepancy between autosomes and Z was also present in measures of divergence between H. melpomene and its closer relatives, but disappeared at higher levels of divergence, with d XY between H. melpomene and H. erato being nearly identical for autosomes and Z (Table 1). This is consistent with a decreasing contribution of N e to coalescence time for deeper species splits.
One potential concern is that highly variable regions may have been missed due to poor read mapping, in which case diversity and divergence might be underestimated. To test for this bias, we investigated whether p and d XY were correlated with the proportion of missing data per window (measured as sites genotyped in fewer than 50% of samples). Third codon positions averaged just 2.3% missing data among the H. melpomene samples, and just 2.4% across the entire set of 79 wild samples. Neither p nor d XY were correlated with the proportion of missing data ( Figure S12, Figure S13). Hence, there is no evidence for any bias in coding regions. In intergenic regions, which averaged 23% missing data in H. melpomene samples and 25% across the whole sample set, both p and d XY were found to be weakly correlated with the proportion of missing data ( Figure S12). However, the amount of variance explained by missing data was very low (linear regression, R 2 = 0.037 and 0.009, respectively). The effect of missing data on our estimates of diversity and divergence therefore appears to be minimal. We suggest that the reduced rate of read mapping in noncoding regions in H. melpomene and its closer relatives is probably driven more by an abundance of repeats and structural variation rather than by excessively divergent sequences.

The rate of adaptive fixation
We estimated that 31% of fixed amino acid substitutions between Heliconius species are adaptive. We used Messer and Petrov's asymptotic method  to estimate a genome wide a, the proportion of nonsynonymous substitutions driven by positive selection. The exponential model showed a good fit to the data (Figure 4), and gave an estimated a of 31.0%. The 5th and 95th quantiles from 1000 bootstrap replicates were 29.0 and 33.0%, respectively. This value is roughly intermediate between estimates for humans (13%) and D. melanogaster (57%), generated using the same approach.

Selection reduces diversity at linked neutral sites
We explored the influence of selection on linked sites using a multiple linear regression approach, following several previous studies (Cutter and Moses 2011;McGaugh et al. 2012;. Our "main model" had nucleotide diversity at 4D sites (p 4D ) in 100-kb windows as the response variable and five explanatory variables: (i) local gene density, a proxy for the number of nearby selected sites; (ii) local recombination rate (r); (iii) the number of recent nonsynonymous substitutions (D n ), to account for hitchhiking around beneficial mutations; (iv) the synonymous substitution rate (d S ), a proxy for local mutation rate; and (v) GC content, to account for effects of local DNA composition ( Figure S14, Figure S15). The results for the main model are summarized in Table 3 and Table S4 and described below. There was limited serial correlation among windows. The Durbin-Watson statistic Watson 1950, 1951) for the main model was 1.3, suggesting that autocorrelation is unlikely to influence our conclusions (Field 2009). Several modifications of the model were also tested to investigate the robustness of the result. These are all summarized in Table S5, Table S6, Table S7, Table S8. Overall, results were consistent throughout, but several notable differences are described below. The main model explained 34% of the variation in p 4D (F 5,1461 = 151.1, P , 2.2e-16) and all five predictor variables were found to have significant effects (Table 3). Unsurprisingly, synonymous substitution rate (d S ) was a strong predictor of intraspecific diversity (F 1,1461 = 260.89, P , 2.2e-16), implying that at least some of the observed heterogeneity in diversity across the genome is explained by variation in mutation rate. Local gene density also showed a strong negative relationship with p 4D (F 1,1461 = 162.03, P , 2.2e-16), consistent with a considerable effect of selection at linked sites. Local recombination rate showed a positive relationship with diversity (F 1,1461 = 65.27, P , 1.357e-15), also as expected under linked selection. In addition, recombination rate was  Figure 4 Estimating the rate of adaptive substitution. Mean genomewide a was estimated using the "asymptotic" method , based on polymorphism for the Western population of H. melpomene, and divergence between the Western population and H. erato. a(x) was calculated for each derived allele frequency (x) for all x $ 0.1. The solid red line indicates the fit of the asymptotic exponential function a(x) = a + bexp(2cx), extrapolated to x = 1 (dashed red line). The solid red box indicates the range between the 5th and 95th percentiles from 1000 bootstrap samples over the 11,804 genes used.
Population Genomics of Heliconius Butterflies not correlated with d S (Pearson's R = 20.01, P , 0.648; Figure S16), implying that the positive relationship with diversity cannot be explained by a mutagenic effect of recombination. We suggest that the true relationship between recombination rate and neutral diversity may be stronger than our model predicts, as the imperfect placement of scaffolds on the Hmel1.1 linkage map limits our ability to accurately estimate local recombination rates (see below).

Evidence for genetic hitchhiking
There was also a significant negative effect of the number of nonsynonymous substitutions per window (D n ) on p 4D (F = 15.08, P , 0.0001), implying a small but detectable effect of genetic hitchhiking around adaptive substitutions. While multicollinearity among the explanatory variables was generally weak, there was a moderate correlation between gene density and D n (Pearson's R = 0.53; Figure S16). This is unsurprising, since nonsynonymous substitutions can occur only in coding sequence and are thus likely to be more common in gene-rich regions. This can make it difficult to separate the roles of hitchhiking and background selection. However, the variance inflation factors for gene density and D n in our model were low and do not suggest a significant impact of collinearity on the precision of our estimates (Table  3). Moreover, PCR analysis, which first accounts for collinearity among explanatory variables by separating them into principle components before performing a multiple regression, demonstrated an effect of gene density on diversity independent of D n ( Figure S17). One potential concern is that D n could be underestimated in highly divergent genes, where read mapping for the distant outgroup H. erato may be poor. We therefore tested a modified model in which d S and D n were estimated using only the more closely related silvaniform species as outgroups. This made little difference to the results (Table S5).
As an alternative to D n , we also tested a modified model that included maximum-likelihood estimates of a (the number of adaptive nonsynonynmous substitutions) per gene, summed across each window. This revealed a similar significant negative effect of a on p 4D (Table S6), while collinearity with gene density was considerably lower than for D n (Pearson's R = 0.27). Taken together, these finding all support a role for genetic hitchhiking in addition to background selection in shaping neutral variation in H. melpomene.
One striking difference between humans and fruit flies is that genetic hitchhiking in Drosophila spp. is pervasive enough to produce an average trough in diversity around nonsynonymous substitutions (after scaling for mutation rate variation) McGaugh et al. 2012); whereas this is not directly observable in humans (Hernandez et al. 2011). We performed an equivalent test with our data (Figure S18), and found patterns similar to those in humans, with no significant reduction of scaled diversity in the vicinity of nonsynonymous substitutions compared to synonymous substitutions. Enard et al. (2014) suggest that the effect of hitchhiking around nonsynonymous substitutions can be masked by background selection. This may be because background selection tends to be stronger in more conserved genomic regions, where adaptive substitutions are expected to be less common. This might explain the fact that we observe evidence for reduced diversity only around nonsynonymous substitutions in our multiple-regression model. Although our approach does not model the action of background selection explicitly, by including gene density and recombination rate as explanatory variables, it may account for some of the confounding influence of background selection. The relative roles of hitchhiking and background selection may be further resolved in the future by explicitly modeling these different processes .

GC content and codon usage bias
In the main model, GC content was found to be negatively correlated with p 4D (F 1,1461 = 18.98, P = 1.416e-05). This may reflect CUB, with a preference for codons ending in C or G, which would lead to elevated GC content at genes under stronger selection for codon usage . Indeed, in our analysis of codon usage, genes with a higher GC content at the third codon position tended to display stronger evidence for CUB ( Figure S5). In a modified model where p 4D was calculated using only our defined set of minimal CUB genes, GC content became a nonsignificant predictor of diversity, whereas effect sizes for all other explanatory variables were similar (Table S7).

Effect of chromosome ends
We last confirmed that the observed patterns are not predominantly driven by chromosome ends. A model excluding windows in the outer 5% of chromosomes was similar to the main model (Table S8). Generally, effect sizes and P-values were lower, but this may partly reflect the 10% reduction in the number of observations. Interestingly, GC content was no longer a significant predictor of diversity. This might imply that patterns of codon usage change toward the chromosome ends, but this will require further investigation.
The relationship between gene density and neutral diversity is clearly visible Given the large effect of gene density on diversity at 4D sites, we further explored this relationship visually ( Figure 5). There was a clear trend of lower p 4D in gene-rich regions, but also a conspicuous increase in variance in regions of lower gene density ( Figure 5A). This is most likely caused by the smaller number of 4D sites available in such regions, resulting in fewer data and therefore increased noise. We were able to account for this issue in our multiple linear regression model by weighting residuals according to the number of data points available per window, and overall the model showed minimal violation of assumptions ( Figure S14, Figure  S15). On several chromosomes, the correlation between gene density and p 4D was remarkably clear ( Figure 5B).

Longer chromosomes are less polymorphic
There was a strong negative relationship between 4D site diversity and chromosome length (in bases) (Table 4), further supporting the pervasive role of linked selection in shaping genetic diversity in H. melpomene. Long chromosomes tend to have lower recombination rates per base pair (Kaback et al. 1992;Lander et al. 2001;Kawakami et al. 2014), which should lead to stronger linked selection. Although a considerable number of scaffolds in the H. melpomene v1.1 genome are not properly positioned and oriented on a chromosome, the chromosomal assignment could be inferred for almost all large scaffolds (83% of the genome in terms of bases) (Heliconius Genome Consortium 2012), making for fairly robust estimates of chromosome length. We used a multiple regression model similar to that used for 100-kb windows above, but here averaging all parameters over each of the 20 autosomes, and with chromosome length used as a proxy for recombination rate. This model explained 73.34% of the variation in average chromosomal diversity at 4D sites, p̅ 4D (F 5,14 = 7.7, P = 0.0011) ( Table 4). There was a strong negative relationship between p̅ 4D and chromosome length.
Comparing models with and without chromosome length as an explanatory variable, we found that the model including chromosome length had a far better fit to the data (F 1,15 = 20.15, P , 0.0005). Hence, long chromosomes tend to be less variable at neutral sites than short chromosomes, and this trend was clear upon visual inspection ( Figure 6A). As in the window-based model, p̅ 4D was positively correlated with average synonymous substitution rate, d ̅ S (F 1,14 = 9.85, P = 0.0073), but there was no significant relationship between d ̅ S and chromosome length (Pearson's r = 0.08, P = 0.7289) ( Figure 6B), reinforcing our finding that mutation rates are not correlated with recombination rate. The simplest explanation for this pattern is therefore that linked selection drives patterns of diversity not only among small windows, but also among whole chromosomes.
The only other noteworthy correlation with chromosome length was a negative relationship with GC content. We hypothesized that this skew in base composition may be driven by stronger CUB on shorter chromosomes. This would be expected if higher recombination rates on shorter chromosomes allowed for more efficient selection by reducing interference among sites (Hill and Robertson 1966;Felsenstein 1974). Consistent with this hypothesis, the proportion of genes identified as having nonnegligible CUB was negatively correlated with chromosome length (Spearman's r = 20.555, d.f. = 19, P = 0.009) ( Figure S19). Points are shaded according to the number of 4D sites in the window that had genotype calls for at least 50% of analyzed samples. A loess (locally weighted smoothing, span = 0.5) curve with 99% confidence intervals is shown in red. (B) Plots of p 4D (black) and gene density (orange) across chromosomes 10, 12, and 19, which all showed a visually striking correlation. Note that the gene-density axis is inverted and adjusted to aid comparison between the lines. Both lines are loess smoothed with a span equivalent to 4 Mb.

Geographically restricted selective sweeps
We investigated whether there were signatures of strong, recent selective sweeps in H. melpomene and also whether sweeps tended to be geographically restricted to a particular population. We scanned the genome for putative selective sweep signals using SweeD (Pavlidis et al. 2013), which uses the CLR method of  to identify loci displaying a strong skew in the SFS toward rare variants in comparison with the genomic background. A number of regions throughout the genome had outlying CLR values, above a threshold determined using neutral simulations ( Figure  S20). Most of these were restricted to the Eastern population, with only one being restricted to the Western population, and one region with partially overlapping outliers in both populations. Given the excess of outliers in the Eastern population, most of which are represented by just single windows, it seems likely that most of these signals are spurious, perhaps reflecting increased variance in the SFS caused by a less stable demographic history. Only two loci had strong putative sweeps indicated by clusters of outlying CLR values. On chromosome 11, outliers in the Eastern and Western populations roughly overlapped ( Figure 7A), consistent with a beneficial allele spreading across both populations. On chromosome 12, the cluster of outlying CLR values was restricted to the Eastern population ( Figure S20). We investigated patterns of diversity within and divergence between populations at these two loci in finer detail. The results were very similar for both candidate sweep loci, so we present the results for chromosome 11 in Figure 7, and those for chromosome 12 in Figure S21.
At the putative sweep locus on chromosome 11, nucleotide diversity (p) in both the Eastern and Western populations was reduced ( Figure 7B). Absolute divergence between the two populations (d XY ) was similarly reduced ( Figure 7B). However, the fixation index, F ST , between the Eastern and Western populations was strongly elevated in this region ( Figure 7C). This implies that the region carries a low overall amount of genetic variation, but that the variation that is present constitutes fixed or nearly fixed differences between the two populations (Charlesworth 1998). This is consistent with a scenario where the alleles that swept to high frequency in the two populations were not identical, but similar, and potentially carried the same beneficial allele (Bierne 2010). The putative sweep locus on chromosome 12 showed very much the same pattern ( Figure S21).
To further explore the genetic make-up of these regions, we visualized the genotypes of individuals from all four populations at a sample of 600 highly polymorphic biallelic SNPs across the scaffold containing the putative sweep locus. Both scaffolds had regions in which heterozygous genotypes were clearly reduced, and fixed differences between the Eastern and Western populations strongly increased. In both cases, there were also several fixed differences between the Eastern and Guianan populations, but no fixed differences between the Western and Colombian populations among the 600 sampled SNPs. We note that by focusing on highly polymorphic SNPs, we highlight differentiation between the populations and fail to show how much of the region is shared, which must be considerable, given the reduced d XY . Nevertheless, this visualization confirms our hypothesis that distinct alleles reached high frequency in the different populations. The presence of some heterozygous sites in the sweep regions indicates that these may be fairly ancient sweeps and/or that no single allele fixed in each population (i.e., a "soft sweep"). One notable observation is the presence of long runs of heterozygous genotypes in certain individuals. This is also consistent with a soft sweep, but may alternatively reflect gene flow subsequent to the sweep, leading to long haplotypes introgressing between the populations.
Both putative sweep locations contained multiple annotated genes. All genes in these two regions that gave significant BLAST hits to D. melanogaster proteins (18 and 13 genes, respectively) are listed in Table S9. Due to the large number of potential targets of selection, we do not speculate here as to the adaptive significance of these events.

Discussion
The extent to which evolutionary change is a result of neutral or selective forces remains one of the long-standing questions in evolutionary biology. Genomic data permit powerful tests of the various forces that shape genetic variation within and between species, but such studies have only recently been extended beyond a few well-studied taxa. We examined a large number of whole genome sequences to investigate the forces shaping diversity in Heliconius butterflies. Levels of neutral diversity in H. melpomene are similar to those in Southern African populations of D. melanogaster, suggesting comparable effective population sizes. However, actual census population sizes of fruit flies must at times reach numbers far greater than those of Heliconius butterflies, which are characterized by low-density, stable populations (Ehrlich and Gilbert 1973). This paradox of divergent demography but similar diversity is partly explained by the impact of selection on linked sites. Rampant selection across the compact Drosophila genome leads to a dramatic reduction in diversity at linked sites. By contrast, our results suggest that selection is less pervasive in Heliconius, and its influence on linked sites, though significant, is less pronounced. Our findings suggest that positive selection plays an important, but less prominent role in Heliconius. We estimate that 31% of amino acid substitutions between H. melpomene and H. erato were driven by positive selection. This contrasts with an estimated rate of adaptive substitution in D. melanogaster of 57%, made using the same method . Although our multiple-regression model indicated that genetic hitchhiking has had a significant effect on neutral variation, we did not detect the same strong reduction in diversity around nonsynonymous substitutions seen in Drosophila spp. McGaugh et al. 2012). One possible explanation is that positive selection more often targets noncoding regulatory changes, as appears to be the case in humans (Enard et al. 2014). However, even more general signatures of recent selective sweeps in the form of strong skews in the site frequency spectrum were limited. It is important to note that the amount of adaptive evolution depends not just on the efficacy of selection, but also on how often novel, adaptive phenotypes arise. This depends both on the changeability of the fitness landscape and the availability of adaptive variation. Populations of fruit flies may occasionally reach extremely high densities, increasing the potential for selection to detect advantageous mutations (Barton 2010). The relevant population size for adaptive evolution could therefore be much higher than that estimated from neutral variation, especially in species with highly fluctuating populations.
Certain aspects of Heliconius biology might also obscure the footprints of positive selection when it does occur. Barriers to dispersal, such as the Andes Mountains, can reduce the signature of hitchhiking by slowing the progression of sweeps (Barton 2000;Kim 2013). Indeed, at the two putative sweep loci investigated, similar but distinct alleles appear to have swept in the Eastern and Western populations. This is consistent with a scenario where a globally beneficial allele recombined onto a different genetic background as it spread, which would not only soften the sweep signal, but also enhance population differentiation (Slatkin and Wiehe 1998;Bierne 2010). The source of beneficial variation is another important factor, as adaptation from standing or introgressed variation, can result in soft sweeps (Pennings and Hermisson 2006). One likely example is the repeated evolution of certain wing pattern forms. Despite the strong selection known to act upon wing-patterning loci (Mallet and Barton 1989), signatures of selective sweeps at pattern loci have not been observed (Baxter et al. 2010;Nadeau et al. 2012). While the present study was not designed to address this question, all Western population samples shared a red forewing band, controlled by the B locus on chromosome 18 (Baxter et al. 2010); and yet no significant sweep signal was detected at this locus. It appears that wing patterning frequently evolves by sharing of preexisting alleles between populations, and even between species through rare hybridization (Pardo-Diaz et al. 2012;Heliconius Genome Consortium 2012;Wallbank et al. 2016). The presence of variation among these old alleles might eliminate any signature of genetic hitchhiking (Pennings and Hermisson 2006). It is yet to be established whether adaptation from standing and introgressed variation is generally common in Heliconius, but studies in other systems are increasingly suggesting an important role for preexisting adaptive variation in evolution Gosset et al. 2014;Roesti et al. 2014). Despite the limited influence of hard selective sweeps, the genomic landscape of variation in H. melpomene has been shaped significantly by selection on linked sites. Diversity at neutral sites is positively correlated with recombination rate and negatively correlated with local gene density, which is a good proxy for the density of both coding and noncoding functional elements. These trends are consistent with a pervasive influence of selection on linked sites and are similar to patterns in several other animals (Cutter and Payseur 2013;. There is no evidence that recombination rate affects the mutation rate, in agreement with findings in Drosophila (McGaugh et al. 2012) and humans (McVicker et al. 2009). The effects of linked selection are also visible at the whole-chromosome scale. Longer chromosomes are less polymorphic, presumably because they have lower recombination rates per base pair, leading to stronger effects of linked selection on average. A negative relationship between chromosome length and recombination rate has been observed in a range of taxa (Kaback et al. 1992;Lander et al. 2001;Kawakami et al. 2014), and probably stems from a requirement for at least one obligate crossover event per chromosome during meiosis, even on the smallest chromosomes (Kawakami et al. 2014). Increased recombination rates on smaller chromosomes would also be expected to lead to more efficient purifying selection, due to reduced interference among loci (Hill and Robertson 1966;Felsenstein 1974). Indeed, smaller chromosomes display greater evidence of codon usage bias, a phenomenon probably driven by fairly weak selection. This is akin to the observation in D. melanogaster of reduced codon usage bias in regions of minimal recombination (Kliman and Hey 1993).
Further studies of other taxa are necessary to build a complete picture of how selection shapes genetic variation in natural populations. Nevertheless, even a simple comparison between butterflies and fruit flies can be enlightening. The different biology of these two insect groups results in (D) Individual genotypes at 600 biallelic SNPs on scaffold HE672079, which harbors the putative selective sweep. Homozygous genotypes are colored gray (major allele) and black (minor allele), and heterozygotes are colored red. To optimize the detection of differences between populations, SNPs with a high degree of polymorphism (minor allele frequency $0.25) were considered. The 600 SNPs plotted were sampled semirandomly, ensuring that no two sampled SNPs were .1000 bp apart. Protein coding genes are indicated below the plot, with exons shown in black. distinct patterns of adaptive evolution. However, the lower effective population sizes in Heliconius combined with less influence of selection on linked sites leads to levels of neutral diversity similar to those in Drosophila spp. Indeed,  estimate that the impact of selection on linked sites is around three times greater in fruit flies. Although it has long been recognized that levels of neutral variation are not solely determined by population size, whole genome studies such as this are beginning to reveal in detail how different processes combine to shape genetic diversity.

Acknowledgments
We thank the handling editor, David Begun, and two anonymous reviewers, whose comprehensive comments greatly improved this manuscript. Nick Barton and Aylwyn Scally provided helpful comments on an earlier draft. We thank John Davey for advice on the recombination rate analyses and for useful discussions of the results. James Walters provided advice for the analysis of the rate of adaptive substitution, and some of the groundwork for this analysis was performed by Gabriel Jamie and Joseph Harvey. We also thank James Mallet and Lawrence Gilbert for thought-provoking discussions of the life history of Heliconius. Finally, we thank Jenny Barna for computing support.

Fig. S1 Linkage disequilibrium (r 2 ) plotted by distance between SNP pairs
LD was calculated for all SNP pairs with minor allele counts of at least 5 on the top 100 largest scaffolds. SNP pairs were binned by distance in bins of logarithmically increasing size. Dashed lines indicate background r 2 , calculated between unlinked SNPs on different chromosomes.

Fig. S2 Heterozygosity in inbred and outbred individuals
Proportion of heterozygous genotypes in the inbred reference genome individual, melP.HGC1 (solid line), and from a wild-caught sample from the same source population in Panama, melP.CJ18097 (dashed). Lines are smoothed using loess (local regression), for each chromosome with a span equivalent to 3 Mb. The 21 chromosomes are shaded. Each chromosome was constructed by concatenating scaffolds according to their placement in the H. melpomene genome v1.1 linkage map.

Fig. S3 Codon Usage
Each point represents a gene. The effective number of codons (Nc) is plotted against GC content at the third codon position. The expected value for Nc in the absence of codon usage bias is given by the blue line ). Points below the line indicate genes with lower codon variability than expected without codon usage bias. The red line indicates 95% of the null expectation, which formed our cut-off for classifying a gense as having low codon usage bias.

Fig. S4 Number of genotyped sites per individual
Horizontal bars show the number of sites (bottom axis) with high-quality genotypes per individual, considering either all sites (grey) or only coding sites (red). The top axis shows the percentage of sites genotyped, out of 273 Mb for the whole genome (black), or the 16.3 Mb of coding sites (red).
Although the more distantly related species have far fewer sites genotyped overall, they have a good proportion of coding sites genotyped. Some individuals have low numbers of genotypes called overall and among coding sites, suggesting that these are low-coverage or low-quality samples.

Fig. S5. Estimated error rates based on simulated sequence data
The number of incorrect genotype calls per site over 1 million sites.

Fig. S6 Whole genome ML tree
A. Sampling locations of the 58 wild H. melpomene samples (see Table S1 for coordinates). Symbols indicate country of sampling, sizes indicate the number of samples from a location. Colours correspond to the four major populations as described in the main text. Grey shading indicates the approximate location of the Andes mountains. B. RaxML phylogeny based on four-fold degenerate

Fig. S7 Cluster assignments inferred using STRUCTURE
The runs with the highest probability out of five replicates for each k value are shown (k=3 and k=4 are not shown as they gave low probabilities). Each individual is assigned a probability of falling into each of the k clusters, as indicated by colours.

Fig. S8 Mean estimated Ln Probabilities of the data according to STRUCTURE
Five runs for each value of k from 3 to 8 were executed in STRUCTURE. The estimated mean and standard deviation of the Ln Likelihoods for each value of k, as estimated using STRUCTURE HARVESTER (Earl and vonHoldt 2012), are shown.

Fig. S11 Effect of sequencing depth on heterozygosity
Mean individual heterozygosity plotted against sequencing depth (fold coverage). Dashed lines indicate local regression smother, with span=1. Populations are plotted separately. Insert: boxplots of heterozygosity for all 100kb windows for each population, averaged over all samples with sequencing depth of at least 25x.

Fig. S12 Effects of missing data on measures of diversity and divergence
A. Diversity (π) in H. melpomene at third codon positions in 100 kb windows, plotted against the proportion of missing data per window (number of third codon positions that were genotyped in fewer than 50% of individuals). B. As in A, except divergence (dXY) between H. melpomene and H. erato is plotted against the proportion of missing data. C. As in A, except using intergenic sites. D. as in B, except using intergenic sites, and dXY is measured between H. melpomene and the silvaniform species, as H. erato was too divergent at intergenic sites. In all plots, the slope of a linear regression is shown, along with the P-value and r 2 for the linear model.     Scaled diversity (π/dXY) for four-fold degenerate (4D) sites, binned in 100 bp bins according to their distance from the nearest substitution. Red points indicate values for each bin when sites were binned by distance from non-synonymous substitutions, with the moving average (loess, span = 0.5) indicated by the red line. The solid blue line indicates the moving average when sites were binned according to their distance from synonymous substitutions and averaged over 100 bootstraps. The dashed blue line indicates the 5% quantile from the 100 bootstraps. There is no detectable reduction in scaled diversity around non-synonymous substitutions that occurred over the shorter period (B). There appears to be a slight reduction around substitutions that occurred over the longer period (D), but neither are significantly below the 95% confidence threshold.

Fig. S19. Relationship between codon usage bias and chromosome length
The percentage of genes on each chromosome showing evidence of non-trivial codon usage bias (CUB), plotted against the estimated chromosome length. Chromosomes are labelled and the Z chromosome is indicated in gray. Linear regression lines are shown for all chromosomes (grey) or autosomes only (black).

Fig. S20. CLR values from SweeD (A)
Chromosomes are shown with scaffolds shaded light and dark. Vertical lines indicate CLR values for the Western (red) and Eastern (blue) populations. Points indicate values above the cut-off value of 34, defined by analysis of simulated data.

Fig. S21. A putative selective sweep on Chromosome 12
A. Composite likelihood ratio (CLR) values calculated by SweeD (Pavlidis, 2014) for the Eastern and Western populations, for 1000 windows across chromosome 12. Scaffolds are shaded light and dark. B. Nucleotide diversity (π) for the Eastern and Western populations (in colour) and divergence (dXY) between these two populations (black dashed line), calculated for 50 kb sliding windows across chromosome 11, sliding in increments of 10 kb. C. FST between the Eastern and Western populations, calculated in windows as in B. D. Individual genotypes at 600 biallelic SNPs on scaffold HE671629, which harbours the putative selective sweep. Homozygous genotypes are coloured grey (major allele) and black (minor allele), and heterozygoyes are coloured red. To optimize the detection of differences between populations, SNPs with a high degree of polymorphism (minor allele frequency ≥ 0.25) were considered. The 600 SNPs plotted were sampled semi-randomly, ensuring that no two sampled SNPs were more than 1000 bp apart. Protein coding genes are indicated below the plot, with exons shown in black.

Mapping and genotyping
Reads were mapped using Stampy v1.0

Estimation of genotyping error rates
Although we lacked Sanger-sequencing data for any of the sampled individuals, we were able to estimate the error rate in our genotyping pipeline using two separate approaches. We estimated the rate of false-positive SNP calls by examining genotypes of an individual from the inbred (five generations) H. melpomene reference genome strain. We found that this individual (melP.HGC1) had large tracts of homozygosity (Fig. S2). In particular the whole of chromosome 2 appears to be homozygous. We therefore took the proportion of heterozygous genotype calls on scaffolds that mapped to chromosome 2 as a proxy for the false-positive SNP discovery rate.
As the above approach is only informative about genotyping of samples very closely related to the reference, we also estimated genotyping error rates using simulated reads representing a range of divergences and sequencing depths. First, we used seq-gen  to simulate a 1Mb reference sequences with a GC content of 32%, along with diploid sequences (2% heterozygosity) for other taxa at increasing levels of divergence (2, 4, 6, 8 and 10%). Paired end reads of 100 bp were simulated for each taxon using ART , with an error profile mimicking that of the Illumina HiSeq 2000. For each taxon, we simulated four sets of reads, which could be combined to produce mappings of 10x, 20x, 30x and 40x. We then used our genotyping pipeline to call genotypes and compared these to the true genotype for each taxon.

Site annotation and codon usage
The Heliconius melpomene reference genome v1.1 annotation (The Heliconius Genome Consortium 2012) was used to identify sites in the following classes: intergenic, intronic and codon positions one, two and three. CpG islands were identified using the CpGcluster Perl script (Hackenberg et al. 2006), and these regions were excluded from the above classes.
We next identified all fourfold degenerate (4D) sites in the genome as putative sites that experience little or no selection. To be defined as 4D, a third codon position had to meet two requirements: The first and second positions in the codon had to be invariable across all 80 samples, and there had to be no change to the encoded amino acid with any of the four possible bases in the third position (using the standard genetic code). This conservative approach identified 1,868,350 4D sites (12% of coding sequence, 0.7% of the whole genome).
Codon usage was assessed using the method of Wright , which compares the effective number of codons, Nc, against the GC content at third codon positions for each gene (Fig.   S3). Both values were estimated using the program codonW (http://codonw.sourceforge.net). In order to account for the effects of CUB in our downstream analyses, we defined a set of genes with minimal CUB, which deviated less than 5% from a neutral expectation for codon usage (Fig. S3), which constituted 7721 genes (61% of the total gene set).

Assigning scaffolds to chromosomes
Scaffolds were assigned to chromosomes based on the RAD-seq linkage map of the Heliconius melpomene genome v1.1 (The Heliconius Genome Consortium 2012), which has ~83% of the genome assigned to chromosomes. As many Z-linked scaffolds were not identified in that study, a specific assignment of scaffolds to the Z chromosome based on comparative read depth in males and females was performed as part of the study of  and is provided as supplementary material therein. This procedure identified a large number of additional Z-linked scaffolds, and also several miss-assembled scaffolds that were Z/autosome chimeras. Using the most likely breakpoints, Z-linked regions were removed from autosomal scaffolds as were autosomal regions from Z-linked scaffolds. For plots across chromosomes, scaffolds were arranged according to the Heliconius melpomene genome (v1.1) linkage map (The Heliconius Genome Consortium 2012), after correcting for the several Z/autosome chimeric scaffolds, as well as manually correcting the orientation of several scaffold pairs based on mate-paired sequences.

Estimating the extent of linkage disequilibrium
Linkage disequilibrium (LD) was estimated for Eastern and Western populations using all pairs of biallelic sites with high-quality genotype calls for at least 50 of the 58 H. melpomene samples and a minor allele count of at least 5. We estimated r 2 using the maximum likelihood estimator of Clayton and Leung (Clayton and Leung 2007), implemented in the R package snpstats, which does not require phased haplotypes. To investigate how LD breaks down with distance, r 2 values were binned according to distance in logarithmically increasing bin sizes, which accounts for small numbers of SNP pairs at large distances. The top 100 longest scaffolds were analysed, and only SNP pairs on the same scaffold were considered. To obtain an estimate of background LD between unlinked sites, subsets of 500 SNPs were randomly selected and r 2 was estimated for all pairs for which the two SNPs were on separate chromosomes. This procedure was repeated 100 times and the mean was taken.

Analysis of population structure
We ran STRUCTURE for the 58 wild H. melpomene samples, together with the four H. cydno and four H. timareta samples, and tested k values from 3 to 8. Admixture between clusters was allowed, and correlated allele frequencies between clusters were assumed. Each run consisted of a burn-in of 10,000 iterations followed by another 10,000 generations, and five runs were performed for each k value to check consistency. To minimise the effects of direct selection, only four-fold degenerate sites were considered. Sites were required to be genotyped in at least 50 of the 66 analysed samples, and to have a minor allele count of at least two.
The second method for population structure analysis was a Principle Components Analysis (PCA), performed using Eigenstrat . PCA is a method to simplify correlated multidimensional data, and can therefore reduce DNA sequence data for many SNPs to a small number of principal components that capture most of the information about the relationships among the sequences. The same filtered SNP dataset used for STRUCTURE was used, except that the H.
cydno and H. timareta samples were excluded. Eigenstrat incorporates chromosomal information, but assumes that data is from humans. The 20 Heliconius autosomes were therefore labelled as human chromosomes 1-20, and the Heliconius Z chromosome as human chromosome 23 (X).

Site frequency spectra
Unfolded site frequency spectra were generated by counting the number of derived alleles at each site in each H. melpomene population. Sites were polarised using the four silvaniform outgroup samples, and only biallelic sites where the silvaniforms were fixed for one of the two alleles segregating in H. melpomene were considered, with said allele designated as ancestral. Site frequency spectra can only be compared between samples of the same size. The smallest population sample was for the Colombian population, with just five diploid individuals. To compare all of the populations, frequency spectra were estimated by sampling five individuals to represent each site in each population. Sites with fewer than five samples genotyped were ignored. To compare just the Eastern and Western populations, which had more samples, 20 individuals were sampled per site.
Because there were females (heterogametic, ZW) in the dataset, only autosomes were considered so that all individuals were diploid at all sites.

Inference of ancestral population size using PSMC
Beginning with binary sequence alignment map (bam) files, we followed the authors' suggestions for genotyping using Samtools (Li et al. 2009), which involved a quality cut-off of 20 and depth cut-offs of 1 third (minimum) and two thirds (maximum) of mean depth for each sample. PSMC was run with 25 iterations, with 29 interval parameters spread over 58 time intervals (with the command flag -p "28*2+3+5"). A generation time of 0.25 years and a mutation rate of 2x10 -9 was used. A number of different block sizes were considered, but reported results used the default of 100 bp.
Window-based population parameters: Various population parameters were calculated for non-overlapping 100 kb windows across the genome. Only windows with a sufficient number of sites genotyped in at least 50% of samples were considered. This number depended on the site classes being considered; all site classes: 10 000; intergenic: 6 000; intronic: 5 000; codon positions 1, 2 or 3: 500; 4D sites: 250; 4D sites in low-CUB genes: 150. We used 100 kb windows because linkage disequilibrium (LD) tends to break down almost completely within 10 kb, and reaches background levels within 100 kb (Fig. S1), meaning that measures from adjacent windows would be largely free of linkage effects.
Nucleotide diversity (π) and absolute divergence (dXY), were calculated as the average proportion of differences between all pairs of sequences, either within a sample (π) or between two samples (dXY). We used custom-written functions that ignored missing data in a pair-wise manner to maximise the amount of data being considered. Tajima's D  and FST (as in equation 9 of ) were calculated using the EggLib Python module (De Mita and Siol 2012).
Poor mapping and genotyping in highly variable regions could lead to underestimates of diversity and divergence. If the most variable parts of the genome are clustered, then we might expect windows with large amounts of missing data to also have higher levels of diversity and divergence at the sites that were genotyped. We therefore tested whether observed diversity and divergence estimates were correlated with the proportion of missing data per window (measured as the proportion of sites genotyped in fewer than 50% of individuals). We tested both third codon positions and intergenic sites.

Estimating the genome-wide rate of adaptive substitution
We estimated the genome-wide rate of adaptive substitution (α) using Messer and Petrov's asymptotic method . Briefly, α estimates the excess of between-species divergence at non-synonymous sites relative to synonymous sites, by comparison with the ratio of within-species polymorphism at these two site classes (McDonald and Kreitman 1991;Fay et al. 2001). The asymptotic method accounts for a number of potential confounding factors, including linked selection and segregating deleterious variation, by considering polymorphisms at each possible derived allele frequency separately, and fitting an exponential curve to the resulting estimates, the asymptote of which should approximate the true α  we fitted an asymptotic curve to the estimated α values for all derived allele frequencies greater than 0.1.

The rate of synonymous and non-synonymous substitutions
We used Codeml from the PAML package (v. 4.8) (Yang 1997(Yang , 2007 to estimate ω, the ratio of synonymous substitutions per synonymous site to non-synonymous substitutions per nonsynonymous site (dN/dS). We used a five-species input tree, including one species from each of the five major clades, with the topology (((melpomene,hecale),wallacei),hecuba,erato), following our maximum-likelihood phylogeny (Fig. 1B) and the genus-wide multilocus tree of Kozak et al.

Rates of adaptive substitution for individual genes
We also estimated a, the number of adaptive non-synonymous substitutions, for individual protein-coding genes, using the maximum likelihood method implemented in the program Mktest (v2) . Default parameters were used except that f, which determines the strength of purifying selection, and a was allowed to vary among loci. Polymorphism was calculated for the Western population, and divergence was calculated between the Western samples and the four H. erato samples. To minimize missing data, three samples with fewer than 82% of coding sites genotyped (compared to >90% in all other samples) were excluded. Missing genotypes in the remaining samples were substituted with the major allele. To account for the presence of mildlydeleterious alleles contributing to non-synonymous polymorphism, we imposed a minor allele frequency cut-off, implemented by converting minor alleles occurring below this frequency to the major allele. We tested a range of cutoffs and selected a value of 0.24, which appeared to eliminate most segregating deleterious variation (data not shown). Nevertheless, it has been shown that imposing an arbitrary cut-off may still fail to exclude all segregating deleterious variants (Charlesworth and Eyre-Walker 2008;. However, we are primarily interested in the variation among genes and not the absolute value of a.

Local recombination rate estimation
We estimated local recombination rates using the Heliconius melpomene genome (v1.1) linkage map (The Heliconius Genome Consortium 2012). We first corrected for the several Z/autosome chimeric scaffolds as mentioned above, and manually corrected the orientation of several scaffold pairs based on mate-paired sequences. We used local regression (loess) to fit physical distance to marker distance for each chromosome, with a span equivalent to 5 Mb. The recombination rate for each window was then estimated by taking the gradient of the smoothed curve (i.e. bp per CM) between the start and end points of each window. Corbett-Detig et al.  recently used a similar approach to estimate recombination rates from the same map, but using different smoothing approaches. We tested their estimated recombination rates in our multiple regression model, and the result was nearly identical (data not shown). We therefore only report results from analyses using our estimates.

Multiple Regression
We used multiple linear regression to model nucleotide diversity at 4D sites (π4D) in 100 kb windows (calculated for each population separately and then averaged).The following explanatory variables were included: local gene density (the proportion of coding sequence per window), local recombination rate (rr ) across the window, the number of recent non-synonymous substitutions (Dn) in the H. melpomene lineage per window, the rate of synonymous substitutions per synonymous site (dS) and GC-content at third codon positions.
To reduce noise caused by a lack of data in some windows, only windows with at least 250 4D sites genotyped in at least 50% of individuals were considered. Windows overlapping scaffold edges by more than 50% were discarded. Only windows on autosomal scaffolds were considered.
This left 1467 windows for fitting the model. Because these still varied in the amount of available data, residuals were weighted according to the number of analysed 4D sites with genotype calls for at least 50% of individuals.
All variables included in the multiple regression were transformed to reduce skewness in their distributions. The response variable, π4D, was square-root transformed. Explanatory variables dS and Dn and a were square-root transformed, GC-content was log10 transformed and gene density was logit transformed. To allow for comparison of effects, all explanatory variables were then also Ztransformed. Residual analysis of the model revealed no indication that model assumptions were violated.
To further investigate the interrelationships between the explanatory variables, we used principal component regression (PCR) . This approaches can help to tease apart the effects of the various explanatory variables by summarizing the explanatory variables into orthogonal components, thereby accounting for multi-collinearity. Regression analyses were performed with the R version 3.0.3 (https://www.R-project.org) using the pls package (Mevik BH, Wehrens R 2013).
To assess the robustness of our findings, we investigated the influence of various modifications to the model. Firstly, because selection for codon usage can affect diversity at 4D sites, we also tested models in which the response variable, π4D, was calculated only using genes showing minimal evidence of codon usage bias. Chromosome ends may have reduced recombination rates, and could strongly influence the results of our analysis, so we also tested the model excluding windows in the outer 5% of each chromosome. An additional concern was that missing data in highly divergent genes might bias the detection of adaptive non-synonymous substitutions downwards. This would reduce the effect of missing data, but could add noise due to the smaller time-scale being assessed. Lastly, as mentioned above, we tested a model with the estimated gene-by-gene number of adaptive non-synonymous substitutions (a, summed for all genes per window) used in place of Dn to account for the effects of hitchhiking. Due to the large amounts of noise in these genic a estimates, the distribution had long tails, and windows below the 10th and above the 90th percentiles for a were excluded from this model.

Testing for reduced diversity around non-synonymous substitutions
In addition to the multiple regression model, we also tested for a directly observable reduction in diversity around non-synonymous substitutions, after accounting for mutation rate variation. Recent substitutions unique to the H. melpomene lineage were identified as sites where at least two outgroup lineages carried the same (ancestral) allele, but where H. melpomene was fixed for a derived allele. Scaled diversity at 4D sites (π/dXY) was calculated as a function of distance from the nearest substitution. To this end, each 4D site in the genome was binned according to its distance from the nearest non-synonymous substitution, in bins of 50 bp, up to a distance of 50 kb.
Nucleotide diversity (π) in the Eastern and Western populations of H. melpomene, and absolute divergence (dXY) between these two populations and H. erato, were calculated for each 4D site and averaged in each bin. Because fixed substitutions may be more common in low-diversity regions, we might expect reduced diversity near the site of substitutions regardless of whether hitchhiking has occurred. To account for this potential bias, we compared scaled diversity binned by distance from non-synonymous substitutions to that binned by distance from synonymous substitution, using a bootstrapping procedure similar to that of . In each bootstrap replicate, a subset of synonymous substitutions, equal in size to the set of non-synonymous substitutions, was sampled with replacement, and scaled diversity was calculated in bins according to their distance from the nearest substitution in this subset. This procedure was repeated 100 times so that a confidence interval could be calculated.

Scanning for selective sweeps with SweeD
SweeD calculates a composite likelihood ratio (CLR) by comparing the site frequency spectra for individual blocks of sequence to that for the entire region. Therefore, sequences for whole chromosomes were produced by concatenating all scaffolds that mapped to each chromosome in the inferred order as described above. Although in many cases several scaffolds map to the same location, and their orientations are often unknown, this should not dramatically affect the method, as each window is considered independently and only a small subset of windows will cross scaffold boundaries. SweeD was run for each chromosome with a grid size of 1000 blocks. Sites were polarised (i.e. unfolded), where possible, by identifying the ancestral state based on the four silvaniform outgroups. If the outgroups were not fixed for a single allele, the site was designated as folded.
To select a threshold CLR value for identifying blocks with a significantly skewed frequency spectrum, we simulated sequence data for analysis in SweeD. For both the Eastern and Western populations, 100 simulated datasets, of 1 Mb each were simulated using ms (Hudson 2002). Theta (θ) values of 1.7% and 1.8% were used for the two respective populations, and a population recombination rate of 0.1% was used for both. We tested simulations based on the inferred population size histories from the PSMC analysis (see above) and with constant population size.
The constant population size runs gave higher, more conservative threshold CLR values (data not shown), as was found by Nielsen et al. . The 99% quantile for the two populations were similar, so we used the higher value, 34, as the CLR cut-off for both analyses.