Estimating the Effective Population Size from Temporal Allele Frequency Changes in Experimental Evolution

The effective population size (Ne) is a major factor determining allele frequency changes in natural and experimental populations. Temporal methods provide a powerful and simple approach to estimate short-term Ne. They use allele frequency shifts between temporal samples to calculate the standardized variance, which is directly related to Ne. Here we focus on experimental evolution studies that often rely on repeated sequencing of samples in pools (Pool-seq). Pool-seq is cost-effective and often outperforms individual-based sequencing in estimating allele frequencies, but it is associated with atypical sampling properties: Additional to sampling individuals, sequencing DNA in pools leads to a second round of sampling, which increases the variance of allele frequency estimates. We propose a new estimator of Ne, which relies on allele frequency changes in temporal data and corrects for the variance in both sampling steps. In simulations, we obtain accurate Ne estimates, as long as the drift variance is not too small compared to the sampling and sequencing variance. In addition to genome-wide Ne estimates, we extend our method using a recursive partitioning approach to estimate Ne locally along the chromosome. Since the type I error is controlled, our method permits the identification of genomic regions that differ significantly in their Ne estimates. We present an application to Pool-seq data from experimental evolution with Drosophila and provide recommendations for whole-genome data. The estimator is computationally efficient and available as an R package at https://github.com/ThomasTaus/Nest.

D URING experimental evolution studies, populations are maintained under specific laboratory conditions (Kawecki et al. 2012;Long et al. 2015;Schlötterer et al. 2015). In sexually reproducing organisms, the census population size is typically kept fixed at fairly low numbers, rarely exceeding 2000 individuals. With such small population sizes, genetic drift causes stochastic fluctuations in allele frequencies. Under neutrality, the level of random frequency changes is determined by the effec-tive population size (N e ) (Wright 1931). Furthermore, the efficacy of selection is influenced by N e : For weakly selected alleles, the probability of fixation is directly proportional to the product of N e and the intensity of selection (Fisher 1930;Kimura 1964). As changes in allele frequency are greatly affected by the population size, it is fundamental to estimate N e accurately to understand molecular variation in experimental evolution studies. Krimbas and Tsakas (1971) estimated N e using the standardized variance of allele frequency (F, see also Falconer and Mackay 1996) from longitudinal samples in natural populations of olive flies. As F was calculated from these samples, they accounted for the sampling variance that also contributed to the true allele frequency variance. This approach was further improved and used by several authors Pollak 1983;Waples 1989;Jorde and Ryman 2007).
With the widespread availability of powerful computers, also maximum-likelihood-based methods became popular (Williamson and Slatkin 1999;Anderson et al. 2000; Wang approaches discussed above. Although these methods show less bias than the moment-based approaches (Wang 2001), they are still computationally demanding, in particular for the large numbers of markers typically obtained with novel sequencing technologies (Foll et al. 2015).
Estimating N e with temporal methods requires samples collected at least at two time points. Alternative methods that use only a single time point are based on linkage disequilibrium (LD) (Hill 1981;Waples andDo 2008, 2010;Waples and England 2011), heterozygote excess (Pudovkin et al. 1996), molecular coancestry (Nomura 2008), sibship frequencies (Wang 2009(Wang , 2013, or combinations of summary statistics using approximate Bayesian computation (Tallmon et al. 2008). LD-based methods are widespread but require haplotype or unphased diploid genotype information, which limits their applicability.
Although the cost for sequencing has dropped considerably, the separate sequencing of thousands of individuals in replicate populations in experimental evolution studies is still out of reach. Sequencing samples in pools (Pool-seq) can provide a cost-effective alternative . Pool-seq has also been shown to outperform individual-based sequencing in estimating allele frequencies and inferring population genetic parameters under several conditions (Futschik and Schlötterer 2010;Zhu et al. 2012;Gautier et al. 2013). For these reasons, Pool-seq has become the basis of many experimental evolution "evolve and resequence" (E&R) studies (Turner et al. 2011;Schlötterer et al. 2015). Following the emergence of E&R, many population genetic estimators have been adjusted to handle the properties of Pool-seq data (Futschik and Schlötterer 2010;Kofler et al. 2011a,b;Kolaczkowski et al. 2011;Boitard et al. 2013;Ferretti et al. 2013). To the best of our knowledge, no N e estimators have been developed so far that properly deal with Pool-seq data.
In this article, we present a novel temporal method to estimate N e from pooled samples. We show that previously proposed estimators can lead to substantial bias, as they neglect the variance component due to pooled sequencing. We introduce a new model accounting for the two-step sampling process associated with Pool-seq data. In the first sampling step individuals are drawn from the population to create pooled DNA samples. In the second step, pooled sequencing is modeled as binomial sampling of reads out of the DNA pool. We show on simulated data that our method outperforms standard temporal N e estimators. For real data, we also suggest to use a segmentation algorithm, to partition the genome-wide sequence data into stretches of DNA with significantly different N e estimates. Finally, we present an application to a genome-wide experimental evolution data set from Drosophila melanogaster .

Materials and Methods
Sampling schemes Nei and Tajima (1981) investigated the sampling properties of temporal N e estimators and proposed two different sam-pling schemes. Under the first scheme (plan I), individuals are either sampled after reproduction or returned to the population after their genotypes have been examined. In contrast, under the second scheme (plan II) sampling takes place before reproduction and the sampled individuals are permanently removed from the population and their genotypes do not contribute to the next generation. By assuming different sampling distributions, they derived separate N e estimators under sampling plans I and II. Waples (1989) unified the calculations under the two plans by assuming binomial sampling out of an infinitely large parental gamete pool for both sampling schemes. He concluded that the measure of variance under the two sampling plans differs only in a covariance term. For plan I, there is a positive correlation between allele frequencies sampled t generations apart because they are both derived from the same population at generation 0. In contrast, for plan II, the initial sample and individuals contributing to the next generation can be considered as independent binomial samples; thus sample frequencies at generations 0 and t are uncorrelated.
For a typical E&R study, outbred experimental populations are created by mixing a large number of inbred lines (e.g., Turner and Miller 2012;Huang et al. 2014;Franssen et al. 2015). The populations are then propagated under the desired experimental conditions while keeping the census size of the population controlled through time ( Figure 1). However, the experimenter has no direct influence on the effective population size, which is in general lower than the census size. In E&R studies with Drosophila, the census size rarely exceeds some hundreds of individuals, and sampling usually takes place after reproduction according to plan I. For organisms maintained at larger sizes, such as yeast, the sample for genetic analysis is not returned to the population (Burke et al. 2014). Plan II applies to such cases.
In E&R studies, sampled individuals are often pooled together for DNA extraction . The size of the pool can be as large as the whole population. Depending on the experimental design, it is also possible that only a fraction of the population is sequenced, for instance, only females Franssen et al. 2015). Pooled individuals are used to create DNA libraries, which are, in turn, subjected to high-throughput sequencing.
We consider two separate sampling steps when estimating N e from Pool-seq samples (Figure 1). In the first step, we model the sampling of individuals out of the population. This can take place according to either plan I or plan II. In the second step, we model the sequencing of a DNA pool by drawing reads at random with replacement from the firststep sample. The allele frequency variance inferred from the sample is corrected for the additional variance coming from the two-step sampling and used for estimating N e :

Notation
We assume that the experimental population is propagated at a constant census size N and that N $ N e : We use genomewide single-nucleotide polymorphism (SNP) data sampled t generations apart to estimate N e ( Figure 1) and denote the estimated effective population size by b N e : Multiallelic sites in populations with low mutation rates, such as Drosophila, exist but are rare and likely to be sequencing errors (Burke et al. 2010). Therefore we consider only biallelic SNPs at n polymorphic sites. At each site i (i ¼ 1 . . . n) the true population allele frequency is denoted by p ij at time T ¼ j; where j 2 f0; tg: To obtain allele frequency estimates for an unknown p ij ; the population is subjected to sampling. We consider two sampling steps (Figure 1). At T ¼ j; we first sample S j individuals out of the population to create a pooled DNA library for sequencing. Note that the number of sampled individuals is constant over the n sites, and therefore the index i is omitted here. Sampling individuals can take place according to either plan I or plan II, as described above (also shown in Supplemental Material, Figure  S1). As the second sampling step, we model Pool-seq by drawing R ij reads out of the pooled DNA sample at each site i (i ¼ 1 . . . n). This allows for variation in sequence coverage. Below we derive the variance in allele frequency for a given site. To keep notation simple, we omit again the index i and denote the unknown sample allele frequency among the S 0 individuals at the first sampling time point (T ¼ 0) by x and the subsequent allele frequency estimate obtained via pool sequencing from R 0 reads byx: Similarly, at some T ¼ t; the respective frequencies are denoted by y andŷ: Note that under pool sequencing onlyx andŷ are observed.
Estimating N e from temporal allele frequency changes Under neutral Wright-Fisher evolution the variance in allele frequency (s 2 p ) generated by drift after t generations at a single locus in a diploid population is well described by the expression (1) where p is the starting allele frequency (Falconer and Mackay 1996). Wright (1931) denoted the standardized variance by F ¼ s 2 p =pð1 2 pÞ; which leads to a convenient closed-form expression for N e : Furthermore, if N e is large enough, F 1 2 e 2t=2N e and N e can be calculated as The relation between N e and allele frequency changes described in Equation 1 was first used by Krimbas and Tsakas (1971) in natural populations of olive flies. They estimated the variance using where x k and y k (k ¼ 1; . . . ; a) are the observed allele frequencies in the samples collected t generations apart and a is the number of alleles at a specific locus. To eliminate the contribution of sampling errors to the variance, the total variance F a was corrected for the random sampling noise by simply subtracting the corresponding variance. This approach was further investigated and developed by a number of authors (Pamilo and Varvio-Aho 1980;Nei and Tajima 1981;Pollak 1983;Waples 1989). Possible sources of bias in N e estimators were later investigated by Jorde and Ryman (2007). The authors pointed out that the expectation over F is typically approximated by taking the expected values separately for the numerator and the denominator (Turner et al. 2001). They suggested a different weighting scheme of alleles leading to an alternative lessbiased estimator to measure temporal frequency change.

Correction for two-step sampling
We consider a random-mating population with discrete generations. Neutral evolution is assumed with no selection, migration, and mutation. Samples are drawn from the population at generations T ¼ 0 and t. Throughout the derivation we consider diploid populations, and therefore a sample of S j individuals leads to 2S j sequences at times T ¼ j 2 f0; tg: Sampling is assumed to be binomial with parameters 2S j and p j (Waples 1989). In the second sampling step at time T ¼ j; sequencing a random pool R j of reads is also modeled as binomial sampling.
In analogy to Jorde and Ryman (2007), we use the following expression as our measure of the temporal change in allele frequency for biallelic sites, Figure 1 Two-step sampling in experimental evolution with Drosophila. In E&R studies, populations are propagated at a census size N defined by the experimenter, which is in general larger than the effective population size N e : Using temporal methods, N e can be estimated from the variance in allele frequency between samples taken t generations apart. To get an accurate representation of allele frequencies in population genetic studies, a large number of individuals S j ( j 2 f0; tg) are sampled and pooled. Sampling can take place according to sampling plan I or II based on the mode of reproduction. Pooled samples are then subjected to high-throughput sequencing. Sequenced reads are subsequently aligned to the reference genome (shown at the bottom). We represent pool sequencing by an additional sampling step (called sampling step 2). We correct for both sampling steps when estimating N e in pooled samples. Additionally, we take into account variable coverage levels across the genome (coverage R ij for site i at T ¼ j; j 2 f0; tg) when correcting for the variance coming from sequencing.
whereẑ ¼ ðx þŷÞ=2: The expectation of F c for a single biallelic locus is approximated by EðF c Þ Eðx2ŷÞ 2 Eðẑ 2xŷÞ ¼ VarðxÞ þ VarðŷÞ 2 2Covðx;ŷÞ Eðb z 2 b xb yÞ : For both plans, we derive expressions for the numerator and denominator in Equation 5 separately under the two-step sampling procedure, described above. Here we summarize our main conclusions; details on the derivation are provided in File S1. With C j :¼ 1=2S j þ 1=R j 2 1=2S j R j for j 2 f0; tg; and p denoting the true population allele frequency in the gamete pool at generation 0, we obtain and Note that Equations 6 and 7 differ only in the correction term C j from that in Waples (1989). Waples (1989) previously showed that the denominator in Equation 5 reduces to Eðẑ 2xŷÞ ¼ pð1 2 pÞ 2 Covðx;ŷÞ: For plan II, Covðx;ŷÞ ¼ 0 (Waples 1989), and F c corrected for the noise coming from the two-step sampling for a single locus is given by For plan I, on the other hand, the sample allele frequency at generation 0 is positively correlated to the sample allele frequency at t because both are derived from the same population at generation 0. This requires us to calculate the sample covariance Covðx;ŷÞ in Equation 5. It turns out (see File S1 for details) that the covariance ofx andŷ is equal to where N is the census size of the population at generation 0. Equation 10 is in agreement with the corresponding term of the standard methods (Waples 1989). Substituting the inferred covariance into Equation 5 leads to the following corrected variance estimate, F9 c for plan I We provide the corresponding formulas of F9 c in haploid populations in File S1. With Pool-seq data, randomness in sequencing and local structures in the genome can lead to different coverage across marker sites, which we denote by R ij for site i (i ¼ 1; . . . n) and time j ( j 2 f0; tg). In the genome-wide data set, we calculate F9 c across n SNPs by summing over all loci in the numerator and denominator separately before carrying out the division in Equation 9, leading to the following weighting scheme for plan II: Similarly, F9 c can be calculated for plan I using Equations 4 and 11. Analogous to the single-locus case, our proposed estimators b N e ðPÞ for a diploid population are obtained by plugging F9 c into Equation 2.
Long time series have recently become available for some E&R experiments (Barrick et al. 2009;Burke et al. 2010Burke et al. , 2014. Standard N e estimators (Krimbas and Tsakas 1971;Nei and Tajima 1981;Waples 1989) assume a small number of generations ðtÞ and approximate N e using 2N e t=F: If, however, t=N e is larger, using this approximation can lead to severe bias ( Figure S2). To avoid such a bias, we use Equation 2 to estimate N e :

Simulations
We evaluate the performance of our estimator on data simulated under the neutral Wright-Fisher model. With a given population size of 2N e ; we simulate the frequency trajectory of n independent SNPs. As we focus on biallelic SNPs, we assume two possible nucleotides (alleles) to be present in the population with given frequencies at the start. To create a new generation, nucleotides are drawn independently at random with a probability given by their respective allele frequencies in the old generation. The population is propagated at a constant size of 2N e for t nonoverlapping generations. The effective population size is then estimated from allele frequencies inferred from Pool-seq samples taken from the population at the start and after t generations. The sampling of individuals to create the pooled DNA library is simulated by using sampling without replacement. To model the uneven coverage of genome-wide sequence data, we simulate a random coverage for each site, using a Poisson distribution with parameter equal to the given mean coverage. For every position, reads are then generated by binomial sampling from the library with sample size equal to the local coverage.
We assess the performance of our estimator for various combinations of N e ; pool size, coverage, number of SNPs, and distribution of starting allele frequencies. Additional to these parameters, we also test how the ratio between census and effective population size (r ¼ N=N e ) affects the accuracy of the proposed estimator. For this purpose, we increment the population size to a desired level of N in each generation while keeping the allele frequencies unchanged to avoid introducing additional sampling variance. We simulated each scenario 100 times.
Linkage disequilibrium between loci can reduce the number of independent SNPs, thereby increasing the variance of the estimate. The impact of dependence between SNPs is investigated based on 10 replicated whole-genome forward simulations with recombination, using the software tool MimicrEE ( Allele counts are subjected to binomial sampling to mimic Pool-seq with a given sequence coverage. N e is estimated in nonoverlapping windows, each containing a fixed number of SNPs.
Estimating N e on simulated data We denote our estimator corrected for the additional sampling step, i.e., pooling, by N e ðPÞ: We compare N e ðPÞ to the standard estimators N e ðWÞ and N e ðJRÞ proposed by Waples (1989) and Jorde and Ryman (2007) that correct only for a single sampling step.
We illustrate experimental sampling procedures by considering two major scenarios: (i) The full population is sequenced as one large pool and (ii) only a subset of the population is used to create pooled samples. Under scenario (i) we simulate only a single binomial sampling step to represent sampling reads out of the DNA pool. The pool size is set to be equal to the census size of the population (S j ¼ N), and the number of sampled reads (R ij ) represents the per-site coverage. For estimators that correct only for a single sampling step, we use the coverage (R ij ) as the sample size. For scenario (ii), the sampled individuals (S j ) and the read num-ber (R ij ) represent the pool size and coverage for N e ðPÞ: The coverage (R ij ) is taken as the sample size for the N e ðWÞ and N e ðJRÞ estimators, as these methods consider only one sampling step.

Change point inference for genome-wide estimates
The effect of genetic drift on the variance in allele frequency specified in Equation 1 holds only under the assumptions of Wright-Fisher neutral evolution. Deviations from the Wright-Fisher model, such as the presence of selection or demography, may cause systematically different changes in allele frequency, affecting the variance and causing locally variable patterns in genetic diversity. Furthermore, the effect of selection on one site of the genome may cause changes in the behavior of variants at nearby sites (Maynard Smith and Haigh 1974;Barton 2000;Comeron et al. 2008). As a result, the estimates of N e at different locations of the genome may deviate from the true number of breeding individuals in the population (Kimura and Crow 1963;Charlesworth 2009). For example, regions under background selection are associated with reduced b N e values that extend to linked sites due to the Hill-Robertson effect (Charlesworth 1996(Charlesworth , 2012aComeron et al. 2008). Similarly, selectively favorable alleles can also drag nearby neutral sites to high frequency (Maynard Smith and Haigh 1974), causing a local reduction in the estimated N e (Liu and Mittler 2008). Such an event is also known as a selective sweep (Berry et al. 1991). On the other hand, we expect the opposite pattern, i.e., a local elevation of b N e for types of selection such as balancing selection that maintain variation in the genome (Baysal et al. 2007;Charlesworth 2009).
To detect such patterns in b N e ; we apply a segmentation algorithm to partition the genome into locally homogeneous b N e stretches. We use a method related to an approach suggested by Futschik et al. (2014) for partitioning DNA sequences with respect to GC content. It is based on a statistical multiscale criterion and provides statistical error control, in the sense Figure 2 Effective population size estimated with different methods. Sixty generations of Wright-Fisher neutral evolution with N e ¼ 100 diploid individuals were simulated for n = 2000 unlinked loci (SNPs). Prior to sampling, the population was increased to a census size of N ¼ 500 individuals at each generation. At the starting population and at each indicated time point a sample was taken to create a pool of S ¼ 100 individuals. The pool was sequenced to an average coverage of R ¼ 50 and N e was estimated on the resulting data set by separately contrasting allele frequencies at generation 0 to each of the evolved generations denoted on the x-axis, using N e ðPÞ; N e ðWÞ (Waples 1989), and N e ðJRÞ . Each box represents results from 100 simulations with identical parameters. The dashed gray line shows the true value of N e : Data are simulated under plan I assumptions and the results of plan I and II estimators are shown in the left and right panels, respectively.
that the estimated number of windows will not exceed the true one except for a small error probability a to be specified by the user. With our N e estimates, we use a criterion proposed by Frick et al. (2014) for normally distributed responses. It is implemented as part of the R package stepR (Frick et al. 2014). By using simulations with selection we also illustrate that this method is able to capture the signal of locally variable b N e along the chromosome.

Data availability
We estimated N e in an E&R study with D. melaongaster, published in Orozco-terWengel et al. (2012) and Franssen et al. (2015). Pool-seq read libraries from these studies are available at the European Sequence Read Archive at http://www.ebi.ac. uk/ena/ under accession nos. ERP001290 and ERS460611-ERS460613.

Results and Discussion
Two-step correction is vital to avoid large bias in b N e with Pool-seq data Methods that do not correct for the additional sampling step caused by pooling can lead to substantial bias in b N e as illustrated in Figure 2. Using simulated data, we compare our proposed estimator N e ðPÞ to two commonly used estimators N e ðWÞ (Waples 1989) and N e ðJRÞ (Jorde and Ryman 2007) that provide highly accurate estimates when only a single sampling event is simulated ( Figure S3). Figure 2 shows that the additional correction substantially decreases the bias for almost all scenarios (see also Figure S4, Figure S5, and Figure  S6). Under plan I, N e ðPÞ is nearly unbiased. The plan II version of the estimator has a slight upward bias when applied on data simulated under plan I, if the samples are taken at very close time points.
As an alternative approach, we also estimated N e separately for each locus, using F9 c in Equations 9 and 11. We then calculated the effective population size across the n loci as the harmonic mean over the single-locus b N e estimates ( b N * e ðPÞ) (Peel et al. 2013). In our simulations, the harmonic mean estimator shows an accuracy similar to that of the original b N e ðPÞ ( Figure S7). However, for t lying in the midrange of the simulated interval (t ¼ 15-40), b N * e ðPÞ is slightly more biased under plan I.
Because of the additional sampling variance, both N e ðWÞ and N e ðJRÞ have a downward bias in particular for small t. Furthermore, N e ðWÞ is upwardly biased for larger values of t, probably reflecting that alleles closer to fixation or loss are contributing less to the variance (Waples 1989). The drift variance accumulates with an increasing number of generations, while the sampling variance stays constant, making the initial bias of N e ðJRÞ less pronounced for larger t. When samples are collected only a few generations apart, the variance of N e ðPÞ estimators tends to be larger than that of N e ðWÞ and N e ðJRÞ under both plans.
Plan I and II estimators differ by a factor resulting from the covariance between the sample frequencies at generations 0 and t (Equation 10), which is inversely proportional to the census population size. Consequently, the difference between plans I and II becomes smaller for increasing N. Waples (1989) investigated how the ratio between census and effective population size (r ¼ N=N e ) affects the accuracy of the estimators and concluded that the ratio of r $ 2 is sufficient to reach similar estimates for both sampling schemes. We tested the performance of N e ðPÞ on simulated data with N e ¼ 100 and N : N e ratios of r ¼ 1; 2; 5 with different coverages and pool sizes ( Figure S4, Figure S5, and Figure S6). When N ¼ N e ; the N e ðPÞ plan I method achieves highly accurate estimates for all time points in contrast to the other methods ( Figure S4). If, however, the N e ðPÞ plan II estimator is applied to data simulated under plan I, we observe an upward bias for small t, which improves with an increasing number of generations. This pattern is not unexpected since the missing covariance term becomes less influential in view of the increasing drift variance after several generations. When the entire population is sequenced as a single pool (S ¼ 100), the plan II estimators of Waples (1989) and Jorde and Ryman (2007) perform similarly to the N e ðPÞ plan I estimator because the correction for pooling in N e ðPÞ cancels out the additional covariance term when S ¼ N; making the term used as F approximately identical to that of N e ðJRÞ: This is a general pattern irrespective of r.
For r $ 2; N e ðPÞ plan I remains highly accurate ( Figure S5 and Figure S6). Furthermore, when increasing the census size under a constant N e (equivalent to increasing r), the covariance between sample allele frequencies decreases, making the difference between plans I and II almost negligible (Waples 1989). The sampling variance becomes proportionally smaller compared to the drift variance with an increasing number of generations between the samples. This improves our ability to accurately estimate N e : Correcting for the additional variance inherent to Pool-seq leads to an improved performance of N e ðPÞ compared to the standard methods for both plans. In general, with Pool-seq data the extent of the bias of the N e ðWÞ and N e ðJRÞ estimates depends on the ratio between N and S, smaller sample sizes (S) leading to a larger bias. As we accounted for the sequencing step with these estimators (Estimating N e on simulated data), decreasing the coverage at a given pool size does not change the bias much but rather increases the variance of the estimators.
In most of the experimental studies the investigator has control over the census population size; thus requiring the knowledge of N for N e ðPÞ plan I does not restrict the analysis. We illustrate the performance of N e ðPÞ plan I only when N e ¼ N in the main text, but according to Figure S5 and Figure S6, N e ðPÞ plan I is also highly accurate when r $ 2: We show the coefficient of variation (CV) of the N e ðPÞ plan I estimator in Figure 3. The CV is defined as the ratio between the standard deviation and the mean (CV ¼ŝ=m; where botĥ s andm are estimated from the sample). It measures the relative dispersion of the distribution of the estimated values. N e ðPÞ estimators are highly precise in nearly all cases, except when the drift variance is negligible compared to the sampling variance (Figure 3; see also Figure S9 and Figure S11 where N e ¼ 1000; t , 30; S # 100; and R ¼ 50). The bias is coming from a few outlier estimates, but the median shows more robust results ( Figure S13). For plan II estimators, the behavior of the method is similar ( Figure S8, Figure S10, Figure S12, and Figure S14). Note that the simulations underlying Figure S8, Figure S10, Figure S12, and Figure S14 have been done under plan I.

Increasing the number of SNPs reduces the variance of N e (P)
We test how the number of loci (n) used to infer N e affects the accuracy and the precision of the estimates by gradually increasing the number of independent SNPs from 100 to 10,000 (Figure 4). We observe a larger variance and a slight downward bias for a small number of SNPs (100 SNPs). Both the bias and the variance become smaller with a larger  number of SNPs. Some further improvement is obtained when .10,000 SNPs are used (not shown), but the benefit of additional independent SNPs levels off. We conclude that n ¼ 2000 SNPs usually provide sufficient precision and accuracy. However, when linkage disequilibrium is present in a genome-wide data set, the number of truly independent SNPs per window is reduced and a larger number of loci is recommended.
A skewed starting allele frequency distribution only moderately increases the variance of N e (P) In natural populations, the neutral site frequency spectrum is skewed toward allele frequencies close to the boundaries. N e ðPÞ uses a weighting scheme that is not very sensitive to this skew (see also Jorde and Ryman 2007). This makes it robust with respect to the shape of the starting allele frequency distribution. We illustrate this with a simulated data set having a starting allele frequency distribution that is skewed toward low-and high-frequency variants (Beta(0.2, 0.2)) as expected under neutrality. The estimates of N e from such data sets are compared to simulated data with matching parameters but uniform starting allele frequency distribution ( Figure 5). We observe a very slight upward bias with neutral starting allele frequencies compared to uniform and a moderate increase in the variance given t $ 15: With an increasing number of generations, the difference becomes negligible.
The presence of linkage disequilibrium does not have a large effect on the precision of N e (P) We investigated the sensitivity of our estimator to linkage disequilibrium between loci, using genome-wide neutral simulations with recombination . We simulated data with three different rates of recombination: high, normal, and no recombination. For the first case, the recombination rate is set to mimic the behavior of almost independent SNPs. In the normal recombination rate scenario, we use D. melanogaster recombination rates (Fiston-Lavier et al. 2010). The effective population size was estimated in nonoverlapping windows with a fixed number of n ¼ 10; 000 SNPs ( Figure 6). Different levels of linkage disequilibrium affect the number of independent loci per window. Nevertheless, we observe only a slight increase in the precision of the N e estimates with increasing recombination rate ( Figure 6).

Heterogeneous b
N e along the genome in an E&R study with D. melanogaster We estimated N e in a recent E&R study with D. melanogaster (Orozco-terWengel et al. 2012;Franssen et al. 2015). In this experiment replicate populations of 1000 individuals were subjected to a fluctuating hot environment for 59 generations. Allele frequency estimates were obtained for founder and evolved populations, using Pool-seq. N e was estimated based on the allele frequency changes between founder and latest evolved populations in nonoverlapping windows containing 10,000 SNPs, using N e ðPÞ under plan I. To determine the number of DNA stretches with different b N e along the genome, we use a segmentation algorithm provided in the software tool by Frick et al. (2014). This method requires homogeneity of variances. Since the variance of estimates increases with N e ; the estimates were log-transformed before applying the partitioning procedure. The obtained step function was back-transformed to the original scale and is shown for three biological replicates (Figure 7).
The mean and the median estimates for each chromosome arm as well as across the genome are stable across replicates (see Table 1 and Table S1). As experimental evolution studies often aim to find signals that are consistent across replicates, this can be an important check of the experimental setup. On the other hand, we see differences between chromosome arms. For example, the mean is clearly lower for 3R, emphasizing the added value of spatial analysis compared to genome-wide estimates.
In D. melanogaster b N e ranges between 100 and 400. Around the centromere of chromosome 2, the estimated N e decreases by two-thirds in replicates 1 and 3, which is in agreement with the expectation of low diversity and, as a consequence, low N e in regions with reduced recombination (Begun and Aquadro 1992;Presgraves 2005;Haddrill et al. 2007;Campos et al. 2012). Furthermore, b N e is low on the entire chromosome arm 3R and also on parts of 3L. Overall, these patterns can be attributed to strong LD, caused either by low recombination rates around the centromeres (Chan et al. 2012) or by segregating inversions (Kapun et al. 2014) in combination with selection potentially on rare variants. The reduction in b N e is also well captured by the segmentation algorithm (Figure 7), which shows a similar pattern when applied on simulated data with selection ( Figure S15). These The effect of linkage disequilibrium on our estimator was evaluated based on a whole-genome forward simulation with recombination using the software MimicrEE . Three sets of simulations were performed with different rates of recombination: high, normal, and no recombination. For each parameter setup, a genome-wide simulation is replicated 10 times. The effective population size was estimated with N e ðPÞ (plan I) in nonoverlapping windows of n = 10,000 SNPs for each replicate. The box plots show the distribution of N e estimates across replicates and windows.
results are consistent with those of Tobler et al. (2014), who observed a massive amount of outlier SNPs around the centromere of chromosome 2 and on 3R. Interestingly, certain regions of the genome show extensive differences in b N e between the replicates, which might be reflecting different selection histories or differences in demography, such as replicatespecific bottlenecks.
b N e may also vary as a result of differences in the modes of transmission of different components in the genome. For example, on the X chromosome, N e is equal to three-quarters of the autosomal population size Charlesworth 2006, 2009). Interestingly, our estimates in the E&R experiment do not reflect this expectation of reduced effective population size. Instead, we estimate N e to be as high as b N e on the autosomes. Unequal sex ratio between males and females can be a source of such a pattern (Charlesworth 2009); however, unbalanced sex ratio has not been reported in this experiment. Another possible explanation for increased b N e on the X can be the presence of background selection as suggested by Charlesworth (2012b). He argues that because of the lack of recombination in male Drosophila, the effect of background selection is more effective on the autosomes than on the X chromosome. Orozco-terWengel et al. (2012) reported differences in the number of putatively selected sites between the X and autosomes. They found that candidate SNPs were underrepresented on the X. Their selection scan identifies signatures of deviation from neutral expectation, which is also reflected in the reduction in b N e on the autosomes, indicating higher selection pressure.

Recommendations for genome-wide data sets
Most of the methods proposed previously are not designed for genome-wide high-density SNP data sets. However, the method of Jorde and Ryman (2007) was successfully used for genome-wide data by Foll et al. (2014). Reed et al. (2014) also used a similar approach to estimate N e for wholegenome data, using sliding windows. We estimated N e in windows with a fixed number of SNPs. Using windows of fixed lengths in base pairs would affect the variance of the estimator (Figure 4) but does not distort the mean. All these approaches, however, do not account for the ruggedness of the recombination landscape and can lead to windows with different levels of linkage disequilibrium in them. To overcome this problem it would be possible to define windows based on recombination distance. Unfortunately, the lack of haplotype information in the Pool-seq data makes it difficult to infer linkage disequilibrium. One way to infer linkage information from pooled sequence data is provided by the software LDx (Feder et al. 2012). For model organisms such as Drosophila, readily available recombination maps can also be used as a proxy (Przeworski et al. 2001;Kulathinal et al. 2008;Fiston-Lavier et al. 2010). If only a single genome-wide N e estimate N e from an E&R study with D. melanogaster. N e is estimated based on the allele frequency changes between founder and evolved populations at generation 59 . In the top panel, we show genomewide estimates calculated with N e ðPÞ (plan I), using N ¼ 1000 as census size and S ¼ 500 as pool size (Orozco-terWengel et al. 2012) and nonoverlapping windows of 10,000 SNPs. Chromosome-wide mean estimates across replicates are shown by the dashed lines and also calculated separately for each replicate in Table 1. DNA stretches with significantly different b N e are determined using the stepR software package (Frick et al. 2014) (bottom panel). Lower and upper 1 2 a confidence bands are shown as shaded areas. a controls the error, i.e., the probability for overestimating the number of change points, and is calculated automatically as described in Frick et al. (2014). The colors indicate different biological replicates.
is required, one can alternatively use a set of randomly distributed SNPs over the genome to obtain b N e : Temporal methods make a number of assumptions, which, if violated, can lead to biased N e estimates. For example, in our simulations, we considered only effective population sizes that are constant over time. Fluctuating N e is a frequent phenomenon in natural populations and can be an important component of an experimental design. For example, in repeatedly bottlenecked populations, the smallest population size dominates b N e (Luikart et al. 1999;Charlesworth 2009). But even in strictly controlled populations the experimental regime can induce changes in N e : When the population changes in size, the estimated N e is generally interpreted as the harmonic mean of the effective population sizes over the generations (Wright 1938;Nei and Tajima 1981;Waples 1989). However, if time series allele frequency data are available, such changes can be detected by estimating N e from pairwise comparisons between consecutive time points.
All evolutionary forces (selection, demography, etc.) that lead to deviations from the neutral expectation will also affect our estimate. Nevertheless, systematic forces that result in locally different values of b N e can be detected with a slidingwindow approach, as illustrated with simulations under selection ( Figure S15). The D. melanogaster data set also illustrates this point; i.e., the hypothesized regions under selection coincide with regions of reduced N e (Orozco-terWengel et al. 2012;Tobler et al. 2014;Franssen et al. 2015). For this to be detected, however, most of the allele frequency change has to occur over the sampled time span.
In the E&R study with D. melanogaster, shown above, the criterion of nonoverlapping generations, assumed by temporal methods, is met (see Orozco-terWengel et al. 2012 for details on experimental design). However, for samples from an age-structured population, the resulting b N e can be biased (Waples and Yokota 2007). In these cases, as suggested by Waples and Yokota (2007), larger spacing between samples maximizes the drift signal compared to sampling biases associated with age structure.

Using a small number of generations can lead to outlier estimates
In general, N e ðPÞ has a lower bias but larger variance, especially when t is small. As pointed out by Jorde and Ryman (2007) our weighting scheme leads to an increased variance but a smaller bias compared to other schemes. We observe outlier estimates among replicates at early generations (gen-eration 5, Figure 2, Figure S4, Figure S5, and Figure S6) for N e ðPÞ: When the sampling variance is large compared to the drift variance (N e ¼ 1000; S # 100; and R ¼ 50; Figure S11 and Figure S12), the deviation of the outlier estimates from the true N e is particularly large. For a few cases, we even observe large negative estimates. Negative estimates, in general, can be interpreted as N e being infinity, that is, no evidence of genetic drift (Peel et al. 2013). In our simulations this is plausible when N e is large and t is small, such that drift has not had a large effect on the population allele frequencies yet. Note that the harmonic mean estimator ( b N * e ðPÞ) has smaller variance for large N e ( Figure S16). This estimator, however, is less accurate than N e ðPÞ for small N e as shown in Figure S17.
To eliminate potential outliers and an inflated variance we recommend increasing the signal-to-noise ratio by pooling a sufficient number of individuals. Using later generations or increasing the number of SNPs in the analysis also helps to avoid outlier estimates. When none of these strategies can be applied, we suggest using the genome-wide median N e ðPÞ estimates or the harmonic mean estimator, as these are more robust to extreme outliers.

Conclusions
Effective population size is an important parameter for describing evolutionary dynamics, making its accurate estimation essential for population genetic studies. Several methods have been designed to estimate N e ; and their performance was comprehensively evaluated on simulated as well as real data (Barker 2011;Serbezov et al. 2012;Baalsrud et al. 2014;Holleley et al. 2014;Gilbert and Whitlock 2015). These studies mainly focused on genetic data collected from natural populations, which usually differ from experimental studies in terms of the census population size and sampling scheme. We designed a method that accurately infers the effective population size in genome-wide data from experimental populations sequenced in pools. Our approach improves temporal methods by explicitly correcting for two stages of sampling introduced by pooling and sequencing. Our results on simulated data confirm that methods that fail to properly account for the two stages of sampling inherent to Pool-seq can lead to severely biased N e estimates.
Pool-seq data are often considered to be overdispersed, i.e., displaying more variability than is predicted by the binomial sampling model (Yang et al. 2012). However, Zhu et al. (2012) and Futschik and Schlötterer (2010) validated that The effective population size is estimated with N e ðPÞ plan I in windows of 10,000 SNPs (Figure 7). The mean estimates across windows are shown for the major chromosome arms. The genome-wide mean is taken over the autosomes, excluding chromosome 4.
the error in allele frequency estimates is reasonably well approximated by binomial sampling given that a large enough number of individuals are pooled. Nevertheless, if overdispersion is present in the data, that will lead to additional variance, which is not modeled in our framework and will result in a downward bias of the estimated N e : If the level of overdispersion can be inferred for the data (see, e.g., Gautier et al. 2013;Illingworth 2015), it is possible to introduce a parameter that accounts for the additional between-pool variation (see File S1, Equation S8). We also illustrate the applicability of our method for estimating N e from experimental data of D. melanogaster and show that in combination with a recursive partitioning method we can infer patterns of local variation in N e along the genome. Additionally, it is possible to calculate confidence intervals based on the x 2 distribution (Waples 1989) or alternatively apply a nonparametric bootstrap approach.

Software availability
Our proposed estimators along with standard methods from the literature are implemented within the R package Nest. The package is currently available at https://github.com/Tho-masTaus/Nest.

Supplement
Agnes Jónás, Thomas Taus, Carolin Kosiol, Christian Schlötterer, Andreas Futschik 1 Normalized allele frequency changes under two-step sampling 1.1 Calculating the expected normalized variation In the following, we calculate the expected value for a measure of allele frequency change due to the combination of t generations of drift and sampling. As in Waples (1989), we consider as our measure, wherex andŷ denote the observed allele frequencies in the pooled DNA sample collected t generations apart, a is the number of alleles at a locus andẑ k := (x k +ŷ k )/2. We consider only biallelic sites, which reduces equation (S1) to for a given locus. The expectation of F c is approximated in the following way: We will provide expressions for the numerator and denominator in equation (S3) both for plan I and II under the two-step sampling process. For this purpose, we introduce the following notation: Let p and p t denote the true allele frequency in the gamete pool preceding the first sampled generation (0) and after t generations, respectively. S 0 is the number of individuals sampled at the first time point (generation 0), and x denotes the relative allele frequency in this sample. As in the main text, we distinguish between the allele frequencies (x, y) estimated after sampling individuals at generations 0 and t, and the frequencies (x,ŷ) after pool sequencing based on R j (j ∈ {0, t}) reads. For a visual representation of our sampling schemes along with the notation see Fig. S1. We refer to previous findings of Waples (1989) without carrying out the detailed derivations here, namely 1. The variance of x is binomial under both plans: 1 2. The variance due to t generations of genetic drift is: We consider the relevant terms in eq. (S3) separately: Note that we assume a diploid population with sample size of 2S 0 and effective population size of 2N e . Replacing 2S 0 with S 0 leads to the variance ofx in a haploid population with effective size N e . We now derive the variance of the sample allele frequency at T = t.
V ar(ŷ) = E(V ar(ŷ|y)) + V ar(E(ŷ|y)) Here, 2S j is the number of sampled chromosomes in a diploid population at time t. For a haploid population, the factor 2 can be ignored for both the sample size and the effective population size.
We introduce a correction term C j := 1 2Sj + 1 Rj − 1 2Sj Rj for F c at generations j ∈ {0, t}. For haploid populations an analogous correction term is C j(hapl.) := 1 Sj + 1 Rj − 1 Sj Rj (j ∈ {0, t}). Based on the results above, the variance in sample allele frequency at T = 0 and t can be written as and In the haploid case, we have V ar(ŷ) Pool-seq data can be over-dispersed, e.g. if different individuals in the pool have different probabilities of contributing the sequenced reads.
If the level of overdispersion can be inferred from the data, then it is possible to account for the additional variance by introducing an overdispersion parameter γ. This parameter can be interpreted as the factor by which the actual variance exceeds the theoretical binomial variance. With γ, our correction term becomes Following Waples (1989) the denominator in equation (S3) reduces to We now investigate the covariance term separately for both sampling plans.

Sampling plan II
Under sampling plan II, a sample of size S 0 is taken before reproduction and is not replaced in the population. Consequenty, the sample of S 0 individuals and the 2N e chromosomes contributing to the next generation are mutually exclusive, implying that the sample allele frequency at generation 0 is uncorrelated to the sample allele frequency at t, i.e. Cov(x,ŷ) = 0 for sampling plan II. We now plug equations (S6), (S7), and (S9) into equation (S3). With equation (S7), we use the equation (2) from the main text.
By solving this equation and replacing E(F c ) by F c , under plan II we obtain a measure of variance corrected for the two-step sampling The effective population size can then be estimated using the method of moments, i.e. equating F c = E[F c ] and solving for N e .

Sampling plan I
Under sampling plan I, a sample of S 0 individuals is taken after reproduction, so the sample will contain chromosomes that might contribute to the next generation. This implies that the sample allele frequency at generation 0 is positively correlated with the sample allele frequency at t, i.e.
Cov(x,ŷ) > 0, because both are derived from the same initial sample at generation 0. Note that the covariance in equation (S3) would be zero, if the allele frequency p 0 of this initial sample were known because sampling itself at the two time points are independent. Similar to Waples (1989), it can be shown that the covariance ofx andŷ is simply the variance of p 0 . Indeed with N denoting the number of diploid individuals in the initial population, and p is the true allele frequency in the gamete pool preceding generation 0. This leads to the following estimate of E(F c ) for plan I Replacing E(F c ) by F c leads to the corrected estimator of F under plan I As under plan II, a method of moments estimator can now be computed using F c . Note that for haploid populations, the covariance in eq. (S13) reduces to Cov(x,ŷ) = p(1−p) N , and eq. (S14) for haploids will be F c = . Figure S1: Two-step sampling schemes. We follow the sampling schemes proposed by Nei and Tajima (1981) and generalized by Waples (1989). Under sampling scheme I (plan I) a sample is taken after reproduction. Under the second scheme (plan II), the sample is taken before reproduction and is not returned to the population, so the sampled individuals and the ones contributing to the next generation are mutually exclusive (Waples, 1989). Allele frequencies are obtained after a two-step sampling process: First a sample of S j (j ∈ {0, t}) individuals is taken to create a pooled library for DNA extraction (green), that is subjected to high throughput sequencing. Sampling R j (j ∈ {0, t}) reads is modeled in the second step (red). At each step the relative allele frequencies in the corresponding samples are indicated under the circles representing the sample. This is a modified version of a figure in Waples (1989). (2)). Right: F(approx.2)= t/2N e . The function arguments are t and N e .
R. S. Waples. A generalized approach for estimating effective population size from temporal changes in allele frequency. Genetics, 121 (2):379-391, Feb 1989. Figure S3: The performance of the estimators proposed by Waples (1989) (N e (W )) and Jorde and Ryman (2007) (N e (JR)) are shown on simulated data. Simulations are performed as described in the main text (section "Simulations"), but without the additional sampling step representing pool sequencing, i.e. full genomic information is known for the S sampled individuals. The true effective population size is indicated by the grey dashed lines We assume N = N e , where N is the census size of the population. For the bottom left scenario the figure is truncated and one outlier estimate > 7000 for N e (W ) is not shown. Only plan I estimators are displayed. Figure S4: Performance of N e estimators for r = 1. Sixty generations of Wright-Fisher neutral evolution with N e = 100 diploid individuals are simulated for n = 2000 loci (SNPs). At the starting population (generation 0), and at each indicated time point, a sample is taken to create a pool of S individuals. Sequencing is modeled by an additional subsequent sampling step of a random number of reads (Poisson with expected value R) for each SNP. Then N e is estimated on the resulting data set by separately contrasting generation 0 with each of the generations 5, 10, 15, 20, 40 and 60. We consider the estimators N e (W ), N e (JR) and N e (P ). Each box represents results from 100 simulations with identical parameters. The grey dashed lines indicate the actual value of N e . Results under plan I and II are shown separately on panels (a) and (b), respectively. Figure S7: An alternative estimate of N e (N * e (P )), that is defined as the harmonic mean of individual N e estimates (one per locus). Before taking the harmonic mean, our two-step correction is applied to all individual estimates. N e (P ) is the original estimator also shown in Fig. 2. Simulation parameters are matching that of Fig. 2 Table S1: Genome-wide medianN e from an E&R study with D. melanogaster. The effective population size is estimated with N e (P ) plan I in windows of 10000 SNPs (Fig. 7). The median across windows of the estimates is shown for the major chromosome arms. The genome-wide median is taken over the autosomes excluding chromosome 4.       Figure S15: Estimated N e on data simulated with selection. We performed forward Wright-Fisher simulations for 59 generations with selection in three replicate populations using MimicrEE . (See also Section Simulations in the Main text). A selection coefficient of s = 0.2 (with semi-dominance, h = 0.5) is assigned to a single randomly selected locus with low starting allele frequency. All other loci were evolving neutrally. The population size is kept at a constant level of 1000 individuals during the simulations. MimicrEE simulates haplotype evolution with recombination, reflecting linkage disequilibrium in the data. The top panel shows the p-values of a selection scan indicating the true target of selection in red (Cochran-Mantel-Haenszel test contrasting allele frequencies at generation 0 and 59, Bastide et al. (2013)). N e is estimated in nonoverlapping windows containing 10000 SNPs independently for replicates (middle panel). The mean N e across windows and replicates is shown with the dashed line. The results of the segmentation algorithm performed onN e windows  is shown in the bottom panel separately for the replicates. Figure S16: Coefficient of variation ofN e estimated with the harmonic mean estimator N * e (P ) under plan I for various parameter values. Wright-Fisher neutral simulations were performed as in Fig. S8. Effective population size is estimated separately for each marker locus using the corrected variance estimator in eq. (11).N e is then estimated as the harmonic mean across the n single locus N e estimates.