Resolving Evolutionary Relationships in Closely Related Species with Whole-Genome Sequencing Data

Using genetic data to resolve the evolutionary relationships of species is of major interest in evolutionary and systematic biology. However, reconstructing the sequence of speciation events, the so-called species tree, in closely related and potentially hybridizing species is very challenging. Processes such as incomplete lineage sorting and interspecific gene flow result in local gene genealogies that differ in their topology from the species tree, and analyses of few loci with a single sequence per species are likely to produce conflicting or even misleading results. To study these phenomena on a full phylogenomic scale, we use whole-genome sequence data from 200 individuals of four black-and-white flycatcher species with so far unresolved phylogenetic relationships to infer gene tree topologies and visualize genome-wide patterns of gene tree incongruence. Using phylogenetic analysis in nonoverlapping 10-kb windows, we show that gene tree topologies are extremely diverse and change on a very small physical scale. Moreover, we find strong evidence for gene flow among flycatcher species, with distinct patterns of reduced introgression on the Z chromosome. To resolve species relationships on the background of widespread gene tree incongruence, we used four complementary coalescent-based methods for species tree reconstruction, including complex modeling approaches that incorporate post-divergence gene flow among species. This allowed us to infer the most likely species tree with high confidence. Based on this finding, we show that regions of reduced effective population size, which have been suggested as particularly useful for species tree inference, can produce positively misleading species tree topologies. Our findings disclose the pitfalls of using loci potentially under selection as phylogenetic markers and highlight the potential of modeling approaches to disentangle species relationships in systems with large effective population sizes and post-divergence gene flow.

Abstract.-Using genetic data to resolve the evolutionary relationships of species is of major interest in evolutionary and systematic biology. However, reconstructing the sequence of speciation events, the so-called species tree, in closely related and potentially hybridizing species is very challenging. Processes such as incomplete lineage sorting and interspecific gene flow result in local gene genealogies that differ in their topology from the species tree, and analyses of few loci with a single sequence per species are likely to produce conflicting or even misleading results. To study these phenomena on a full phylogenomic scale, we use whole-genome sequence data from 200 individuals of four black-and-white flycatcher species with so far unresolved phylogenetic relationships to infer gene tree topologies and visualize genome-wide patterns of gene tree incongruence. Using phylogenetic analysis in nonoverlapping 10-kb windows, we show that gene tree topologies are extremely diverse and change on a very small physical scale. Moreover, we find strong evidence for gene flow among flycatcher species, with distinct patterns of reduced introgression on the Z chromosome. To resolve species relationships on the background of widespread gene tree incongruence, we used four complementary coalescent-based methods for species tree reconstruction, including complex modeling approaches that incorporate post-divergence gene flow among species. This allowed us to infer the most likely species tree with high confidence. Based on this finding, we show that regions of reduced effective population size, which have been suggested as particularly useful for species tree inference, can produce positively misleading species tree topologies. Our findings disclose the pitfalls of using loci potentially under selection as phylogenetic markers and highlight the potential of modeling approaches to disentangle species relationships in systems with large effective population sizes and post-divergence gene flow. [Approximate Bayesian computation; demographic modeling; gene flow; gene tree; incomplete lineage sorting; introgression; phylogenomics; species tree.] The application of genetic data to resolve the evolutionary relationships of species is of major interest in evolutionary and systematic biology. Traditionally, phylogenetic methods are applied to a single locus or a small number of loci and the branching patterns of the resulting phylogenetic trees are interpreted as though they reflect the chronological order of speciation events (Felsenstein 2004). This framework usually assumes a strictly bifurcating tree, which requires immediate and complete speciation events, and an instantaneous substitution process, which ignores many populationlevel effects, like gradual allele frequency changes (Felsenstein 1981;Golding and Felsenstein 1990). Such an approach might provide satisfactory results in distantly related taxa, but suffers from a number of issues when dealing with evolutionary relationships at shallow time depths. In such cases, a phylogenetic tree inferred from any given genomic locus (a 'gene' tree) might not unequivocally reflect the true order of speciation events (the 'species' tree), and inconsistent topologies are often obtained across the genome (Maddison 1997;Nichols 2001;Edwards 2009). This heterogeneity among gene trees also implies that the model underlying most phylogenetic approaches, which assumes that all sites in the alignment have evolved along the same tree, may be violated. As a result, the concatenation of multiple independent loci into a single 'supermatrix' can yield statistically inconsistent phylogenies and produce strongly misleading results when such an approach is applied to genome-wide data sets (Kubatko and Degnan 2007).
Genome-wide heterogeneity in observed gene tree topologies can be caused by multiple biological processes, but might also be a methodological artifact, because phylogenetic trees represent point estimates associated with uncertainty. Assuming that gene trees can be inferred correctly, comparison of paralogous sequences due to gene duplication and loss can cause gene tree discordance (Goodman et al. 1979), especially when trees are inferred over long evolutionary time scales. Within groups of closely related species, genomewide variation in gene trees is caused mainly by two biological processes, namely incomplete lineage sorting (ILS) and interspecific gene flow.
ILS has been a major concern in the reconstruction of species trees in many taxonomic groups (Avise et al. 1983;Tajima 1983;Pamilo and Nei 1988;Maddison 1997). For instance, ILS has been studied intensively in great apes and humans, for which the first phylogenetic analysis using molecular data dates back more than 30 years (Ferris et al. 1981). Even in such clearly delimited species as humans, chimpanzees and gorillas, around 30% of loci throughout the genome support a topology that deviates from the well-supported species tree (Scally et al. 2012). ILS occurs when lineages fail to coalesce in the ancestral population of two species (Pamilo and Nei 1988;Maddison 1997;Degnan and Salter 2005). Therefore, the probability of ILS depends 2015 NATER ET AL.-RESOLVING SPECIES TREES WITH WHOLE-GENOME DATA 1001 on both the effective population size (N e ) in the ancestral population of two species, which determines the rate of coalescence of lineages, and the time between two successive speciation events (internode length , Hudson 1990;Degnan and Salter 2005).
Although gene tree incongruences caused by ILS are still fully compatible with a strictly bifurcating species tree, gene flow among species requires a more complex representation of evolutionary histories, resembling reticulate networks rather than trees (Yu et al. 2011). If lineages from one species successfully introgress into another species and spread to high frequency, the genealogy of these lineages will support a clustering of species that exchanged genes after their initial split regardless of their evolutionary relationship. However, the rate of gene flow has been shown to vary along the genome, with repressed introgression in regions harboring genes involved in reproductive isolation or ecological specialization (Wu and Ting 2004;Via and West 2008;Nosil et al. 2009). Thus, in systems where hybridization occurs regularly, local gene trees will reflect the tendency of introgression at a specific locus. In contrast, if introgression is rare and selection for or against it is absent, the fate of an introgressed lineage is determined by chance alone, which might again produce a large variation in observed local gene trees.
A key parameter determining the distribution of gene trees along the genome is the recombination rate. Recombination affects gene tree topologies in three ways. First, recombination can physically join DNA segments with different genealogies on the same chromosome (Griffiths and Marjoram 1996;Zheng et al. 2014). Thus, in regions with a high population recombination rate gene tree topologies can change in quick succession along the genome. Second, regions with low recombination have been shown to have a lower N e as compared with high-recombination regions (Kaplan et al. 1989;Charlesworth et al. 1993;Charlesworth 2012). This is mainly due to the effects of selection, which affects large stretches of linked sequence in the absence of regular recombination. Due to enhanced lineage sorting, regions with low recombination rates and thus reduced N e will show less ILS and therefore more consistent topologies of local gene trees. Third, regions of low recombination are more likely to be resistant to introgression due to strong linkage to genetic incompatibilities or genes under divergent selection (Feder et al. 2012;Nachman and Payseur 2012). If specific loci within the genome of the introgressing species are detrimental on the genomic background of the other species, neighboring regions will only be able to introgress if recombination uncouples them from incompatibility loci before they are removed from the population by selection. In summary, regions of low recombination are expected to show gene trees consistent with the species tree in higher frequency than high-recombination regions, and these gene tree topologies will persist over longer physical distance. This renders such regions potentially useful to reconstruct species relationships in the presence of ILS and interspecific gene flow (Pease and Hahn 2013).
By modeling the distribution of gene trees under a given demographic model, coalescence-based modeling approaches can reconstruct species phylogenies by taking into account both ILS and interspecific gene flow. Additionally, given data from multiple independent loci and multiple individuals per species, such modeling approaches are able to harvest information about N e in both current and ancestral populations (Hudson 1990;Liu and Pearl 2007;Heled and Drummond 2010). However, due to computational constraints, multispecies coalescent methods have often been limited to relatively small data sets with only few loci (Rannala and Yang 2003;Liu and Pearl 2007;Liu 2008;Heled and Drummond 2010), and thus their usefulness for the analysis of whole-genome data is limited. Moreover, selection of a small number of loci might also strongly bias the outcome of the species tree inference (Maddison 1997;Nichols 2001;Degnan and Rosenberg 2009;Leache and Rannala 2011). Recent approaches have been optimized to utilize large numbers of independent loci. This considerably improves estimation of the species tree in closely related species where the variance in genealogies among loci is large (e.g., Liu et al. 2009;Liu et al. 2010;Bryant et al. 2012). These methodological advances therefore open the door to statistically sound phylogenomic analysis that avoids critical issues encountered when analyzing genome-wide data sets by concatenation and classical phylogenetic inference (Edwards et al. 2007;Kubatko and Degnan 2007;Rannala and Yang 2008; but see Gatesy and Springer 2014 for a counterargument).
A major limitation of many species-tree methods is their assumption of a model with strict isolation after species splits. It is unclear to what extent gene flow among lineages in the species tree can confound the true order of speciation events. Disentangling the effects of species split time and migration on genealogies is indeed challenging (Hey 2006). Even though maximum-likelihood modeling frameworks have been successfully applied to tackle this question, most of these approaches are restricted to a small number of populations or species, which limits their use to resolve phylogenetic questions (Hey and Nielsen 2004;Hey 2006). Furthermore, the calculation of the likelihood function is computationally expensive, and expanding such approaches to whole-genome data is therefore hindered by computational constraints. However, recent advances in approximate methods like approximate Bayesian computation (ABC) offer an elegant way around the problem of solving a likelihood function (Beaumont et al. 2002;Marjoram et al. 2003;Excoffier et al. 2013). This allows investigation of the influence of complex demographic processes on the reconstruction of species trees in closely related species, given the availability of both a large number of independent loci and a population sample from each species considered in the tree. However, despite recent advances in the generation of genome-wide sequencing data, such data sets remain rare and computational limitations have so far prevented a 1002 SYSTEMATIC BIOLOGY VOL. 64 thorough investigation of the utility of coalescent-based approaches to resolve species relationships in closely related species.
Inferring phylogenetic relationships in birds has often been challenging due to rapid radiations and frequent hybridization (Grant and Grant 1992;Ericson et al. 2006). The Old World black-and-white flycatcher species complex (genus: Ficedula) represents such a case of closely related bird species with difficult-toresolve evolutionary relationships. The four species of this complex-collared flycatcher (F. albicollis), pied flycatcher (F. hypoleuca), Atlas flycatcher (F. speculigera), and semicollared flycatcher (F. semitorquata)-warrant particular focus because of extensive phylogenetic, phylogeographic and ecological studies (e.g. Ellegren et al. 1996;Saetre et al. 1997;Qvarnstrom et al. 2000;Veen et al. 2001;Saether et al. 2007;Qvarnstrom et al. 2010;Saetre and Saether 2010). Black-and-white flycatchers are distributed in central and eastern Europe, the Middle East, and northwest Africa, and likely diverged from a common ancestor less than 2 Ma . Like other species in Europe, they have likely experienced recurrent population expansions and contractions during Pleistocene glacial cycles, which resulted in recurrent episodes of isolation in refugia followed by hybridization upon secondary contact Borge et al. 2005;Backström et al. 2013;Nadachowska-Brzyska et al. 2013). Therefore, their recent divergence and repeated occurrence of hybridization make evolutionary relationships of black-and-white flycatchers difficult to resolve despite extensive studies using a wide variety of nonmolecular and molecular markers (Lundberg and Alatalo 1992;Saetre et al. 2001;Hogner et al. 2012;Smeds et al. 2015).
Here we used whole-genome sequence data of 200 individuals covering all four species of the blackand-white flycatcher species complex, as well as two outgroup species, to tackle the long-lasting problem of unravelling evolutionary relationships in closely related and rapidly radiating species. We aimed to quantify the genome-wide extent of gene tree discordance caused by ILS and interspecific gene flow in relation to chromosome location, within-species diversity, and recombination rate. Moreover, we apply a range of coalescent-based modeling approaches to find the species tree which best explains the observed distribution of local gene trees. By applying population genomic data, we aimed for two key improvements over earlier smaller scale studies. First, we avoid biases in a priori selection of specific loci and additionally obtain information about genome-wide variation in genealogies, which aids in selecting suitable loci in other species with less extensive data. Second, by employing sequence data from multiple individuals per species, we identify genomic regions of accelerated lineage sorting in which signals of speciation events should be pronounced. Moreover, population sampling allows estimation of species-specific N e and gene flow rates, which aids in disentangling species split patterns from overlaying signals of introgression.
Our findings highlight the difficulties and pitfalls encountered when dealing with phylogenetic questions in closely related and hybridizing species. Owing to the availability of extensive population genomic resources, black-and-white flycatchers can serve as a model system to guide the selection of appropriate genetic markers and analysis methods for systems with similarly challenging evolutionary relationships but lacking comparable data.

Sequence Data and General Filtering Strategy
We extracted DNA from the blood and tissue samples using Qiagen's Blood and Tissue Kit following the manufacturer's instructions. We performed wholegenome sequencing using paired-end libraries with 450bp insert size on an Illumina HiSeq 2000 instrument with 100-bp read length. We sequenced the 200 individuals at a mean depth of 14.7× per individual (min 5.0×, max 26.7×, Table 1, Table S1) and mapped reads against the collared flycatcher assembly version FicAlb1.5 (Kawakami et al. 2014b) using BWA 0.7.4 (Li and Durbin 2009). We then performed variant calling on a per-population/species level using GATK 2.8-1 (McKenna et al. 2010), following the GATK Best Practices workflow for DNAseq data. This included a variant quality recalibration and filtering step using highquality genotype data from a 50k SNP chip (Kawakami et al. 2014a) to obtain high-quality single-nucleotide polymorphisms (SNPs). To further reduce erroneous genotype calls, we applied a common filter to all sequences used in subsequent analyses. This included a minimum root mean square of mapping quality of 20 per site, a minimum read coverage of 5× per individual and site, and a minimum number of 10 individuals with callable genotype per population/species. Sites not conforming to these filter requirements were coded as missing. To remove potentially collapsed regions in the reference genome assembly, we additionally excluded all sites with a sequencing coverage larger than five standard deviations from the mean coverage across all individuals or with exclusively heterozygous genotypes in any of the populations/species. Furthermore, we applied a repeat mask to the collared flycatcher reference sequence as described in Ellegren et al. (2012).

Polarization
We used the two outgroup genomes to determine the ancestral state for each position in the reference genome using a parsimony-based approach. If both outgroup individuals were monomorphic for the same allele, this allele was considered ancestral. If data were missing, one of the outgroups was polymorphic, or the two outgroups were fixed for different alleles, the ancestral state was considered undetermined and the position coded as missing data in all individuals.

Allele Sharing and Principal Component Analysis
We used a custom-made Perl script to count the occurrences of shared derived variation in any pairwise comparison of the four focal species using SNP filtering and polarization as described above. Venn diagrams were plotted using the R package VennDiagram (Chen and Boutros 2011).
To obtain a genome-wide picture of the partitioning of genetic diversity among species, we performed a principal component analysis (PCA) using the "prcomp" function in R v3.0.1 (R Development Core Team 2010). For this, we generated a table of diploid genotype calls for a total of 10 million SNPs with complete genotypes in all 198 individuals of the four focal species. We randomly sampled SNPs along the genome, but excluding the Z chromosome, mtDNA genome, and regions with elevated differentiation (differentiation islands) as defined by Ellegren et al. (2012).

Window-based Gene Tree Discordance
We assessed the genome-wide distribution of gene tree topologies with a stepping window approach, moving 10-kb windows in steps of 50 kb to minimize linkage between subsequent windows. For each window, we first inferred haplotypes of all 198 individuals of the focal species using fastPHASE v1.4.0 (Scheet and Stephens 2006). To minimize phasing errors, we coded sites with less than 80% posterior phasing support as missing data and randomly selected one haplotype per individual. Due to unequal sample sizes among species, we subsampled pied and collared flycatchers by selecting for each window and each of the four populations per species the five individuals with the highest sequencing coverage. Employing the alignment of haploid sequences, we then inferred the maximumlikelihood gene tree as well as 200 bootstrap replicates using the rapid bootstrapping algorithm with the GTRGAMMA model in RAxML v8.0.20 (Stamatakis 2014). We rooted the resulting trees with the F. hyperythra outgroup using Newick Utilities v1.6 (Junier and Zdobnov 2010). We processed the rooted trees using a custom-made Perl script to calculate the genealogical sorting index (gsi, Cummings et al. 2008) for each of the four species, as well as the depth of the most recent common ancestor of all four species relative to the root depth. The gsi is a standardized measurement of the degree of monophyly for a group of lineages on a given gene tree topology, with a value of one indicating that the lineages form a monophyletic group, whereas a value of zero indicates a random distribution of lineages across the entire tree.
Because lack of species monophyly is common in this study system, we performed 200 rounds of subsampling for each gene tree by randomly selecting one leaf node per species, including the F. parva and F. hyperythra VOL. 64 outgroups. We extracted the same leaf nodes from the 200 bootstrap trees and used these replicates to annotate bootstrap support values on the original sixtaxon tree using Newick Utilities. We then processed each subsampled tree with a custom-made Perl script using the BioPerl Tree module (Stajich et al. 2002) in order to determine the frequencies of the rooted and unrooted tree topologies over the 200 iterations for each window. Based on the frequencies (p) of gene tree topologies in 10-kb windows, we calculated a topology diversity index (tdi) as , with I being the number of possible rooted tree topologies. A value of one therefore indicates the occurrence of all possible topologies at equal frequencies, whereas a value of zero means that only a single topology is found in a window. We calculated the tdi for both single 10-kb windows, as well as with frequencies averaged over 500-kb sliding windows to assess persistence of topologies over longer physical distance. We used R to plot all window-based statistics of gene tree distributions.
In order to test whether a quantitative analysis of the genome-wide distribution of gene tree topologies can be used for species tree inference, we summed the frequencies of the 15 possible rooted and three unrooted gene tree topologies over the entire genome, omitting every other window in order to reduce linkage effects. We also assessed if a particular tree topology was enriched in low-recombining or highly differentiated regions of the genome as suggested before (Corl and Ellegren 2013;Pease and Hahn 2013;Cruickshank and Hahn 2014). For this, we partitioned the window-based tree topologies into differentiation islands as defined in Ellegren et al. (2012), the lower 10% quantile of recombination rates (<0.71 cM/Mb, Kawakami et al. 2014b), and the Z chromosome. Additionally, we applied a supermatrix approach to obtain genomewide phylogenetic trees by concatenating individual haplotypes from 10-kb windows in each of the four data partitions. Due to the large number of loci in the complete data set, we subsampled by concatenating only every 10th window. We then used the GTRCAT model in RAxML to infer the maximum-likelihood tree as well as 200 rapid bootstrap replicates.

Model-based Species Tree Reconstruction
We applied a series of modeling approaches suitable for whole-genome data in order to test the power to resolve the species tree in a group of closely related species. All modeling approaches are based on the multispecies coalescent, but employ different population models and marker types for inference. With the exception of the MP-EST analysis, for which we used the same data partitions as in the window-based analyses, we applied stringent filters to select suitable loci for the model-based species tree inference. Because these models assume neutrality of the employed genetic markers, we only considered loci in the genome that were at least 20 kb away from any exonic region as determined by the Ensembl gene annotation of the collared flycatcher reference genome version FicAlb1.4. We also excluded loci on the Z chromosome and the mitochondrial genome, as these are expected to have reduced N e as compared with autosomal regions. Furthermore, we did not consider any loci located within differentiation islands as defined by Ellegren et al. (2012), as these show patterns of diversity and differentiation clearly distinct from the genomic background, which might be the results of selective sweeps or background selection.

MP-EST
As a first approach, we applied MP-EST v1.4 (Liu et al. 2010) to the set of 18,649 gene trees from 10-kb windows obtained in the previous analysis. Given a set of gene trees, MP-EST optimizes a pseudo-likelihood function of the species tree using the multispecies coalescent model. Thus, MP-EST accounts for ILS as a source of gene tree/species tree discordance, but does not model interspecific gene flow. MP-EST is computationally highly efficient, which allowed us to analyze the genome-wide distribution of gene tree topologies in a species tree framework. We performed ten independent runs each on the four different partitions (all regions, low-recombination regions, differentiation islands, and Z chromosome), retaining the tree with the highest pseudo-likelihood for each partition. Due to the large number of available independent loci and the short run times of the MP-EST method, we additionally explored the precision of inferring a given species tree relative to the number of loci used. For this, we subsampled from the full set of trees in each of the four partitions 1000 times without replacement between 10 and 1000 gene trees. We then analyzed each of the resulting 10,000 data sets with MP-EST in the same way as for the complete data sets.

SNAPP
We used SNAPP v1.1.1 (Bryant et al. 2012), which assumes a strict isolation model with constant population sizes to estimate the species tree using a large number of independent genetic markers. SNAPP calculates the probability of the data given a species tree, bypassing the need to integrate over all possible genealogies. As a major advantage, SNAPP performs tree search and therefore does not require a priori definition of a set of species tree topologies to test. Additionally, SNAPP provides estimates of theta for all current and ancestral species; theta equals 4N e , with being the mutation rate per site per generation.
Due to severe runtime and memory constraints of this method, we used a relatively small data set of 16,000 SNPs randomly sampled from the genome. We collected the allelic states for each SNP in a random subset of 12 individuals per species, and from each individual we 1005 randomly sampled one of the two chromosomes. SNPs with less than 12 covered individuals per species were not considered. We coded allelic states as either ancestral or derived as described above. For the priors of the model parameters, we chose wide and uninformative distributions. The forward and backward mutation rates were sampled from uniform distributions between zero and 100, and the rate parameters were sampled from a gamma distribution with the alpha and beta parameters fixed at 10 and 100, respectively. The lambda parameter of the Yule prior for the species tree was uniformly distributed in the range of zero to one. We performed 10 independent runs, each with 1,000,000 iterations of the Markov chain Monte Carlo (MCMC) algorithm, discarding the first 200,000 iterations of each run as burnin. We then assessed convergence of the MCMC chain using Tracer v1.5 (Rambaut and Drummond 2007) and plotted the distribution of species trees in the posterior sample using DensiTree v2.1.11 (Bouckaert 2010).

Fastsimcoal2
To make more efficient use of our genome-wide data set, we used an approximate method to fit demographic models to the observed two-dimensional site frequency spectra (2D-SFS) as implemented in fastsimcoal v2.5.2 (Excoffier et al. 2013). This approach approximates the probability of observing a certain site frequency configuration given a demographic model by simulating a large number of genealogies for each model/parameter setting. The length of the branches leading to the number of tips matching the site frequency configuration relative to the total length of the genealogy is then averaged over all simulated genealogies. Assuming independence of SNPs, a composite likelihood can be calculated by multiplying the approximate likelihoods for each cell in the SFS. This composite likelihood is then maximized for the whole 2D-SFS (or six 2D-SFS in our caseone for each pairwise comparison of four species) to obtain maximum-likelihood estimates of the model parameters. Because we also scored the total number of monomorphic sites (the 0/0 and 1/1 boxes in the 2D-SFS) and an estimate of the mutation rate was available (2.5× 10 −9 per site per generation, Nadachowska-Brzyska et al. 2013), this method is also able to provide absolute estimates of all the parameters in the model. Due to its approximate nature, complex demographic models can be assessed, including gene flow among species, which enabled us to test for the influence of gene flow in species tree reconstruction.
Composite likelihood methods like fastsimcoal2 assume independence of the sites in the SFS. We therefore assessed a subset of sites distributed randomly across the genome. For this, we first divided the genome into 100-bp windows, filtered all windows with more than 20% repeat-masked sites, and then randomly sampled 100,000 windows without replacement (from a total of 3,282,941 windows). In each 100-bp window, we subsampled each position to 24 random chromosomes per species and counted the number of derived alleles in each species for each pairwise comparison. Positions that did not have enough data for one or more of the four species were excluded in all species.
Because fastsimcoal2 does not allow for exhaustive tree search, we restricted our model testing procedure to the four most frequently observed gene tree topologies as determined in the genome-wide analysis of gene tree discordance. The search ranges for the model parameters are listed in Table S2. We optimized the composite likelihood for each model in 100 independent runs, each using 200,000 simulations for every cycle in the expectation conditional maximization (ECM) algorithm, a minimum of 10 and a maximum of 40 ECM cycles, and a stop criterion of 0.001. We then compared the run with the highest likelihood among the four models to obtain the Akaike's weight of evidence for the model with the highest likelihood (Johnson and Omland 2004). To obtain confidence intervals for the parameter estimates of the best-fitting model, we applied a re-subsampling procedure of our full set of genomewide 100-bp windows. We repeated the subsampling of 100,000 windows a total of 50 times and obtained maximum-likelihood estimates for each replicate as described above, but with only 30 independent runs per replicate to reduce computation time.
ABC As a fourth model-based approach to species tree reconstruction, we applied an ABC framework (Beaumont et al. 2002). Following a recent simulation study (Robinson et al. 2014), we gathered a total of 2962 independent sequence loci of 2 kb each from 12 individuals per species. We randomly sampled loci from the genome, with a minimum distance of 50 kb between loci to minimize the effects of linkage. We then calculated the mean and the variance over all loci of the following summary statistics: number of segregating sites, proportions of shared and fixed polymorphism, average sequence divergence (d XY ), and ST (Excoffier et al. 1992). We calculated the statistics for each of the six pairwise species comparisons, thus resulting in a total of 60 summary statistics used in the ABC analysis.
As in the fastsimcoal2 analysis, we restricted the model testing to the four most frequent gene tree topologies. The prior distributions for the model parameters are listed in Table S3. We simulated 2×10 6 data sets for each model using the coalescent simulator ms (Hudson 2002). From the simulated data, we calculated summary statistics using a custom-made C++ program. We then used the R package "abc" v1.8 (Csillery et al. 2012) to perform a multinomial logistic regression on the simulated and observed summary statistics, using a tolerance level of 0.1% (8000 simulations closest to the observed data). For parameter estimation, we performed another 8×10 6 simulations for the best-fitting model. We used 20,000 random simulations to define the first 10 orthogonal components of the summary statistics 1006 SYSTEMATIC BIOLOGY VOL. 64 that maximize the covariance matrix between summary statistics and model parameters, using a partial leastsquares (PLS) regression approach (Boulesteix and Strimmer 2007) as implemented in the "pls" R package (Mevik and Wehrens 2007). We defined the optimal number of PLS components based on the drop in the root mean squared error for each parameter with the inclusion of additional PLS components (Wegmann et al. 2009). We then transformed the simulated summary statistics with the loadings of the first 10 PLS components and performed a ABC-GLM post-sampling regression (Leuenberger and Wegmann 2010) on the 8000 simulations with the smallest Euclidean distance to the PLS components of the observed summary statistics.
We assessed the power of our model selection approach to differentiate between the four tested demographic models by a cross-validation procedure. For each model, we randomly selected 200 simulated data sets and performed the same model selection procedure as with the real observed data. For each pseudo-observed data set, we scored the model with the highest posterior probability as well as the mean posterior probability of the true model (Fig. S5).

Allele Sharing and PCA
In a first step, we assessed the degree of pairwise allele sharing between the four flycatcher species, which is indicative of the prevalence of ILS and/or interspecific gene flow. We used ancestral state information from two outgroup species to polarize SNPs and specifically look at shared and private occurrence of derived alleles (Fig. 1a). Out of a total of ∼27 million SNPs polymorphic over all four species, ∼5.2 million derived variants are shared among all four species, highlighting the close evolutionary relationship of blackand-white flycatchers. Between 3.5 and 5.2 million SNPs were private to a single species, with collared flycatcher showing the highest number of SNPs with private derived variation. In pairwise comparisons, most derived variation was shared between collared flycatcher and semicollared flycatcher (∼1.4 million SNPs), followed by collared flycatcher and pied flycatcher (∼0.85 million SNPs).
A PCA of 10 million genome-wide SNPs revealed a strong clustering of individuals by species with no overlap (Fig. 1b). The first principal component (PC), explaining ∼20% of the total variance, separated pied, Atlas, as well as collared and semicollared flycatchers into three distinct groups. PC1 also revealed strong population structure within pied flycatchers, with samples from the Spanish population being genetically distinct from the other samples. PC2 (∼11% of the variance) separated semicollared from Atlas flycatchers, but kept collared and pied flycatchers together. Atlas flycatchers were located centrally in the plot of the first two components, approximately equidistant to the other three species.

Genome-wide Gene Tree Incongruence
Assessment of the genome-wide distribution of gene tree topologies revealed widespread gene tree incongruence. The four species did not form monophyletic groups for most genomic regions, as evidenced by genealogical sorting indices (gsi) below one (Fig. 2b). For most genomic regions, none of the 15 rooted gene tree topologies appeared consistently at high frequencies, as shown by tree diversity indices well above zero (Fig. 2e, Fig. S6). Rather, the genome-wide gene tree distribution was characterized by narrow frequency spikes of quickly changing topologies, indicating widespread gene tree discordance in the four species (Fig. 2d, Fig. S6). This result cannot be explained by unreliable placement of the root node, as the three possible unrooted tree configurations showed similar diversity along the genome (Fig. S7).
Despite widespread lack of species monophyly, both gsi and tdi identified regions in the genome that displayed strongly increased rates of lineage sorting (Fig. 2b, Fig. S6). These regions were accompanied by reduced within-species diversity (nucleotide diversity, , Fig. 2a) and largely corresponded to previously identified differentiation islands (Ellegren et al. 2012). Differentiation islands are strongly associated with genomic regions of low recombination (Kawakami et al. 2014a), and are likely the result of locally reduced N e due to the effects of selection at linked sites. Within differentiation islands, windows with gsi values of one for all four species were common, indicating accelerated lineage sorting due to strongly reduced current effective population sizes. Although the reduction in N e within differentiation islands was strong enough to promote species monophyly, we still observed switching of tree topologies in quick succession within these regions. This is highlighted by the tdi calculated over sliding 500-kb windows, which stayed close to one even in regions where multiple successive 10-kb windows showed a tdi of zero. A similar, but chromosome-wide, pattern of accelerated lineage sorting was visible on the Z chromosome, in line with the reduced withinspecies diversity and increased differentiation on this chromosome (Backström et al. 2010;Ellegren et al. 2012).
A quantitative assessment of the distribution of gene tree topologies along the genome revealed that an asymmetric gene tree topology (((P,A),C),S) was most common (17.7%, Fig. 3a). This contrasted with an expected frequency of ∼5.5% in a four-taxon tree with internal branch lengths of zero (Degnan and Rosenberg 2006). The second-ranked, symmetric topology ((P,A),(C,S)) occurred with a frequency of 14.3% versus the expected ∼11%. In combination, the distribution of gene tree topologies strongly supported a species tree with an internal pairing of pied flycatcher     and Atlas flycatcher (41.9% of all gene trees, Fig. 3b). However, the third-ranked topology (((C,P),A),S) was still nearly 2-fold overrepresented (10.5% vs. ∼5.5%), even though such a gene tree topology is discordant to a species tree with an internal pied-Atlas pairing. In total, 23.5% of all gene trees supported an internal collaredpied pairing (Fig. 3b). In agreement with the observed frequencies of gene trees, the phylogenetic tree inferred from the genome-wide concatenated data set resulted in a highly supported asymmetric (((P,A),C),S) topology (Fig. S2).
When restricting the quantitative analysis to specific partitions of the genome, the frequency distribution of gene trees changed considerably as compared with the whole-genome analysis (Fig. 3a). For lowrecombination regions (recombination rate <0.71 cM/Mb), the ranking order of the four most frequent topologies stayed the same, but the symmetrical ((P,A),(C,S)) topology occurred almost as frequently as the top-ranking asymmetrical (((P,A),C),S) tree (16.5% vs. 17.9%). However, for differentiation islands and the Z chromosome, the order of the first two topologies switched, with the symmetrical tree being the most frequent (20.8% vs. 12.5% and 25.8% vs. 23.5%, respectively). For the Z chromosome, an increase in all topologies with an internal pied-Atlas pairing was evident (65.6% vs. 41.9% for all genomic regions, Fig. 3b), with a corresponding underrepresentation of trees with an internal collared-pied pairing (10.8% vs. 23.5%). Such a pattern of increased occurrence of a certain gene tree topology in regions with reduced N e and therefore ILS (which applies to all three mentioned regions) lends strong support that this particular configuration reflects the true species relationships. This notion is further supported by the phylogenetic trees obtained from concatenated 10-kb loci, which resulted in the same symmetrical ((P,A),(C,S)) topology with 100% bootstrap support for all three restricted data partitions (Fig. S2). However, in the following paragraphs, we show that such a conclusion would be misleading in the case of black-and-white flycatchers.

Model-based Species Tree Reconstruction
Analyzing the genome-wide collection of gene trees from 10-kb windows in a species tree framework implemented in MP-EST confirmed the results obtained from the frequency distributions. Although we obtained an asymmetrical (((P,A),C),S) species tree with the genome-wide and the low-recombination regions data sets, the topology switched to a symmetrical ((P,A),(C,S)) tree for the differentiation islands and the Z chromosome (Fig. S3). Subsampling smaller numbers of independent loci revealed that the corresponding topologies obtained with the complete data sets were consistently (> 95% of replicates) obtained with 200 loci sampled genomewide and within differentiation islands (Fig. 4). The low-recombination regions had higher power to infer the same species tree, reaching a 95% confidence level already at 100 loci. Interestingly, the Z chromosome converged slower to the topology obtained with the complete set of Z-chromosomal gene trees, requiring over 1000 loci to reach the 95% confidence level.
Species tree reconstruction based on 16,000 genomewide SNPs with SNAPP, a modeling approach assuming strict isolation after species splits, resulted in a wellresolved topology. After removing the burn-in, the MCMC sample only contained species trees with an asymmetric (((P,A),C),S) topology (Fig. 5). Theta estimates from the SNAPP analysis revealed similar sizes for modern and ancestral populations, except for collared flycatchers, which showed a markedly larger current effective population size (Table S4). Three additional runs each with a new random sample of genome-wide SNPs resulted all in the same (((P,A),C),S) topology and highly similar estimates of theta (Table S4).
To assess the impact of interspecific gene flow on species tree inference, we applied two more complex modeling approaches. First, we used fastsimcoal2 to fit demographic models representing the four most frequent gene tree topologies to pairwise 2D-SFS, which provided strong support for the same species tree topology as inferred by SNAPP (Table 2). We obtained low maximum-likelihood estimates for all migration rates among species, indicating that gene flow among species has not been strong enough to influence the coalescent patterns in the data in a way that would provide spurious results in models without gene flow. In accordance with this notion, we obtained similar patterns of N e estimates as in the SNAPP analysis, with collared flycatchers showing considerably larger N e than the other species (collared: ∼450,000; pied: ∼300,000; Atlas/semicollared: ∼250,000; Table 3). Fastsimcoal2 inferred very short internode lengths of just ∼0.17N e generations for the pied-Atlas branch and ∼0.30N e generations for the pied-Atlas-collared branch.
Second, we applied an ABC framework to perform model selection and parameter estimation on demographic models derived from the four most frequent gene tree topologies (Fig. 6). The ABC analysis struggled to distinguish among the four different models, as evidenced by largely overlapping distributions of summary statistics and a misassignment rate of up to 28% in the model selection cross-validation (Figs. S4 and S5). This lack of power likely results from the loss of information when reducing the genomic data to summary statistics. Still, we obtained 75% posterior probability for the same topology as inferred from SNAPP and fastsimcoal2, representing the asymmetric tree with pied and Atlas flycatchers as sister species (Fig. 6). Together with the corresponding symmetric tree (i.e., ((P,A),(C,S))), a species tree with pied and Atlas flycatchers as sister species was supported with 97% posterior probability. Therefore, the by far largest part of the uncertainty in model choice originated from problems differentiating between symmetric and asymmetric species trees with a pied-Atlas pairing, corresponding to the two most frequently observed gene tree topologies based on 10-kb windows (Fig. 3a).   Re-subsampling of 10-kb loci for species tree inference wit MP-EST. For each number of loci, gene trees were re-subsampled without replacement 1,000 times from the complete set of gene trees of four data partitions. The proportion of resulting species trees matching the topology obtained with the complete data set (((P,A),C),S) is shown for 10 different numbers of loci. For the differentiation islands and the Z chromosome, species trees converged to the alternative ((P,A),(C,S)) topology with increasing number of loci. Point estimates of model parameters from the ABC analysis followed a similar pattern as in the fastsimcoal2 analysis, but resulted in about twice lower estimates of current N e in the four species (Table 3). In contrast to the fastsimcoal2 estimate, the internode between the Atlas/pied and (Atlas-pied)/collared splits was characterized by about eight times larger ancestral than current N e , resulting in an estimated length of only 0.09 N e generations. With such a short internode length, very few coalescent events are expected to happen in the ancestral population of pied and Atlas flycatchers, resulting in high frequencies of discordant gene trees due to ILS.

DISCUSSION
When speciation events occur in quick succession or when established species continue to exchange genes, independent regions along the genome will show phylogenetic relationships that differ from the species tree (Nichols 2001;Degnan and Salter 2005;Degnan and Rosenberg 2009). This stochasticity in lineage sorting between populations and species makes inferences of the sequence of speciation events a challenging task. Despite the emergence of large-scale genomic data sets in many  organisms, the problem of species tree reconstruction cannot be easily solved by just considering more data (Degnan and Rosenberg 2009). Rather, under certain topology and branch length combinations, species tree inference from a large number of gene trees can become positively misleading because the most frequently observed gene tree topology does not necessarily correspond to the species tree (Degnan and Rosenberg 2006). Here, capitalizing on a large population genomic data set of four closely related flycatcher species, we were able to show how gene tree topologies vary extensively and at a small physical scale along the genome. Despite the frequent occurrence of every possible tree topology, we reconstructed the underlying species tree with high confidence using a series of coalescent-based approaches taking both ILS and interspecific gene flow into account. Our study thereby illustrates the usefulness of modeling approaches to efficiently extract information about speciation patterns from large genome-wide data sets and provide confident answers even in such closely related and hybridizing systems such as the black-andwhite flycatchers.
Widespread incongruence of gene trees from multiple independent loci due to ILS has been recorded in many taxonomic studies (reviewed in Degnan and Rosenberg 2009). This problem is especially pronounced in species with large N e , due to the low rate of coalescence in ancestral populations. Furthermore, resolving species relationships in taxa undergoing rapid adaptive radiations has been found to be particularly challenging (e.g., Davies et al. 2004;Ericson et al. 2006;Alfaro et al. 2009). It has been suggested that in such cases the problem of widespread ILS can be alleviated by considering loci in the genome that exhibit reduced N e as compared with the genome-wide average (Corl and Ellegren 2013;Pease and Hahn 2013;Cruickshank and Hahn 2014). Genome-wide variation of N e at noncoding sites has been mostly attributed to the effects of selection reducing the diversity of neutral linked sites (Charlesworth et al. 1993;Gossmann et al. 2011;Charlesworth 2012). It follows that sites with low N e are expected to be found in regions with low recombination rates and/or high density of sites under selection. In such regions, the coalescence rate is increased, causing species to attain reciprocal monophyly faster as compared with regions with higher N e . Furthermore, the probability that any particular gene tree yields the same topology as the species tree is increased if N e was also reduced in the ancestral populations. Such patterns can be expected for background selection, whereby sites under purifying selection reduce the neutral variability at linked sites (Charlesworth et al. 1993).
Our results demonstrate that, contrary to previous suggestions (Corl and Ellegren 2013;Pease and Hahn 2013;Cruickshank and Hahn 2014), regions of reduced N e are not guaranteed to provide a more accurate inference of the species tree. This concerns especially regions that are selected based on extreme values of summary statistics (like nucleotide diversity or F ST ), as illustrated by the distribution of gene trees found in differentiation islands. Even though we observe increased species monophyly within differentiation islands, a large variety of tree topologies was recovered. Despite low levels of recombination in these regions, topologies still switch within short physical distances and any of the 15 possible tree topologies can be found. Importantly, restricting the analysis to differentiation islands led to an increase in the most frequently observed gene tree topology, which differed from the top-ranking topology obtained genome-wide.  Given that differentiation islands allegedly represent regions of reduced N e , and therefore reduced occurrence of ILS, such a pattern might itself give the impression that the true species tree topology can be recovered with a simple quantification of gene tree topologies from a small number of specifically selected loci. However, as we show here, only explicit modeling of the coalescent process under different species trees can provide a reliable answer regarding the order of speciation events in groups of closely related and genetically connected species.
All four coalescent-based species tree methods resulted in the same species tree topology, supporting pied and Atlas flycatcher as sister taxa. Interestingly, SNAPP, which employs a simple isolation model without gene flow, resulted in the same topology as the models allowing for migration among species. Therefore, in the present case, interspecific gene flow seems to have a negligible impact on the genome-wide distribution of genealogies. This observation is further supported by low estimates of migration rates from the two models with gene flow and in agreement with previous findings pointing to species divergence in isolated glacial refugia Backström et al. 2013;Nadachowska-Brzyska et al. 2013). In this scenario, gene flow among species was limited to sporadic episodes of secondary contact after the northwards expansion during interglacials. Furthermore, in the case of collared and pied flycatchers, hybrids have a reduced fitness as compared with the parental species (Svedin et al. 2008;Wiley et al. 2009), resulting in genome-wide selection against immigrant lineages. Under such an allopatric speciation scenario with both pre-and postzygotic reproductive barriers, low levels of interspecific gene flow can be expected when averaging over the whole genome.
In this study, we focused primarily on ILS and interspecific gene flow as the major processes behind gene tree discordance, which is warranted by the close evolutionary relationships within our study system (Maddison 1997;Degnan and Rosenberg 2009). However, it should be noted that on larger evolutionary time scales, processes related to gene duplication and loss might play a prominent role in shaping gene tree heterogeneity across the genome (Fitch 1970;Goodman et al. 1979). In such cases, approaches that model gene duplication and loss from alignments of gene families might be better suited to account for gene tree discordance in species tree reconstruction (e.g., PHYLDOG; Boussau et al. 2013).
Having obtained a consistent reconstruction of the most likely species tree over multiple modeling approaches, the distribution of gene trees revealed phylogenetic patterns in differentiation islands and on the Z chromosome that go beyond the expectations of a simple reduction in N e . Both subsets showed an increase in frequency of the collared-semicollared pairing, resulting in a most frequent gene tree topology different from the inferred species tree. In both cases, the relative increase of the collared-semicollared pairing was accompanied by a reduction in frequency of the collared-pied pairing. Gene trees with a collared-pied pairing are discordant with the inferred species tree, and the relative overrepresentation of this pairing in the whole-genome data set is a strong indicator for gene flow between these two flycatcher species (cf. Backström et al. 2013;Nadachowska-Brzyska et al. 2013). The fact that such signals were diminished  Table 3. The numbers in boxes show the posterior model probabilities from the ABC analysis. on the Z chromosome hints at reduced sex-linked introgression. The Z chromosome in flycatchers has been found to harbor genes involved in pre-and postzygotic reproductive isolation (Saether et al. 2007). Selection against immigrant alleles might therefore be particularly pronounced on the Z chromosome, resulting in a reduced effective gene flow rate (Saetre et al. 2003).
At a first glance, Z-chromosomal loci may seem particularly suitable for species tree reconstruction due to faster lineage sorting and a reduced tendency of introgression compared with autosomal loci. However, several pitfalls are connected to this approach. First, hybrid sterility of the heterogametic sex (females in birds, including flycatchers; Haldane 1922;Svedin et al. 2008) increases the effective rate of gene flow on the Z chromosome relative to autosomes, because the loss of transmission due to sterile female hybrids is relatively less for Z chromosomes than autosomes. Such a pattern might explain the increased occurrence of collaredsemicollared pairings in Z-chromosomal gene trees to the point that the symmetric ((P,A),(C,S)) topology becomes the most frequent. Alternatively, this pattern might be explained by positive selection of the same alleles in collared and semicollared flycatchers. Due to the relatively recent split times and large ancestral population sizes, many positively selected alleles in species-specific sweeps are likely to originate from standing genetic variation in the common ancestor. The nonrandom sorting of such ancestral variation according to similarities in ecology or life history could therefore explain localized patterns of reduced genetic divergence conflicting with the species tree. This particularly concerns regions showing signatures of selection, such as reduced nucleotide diversity found in differentiation islands in flycatchers. A similar pattern was found in African cichlid fish (Brawand et al. 2014), where regions of elevated differentiation between rapidly radiating species contained phylogenetic signals strongly discordant with the taxonomic classification.
Our study clearly demonstrates the power of coalescent-based modeling to resolve species trees even in closely related and hybridizing species, provided that data from a sufficient number of independent and neutrally evolving loci is available. By re-subsampling our whole-genome data set, we showed that a few hundred independent sequence loci are sufficient to confidently infer the species tree in species with low rates of interspecific gene flow. Although employing loci with reduced recombination rate and N e can in theory alleviate the problem of ILS, such loci are potentially under the influence of selection, which might lead to strong biases in the reconstruction of species relationships. Particular caution is warranted when applying concatenation (supermatrix) approaches to such data, because in our case concatenation was more prone to yield erroneous species relationships in low-recombination regions than coalescent-based methods. Thus, for reliable species tree inference, we recommend that data on the order of a few hundred randomly selected and genome-wide distributed loci are obtained. An increasing range of coalescent-based modeling methods can easily deal with such a number of loci sequenced in multiple individuals per species.
Despite recent advances, a major bottleneck to the widespread application of coalescent-based models for species tree reconstruction remains, as computational constraints limit current implementations of complex models to a small number of species. Furthermore, even models that take both ILS and interspecific gene flow into account are simple representations of the demographic history of natural populations. For instance, speciation might involve complex demographic processes, such as series of bottlenecks and subsequent population expansions associated with founder events (Mayr 1942), a slow decay of genetic exchange in speciationwith-gene-flow scenarios (Feder et al. 2012), or genetic exchange upon secondary contact after a period of geographic isolation. Deviations from the relatively simple models assumed in most modelbased approaches might negatively influence species tree inference. Additionally, these factors are not just nuisance parameters but themselves provide crucial information about the evolutionary forces involved in the speciation process. In the future, complex yet computationally efficient species tree models in combination with extensive population genomic data sets might allow us to obtain a much more refined picture of the evolutionary history in a wide range of species.

SUPPLEMENTARY MATERIAL
Supplementary material, including data files and online-only appendices can be found in the Dryad Data Repository: http://dx.doi.org/10.5061/dryad.b6gj8.
Raw read data and alignment files have been deposited in the Sequence Read