Paracoccidioides Genomes Reflect High Levels of Species Divergence and Little Interspecific Gene Flow

ABSTRACT The fungus Paracoccidioides is a prevalent human pathogen endemic to South America. The genus is composed of five species. In this report, we use 37 whole-genome sequences to study the allocation of genetic variation in Paracoccidioides. We tested three genome-wide predictions of advanced speciation, namely, that all species should be reciprocally monophyletic, that species pairs should be highly differentiated along the whole genome, and that there should be low rates of interspecific gene exchange. We find support for these three hypotheses. Species pairs with older divergences show no evidence of gene exchange, while more recently diverged species pairs show evidence of modest rates of introgression. Our results indicate that as divergence progresses, species boundaries become less porous among Paracoccidioides species. Our results suggest that species in Paracoccidioides are at different stages along the divergence continuum.

Species from the genus Paracoccidioides show a range of divergence that make the group promising to understand how diversification occurs in pathogenic fungi. One of the species of Paracoccidioides, Paracoccidioides lutzii, seems to have diverged from the rest of the species at least 30 million years ago (12). The species show extensive differences in terms of morphology and physiology. Five more species, all within the brasiliensis complex, form a monophyletic group. Paracoccidioides restrepiensis and Paracoccidioides venezuelensis are the most closely related dyad with recent divergence (less than 0.2 million years ago [MYA]) (10). Paracoccidioides brasiliensis sensu stricto is sister to the P. restrepiensis/P. venezuelensis dyad, while Paracoccidioides americana is sister to the ingroup. Paracoccidioides brasiliensis sensu stricto has also been proposed to be formed by two cryptic species, S1a and S1b (13), but no formal test of this divergence has been performed. A combination of yeast and conidial morphology differentiates between all Paracoccidioides species pairs (10,11,14).
Whole-genome sequences can be used to identify species boundaries in fungi. Three tests jointly indicate the existence of species boundaries (15). First, genome variation must reflect genetic differentiation in cases where speciation has taken place. In cases of advanced divergence, genomes of putative species should show reciprocal monophyly; this can be measured as the proportion of loci that show a phylogenetic history concordant with the hypothesized species history. Second, genetic variation should be partitioned across putative species; the extent of genetic differentiation between individuals from different putative species should be larger than the differentiation between individuals within each of the putative species. Finally, the genomes of putative species should show low or moderate levels of gene exchange. So far, all studies on species boundaries in the genus Paracoccidioides have focused on detecting genealogical congruence among a modest number of gene genealogies (8,(10)(11)(12). Incorporating genomic information is sorely needed to understand the magnitude of differentiation among Paracoccidioides pathogens. In this report, we use phylogenetics and population genetics to bridge that gap. We find that the species of Paracoccidioides are extensively differentiated, which suggests an advanced stage along the divergence continuum.

RESULTS
Paracoccidioides is haploid and shows little evidence for aneuploidy. We tested two aspects regarding the ploidy of Paracoccidioides. First, we assessed whether the coverage in the genome was homogenous. Significant deviations from homogeneity indicate aneuploidy. Second, we tested the global and local ploidy by scoring the mean coverage and exploring for the presence of polymorphic sites. Haploid genomes, unlike higher ploidies, show no variation in each locus and should show no minor-or major-alleles. Figure 1 shows the results of this analyses. First, the global distribution of coverage is distributed evenly around 1, with a few sites getting no coverage with respect to the reference genome (Fig. 1A). Permutation tests show that this distribution is not significantly different from 1 (i.e., the expectation of homogenous ploidy along the genome; two-sample Fisher-Pitman permutation test, P = 0.08). When we ran similar analyses to study the mean coverage and minor allele frequency locally (i.e., in 5-kb windows) along the Paracoccidioides genome, we found that the expectations of haploidy were fulfilled as well (i.e., there no regions with abnormally high coverage or that were systematically polymorphic; Fig. 1B to E). Figures S2 to S5 in the supplemental material show similar analyses for all species of Paracoccidioides. Overall, these results confirm previous observations that Paracoccidioides is haploid (16); for all analyses from now on, we treat Paracoccidioides as such.
All proposed species show reciprocal monophyly and strong genealogical concordance. We evaluated the phylogenetic relationships between all species of the Paracoccidioides genus using genome-wide variation and established how much of the genome supports these relationships. First, we built a maximum likelihood (ML) tree in which we used concatenated loci from the whole genome as the unit for phylogenetic analysis (this approach has important limitations [17]; see below). Figure 2A shows the show close adherence to the genome-wide mean coverage for the whole genome of the P. lutzii isolate EE. Per-site estimations of the extent of polymorphism across the genome suggest that no single site shows evidence of polymorphism (cutoff: minor allele frequency . 20%), suggesting that besides having no local aneuploidy, Paracoccidioides is haploid. The four panels on the right (B to E) show the coverage and minor allele frequency for the four largest supercontigs. Figure S2 to S5 in the supplemental material show similar plots for other Paracoccidioides species. resulting topology. Individual analyses of the 6 largest supercontigs yields similar topologies ( Fig. 2B to G and Fig. S6 in the supplemental material), with some minor exceptions (see Fig. S6 and S7 in the supplemental material). Two results stand out from these analyses. As predicted by smaller efforts (10), all five named species of Paracoccidioides are reciprocally monophyletic. The tree shows that, on average, all scaffolds show an evolutionary history consistent with the clusters previously described as species, including S1a and S1b. Four out of the six supercontigs show an identical branching pattern to that of the whole-genome genealogy (Robinson-Foulds [RF] distance = 0.00). The topologies from supercontigs 2.2, 2.3, and 2.5 are slightly different from the genome-wide topology (RF distances = 24, 23, and 3.00, respectively; Fig. S7). The maximum RF distance in these trees is 70, following Bryant and Steel (18). Similarly, genome-wide phylogenetic reconstruction shows similar results to previous approaches and suggest that P. restrepiensis and P. venezuelensis are the most closely related species of the group. Paracoccidioides brasiliensis sensu stricto is sister to the dyad P. venezuelensis/P. restrepiensis, and P. americana is an outgroup to the other three species of the brasiliensis species complex.
Additionally, we calculated the concordance factors for the each of the five Paracoccidioides species and the putative cryptic species S1a and S1b. If speciation has proceeded to extensive genetic differentiation, then most of the genome should show the signature of reciprocal monophyly in each of the proposed species. Figure 3 shows the results of a gene genealogy concordance analysis using BUCKy. The obtained topology is identical to that produced using maximum likelihood. The concordance factors for the five proposed species is in all cases greater than 90%, which suggests that the vast majority of the genome shows concordance in the existence of the five species of Paracoccidioides. The only exceptions to this high level of concordance are the nodes that separate S1a and S1b, which had concordance factor (CF) values of 0.53 and 0.61, respectively. There is no metric on how high a CF value must be to elevate a group to species level (19,20), but the genome-wide concordance of this groups is much lower than that of the other Paracoccidioides species. For all analyses that follow, we treat P. brasiliensis sensu stricto as a single species with strong population structure.
Next, we calculated the mean genetic distance between individuals of the same species and between pairs of individuals from different species. The expectation is that pairwise comparisons between individuals from different species should show much higher differentiation than individuals from the same species (15,21). We found that genetic variability is partitioned among species and that in all cases the magnitude of interspecific distances is at least 2Â higher than that of intraspecific distances for all species pairs in genome-wide estimations ( Fig. 4 and Fig. S8 in the supplemental material). All pairwise comparisons between intraspecific and interspecific differentiation were significant (two-sample Fisher-Pitman permutation test, P , 1 Â 10 210 ). As expected from the concordance analysis, the differentiation-measured as D XYoccurs along the whole genome (leftmost panels of Fig. 5 and Fig. S9 in the supplemental material).
The joint results from the phylogenetic analyses and the genetic distance calculations indicate that the genetic diversity within Paracoccidioides is partitioned across species. The five proposed species of Paracoccidioides fulfill the expectations of being advanced in the speciation continuum in terms of genomic divergence (22,23). Next, we tested whether such divergence is accompanied by a reduction in the amount of gene flow between species.
Low rates of detectable gene exchange between species. If speciation has proceeded to advanced stages, as seems to be the case for the species in Paracoccidioides, FIG 2 Legend (Continued) and have a "r" after the strain name. Paracoccidioides brasiliensis isolates are marked in orange and have a "1a" or a "1b" after the strain name. the magnitude of gene exchange between species should be limited. To detect potential alleles that have crosses species boundaries, we used two different methods, D-statistics and Int-HMM, a method that detects haplotypes likely to have crossed species boundaries. We describe the results for each species pair as follows.
(i) Paracoccidioides lutzii and the species from the brasiliensis complex. Using Int-HMM, we found no evidence of introgression in any of these species pairs or in any reciprocal direction. If hybridization and admixture has occurred between P. lutzii and the other Paracoccidioides species, it has left no trace in the genomes of any of the involved species.
(ii) Paracoccidioides americana and the rest of the species from the brasiliensis complex. D and f D , two metrics to detect introgression from phylogenetic trees (see Materials and Methods for details), suggest that P. americana has donated more genetic material to P. brasiliensis and P. venezuelensis than it has donated to P. restrepiensis. In all cases the proportion of introgression is small (i.e., f D is lower than 0.04; Table 1). We followed up with Ancestry-HMM and found no evidence of large haplotypes (over 500 bp) between P. americana and the other species from the brasiliensis complex. This result suggest that introgressions are small and are probably old or strongly selected against. Regardless of the explanation for this low proportion of admixture, the joint results indicate that the magnitude of gene exchange between P. americana and the other species of the brasiliensis species complex is low.
(iii) Paracoccidioides brasiliensis/P. restrepiensis and P. brasiliensis/P. venezuelensis. Unlike pairwise comparisons involving the more divergent species pairs, we found evidence of limited gene exchange using both methods in these two species pairs. The only tetrad that fulfilled the requirements for the calculation of D was [(((venezuelensis, restrepiensis), brasiliensis), lutzii)]. The evidence of gene flow in this case was strong and showed that introgression between P. brasiliensis and P. venezuelensis is more common  Genomic Divergence of Paracoccidioides spp. ® than between P. brasiliensis and P. restrepiensis (D = 0.331, P , 0.001; degrees of freedom [df] = 0.078). This amount of introgression is higher than that observed between P. americana and other species in the brasiliensis species group.
Next, we used Int-HMM to study the characteristics of the introgressed haplotypes. Table 2 shows the percentage of genome that has crossed species boundaries in each sequenced genome. We found no overlap of the introgression regions between reciprocal directions or species pairs. In P. brasiliensis/P. restrepiensis, introgression mean length did not differ between reciprocal directions (Welch two-sample t test data; t = 20.857, df = 4.135, P = 0.438) and in both cases was ;15 kb, suggesting similar times of admixture or similar selection against introgression (Fig. 6A). We found a similar pattern in P. brasiliensis/P. venezuelensis. The haplotype length did not differ between reciprocal directions (Welch two-sample t tests; P. brasiliensis-restrepiensis: t = 0.449, df = 3.456, P = 0.680; P. brasiliensis-P. venezuelensis: t = 0.536, df = 50.795, P = 0.594) and in both cases was ;15 kb ( Fig. 6B). In all cases, introgressions occur at low frequency (i.e., present in a single isolate per species) and are mostly located in intergenic regions (Table 3). Figure 7A to D shows the location of introgressed haplotypes in the two directions of the cross. Introgressions were distributed along most supercontigs, suggesting that most supercontigs are equally permissive-or refractory-to introgression (Fig. 7).
(iv) P. venezuelensis/P. restrepiensis. Finally, we studied the most recently diverged species pair in Paracoccidioides using Int-HMM. We found no overlap in the location of haplotypes between the two reciprocal directions or with any of the other dyads of Paracoccidioides. There was no difference in the genome proportion introgressed per individual between reciprocal directions (Welch two-sample t test; t = 0.11, df = 9.95, P = 0.91). The mean haplotype length did not differ between reciprocal directions (Welch two-sample t test; t = 0.741, df = 85.12, P = 0.461) and in both cases was ;16 kb ( Fig. 6C). As in the case for the other Paracoccidioides dyads, introgressions were at a low frequency and were largely in intergenic regions (Table 3). Notably, we found similar amounts of introgression in this pair as we found in the more divergent pairs (Welch two-sample t test; P. venezuelensis/P. restrepiensis versus P. venezuelensis/P. brasiliensis sensu stricto, t = 1.205, df = 31.962, P = 0.237; P. venezuelensis/P. restrepiensis versus P. restrepiensis/P. brasiliensis sensu stricto, t = 21.372, df = 21.731, P = 0.184). Figure 6C shows the haplotype size frequency distribution of introgressions in the two reciprocal directions. Introgressions were distributed along the whole genome and did not follow a particular clustering pattern ( Fig. 7E and F).

DISCUSSION
Our study uses genomic data to confirm previous observations that five species of Paracoccidioides (i) are all haploid, and (ii) are genetically differentiated. We also present results that suggest that the genomes of these species show strong levels of genealogical concordance genome wide and rarely exchange genes. The species of Paracoccidioides show considerable divergence and reciprocal monophyly which in turn suggest these five species are at an advanced stage on the speciation continuum (22)(23)(24). a A positive D value means more introgression between P3 (P. americana) and P2 (the second species listed) than between P3 and P1 (the first species in the list); a negative D value means introgression between P3 and P1. b In all cases, P. lutzii is the outgroup.
Our analyses of the magnitude of gene flow between species confirm that despite extensive geographic overlap, the Paracoccidioides species rarely exchange genes. In the more divergent pairs, those of the brasiliensis complex and P. lutzii, we found no evidence of introgression. This is consistent with the high levels of divergence between P. lutzii and the species from the brasiliensis complex, which have been hypothesized to be over 30 million years apart (10,12). We observed a similar-but not identicalpattern between P. americana and the other species of the brasiliensis complex (P. brasiliensis, P. restrepiensis, and P. venezuelensis). Potential introgressions between these species are rare and of very small size, making them potentially indistinguishable from incomplete lineage sorting, as there was no evidence of gene exchange in any of the pairs. This paucity of gene exchange is not caused by lack of contact. Paracoccidioides brasiliensis, P. lutzii, and P. americana coexist in Brazil and have even been found in the same host (8,25). Paracoccidioides venezuelensis and P. americana share their geographic range in Venezuela as well. This extensive geographic overlap suggests that there is ample opportunity for gene exchange, but it does not occur.
We do find evidence of moderate gene exchange in the triad P. brasiliensis-P. restrepiensis-P. venezuelensis. These low levels of gene exchange are consistent with advanced divergence among Paracoccidioides species. Our scans for gene exchange pose two additional questions. First, the rate of gene exchange is symmetrical in two Paracoccidioides species pairs, P. brasiliensis/P. venezuelensis and P. venezuelensis/P restrepiensis. The third species pair, P. brasiliensis sensu stricto/P. restrepiensis, shows strongly asymmetric introgression that is mostly found in intergenic regions. The reasons behind this pattern remain unknown. We formulate two possibilities. First, the direction of migration might be asymmetric between these two species. If P. brasiliensis migrants come into the range of P. restrepiensis more often than the reciprocal type of migration, then P. brasiliensis alleles should be found more frequently in the P.  restrepiensis background than the reciprocal. A second possibility is that the P. restrepiensis background is less permissive of introgression because the introgressed alleles might have more deleterious effects. Since P. restrepiensis has a much smaller effective population size (8), variants that might ameliorate the potentially deleterious effect of introgressed alleles should be rarer. On the other hand, small populations might harbor fewer deleterious mutations (26). The rates of migration, hybridization-or even intraspecific recombination-and of potential hybrid incompatibilities are unknown in Paracoccidioides, and we cannot disentangle these possibilities. A second intriguing pattern is that P. venezuelensis and P. restrepiensis show a similar level of introgression to those of the other species pairs. As divergence increases, so should the number of incompatibilities (27)(28)(29), which in turn should reduce the proportion of genome that can flow from one species to the other (30,31). Since the P. venezuelensis/P. restrepiensis dyad is more closely related than other species pairs within the brasiliensis complex, we expected a higher level of gene exchange. Our results do not support this hypothesis. Even though the precise reasons for this pattern remain unexplored, the role of geography might be of particular importance. The ranges of P. venezuelensis and P. restrepiensis are contiguous but have not been reported to overlap. This differs from all other species pairs in Paracoccidioides, which show some degree of geographic overlap (25). A precise assessment of the range and opportunity for hybridization will be crucial to establish the genetic, environmental, and demographic factors that govern the patterns of introgression in Paracoccidioides.
The identification of species boundaries and introgression in fungal pathogens has human health-related implications. Paracoccidioides lutzii and P. brasiliensis sensu stricto show differences in the immunological response they elicit (14,32), the strength of the disease they cause (14), and in traits involved in diagnostic tools (33)(34)(35). Introgression, then, can be a vehicle to transfer virulence factors and antifungal resistance in Paracoccidioides. Gene exchange can also be a source of variation in other fungal pathogens (36)(37)(38). A systematic survey to characterize the virulence and resistance of differentiated species across their whole geographic range could reveal the extent to which diversification of the ethological agents of PCM has also led to divergence in virulence strategies. The combination of phenotypic studies and population genetics can also reveal whether gene exchange plays a role on the transfer of virulence factors and antifungal resistance strategies.
Our results are in line with those of other studies that show that species boundaries in fungi are semipermeable (15,39,40) and that introgression might not be rare. On  the other hand, they also reveal that introgression is not an unavoidable outcome of secondary contact. Geographic overlap is not synonymous with hybridization, and in cases of diverged species (such as P. lutzii and the species of the brasiliensis species complex), hybrids might not occur even when species share a close geographic range. Hybrids might also be sterile or inviable (41). Genome factors such as the amount of divergence between hybridizing species (30) and the landscape of recombination (42,43) affect whether an introgression persists after hybridization. The different levels of divergence and the ample opportunity for hybridization among Paracoccidioides species provide for a system to test the relative importance of genomic factors in determining the amount of introgression occurs in nature.

MATERIALS AND METHODS
Public data. All of the data used here have been previously published. The SRA numbers are listed in Table S1 in the supplemental material. To root our trees (see below), we obtained sequencing reads from two species of Histoplasma: Histoplasma capsulatum sensu stricto and Histoplasma mississippiense (SRA BioProject accession number PRJNA416769) (44). These species are among some of the closest relatives of Paracoccidioides (45).
Read mapping and variant calling. Reads were mapped to the Paracoccidioides brasiliensis strain Pb18 genome (BioProject accession number PRJNA28733 and BioSample accession number SAMN02953720), currently assembled into 57 supercontigs, using Burrows-Wheeler Aligner (BWA) version 0.7.12 (46). BAM files were then merged using SAMtools version 0.1.19. Indels were identified and reads locally remapped in the merged BAM files using the GATK version 3.2-2 RealignerTargetCreator and IndelRealigner functions (47,48). Subsequently, single-nucleotide polymorphisms (SNPs) were called using the GATK UnifiedGenotyper function with the parameter "het" set to 0.01 and all others left as default. The following filters were applied to the resulting VCF file: QD = 2.0, FS_filter = 60.0, MQ_filter = 30.0, MQ_Rank_Sum_filter = 212.5, and Read_Pos_Rank_Sum_filter = 28.0. Sites were excluded if the coverage was less than 5 or greater than the 99th quantile of the genomic coverage distribution for the given line or if the SNP failed to pass one of the GATK filters.
Ploidy estimation. To detect admixture between species of Paracoccidioides, we used Int-HMM (49), an algorithm to detect introgression that requires information on the ploidy of an individual (i.e., it can be run to detect introgression in diploid or haploid organisms; see below). We used genome-wide data to determine the most likely ploidy of the Paracoccidioides isolates. We used Illumina short reads from the five species of Paracoccidioides (described above) to do two ploidy tests. First, we plotted the persite sequencing coverage. In cases in which there is partial aneuploidy in the form of chromosomal duplications, there will be a bimodal distribution. In cases where the genome does not harbor aneuploid regions, there will be a single mode in the distribution. To compare the observed distribution of per-site coverage with the null hypothesis of uniform sequencing coverage, we used a two-sample Fisher-Pitman permutation test (function oneway_test, library coin [50]). We used the "hist" function (library graphics [51]) in R to plot the distribution of the per site coverage and of allele frequencies per site across the whole genome for each strain.
Next, we studied the ploidy of Paracoccidioides at a local level. We used the same two metrics described above. Sites with the ploidy of the rest of the genome should show a mean per-window coverage. Once-duplicated segments (either as copy number variation or as changes in ploidy) should have twice the coverage of the genome average. We thus calculated the coverage and mean minor allele frequency for each 5-kb window in the genome to assess whether there were segments of the genome with evidence for changes in ploidy.
Phylogenetic reconstructions. Our goal was to determine whether the species from Paracoccidioides were reciprocally monophyletic and thus satisfy the requirements to be considered phylogenetic species. We followed a phylogenetic species concept (15,52,53) to recognize species, defining species as genetic clusters that are reciprocally monophyletic and for which there was genealogical concordance across genome-wide unlinked loci. We used two parallel approaches: (i) maximum likelihood trees at the genome and at the supercontig level, and (ii) Bayesian concordance analysis of the genealogies of orthologous genes. We describe each of these two approaches as follows.
(i) Maximum likelihood phylogenetics. To determine whether the proposed Paracoccidioides species were monophyletic, we first used maximum likelihood phylogenetics. Reciprocal monophyly is a trademark of speciation (15,22); as divergence accrues, the likelihood of reciprocal monophyly across the whole genome increases for two reasons. First, as divergence increases, the likelihood of incomplete lineage sorting decreases (54,55). Likewise, in diverged lineages, the magnitude of retained introgression is lower even in cases where hybridization might occur frequently (30,31,49). Since recombination in Paracoccidioides occurs but seems to be rare, and there is a high level of linkage disequilibrium across the genome (see Results), we studied each supercontig as an unlinked locus. Since mitochondrial DNA (mtDNA) shows evidence of interspecific gene transfer (10,56), we focused only on nuclear genomes. We obtained whole supercontig sequences for each individual from the VCF file using the FastaAlternateReferenceMaker tool in GATK, realigned them using Mafft version 7 (57), and used them to build maximum likelihood (ML) trees using RAxML version 8.2.9 (58). We inferred individual trees for each of the largest six supercontigs, which encompass 62% of the genome. We also generated a genome-wide tree of a concatenated alignment of all supercontigs. Analyses were run under the GTR 1 C Genomic Divergence of Paracoccidioides spp. ® November/December 2020 Volume 11 Issue 6 e01999-20 mbio.asm.org 13 model, with 1,000 bootstrap pseudoreplicates to assess support for each node. The genome-wide analysis was partitioned by supercontig, with each partition having its own set of GTR 1 C model parameters. All trees were rooted using Histoplasma mississippiense and H. capsulatum sensu stricto (44). To determine the extent of congruence among supercontig trees and the whole-genome tree, we used the Robinson-Foulds (RF) (or symmetric difference [59]) distance (function treedist, library phangorn [60]), which counts how many partitions are in one tree but not the other. Even though this is a simplistic approach that does not take into account branch length information (18,61,62), it reveals whether there are large-scale levels of incongruency in the topology. We also plotted the concatenated tree topology and superimposed the trees from each supercontig using the R function compare.chronograms (library phytools [63]) and the nodes that are present in the whole-genome sequence tree but not in the supercontig genealogies using the function comparePhylo (library ape [64]).
(ii) Bayesian concordance analyses. In cases where speciation has occurred and genetic divergence has accrued, the phylogenetic signal across the whole genome should be congruent across loci. We measured the genome-wide genealogical concordance at a finer scale using a Bayesian concordance analysis (BCA). First, we identified orthologous genes using the BUSCO annotation pipeline (65,66), which encompasses 1,316 benchmarking universal single-copy orthologs. These groups of genes have been curated in 25 different species of ascomycetous fungi (65,66). Next, we used MrBayes version 3.2.6 (67, 68) to generate posterior tree distributions for each single gene. We summarized the gene trees using the command mbsum in the BUCKy program (69-71) with a burn-in of 1,000 trees. We then fed individual gene genealogies to BUCKy version 1.4.2, with four independent runs and four Markov chain Monte Carlo (MCMC) chains, each with 10 million generations with a burn-in period of 100,000. Five values of the a parameter (0.1, 0.5, 1, 5, and 10) were tested, which correspond to the prior probability distribution for the number of distinct gene trees (70). The level of support for each node is expressed as a concordance factor (CF), which ranges between 0 (no concordance between genealogies) and 1 (complete concordance). This approach allowed us to infer the phylogenetic relationships between putative species, while estimating the genome-wide genealogical support for their monophyly.
Genetic distance. To further assess the extent of genetic differentiation between phylogenetic species, we used the metric p INTER , the average number of nucleotide differences between one sequence randomly chosen from a population and another sequence randomly chosen from a second population or species. D XY , or p INTER , followed the form: pairwise differences between individuals i and j sequence length Mean p INTER was the mean of all pairwise comparisons between individuals from two species. We calculated 20 mean p INTER values. We also calculated p INTRA , the average pairwise genetic distance between individuals of the same species. p INTRA followed the same form as p INTER , but instead of calculating the average number of differences between species, it calculates the average number of differences between two randomly selected individuals of the same species. Mean p INTRA was the mean withinspecies value for each of the five species. We used Python for all calculations.
In cases were speciation is complete, p INTER is expected to be much larger than p INTRA . For each species pair, p INTRA can take two values (i.e., from each of the two species), so for the pairwise comparisons p INTRA is the pooled set of the two intraspecific distances. To compare the values of p INTER and p INTRA for each species pair (10 pairwise comparisons), we used two-sample Fisher-Pitman permutation tests as implemented by the function oneway_test in the coin library in R (9,999 Monte Carlo resamplings) (50). We also calculated p INTER for each of the 10 pairwise comparisons and p INTRA for the five Paracoccidioides species for each of the largest 6 supercontigs.
Gene exchange between species of Paracoccidioides. Previous work based on coding and microsatellite data suggested the possibility of gene exchange between Paracoccidioides species (10). However, although microsatellite makers have the potential to reveal genealogical relationships between very closely related individuals, they are also prone to homoplasy, as they mutate quickly and their identity might not be caused by descent (72)(73)(74). To address the possibility of gene exchange with better resolution, we used whole-genome data and two complementary methods, D-statistics and Int-HMM.
(i) D-statistics. First, we calculated the excess of variants shared between potentially admixed species using D-statistics (75)(76)(77)(78). D is a metric to detect introgression from phylogenetic trees. The metric requires a four-taxon topology [(((P1, P2), P3), O)]. The allele in the outgroup (O) is labeled A, while the derived allele in the ingroup is labeled B. D compares the occurrence of two discordant site patterns, ABBA and BABA, representing sites in which an allele is derived in P3 relative to O and is derived in one but not both of the sister lineages P1 and P2. These discordant patterns are most likely to arise if introgression occurs between P3 and either P2 or P1, in which case one site pattern will occur more frequently than the other. A positive D value means introgression between P3 and P2; a negative D value means introgression between P3 and P1. Due to the need for a sorted topology, we focused on four spe- For each species pair, we measured the standard deviation of D from 1,000 bootstrap replicates. The observed genome-wide D was converted to a Z-score measuring the number of standard deviations it deviates from 0, and significance was assessed from a P value using an a of 0.01 as a cutoff after Holm-Bonferroni correction for multiple testing. We also calculated a variation of D, f D , which estimates the proportion of admixture by dividing the observed difference between the ABBA and BABA counts to the expected difference when the entire genome is introgressed. Besides the genome-wide average of D and f D , we calculated both metrics for 5-kb windows along the genome. We used DSuite for all calculations (79), and used the allele frequencies within each species, as recommended in reference 77.
(ii) Identification of introgressed haplotypes with Ancestry-HMM. We used a hidden Markov model (HMM) able to detect introgression in diploids and haploids (i.e., Int-HMM [36,37,49]) to identify introgressed regions between all pairs of Paracoccidioides species. The HMM identifies introgressions between a pair of diverged populations or species, a donor and a recipient (i.e., the admixed individual), by inferring the ancestry of every SNP in the genome. It then identifies a consecutive group of SNPs from the donor in the recipient background. Donor SNPs were selected such that they were monomorphic in the donor species and the allele frequency differences between the two species was greater than or equal to 30%. We also required that every individual in the donor species and at least one individual in the recipient species had a called genotype. Transition and emission probabilities of the HMM have been described elsewhere (36,37).
(iii) Identifying introgression tracts. Int-HMM determined the most probable genotype for each marker in each individual. We defined tracts as contiguous markers with the same genotype (species 1 or species 2). Introgressed SNPs are defined as those within a tract where the HMM probability for an introgression state (d) (i.e., originating from the donor) was $50%. In cases where we identified a region of d with at least 10 introgressed SNP markers flanked on one side by a small tract (under 10 SNPs) from the recipient that in turn was flanked by a single larger tract that was completely d, the two introgressed regions were merged and consolidated into a single tract. We did four consecutive rounds of filtering to allow identification of larger introgressed tracts that were broken up by small sections of the recipient species. These broken regions might be caused by gene conversion, double recombination events, or sequencing error (Fig. S1 in the supplemental material shows an example).
Enrichment by sequence type. In cases where introgression is deleterious, selection will operate most efficiently against regions encoding functional elements (e.g., coding sequences and promoters [49,80,81]). To test if a particular type of sequence was more or less prone to appearing in introgressed regions, we partitioned the genome by sequence type into one of the following seven categories using the P. brasiliensis Pb18 genome annotations (BioProject accession number PRJNA28733): CDS (coding sequence), 59 untranslated region [UTR], 39 UTR, intron, 2-kb upstream inter (intergenic sequence 2 kb upstream of a gene), 10-kb inter (intergenic sequence within 10 kb of a gene excluding 2 kb upstream of a gene), and intergenic (intergenic sequence more than 10 kb from a gene [82]). Introgressions present in more than one individual but with different endpoints among isolates were broken into blocks, and these blocks were treated separately in the permutation test (described immediately below).
We calculated a summary statistic for each of the seven categories using the following definitions: "introgressed percentage" is the percentage of introgressions overlapping a given sequence type that occurred in any of the four possible introgression directions (two different species pairs and two reciprocal directions), "genomic percentage" is the proportion of the genome of any given type of sequence, and "Enrichment" is the ratio of the percentage of a given sequence type that has crossed species boundaries and the percentage of the genome encompassed by the same sequence type. We used a permutation test in which each introgression block was randomly assigned to a new position in the genome to calculate P values. For each permutation assay, we calculated the percentage of the randomly reassigned blocks overlapped with type of sequence. We repeated this procedure 10,000 times and generated a null-distribution for enrichment. If introgressions are more likely to occur within a certain type of sequence than in the rest of the genome, enrichment will be greater than 1. Conversely, if introgressions are less likely to occur in a given type of sequence, enrichment will be less than 1. We determined whether introgressions were significantly enriched for any sequence type (i.e., a significant departure from 1), by comparing the observed enrichment and the distribution of resampled enrichments.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only.

ACKNOWLEDGMENTS
We thank the members of the Matute lab for helpful scientific discussions and comments. The work was extensively improved by the comments of two anonymous reviewers.