Spatially variable coevolution between a haemosporidian parasite and the MHC of a widely distributed passerine

The environment shapes host–parasite interactions, but how environmental variation affects the diversity and composition of parasite-defense genes of hosts is unresolved. In vertebrates, the highly variable major histocompatibility complex (MHC) gene family plays an essential role in the adaptive immune system by recognizing pathogen infection and initiating the cellular immune response. Investigating MHC-parasite associations across heterogeneous landscapes may elucidate the role of spatially fluctuating selection in the maintenance of high levels of genetic variation at the MHC. We studied patterns of association between an avian haemosporidian blood parasite and the MHC of rufous-collared sparrows (Zonotrichia capensis) that inhabit environments with widely varying haemosporidian infection prevalence in the Peruvian Andes. MHC diversity peaked in populations with high infection prevalence, although intra-individual MHC diversity was not associated with infection status. MHC nucleotide and protein sequences associated with infection absence tended to be rare, consistent with negative frequency-dependent selection. We found an MHC variant associated with a ∽26% decrease in infection probability at middle elevations (1501–3100 m) where prevalence was highest. Several other variants were associated with a significant increase in infection probability in low haemosporidian prevalence environments, which can be interpreted as susceptibility or quantitative resistance. Our study highlights important challenges in understanding MHC evolution in natural systems, but may point to a role of negative frequency-dependent selection and fluctuating spatial selection in the evolution of Z. capensisMHC.

Abstract The major histocompatibility complex (MHC) is a highly variable family of genes involved in parasite recognition and the initiation of adaptive immune system responses. Variation in MHC loci is maintained primarily through parasite-mediated selection or disassortative mate choice. To characterize MHC diversity of rufous-collared sparrows (Zonotrichia capensis), an abundant South American passerine, we examined allelic and nucleotide variation in MHC class I exon 3 using pyrosequencing. Exon 3 comprises a substantial portion of the peptide-binding region (PBR) of class I MHC and thus plays an important role in intracellular pathogen defense. We identified 98 putatively functional alleles that produce 56 unique protein sequences across at least 6 paralogous loci. Allelic diversity per individual and exonwide nucleotide diversity were relatively low; however, we found specific amino acid positions with high nucleotide diversity and signatures of positive selection (elevated d N /d S ) that may correspond to the PBR. Based on the variation in physicochemical properties of amino acids at these "positively selected sites," we identified ten functional MHC supertypes.
Spatial variation in nucleotide diversity and the number of MHC alleles, proteins, and supertypes per individual suggests that environmental heterogeneity may affect patterns of MHC diversity. Furthermore, populations with high MHC diversity have higher prevalence of avian malaria, consistent with parasite-mediated selection on MHC. Together, these results provide a framework for subsequent investigations of selective agents acting on MHC in Z. capensis.

Background
The major histocompatibility complex (MHC) has received considerable attention from the fields of ecology, conservation, and evolutionary biology in recent years because of its relevance to disease resistance and its high level of genetic variability within and between species (Spurgin and Richardson 2010). The MHC is an essential component of the vertebrate adaptive immune system that encodes intracellular (MHC class I or MHC-I) and extracellular (MHC class II or MHC-II) peptide-binding proteins. MHC peptide-binding regions (PBRs) bind a specific range of foreign pathogenderived peptides or self-derived peptides and present them to cytotoxic T cells to initiate the adaptive immune system responses (Matsumura et al. 1992). The binding specificity between MHC proteins and pathogen peptides provides an important mechanism of disease resistance in hosts (Hammer et al. 1995;Wucherpfennig et al. 1995). For example, MHC genotypes in many species are associated with resistance or susceptibility to diseases and parasites such as malaria (Hill 1998;Westerdahl et al. 2005;Bonneaud et al. 2006;Westerdahl et al. 2012), Rous sarcoma (LePage et al. 2000; Electronic supplementary material The online version of this article (doi:10.1007/s00251-014-0800-7) contains supplementary material, which is available to authorized users. Wallny et al. 2006), and Marek's disease (Cole 1968;Wakenell et al. 1996;Wang et al. 2014). Understanding population level MHC diversity and variation may therefore enhance our knowledge of disease ecology and host-parasite evolutionary dynamics.
MHC is among the most variable gene families in vertebrates and has contributed to our understanding of evolutionary mechanisms maintaining genetic variation in populations (Spurgin and Richardson 2010). For example, the human MHC (or human leukocyte antigen (HLA)) gene family comprises over 400 loci distributed across approximately 7.6 Mb of the human genome (Horton et al. 2004). Since estimates of HLA mutation rates appear relatively low for primate genes, disease-related selection pressures or mate choice patterns likely drive the levels of standing MHC variation (Klein et al. 1993;Edwards and Hedrick 1998). Several nonexclusive mechanisms of parasite-mediated selection have been proposed to explain high MHC diversity in vertebrates, including heterozygote advantage, negative frequency-dependent selection, and fluctuating selection (Spurgin and Richardson 2010). Under the heterozygote advantage hypothesis, individuals with a high diversity of MHC alleles are able to bind a wider range of pathogen-derived peptides and are less likely to harbor infections from a wide array of parasites (Hughes and Nei 1988;Penn et al. 2002;Wegner et al. 2003;Kloch et al. 2010). The negative frequency-dependent selection hypothesis suggests that rare MHC alleles are more likely to confer resistance to specific parasites than more common alleles (Takahata and Nei 1990). Finally, the fluctuating selection hypothesis proposes that MHC variation arises as a result of spatial or temporal heterogeneity in the strength or specificity of directional parasite-mediated selection on MHC (Hedrick 2002;Loiseau et al. 2010). MHC diversity may also be affected by other historical factors including population bottlenecks followed by genetic drift (Babik et al. 2009a;Sutton et al. 2011;Sutton et al. 2013), gene flow (Hansen et al. 2007;Nadachowska-Brzyska et al. 2012), and sexual selection (Bernatchez and Landry 2003). MHC genotypes are correlated with the quality of sexually selected traits in several species, suggesting that selection on MHC may follow the "good genes" hypothesis of sexual selection to enhance parasite resistance of offspring (von-Schantz et al. 1996;Ditchkoff et al. 2001;Dunn et al. 2013). Individuals may choose mates with high MHC variation or dissimilar MHC genotypes (disassortative mating) to maximize allelic diversity in offspring or to avoid inbreeding (Reusch et al. 2001).
Across families of birds, the composition and variability of MHC genes are highly diverse. A relatively simplistic MHC is believed to be the ancestral MHC state in birds. For example, chickens possess a "minimal essential MHC," named for its highly compact and minimalist gene composition, and are basal in avian phylogenies (Kaufman et al. 1999;Hackett et al. 2008). Chicken MHC consists of only 46 tightly linked genes across a 242-kb region on chromosome 16 (Kaufman et al. 1999;Shiina et al. 2007;Delany et al. 2009). Other nonpasserines, for example parrots (Hughes et al. 2008) and some birds of prey (Alcaide et al. 2007), also possess a simplistic, minimal essential MHC, while some basal bird species (e.g., Coturnix japonica, Apteryx owenii) possess a relatively more complex MHC that is composed of multiple functional paralogs and pseudogenes, complicating inferences of the ancestral MHC state in birds (Shiina et al. 2004;Miller et al. 2011). The simplistic nature of the minimal essential MHC and reduced recombination at these loci is expected to enhance long-term coevolution between genes and affect the evolution and functionality of coadapted gene complexes (Kaufman et al. 1999).
By contrast, passerines have a more complex MHC composed of multiple functional paralogs and pseudogenes (Westerdahl 2007;Sepil et al. 2012). The Zebra Finch MHC spans a much longer distance (∼739-kb region) across at least two chromosomes (Balakrishnan et al. 2010;Ekblom et al. 2011). Like other complex gene families, variation in MHC gene copy number is believed to arise through repeated gene duplication events with subsequent selection on or degradation of paralogous loci.
Characterizing genetic variation at the MHC is an important task for understanding disease resistance as well as elucidating evolutionary mechanisms responsible for maintaining genetic variation in populations. Accurate genotyping of MHC is a challenge (Westerdahl 2007;Sepil et al. 2012), and recently, next-generation 454 pyrosequencing has been applied to characterize the MHC in fish (Ellison et al. 2012;Lamaze et al. 2014), nonavian reptile (Stiebens et al. 2013), bird (Zagalska-Neubauer et al. 2010Spurgin et al. 2011;Sepil et al. 2012;Strandh et al. 2012;Dunn et al. 2013), and mammal species (Babik et al. 2009b;Wiseman et al. 2009;Kloch et al. 2010;Babik et al. 2012). Pyrosequencing of MHC amplicons provides deep coverage of alleles across multiple loci and is a powerful method for detecting rare variants in multilocus systems (Promerová et al. 2012). This method comes at the cost of not being able to assign alleles to specific loci since all MHC paralogs are sequenced in parallel. However, alleles across multiple loci can be functionally grouped into MHC "supertypes," which share physicochemical or peptide-binding characteristics (Doytchinova and Flower 2005;Ellison et al. 2012;Sepil et al. 2012). MHC supertypes may more accurately reflect the units of selection on MHC since different MHC alleles may not differ at functionally important sites. Thus, identification of MHC supertypes can facilitate inferences of selection on MHC diversity and its relationship to disease (Doytchinova and Flower 2005;Sepil et al. 2012).
We investigated variation at MHC-I exon 3 of Zonotrichia capensis (rufous-collared sparrow), a widely distributed (from 0 to roughly 5,000 m a.s.l.) and abundant Neotropical passerine using 454 pyrosequencing. MHC-I molecules are found on nearly all somatic cells of birds and bind peptides derived from intracellular pathogens, such as malaria and viruses (Bonneaud et al. 2006;Wallny et al. 2006;Westerdahl 2007). Avian MHC-I molecules are heterodimers composed of an α chain and β 2 -macroglobulin (Westerdahl et al. 1999). The majority of polymorphism at avian class I MHC resides in exon 2 and, in particular, exon 3 (α 1 and α 2 domains), which encode the PBR (Bjorkman et al. 1987;Alcaide et al. 2009;Alcaide et al. 2013). We focused our efforts on MHC-I exon 3 because it is generally more polymorphic than exon 2 and some evidence suggests that it plays a larger role in disease resistance (Sepil et al. 2012;Wang et al. 2014). Our goals were to (1) characterize allelic and nucleotide MHC variation within Z. capensis distributed along an elevational gradient on the western slope of the Peruvian Andes, (2) investigate signatures of selection on MHC, and (3) identify functionally related MHC alleles (MHC supertypes) based on amino acid variation at positively selected sites (PSSs).

Sample collection
We obtained pectoral muscle tissue samples from 184 Z. capensis specimens collected along three elevational transects (T1-T3) on the western slope of the Peruvian Andes from 2004 to 2007 (Fig. 1). Individuals were either shot or collected in mist nests and euthanized using methods approved by the Institutional Animal Care and Use Committee at Louisiana State University (IACUC protocol number LSU #06-124). Total genomic DNA was extracted from flashfrozen pectoral muscle using DNeasy tissue extraction kits (Qiagen, Valencia, CA).
Although little is known about local movement patterns among Z. capensis populations, on the western slope of the Peruvian Andes, Z. capensis populations are nonmigratory and abundant (Schulenberg et al. 2007;Cheviron and Brumfield 2009). At low elevations, Z. capensis is patchily distributed and restricted to desert oases and irrigated farmlands (Schulenberg et al. 2007;Cheviron and Brumfield 2009). Along elevational transects, mean elevation gain was ∼3,900 m over a mean linear distance of ∼161 km. We performed analyses on Z. capensis individuals classified by elevational zones across all transects (low elevation 0-1,500 m, middle elevation 1,501-3,100 m, high elevation 3,101-4,150 m) and by transect (T1, T2, and T3; Fig. 1), used as a proxy for latitude.
Reads were aligned and sorted by individual using unique MID tags after removing reads with ambiguous base pair assignments or incomplete primer or tag sequences using jMHC (Stuglik et al. 2011). Unique variants were then identified as well as the number of reads of each variant per individual (jMHC; Stuglik et al. 2011). We used a five-step variant validation procedure to attempt to exclude sequencing error artifacts and pseudo-alleles from the MHC dataset. We focused our study on variation at putatively functional MHC alleles because of their potential relevance in disease resistance. Here, we define "putatively functional" alleles as those with an open reading frame (Galan et al. 2010;Zagalska-Neubauer et al. 2010;Sepil et al. 2012). First, we removed all reads that were not 214 bp in length or that contained stop codons within the sequence. We focused on 214-bp reads because the majority of our reads (61.4 %) were this length and we wanted to remove pseudoalleles, which are often characterized by indels (Ophir and Graur 1997). However, we cannot be certain that we removed a fraction of functional alleles not 214 bp in length or that we amplified 214-bp nonfunctional alleles. Second, we removed individuals that had fewer than 200 total reads, which are unlikely to have been accurately genotyped (Galan et al. 2010;Sepil et al. 2012). The probability of accurately genotyping all alleles in an individual with more than 200 reads, a ploidy level of 12 (6 MHC-I loci, see "Results"), and a minimum number of reads per allele of 2 is very high (P>0.999; Galan et al. 2010). Hence, we have chosen a very conservative cutoff value for individual genotyping. Third, sequences with a maximum per allele frequency (MPAF) less than 0.01 were removed from the dataset (Zagalska-Neubauer et al. 2010;Sepil et al. 2012). These are alleles that comprised less than 1 % of the total number of reads within an individual and likely occur due to sequencing error. Fourth, we removed all sequences that occurred fewer than five times in the entire dataset and only once within an individual. Assuming a mean substitution error rate of 1.07 % (Gilles et al. 2011), the probability of observing the same sequencing error five times in our dataset is very low (P<10 −9 ; Galan et al. 2010). Finally, we removed variants that appeared in only a single individual in the dataset. These are variants that may have reached a high frequency in a single individual as a result of PCR duplication of single nucleotide misincorporation errors or chimeras (Sepil et al. 2012). With the final dataset, we tested for correlation between individual coverage and number of alleles using ordinary least squares linear regression in R (R Development Core Team 2013). To examine differences in the number of MHC alleles per individual by transect and elevational zone, we performed a twoway ANOVA with transect and elevational zone as fixed factors in R (R Development Core Team 2013). For pairwise comparisons of individual allelic diversity between transects and elevational zones, we used a Student's t test.

MHC phylogeny construction
We inferred a phylogeny of putatively functional MHC alleles using MrBayes v.3.2.1 (Ronquist and Huelsenbeck 2003) with Carpodacus erythrinus MHC-I exon 3 (Caer_101; Promerová et al. 2012) as an outgroup (Fig. 2). We determined an appropriate model of nucleotide substitution using FINDMODEL (Tao et al. 2005). Likelihood scores were calculated from a Weighbor tree, and substitution models were assessed based on the Akaike Information Criterion score (Tao et al. 2005). A generalized time-reversible (GTR)+Γ was found to be the best-fit nucleotide substitution model. A GTR+Γ model uses six substitution rate parameters and four equilibrium base frequency parameters and a shape parameter (Γ) for the gamma-distributed site rate heterogeneity. To converge on the posterior probability distribution for the phylogeny, we ran 50 million Markov Chain Monte Carlo (MCMC) generations using default priors, sampling trees every 10,000 generations. We ran four independent runs with a random initial tree and four chains (one cold, three heated) and calculated a majority-rule consensus tree using the last 10 % of sampled trees from each of the four runs. MHC gene topologies are difficult to resolve due to their rapid rate of molecular evolution and duplication (Hughes and Nei 1989;Takahashi et al. 2000). To independently evaluate support for low support nodes (<75 %) on our final consensus tree, we calculated majority-rule consensus trees using the last 10 % of sampled trees for each replicate. We then checked if the major splits (those involving four or more alleles) and node support values were similar between each replicate tree and final consensus tree.

Tests of historical selection
To examine signals of historical selection on paralagous MHC-I exon 3 loci, we calculated the ratio of nonsynonymous (d N ) to synonymous mutations (d S ) in the CodeML module implemented in the software PAML (Yang 2007). Ratios of nonsynonymous to synonymous substitution rates significantly >1 are indicative of positive selection, whereas ratios significantly <1 are indicative of negative selection. Importantly, while selection operates at the allele level (or groups of alleles), individual sites may contribute disproportionately to the overall fitness of an allele and thus show strong signatures of positive selection. At the MHC, we expect sites corresponding to the PBR to contribute disproportionally to allele fitness and show signatures of stronger positive selection relative to other sites because they are directly involved in pathogen peptide-binding (Hughes and Nei 1988). Therefore, we used a "sites" model, which allows d N /d S to vary across amino acid positions. We examined two hypotheses of codon evolution: (1) a nearly neutral codon substitution model (M1), which allows sites to have d N /d S ≤1 and (2) a positive selection codon substitution model (M2), which allows sites to have d N /d S ≤1 or d N /d S >1. To examine the relative support for each model, we used a likelihood ratio and an empirical Bayes procedure to identify PSSs,  Fig. 2 MrBayes majority-rule consensus tree depicting two major clades of MHC alleles. Colors of allele names correspond to their supertype cluster (black = 1, dark red = 2, green = 3, blue = 4, turquoise = 5, red = 6, gold = 7, gray = 8, purple = 9, orange = 10). The red and blue bars represent the alleles within each clade (see text). Posterior support value for each node is 1.0 unless otherwise noted. Nodes A and B were collapsed in a consensus tree produced from one independent run, while nodes C and D were collapsed in a separate run (color figure online) which are defined as sites bearing a significantly elevated d N /d S ratio. Tests of selection based on d N /d S are theoretically problematic for MHC datasets because they do not account for uncertainty in the tree topology, which is difficult to resolve for paralogous MHC loci. High recombination rates can lead to multiple tree topologies for population sequences, which in turn can increase false positives in a d N /d S outlier test (Anisimova et al. 2003;Wilson and McVean 2006). Because our sequencing strategy produced unphased MHC genotypes and an unknown number of MHC loci, we are unable to formally account for recombination in our dataset. The empirical Bayes approach that we applied here has been shown to have high power and accuracy for detecting true PSSs (M2 model) in the presence of high rates of recombination (Anisimova et al. 2003). Additionally, detecting signatures of selection based on d N /d S tests can be relatively robust to alternative topologies (Stager et al. 2014). The results of d N /d S analyses applied to within species datasets should also be interpreted cautiously since this analysis assumes fixed differences between populations rather than segregating polymorphisms within populations (Kryazhimskiy and Plotkin 2008). However, we are comparing MHC paralogs with divergences that are much older than the population level splits and that likely possess fixed differences. Furthermore, the "randomsites" model of codon evolution, similar to the sites model we used, has been demonstrated to detect signatures of strong positive selection in the PBR of human population MHC datasets (Yang and Swanson 2002).
To classify the "putative PBR" in Z. capensis, we aligned Z. capensis MHC-I exon 3 to Gallus gallus and assumed conservation of codon positions comprising the PBR in G. gallus (Fig. 3). We calculated nucleotide diversity and proportion of segregating sites across the entire exon, within the PSSs and putative PBR, and outside of the PSSs and putative PBR using pegas (Paradis 2010) in R (R Development Core Team 2013). Nucleotide diversity was calculated as the sum of the number of pairwise differences between alleles divided by the number of comparisons (Paradis 2010). To examine spatial differences in nucleotide diversity, we performed a one-way ANOVA with transect and elevational zone as fixed factors. We used a Student's t test to examine pairwise differences in nucleotide diversity across different codon classes (whole exon, PBR and PSS, and non-PBR and non-PSS), MHC clades, and spatial zones.

MHC supertypes
Physicochemical variation at the PBR and PSSs is expected to provide the phenotypic targets of natural selection on MHC alleles (Bernatchez and Landry 2003). We characterized the physicochemical properties of PSSs as proposed by Doytchinova and Flower (2005). We aligned amino acid sequences from PSSs, characterized those amino acids based on five z-descriptor variables-hydrophobicity (z1), steric bulk (z2), polarity (z3), and electronic effects (z4 and z5)and translated them into a matrix (Ellison et al. 2012;Sepil et al. 2012). To identify MHC "supertype" clusters and describe those clusters, we performed a K-means clustering algorithm and a principal components analysis using adegenet (Jombart 2008;Jombart et al. 2010) in R (R Development Core Team 2013). The number of supertype clusters was determined using a Bayesian Information Criterion score after retaining five principal components.

Population and individual level variation
Initially, we obtained a total of 616,150 reads and 5,364 unique variants. Seven individuals with fewer than 200 reads were removed from subsequent analyses leaving a total of 177 individuals. After removing all reads with ambiguous base pair assignments, incomplete primer or tag sequences, or reads not 214 bp in length, our dataset was reduced to 360,156 reads (58.5 % of original reads). Our final dataset consisted of 98 unique MHC alleles (GenBank KF433977-KF434074), which comprise 56 unique MHC protein sequences. Mean coverage across MHC amplicons per individual was 2,035± 787 (range 203-5,072 reads). We did not find a significant correlation between the number of MHC alleles per individual and the number of reads per individual (P=0.864, R 2 =0.002, Online Resource 2). Across all alleles, 31.8 % of nucleotides were polymorphic (68/214), 49.3 % of amino acid positions were polymorphic (35/71), and nucleotide diversity was 0.085±0.002 (Table 1). Mean pairwise sequence divergence among MHC alleles was 8.52 % and ranged from 0.47 to 16.36 %. The mean number of MHC alleles per individual was 4.36±1.75 corresponding to a mean of 3.56±1.42 MHC proteins per individual. The number of MHC alleles per individual varied from 1 to 11, which yields a minimum of 6 MHC-I loci. Most alleles were rare (<5 %), and only one allele (ZocaU*1) was found in over half of the individuals (Online Resource 3).
We identified ten MHC supertype clusters based on physicochemical variation at PSSs (Fig. 4, Online Resource 4). Individuals possessed a mean of 3.21±1.11 MHC supertypes. The first three principal components explained 92.4 % of variation in MHC supertypes (standard deviation=2.03). The first two principal components explained 83.8 % of variation and were plotted together (Fig. 4) to summarize differences among clusters. PC1 was strongly associated with variation in electronic effects (z4) at codon 61 (r=0.862), and PC2 was most strongly correlated with amino acid variation (z1-z5) at codon 18 (r=0.761).
The number of MHC alleles, proteins, and supertypes per individual and nucleotide diversity varied significantly by elevational zone (P allele =0.017, P protein =0.001, P supertype = 0.003, P π <0.0001). In particular, middle elevation individuals possessed significantly more MHC alleles, proteins, and supertypes than low elevation individuals (P allele =0.002, P protein <0.001, P supertype =0.005) and high elevation individuals (P allele =0.007, P protein <0.001, P supertype =0.001, Table 2). Nucleotide diversity was also significantly higher in middle elevation populations relative to low (P<0.001, Table 2) or high elevation (P=0.004, Table 2). After controlling for the effect of elevation, MHC allele and supertype diversity per individual did not significantly vary by transect (P allele =0.228,   Fig. 3 The d N /d S ratio along MHC-I exon 3 in Zonotrichia capensis. Four significant peaks in d N /d S (indicated by asterisks) correspond to codon positions 18, 53, 61, and 70. Sites marked by a "+" indicate positively selected sites in>50 % of passerines examined (Alcaide et al. 2013). Sites marked by a "g" correspond to the peptide-binding region of Gallus gallus MHC (Wallny et al. 2006;Sepil et al. 2012). White horizontal bars represent polymorphic codons, while black horizontal bars represent invariant codons P supertype =0.074). The number of MHC proteins per individual was significantly different across transects (P protein < 0.031) with elevated diversity at the low-latitude transect (T1). Nucleotide diversity was also significantly different across different transects (P π <0.001) with substantially decreased nucleotide diversity in the middle-latitude transect (T2) relative to the low-and high-latitude transects. Across the elevational transects, 19 MHC alleles were only present in T1, seven were found exclusively on T2, and nine were found on T3. Across elevational zones, we found five MHC alleles only at low elevation, eight exclusively found in middle elevations, and 17 found only in high elevations.

MHC Phylogeny
We identified two clades of MHC sequences in the final tree and in all four replicate runs (Fig. 2), although most MHC alleles within these groups did not fall into distinct, wellresolved clusters. Consensus trees from two of the four independent runs produced identical topologies to Fig. 2 with similar nodal support, while the other two consensus trees were nearly identical, however with nodes A and B collapsed in one run and nodes C and D collapsed in the other (Fig. 2). For all other nodes, support values were similar to those in Fig. 2. Nucleotide diversity was significantly higher in clade 1 (π=0.052±0.001) compared to clade 2 (π=0.049±0.001, P<0.001).

Patterns of MHC variation
Variation at the MHC has been linked to important fitnessrelated traits, including disease resistance and secondary  (Ditchkoff et al. 2001;Bonneaud et al. 2006;Loiseau et al 2010;Dunn et al. 2013). In this study, we found substantial allelic diversity at a functionally important MHC-I exon in Z. capensis. We identified 98 putatively functional MHC-I exon 3 alleles distributed across a minimum of 6 loci. Nucleotide diversity at the PSSs and putative PBR of Z. capensis MHC is substantially higher than nucleotide diversity at the putative PBR in several species, such as Cyanistes caeruleus (π PBR =0.14, Schut et al. 2011) and Halobaena caerulea (π PBR =0.14, Strandh et al. 2011), but similar to the low levels of exon-wide MHC-I nucleotide diversity reported in these species (π exon =0.06). Furthermore, most amino acids in Z. capensis MHC-I exon 3 are invariant (50.7 % of codons) or show d N /d S <1 (54.1 % of codons, Table 3). Thus, low nucleotide and amino acid variation in Z. capensis may reflect pervasive negative selection, which plays an important role in the evolution of MHC in some species (Yang and Swanson 2002). Historical population bottlenecks are capable of producing similar signatures of reduced MHC variation (Babik et al. 2009a;Sutton et al. 2011;Sutton et al. 2013). In New Zealand saddlebacks, the effects of a severe bottleneck event are manifested in decreased overall allelic diversity and allelic diversity per individual rather than nucleotide diversity (Sutton et al. 2013). Allele surfing following a population expansion may also create similar genetic signatures as population bottlenecks (e.g., low genetic variation; Excoffier et al. 2009). A recent phylogeographic study of Z. capensis revealed signatures of population expansion in Peruvian lineages following the Pleistocene glaciation (Lougheed et al. 2013), though the patterns were weak and locus specific. Alternatively, low MHC variation in nonmigratory populations, such as the Z. capensis populations in this study, may reflect adaptation to a simple parasite community (Møller and Erritzøe 1998). A screening of avian malaria in these Z. capensis populations revealed a single Haemoproteus lineage contributed to ∼94 % of malaria infections (Jones et al. 2013), suggesting a relatively simple malaria parasite community associated with Z. capensis. The factors driving low exon-wide nucleotide variation (Fig. 3, Table 1) and per individual allele variation are unclear. However, the high nucleotide variation at the PSSs and inferred PBR in spite of low exon-wide variation is consistent with strong balancing selection on the MHC driven by variation at these amino acid positions.
Although we do not know the transcription levels of the putatively functional alleles, given that the majority of reads were 214 bp in length (61.4 %) and had an open reading frame suggests that they may be functional. The high proportion of initial reads that were excluded from analyses by our variant verification procedure (42.5 % of total) likely comprises pseudoalleles or PCR or sequencing errors. Other MHC studies using pyrosequencing discarded a similarly high proportion of reads presumed to represent either pseudoalleles or errors (Galan et al. 2010;Zagalska-Neubauer et al. 2010;Spurgin et al. 2011;Sepil et al. 2012). The high level of potential pseudogenes and the lack of structure on our MHC phylogeny are consistent with extensive gene duplication and pseudogenization in MHC.

Selection on MHC
We found strong signatures of historical positive selection at specific amino acid positions (PSSs) on MHC-I exon 3 (Fig. 3). One PSS (codon 53) corresponded to a peptidebinding site on the G. gallus MHC, which is 80-100 million years diverged from Z. capensis (Shetty et al. 1999), while the remaining three PSSs did not correspond to the G. gallus PBR. As a result of the substantial sequence divergence between passerines and chickens and their differences in MHC structure (Balakrishnan et al. 2010), the Z. capensis PBR is likely not perfectly homologous with the G. gallus PBR. However, it is possible that not all PSSs in Z. capensis correspond to the PBR. Some PSSs may correspond to non-PBR sites that serve important subsidiary roles, function as epistatic compensatory mutations as a result of changes in the PBR, or result from the stochastic mutational process. Furthermore, variation in MHC-I exon 3 will not encompass all functionally important variation in MHC-I because we did not sequence exon 2, which also comprises a portion of the PBR, or regulatory sequences. Nonetheless, the elevated nucleotide diversity and proportion of segregating sites (Table 1, Fig. 3) within the putative PBR suggest that these amino acids may be also functionally important. We identified ten MHC supertypes that share similar physicochemical properties at PSSs. Supertype clusters differed primarily by the physicochemical properties at codon positions 18 and 61, according to a principal components analysis (Fig. 4). MHC supertypes do not form distinct phylogenetic clusters (Fig. 2), which reflect either (i) repeated convergent evolution of MHC supertypes, (ii) that MHC supertypes we identified do not represent functionally important groups, or (iii) that we lack the appropriate phylogenetic resolution to observe distinct patterns.

Maintenance of diversity
Our data are consistent with historical balancing selection on Z. capensis MHC. The method of MHC sequencing we employed does not yield allele or genotype frequencies, so we cannot use traditional population genetic approaches to further investigate evidence of current selection. An approach comparing the geographic distribution of MHC variation to neutral variation would help elucidate whether selection on MHC is ongoing. Identifying associations between MHC alleles and parasite infection status or mate choice would further help to reveal the nature of selection on MHC.
Individuals sampled from intermediate elevations possess significantly more MHC alleles, proteins, and supertypes compared to those from high or low elevation populations ( Table 2). We also observed significantly higher nucleotide diversity in middle elevation populations relative to low and high elevation populations. Thus, middle elevation populations may experience stronger balancing selection on MHC relative to low and high populations. MHC allelic and supertype diversity per individual did not significantly vary by transect after controlling for the effects of elevation, although MHC protein diversity per individual and nucleotide diversity were significantly affected by transect. Taken together, environmental heterogeneity seems to play a role in mediating selection dynamics on MHC. Differences in parasite composition, abundance, or virulence between elevational zones and transects may contribute to spatial variation in parasite-mediated selection and MHC composition. Indeed, Jones et al. (2013) found higher prevalence of avian malaria within Z. capensis at middle elevations and low latitudes. The patterns of malaria prevalence and MHC variation are consistent with intensified parasite-mediated selection in these environments, yet the precise selective mechanisms are unclear and warrant further investigation.

Conclusions
Our study reveals a high level of allelic variation in MHC-I exon 3 of a common South American passerine. Although we find low levels of allelic diversity per individual and exonwide nucleotide diversity, we find extraordinarily high levels of nucleotide diversity within regions of the exon showing signatures of positive selection and those corresponding to the putative PBR. Variation in physicochemical properties of amino acids at the PSSs revealed ten functional MHC supertypes that do not cluster together on the MHC allele phylogeny, suggestive of either recurrent parallel evolution of these diverse supertypes or low phylogenetic resolution. Geographic variation in nucleotide diversity and the number of MHC alleles, proteins, and supertypes per individual is consistent with varying environmental selective pressures, perhaps related to malarial parasites. This study lays the groundwork for future research into the specific selective agents and mechanisms acting on MHC and provides important insight into the evolution of the MHC in a wild Neotropical passerine.