Genome-Wide Survey of SNP Variation Uncovers the Genetic Structure of Cattle Breeds

A survey of genetic diversity of cattle suggests two domestication events in Asia and selection by husbandry.

Not Just Dinner on Legs

Several thousand years ago, human beings realized the virtues of domesticating wild animals as easy meat. Soon other possibilities became apparent, and as revealed in a series of papers in this issue, early pastoralists became selective about breeding for wool, leather, milk, and muscle power. In two papers, Gibbs et al. report on the bovine genome sequence (p. 522; see the cover, the Perspective by Lewin, and the Policy Forum by Roberts) and trace the diversity and genetic history of cattle (p. 528), while Chessa et al. (p. 532) survey the occurrence of endogenous retroviruses in sheep and map their distribution to historical waves of human selection and dispersal across Europe. Finally, Ludwig et al. (p. 485) note the origins of variation in the coat color of horses and suggest that it is most likely to have been selected for by humans in need of good-looking transport.

The imprints of domestication and breed development on the genomes of livestock likely differ from those of companion animals. A deep draft sequence assembly of shotgun reads from a single Hereford female and comparative sequences sampled from six additional breeds were used to develop probes to interrogate 37,470 single-nucleotide polymorphisms (SNPs) in 497 cattle from 19 geographically and biologically diverse breeds. These data show that cattle have undergone a rapid recent decrease in effective population size from a very large ancestral population, possibly due to bottlenecks associated with domestication, selection, and breed formation. Domestication and artificial selection appear to have left detectable signatures of selection within the cattle genome, yet the current levels of diversity within breeds are at least as great as those within humans.

The emergence of modern civilization was accompanied by adaptation, assimilation, and interbreeding of captive animals. In cattle (Bos taurus), this resulted in the development of individual breeds differing in, for example, milk yield, meat quality, draft ability, and tolerance or resistance to disease and pests. However, despite mapping and diversity studies (1)(2)(3)(4)(5) and the identification of mutations affecting some quantitative phenotypes (6)(7)(8), the detailed genetic structure and history of cattle are not known.
Cattle occur as two major geographic types, the taurine (humpless; European, African, and Asian) and indicine (humped; South Asian and East African), which diverged >250 thousand years ago (Kya) (3). We sampled individuals representing 14 taurine (n = 376), three indicine (n = 73) (table S1), and two hybrid breeds (n = 48), as well as two individuals each of Bubalus quarlesi and Bubalus bubalis, which diverged from Bos taurus ~1.25 to 2.0 Mya (9,10). All breeds except Red Angus (n = 12) were represented by at least 24 individuals. We preferred individuals that were unrelated for ≥4 generations; however, each breed had one or two sire, dam, and progeny trios to allow assessment of genotype quality.
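The use of sire, dam, and progeny trios to assess genotype quality can be illustrated with a Mendelian-consistency check. The following is a minimal sketch only, not the authors' pipeline: it assumes biallelic SNPs coded as 0, 1, or 2 copies of one allele, with None for a missing call, and the function names and example genotypes are hypothetical.

```python
def gametes(genotype):
    """Alleles (0 or 1) that a parent with this 0/1/2 genotype can transmit."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_consistent(sire, dam, progeny):
    """True if the progeny genotype can be formed from one allele per parent."""
    return any(s + d == progeny for s in gametes(sire) for d in gametes(dam))


def mendelian_error_rate(sire_calls, dam_calls, progeny_calls):
    """Fraction of fully called SNPs at which a trio violates Mendelian inheritance."""
    trios = [
        (s, d, p)
        for s, d, p in zip(sire_calls, dam_calls, progeny_calls)
        if None not in (s, d, p)  # skip SNPs with any missing call
    ]
    if not trios:
        raise ValueError("no SNP has calls for all three animals")
    errors = sum(not is_mendelian_consistent(s, d, p) for s, d, p in trios)
    return errors / len(trios)


# Hypothetical trio genotyped at five SNPs; only the last SNP is inconsistent
# (a 0 x 1 mating cannot produce a progeny with genotype 2).
sire = [0, 2, 1, 2, 0]
dam = [0, 0, 1, 2, 1]
progeny = [0, 1, 2, 2, 2]
print(mendelian_error_rate(sire, dam, progeny))
```

A high error rate in a known trio points to genotyping problems or a mis-recorded pedigree; the same test applied to candidate parents is the basis of exclusion-based parentage assignment.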
Single-nucleotide polymorphisms (SNPs) that were polymorphic in many populations were primarily derived by comparing whole-genome sequence reads representing five taurine and one indicine breed to the reference genome assembly obtained from a Hereford cow (10) (table S2). This led to the ascertainment of SNPs with high minor allele frequencies (MAFs) within the discovery breeds (table S5). Thus, as expected, with trio progeny removed, SNPs discovered within the taurine breeds had higher average MAFs (table S4). The proportions of SNPs in intergenic, intronic, and exonic regions were 63.74, 34.9, and 1.35%, respectively, similar to their representation within the genome. We found that as few as 50 SNPs were necessary for parentage assignment and proof of identity (table S9). Additionally, when we compared ancestries based on pedigree and allele-sharing between individuals, we were able to predict accurately the extent of ancestry when the pedigree was not known (fig. S24), which could be a useful tool for the management of endangered bovine populations.
*The full list of authors with their contributions and affiliations is included at the end of the manuscript.
To examine relatedness among breeds, we analyzed SNP genotype frequencies with InSTRUCT (11) and performed principal component analysis (PCA) using Eigenstrat (12) (Fig. 1 and fig. S27). Varying the number of presumed ancestral populations (K) within InSTRUCT revealed clusters consistent with the known history of cattle breeds (Fig. 1A). The first level of clustering (K = 2) reflects the primary, predomestication division of taurine from indicine cattle. Consequently, breeds derived from indicine and taurine crosses (Beefmaster, Santa Gertrudis, and Sheko) show signatures of admixture with both approaches. At K = 3, the African breeds N'Dama and Sheko separate from the European breeds, a division that reflects an early, possibly predomestication, divergence. PCA recapitulated these findings (Fig. 1B). At higher levels of K, we observed clusters that identify single breeds as closed endogamous breeding units. For example, at K = 9, Jersey, Hereford, Romagnola, and Guernsey each form unique clusters.
Fig. 2. Effective population size in the past estimated from linkage disequilibrium data. Inset graph shows effective population size for European humans over the same period; from (13). Breeds as in Fig. 1.
www.sciencemag.org SCIENCE VOL 324 24 APRIL 2009
If modern breeds arose from bottlenecks from a large ancestral population, we should detect bottleneck signatures within patterns of linkage disequilibrium (LD) and effective population size. We found that the decline of r² with genetic distance varied among breeds, although the decline was generally rapid (fig. S10). The extent of LD in cattle is greater than in humans (13) but less than in dogs (14). The Jersey and Hereford breeds had higher r² than other breeds across the range of distances separating loci. N'Dama had the highest r² values at short distances and the lowest r² at long distances, which suggested that they were derived from a relatively small ancestral population not subjected to very narrow bottlenecks. The indicine breeds had lower r² values at short distances and intermediate r² values at longer distances, which indicated that their ancestral population was much larger than that from which taurine cattle were domesticated (Fig. 2). As the MAFs for utilized SNPs were generally high and the estimates of LD did not require phased chromosomes, these results should be robust. When breeds were combined, the decline in LD was more rapid, which reflected a lack of conserved phase relations across breeds. We characterized the extent of haplotype-sharing among breeds between pairs of adjacent SNPs using the r statistic. A high correlation of r values between two breeds indicates that the same haplotypes tend to persist within both breeds. Correlations between r values for SNPs separated by 10 kb were high among the taurine breeds and among the indicine breeds but were low between these groups (fig. S11). Once SNPs were separated by 100 to 250 kb, we found little haplotype sharing between breeds. Clearly, phase relations dissipated as populations diverged despite the relatively young origin of all breeds. Breeds known to have a recent shared ancestry, notably, Angus and Red Angus; Holstein and Norwegian Red; and Beefmaster and Santa Gertrudis, showed a high correlation among r values for SNPs separated by 100 to 250 kb.
Breeds were expected to differ in effective population size (Ne) on the basis of differences in the decline of r² with genetic distance (13). We estimated Ne at various times in each breed's history by setting average r² values equal to their expectation (15) (Fig. 2 and table S1). Ne has recently declined for all breeds, which reflects bottlenecks associated with domestication, breed formation, and, in some breeds, recent intense selection for milk or beef production. In contrast, human Ne has expanded exponentially over the same period (inset to Fig. 2).
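The idea of equating average r² to its expectation can be sketched with a widely used approximation, E[r²] ≈ 1/(1 + 4·Ne·c), where c is the recombination distance in Morgans, together with the rule of thumb that LD at distance c reflects Ne roughly 1/(2c) generations ago. Both relations are assumptions made here for illustration; the expectation used in (15) may include further terms (for example, a sample-size correction), and the input values below are hypothetical.

```python
def ne_from_r2(mean_r2, c_morgans):
    """Solve E[r2] = 1 / (1 + 4 * Ne * c) for Ne (assumed approximation)."""
    if not 0 < mean_r2 < 1:
        raise ValueError("mean r2 must lie strictly between 0 and 1")
    if c_morgans <= 0:
        raise ValueError("recombination distance must be positive")
    return (1 / mean_r2 - 1) / (4 * c_morgans)


def generations_ago(c_morgans):
    """Approximate age (in generations) of the Ne reflected by LD at distance c."""
    return 1 / (2 * c_morgans)


# Hypothetical example: SNP pairs 0.1 cM apart (c = 0.001 Morgans) with mean r2 = 0.2.
c = 0.001
print(ne_from_r2(0.2, c), generations_ago(c))
```

Under these assumptions, closely spaced SNP pairs inform about Ne in the distant past and widely spaced pairs about the recent past, which is how a single LD decay curve can be turned into a trajectory of Ne over time.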
A smaller Ne suggests lower genetic diversity, which is of concern for species viability. To assess genetic diversity free from SNP ascertainment bias, we used the polymerase chain reaction to amplify and sequence 119 closely spaced fragments from five genomic regions on two chromosomes. Two of these regions were known to harbor quantitative trait loci (QTL). Following the amplification of these regions from 18 Angus, 16 Holstein, and 5 Brahman, the individual segments were Sanger-sequenced to detect SNPs. Of the 1201 discovered SNPs, only 258 were common to taurine and indicine breeds, consistent with their age of divergence. Remarkably, 569 SNPs (47.4%) were unique to Brahman, and 365 SNPs (30.4%) were found only in Angus or Holstein, of which 169 (46.3%) were common to both breeds. This suggests that breeds represent partly overlapping subsamples within the taurine diversity. However, seven times as many taurine animals had to be sequenced to uncover 75.3% as many SNPs as were discovered in indicine animals. Estimates of the unascertained genomic distributions of SNPs by MAFs within taurine and indicine breeds are in fig. S19. Diversities as measured by the population mutation rate (θ) and pairwise nucleotide heterozygosity (π) were also estimated for the 119 fragments and compared between the three breeds (Fig. 3). Angus and Holstein have similar levels of nucleotide diversity measured by both statistics (~1.4 × 10⁻³) and have ~40% more nucleotide variation than is found in human populations (~1.0 × 10⁻³). Brahman variation was even higher, with average estimates of θ and π of 3.35 × 10⁻³ and 2.74 × 10⁻³, respectively. These correspond to densities of 1 SNP every 714 bp for pairs of Angus or Holstein chromosomes and 1 SNP every 285 bp for pairs of Brahman chromosomes. These results demonstrate that genetic diversity in cattle is not low despite the decline in Ne.
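The two diversity statistics can be illustrated on a toy alignment. The sketch below assumes that the population mutation rate is estimated with Watterson's estimator (segregating sites scaled by a harmonic number and by sequence length) and that pairwise nucleotide heterozygosity is the mean number of differences per site between pairs of sequences; the study's exact estimators are not spelled out here, and the sequences and function names are hypothetical.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Number of alignment columns showing more than one allele."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def watterson_theta(seqs):
    """Watterson's estimator per site: S / (a_n * L), a_n = sum_{i=1}^{n-1} 1/i."""
    n, length = len(seqs), len(seqs[0])
    a_n = sum(1 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n / length


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site across all pairs of sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return diffs / len(pairs) / length


# Hypothetical alignment of four chromosomes, 8 bp long, with two variable sites.
alignment = ["ACGTACGT", "ACGTACGA", "ACCTACGT", "ACGTACGT"]
print(watterson_theta(alignment), nucleotide_diversity(alignment))
```

The reciprocal of a per-site diversity gives the expected spacing between differences for a pair of chromosomes; for example, 1 / (1.4 × 10⁻³) is about 714 bp, matching the Angus and Holstein density quoted above.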
The lower genetic diversity within modern taurine cattle could reflect a lower diversity within the predomestication ancestral population, and/or postdomestication effects of stronger bottlenecks at breed formation and stronger selection for docility and productivity. Selection is unlikely to be the primary cause, because the diversity distributions for θ and π were similar for all five sequenced regions, and only one region revealed a signature of selection. On the other hand, Fig. 2 suggests that the predomestication Ne of indicine cattle, which originated in southern Asia, a center of species diversity, was much larger than that of taurine cattle. Finally, the process of breed formation in European taurine cattle involved sequential limited migrations from the center of domestication in west Asia (5). Diversity declines with distance from primary sites of domestication (4), and ancient DNA from domesticated cattle and aurochs in Europe shows that there was essentially no gene flow from the aurochs into domesticated cattle (5). Therefore, the evidence suggests that the current difference in diversity is mainly due to progenitor population diversity and bottleneck effects at, and before, breed formation rather than differences in the intensity of natural or artificial selection postdomestication.
Cattle have been marked by selection during domestication, breed formation, and ongoing selection to enhance performance and productivity. We used three methods to detect genomic selection in cattle: (i) the iHS statistic, which identifies regions of increased local LD (16) suggestive of directional selection; (ii) the FST statistic, a measure of the degree of differentiation between subpopulations (17); and (iii) the composite likelihood ratio test (CLR) (18), which assumes a selective sweep model (10). The iHS method was limited by low SNP density and our inability to completely specify ancestral SNP allele states (10). However, despite these limitations, we found evidence for selective sweeps on chromosomes 2, 6, and 14 (table S8 and fig. S20). We identified selection near MSTN, in which mutations can cause double muscling (6). Similarly, high iHS values were found in the region near ABCG2, in which mutations cause differences in milk yield and composition (8). A peak in iHS values was also identified within a gene-poor region of chromosome 14 adjacent to a region containing genes from KHDRBS3 to TG, associated with intramuscular fat content in beef (19). Calculation of FST across all populations for each SNP detected both balancing and divergent selection (fig. S20). Some of the highest and lowest average FST values were found in genes associated with behavior, the immune system, and feed efficiency (Table 1). Domestication most likely required the selection of smaller and more docile animals that could resist pathogens and adapt to a human-controlled environment (20). One region under selection contains R3HDM1 and is associated with efficient food conversion and intramuscular fat content in some breeds (2). In addition to the R3HDM1 gene (21), this region is also under selection in Europeans, most likely because it contains LCT, mutations of which allow the digestion of lactose in adults (22). These results suggest that mutations in this region may affect energy homeostasis. Furthermore, we detected selection between beef and dairy breeds with both CLR and iHS, represented by a broad, high FST peak across the region, centered on SPOCK1 (Table 1). As several QTL have been mapped to this region, multiple loci could be under divergent selection (1), although this peak does not encompass CAST, which affects meat quality (23).
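A per-SNP measure of differentiation between subpopulations can be sketched from breed allele frequencies. The example below uses the simple heterozygosity-based form, (HT − HS)/HT, with equal weight per breed; this is an assumption for illustration, since the estimator in (17) may apply sample-size corrections, and the frequencies shown are hypothetical.

```python
def fst_from_frequencies(breed_freqs):
    """Per-SNP FST = (HT - HS) / HT from one allele's frequency in each breed.

    HS is the mean expected heterozygosity within breeds; HT is the expected
    heterozygosity at the mean allele frequency. Breeds are weighted equally.
    Returns None when the SNP is monomorphic across all breeds.
    """
    if len(breed_freqs) < 2:
        raise ValueError("need allele frequencies from at least two breeds")
    p_bar = sum(breed_freqs) / len(breed_freqs)
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return None
    h_within = sum(2 * p * (1 - p) for p in breed_freqs) / len(breed_freqs)
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies at three SNPs in two breeds.
print(fst_from_frequencies([0.1, 0.9]))   # strongly differentiated SNP
print(fst_from_frequencies([0.5, 0.5]))   # identical frequencies: FST = 0
print(fst_from_frequencies([0.0, 1.0]))   # alternate alleles fixed: FST = 1
```

SNPs with unusually high values relative to the genome-wide distribution are candidates for divergent selection, while unusually low values at polymorphic SNPs are compatible with balancing selection.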
Our high-resolution examination of cattle shows that, unlike the dog, which has restricted diversity and high levels of inbreeding, domesticated cattle had a large ancestral population size and that more aurochs must have been domesticated than wolves, reducing the severity of the domestication bottleneck. SNP diversity within taurine breeds was similar to that of humans but was significantly less than diversity within indicine breeds, which suggested that the Indian subcontinent was a major site of cattle domestication and predomestication diversity. Selection, first for domestication and then for agricultural specialization, has apparently reduced breed effective population sizes to relatively small numbers. The recent decline in diversity is sufficiently rapid that loss of diversity should be of concern to animal breeders. Despite this, population levels of LD are unexpectedly low considering the relatively small Ne, which indicates that effective population sizes were much larger in the very recent past.