Mitochondrial Genome Diversity in the Central Siberian Plateau with Particular Reference to Prehistory of Northernmost Eurasia

The Central Siberian Plateau was the last geographic area in Eurasia to become habitable by modern humans after the Last Glacial Maximum (LGM). Using complete mitochondrial DNA (mtDNA) genomes retained in indigenous Siberian populations (the Ket, Tofalar, and Todzhi), we explored genetic links between the Yenisei-Sayan region and Northeast Eurasia over the last 10,000 years. To this end, we generated 218 new complete mtDNA sequences and placed them into combined phylogenies along with 7 newly obtained and 70 published ancient mt genomes. Our findings reflect the origins and expansion history of mtDNA lineages that evolved in South-Central Siberia, as well as multiple phases of connections between this region and distant parts of Eurasia. Our results illustrate the importance of jointly sampling modern and prehistoric specimens to fully measure past genetic diversity and to reconstruct the peopling of the high latitudes of the Siberian subcontinent.

genetic interactions between and within Siberian populations. Accordingly, we focused on mitochondrial DNA genome diversity in the autochthonous populations of the Yenisei-Sayan region, primarily the Ket, Tofalar, and Todzhi (Fig. 1). Ket is now the sole surviving member of the Yeniseian language family, a unique language that does not fit easily into any known phylum and is unrelated to any other Siberian language [15]. The related tribes, the Assan, Arin, and Kott, disappeared about 200 years after their first contact with Russians. In this region, the Ket are the last tribe to retain their original language, and until recent times they subsisted entirely on hunting, fishing, and the gathering of wild plants [2,16,17]. The Tofalar and Todzhi, who originally spoke Samoyed languages, now speak languages belonging to the subgroup of Turkic languages confined to the upper Yenisei area [18-20], where they may have had ample opportunity to exchange genes with the Ket or related tribes (Lopatin, 1940 [5], and references therein). We compared present-day mitochondrial DNA diversity in these groups with complete mitochondrial genomes from ancient samples from the region and placed them into combined genealogical trees. We used these updated genealogies to trace ancestral relationships between populations sharing subhaplogroup-specific mutations, and to reconstruct expansions (predominantly from south to north) as well as contractions of populations, thus shedding new light on northeastern Eurasia's past.

Populations and samples
Ket - Formerly called Yenisei Ostyak (the census of 1897 recorded 988 persons), the Ket people are thought to be descendants of some of the earliest inhabitants of Central Siberia, while all of their present-day neighbors seem to be relative newcomers. In the 18th century, part of the Ket were forcibly moved to the lands between the Ob and Yenisei Rivers, then inhabited by the Selkup, who belong to the Samoyed group of the Uralic language family [2,5,16]. Until recently, there were ~500 Ket living in a few riverside villages in the middle reaches of the Yenisei; as in the past, many survive as seasonal hunters, trappers, and fishermen. The remnants of the Southern (Upper) Ket, whose ancestors are believed to have originated from the Podkamennaya Tunguska region, have been almost completely integrated into the expanding Evenki [17].
We integrated our newly obtained mtDNA sequences from previously collected
Tofalar - Fewer than 400 Tofalar, who are reindeer breeders and hunters, inhabit the northern slopes of the eastern Sayan Mountains, along the rivers Uda, Gutara, and Nerkha. They originally spoke a Samoyed language, but later changed to a language of the Turkic family. The Tofalar comprise the remnants of several hunting tribes that were gradually assimilated into a broader group. In the 17th century, the predecessors of the Tofalar were incorporated into five administrative settlements of "Udinsk Land" of Krasnoyarsk Region [2]. When the Tofalar transitioned to a settled lifestyle in 1930, they were concentrated in three newly established villages in the territory of Nizhni Udinsk District: Upper Gutara, Nerkha, and Alygdzher. Anthropological and linguistic evidence places the Tofalar among the 'Old Siberians' [5]. In this study, we revised the analysis of previously collected Tofalar

RESULTS
In what follows we describe the genetic diversity of complete mtDNA sequences from the Ket, Tofalar, and Todzhi, and compare them with previously reported data. We also combined this with mtDNAs from the Mansi, Tubalar, Nganasan, Evenki, Even, Yukaghir, and Koryak, many of whom we sequenced to the full mtDNA genome level, building on lesser amounts of data previously reported from these samples. A brief description of each population, as well as details of the sampling collection are reported in the work of Torroni et al. 1993b

Haplogroup N2a
The Yenisei region was outside the main routes of Eurasian agricultural populations sampled to date are known to carry N2a. This haplogroup is found in just a few contemporary individuals from Europe, Iran, Arabia, and Ethiopia [27,28]. It is likely that this lineage came to the mid-Yenisei from the Caucasus, presumably the major source of N2a, and thereby contributed to the central Siberian maternal lineages.

Haplogroup U4d
The updated phylogeny of U4d includes both its main subhaplogroups U4d1 conclusion is in agreement with ancient genome-wide data from Finland and the Russian Kola Peninsula [31], revealing that the specific genetic makeup of northern Europe traces important ancestry to a migration from Siberia that began at least 3,500 years ago.

Haplogroup U5a
Recent studies utilizing the genome-wide approach suggest that U5a lineages already existed in Mesolithic Fennoscandia (U5a1 and U5a2) and may imply an eastern origin for these sublineages in Europe [32,33]

Haplogroup A8
Through the course of this study, previously published and newly obtained A8a2 samples were explored to redefine the structure of the entire A8 tree (Supplementary Table 1; Fig. 6). As a result, we were able to delineate a previously uncategorized A8b lineage [34]. The immediate split of A8 created a distinctive A8b evident in two Koryak mt genomes. The entire tree expands our understanding of haplogroup A8 by encompassing previously attested ancient mt genomes attributed to the Bronze Age Okunevo culture in Khakassia. Hence, genetic and archeological evidence support a single/common origin of a population directly related to the maternal ancestors of some of the present-day Ket, Tofalar, Tuvan, Yakut, Buryat, as well as the Koryak individuals. It has been suggested that the Okunevo Culture derived ancestry from long-resident populations of the Altai-Sayan Upland whose roots may extend back to the Neolithic, if not before [35].

Mitochondrial DNA gene pool in Tofalar and Todzhi
The present-day Tofalar and Todzhi do not cluster with the Russian Buryat who

Haplogroup C4
The age estimates of the newly described sublineages hint at the origin and diversification of the C4 haplogroup in Siberia/Asia (Fig. 7). Thus, modern samples Supplementary Fig. 2), contrasting with C4a2 diversity, which is largely restricted to C4a2a (Supplementary Fig. 3).

Haplogroup C5
The phylogeny of C5 is structured into four major branches, assigned as C5a, C5b, C5c, and C5d, in accord with the latest release of PhyloTree (Supplementary Fig. 4-6). While the C5a1 and C5a2a samples are from the Altaic-speaking populations scattered across the southern extent of Eastern Siberia, C5a2b is likely to reflect different migrations of Reindeer Koryak, Chukchi, Yukaghir, and Kamchatkan Even.
An updated C5-m.16093T>C network, including 29 entire sequences, of which 15 are new, is given in Supplementary Fig. 4. Accordingly, the entire tree splits into two main

Haplogroup Z1
A novel Z1 sublineage, designated here as Z1b, is formed by two Tofalar samples, one haplotype from western Sayan [10] and the other from eastern Sayan (this study) (Supplementary Fig. 7). An ancient sequence falling within Table 1, Supplementary Fig. 8). Accordingly, the phylogeny of F1b1b is structured into three major sub-branches, assigned as Interaction between reindeer Koryak, Chuvan, Khodyn and Anaul (the latter two tiny tribes dissolved among the Chuvan (Chuvantsi) shortly after the first Russians appeared in Chukotka in the mid-17th century) is consistent with historical records indicating that quite a few Yukaghir and Chuvantsi women were living amongst the Koryak and the Chukchi by the end of the 19th century [6,55]. We also highlight the evolutionary history of D4b1c/D3, whose intrinsic diversity has not yet been well resolved. Following increased sampling, the haplogroup D4b1c (D3) phylogeny encompasses seven Nganasan and four Yukaghir, which together account for 61.1% of the entire D3 sample tested at the complete mtDNA level (Supplementary Table 1; Supplementary Fig. 10). [56,57,54, p.198].

Conclusion
Here, we assembled novel data on unique native Siberian populations, the

Mitochondrial genome sequencing for modern samples
Genomic DNA was extracted from blood buffy coats using standard procedures.

Ancient DNA Analysis
In a dedicated clean room at Harvard Medical School, we prepared powder from the teeth of 7 individuals, all of whom we directly radiocarbon dated using accelerator mass spectrometry. These individuals consisted of: [63]. We enriched the libraries for sequences overlapping the mitochondrial genome [64], and then sequenced them on an Illumina NextSeq500 instrument using v.2 150 cycle kits for 2×76 cycles and 2×7 cycles. We computationally removed the barcode and adapter sequences, and merged pairs of reads, requiring a 15 base pair overlap (allowing up to one mismatch). We mapped the merged sequences to the reconstructed human mitochondrial DNA consensus sequence [65] using bwa (v.0.6.1) [58], removed duplicate sequences with the same strand orientation, start, and stop positions, and built a consensus mitochondrial genome sequence for each sample (average coverages were 25-1550-fold). We used contamMix-1.0.9 to estimate 95% confidence intervals for the rate of match to the mitochondrial DNA consensus sequence, all of which were fully contained in the range 0.99-1 [66]. All samples had a rate of damage in the final nucleotide in the range of 0.028-0.127.
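The read-merging criterion described above (an overlap of at least 15 base pairs with at most one mismatch) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the function names, the longest-overlap-first search, and the choice to keep the forward read's bases in the overlap are assumptions made purely for illustration.

```python
# Illustrative sketch only; names and tie-breaking rules are hypothetical.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=15, max_mismatches=1):
    """Merge a read pair if the end of read1 overlaps the start of the
    reverse-complemented read2 by at least min_overlap bases with at most
    max_mismatches mismatches. Returns the merged sequence, or None."""
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    # Assumption: prefer the longest acceptable overlap.
    for overlap in range(longest, min_overlap - 1, -1):
        tail = read1[-overlap:]
        head = mate[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatches:
            # Assumption: bases in the overlap are taken from read1.
            return read1 + mate[overlap:]
    return None
```

A production merger would also use base qualities to resolve mismatches inside the overlap; the sketch omits this to keep the overlap rule itself visible.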

Mitochondrial data analysis
All mtDNA genome consensus sequences were called using SAMTOOLS mpileup [67]. The resulting consensus sequences were then inspected by eye, with particular attention being paid to the hypervariable regions and nucleotide positions previously identified as being problematic [34]. All ambiguous sites were called as 'N'.
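The idea of masking ambiguous sites as 'N' can be sketched as a simple majority-rule caller operating on per-position base pileups. This is a stand-in for illustration, not the behavior of SAMTOOLS mpileup or the manual inspection described above; the function name and the depth and agreement thresholds are hypothetical.

```python
from collections import Counter


def call_consensus(pileup_columns, min_depth=3, min_fraction=0.8):
    """Call one consensus base per pileup column.

    pileup_columns: iterable of strings, each holding the bases observed
    at one position. Thresholds are illustrative assumptions.
    """
    consensus = []
    for bases in pileup_columns:
        counts = Counter(b for b in bases.upper() if b in "ACGT")
        depth = sum(counts.values())
        if depth == 0 or depth < min_depth:
            consensus.append("N")  # too little evidence: mask the site
            continue
        base, support = counts.most_common(1)[0]
        # Mask the site when the majority base is not well supported.
        consensus.append(base if support / depth >= min_fraction else "N")
    return "".join(consensus)
```

For example, a column with three C and one T falls below the 0.8 agreement threshold assumed here and is therefore masked.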
Entire mtDNA sequences were assembled into phylogenetic trees by using mtPhyl v5.003 software. Coalescence dates were estimated with the ρ statistic [69].
Standard errors (σ) were calculated according to Saillard et al. (2000) [70]. Mutational distances were converted into years using the substitution rate for the entire molecule, 2.67×10⁻⁸ substitutions per site per year [68]. The haplogroup affiliations reported in this analysis correspond to the current nomenclature of mtDNA in agreement with the latest release (February 2016) of PhyloTree Build 17 [34].
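As a rough illustration of the dating procedure, the sketch below computes ρ as the average number of mutations separating the sampled sequences from the founder haplotype, and σ following the Saillard et al. (2000) formula, σ² = Σ nᵢ²ℓᵢ / n², where ℓᵢ is the number of mutations on branch i and nᵢ the number of sampled sequences descending from it. The conversion to years assumes a linear clock applied to a 16,569 bp molecule; that length, the function names, and the toy tree are assumptions of this sketch rather than details stated in the text.

```python
import math

GENOME_LENGTH = 16569   # assumption: length of the reference mtDNA molecule
RATE = 2.67e-8          # substitutions per site per year, as quoted above


def rho_and_sigma(branches, n_samples):
    """branches: list of (n_i, l_i) pairs, one per branch of the tree,
    where n_i is the number of sampled descendants of the branch and
    l_i the number of mutations on it. Returns (rho, sigma)."""
    rho = sum(n_i * l_i for n_i, l_i in branches) / n_samples
    sigma = math.sqrt(sum(n_i ** 2 * l_i for n_i, l_i in branches)) / n_samples
    return rho, sigma


def to_years(mutations, rate=RATE, sites=GENOME_LENGTH):
    """Convert a mutational distance into years under a linear clock."""
    return mutations / (rate * sites)


# Toy tree with 4 samples: one branch (1 mutation) shared by 2 samples,
# a private branch (1 mutation) under it, and a separate branch
# (2 mutations) leading to 1 sample; the fourth sample sits at the root.
toy_branches = [(2, 1), (1, 1), (1, 2)]
rho, sigma = rho_and_sigma(toy_branches, n_samples=4)
age, age_se = to_years(rho), to_years(sigma)
```

Under these assumptions one substitution corresponds to roughly 2,260 years, so the toy tree with ρ = 1.25 would date to about 2,800 years.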
Overall, 218 newly reported mt genomes are listed in Table S1 along with ethnicity, sample location, and accession codes in GenBank. In the course of this study, 70 ancient mtDNA sequences were published and we used them to provide powerful new information about subhaplogroup affiliation (Supplementary Table 2) [24,25,30,33,41,43]. To uncover ancient mtDNA lineages, especially those related to the Altai-Sayan area, we compared modern mitogenomic data with their ancient counterparts sampled from skeletons recovered from the Euro-Siberian region that extends from Central Europe and Scandinavia to Lake Baikal.
Black circles mark the settlements of sampling expeditions. Black triangles denote locations of the ancient specimens generated through the course of this study and listed in Supplementary Table 2.
General legend for phylogeny figures: New sequences generated through the course of this study are shown in bold and green. When two or more identical samples belong to the same group, their numbers are given in brackets. New ancient sequences generated through the course of this study are shown in bold and yellow; ancient sequences gleaned from published sources are shown in grey (see Supplementary Table 2). Dashed lines for ancient samples indicate that the sequence is not complete and contains gaps due to DNA damage or contamination.