Insertion of Telomeric Repeats in the Human and Horse Genomes: An Evolutionary Perspective.

Interstitial telomeric sequences (ITSs) are short stretches of telomeric-like repeats (TTAGGG)n at nonterminal chromosomal sites. We previously demonstrated that, in the genomes of primates and rodents, ITSs were inserted during the repair of DNA double-strand breaks. These conclusions were derived from sequence comparisons of ITS-containing loci and ITS-less orthologous loci in different species. To our knowledge, insertion polymorphism of ITSs, i.e., the presence of an ITS-containing allele and an ITS-less allele in the same species, has not been described. In this work, we carried out a genome-wide analysis of 2504 human genomic sequences retrieved from the 1000 Genomes Project and a PCR-based analysis of 209 human DNA samples. In spite of the large number of individual genomes analyzed we did not find any evidence of insertion polymorphism in the human population. On the contrary, the analysis of ITS loci in the genome of a single horse individual, the reference genome, allowed us to identify five heterozygous ITS loci, suggesting that insertion polymorphism of ITSs is an important source of genetic variability in this species. Finally, following a comparative sequence analysis of horse ITSs and of their orthologous empty loci in other Perissodactyla, we propose models for the mechanism of ITS insertion during the evolution of this order.


Introduction
Telomeres are nucleoprotein structures at the ends of eukaryotic chromosomes. In vertebrates, telomeres are composed of extended arrays of the hexanucleotide TTAGGG [1] and of a specialized protein complex called shelterin [2]. The main role of telomeres is to prevent chromosome ends from being recognized and processed as double-strand breaks. In normal somatic cells, telomeres shorten at each replication round, while in germ-line and stem cells the reverse transcriptase telomerase ensures DNA replication of chromosome ends by adding TTAGGG repeats to the 3' end of telomeric DNA. In normal somatic cells, after several replication rounds, telomeres reach a critical length, resulting in the loss of their ability to maintain genome stability. Short telomeres induce a state called replicative senescence, which is characterized by irreversible arrest of the cell cycle and is responsible for a decline in tissue renewal capacity [3][4][5][6]. Senescence can be seen as a barrier to uncontrolled cell proliferation and tumor development. Cells escaping senescence enter a condition called crisis, which is characterized by genome instability and ultimately leads to apoptosis. Rare survivor cells can rescue telomere maintenance mechanisms, which can lead to cell immortalization and cancer. In addition, noncoding RNA molecules transcribed from telomeres (telomeric-repeat-containing RNA (TERRA)) [...] nuclear sequences of mitochondrial origin (numt) [50]. These observations, together with a number of molecular and cytogenetic comparative studies [51][52][53][54][55][56][57][58], support the hypothesis that the horse genome is in a stage of rapid evolution.
The first goal of the present work was to update the list of human ITSs and to investigate whether insertion polymorphism at ITS loci can be detected in the human population.
The second goal of this work was to identify ITS loci in the horse genome and to test whether, similarly to retrotransposons and numts, these loci are characterized by insertion polymorphism.
The third goal of the present work was to study the molecular mechanisms of ITS insertion through sequence comparison between ITSs and their corresponding empty loci.

Identification of Human ITS Loci and Search of Insertion Polymorphism
We updated the list of human ITS loci by carrying out a BLAST search against the genome sequence assembly hg19/GRCh37. Using the sequence (TTAGGG)4 as query, we identified 229 loci containing at least four telomeric repeats and fewer than one mismatch per unit relative to the telomeric sequence. Supplementary Table S1 reports the complete list of human ITS loci together with their coordinates, length and number of mismatches.
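The selection criterion described above can be illustrated with a minimal sketch. It is not the BLAST procedure used in this work: it simply slides a window of four repeat units along a sequence and keeps windows that differ from a perfect telomeric array, in any register and on either strand, by fewer than four bases. Reading "less than one mismatch per unit" as "total mismatches lower than the number of units" is an assumption, and the function names are hypothetical.

```python
UNIT = "TTAGGG"  # vertebrate telomeric hexamer


def revcomp(seq):
    """Reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_its_seeds(seq, min_units=4):
    """Return (start, end, strand, mismatches) for every window that passes.

    Assumption: a window of min_units hexamers passes when it has fewer
    than min_units mismatches against a perfect array in some register.
    Coordinates are 0-based, end-exclusive.
    """
    seq = seq.upper()
    width = len(UNIT) * min_units
    templates = []
    for unit, strand in ((UNIT, "+"), (revcomp(UNIT), "-")):
        for shift in range(len(unit)):
            rotated = unit[shift:] + unit[:shift]
            templates.append((rotated * min_units, strand))
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        mismatches, strand = min((hamming(window, t), s) for t, s in templates)
        if mismatches < min_units:
            hits.append((start, start + width, strand, mismatches))
    return hits


def merge_seeds(hits):
    """Merge overlapping or adjacent same-strand windows into loci."""
    loci = []
    for start, end, strand, _ in hits:
        if loci and loci[-1][2] == strand and start <= loci[-1][1]:
            loci[-1] = (loci[-1][0], max(end, loci[-1][1]), strand)
        else:
            loci.append((start, end, strand))
    return loci


example = "ACGTACGTAC" + "TTAGGG" * 5 + "CATCATCATC"
print(merge_seeds(find_its_seeds(example)))
```

Because up to three mismatches are tolerated per window, the reported boundaries can extend a few bases into the flanking sequence; a real analysis would trim the locus to the first and last telomeric unit, as is usually done when overlapping hits are corrected manually.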
We previously demonstrated that, unlike classical microsatellite repeats and similarly to other insertion sequences, ITSs arose from the sudden introduction of telomeric repeats into the genome. Since the fixation of a new genomic variant requires many generations, we expect that, at loci where ITS insertion occurred in evolutionarily recent times, the ITS-containing allele and the empty allele may be detected in the same population.
In an attempt to identify empty alleles at human ITS loci, we took advantage of the 2504 genomes produced by the 1000 Genomes Project [59]. In this database, an ITS-less allele would lack part of the reference sequence (i.e., the telomeric repeat stretch), and therefore it would be classified as an indel. We used the coordinates of the ITS loci listed in Supplementary Table S1 to manually test for the presence of indels. We did not detect any deletion corresponding to ITSs, i.e., ITS-less alleles were not identified. Since the 4× genome coverage of this collection of human genomes would allow the detection of most variants with frequencies higher than 1%, we cannot exclude that rare ITS-less variants may be present in the human population. We would like to point out that, in the 1000 Genomes database, all alleles at each variable locus are listed. Therefore, empty alleles could be detected both in heterozygous and in homozygous individuals. In other words, homozygosity is not necessary to detect empty alleles. On the other hand, if the reference genome were homozygous for an empty allele that is present as an ITS in other individuals, our approach would not detect it because this locus would not be included in our ITS list. However, since insertion polymorphism of human ITSs is extremely rare or even absent, we can reasonably suppose that this possibility is unlikely.
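The check described above was performed manually. As a hedged illustration of the underlying logic only, the sketch below flags deletions that remove most of an ITS interval. The function name, the tuple layout and the 0.8 coverage threshold are assumptions made for the example and do not come from the study; the coordinates are toy values.

```python
def deletions_spanning_its(its_loci, deletions, min_fraction=0.8):
    """Return (its, deletion) pairs where the deletion removes most of the ITS.

    Intervals are (chrom, start, end) tuples, 1-based and inclusive.
    min_fraction is an illustrative threshold, not a value from the study.
    """
    matches = []
    for its in its_loci:
        chrom, its_start, its_end = its
        its_length = its_end - its_start + 1
        for deletion in deletions:
            del_chrom, del_start, del_end = deletion
            if del_chrom != chrom:
                continue
            # Number of ITS bases removed by this deletion.
            shared = min(its_end, del_end) - max(its_start, del_start) + 1
            if shared > 0 and shared / its_length >= min_fraction:
                matches.append((its, deletion))
    return matches


# Toy coordinates, for illustration only.
its_loci = [("chr1", 1000, 1035)]
deletions = [("chr1", 998, 1036), ("chr1", 1030, 1031), ("chr2", 1000, 1035)]
print(deletions_spanning_its(its_loci, deletions))
```

Only the first toy deletion is reported, since it covers the whole repeat stretch; a two-base deletion inside the array would instead reflect repeat-length variation rather than an empty allele.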
To test for the presence of ITS-less alleles, we also used a PCR-based approach. For each locus, primer pairs were designed on the sequences flanking the telomeric repeat. We reasoned that empty alleles may only be present at ITS loci that were inserted recently in the human genome. We previously identified four ITSs that appeared in the human lineage after its separation from the chimpanzee lineage [25]. These ITSs are absent from all nonhuman primates and we called them "human-specific ITSs". Using primer pairs flanking the four human-specific ITSs (Supplementary Table S2), we amplified genomic DNA of 209 individuals from different populations distributed worldwide. The results of this analysis are reported in Table 1. In accordance with the results of the in silico analysis, no ITS-less alleles were identified, confirming that insertion polymorphism of ITSs is either absent or very rare in the human population.
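To give a rough sense of what "very rare" means for a sample of 209 individuals, the following sketch computes the probability of sampling at least one copy of an allele of a given frequency, and the highest frequency still compatible with observing none. It is a back-of-the-envelope calculation, not an analysis performed in this work, and it assumes random sampling of 2N independent chromosomes at an autosomal locus and error-free genotyping.

```python
def prob_allele_sampled(freq, n_individuals):
    """Probability that at least one of 2N chromosomes carries the allele."""
    return 1.0 - (1.0 - freq) ** (2 * n_individuals)


def max_freq_given_none_seen(n_individuals, confidence=0.95):
    """Largest allele frequency not excluded when no copy was observed."""
    return 1.0 - (1.0 - confidence) ** (1.0 / (2 * n_individuals))


n = 209  # individuals typed by PCR
print(f"P(sampled | f = 1%): {prob_allele_sampled(0.01, n):.3f}")
print(f"95% upper bound on f: {max_freq_given_none_seen(n):.4f}")
```

Under these assumptions, an empty allele with a frequency of 1% would almost certainly have been sampled, whereas alleles well below that frequency could easily have been missed, in line with the cautious conclusion above.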

ITS Loci in the Horse Genome
We first performed FISH experiments on metaphase spreads of horse primary fibroblasts using the previously described telomeric repeat oligonucleotide [60] as a probe (Figure 1). As expected, all chromosome ends were labelled. No strong signals were observed in nonterminal positions, indicating that, as in the human genome, het-ITSs are not present in the horse genome. Only some faint intrachromosomal signals were detected (arrows in Figure 1). Based on our previous work [24,29], we conclude that these faint signals correspond to short-ITSs. However, as demonstrated previously, the FISH technique is not sensitive enough to efficiently detect short sequences. A human spread hybridized with the same telomeric probe, showing a similar pattern of faint interstitial signals corresponding to short ITSs, is shown in a previous publication [13].
To obtain a comprehensive catalogue of horse short-ITSs we carried out a BLAST search in the reference genome sequence of Equus caballus (EquCab3.0) [61] that was obtained by the assembly of the genomic sequence of the thoroughbred mare Twilight, the same individual used to obtain the previous horse genome assembly EquCab2.0 [52]. We used the parameters described above for searching human ITSs, i.e., the presence of at least four TTAGGG units with less than one mismatch per unit. Using this procedure, we identified 140 loci. As described in the following paragraph, two additional loci were found in the trace database, bringing the total number of ITSs to 142. In Supplementary Table S3 these loci are listed together with their length and number of mismatches. It is possible that additional ITS loci, represented by homozygous empty alleles in Twilight, may be present in the horse population.

Figure 1. Fluorescence in situ hybridization with telomeric probe on two metaphase spreads from horse primary fibroblasts. Chromosomes are labelled with DAPI. Hybridization signals (red) at chromosome ends mark telomeres. Some of the chromosomes showing faint hybridization signals corresponding to short interstitial telomeric sequences are indicated (white arrows). Magnifications of selected chromosomes are reported below each metaphase, and some of the interstitial telomeric signals are marked by arrowheads.

Search of ITS-Less Alleles in Twilight
To identify horse ITS-less alleles, we carried out a search of heterozygous loci in the genome of Twilight using the two strategies summarized in Supplementary Figure S1.
Since only one allele per locus is included in the reference genome, according to the first strategy, we used the ITS loci as query to BLAST search possible corresponding empty alleles in the NCBI Trace Database (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Nucleotides&PROGRAM=blastn&BLAST_SPEC=TraceArchive&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch) [62], which includes unassembled DNA sequences from Twilight. Using this method, we found three ITS-less alleles corresponding to chr15:23487997, chr19:32741 and chr27:21217687. Since this method would not allow us to identify heterozygous ITS loci whose empty allele was included in the assembled reference genome, according to the second strategy, we BLAST searched loci containing telomeric repeats in the horse Trace Database. Using this second procedure, we found two additional ITS loci that were not found in the assembled reference genome (chr2:13178780 and chr9:1570127). These loci are heterozygous in Twilight. Altogether, we identified 142 ITS loci, five of which are heterozygous in Twilight. In Supplementary Table S3, horse ITS loci are listed.
In Figure 2, the sequences of the ITS-containing and of the ITS-less alleles of the five heterozygous loci are reported. At four of these loci, the insertion of telomeric repeats was accompanied by the deletion of a sequence flanking one side of the break (Figure 2a-c,e). At two loci (Figure 2a,d), the direct repetition of a sequence flanking the break was generated. In addition, at the third and fourth of these loci (Figure 2c,d), we detected two and three nucleotides in register with the inserted telomeric repeat, respectively. These nucleotides correspond to microhomology to the telomeric hexamer at the 3' end of the break site.

Figure 2. [...] with the ITS-containing allele (bottom) is shown. The sequence indicated as Reference corresponds to the allele found in the horse reference genome EquCab3.0; the alternative allele is indicated by the NCBI Trace ID number. Telomeric repeats in TTAGGG and CCCTAA orientation are indicated in red and orange, respectively. Flanking sequence modifications that occurred together with telomeric repeat insertions are indicated using different colors: nucleotides deleted from the flanking sequence are in blue; random nucleotide sequence insertions are in green; duplications of the insertion site are shaded in grey. Nucleotides at break sites in register with the inserted telomeric repeats are boxed.
We can reasonably predict that the five loci that are heterozygous in Twilight are polymorphic and that a large number of additional polymorphic loci are present in the horse population.

Polymorphism of Horse ITSs
Three of the heterozygous ITS loci identified in Twilight were analyzed by PCR in 114 horses from five breeds (Norwegian Fjord, Icelandic Pony, Quarter Horse, Andalusian and Lipizzaner), from 18 Show Jumping horses and from 20 Przewalski's horses. Genomic DNA was amplified using the primers listed in Supplementary Table S4. Table 2 shows that the frequency of the empty allele is variable among the different populations, ranging from 0.47 to 1.00. Empty alleles tend to be well-represented in all populations analyzed, being either more or equally frequent compared to ITS alleles. At the locus on chromosome 19, the ITS-containing allele was found, at a low frequency, only in Show Jumpers and Quarter horses, while it was absent in the other populations. While the other two ITSs seem to be absent in Przewalski's horses, the locus on chromosome 15 is polymorphic also in this species.
In previous work, we observed that several human ITS loci are characterized by variable number of tandem repeats (VNTR) polymorphism [63]. To test whether this type of variability is also present at horse ITSs, we analyzed 11 ITS loci in the 18 Show Jumping horses (Table 3). This analysis includes the three loci already characterized for insertion polymorphism (Table 2). At eight loci, more than one VNTR allele was found, with the number of alleles ranging from two to six. At two loci (chr2:13178780, chr15:23487997), both insertion and VNTR polymorphism were detected.
Table 3. Variable number of tandem repeats at 11 ITS loci in 18 horses.
ITS Locus; No. of Telomeric Repeats; Frequency
chr20:18458893; [...]

Comparison of ITS-Containing and ITS-Less Sequences: Mechanisms of Telomeric Repeat Insertion
In previous work we demonstrated that, in primates and rodents, interstitial telomeric repeats were inserted in one step in the course of evolution. A comparative analysis of the sequences flanking the telomeric repeats with the sequence of orthologous empty loci in evolutionarily related species had allowed us to demonstrate that the insertion sites often underwent the typical modifications occurring during nonhomologous end-joining [25,27]. This analysis also strongly suggested that telomerase was involved in this pathway [25,27].
To identify ITS-less ancestral loci orthologous to horse ITSs, we used 1 kb sequences containing each horse ITS as query for a BLAST search against the draft genomic sequences of donkey (Equus asinus) [64,65] and white rhinoceros (Ceratotherium simum simum) that are available at the NCBI genome database (https://www.ncbi.nlm.nih.gov/assembly/GCF_001305755.1; https://www.ncbi.nlm.nih.gov/assembly/GCA_003033725.1; https://www.ncbi.nlm.nih.gov/assembly/GCF_000283155.1) [66][67][68]. For 46 of the 142 horse ITS loci, the telomeric repeat was conserved in the three species (Table 4). For 66 ITS loci, we found orthologous empty loci in donkey and/or rhinoceros. For 30 horse ITSs, the orthologous loci in the other two species were not detectable due to gaps in the genome assembly or to gross sequence rearrangements. Four of the five loci for which Twilight is heterozygous (marked with an asterisk in Table 4) are empty in the other two species, confirming that they were inserted recently in the horse lineage. At the fifth locus, a telomeric repeat stretch is present in the orthologous donkey locus, suggesting that lineage sorting may have occurred in the common ancestor of the horse and donkey lineages.
In Figure 3, examples of sequence comparisons between ITS-containing and their corresponding empty loci are shown. At the locus shown in Figure 3a, the insertion of telomeric repeats occurred without modification of the target sequence, whereas in Figure 3b the deletion of a short sequence from the insertion site accompanied ITS insertion. In Figure 3c, the sequence of two loci is shown. At chr28:41719878, the ITS was introduced together with an apparently random sequence, whereas at chr19:10034261, 17 nucleotides retrotranscribed from the horse telomerase RNA were inserted. In Figure 3d, a direct duplication of the target sequence is shown. In the ITS shown in Figure 3e, a deletion and a random sequence insertion occurred together with the telomeric repeat insertion.
The generation of three ITS loci was accompanied by complex rearrangements that are sketched in Figure 4. The first rearrangement (Figure 4a) involved the inversion of a 286 bp sequence and the inverted duplication of two short sequences (38 and 53 bp). This rearrangement created two head-to-head stretches of the telomeric sequence. Inversions and duplications of sequences flanking the site of telomeric repeat insertion generated the ITS shown in Figure 4b. The generation of the ITS in Figure 4c [...]
Figure 3. Examples of data used to describe the insertion mechanism of telomeric repeats. For each locus, the alignment of the empty ancestral sequence from donkey or white rhinoceros with the ITS sequence in the horse reference genome assembly is shown. A sketch of the mechanism responsible for telomeric repeat insertion is shown on top of each sequence alignment. Telomeric repeats in TTAGGG and CCCTAA orientation are indicated in red and orange, respectively. At empty loci, nucleotides in register with the inserted telomeric repeats are boxed. (a) Interstitial telomeric repeat insertion occurred without modification of the sequences flanking the double-strand break. The orthologous locus from donkey is empty at the insertion site. The double-strand break exposed a GGG trinucleotide in register with the inserted telomeric repeats. (b) Interstitial telomeric repeat insertion accompanied by the deletion of nucleotides from the flanking sequence (blue nucleotides, blue strip in sketch). The double-strand break exposed a GGT trinucleotide in register with the inserted telomeric repeats. (c) Interstitial telomeric repeat insertion accompanied by the addition of a nucleotide sequence (green nucleotides, green strip in sketch). Telomeric repeat insertion was accompanied by the addition of a random nucleotide sequence, and the double-strand break exposed an AG dinucleotide in register with the inserted telomeric repeats. The ITS was inserted together with 17 bp homologous to a region of horse TERC 91 nucleotides away from the telomeric repeat template. (d) Interstitial telomeric repeat insertion at a staggered double-strand break followed by flanking sequence duplication (nucleotides shaded in grey, grey strip in sketch). The double-strand break exposed a GG dinucleotide in register with the inserted telomeric repeats. (e) Interstitial telomeric repeat insertion accompanied by a complex modification of the insertion site involving the simultaneous deletion of nucleotides and addition of a random sequence. A TA dinucleotide in register with the inserted telomeric repeats was exposed by the double-strand break.
The frequency of the different types of modifications at ITS insertion sites is reported in Table 5. In about 17% of the loci, the telomeric repeat was inserted without any sequence modification at the break site. The deletion of short sequences from the insertion site was the most frequent modification (30%), while complex rearrangements occurred at the insertion site in about 29% of the events. A relevant observation deriving from this comparative analysis was the nonrandom presence of nucleotides in register with the inserted telomeric sequence in the ancestral ITS-less loci. In the examples shown in Figures 2 and 3, such nucleotides are highlighted. For this analysis, we could only utilize 40 loci where the ancestral sequence flanking the ITS was not modified during telomeric repeat insertion. This sequence arrangement was observed at 31 out of the 40 informative loci (Table 6). About 78% of the ITSs were inserted at sites where 1-6 nucleotides in register with the telomeric repeats were exposed at the 3' end of the double-stranded DNA break. This value is much higher than expected by chance (≤25%). Even more striking is the difference between observed and expected values when we consider loci with two or more nucleotides in register (Table 6).
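The gap between the observed proportion (31 of 40 loci) and the chance expectation can be put on a numerical footing with a one-sided binomial calculation. The sketch below is an illustration under two assumptions that are not stated in the text: the 40 informative loci are treated as independent trials, and the upper bound of the expected value (25%) is used as the per-locus probability. The function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


observed, informative = 31, 40   # loci with in-register nucleotides / informative loci
p_chance = 0.25                  # upper bound of the chance expectation

print(f"observed fraction: {observed / informative:.3f}")
print(f"P(X >= {observed}): {binomial_upper_tail(observed, informative, p_chance):.2e}")
```

Since 25% is an upper bound on the chance expectation, the tail probability obtained this way is conservative: any lower per-locus probability would make the observed count even less likely.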

Conservation and Genome Distribution of ITSs
To study the conservation between horse and human ITSs, only the 46 horse ITSs that are conserved in donkey and rhinoceros were analyzed (Table 4), while species- or genus-specific ITSs were not considered informative for this analysis. The analysis was carried out using BLAT to compare the horse and the human orthologous loci. For two of the horse ITSs, orthologous ITS loci were present in the list of the 229 human ITSs containing at least four telomeric repeats and less than one mismatch per unit (chr10:81331655-81331689 and chr26:32420315-32420368). For four additional horse ITSs, orthologous ITS loci were found in the human reference genome; however, since their sequence was degenerate, they were not included in our list (chr1:91725039-91725121, chr2:19293258-19293287, chr3:68906105-68906142 and chr15:77022476-77022502).
None of the human ITSs were contained within exons, while 31% of them were contained in introns of coding NCBI annotated genes. Given the incomplete annotation of the horse genome, this analysis could not be performed in the horse.
In Supplementary Tables S5 and S6, the distribution of ITSs on all human and horse chromosomes, respectively, is shown. In the tables, the average number of ITSs per Mb on each chromosome is also shown.

Discussion
We previously classified interstitial telomeres according to their cytogenetic position and sequence organization as heterochromatic, short, fusion and subtelomeric [13].
In previous studies, large blocks of telomeric-like repeats, corresponding to heterochromatic ITSs, could be detected by FISH in several metazoan and plant species [14][15][16][17][18][19][20]. The application of the FISH technique revealed that this type of ITS is not present in the human genome, while allowing us to detect only a limited number of short-ITSs [24]. In the present work, the same kind of analysis applied to horse metaphase spreads revealed that the general organization of interstitial telomeres in horses is similar to the one described in humans. As for the human situation, the horse short-ITSs were displayed as weak signals or remained largely undetected, due to the limited sensitivity of the FISH technique.
To compile a comprehensive list of short-ITSs in the human and horse genomes and to study their sequence organization, we analyzed the genome assemblies of the two species. It is worth mentioning that the number of ITSs that can be detected with this approach depends on the software, the parameters used and the coverage of the genome assembly. For instance, by BLAT search against the human genome version NCBI34/hg16, we previously found 83 ITSs composed of at least four repeats with less than one mismatch per repetition [25]. In a subsequent study, in which we discovered that telomeric repeat factors 1 and 2 (TRF1 and TRF2), the two main telomere binding proteins involved in telomere structure and function, bind to a subset of interstitial telomeric repeats, a less stringent search of ITSs was carried out using the automatic RepeatMasker annotation [69]. Following this search, we found 714 loci, which included highly degenerate telomeric-like repeats. In that study we used pre-masked genome data from the software RepeatMasker, which tends to split long or degenerate repeat arrays into several shorter hits. In the present work, we used BLAST to carry out a search in the hg19/GRCh37 version of the human genome assembly, manually corrected overlapping hits and discarded degenerate repeat arrays. With this strategy, we identified 229 human ITSs containing at least four TTAGGG repeats with less than one mismatch per repeat. Using the same approach, we identified 142 short-ITSs in the horse reference genome. We chose to consider only stretches of at least four TTAGGG repeats with less than one mismatch per unit to avoid detection of short sequences possibly occurring in the genome by chance. A number of shorter and/or more degenerate ITSs are not included in our list. The choice of these parameters was arbitrary.
In previous work, we demonstrated that, following their insertion, telomeric repeats undergo mutation during evolution; therefore, "young ITSs" are characterized by greater sequence conservation compared to "old ITSs" [25]. Since we were interested in finding insertion polymorphism and in describing insertion mechanisms, we concentrated our analysis on well-conserved and not too short "young" ITSs. It is noteworthy that one ITS in the human genome, at chromosome 2q13, was derived by fusion between ancestral acrocentric chromosomes [22], while we could not find any evidence of such ITS type in the horse.
A comparative analysis between human and horse ITSs showed that six horse ITSs have been inserted in the genome of a common ancestor of Primates and Perissodactyla, more than 90 million years ago [70]. It would be interesting to test whether the conservation of the telomeric repeat during such an extended evolutionary time may be related to any function. None of the human ITSs were inserted into exons of coding genes. This result is not surprising because such mutation would have inserted stop codons in both orientations.
The distribution of human and horse ITSs along chromosomes does not seem to be related to their size but is probably the result of random insertions. The fraction of human ITSs localized within introns (31%) is compatible with their random insertion in the genome since the fraction of human genome occupied by introns has been estimated to be between 26% and 38% [71]. It will be interesting to test whether the presence of telomeric repeats within introns may affect splicing.
In previous studies, we demonstrated that short-ITSs were introduced in one step at a given time during the evolution of primate and rodent lineages [25,27]. Therefore, short-ITSs can be considered insertion sequences.
It is well-known that insertion sequences that were introduced recently during evolution can display insertion polymorphism [36][37][38][39][40][41][42]49,50]. That is, the insertion-containing allele is not yet fixed, and the empty ancestral allele is also present in the population. Sequences showing insertion polymorphism have been used as markers for population genetic studies in many species including humans [37,42,47,49] and, in some cases, they have been associated with gene expression regulation [43,49]. To our knowledge, insertion polymorphism at short-ITS loci has not been described so far. Given the short length of these repeated arrays, their variation cannot be detected by FISH but only by sequence analysis. Indeed, only a fraction of short ITSs can be detected by FISH as faint signals whose frequency is related to the number of repeats at each locus (Figure 1) [24,29]. On the contrary, variation of het-ITSs was described before through FISH experiments in plants [72] and in PALA (N-(phosphonacetyl)-L-aspartate)-resistant CHO cells containing amplifications of the CAD (carbamyl-P-synthetase, aspartate transcarbamilase, dihydro-orotase) gene [60].
Are short-ITSs inserted at random sites or within specific genomic regions? To answer this question, we analyzed the GC content of the regions surrounding human and horse short-ITSs. The analysis was carried out within windows of different length: 100 bp, 1 kb and 5 kb on each side of the telomeric repeat. The values varied greatly among different loci, ranging between 12% and 75%, and the average values corresponded to 41.6% and 41.9% in horse and human, respectively (data not shown). We conclude that there is no preferential choice for ITS insertion based on GC content. In previous studies we showed that, in primates and rodents, ITSs colocalize with fragile sites [26,28,29,35]. Although we do not know which genomic or epigenetic features may be related to the fragility of these sites, this correlation strongly supported the model of ITS insertion at DNA double-strand break sites.
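The flanking GC content described above can be computed along the following lines. This is a minimal sketch rather than the procedure actually used: the function names are hypothetical, coordinates are assumed to be 0-based and end-exclusive, the two flanks of each window size are pooled, and ambiguous bases (N) are ignored.

```python
def gc_fraction(seq):
    """Fraction of G+C among unambiguous bases; None if there are none."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return None
    return (seq.count("G") + seq.count("C")) / counted


def flanking_gc(chrom_seq, its_start, its_end, windows=(100, 1000, 5000)):
    """GC fraction of the two flanks of an ITS for each window size.

    its_start and its_end delimit the telomeric repeat itself
    (0-based, end-exclusive), so the repeat is excluded from the count.
    Windows are truncated at the ends of the sequence.
    """
    result = {}
    for width in windows:
        left = chrom_seq[max(0, its_start - width):its_start]
        right = chrom_seq[its_end:its_end + width]
        result[width] = gc_fraction(left + right)
    return result


# Toy sequence: GC-only left flank, AT-only right flank.
toy = "GC" * 50 + "TTAGGG" * 4 + "AT" * 50
print(flanking_gc(toy, 100, 124, windows=(100,)))
```

Excluding the repeat itself matters here, because a TTAGGG array is 50% GC and would pull short windows towards that value regardless of the surrounding sequence.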
In this work, we searched for the presence of ITS insertion polymorphism in the human population. Surprisingly, despite the large number of individuals analyzed, no ITS-less alleles were found, suggesting that this kind of polymorphism is not present, although we cannot exclude that very rare ITS-less alleles at some loci may exist.
We have previously shown that, in the horse, insertion polymorphism is particularly frequent for numts and ERE1 transposable elements [49,50]. We wondered whether loci polymorphic for ITS insertion could be detected as well. Indeed, as opposed to what we observed in humans, we found five ITS loci heterozygous for the presence of telomeric repeats in the genome of a single horse individual: the mare Twilight, who donated her DNA for the reference genome assembly. A PCR analysis of three loci heterozygous in Twilight in six horse breeds and in Przewalski's horses confirmed that they are polymorphic. The ITS at chr15:23487997 is polymorphic both in Equus caballus and in Equus przewalskii, suggesting that the insertion of the telomeric repeat stretch pre-dates the separation of the domestic and Przewalski's horse lineages. Therefore, this ITS was inserted in the genome of the common ancestor of the two horse lineages more than 0.5 million years ago [51]. For the ITSs at chr2:13178780 and at chr19:32741, all analyzed individuals from Przewalski's horse were homozygous for the ITS-less allele, suggesting that the insertion of telomeric repeats at these loci may have occurred in the domestic horse lineage very recently, after its separation from the Przewalski's horse lineage. Alternatively, since the population of modern Przewalski's horses derives from a few individuals [73], the absence of the ITS may be due to genetic drift.
Our results underline a striking difference between the human and horse genomes in terms of ITS insertion polymorphism. In humans, such polymorphism is either absent or very rare, while our data strongly suggest that it is extremely frequent in the horse. As mentioned above, in the horse, insertion polymorphism is also very frequent for ERE1 retrotransposons and numts. Taken together, these findings further support the notion that the horse genome is in a stage of rapid evolution. In line with this hypothesis is our discovery of an evolutionarily new centromere, totally devoid of satellite tandem repeats, on horse chromosome 11 [52,58]. Therefore, different molecular mechanisms, such as transposition, DNA double-strand break repair and centromere repositioning, contribute to the great plasticity of the horse genome in the current evolutionary stage.
We previously described polymorphism of human ITS loci due to a variable number of tandem repeats (VNTR) [63]. VNTR polymorphism is also present at horse ITSs, indicating that this peculiar type of microsatellite can be unstable and that, similarly to microsatellites with shorter units, it may provide useful polymorphic markers for linkage analysis and parentage testing.

Mechanisms of ITS Insertion in the Horse Genome
Several ITSs were inserted in the horse genome within target sequences that are well conserved in the orthologous position of the donkey or rhinoceros genome. Therefore, similarly to primates and rodents, ITS insertions have also occurred in one step in equids [25,27]. In the present work, the comparison between ITSs and ITS-less ancestral orthologous loci allowed us to demonstrate that the insertion sites underwent modifications that are typical of the nonhomologous end-joining pathway, supporting our previous hypothesis that they are generated in the course of evolution during the repair of DNA double-strand breaks [25,27]. Deletions of short sequences are the most frequent modifications occurring at the break site during the insertion of ITSs, but random sequence addition also occurred. At a few horse ITS loci, direct duplications of target sequences occurred that likely result from the repair of staggered double-strand DNA breaks. Deletions of sequences flanking the break site were indeed the most frequent modifications observed at junctions produced by the repair of double-strand breaks (DSBs) induced in experimental systems, while additions and duplications were also observed [74][75][76]. During DSB repair, sequence modifications of the broken ends often seem to be necessary to provide the correct substrate for the final ligation reaction and are carried out by specific enzymes such as the Mre11, Exo1 and Artemis nucleases, polynucleotide kinases and template-independent DNA polymerases [77]. Interestingly, the insertion of telomeric repeats in the horse genome was frequently accompanied by complex modifications of the target sequence involving combinations of deletions, additions, inversions and duplications. Such complex rearrangements were not observed in our previous analysis of rodents and primates, further confirming the great plasticity of the horse genome.
In one ITS (chr19:10034261, Figure 3c), the telomeric repeat stretch was inserted together with a sequence retrotranscribed from a region of the telomerase RNA component (TERC) far away from the telomeric template. Our previous observation of 14 mouse ITS loci with a similar sequence arrangement, called TERC-ITS [25], provided a strong indication that the telomerase enzyme may be involved in the generation of interstitial telomeres. Having also found a TERC-ITS in the horse genome corroborates this interpretation. Further evidence supporting the involvement of telomerase in ITS insertion is the observation that, in a highly significant number of ITS loci, nucleotides in register with the telomeric repeat sequence were exposed at the break site that occurred in the ancestral sequence. In this scenario, ITS insertion represents one of the noncanonical and controversial roles of telomerase that have been recently proposed [78]. An alternative mechanism that may account for the generation of some ITSs relies on the introduction of retrotranscribed telomeric RNA into DNA double-strand break sites. These two proposed pathways may be activated in different conditions.
To test whether ITSs can be introduced at DNA double-strand break sites in somatic cells in culture, we previously set up an experimental system based on the induction of site-specific breaks by the I-SceI endonuclease [75]. We analyzed about 350,000 junctions generated by the repair of these breaks but never observed the insertion of a telomeric repeat stretch, suggesting that, in this system, this event is very rare or does not occur at all. In a subsequent study, Onozawa and colleagues [76] transfected total RNA into cultured human cancer cells in which double-strand breaks were induced at I-SceI sites and showed that sequences retrotranscribed from the RNA could be introduced at the break site. At four of these insertions they found telomeric repeats and suggested that these sequences may have been retrotranscribed from telomerase RNA. We can now suggest that the telomeric repeats observed in this experimental system may have been retrotranscribed from TERRA, the family of RNA molecules transcribed from telomeres. It is tempting to postulate that, in vivo as well, some ITSs may have been generated in the germ-line by a DNA repair pathway involving the insertion of DNA fragments retrotranscribed from TERRA molecules.

Search of ITS in the Human and Horse Genome Sequence
To identify human ITSs, the sequence (TTAGGG)4 was used as query for a BLAST search against the genome reference sequence hg19/GRCh37.p13 (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch&BLAST_SPEC=OGP__9606__9558&LINK_LOC=blasthome) [79]. The search was performed using the "blastn" algorithm and the standard setup. The automatic adjustment of search parameters for short sequences was disabled. The BLAST search produced 3689 hits. Hits mapping on patches and unplaced sequences were removed, leaving 2970 hits. Further manipulations of the hit list were carried out using tools available on the Galaxy platform (https://usegalaxy.org/) [80,81]. To reconstruct the full sequence of ITS loci, hits with overlapping coordinates were merged into single loci using the "Merge the overlapping intervals of a dataset 1.0.0" tool.
Manual analysis of hits showed that BLAST splits long or degenerate ITSs into several shorter loci, causing an overestimation of the number of ITSs. To overcome this problem, we merged these hits into single loci using the function "Cluster the intervals of a dataset 1.0.0" followed by "Merge the overlapping intervals of a dataset 1.0.0", leaving 555 hits. We manually checked each locus of the list to remove false positives (telomeres or GC-rich stretches), leaving 458 loci. Finally, we selected sequences composed of at least four telomeric repeats and no more than 1.0 mismatch per unit, leaving 229 short human ITSs.
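The cluster-and-merge step can be illustrated with a short stand-alone sketch. It does not reproduce the Galaxy tools named above; it only shows the general idea of collapsing overlapping or closely spaced BLAST hits into single loci. The function name, the tuple layout and the `max_gap` parameter are hypothetical, and the clustering distance actually used is not stated in the text.

```python
from collections import defaultdict


def merge_hits(hits, max_gap=0):
    """Collapse hits on the same chromosome into single loci.

    hits: iterable of (chrom, start, end) tuples with start < end.
    max_gap: hits separated by at most this many bases are joined
             (0 joins only overlapping or directly abutting hits).
    Returns a list of merged (chrom, start, end) tuples, sorted by
    chromosome name and start coordinate.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in hits:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + max_gap:
                # Same locus: extend the current interval.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


# Toy hit list: two overlapping hits, one nearby hit, one on another chromosome.
toy_hits = [("chr1", 100, 124), ("chr1", 120, 150),
            ("chr1", 160, 184), ("chr2", 10, 34)]
strict = merge_hits(toy_hits)               # overlap-only merging
relaxed = merge_hits(toy_hits, max_gap=10)  # also joins hits up to 10 bp apart
```

With overlap-only merging the toy list yields three loci; allowing a 10 bp gap joins the split chr1 hits into one locus, which is the effect sought when BLAST fragments a long or degenerate ITS.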
To identify horse ITSs, we applied the same search protocol to the horse reference genome sequence (NCBI horse genome sequence EquCab3.0). The BLAST search produced 10,328 hits. Hits positioned on unplaced chromosomes were removed, leaving a total of 7651. Removal of overlapping hits, merging of split hits into single loci and manual check left 306 ITSs. Finally, we selected sequences composed of at least four telomeric repeats and no more than 1.0 mismatch per unit, leaving 140 short ITSs.
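The final selection criterion (at least four repeats, no more than 1.0 mismatch per unit) could be expressed along the following lines. This is a simplified sketch under stated assumptions: the locus is scored as consecutive hexamers without indels, a trailing partial unit is ignored, both strands and all six repeat registers are tried, and the best score is kept. The function names are hypothetical and the authors' exact scoring rule may differ.

```python
UNIT = "TTAGGG"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (other symbols are left as-is)."""
    return seq.translate(_COMPLEMENT)[::-1]


def score_its(seq: str):
    """Return (number of complete hexamer units, lowest mean mismatches per unit)."""
    seq = seq.upper()
    n_units = len(seq) // 6
    if n_units == 0:
        return 0, float("inf")
    best = float("inf")
    for strand in (seq, revcomp(seq)):
        for r in range(6):
            # Rotating the reference unit covers every repeat register.
            ref = UNIT[r:] + UNIT[:r]
            mismatches = sum(
                strand[i + j] != ref[j]
                for i in range(0, n_units * 6, 6)
                for j in range(6)
            )
            best = min(best, mismatches / n_units)
    return n_units, best


def is_short_its(seq: str, min_units: int = 4,
                 max_mismatch_per_unit: float = 1.0) -> bool:
    """Apply the two thresholds quoted in the text to a candidate locus."""
    n_units, mean_mismatch = score_its(seq)
    return n_units >= min_units and mean_mismatch <= max_mismatch_per_unit
```

For example, `is_short_its("TTAGGG" * 4)` and `is_short_its("CCCTAA" * 4)` both pass, whereas a three-unit array fails the length threshold.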

In Silico Search of ITS Insertion Polymorphism in the Human Population
In order to identify empty alleles in the human population, we checked the 229 human ITSs in the 2504 genome sequences that were produced for the 1000 Genomes Project. Empty alleles were searched using the UCSC Genome Browser and the track "1000 Genomes Phase 3 Integrated Variant Calls: SNVs, Indels, SVs" (https://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=720970681_qqmhEowWab8OoZ9mluPovptBdLXW&c=chrX&g=tgpPhase3) [82].

PCR Amplification of Four Human-Specific ITS Loci in Individuals from Different Countries
Human genomic DNA samples (50-100 ng) were previously used in Semino and colleagues [83]. PCR reactions were performed in a 25 µL final volume with 20 pmol of each primer, 0.2 mM dNTP, 1× Green Buffer (Promega Italia, Milano, Italy), 0.5 units of GoTaq DNA polymerase (Promega Italia, Milano, Italy) and water. After a denaturation step at 95 °C for 2 min, the following amplification cycle was performed 35 times: 95 °C for 40 s, annealing at the appropriate temperature for 40 s, 72 °C for 30 s. Final extension was carried out at 72 °C for 5 min. PCR products were checked by electrophoresis in 1-2% agarose gel. PCR primers are listed in Supplementary Table S2.

Identification of Empty ITS Loci in the Horse Reference Genome
To identify loci that are heterozygous in the reference genome, we screened the Horse Whole Genome Shotgun sequences in the NCBI Trace Database, which includes unassembled DNA sequences from Twilight. We downloaded 2 kb sequences containing the horse ITS loci from the UCSC Genome Browser (https://genome.ucsc.edu/cgi-bin/hgGateway) [84], and then we manually removed the telomeric repeats. The "ITS-less" sequences were used as queries for a BLAST search [62] using the "blastn" algorithm and the standard setup. The automatic adjustment of search parameters for short sequences was disabled.
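The construction of an "ITS-less" query was done manually; the sketch below only illustrates the idea for the simplest case. It assumes a perfect run of at least four TTAGGG (or CCCTAA) units in that exact register and removes the longest such run. Degenerate repeats or runs in another register, which manual curation can handle, would be missed. The function name and the regular expression are hypothetical.

```python
import re

# Perfect runs of four or more telomeric units on either strand (assumption:
# real loci may be degenerate and would need a more tolerant pattern).
ITS_RUN = re.compile(r"(?:TTAGGG){4,}|(?:CCCTAA){4,}")


def make_its_less(seq: str):
    """Remove the longest perfect telomeric run from a locus sequence.

    Returns (its_less_sequence, removed_repeat). If no run is found,
    the sequence is returned unchanged with an empty repeat string.
    """
    upper = seq.upper()
    matches = list(ITS_RUN.finditer(upper))
    if not matches:
        return seq, ""
    longest = max(matches, key=lambda m: m.end() - m.start())
    its_less = seq[:longest.start()] + seq[longest.end():]
    return its_less, upper[longest.start():longest.end()]


# Toy locus: short flanks around a (TTAGGG)5 array.
toy_locus = "ACGTAC" + "TTAGGG" * 5 + "GATTACA"
query, removed = make_its_less(toy_locus)
```

The resulting `query` joins the two flanks directly, mimicking the expected sequence of an empty allele, and can then be used as a BLAST query against unassembled reads.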
To identify heterozygous ITS loci whose empty allele was included in the assembled reference genome, we used the sequence (TTAGGG)4 [62]. Trace sequences were downloaded and used as queries for a BLAST search against the horse genome reference sequence (NCBI horse genome sequence EquCab3.0) using the "blastn" algorithm.

PCR Amplification in Horse Populations
Genomic DNAs from 18 Show Jumping horses were prepared from peripheral blood samples of individuals that, according to their pedigree chart, do not share common ancestors up to the third generation. The genomic DNA samples were previously used in another study [49], thus sampling was not required for this work.
DNA samples from Quarter horses, Andalusian horses, Norwegian Fjords, Icelandic ponies and E. przewalskii were provided by Professor Cecilia Penedo (UC Davis, Davis, CA, USA). Professor Ernest Bailey (Gluck Equine Research Center, University of Kentucky, Lexington, KY, USA) provided DNA samples from Andalusian horses and Icelandic ponies. Lipizzaner DNA samples were described in [85].
PCR reactions were carried out as described above. For each locus, primer pairs were designed on the sequences flanking the telomeric repeat. PCR primers are listed in Supplementary Table S4.

Cell Culture and Fluorescence In Situ Hybridization
Horse primary fibroblasts were previously isolated and established from skin samples of slaughtered animals under sterile conditions [53,88].
The telomeric probe is a mixture of 1-20 kb long synthetic (TTAGGG)n fragments that was previously prepared in our laboratory [24,60] and labelled by nick translation with Cy3-dUTP (Enzo Life Sciences, Farmingdale, NY, USA). Hybridization to metaphase spreads and post-hybridization washes were carried out in low-stringency conditions as previously described [89]. Chromosomes were counterstained with 0.2 µg/mL DAPI and mounted with DAKO mounting medium. Digital images of fluorescence signals were acquired with a fluorescence microscope (Zeiss Axioplan) equipped with a cooled CCD camera (Photometrics). Pseudocoloring and merging of images were performed using the IPLab 3.5.5 Imaging Software (Scanalytics Inc., Fairfax, VA, USA). To acquire images of metaphase spreads, a 63× objective was used. For the images shown in Figure 1A-J, 2× enlargements of portions of the spreads were obtained using Adobe Photoshop CS6.

Conclusions
The human and horse genomes showed a striking difference in terms of ITS insertion polymorphism: in humans, such polymorphism is either absent or very rare, while it is extremely frequent in the horse. These observations support the hypothesis that the horse genome is in a stage of rapid evolution.
Through sequence comparison between horse ITSs and their corresponding empty loci, we analyzed the molecular mechanisms of their insertion during evolution. The results allowed us to describe several types of rearrangements deriving from the processing of DNA ends that occurred together with telomeric repeat insertion, providing compelling evidence for the conclusion that short-ITSs are generated by a DNA double-strand break repair pathway.

Conflicts of Interest:
The authors declare no conflict of interest.