Plastid Genomes of the Early Vascular Plant Genus Selaginella Have Unusual Direct Repeat Structures and Drastically Reduced Gene Numbers

The early vascular plants in the genus Selaginella, which is the sole genus of the Selaginellaceae family, have an important place in evolutionary history, along with ferns, as such plants are valuable resources for deciphering plant evolution. In this study, we sequenced and assembled the plastid genome (plastome) sequences of two Selaginella tamariscina individuals, as well as Selaginella stauntoniana and Selaginella involvens. Unlike the inverted repeat (IR) structures typically found in plant plastomes, Selaginella species had direct repeat (DR) structures, which were confirmed by Oxford Nanopore long-read sequence assembly. Comparative analyses of 19 lycophytes, including two Huperzia and one Isoetes species, revealed unique phylogenetic relationships between Selaginella species and related lycophytes, reflected by structural rearrangements involving two rounds of large inversions that resulted in dynamic changes between IR and DR blocks in the plastome sequence. Furthermore, we present other uncommon characteristics, including a small genome size, drastic reductions in gene and intron numbers, a high GC content, and extensive RNA editing. Although the 16 Selaginella species examined may not fully represent the genus, our findings suggest that Selaginella plastomes have undergone unique evolutionary events yielding genomic features unparalleled in other lycophytes, ferns, or seed plants.


Introduction
Chloroplasts, representing the most typical form of plastids, are semiautonomous cellular organelles found in photosynthetic plants and algae that contain their own genomes. Plastid genomes (plastomes) are typically 120-160 kb long, with a quadripartite architecture comprising one long single-copy (LSC) region and a short single-copy (SSC) region separated by two inverted repeats (IR A and IR B ) [1]. Plastomes contain approximately 120 genes, in which most encoding proteins function in photosynthesis, protein synthesis, and DNA replication [2,3]. Although the gene order and genome architecture have been broadly preserved across taxa, plastomes have undergone remarkable genome reduction and rearrangements over the course of plant evolution. Classic examples include the almost complete loss of IRs in conifers [4,5], IR expansion and contraction in some monilophyte ferns [6], several inversions and the loss of IR regions in some legumes [7] and ferns [8], ing (WGS) using both the Illumina sequencing platform and the Oxford Nanopore longread platform and assembled their complete plastomes. We performed a genus-wide comparison of genomic features, including GC contents, structural changes in the genome, and gene losses. This analysis uncovered remarkably dynamic features of the plastomes of plants of the Selaginella genus in the Division Lycopodiophyta.

Selaginella Plastomes Contain Reduced LSCs and Unusual DR
In the current study, we sequenced, assembled, and annotated the plastomes of three Selaginella species, including S. tamariscina, S. stauntoniana, and S. involvens (Figure 1). We analyzed these assembled plastome sequences, along with the sequences of 13 other species in the Selaginella genus and three species from other orders (Isoetales and Lycopodiales) in the Lycopodiopsida, which were obtained from NCBI ( Table 1). The plastome sizes of Selaginella species ranged from 110,411 to 147,148 bp, with an average of 132,571 ± 11,612, while the plastomes of non-Selaginella lycophytes ranged from 145,303 to 154,373 bp. Overall, Selaginella species had reduced LSC regions, with some species having LSCs that were almost half the size of those of non-Selaginella lycophytes. Most Selaginella species had longer SSCs than LSCs, except for the three species of Selaginella lepidophylla, Selaginella hainanensis, and S. uncinata (Table 1). In the current study, we sequenced the plastomes of three Selaginella species-Selaginella tamariscina, Selaginella stauntoniana, and Selaginella involvens-via whole-genome sequencing (WGS) using both the Illumina sequencing platform and the Oxford Nanopore long-read platform and assembled their complete plastomes. We performed a genus-wide comparison of genomic features, including GC contents, structural changes in the genome, and gene losses. This analysis uncovered remarkably dynamic features of the plastomes of plants of the Selaginella genus in the Division Lycopodiophyta.

Selaginella Plastomes Contain Reduced LSCs and Unusual DR
In the current study, we sequenced, assembled, and annotated the plastomes of three Selaginella species, including S. tamariscina, S. stauntoniana, and S. involvens (Figure 1). We analyzed these assembled plastome sequences, along with the sequences of 13 other species in the Selaginella genus and three species from other orders (Isoetales and Lycopodiales) in the Lycopodiopsida, which were obtained from NCBI ( Table 1). The plastome sizes of Selaginella species ranged from 110,411 to 147,148 bp, with an average of 132,571 ± 11,612, while the plastomes of non-Selaginella lycophytes ranged from 145,303 to 154,373 bp. Overall, Selaginella species had reduced LSC regions, with some species having LSCs that were almost half the size of those of non-Selaginella lycophytes. Most Selaginella species had longer SSCs than LSCs, except for the three species of Selaginella lepidophylla, Selaginella hainanensis, and S. uncinata (Table 1).   Most Selaginella species shared a unique plastome structure consisting of a set of direct repeats (DRs) instead of the inverted repeats (IRs) found in most plastomes. This DR structure was confirmed by PCR amplification using primer combinations designed based on the junction sites for the hypothetical DR and IR structures (Figure 2a). The primer combinations only differed in the sequence of the first repeat block (the inverted form of DR B in Figure 2a). PCR amplification and gel electrophoresis showed that only primer combinations for the direct repeat structure were amplified, suggesting that the direct repeat structure was present in the plastome (Figure 2b). We confirmed the unusual DR structure through the assembly of the S. tamariscina plastome using Oxford Nanopore Sequencing Technology and an assembly pipeline we established. The plastome was thereby assembled into two contigs of 61,029 and 71,944 bp in size (Figure 2c). These contigs covered the junction sites between the single-copy regions and the DR regions without inversions, confirming the direct repeat structure of the S. tamariscina plastome. Most Selaginella species shared a unique plastome structure consisting of a set of direct repeats (DRs) instead of the inverted repeats (IRs) found in most plastomes. This DR structure was confirmed by PCR amplification using primer combinations designed based on the junction sites for the hypothetical DR and IR structures (Figure 2a). The primer combinations only differed in the sequence of the first repeat block (the inverted form of DRB in Figure 2a). PCR amplification and gel electrophoresis showed that only primer combinations for the direct repeat structure were amplified, suggesting that the direct repeat structure was present in the plastome (Figure 2b). We confirmed the unusual DR structure through the assembly of the S. tamariscina plastome using Oxford Nanopore Sequencing Technology and an assembly pipeline we established. The plastome was thereby assembled into two contigs of 61,029 and 71,944 bp in size (Figure 2c). These contigs covered the junction sites between the single-copy regions and the DR regions without inversions, confirming the direct repeat structure of the S. tamariscina plastome.   Table A1). The primers were used in various combinations to amplify hypothetical direct repeat structures and indirect repeat structures. (b) 1% agarose gel electrophoresis of the PCR products. (c) Confirmation of the direct repeat structure in assembled contigs spanning single-copy regions and the repeat region using Oxford Nanopore Sequencing technology.

Selaginella Plastomes Contain Fewer Genes than Other Lycophyte Plastomes
The number of plastome genes differed significantly among lycophyte species, pointing to the frequent occurrence of gene loss events. In particular, the plastomes of Selaginella species generally had fewer protein-coding and transfer RNA (tRNA) genes than those of non-Selaginella species (Table 1). Within the genus, S. hainanensis had the highest number of plastome genes (103 genes), whereas S. tamariscina had the fewest (75 genes). Huperzia and Isoetes species contained almost all plastid genes. All 19 lycophyte species analyzed carried two copies of the four ribosomal RNA (rRNA) genes located in the repeat regions. Figure 3 shows gene polymorphisms (presence/absence/pseudogenes) among the plastomes of the 19 lycophyte species. Isoetes and Huperzia species contained most plastid genes, except for a few that were lost or pseudogenized. In Selaginella species, many genes were absent or pseudogenized, specifically tRNA genes, ribosomal protein genes, and ndh genes. While non-Selaginella species had an average of 29.7 tRNA genes, Selaginella species averaged 11.6 tRNA genes, i.e., less than half the number in non-Selaginella species. The S. stauntoniana and S. tamariscina plastomes each contained only six tRNA genes. There were also many losses of protein-coding genes, such as the loss or pseudogenization of most ndh genes in S. lepidophylla, Selaginella lyallii, Selaginella indica, S. stauntoniana, S. tamariscina, Selaginella vardei, and Selaginella bisulcata, as well as losses of many ribosomal subunit genes throughout the genus. Other genes lost in Selaginella species included genes related to lipid biosynthesis (accD), translation initiation (infA), and other miscellaneous functions (ycf4, ycf66, cemA, and matK).

Phylogenetic Analysis Shows Two Main Lineages with Dynamic Structural Variations
Phylogenetic analysis using the BEAST software separated the 19 lycophyte species into three major groups, including one group for non-Selaginella species and two lineages within the Selaginella genus, all divided by two main divergence events ( Figure 3). The first group (labeled 1 in Figure 3) consists of H. lucidula, H. serrata, and I. flaccida, which contain IRs. The Selaginella species were divided into two lineages. One lineage (labeled 2 in Figure 3) was divided into two subgroups, with one group containing S. lyallii, Selaginella kraussiana, and Selaginella remotifolia and the other containing S. indica, S. vardei, and S. lepidophylla. S. lepidophylla contains IRs, while the five other species contain DRs. Another lineage (labeled 2 in Figure 3) contains 10 Selaginella species without apparent subgrouping. Among these 10 species, eight contain DRs, whereas S. hainanensis and S. uncinata contain IRs.
As a result of the plastome structure comparison of all 19 species, species within the same lineages were shown to share similar structures ( Figure 4). In the first group, comprising non-Selaginella species, the plastomes had typical plastome structures, containing IRs. Selaginella species exhibited a different structure by a block inversion spanning from trnF in the LSC to trnN at the junction between IR B and the SSC. This inversion caused part of the LSC to be repositioned in the SSC region, resulting in the shortening of the LSC and the expansion of the SSC seen in Selaginella species. The block that took part in this inversion event contained one of the repeat regions, IR B , which ended up in the same orientation as the unaffected repeat region, thus creating a set of DRs in Selaginella species. The second inversion event occurred in both lineages independently (red diamonds in Figure 3). Because DR B was involved in these second inversions, DR B was converted back to IR B in species that underwent the second inversion event in each lineage, meaning S. lepidophylla in lineage 2 and S. hainanensis and S. uncinata in lineage 2 ( Figure 3). The coding sequence (CDS) of 45 shared genes were used in the phylogenetic analysis with BEAST software. The calibration node was set to 375 mya for the split of Isoetes and Selaginella species. Left, phylogenetic tree. The gray box labeled 1 highlights non-Selaginella species. Lineages 2 and 2 in the Selaginella genus are highlighted with yellow and green boxes, respectively. The red diamond labeled 1 represents the first inversion event, which differentiates non-Selaginella species from Selaginella species. Red diamonds labeled 2 and 2 represent the second inversion events, which occurred independently in each lineage and converted the direct repeat structures back into inverted repeats. Blue bars on the branch of the phylogenetic tree indicate height 95% highest posterior density (HPD) and numbers near the nodes represent median values. Right, corresponding gene contents for each species. Blue boxes indicate genes that are present, gray boxes indicate pseudogenes, and white boxes indicate missing genes. event contained one of the repeat regions, IRB, which ended up in the same orientation as the unaffected repeat region, thus creating a set of DRs in Selaginella species. The second inversion event occurred in both lineages independently (red diamonds in Figure 3). Because DRB was involved in these second inversions, DRB was converted back to IRB in species that underwent the second inversion event in each lineage, meaning S. lepidophylla in lineage 2 and S. hainanensis and S. uncinata in lineage 2′ ( Figure 3).

Plastome Diversity among Five S. tamariscina Collections Reflects the Geographical Diversity
We explored the intraspecific diversity within five S. tamariscina individuals collected from different regions: Two from China and three from Korea. The plastome size of S. tamariscina ranged from 126,365 to 126,700 bp. We compared the five plastome sequences by aligning them in a pairwise fashion. We examined the number of single-nucleotide polymorphisms (SNPs) and insertions/deletions (InDels) to analyze the intraspecies diversity in the S. tamariscina collections. The collections from regions within Korea showed relatively little variation, harboring up to 1 SNP and 15 InDels. However, a comparison of SNP and InDel counts in collections from Korea vs. China revealed much more variation. Plastomes of Korean collections showed approximately 1218 SNPs when compared to those of two Chinese collections: China2018 and China2019. Furthermore, between the two Chinese collections [33,35], 1246 SNPs and 401 InDels were detected ( Table 2). Among the variations, insertions and deletions were located within two genes related to chlorophyll biosynthesis (chlB and chlN) that induced frameshifts resulting in premature stop codons. While China2019 retained all three chl genes, the other four collections contained the remains of chlB and chlN as pseudogenes that are likely nonfunctional. A BEAST phylogenetic tree of five S. tamariscina individuals with S. stauntoniana, which is the closest relative, as the outgroup showed that SNU2014, SNU2018, Korea2020, and China2018 diverged from each other around 90 thousand years ago, but diverged from China2019 around 900 thousand years ago ( Figure A1).

Selaginella Plastomes Exhibit Many Intron Losses and GC Bias
We identified 19 genes that contain introns (13 protein-coding genes and 6 tRNA genes) in the plastomes of the 19 lycophytes (Table 3). Of the protein-coding genes, two-clpP and ycf3-contained two introns and the other 17 contained a single intron between two exons. Huperzia and Isoetes species retained introns in most of these genes. However, the numbers of introns in the Selaginella plastomes was highly reduced due to either the deletion of intron-containing genes or the absence of introns in specific Selaginella species. For instance, rps12, ycf66, and rps16 were not present in Selaginella species, and tRNA genes containing introns in other lycophytes were either absent or intron-less in Selaginella. The clpP gene in Huperzia and Isoetes species contained two introns: clpP-1 and clpP-2. clpP-1 was lost in species in lineage 2, but was retained in most species in lineage 2 , except for S. stauntoniana and S. tamariscina. ycf3 also contained two introns: ycf3-1 and ycf3-2. While ycf3-1 was present in all lycophyte species analyzed, ycf3-2 was not present in the species of Selaginella lineage 2. Analysis of the GC contents of the plastomes revealed that the Selaginella species had considerably higher GC ratios than non-Selaginella species ( Table 1). The GC contents of Selaginella species ranged from 50.75% for S. lyallii to 56.49% for S. remotifolia, whereas those of non-Selaginella species were significantly lower (ranging from 36.25% to 37.94%). There was no correlation between the orientation of the repeats and the GC content. Overall, Selaginella species lost more intron sequences compared to non-Selaginella lycophytes, and the intron pattern was shared in each lineage.

RNA Editing Is Commonly Found in Selaginella Plastomes
The annotation of the three plastomes was hindered by the frequent absence of authentic ATG start codons for protein-coding genes, as many genes contained ACG nucleotides instead of ATGs at their start sites. Stop codons were also missing in some genes. After a final manual curation of the gene annotations, we determined that 76.8%, 77.2%, and 53.3% of protein-coding genes in S. tamariscina, S. stauntoniana, and S. involvens, respectively, started with ACG instead of ATG.
To validate our hypothesis about RNA editing in these sequences, we conducted a transcriptome analysis of the S. tamariscina plastome. We mapped raw RNA sequencing data from S. tamariscina onto the complete plastome sequence of the same species to determine whether the potential RNA editing sites were indeed edited in the mRNA sequences. We calculated the ratios of reads aligned to the start and stop codons in the unedited (C nucleotide) vs. edited (T nucleotide) forms using CLC find variations (ver. 4.3.0). In the S. tamariscina plastome sequence, 43 of the 56 genes contained potential editing sites for start codons, and four genes contained potential editing sites for stop codons ( Figure 5). The coexistence of reads aligned to the same start and stop codons with either the edited or unedited codon indicated that these genes had undergone C-to-U RNA editing. Forty-two of the 43 potential editing sites were confirmed to have undergone C-to-U RNA editing. In contrast, there was a lower rate of potential RNA editing sites in stop codons, but three out of the four potentially edited sites were indeed edited. For start codons, most of the genes (except atpE, chlL, psbN, and rpoB) had a higher ratio of RNA-edited reads with the T nucleotide (switched from the U nucleotide during RNA sequencing) than unedited reads with the C nucleotide.       Overall, Selaginella species had a higher rate of abnormal start codons (ranging from 23.33% of the genes for S. lyallii to 77.19% in S. stauntoniana), while the Huperzia species had very low percentages (<10%), and I. flaccida was on the lower end of the range for Selaginella species (Figure 6). In regard to abnormal stop codons, Selaginella sanguinolenta had the highest percentage (41.79%), and the levels for most other Selaginella species were similar; the exceptions were S. lyallii, S. kraussiana, S. lepidophylla, S. stauntoniana, and S. tamariscina, with percentages < 10%. Some codons (marked as 'others') did not fall into either category, as they lacked authentic start or stop codons, but did not contain codons that were potential targets for RNA editing (Figure 6). A survey of start and stop codons in all 19 lycophyte plastid genes revealed a generally higher frequency of potential editing sites in start codons compared to stop codons ( Figure 6). The percentages of potential RNA editing sites in start codons were 29.27%, 8.14%, and 4.71% for I. flaccida, H. lucidula, and H. serrata, respectively, and among the Selaginella, they ranged from 23.33% for S. lyallii to 77.19% for S. stauntoniana. Within the genus, S. stauntoniana and S. tamariscina had the highest frequency of potential RNA editing sites (77.19% and 76.79%, respectively). In stop codons, the proportions of potential RNA editing sites ranged from 4.71% to 5.81% for H. serrata and H. lucidula, respectively, while those for Selaginella species ranged from 0% for S. kraussiana to 41.79% for S. sanguinolenta ( Figure 6).

Unique Features of Selaginella Plastomes
Overall, our results support previous findings that the plastome sequences of Selaginella species were smaller than those of non-Selaginella and typical land plants [34]. As of February 2019, 2364 complete plastome sequences of land plants were registered in NCBI, with an average size of 151,167 ± 34,672 bp. In contrast, the average size of the 16 Selaginella plastomes we assessed was 132,571 ± 11,612 bp. Although the canonical quadripartite plastid genomic structure was retained in these Selaginella species, directional change occurred in most Selaginella species with SSC expansion and LSC contraction. The relative length of the SSC as a proportion of the total plastome ranged from 24.96% (S. moellendorffii) to 42.65% (S. involvens), averaging 35.73%. These values are higher than those in non-Selaginella species (12.73-18.72%) ( Table 1), as well as those in eight selected monilophyte ferns (ranging from 11.7% in Psilotum nudum to 15.57% in Adiantum capillus-veneris, averaging 13.84%) [6]. The expansion of the SSC region in Selaginella species contrasts with the SSC contraction in Schizaeceae ferns, in which the SSC was reduced by as much as 2255 bp to contain only two genes in Schizaea elegans [9]. In the Geraniaceae family, the SSC has contracted in the genera Viviania, Hypseocharis, and Pelargonium, but expanded in the genera Melianthus, Francoa, and California [41]. The overall statistics revealed unique features of the Selaginella genus.

One Common and Two Independent Inversions Caused the Appearance of DR and IR Structures during Selaginella Species Evolution
Canonical plastomes are arranged in a quadripartite structure composed of an LSC and SSC region separated by two inverted repeats [37]. This is true for the plastomes of Isoetes and Huperzia. However, the Selaginella genus contains both inverted and direct repeats. A large inversion occurred just after the divergence of Selaginella from other lycophyte genera, which gave rise to DR B in the Selaginella plastomes. DRs have also been reported in S. tamariscina [33], S. vardei, and S. indica [36]. Zhang et al. compared I. flaccida with S. vardei, proposing that this inversion might have occurred between 142.5 and 281.5 mya during the Late Triassic [36], which is earlier than the 64.24 mya that was estimated in this study. Nevertheless, this inversion resulted in the distinctive DRs found in the plastomes of the Selaginella genus, except for the three species of S. lepidophylla, S. hainanensis, and S. uncinata. Further inversions involving DR B occurred independently in S. lepidophylla and in S. hainanensis and S. uncinata, in both cases restoring IR B and lengthened LSCs. S. lepidophylla split from its sister species S. indica and S. vardei approximately 25.64 mya. Therefore, the inversion in S. lepidophylla must not be older than 25.64 million years old. S. hainanensis and S. uncinata diverged from each other approximately 1.03 mya, and from their sister species S. bisulcata and S. pennata approximately 4.51 mya. Therefore, this inversion must have occurred between 4.51 and 1.03 mya. The grouping of Selaginella species is generally in agreement with the reported phylogeny of Selaginellaceae [20,21]. However, the estimated divergence times of Selaginella species are relatively younger than those from previous reports [35]. These differences could be due to the fact that divergence times estimated from previous studies were obtained by utilizing a few universal genes, such as rbcL [20], leading to possible underestimation or overestimation. Nonetheless, the inversion events were estimated to have occurred around 246 and 23 mya [35], which is similar to the time frame our results.
IRs are an interesting feature of circular plastomes. Although one missing IR has been reported in eudicots in Fabaceae [2,42] and in some species of Geraniaceae [41], directional changes rarely occur. Other than in Selaginella species, only one instance of DRs has been reported, in the red alga Porphyra purpurea [43]. IR and DR regions contain ribosomal RNA genes and a few tRNA genes, in duplicates. Cyanobacteria and other eubacteria have one copy. It is possible that in the ancestral plastid, these genes were present in duplicates in the same orientation, but were reversed in the plastids of extant species from algae to seed plants [44]. Aside from the high demand of ribosomal RNAs for efficient translation, the IR serves to stabilize the plastome, enabling mutations to be repaired by homologous recombination between repeats through homologous sequence synapsis [37]. The synapsis of two inverted repeats can lead to the formation of a dumbbell-shaped plastid structure, as confirmed in the plastids of the common bean [37]. Alternatively, pairing between DR sequences might impose structural constraints, rather than stabilizing the circular genome. Further studies are needed to confirm the circular plastome structures of Selaginella species.

Gene and Intron Losses
Along with dynamic structural rearrangement events, Selaginella species showed evidence of severe gene losses that resulted in their overall small plastomes. The most noticeable losses involved ndh genes, encoding for NADH dehydrogenase that function in electron transfer during photosynthesis. ndh genes were lost or pseudogenized in several Selaginella species. The ndh genes are also absent or pseudogenized in some species of green algae and almost all gymnosperms [39]. Rhulman et al. demonstrated that the loss of plastid ndh genes is compensated for by the nucleus-encoded ndh genes in gymnosperms [45], but there are no nuclear ndh genes in S. tamariscina [33]. Zhang et al. noted that Selaginella species adapted to dry environments have lost their ndh genes, suggesting that this gene loss might be related to their adaptation to these habitats [35]. The loss of ndh genes was recent, as revealed by the pseudogenization of all ndh genes in S. bisulcata, whereas its sister species S. pennata retained the complete set: These two species only diverged 3.61 mya (Figure 3). Moreover, homologous genes to the missing ribosomal protein genes in S. tamariscina were found in the nuclear genome [33]. Therefore, homologous genes in the nuclear genome may have replaced the functions of some of the genes missing from the plastomes of Selaginella species, which could also be true for other genes.
Sixty-one codons encode 20 amino acids, but only 37 plastid tRNAs have been recognized [39]. Selaginella might be the genus with the most reduced number of tRNA genes in eukaryotes, as we determined in the current study. To compensate for this apparent lack of tRNA genes, Wolf et al. have proposed that post-transcriptional editing alters anticodons to code for different tRNAs [46]. However, there is no evidence for RNA editing in tRNAs from S. uncinata [32]. The importation of tRNAs from the nucleus has been proposed for S. uncinata [30] and the non-photosynthetic parasitic angiosperm Epifagus virginiana [47]. Alternatively, the superwobbling theory has been put forward to explain the insufficient number of plastid tRNAs [48]. This "two out of three" mechanism could allow the reading of all four nucleotides in the third codon position so that a single tRNA gene could read the corresponding codons for one four-codon family [48,49]. We also support this super-wobbling theory due to the highly insufficient number of tRNAs in Selaginella plastomes, given the lack of clear evidence for nucleus-derived tRNA species in the plastid.
Introns have been attracting increasing attention due to the various functions that they might have exerted during eukaryotic evolution [50]. The introns in plastid genes are self-splicing, with most being group II introns [51], except for the intron in trnL-UAA [52]. Nineteen plastome introns have been recognized, including 13 in protein-coding genes and 6 in tRNA genes in plastomes from green algae to seed plants [39]. Most of these introns are present in Isoetes and Huperzia species, as can be found in monilophyte ferns [6], but their numbers have been reduced by approximately half in Selaginella species. The deletion of intron-containing genes appears to account for the reduced number of introns in the Selaginella plastomes. For instance, rps12, ycf66, rps16, and many tRNA genes containing introns were not present in the Selaginella plastomes. Moreover, we observed a Selaginella genus-specific absence of clpP gene introns and the second intron of the ycf3 gene.

Intraspecies Diversity
There was a wide range of intraspecies diversity within S. tamariscina individuals collected in different regions of Korea and China. Unlike the little variation detected among Korean collections, Chinese collections were very different at the nucleotide level ( Table 2). Considering that the rate of intraspecific variation is lower in the plastome than in nuclear DNA due to the typical uniparental mode of inheritance [53], the variation detected within S. tamariscina individuals was very high, such as a Chinese collection-China 2019-which showed a divergence time of approximately 0.9 MYA with other S. tamariscina collections in China and Korea ( Figure A1). Intraspecific variation may reflect the adaptation of plants to changing environmental conditions [54,55], perhaps explaining the high level of intraspecific diversity within the S. tamariscina collections from different regions. The morphological classification of these small primitive plants may be difficult. Therefore, with the availability of additional characteristics derived from morphological keys and molecular data, species nomenclature could become more specific.

A High GC Content and Abundant RNA Editing
Unlike nuclear genomes, organellar genomes are AT biased, with an average GC content of 36.45 ± 2.81% and 36.87 ± 8.23% detected in 11,542 mitochondrial genomes and plastomes, respectively [39]. Therefore, the GC contents of >50% observed in Selaginella plastomes are higher than those of other eukaryotic organellar genomes. AT-mutation pressure or AT-biased gene conversion (or both) is thought to have driven most plastomes to become AT biased [56,57]. Based on the nucleotide compositions of the four-fold degenerate sites in protein-coding and noncoding regions in the S. uncinata and S. moellendorffii plastomes, Smith proposed that unbiased mutation/gene conversion equilibrated the nucleotide composition to~50% GC contents in Selaginella plastomes. Alternatively, he suggested that RNA editing has influenced the GC contents in these plastomes, as numerous GC-rich codons were changed into AT-rich codons through RNA editing [31].
Selaginella plastomes are subject to extensive RNA editing. C-to-U RNA editing is a post-transcriptional modification mechanism that grants gene diversity. There are 30-50 in plant plastomes, but S. uncinata contains 3415 RNA editing sites [32]. According to our results, there are numerous potential RNA editing sites in Selaginella plastomes. The frequency of RNA editing at the start codons of plastid genes was >50% in some Selaginella species. Furthermore, S. tamariscina RNA sequencing reads confirmed these editing sites. This high number of editing sites coincides with the exceptionally high number of pentatricopeptide repeat (PPR) protein gene families (>800) in the S. moellendorffii nuclear genome, representing proteins essential for plastid RNA editing [17,58]. Hecht et al. identified 2139 C-to-U RNA editing sites in the mitochondrial genome of S. moellendorffii [59]. Grewe et al. detected 1782 RNA editing sites in the mitochondrial genome and identified U-to-C editing in Isoetes engelmannii [60], indicating that the organelle genomes of lycophytes contain many RNA editing sites. If all potential sites were checked in S. tamariscina plastomes, the number of total events observed could be even greater than that in S. uncinata [32].
The complete plastome sequences of 13 Selaginella species and three non-Selaginella lycophytes were obtained from NCBI ( Table 1)

DNA Extraction, Sequencing, Plastome Assembly, and Annotation
Leaves were collected from the plants and ground in liquid nitrogen with a mortar and pestle. A modified cetyltrimethylammonium bromide (CTAB) method [61] was used to extract total genomic DNA from S. tamariscina (SNU2014), S. stauntoniana, and S. involvens. The DNA quality and concentration were assessed by gel agarose electrophoresis and UVspectrophotometry (NanoDrop ND-1000, Thermo Fisher Scientific, Waltham, MA, USA). The extracted DNA was sequenced on the Illumina NextSeq platform, generating 7.68, 2.2, and 2.0 Gb of raw data from S. tamariscina (SNU2014), S. stauntoniana, and S. involvens, respectively.
The plastomes of the three species were assembled using the de novo assembly of Low-Coverage Whole genome sequence (dnaLCW) approach with the CLC genome assembler program (ver. 4.6 beta, CLC Inc., Aarhus, Denmark) [62,63]. In short, raw paired-end reads were trimmed with an offset value of 33, and trimmed reads were assembled with overlap distances set to 150 to 500 bp and the window size set to 32. Initial contigs were extracted from the assembled reads using MUMmer [64], with the reference sequence of a related species as a guide.
For the second S. tamariscina individual (SNU2018), total genomic DNA was extracted using an Exgene Plant SV midi kit (GeneAll Biotechnology, Seoul, Korea). The DNA quality and concentration were assessed via agarose gel electrophoresis and UVspectrophotometry. The extracted DNA was sequenced on a MinION sequencer from Oxford Nanopore Technologies. Library preparation was performed with a Ligation Sequencing Kit (SQK-LSK108) following the manufacturer's instructions. One flow cell was run for 48 h. In total, 10 Gb of raw sequencing reads was filtered by removing adaptor sequences using Porechop (v0.2.0, https://github.com/rrwick/Porechop) and corrected with Canu using default parameters [65]. The Canu-corrected reads were assembled using SMARTdenovo (https://github.com/ruanjue/smartdenovo). The complete plastome sequence assembled using Illumina sequencing data was used as a reference to extract plastome-related contigs assembled by Nanopore sequencing using BLASTN.
The plastomes assembled in the previous steps were annotated using GeSeq (https: //chlorobox.mpimp-golm.mpg.de/geseq.html) with complete plastome references of other species in the Selaginella genus and non-Selaginella species registered in NCBI. The annotated sequences were manually curated for details such as start and stop codons and correct coding frames using Artemis [66] For further validation of annotation, complete CDS sequences of reference sequences were collected and converted into amino acid sequences, and searched in the three assembled sequences using TBLASTN (https://blast.ncbi.nlm.nih. gov/Blast.cgi?PROGRAM=tblastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome) for sequence homology. Furthermore, the aligned sequence was searched in the Conserved domain search provided by NCBI for complete gene structures. Genes that have sequence homology detected by TBLASTN but have incomplete gene structures due to the lack of sequences of internal stop codons were assigned as pseudogenes. Genes that were not searched at all through TBLASTN were marked as missing genes. This validation step for the pseudogene/missing gene assignment was conducted for other already published sequences used in this study.

Confirmation of Direct Repeat Structures by PCR Amplification
To confirm the direct repeat structure of S. tamariscina, primers were designed based on the junction regions between the repeat regions and the two single copies (LSC and SSC). Primers were used in different combinations to amplify the junction sites of the plastome. Hypothetical direct repeat junctions were amplified by primer combinations 1/2, 3/4, 5/2, and 3/6, while hypothetical inverted repeat junctions were amplified by different primer combinations of 1/3 and 2/4 for the first repeat block (IR B ) (Table A1).
PCR amplification was performed in a 25 µL volume containing 1U Taq polymerase, 2.5 µL of 10× reaction buffer, 0.2 mM dNTPs (Inclone Biotech, Yongin, Korea), 10 ng genomic DNA, and 10 pmol of each primer. The PCR was carried out in a thermocycler with the following parameters: 5 min at 94 • C; 28 cycles of 20 s at 94 • C, 20 s at 62 • C, and 20 s at 72 • C; and 7 min at 72 • C for the final extension step. The PCR products were visualized by electrophoresis using a 1% agarose gel with a 100 bp ladder.

Gene Contents and Phylogenetic Analysis
CDS, tRNA, and rRNA sequences for all 19 species of interest were extracted using FeatureExtract 1.2 L (http://www.cbs.dtu.dk/services/FeatureExtract/). Existing genes and pseudogenes were marked. Among the extracted CDS, 45 genes that were common to all 19 species were combined into one long fasta sequence for each species. These sequences were aligned using MAFFT (https://mafft.cbrc.jp/alignment/server/). The aligned sequences were trimmed using Gblocks [67] so that conserved blocks from the aligned sequences could be extracted for more accurate phylogenetic analysis. The aligned file was used for phylogenetic analysis performed using BEAST version 2.5.2. The general Time-Reversible (GTR) substitution model and a gamma category count of 4 were used for the Gamma Site Model. A random local clock and Yule tree prior were applied. A normal distribution root calibration of 375 mya representing the split between Isoetes and Selaginella families reported by Wikstrom and Kenrick [68] was used. The chain length was set to 1 million generations with a sampling frequency of 1000. The BEAST analysis was repeated four times, and the results were checked for the effective sample size (ESS) via Tracer v1.7.1. Results from all four repetitions were combined with LogCombiner implemented in the BEAST software. Moreover, a maximum clade credibility tree was drawn utilizing TreeAnnotator with a burnin percentage of 15. The final tree was visualized with FigTree v1.4.4, marking the height 95% HPD with blue node bars and node median values near the node bars.

Analysis of Intron Variation and RNA Editing Sites
Total cellular RNAs were prepared from hydrated and 50% dehydrated leaf tissues with the RNeasy Plant Mini Kit (Qiagen, Hilden, Germany) by following the manufacturer's instructions. The RNA quality and concentration were assessed using the Agilent Bioanalyzer (Agilent Technology, Santa Clara, CA, USA) and Nanodrop spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) with parameters RIN ≥ 7, 28S:18S > 1, and OD260/280 ≥ 2.
Genes in the plastomes of all 19 species were surveyed to check for the presence of intron regions. Potential C-to-U RNA editing sites were surveyed in the start and stop codons of the CDS from the plastomes of ten lycophyte species. Start codons with an ACG sequence instead of ATG and stop codons with CAA, CAG, and CGA sequences instead of TAA, TAG, and TGA were counted as potential RNA editing sites. Potential RNA editing sites were marked for all ten species and analyzed.
To determine whether the potential RNA editing sites had gone through the editing process, raw RNA sequencing data from S. tamariscina_2 were mapped onto the reference genome produced from the assembly of S. tamariscina (SNU2014) sequencing data. The start and stop codon positions of the CDS were obtained, and CLC find variations software (ver. 4.3.0) was used to examine the distribution of unedited mRNA reads with the C nucleotide and edited mRNA reads with the T nucleotide (previously U in RNA that changed to T during the process of RNA sequencing).

Data Availability Statement:
The data presented in this study are openly available in the National Center for Biotechnology Information under the accession numbers presented in Table 1.

Conflicts of Interest:
The authors declare no conflict of interest.     Table A1. Primer sequences used in combination to confirm the direct repeat structure of the Selaginella tamariscina plastid genome.