The First Complete Mitochondrial Genome of Lachninae Species and Comparative Genomics Provide New Insights into the Evolution of Gene Rearrangement and the Repeat Region

Simple Summary The Lachninae is a unique aphid lineage with some particular characteristics. The first complete mitochondrial genome from this subfamily is reported here, with Stomaphis sinisalicis as a representative. The gene arrangement of the new mitogenome differs greatly from the normal gene order of other aphids and insects. The existence of a tandem repeat region in the Lachninae mitogenome may suggest it is probably an ancient feature of aphid mitogenomes. Phylogenetic analysis indicates a basal position of Lachninae within Aphididae. Abstract Complete mitochondrial genomes are valuable resources for different research fields such as genomics, molecular evolution and phylogenetics. The subfamily Lachninae represents one of the most ancient evolutionary lineages of aphids. To date, however, no complete Lachninae mitogenome is available in public databases. Here we report the Stomaphis sinisalicis mitogenome, representing the first complete mitogenome of Lachninae. The S. sinisalicis mitogenome is consist of 13 protein-coding genes (PCGs), two rRNA genes (rRNAs), 22 tRNA genes (tRNAs), a control region and a large tandem repeat region. Strikingly, the mitogenome exhibits a novel, highly rearranged gene order between trnE and nad1 compared with that of other aphids. The presence of repeat region in the basal Lachninae may further indicate it is probably an ancestral feature of aphid mitogenomes. Collectively, this study provides new insights on mitogenome evolution and valuable data for future comparative studies across different insect lineages.


Introduction
Complete mitochondrial genomes of insects are highly conserved circular molecules, which are valuable resources for investigating the molecular evolution, phylogenetic relationships and gene arrangement among different lineages, due to their rare recombination, rapid evolutionary rate and evolutionary conservation in organization [1][2][3]. The double-stranded molecules are generally composed of 13 protein coding genes (PCGs), two ribosomal genes (rRNAs), 22 tRNA genes (tRNAs) and a large non-coding control region (also called AT-rich region). In some taxa, such as some hemipteran aphid species, however, there is an additional unique non-coding tandem repeat region comprising different repeat units and copy numbers [4][5][6][7]. The gene order within insect mitogenomes is basically pretty conservative and identical to that of proposed Insecta ancestral gene arrangement pattern, with some exceptions of gene rearrangements involving tRNAs, rRNAs or PCGs [1,[8][9][10][11][12]. Hemipteroids are one of the most distinct examples having a highly rearrangement frequency among insects, including species from orders Phthiraptera, Psocoptera, and Thysanoptera [1,13]. High rearrangement rates are also found in some Hemipterans including planthoppers (Delphacida), bugs (Aradidae, Enicocephali-dae, Largidae, Pyrrhocoridae), whiteflies (Aleyrodidae), and more recently, scale insects (Coccoidea) [1,[14][15][16][17][18]. While very few gene rearrangements are found in other hemipteran taxa such as aphids, those gene arrangement patterns remain to be confirmed by investigating more taxa. Mitogenome arrangement data have been supposed to provide useful information for investigating evolutionary relationships and insect evolutionary history because gene rearrangement of typical 37 genes in mitogenome is extremely uncommon and unlikely to replicate through convergent evolution [2,3]. With the growing availability of complete mitogenomes of insects, comprehensive comparative studies across different lineages should promote our understanding for their molecular evolution and phylogenetic relationships. However, mitogenomes of some insect taxa are still underrepresented.
Aphids are an important and diverse group within Hemiptera that includes many plant pests with complex life cycles and a large number of host plants. In the Aphididae, more than 5000 aphid species from 24 subfamilies have been described [19]. Of the thousands of extant aphid species, complete mitochondrial genomes are known for only 43 species from six subfamilies (Table S1). For some major aphid lineages, not even a single complete mitochondrial genome has been reported. The limited mitogenomic resources and small sampling has hampered widely comparative studies for the mitogenome sequences, gene arrangements and molecular evolution among aphids. Moreover, an extensive comparative analysis among aphids will favor gaining insights into the evolution of some unique characteristics for aphid mitogenomes, such as the presence of a large tandem repeat region. The unique tandem repeat region in the aphid mitogenomes is generally locating between the trnE and trnF but differ largely in length and copy number of repeat units [5,7]. This repeat region was first reported in several species from the subfamily Aphidinae and was once considered to be a lineage-specific feature which occurred from independent evolutionary events [20,21], whereas several species from other subfamilies have subsequently been found to contain a repeat region as well. So far, 18 aphid species from four subfamilies (Aphidinae, Eriosomatinae, Greenideinae and Hormaphidinae) have been reported to contain this tandem repeat region [4,[22][23][24]. It was proposed that the repeat region in aphid mitogenomes either originated from the recent common ancestor of aphids followed by subsequent loss events throughout aphid diversification or independently evolved among different aphid lineages [5,6,25]. The former hypothesis has been increasingly accepted recently as more species from other subfamilies have been found to contain this region, and high sequence similarity of this region among species has been discovered [25]. However, there has been no data from a more ancient aphid lineage with repeat region to support this hypothesis, which will help to better understand the origin and evolution of repeat region in aphid mitogenomes.
The Lachninae represent an important and unusual aphid lineage with complicated host associations, encompassing distinct species feeding on both angiosperm and gymnosperm hosts [26]. This subfamily has been supposed to be one of the most ancient clades in the Aphididae according to several recent phylogenetic studies [27,28]. Up to now, however, no complete mitochondrial genome from Lachninae has been deposited in the GenBank (http://www.ncbi.nlm.nih.gov; accessed on October 19, 2020). Stomaphis sinisalicis is one representative species of the aphid genus Stomaphis, which represent a unique evolutionary lineage in the aphid subfamily Lachninae and are known for their large body size and particularly long rostrum (see the aphid in Figure 1) [29,30]. S. sinisalicis is mainly distributed in northern China with temperate climate, including Beijing, Hebei, Liaoning, Ningxia and Shandong. This species inhabits and feeds on the base of tree trunk or crevices of thick bark of Salix spp. and Populus sp., and generally establishes close mutualistic relationship with ants [29]. In this study, we generated and analyzed the first complete mitochondrial genome for S. sinisalicis. The structural organization, nucleotide composition, codon usage and gene order arrangement were analyzed and compared against that of well characterized complete mitogenomes of other aphids to investigate the unique features and mitogenome evolution of aphids. This complete mitogenome reported here contributes to aphid higher phylogeny reconstruction based on mitogenome sequences and promotes future comparative mitogenome studies, which should help understand the mitogenome evolution and gene arrangement pattern across different aphid lineages.

Sample Preparation and DNA Extraction
The S. sinisalicis samples used in this study were collected on Salix babylonica from Beijing Olympic Forest Park, China (40°1′30.8″ N, 116°23′48.4″ E), in June 2018. All collected samples and voucher specimen (20180615-12) were preserved in 95% ethanol under In this study, we generated and analyzed the first complete mitochondrial genome for S. sinisalicis. The structural organization, nucleotide composition, codon usage and gene order arrangement were analyzed and compared against that of well characterized complete mitogenomes of other aphids to investigate the unique features and mitogenome evolution of aphids. This complete mitogenome reported here contributes to aphid higher phylogeny reconstruction based on mitogenome sequences and promotes future comparative mitogenome studies, which should help understand the mitogenome evolution and gene arrangement pattern across different aphid lineages.

Sample Preparation and DNA Extraction
The S. sinisalicis samples used in this study were collected on Salix babylonica from Beijing Olympic Forest Park, China ( identified based on previous morphological descriptions of this species by the corresponding author.

Mitogenome Sequencing and Assembly
The next-generation sequencing was performed at the Biomarker Technologies Co., Ltd. (Beijing, China). For library construction, genomic DNA concentration was detected using ExKubit dsDNA HS test kit (ExCell Biotech Co., Ltd., Shanghai, China) with a Qubit fluorimeter (Invitrogen, Carlsbad, CA, USA). The qualified high-quality genomic DNA was used for next-generation sequencing subsequently. For library construction, TrueLib DNA Library Rapid Prep Kit for Illumina (ExCell Biotech Co., Ltd.) was used with 250 ng DNA as template, then sequencing was conducted using the NovaSeq 6000 System (Illumina, San Diego, CA, USA) (2 × 150 bp paired-end reads) with an insert size of 350 bp. Finally, approximately 10.4 Gb raw data was obtained and further filtered to remove adapters and low-quality reads, including reads with an N base content reached 10% of the read length and those with low quality bases (quality value ≤ 10) over 50% of the read length. Mitogenome was assembled using a total of 7.1 Gb clean data (clean reads: 23,773,262) by NovoPlasty v. 2.7.1 [31] with default parameters, using a COI sequence from S. sinisalicis as the seed sequence. The final genome assembly showed a high coverage and depth ( Figure S1).

Annotation and Analysis of S. sinisalicis Mitogenome
The assembled mitogenome was first annotated using MITOS2 web server (http: //mitos2.bioinf.uni-leipzig.de/index.py) [32] with the invertebrate mitochondrial genetic code. The boundaries of 13 PCGs and two rRNA genes were further determined by alignment with the known corresponding sequences of closely related aphid species. All tRNA genes in S. sinisalicis were identified using the MITOS2 web server and program ARWEN v. 1.2 [33], and their clover-leaf secondary structure were further drawn by RNAplot integrated in the ViennaRNA Package [34]. Tandem repeat region was detected by using the Tandem Repeats Finder Web server (http://tandem.bu.edu/trf/trf.html) [35]. The AT-rich control region was identified via boundaries of adjacent genes. Circular maps of the assembled mitogenome of S. sinisalicis was finally drawn with CGView Server [36]. MEGA7 software [37] was used to analyze the nucleotide composition and relative synonymous codon usage (RSCU). To better understand the base usage bias of the mitogenome, we also calculated the nucleotide skew values using the formulae AT-skew = (A − T)/(A + T) and GC-skew = (G − C)/(G + C) [38].

Phylogenetic Analysis
For phylogenetic analysis, all 13 PCG sequences from S. sinisalicis were collated to that of 24 available mitogenomes of representative species from different subfamilies of Aphididae and that of Adelges tsugae (Adelgidae) as ourgroup ( Table 1). All PCG sequences (excluding the stop codons) of these 26 aphid species was aligned individually using codonbased multiple alignments by MAFFT v7.149 software [39] with default settings. After removing ambiguously aligned sites, alignments of 13 aligned PCGs were concatenated into a single alignment dataset using PhyloSuite v1.1.15 [40]. The best-fit substitution model was determined using the ModelFinder [41] integrated in IQ-TREE [42], and GTR+F+R4 was inferred as the optimal model according to the Bayesian Information Criterion (BIC) and was then used for subsequent phylogenetic analysis. Maximum-likelihood (ML) phylogenetic tree was reconstructed with the A. tsugae as the outgroup using IQ-TREE with 1000 ultrafast bootstrapping replicates. We also repeated the phylogenetic reconstruction with the dataset partitioned by each PCG, which yielded phylogenetic tree with identical topological relationships ( Figure S2). Note: Only aphid species with complete mitogenomes and annotation information are listed; "NA" indicates sequences that are direct submission to GenBank.

Comparative Mitogenomic Analysis
The base composition, gene length, and genomic organization of all available aphid complete mitogenomes were collected and calculated for comparative genomic analysis. Furthermore, the online Interactive Tree of Life (iTOL) v 4 tool [61] was used for gene order visualization of known aphid mitogenomes, with the gene arrangement pattern mapping onto an inferred phylogenetic tree. We also detected the putative gene rearrangement events occurred in mitogenomes of each aphid species using the web-based program Common interval Rearrangement Explorer (CREx) [62] with arrangement of Drosophila yakuba as the inferred ancestral gene order [12,63]. And the genome rearrangement scenarios including transpositions, reverse transpositions, reversals, and tandem duplication random loss (TDRL) were reconstructed between D. yakuba and each aphid.

Mitogenome Features of S. sinisalicis and Other Aphids
Known complete mitogenomes of aphids range from 15,049 (Appendiseta robiniae) to 17,954 bp (Aphis glycines) ( Table 1). The mitochondrial genome of S. sinisalicis was 17,109 bp in length (the circular map is shown in Figure 1), longer than most other known aphid mitogenomes ( Table 1). As in the case in other insects, the S. sinisalicis mitogenome contained the typical 13 PCGs, two rRNAs and 22 tRNAs, in which 14 tRNAs and 9 PCGs were transcribed on the majority strand (J-strand), while the others (eight tRNAs, two rRNAs and four PCGs) were located on the minority strand (N-strand) ( Table 2). Besides, the S. sinisalicis mitogenome also contained a typical AT-rich control region locating between rrnS and trnI, and a large non-coding tandem repeat region. For aphid mitogenomes, the sizes of control region and repeat region vary greatly among different species, whereas the PCGs, tRNAs and two rRNAs show little variation in length (Figure 2a), suggesting that the mitogenome size of different aphids is largely determined by the size of non-coding regions.  The S. sinisalicis mitogenome contained an obviously higher A+T content (84.8%) and a lower G+C content (15.2%). These values are comparable to that of other aphid species, which have an average A+T content of 84.0%, varying from 81.7% (A. robiniae) to 85.7% (Greenidea ficicola) (Table S1). This A+T bias in S. sinisalicis mitogenome was reflected in all components of its genome. The A+T content of control region and repeat region were 91.5% and 85.9%, and the PCGs, tRNAs and rRNAs had an average A+T content of 83.5%, 86.4% and 85.1%, respectively (Table 3). In aphids, the A+T content varies greatly for the repeat region and control region, but a small variation for PCGs, tRNAs and two rRNAs (Figure 2b), indicating that higher A+T contents in the control region and repeat region probably raise the total A+T content of aphid mitogenomes. Eleven gene overlaps with a total of 43 bp were present in S. sinisalicis mitogenome, ranging from 1 to 20 bp, with the longest overlap (20 bp) occurring between atp6 and atp8 ( Table 2). The overlap between atp8-atp6 is also found in mitogenomes of other arthropods [7,64]. Moreover, there were also 19 intergenic spacers between genes, with the longest intergenic spacer (50 bp) existing between nad5 and trnH. Longer intergenic regions were also found in the junctions between trnE-trnT (34 bp) and trnQ-trnM (35 bp) ( Table 2).
The S. sinisalicis mitogenome contained an obviously higher A+T content (84.8%) and a lower G+C content (15.2%). These values are comparable to that of other aphid species, which have an average A+T content of 84.0%, varying from 81.7% (A. robiniae) to 85.7% (Greenidea ficicola) (Table S1). This A+T bias in S. sinisalicis mitogenome was reflected in all components of its genome. The A+T content of control region and repeat region were 91.5% and 85.9%, and the PCGs, tRNAs and rRNAs had an average A+T content of 83.5%, 86.4% and 85.1%, respectively (Table 3). In aphids, the A+T content varies greatly for the repeat region and control region, but a small variation for PCGs, tRNAs and two rRNAs (Figure 2b), indicating that higher A+T contents in the control region and repeat region probably raise the total A+T content of aphid mitogenomes.
The nucleotide skew analysis showed that the whole S. sinisalicis mitogenome exhibit a positive AT-skew (0.044) and a negative GC-skew (−0.305). A similar pattern has been found in other aphid mitogenomes [4], with the AT-skew ranging from 0.054 in Paracolopha morrisoni to 0.102 in Hamamelistes spinosus and a GC-skew varying from −0.317 in Hormaphis betulae to −0.227 in Cervaphis quercus. These results indicate that S. sinisalicis mitogenome has a weakest A skew and a stronger C skew in compared to other known aphid mitogenomes (Figure 3a,b). This base composition bias has been suggested to be associated with replication and transcription of mitochondrial genome [65].  The nucleotide skew analysis showed that the whole S. sinisalicis mitogenome exhibit a positive AT-skew (0.044) and a negative GC-skew (−0.305). A similar pattern has been found in other aphid mitogenomes [4], with the AT-skew ranging from 0.054 in Paracolopha morrisoni to 0.102 in Hamamelistes spinosus and a GC-skew varying from −0.317 in Hormaphis betulae to −0.227 in Cervaphis quercus. These results indicate that S. sinisalicis mitogenome has a weakest A skew and a stronger C skew in compared to other known aphid mitogenomes (Figure 3a,b). This base composition bias has been suggested to be associated with replication and transcription of mitochondrial genome [65].  Table S1 for details of aphid species.

Gene Arrangement Patterns
Gene arrangement of mitogenomes has been supposed to be quite conservative among insects and identical to that of the ancestral pancrustacean mitogenomes [1,12,66]. Various gene rearrangements have been observed in some insect taxa, including major rearrangement involving the PCGs and rRNAs, as well as minor rearrangement involving only tRNAs [1]. Mitochondrial genome rearrangement is rarely reported among aphids, and most of them have a gene order resembling the ancestral insect arrangement. However, in S. sinisalicis mitogenome, the gene order between trnE and nad1 was highly rearranged, with the order of gene cluster trnF-nad5-trnH-nad4-nad4l-trnT-trnP-nad6-cob-trnS2 shifting into the trnT-nad6-cob-trnS2-trnF-nad5-trnH-nad4-nad4l-trnP (Figure 4). This gene arrangement pattern differed greatly from that of other aphid species, which has been  Table S1 for details of aphid species.

Gene Arrangement Patterns
Gene arrangement of mitogenomes has been supposed to be quite conservative among insects and identical to that of the ancestral pancrustacean mitogenomes [1,12,66]. Various gene rearrangements have been observed in some insect taxa, including major rearrangement involving the PCGs and rRNAs, as well as minor rearrangement involving only tRNAs [1]. Mitochondrial genome rearrangement is rarely reported among aphids, and most of them have a gene order resembling the ancestral insect arrangement. However, in S. sinisalicis mitogenome, the gene order between trnE and nad1 was highly rearranged, with the order of gene cluster trnF-nad5-trnH-nad4-nad4l-trnT-trnP-nad6-cob-trnS2 shifting into the trnT-nad6-cob-trnS2-trnF-nad5-trnH-nad4-nad4l-trnP (Figure 4). This gene arrangement pattern differed greatly from that of other aphid species, which has been thought highly conserved within Hemipteran insects and similar to the ancestral gene order [4,15,63].  Rearrangement of mitogenomes can generally be classified into several gene movements: transposition, inversion, and inverse transposition [66]. Several models have been proposed to explain the rearrangement of mitogenome genes, such as the tandem duplication random loss (TDRL) model [67,68], tandem duplication/non-random loss (TDNL) model [69], recombination model [70] and tRNA mispriming model [71]. According to the results of CREx analysis, the rearrangement event occurred in S. sinisalicis mitogenome can be better explained by the widely accepted tandem duplication random loss (TDRL) model [67,68]. Based on this model, the gene cluster locating trnE and nad1 in S. sinisalicis mitogenome can be separated into four gene blocks: trnF-nad5-trnH-nad4-nad4l, trnT, trnP Rearrangement of mitogenomes can generally be classified into several gene movements: transposition, inversion, and inverse transposition [66]. Several models have been proposed to explain the rearrangement of mitogenome genes, such as the tandem duplication random loss (TDRL) model [67,68], tandem duplication/non-random loss (TDNL) model [69], recombination model [70] and tRNA mispriming model [71]. According to the results of CREx analysis, the rearrangement event occurred in S. sinisalicis mitogenome can be better explained by the widely accepted tandem duplication random loss (TDRL) model [67,68]. Based on this model, the gene cluster locating trnE and nad1 in S. sinisalicis mitogenome can be separated into four gene blocks: trnF-nad5-trnH-nad4-nad4l, trnT, trnP and nad6-cob-trnS2, and the present gene arrangement pattern may be derived from tandem duplications and random losses of these gene blocks. The gene order observed in S. sinisalicis mitogenome represents a new arrangement pattern that has never been discovered in any other aphid species, which may have important implications for understanding the evolution of aphid mitogenomes. It's unclear whether this rearrangement is a mitochondrial signature for species of the subfamily Lachninae or an independent evolutionary event only for S. sinisalicis. A broader sampling of this subfamily will be needed to further investigate whether this new rearrangement pattern also occurs in other species.
Most aphids of other subfamilies possess a conserved gene order identical to that of ancestral insects. There are some exceptions with mostly involving the duplications, transpositions, reverse transpositions or reversals of some tRNA genes, such as several species from Fordini (Eriosomatinae) and Aphidinae (Figure 4). Notably, all species within the Fordini exhibit a transposition of trnQ and trnM except the Baizongia pistaciae, a basal species within Fordini. Whereas gene order of two Eriosomatini (Eriosomatinae) species is arranged in the same order as the ancestral insect. These indicate that the most recent common ancestor of Fordini aphids may have a typical arrangement of trnQ-trnM, which may be rearranged through an independent evolutionary event that occurred during subsequent species diversification of this tribe. In addition to the cases involving tRNAs, two rRNA gene reversals are found in Aphis fabae mordvilkoi from Aphidinae. Besides, a more prominent rearrangement including five tRNAs and one PCG is observed in Pseudoregma bambucicola (Hormaphidinae), with transpositions occurred within the gene cluster trnG-nad3-trnA-trnR-trnN-trnS1 resulting in a novel arrangement pattern of trnA-trnN-trnS1-trnR-trnG-nad3, while the other two Hormaphidinae species have an ancestral gene order as most other aphids, suggesting that gene rearrangement in P. bambucicola seems to occur independently.

Protein Coding Genes and Codon Usage Patterns
The overall size (excluding stop codons) of 13 PCGs of S. sinisalicis was 10,923 bp, accounting for 63.8% of the total genome (Table 3). All PCGs showed a high A+T bias, ranging from 76.3% (cox1) to 91.2% (atp8). The AT-skew (−0.156) and GC-skew (−0.067) were both negative for the whole PCGs, reflecting a bias towards nucleotides T and C than their counterparts. The third codon position presented a much higher A+T content than that of the first and second codon positions, and tended to use T and C bases ( Table 3). All protein-coding genes were initiated with the typical ATN codons, and terminated with TAA, except the cox1 and nad4 terminating with a truncated termination codon T, which is common in other metazoan mitogenomes [4,72]. These incomplete stop codons have been supposed to be completed through post-transcriptional polyadenylation [73].
The codon usage analysis showed that the most frequently used codons were UUA-Leu2 (13.4%), AUU-Ile (13.2%) and UUU-Phe (12.7%), while the codons CUG-Leu1, UCG-Ser2, CCG-Pro and GCG-Ala were not used in the mitogenome ( Figure 5, Table S2). The UUA (Leu2) also had the highest RSCU values, further indicating that UUA was the most preferred codon. The RSCU values of the PCGs revealed that there was a higher frequency in the usage of AT than that of GC in the third codon positions (Table S2).  The rrnL (1278 bp) and rrnS (777 bp) genes were located between trnL1 and trnV, trnV and the control region, respectively. There was a little variation in the size of both rRNAs among different aphid species (Figure 2a). The overall rRNAs of S. sinisalicis mitogenome showed a negative AT-skew (−0.007) and a positive GC-skew (0.340), indicating a preference for using T base over A, and G over C ( Table 3). The overall length and A+T content were rather conservative and identical to that of other aphid species.

Non-Protein Coding Regions
The AT-rich control region of S. sinisalicis mitogenome located between rrnS and trnI, spanning 919 bp in length. This is comparable to that of most other aphids, which have an average length of 867 bp (Table S1). Generally, the length of control region varies greatly with the size of embedded tandem repeats, which was not detected in the S. sinisalicis mitogenome. The control region is supposed to be involved in the initiation of replication and transcription of mitogenomes [76]. The control region in S. sinisalicis mitogenome had a higher A+T content (95.07%) compared to most aphid mitogenomes, which is varying from 81.75% (Kaburagia rhusicola rhusicola) to 95.07% (P. bambucicola). Both the AT-skew (−0.023) and GC-skew (−0.135) were negative, showing a bias towards using T and C ( Table 3).
In addition to the control region, a large non-coding repeat region (1489 bp) comprising five tandem repeat units ranging from 262 bp to 264 bp and a 171 bp partial repeat unit was also present in the S. sinisalicis mitogenome. Such special tandem repeats have been found in some aphid species from different subfamilies, and vary widely in size and number of repeat units, leading to varied lengths of repeat regions, ranging from 452 bp

Transfer and Ribosomal RNA Genes
The complete set of 22 typical tRNAs were all found in S. sinisalicis mitogenome, their secondary structures were shown in the Figure 6. The total length of tRNAs was 1478 bp, ranging from 60 bp to 75 bp in size. Both the AT-skew value (0.025) and GC-skew (0.170) were positively biased in the tRNAs, which showed a slight bias toward using A and an obvious bias toward G (Table 3). Most tRNAs exhibited the typical clover-leaf structures with two exceptions: trnS1 missing the DHU arm and trnD without the TΨC stem. Lacking of DHU arm in trnS1 is a common feature for most metazoan mitogenomes [1]. This aberrant tRNAs are supposed to sustain their function via a posttranscriptional RNA editing mechanism [74,75].
The rrnL (1278 bp) and rrnS (777 bp) genes were located between trnL1 and trnV, trnV and the control region, respectively. There was a little variation in the size of both rRNAs among different aphid species (Figure 2a). The overall rRNAs of S. sinisalicis mitogenome showed a negative AT-skew (−0.007) and a positive GC-skew (0.340), indicating a preference for using T base over A, and G over C ( Table 3). The overall length and A+T content were rather conservative and identical to that of other aphid species.
repeat region along with the gene cluster trnF-nad5-trnH-nad4-nad4l may constitute a gene block, and go through the TRDL event together. The function of this unique tandem repeat region is currently unknown. However, the existence of repeat region in mitogenome of the basal species S. sinisalicis but not in A. tsugae may suggest that it is probably an ancient feature acquired by the common ancestor of Aphididae, which along with the scatter distribution within different subfamilies further favors the hypothesis about a single origin and subsequent diversification of the repeat region in aphid mitogenomes [5,6,25].

Non-Protein Coding Regions
The AT-rich control region of S. sinisalicis mitogenome located between rrnS and trnI, spanning 919 bp in length. This is comparable to that of most other aphids, which have an average length of 867 bp (Table S1). Generally, the length of control region varies greatly with the size of embedded tandem repeats, which was not detected in the S. sinisalicis mitogenome. The control region is supposed to be involved in the initiation of replication and transcription of mitogenomes [76]. The control region in S. sinisalicis mitogenome had a higher A+T content (95.07%) compared to most aphid mitogenomes, which is varying from 81.75% (Kaburagia rhusicola rhusicola) to 95.07% (P. bambucicola). Both the AT-skew (−0.023) and GC-skew (−0.135) were negative, showing a bias towards using T and C (Table 3).
In addition to the control region, a large non-coding repeat region (1489 bp) comprising five tandem repeat units ranging from 262 bp to 264 bp and a 171 bp partial repeat unit was also present in the S. sinisalicis mitogenome. Such special tandem repeats have been found in some aphid species from different subfamilies, and vary widely in size and number of repeat units, leading to varied lengths of repeat regions, ranging from 452 bp in Nurudea yanoniella to 2322 bp in A. glycines. However, in our analysis, obvious tandem repeats were not found in some aphid mitogenomes previously reported with repeat regions (A. robiniae, A. fabae mordvilkoi, Myzus persicae, Rhopalosiphum nymphaeae and Sitobion avenae; Table S1), indicating the validity of the annotation of repeat regions in these species need to be further verified. Figure 7 presents the repeat regions of S. sinisalicis and other representative species from four aphid subfamilies. Typically, the unique non-coding tandem repeat region found in known aphid mitogenomes from other subfamilies (Hormaphidinae, Eriosomatinae, Greenideinae and Aphidinae) is located between trnE and trnF [4,[22][23][24], which has been thought an alternative origin of the mitogenome replication as in other insects [77]. However, the repeat region in S. sinisalicis mitogenome was found located between trnS2 and trnF. According to the gene rearrangement analysis above, the repeat region along with the gene cluster trnF-nad5-trnH-nad4-nad4l may constitute a gene block, and go through the TRDL event together. The function of this unique tandem repeat region is currently unknown. However, the existence of repeat region in mitogenome of the basal species S. sinisalicis but not in A. tsugae may suggest that it is probably an ancient feature acquired by the common ancestor of Aphididae, which along with the scatter distribution within different subfamilies further favors the hypothesis about a single origin and subsequent diversification of the repeat region in aphid mitogenomes [5,6,25].

Phylogenetic Analysis
The phylogenetic analysis using mitogenome data supported the monophyly of most subfamilies with high support values except the Eriosomatinae, which was found to be paraphyletic consistent with previous phylogenetic studies, with the Eriosoamtini clade clustering with the Greenideinae (Figure 8) [22,78]. The S. sinisalicis formed a separate clade locating at the basal position of Aphididae, which further confirms the previous proposed phylogenetic hypothesis [28]. This primitive taxonomic status makes this mitogenome an important data resource for future comparative studies to investigate the mitogenome evolution across different aphid lineages.

Phylogenetic Analysis
The phylogenetic analysis using mitogenome data supported the monophyly of most subfamilies with high support values except the Eriosomatinae, which was found to be paraphyletic consistent with previous phylogenetic studies, with the Eriosoamtini clade clustering with the Greenideinae (Figure 8) [22,78]. The S. sinisalicis formed a separate clade locating at the basal position of Aphididae, which further confirms the previous proposed phylogenetic hypothesis [28]. This primitive taxonomic status makes this mitogenome an important data resource for future comparative studies to investigate the mitogenome evolution across different aphid lineages. paraphyletic consistent with previous phylogenetic studies, with the Eriosoamtini clade clustering with the Greenideinae (Figure 8) [22,78]. The S. sinisalicis formed a separate clade locating at the basal position of Aphididae, which further confirms the previous proposed phylogenetic hypothesis [28]. This primitive taxonomic status makes this mitogenome an important data resource for future comparative studies to investigate the mitogenome evolution across different aphid lineages.

Conclusions
In this study, we reported the S. sinisalicis mitogenome, the first complete mitogenome from the subfamily Lachninae. The gene content, base composition and overall organization of mitogenome was analyzed in detail and compared to other aphid species. This mitogenome is similar to that of ancestral insect mitogenomes in terms of the gene content, base composition and overall organization; however, the gene order between the trnE and nad1 is highly rearranged and differs greatly with other aphids. This gene rearrangement involving tRNAs, PCGs and the repeat region occurred in S. sinisalicis mitogenome has probably evolved from the ancestral gene order due to the tandem duplication and random loss events. Comparative analysis also indicates that the presence of repeat region may be an ancestral feature of aphid mitogenomes. Besides, phylogenetic analysis supports the basal position of Lachninae within Aphididae. Overall, our study provides valuable information for investigating the molecular evolution, mitogenome arrangement patterns and phylogenetic relationships across different aphid lineages.