Complete Chloroplast Genomes of Three Medicinal Alpinia Species: Genome Organization, Comparative Analyses and Phylogenetic Relationships in Family Zingiberaceae

Alpinia katsumadai (A. katsumadai), Alpinia oxyphylla (A. oxyphylla) and Alpinia pumila (A. pumila), which belong to the family Zingiberaceae, exhibit multiple medicinal properties. The chloroplast genome of a non-model plant provides valuable information for species identification and phylogenetic analysis. Here, we sequenced three complete chloroplast genomes of A. katsumadai, A. oxyphylla sampled from Guangdong and A. pumila, and analyzed the published chloroplast genomes of Alpinia zerumbet (A. zerumbet) and A. oxyphylla sampled from Hainan to retrieve useful chloroplast molecular resources for Alpinia. The five Alpinia chloroplast genomes possessed typical quadripartite structures comprising of a large single copy (LSC, 87,248–87,667 bp), a small single copy (SSC, 15,306–18,295 bp) and a pair of inverted repeats (IR, 26,917–29,707 bp). They had similar gene contents, gene orders and GC contents, but were slightly different in the numbers of small sequence repeats (SSRs) and long repeats. Interestingly, fifteen highly divergent regions (rpl36, ycf1, rps15, rpl22, infA, psbT-psbN, accD-psaI, petD-rpoA, psaC-ndhE, ccsA-ndhD, ndhF-rpl32, rps11-rpl36, infA-rps8, psbC-psbZ, and rpl32-ccsA), which could be suitable for species identification and phylogenetic studies, were detected in the Alpinia chloroplast genomes. Comparative analyses among the five chloroplast genomes indicated that 1891 mutational events, including 304 single nucleotide polymorphisms (SNPs) and 118 insertion/deletions (indels) between A. pumila and A. katsumadai, 367 SNPs and 122 indels between A. pumila and A. oxyphylla sampled from Guangdong, 331 SNPs and 115 indels between A. pumila and A. zerumbet, 371 SNPs and 120 indels between A. pumila and A. oxyphylla sampled from Hainan, and 20 SNPs and 23 indels between the two accessions of A. oxyphylla, were accurately located. Additionally, phylogenetic relationships based on SNP matrix among 28 whole chloroplast genomes showed that Alpinia was a sister branch to Amomum in the family Zingiberaceae, and that the five Alpinia accessions were divided into three groups, one including A. pumila, another including A. zerumbet and A. katsumadai, and the other including two accessions of A. oxyphylla. In conclusion, the complete chloroplast genomes of the three medicinal Alpinia species in this study provided valuable genomic resources for further phylogeny and species identification in the family Zingiberaceae.


The Chloroplast Genome Features of Alpinia Species
All the three species of Alpinia we sequenced had a typical quadripartite structure, with a single circular molecule ranged from 161,410 bp (A. oxyphylla sampled from Guangdong) to 162,387 bp (A. katsumadai) in length. They had four junction regions: a separate LSC region of 87,261-87,667 bp, an SSC region of 15,306-16,180 bp, and a pair of IRs (IRa and IRb) each 28,964-29,707 bp ( Figure 1 and Table 1 and Table S1). The size of the A. katsumadai chloroplast genome (162,387 bp) was the largest among the three sequenced Alpinia species, with 977 bp longer than that of A. oxyphylla sampled from Guangdong and 467 bp longer than that of A. pumila. The GC content of the three species chloroplast genomes varied slightly from 36.15% to 36.17% (Table 1 and Table S1). The AT content at the third codon position (71.35%-71.37%) was higher than that at the first (55.30%-55.44%) and second (62.50%-62.59%) positions in the protein-coding genes of these three Alpinia species (Table S1). Additionally, the AT content was the highest (70. .38%) in the SSC region, the lowest (50.48%-50.79%) in the IR regions, and moderate (66.14%-66.18%) in the LSC region (Table S1). These genomic structures were consistent with most other published chloroplast genomes of family Zingiberaceae, such as two Kaempferia species [23], three Amomum species [24], Zingiber officinale [25], Stahlianthus involucratus [31], Hedychium coronarium [32] and Curcuma longa [33].
A total of 133 predicted genes, consisting of 87 protein-coding genes, 38 tRNA genes, and 8 rRNA genes, were detected in each chloroplast genome of our sequenced three species of Alpinia (Table 1,  Table 2 and Table S2). Both of the chloroplast genomes of A. zerumbet and A. oxyphylla sampled from Hainan contained 132 genes ( Table 1). As shown in Table 1, the A. zerumbet chloroplast genome had the highest GC content (36.27%), while the A. katsumadai chloroplast genome had the lowest GC content (36.15%). The length of the A. katsumadai chloroplast genome was the longest and the A. zerumbet chloroplast genome (159,773 bp) was the shortest. Interestingly, the SSC region of A. katsumadai (15,306 bp) was the shortest, whereas the SSC region of the A. zerumbet chloroplast genome (18,295 bp) was the longest ( Table 1). The complete chloroplast genome of A. oxyphylla sampled from Guangdong was 59 bp longer than that of A. oxyphylla sampled from Hainan ( Table 1). The lengths of the IR regions of the five chloroplast genomes, ranging from 26,917 to 29,707 bp (Table 1), were shorter than those of the three species of Amomum, which varied from 29,820 to 29,959 bp [24]. In addition, 86 protein-coding genes were identified in the A. zerumbet chloroplast genome, and 87 were identified in the other four accessions. Eight conserved rRNAs were identified in every chloroplast genome. The chloroplast genome of A. oxyphylla sampled from Hainan encoded 37 types of tRNAs, and the other four accessions encoded 38 (Table 1).

Codon Usage and Predicted RNA Editing Sites Analyses
All the protein-coding genes were composed of 27,427-27,669 codons in the three chloroplast genomes of Alpinia species. Of the 27,427-27,669 codons, leucine (Leu) was the most abundant amino acid, with a frequency of 10.34%-10.38%, then isoleucine (Ile) with a frequency of 8.74%-8.80%, while cysteine (Cys) was the least common one with a proportion of 1.12%-1.13% ( Figure 2 and Table S4). This phenomenon was consistent with other land plants' chloroplast genomes, such as Z. officinale [25], Ailanthus altissima [35], Lycium chinense [36], Symplocarpus renifolius [37], and Epipremnum aureum [38]. Due to the value of relative synonymous codon usage (RSCU) >1, thirty codons showed the codon usage bias in the chloroplast genes of all the three Alpinia species (Table S4). Interestingly, out of the above 30 codons, twenty-nine codons were A/T-ending codons. Conversely, the C/G-ending codons had RSCU values of less than one, indicating that they were less common in the chloroplast genes of the three Alpinia species. Stop codon usage was found to be biased toward TAA (RSCU > 1.00). Two amino acids, methionine (Met) and tryptophan (Trp), showed no codon bias both with RSCU values of 1.00 (Table S4).
A total of 56 editing sites in 22 protein-coding genes were identified in A. katsumadai, while similar numbers were found in A. pumila (54 sites) and A. oxyphylla sampled from Guangdong (55 sites) (Table S5). In the three Alpinia species chloroplast genomes we sequenced, the ndhB gene had the highest number of potential editing sites (11), followed by the accD gene (5). Similar to other species, such as Kaempferia galanga [23], Kaempferia elegans [23] and Forsythia suspense [39], the ndhB gene contained the largest number of editing sites. All these editing sites were C-to-T transitions and occurred at the first or second positions of the codons. Interestingly, most conversions at the codon positions changed from serine (S) to leucine (L) and most RNA editing sites led to amino acid changes for hydrophobic products, such as leucine, isoleucine, tryptophan, tyrosine, valine, methionine, and phenylalanine (Table S5). Similar RNA editing features have already been revealed by previous observations [23,39]. synonymous codon usage (RSCU) >1, thirty codons showed the codon usage bias in the chloroplast genes of all the three Alpinia species (Table S4). Interestingly, out of the above 30 codons, twenty-nine codons were A/T-ending codons. Conversely, the C/G-ending codons had RSCU values of less than one, indicating that they were less common in the chloroplast genes of the three Alpinia species. Stop codon usage was found to be biased toward TAA (RSCU > 1.00). Two amino acids, methionine (Met) and tryptophan (Trp), showed no codon bias both with RSCU values of 1.00 (Table S4). A total of 56 editing sites in 22 protein-coding genes were identified in A. katsumadai, while similar numbers were found in A. pumila (54 sites) and A. oxyphylla sampled from Guangdong (55 sites) (Table S5). In the three Alpinia species chloroplast genomes we sequenced, the ndhB gene had the highest number of potential editing sites (11), followed by the accD gene (5). Similar to other species, such as Kaempferia galanga [23], Kaempferia elegans [23] and Forsythia suspense [39], the ndhB gene contained the largest number of editing sites. All these editing sites were C-to-T transitions and occurred at the first or second positions of the codons. Interestingly, most conversions at the codon positions changed from serine (S) to leucine (L) and most RNA

SSRs and Long Repeats Analyses
SSRs were widely dispersed in chloroplast genomes, and have been extensively used in population genetics and molecular phylogenetic researches [40,41]. In this study, 1205 SSRs were identified in five Alpinia chloroplast genomes, including two accessions of A. oxyphylla, A. katsumadai, A. zerumbet, and A. pumila ( Figure 3 and Table S6). The most abundant were mononucleotide repeats, located in non-coding regions, and contributed to AT richness ( Figure 3). These results are consistent with most reported angiosperms [23][24][25][26][28][29][30]. The total number of SSRs was 237 in A. katsumadai, 244 in A. oxyphylla sampled from Guangdong, 247 in A. pumila, 236 in A. zerumbet, and 241 in A. oxyphylla sampled from Hainan ( Figure 3A). In the genomic structure of five chloroplast genomes, the non-coding region had the most abundant SSRs, whereas the coding region had the least SSRs ( Figure 3A). The majority of SSRs were located in the LSC regions rather than in the SSC and IR regions of the five chloroplast genomes ( Figure 3B). Mono-, di-, tri-and tetranucleotide SSRs were all detected in the five chloroplast genomes ( Figure 3C). Additionally, pentanucleotide SSRs were found in two accessions of A. oxyphylla and A. pumila, respectively. Mononucleotide repeats were the largest in a number of these SSRs, with 77.63% and 78.54% found in A. katsumadai and A. pumila, respectively ( Figure 3C). A/T repeats were the most common of mononucleotides (70.90%-76.69%), while AT/AT repeats were the majority of dinucleotide repeat sequences (92.30%-94.59%). Interestingly, our results show that AAAT/ATTT repeats were the third abundant SSR types in the five chloroplast genomes (2.96%-3.79%) ( Figure 3D).
Additionally, we detected long repeats in five chloroplast genomes using REPuter, including forward, complement, reverse, and palindromic repeats. A total of 225 unique long repeats were found from the five chloroplast genomes. In detail, there were 44 (16 forward, 26 palindrome, two reverse), 35 (10 forward, 24 palindrome, one complement), 50 (18 forward, 27 palindrome, three reverse, two complement), 46 (19 forward, 26 palindrome, one reverse) and 50 (17 forward, 29 palindrome, two reverse, two complement) long repeats in A. katsumadai, A. pumila, A. oxyphylla sampled from Guangdong, A. zerumbet and A. oxyphylla sampled from Hainan, respectively ( Figure 4A and Table S7). Interestingly, there were no complement repeats in A. katsumadai, similar to A. zerumbet. With 29 palindrome repeats, A. oxyphylla sampled from Hainan contained the highest number of palindrome repeats, while A. zerumbet contained the highest number of forward repeats at 19, and A. oxyphylla sampled from Guangdong contained three reverse repeats, the highest among the compared genomes ( Figure 4B-D). The palindrome, forward and reverse repeats with 30-60 bp were found to be the most common in the five chloroplast genomes ( Figure 4B-D). Moreover, almost all of the lengths of the reverse repeats were less than 60 bp in the five chloroplast genomes ( Figure 4D). in a number of these SSRs, with 77.63% and 78.54% found in A. katsumadai and A. pumila, respectively ( Figure  3C). A/T repeats were the most common of mononucleotides (70.90%-76.69%), while AT/AT repeats were the majority of dinucleotide repeat sequences (92.30%-94.59%). Interestingly, our results show that AAAT/ATTT repeats were the third abundant SSR types in the five chloroplast genomes (2.96%-3.79%) ( Figure 3D). repeats in A. katsumadai, similar to A. zerumbet. With 29 palindrome repeats, A. oxyphylla sampled from Hainan contained the highest number of palindrome repeats, while A. zerumbet contained the highest number of forward repeats at 19, and A. oxyphylla sampled from Guangdong contained three reverse repeats, the highest among the compared genomes ( Figure 4B-4D). The palindrome, forward and reverse repeats with 30-60 bp were found to be the most common in the five chloroplast genomes ( Figure 4B-D). Moreover, almost all of the lengths of the reverse repeats were less than 60 bp in the five chloroplast genomes ( Figure 4D).

IR Contraction and Expansion Analyses
The contractions and expansions at the borders of IR regions were common evolutionary events and may cause size variations of chloroplast genomes [13,[23][24][25][26][27][28][29][30]. We compared the IR-SC boundaries information of the five Alpinia chloroplast genomes ( Figure 5). The genes rpl22, rps19, pseudogene ycf1 (Ψycf1), ndhF, ycf1 and psbA were present at the junction of the LSC/IRa, IRa/SSC, SSC/IRb, and IRb/LSC borders. As shown in Figure 5, the rpl22-rps19 genes were located in the junctions of the LSC/IRa regions of the five chloroplast genomes. There were 47 bp between rpl22 and the LSC/IRa borders, meanwhile, 129-130 bp between the rps19 and the other LSC/IRa junctions.
The Ψycf1-ndhF genes were located at the junctions of the IRa/SSC regions for all five chloroplast genomes; IRa/SSC borders of three species (A. zerumbet, A. oxyphylla sampled from Hainan and A. katsumadai) were all situated just adjacent to the end of Ψycf1; Ψycf1 expanded into the SSC regions for 6 bp in A.oxyphylla sampled from Guangdong and 22 bp in A. pumila, respectively ( Figure 5). In the three species (A. zerumbet, A. oxyphylla sampled from Hainan and A. katsumadai), the distances between ndhF and IRa/SSC border were 251 bp, 42 bp, and 2 bp, respectively. There were 24 bp and 42 bp between ndhF and Ψycf1 border in A. pumila and A. oxyphylla sampled from Guangdong, respectively ( Figure 5).
The SSC/IRb junction was situated in the ycf1 coding region, which crossed into the IRb region in all five chloroplast genomes. However, the length of ycf1 in the IRb region varied among the five chloroplast genomes from 1048 bp to 3844 bp, which indicated the dynamic variation of the SSC/IRb junctions ( Figure 5). The rps19-psbA genes were located in the junctions of the IRb/LSC regions in five chloroplast genomes ( Figure 5). In the five chloroplast genomes, the distances between rps19 and IRb/LSC border were 130 bp, 129 bp, 130 bp, 130 bp and 130 bp, respectively. For all five chloroplast genomes, 93-109 bp distance was found between psbA gene and the IRb/LSC border ( Figure 5). Taken together, these data indicated that the contractions and expansions of the IR regions exhibited relatively stable patterns within genus Alpinia, with slight variations. sampled from Guangdong and 22 bp in A. pumila, respectively ( Figure 5). In the three species (A. zerumbet, A. oxyphylla sampled from Hainan and A. katsumadai), the distances between ndhF and IRa/SSC border were 251 bp, 42 bp, and 2 bp, respectively. There were 24 bp and 42 bp between ndhF andΨycf1 border in A. pumila and A. oxyphylla sampled from Guangdong, respectively ( Figure 5).
The SSC/IRb junction was situated in the ycf1 coding region, which crossed into the IRb region in all five chloroplast genomes. However, the length of ycf1 in the IRb region varied among the five chloroplast genomes from 1048 bp to 3844 bp, which indicated the dynamic variation of the SSC/IRb junctions ( Figure  5).
The rps19-psbA genes were located in the junctions of the IRb/LSC regions in five chloroplast genomes ( Figure 5). In the five chloroplast genomes, the distances between rps19 and IRb/LSC border were 130 bp, 129 bp, 130 bp, 130 bp and 130 bp, respectively. For all five chloroplast genomes, 93-109 bp distance was found between psbA gene and the IRb/LSC border ( Figure 5). Taken together, these data indicated that the contractions and expansions of the IR regions exhibited relatively stable patterns within genus Alpinia, with slight variations.

Interspecific Analyses of Alpinia Chloroplast Genomes
Using the A. pumila chloroplast genome sequence as the reference, we compared the SNP/indel loci of the other four chloroplast genomes in the current study. One-hundred-and-twenty-eight and 176 SNP markers were detected between A. pumila and A. katsumadai in protein-coding genes and non-coding regions, respectively (Table S8). One-hundred-and-sixty-two and 205 SNP markers were detected between A. pumila and A. oxyphylla sampled from Guangdong in protein-coding genes and non-coding regions, respectively (Table S8). One-hundred-and-forty-two and 189 SNP markers were detected between A. pumila and A. zerumbet in protein-coding genes and non-coding regions, respectively (Table S8). One-hundred-and-sixtythree and 208 SNP markers were detected between A. pumila and A. oxyphylla sampled from Hainan in

Interspecific Analyses of Alpinia Chloroplast Genomes
Using the A. pumila chloroplast genome sequence as the reference, we compared the SNP/indel loci of the other four chloroplast genomes in the current study. One-hundred-and-twenty-eight and 176 SNP markers were detected between A. pumila and A. katsumadai in protein-coding genes and non-coding regions, respectively (Table S8). One-hundred-and-sixty-two and 205 SNP markers were detected between A. pumila and A. oxyphylla sampled from Guangdong in protein-coding genes and non-coding regions, respectively (Table S8). One-hundred-and-forty-two and 189 SNP markers were detected between A. pumila and A. zerumbet in protein-coding genes and non-coding regions, respectively (Table S8). One-hundred-and-sixty-three and 208 SNP markers were detected between A. pumila and A. oxyphylla sampled from Hainan in protein-coding genes and non-coding regions, respectively (Table S8). The SNPs in the A. katsumadai chloroplast genome were significantly fewer than those in the two accessions of A. oxyphylla. SNPs were detected in 54 protein-coding genes in A. katsumadai, A. zerumbet, and two accessions of A. oxyphylla sampled from Guangdong and Hainan. Nine genes were in the SSC region, one gene was in the IR regions, and 44 genes were in the LSC region (Table 3  and Table S8). These 54 genes were divided into four categories according to their different functions in plant chloroplasts, including photosynthetic apparatus, photosynthetic metabolism, gene expression, and other genes ( Table 2). For the 162 and 163 SNP markers in the protein-coding genes of A. oxyphylla sampled from Guangdong and Hainan chloroplast genomes, respectively, 90 and 91 belonged to the synonymous type, and 72 and 72 belonged to the nonsynonymous type (Table 3 and Table S8). Synonymous and nonsynonymous SNP markers in the protein-coding genes shared very similar number in these two chloroplast genomes. There were 67 synonymous SNPs and 61 nonsynonymous SNPs in the protein-coding genes of the A. katsumadai chloroplast genome (Table 3 and Table S8). Seventy-three synonymous and 69 nonsynonymous SNP sites were detected in the chloroplast genome of A. zerumbet (Table 3 and Table S8).  2  15  6  19  3  18  6  19  IRa/b  Total  67  61  90  72  73  69  91 72 -All of the indels were classified into insertions and deletions ( Figure 8 and Table S9). Fifty-six insertions and 62 deletions were detected between A. pumila and A. katsumadai chloroplast genomes, respectively ( Figure 8A-C). Fifty-three insertions and 69 deletions were detected between A. pumila and A. oxyphylla sampled from Guangdong chloroplast genomes, respectively ( Figure 8A-C). Sixty-five insertions and 50 deletions were detected between A. pumila and A. zerumbet chloroplast genomes, respectively ( Figure 8A-C). Fifty-three insertions and 67 deletions were detected between A. pumila and A. oxyphylla sampled from Hainan chloroplast genomes, respectively ( Figure 8A-C). Eight protein-coding genes from the four Alpinia accessions contained indels ( Figure 8D). The gene rpoC2 was a hotspot for indel variation, and all the four Alpinia accessions contained two indels in this gene. Comparison with two Kaempferia species was extremely interesting. The result indicated that SNPs between the four Alpinia species were less common than those between two Kaempferia species, but had more indels than those between two Kaempferia species. There were 536 SNPs and 107 indels between K. galanga and K. elegans [23]. Moreover, the SNPs obtained from chloroplast genomes had been successfully used for phylogenetic studies in several Zingiberaceae species, such as in K. galanga and K. elegans [23], S. involucratus [31], H. coronarium [32] and C. longa [33]. Therefore, these SNPs and indels in Alpinia here would be potential genetic markers to facilitate phylogenetic analysis and species identification in the family Zingiberaceae. SNPs were located in introns and coding regions, respectively. All of the other four regions contained only one SNP. By contrast, there were more mutational events (SNPs and indels) in Eucommia ulmoides [47] and Scutellaria baicalensis [48]. There were 75 SNPs and 80 indels in two different individuals of E. ulmoides [47], 25 SNPs and 29 indels between the two S. baicalensis genotypes [48]. These 23 indels and 20 SNPs could be used for identification of different sources of A. oxyphylla.

Phylogenetic Analyses
To determine the phylogenetic relationships of five Alpinia accessions in family Zingiberaceae, two phylogenetic trees of 25 representatives from family Zingiberaceae using chloroplast SNP-based matrix, were constructed with Costus pulverulentus, Costus viridis and Canna indica as outgroups (Figure 9 and S1). Both the maximum likelihood (ML) and maximum parsimony (MP) phylogenetic trees strongly indicated that two genera Amomum and Alpinia in the Zingiberaceae clade were strongly supported as a sister monophyletic

Intraspecific Analyses of two Chloroplast Genomes of A. oxyphylla
The two chloroplast genomes from A. oxyphylla were found to show a 59-bp difference in length (Table 1). In addition to the total length difference, we obtained indels and SNPs between the two chloroplast genomes of A. oxyphylla in their entirety. Through intraspecific comparison, a total of 23 indels were identified between the two A. oxyphylla accessions (Table S10). Two, one, one, one, and one indels were located in clpP, rpoC1, rps16, rpl16 and trnG-GCC, respectively. The other 17 indels were located in 14 different regions. atpB-rbcL, trnC-GCA-petN and ndhF-rpl32 exhibited the same number indels, of two in total, respectively. There were 20 SNPs identified in the two chloroplast genomes of A. oxyphylla (Table S10). The most frequently occurring mutations were A/C substitutions (4 times), followed by G/T (three times), T/A (three times), and T/G (three times), respectively. ccsA-ndhD contained the highest number of SNPs (6), followed by rps16-trnQ-UUG and trnS-GCU-trnG-GCC, each of which showed three SNPs. Two and two SNPs were located in introns and coding regions, respectively. All of the other four regions contained only one SNP. By contrast, there were more mutational events (SNPs and indels) in Eucommia ulmoides [47] and Scutellaria baicalensis [48]. There were 75 SNPs and 80 indels in two different individuals of E. ulmoides [47], 25 SNPs and 29 indels between the two S. baicalensis genotypes [48]. These 23 indels and 20 SNPs could be used for identification of different sources of A. oxyphylla.

Phylogenetic Analyses
To determine the phylogenetic relationships of five Alpinia accessions in family Zingiberaceae, two phylogenetic trees of 25 representatives from family Zingiberaceae using chloroplast SNP-based matrix, were constructed with Costus pulverulentus, Costus viridis and Canna indica as outgroups (Figure 9 and Figure S1). Both the maximum likelihood (ML) and maximum parsimony (MP) phylogenetic trees strongly indicated that two genera Amomum and Alpinia in the Zingiberaceae clade were strongly supported as a sister monophyletic group, respectively (bootstrap values ≥ 99%). In the genus Alpinia, two accessions of A. oxyphylla clustered together with high statistical support values (bootstrap values ≥ 99%), which was subsequently sister to A. pumila; A. zerumbet and A. katsumadai clustered together (bootstrap values ≥ 99%) was another sister to A. pumila (Figure 9 and Figure S1). Both ML and MP trees confirmed the previously known phylogenetic relationships in the family Zingiberaceae based on earlier studies [12,20,[23][24][25][31][32][33][34], while unexpected relationships and positions of certain taxa were also revealed in this study. The reconfirmation of previously known relationships included (1) the monophyletic genera of Amomum, Alpinia, Kaempferia, Zingiber and Hedychium and their relationships to the rest of family Zingiberaceae, (2) the paraphyletic genus Curcuma in family Zingiberaceae, (3) relationships between two genera of Curcuma and Stahlianthus. The most surprising and unexpected findings in this study were the positions and relationships of A. zerumbet and A. oxyphylla. The two accessions of A. oxyphylla formed a group both in the two phylogenetic trees, confirming the validity of the assembled and annotated chloroplast genome of A. oxyphylla in this study. On morphological classifications, A. pumila was placed in subgenus Alpinia, A. zerumbet and A. katsumadai were placed in subgenus Catimbium, and A. oxyphylla was placed in subgenus Probolocalyx [11]. Therefore, our phylogenetic results were congruent with the morphological taxa [11]. On the other hand, several previous studies suggested that, A. oxyphylla was closely related to A. zerumbet based on chloroplast genomes [13], four groups were identified in genus Alpinia based on the ITS and matK trees [12]. We analyzed the reasons for the incongruence between chloroplast SNP-based phylogenetic analyses and chloroplast genomes, ITS and matK phylogenies in the following. Firstly, we did not have the same species samples as the ITS and matK phylogenies [12]. Secondly, we detected further phylogenetic relationships between A. zerumbet and A. oxyphylla with more chloroplast genomes data than the previous study [13]. Thirdly, with more than 230 species, the positions of Alpinia in family Zingiberaceae still remained somewhat uncertain and the future phylogenetic analyses should obtain additional chloroplast genomes of Alpinia species. The current phylogenetic analyses could supply the possibility that chloroplast genomes may be useful for phylogeny and species identification in family Zingiberaceae in the future.
suggested that, A. oxyphylla was closely related to A. zerumbet based on chloroplast genomes [13], four groups were identified in genus Alpinia based on the ITS and matK trees [12]. We analyzed the reasons for the incongruence between chloroplast SNP-based phylogenetic analyses and chloroplast genomes, ITS and matK phylogenies in the following. Firstly, we did not have the same species samples as the ITS and matK phylogenies [12]. Secondly, we detected further phylogenetic relationships between A. zerumbet and A. oxyphylla with more chloroplast genomes data than the previous study [13]. Thirdly, with more than 230 species, the positions of Alpinia in family Zingiberaceae still remained somewhat uncertain and the future phylogenetic analyses should obtain additional chloroplast genomes of Alpinia species. The current phylogenetic analyses could supply the possibility that chloroplast genomes may be useful for phylogeny and species identification in family Zingiberaceae in the future.

Plant Material and DNA Extraction
Fresh leaves were obtained from A. katsumadai (voucher specimen: Ak 2015001), A. oxyphylla sampled from Guangdong (voucher specimen: Ao_Liu 2015002) and A. pumila (voucher specimen: Ap_Liu 2015003) plants, respectively, from the resource garden of the Environmental Horticulture Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China. The total chloroplast DNA was extracted from those leaves using the modified sucrose gradient centrifugation method [49]. The chloroplast DNA sample of good integrity and with both optical density (OD) 260/280 and OD 260/230 ratio greater than 1.8 was used for subsequent experiments.

Chloroplast Genome Sequencing and Assembly
For each sample, two libraries with insert sizes of 300 bp and 10 kb were constructed after purification, and then sequenced on an Illumina Hiseq X Ten instrument (Biozeron, Shanghai, China) and a PacBio Sequel platform (Biozeron, Shanghai, China), respectively. The resulting Illumina raw data and PacBio raw data were assessed with FastQC. A total of 66.3 M, 80.2 M and 66.4 M clean data of 150 bp Illumina paired-end reads were generated from A. katsumadai, A. oxyphylla sampled from Guangdong and A. pumila, respectively, and 0.62M, 0.56 M and 0.50 M clean data of 8-10 kb subreads were generated from the three species, respectively.
Firstly, the Illumina paired-end clean reads were assembled using SOAPdenova (version 2.04) with default parameters into principal contigs [50], and all contigs were sorted and joined into a single draft sequence using the software Geneious version 11.0.4 [51]. Next, BLASR software was used to compare the PacBio clean data with the single draft sequence and to extract the correction and error correction [52]. Next, the corrected PacBio clean data were assembled using Celera Assembler (version 8.0) with default parameters, thus generating scaffolds [53]. Next, the assembled scaffolds were mapped back to the Illumina clean reads using GapCloser (version 1.12) for gap closing [50]. Finally, the redundant fragments sequences were removed, thus generating the final assembled chloroplast genomic sequence.

Chloroplast Genome Annotation and Structure Analysis
The assembled chloroplast genome was annotated using the online tool DOGMA (Dual Organellar Genome Annotator) [54] with default parameters and then checked manually. BLASTn searches in the NCBI website were used to identify and confirm both tRNA and rRNA genes. Lastly, further verification of the tRNA genes was carried out using tRNAscanSE with default settings [55]. The circular physical map of the chloroplast genome was drawn using OGDRAWv1.3.1 program with default parameters and subsequent manual editing [56]. The GenBank accession numbers of A. katsumadai, A. oxyphylla sampled from Guangdong and A. pumila are MK262728, MK262729 and MK262731, respectively.

Codon Usage and Prediction of RNA Editing Sites
Codon usage was determined for all protein-coding genes. To examine the deviation in synonymous codon usage, the relative synonymous codon usage (RSCU) was calculated using MEGA7 software [59]. Amino acid frequency was also calculated and expressed by the percentage of the codons encoding the same amino acid divided by the total codons. To predict the possible RNA editing sites in the three Alpinia species chloroplast genomes, protein-coding genes were used to predict potential RNA editing sites using the online program Predictive RNA Editor for Plants (PREP) suite (http://prep.unl.edu/) with a cutoff value of 0.8 [60].

Genome Comparison and Divergence Analyses
To compare the chloroplast genome of A. pumila with other four Alpinia accessions (A. katsumadai, A.zerumbet, and two accessions of A. oxyphylla sampled from Guangdong and Hainan), the mVISTA program (http://genome.lbl.gov/vista/mvista/about.shtml) in the Shuffle-LAGAN mode [61] was carried out using the annotation of A. pumila as the reference. To detect the variation in the LSC/IR/SSC boundaries of Alpinia chloroplast genomes, five Alpinia whole genomes were included in comparisons. The nucleotide variability (Pi) among chloroplast genomes was performed using software DnaSP v5 [62], with window length of 600 bp and the step size of 200 bp. The five Alpinia whole genomes were also aligned using MUMmer software [63] and adjusted manually where necessary using Se-Al 2.0 [64], using the annotated A. pumila chloroplast genome as the reference. The SNPs and indels were recorded separately as well as their locations in the chloroplast genome. The two accessions of A. oxyphylla were analyzed to identify SNPs and indels markers that can be selected in subsequent different sources studies.

Phylogenetic Analyses
In this study, a total of 25 complete chloroplast genome sequences were downloaded from the GenBank (NCBI) database. C. pulverulentus, C. viridis and C. indica were used as outgroups of the family Zingiberaceae. Phylogenetic trees were constructed using SNP matrix from 25 representative chloroplast genomes of the family Zingiberaceae. The reliable SNP sites were obtained by previously described method [23]. For each chloroplast genome, all SNP sites were connected in the same order to obtain a sequence in FASTA format. Multiple FASTA format sequences alignments were carried out by ClustalW in software MEGA7 [59]. To examine the phylogenetic applications of rapidly evolving 11,112 SNP markers (Supplement.SNP matrix.data), maximum likelihood (ML) analysis based on the nucleotide substitution model of Tamura-Nei was conducted using software MEGA7 [59]. The maximum parsimony (MP) method was also employed to construct a phylogenetic tree using the Close-Neighbor-Interchange (CNI) model in software MEGA7 [59]. Both ML and MP analyses were performed with 1000 bootstrap replicates, respectively.

Conclusions
In summary, we sequenced and characterized the complete chloroplast genomes of three Alpinia (A. katsumadai, A. pumila and A. oxyphylla sampled from Guangdong) of the family Zingiberaceae. Then, we compared them to the two reported chloroplast genomes of A. zerumbet and A. oxyphylla sampled from Hainan. These five Alpinia chloroplast genomes were highly conserved in terms of gene order, GC content, SSRs, and long repeats, and were typical quadripartite circle molecules consisting of an LSC region, an SSC region, and a pair of separated IRs. The IR expansions and contractions of the border regions resulted in difference of genome size between five Alpinia accessions. Fifteen highly divergent regions (rpl36, ycf1, rps15, rpl22, infA, psbT-psbN, accD-psaI, petD-rpoA, psaC-ndhE, ccsA-ndhD, ndhF-rpl32, rps11-rpl36, infA-rps8, psbC-psbZ, and rpl32-ccsA) were found and could be used as potential markers for Alpinia species on plant identification and phylogenetic studies. Additionally, for the interspecific comparisons, 304, 367, 331 and 371 SNPs were detected in comparisons of A. pumila-A. katsumadai, A. pumila-A. oxyphylla (Guangdong), A. pumila-A. zerumbet, and A. pumila-A. oxyphylla (Hainan), respectively, when the A. pumila chloroplast genome was used as the reference; 118, 122, 115, and 120 indels were accurately located in these four comparisons, respectively. Through the intraspecific comparison, 20 SNPs and 23 indels were identified in the two accessions of A. oxyphylla. The ML and MP trees indicated that the chloroplast SNP-based phylogenetic analyses could be used to identify the five Alpinia accessions. Alpinia showed a close phylogenetic relationship with Amomum species. The molecular data in this study represent a valuable resource for the studies of phylogenetic relationships and species identification in the family Zingiberaceae.
Supplementary Materials: The following are available online at http://www.mdpi.com/2223-7747/9/2/286/s1, Figure S1: Phylogenetic tree constructed with the SNP matrix from 28 chloroplast genomes using maximum parsimony (MP) method, Table S1: Features of the chloroplast genomes of three Alpinia species, Table S2: Gene distribution in the chloroplast genomes of three Alpinia species, Table S3: Genes with introns in the chloroplast genomes of three Alpinia species as well as the exons and introns, Table S4: Codon usages of protein-coding genes in the chloroplast genomes of three Alpinia species, Table S5: RNA editing sites analyses of three Alpinia species, Table S6: SSRs distribution among five Alpinia chloroplast genomes, Table S7: Long repeats distribution among five Alpinia chloroplast genomes, Table S8: SNPs detected in coding and non-coding regions among four Alpinia chloroplast genomes, Table S9: Indels detected among four Alpinia chloroplast genomes, Table S10: Detailed indels and SNPs information detected in the two chloroplast genomes of A. oxyphylla. Supplement.SNP matrix.data.

Acknowledgments:
A. oxyphylla sampled from Guangdong and A. pumila resources were supplied by professor Nian Liu at department of horticulture and landscape architecture, Zhongkai University of Agriculture and Engineering, Guangzhou, China. The authors are very thankful for the two anonymous reviewers suggestions for improving the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.