In silico/computational analysis of mevalonate pyrophosphate decarboxylase gene families in Campanulids

Abstract: Mevalonate pyrophosphate decarboxylase (MPD) is a key enzyme in terpenoid biosynthesis and plays an important role in the upstream regulation of plant secondary metabolism. Despite this importance, few studies have examined the MPD gene. In particular, no systematic analysis has been conducted on the MPD gene in plants of the order Apiales, which comprises important medicinal plants such as Panax ginseng and Panax notoginseng. This study sought to explore the structural characteristics of the MPD gene and the effect of adaptive evolution on the gene by comparing MPD gene sequences of different Campanulids species. To this end, phylogenetic and adaptive evolution analyses were carried out using sequences from 11 Campanulids species. MPD sequence characteristics of each species were then analyzed, and collinearity analysis of the genes was performed. A total of 21 MPD proteins were identified in the 11 Campanulids species through BLAST analysis. Phylogenetic analysis, prediction of physical and chemical properties, gene family analysis, and gene structure prediction showed that the MPD gene has undergone purifying selection and has a highly conserved structure. Analysis of physicochemical properties further showed that the MPD protein is a hydrophilic protein without a transmembrane region. Moreover, collinearity analysis in Apiales showed that the MPD genes on chromosome 2 of D. carota and chromosome 1 of C. sativum were collinear. The findings showed that the MPD gene is highly conserved. This may be a common characteristic of all essential enzymes in the biosynthesis pathways of medicinal plants. Notably, the MPD gene is significantly affected by environmental factors, which subsequently modulate its expression. The current study's findings provide a basis for follow-up studies on the MPD gene and key enzymes in other medicinal plants.

https://doi.org/10.1515/biol-2021-0103; received November 24, 2020; accepted July 28, 2021

Introduction
Mevalonate pyrophosphate decarboxylase (MPD) is an enzyme that belongs to the galactokinase, homoserine kinase, mevalonate kinase, and phosphomevalonate kinase (GHMP) superfamily. It plays a key role in the mevalonate (MVA) pathway and is the least studied enzyme in the GHMP superfamily [1,2]. MPD catalyzes the decarboxylation of mevalonate pyrophosphate to form isopentenyl pyrophosphate [3]. Its structure is highly conserved across different species. In yeast, MPD comprises two identical subunits, and each monomer contains a cleft. Three antiparallel β-sheets separate the α-helices in the monomer, and the cleft is implicated in ATP binding and comprises conserved amino acid residues [4]. The gene that encodes MPD is ubiquitous in animals, plants, and microorganisms, and has been reported in species such as Panax ginseng [5], Ganoderma lucidum [6], Bacopa monniera [7], Homo sapiens [8], Sus scrofa [9], Enterococcus faecalis [10], and Bifidobacterium bifidus [11].
MPD is implicated in the synthesis of terpenoids in Campanulids plants. These compounds include triterpenoid saponins in Panax ginseng and carotene in Daucus carota, two species whose genomes have been sequenced. The MVA pathway is the main terpenoid synthesis pathway in plants [12]. Furthermore, the expression level of the MPD gene is positively correlated with the terpenoid synthesis rate in plants [13].
During biological evolution, the genetic information of a species changes and is shaped by natural selection, which enables adaptation to the living environment. This process entails the adaptive evolution of genes [14]. Gene family and adaptive evolution analyses should therefore be carried out, and more species sequenced, to explore the roles of these genes. The MPD gene family is important to study because MPD is a key enzyme in the terpenoid biosynthesis pathway in medicinal plants. In the current study, MPD gene identification and adaptive evolution analyses were performed on P. ginseng and its related species, covering 11 species of Campanulids. P. ginseng is a medicinal plant and a member of the Araliaceae family [15]. MPD gene family analysis of P. ginseng and the related species provides valuable information for further study of the MPD gene family. The findings help elucidate existing gene variations, structural and functional changes in the protein, and the evolutionary history of the species [16]. The current study further provides a reference for the study of gene families of key enzymes in other medicinal plants.

Sequence data
Sequence data used in the current study were retrieved from two sources. Genome, protein, and annotation files of 11 Campanulids species were retrieved from the National Center for Biotechnology Information (NCBI) genome database. Another set of genome, protein, CDS, and annotation files was retrieved from the corresponding plant genome websites. Data for seven other plants with different genetic relationships were also retrieved. The BLASTp tool in NCBI was used to compare these data with the known MPD protein sequence of Eleutherococcus senticosus. Arabidopsis thaliana and Hevea brasiliensis sequences were selected as outgroups. MPD amino acid sequences were identified using the local BLAST tool and submitted to the NCBI Conserved Domain Database (CDD) for further screening. The screening yielded a total of 21 nucleotide sequences, which were extracted from the genomes by their corresponding identifiers using the FASTA extract function in TBtools [17]. Sequence data comprising 275-423 codons were obtained using the ClustalW program, and the sequences were further aligned using Multalin software [18].
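The identifier-based extraction step can be illustrated with a minimal Python sketch. This is not the TBtools implementation; it only shows the general idea of pulling named records out of a FASTA file. The function names and the toy sequences below are hypothetical and do not come from the study.

```python
from typing import Dict, Iterable, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {sequence_id: sequence} mapping."""
    records: Dict[str, List[str]] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = []
        elif current_id is not None:
            records[current_id].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}


def extract_by_id(
    records: Dict[str, str], wanted_ids: List[str]
) -> Tuple[Dict[str, str], List[str]]:
    """Return the requested records plus a list of IDs that were not found."""
    found = {i: records[i] for i in wanted_ids if i in records}
    missing = [i for i in wanted_ids if i not in records]
    return found, missing


# Hypothetical toy input, for illustration only.
EXAMPLE_FASTA = """>geneA some description
ATGGCT
GCTTAA
>geneB
ATGCCC
>geneC
ATGAAATAG
"""

records = parse_fasta(EXAMPLE_FASTA.splitlines())
found, missing = extract_by_id(records, ["geneA", "geneC", "geneX"])
```

Reporting the missing identifiers separately makes it easy to spot a BLAST hit whose ID does not match the genome annotation.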

Construction of phylogenetic tree
MEGA7 software was used to analyze the sequence characteristics and for evolutionary analysis. A phylogenetic tree was constructed using the maximum likelihood (ML) method with 1,000 bootstrap replicates. The trees were visualized using the iTOL (http://itol.embl.de) webserver.

Analysis of adaptive evolution
The phylogenetic tree file was first converted into a Phylogenetic Analysis by Maximum Likelihood (PAML) file using the EasyCodeML software [19]. The branch-site model and site model in the codeML program of the PAML4.8 software package were used to analyze the phylogenetic trees [20]. Data were then submitted to Datamonkey (http://www.datamonkey.org/) [21] and MEC (http://selecton.tau.ac.il/) for analysis of adaptive evolution. The random effects likelihood (REL) model, fixed effects likelihood (FEL) model, and single likelihood ancestor counting (SLAC) model were used on the webservers to analyze site-level selection pressure. For the SLAC and FEL methods, the significance level for positive selection at a locus was set at P < 0.1. A Bayes factor threshold of 50 was used for the REL method.
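Nested codeML site models are commonly compared with a likelihood ratio test (LRT), in which twice the difference in log-likelihood is compared against a chi-square distribution. The sketch below shows that calculation for the common case of two degrees of freedom (for example, the M1a versus M2a or M7 versus M8 comparisons). Which model pairs were compared in this study is an assumption, and the log-likelihood values in the example are hypothetical. For two degrees of freedom the chi-square tail probability has the closed form exp(-x/2), so only the standard library is needed.

```python
import math
from typing import Tuple


def lrt_df2(lnl_null: float, lnl_alt: float) -> Tuple[float, float]:
    """Likelihood ratio test for nested models differing by 2 parameters.

    Returns (test statistic, p-value). For 2 degrees of freedom the
    chi-square survival function is exactly exp(-x / 2).
    """
    # Clamp at zero: optimisation noise can make the statistic slightly negative.
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


def classify_omega(omega: float, tolerance: float = 1e-6) -> str:
    """Interpret a dN/dS ratio (omega) in the conventional way."""
    if omega < 1.0 - tolerance:
        return "purifying (negative) selection"
    if omega > 1.0 + tolerance:
        return "positive selection"
    return "neutral evolution"


# Hypothetical log-likelihoods, for illustration only.
statistic, p_value = lrt_df2(lnl_null=-1000.0, lnl_alt=-997.0)
label = classify_omega(0.15)
```

A non-significant LRT together with ω < 1 across sites is the pattern usually read as dominant purifying selection.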

Prediction of basic physical and chemical properties, secondary structure, and three-dimensional structure of MPD protein
The MPD amino acid sequence identified by BLAST was submitted to the Swiss Institute of Bioinformatics database (https://www.expasy.org/) to predict its basic physicochemical properties, secondary structure, and three-dimensional structure.
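As an illustration of one of these physicochemical properties, the grand average of hydropathicity (GRAVY) is the sum of the Kyte-Doolittle hydropathy values of all residues divided by the sequence length; a negative value indicates a hydrophilic protein. The following Python sketch computes it directly. It is an independent sketch rather than the ExPASy code, and the example peptide is hypothetical.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def is_hydrophilic(sequence: str) -> bool:
    """A negative GRAVY score is conventionally read as hydrophilic."""
    return gravy(sequence) < 0.0


# Hypothetical peptide, for illustration only.
example_score = gravy("MKDEERSTNQ")
```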

Motif analysis
Motifs present in the MPD amino acid sequences were analyzed using the MEME software (http://meme-suite.org). The parameters were set as follows: number of motifs to search = 10, minimum motif length = 6, and maximum motif length = 50. Results were visualized using the Visualize MEME/MAST Motif Pattern function in TBtools.

Chromosome location analysis
Chromosome location information of the MPD gene was obtained from the annotation files of the related species in Apiales, which included D. carota and Coriandrum sativum. The chromosome location map of the MPD gene was then drawn using MapChart software.

Collinearity analysis
Collinearity among P. ginseng, D. carota, and C. sativum was analyzed using the One Step MCScanX function in TBtools. The results were visualized using the Local Circos software.

Prediction of cis-acting elements and MPD gene structure
The 2,000 bp nucleotide sequences upstream of the MPD genes in P. ginseng, D. carota, and C. sativum were extracted using the GTF/GFF3 sequence extractor function in TBtools. The sequences were submitted to the PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) web server for prediction of cis-acting elements [22]. The gene structure visualization function in TBtools was used to predict MPD gene structure.
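The upstream extraction step can be sketched in Python as follows. This is an illustrative sketch, not the TBtools extractor. It assumes GFF3-style coordinates (1-based, inclusive) for the gene feature, and it reverse-complements the region for genes on the minus strand so that the returned sequence always reads 5' to 3' toward the gene. The function names and the toy chromosome are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(
    chrom_seq: str, start: int, end: int, strand: str, length: int = 2000
) -> str:
    """Extract up to `length` bp upstream of a gene.

    `start` and `end` are 1-based inclusive coordinates as in GFF3.
    The region is clipped at the chromosome boundaries.
    """
    if strand == "+":
        # Upstream lies before the gene start on the forward strand.
        slice_start = max(start - 1 - length, 0)
        return chrom_seq[slice_start:start - 1]
    if strand == "-":
        # Upstream lies after the gene end; flip it to read toward the gene.
        slice_end = min(end + length, len(chrom_seq))
        return reverse_complement(chrom_seq[end:slice_end])
    raise ValueError("strand must be '+' or '-'")


# Hypothetical toy chromosome, for illustration only.
toy_chrom = "AACCGGTTAC"
plus_upstream = upstream_region(toy_chrom, start=5, end=7, strand="+", length=3)
minus_upstream = upstream_region(toy_chrom, start=3, end=6, strand="-", length=3)
```

Clipping at the chromosome ends matters for scaffold-level assemblies, where a gene may sit less than 2,000 bp from a scaffold boundary.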

Phylogenetic analysis
BLAST results of the 21 MPD sequences are shown in Figure 1 and Table S1. A phylogenetic tree was constructed based on the selected 21 MPD nucleotide sequences (Figure 2). A phylogenetic tree based on MPD protein sequences of all downloaded species is shown in Figure S1. Lonicera japonica was placed separately as a member of Caprifoliaceae. Cynara cardunculus var. scolymus, Lactuca sativa, Mikania micrantha, Helianthus annuus, Taraxacum kok-saghyz, Artemisia annua, and Chrysanthemum nankingense were grouped into Asteraceae, whereas D. carota and C. sativum clustered into Apiaceae. P. ginseng was placed separately as a member of Araliaceae and clustered with the Apiaceae within Apiales. All the ingroup species belonged to the Campanulids. The outgroups (A. thaliana and H. brasiliensis) clustered into one branch (Figure 2). Campanulids clustered together and branched into Asterales and Apiales (Figure S1). Other species with different genetic relationships clustered into one branch. A. thaliana and H. brasiliensis clustered into one branch, and Zea mays and Amborella trichopoda clustered into another. Further analysis showed that Campanulids clustered more tightly and had higher sequence similarity, approximately 94.52%, whereas other species with distant genetic relationships showed lower MPD sequence similarity, approximately 82.19% (Table 1 and Table S1). These findings were consistent with traditional taxonomy.
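One simple way to quantify the similarity between two aligned sequences is pairwise percent identity over the columns where neither sequence has a gap. The Python sketch below illustrates that definition. How the similarity values reported here were actually computed is not stated, so this definition is an assumption, and the example alignment is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length.

    Columns where either sequence has a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, for illustration only.
identity = percent_identity("ATG-CA", "ATGGCT")
```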

Identification and analysis of positive selection sites
Selection pressure at each locus in the MPD family was determined using the codeML tool in the PAML software. The results are shown in Tables 2-4. The SLAC, IFEL, and REL models were used to identify positive selection sites based on the Datamonkey selection pressure test. The SLAC model detected 1 positive selection site and 95 negative selection sites when P was <0.1, and 0 positive selection sites and 21 negative selection sites when P was <0.01. Analysis using the IFEL model identified six positive selection sites (48S, 93q, 98K, 178d, 226v, and 419A) and 167 negative selection sites when P was <0.1. With P <0.01, 0 positive selection sites and 51 negative selection sites were identified. For the REL model, a Bayes factor threshold of 50 was used. No positive selection sites were identified using the REL model, whereas 192 negative selection sites were detected.
The 21 MPD CDS sequences were uploaded to the MEC webserver for analysis using the MUSCLE tool. Most of the loci were marked in purple (Figure 3). A total of 83 dark purple sites, which represent strong negative selection sites, were identified. These sites accounted for 20% of the total sites. The analysis did not show any positive selection sites, which would be marked in orange and yellow (Figure 3). This indicated that purifying selection played a dominant role in the evolution of the MPD gene family. This result was consistent with the findings from PAML and Datamonkey that negative selection was dominant in the adaptive evolution of the MPD gene family.

Three-dimensional structure prediction of the MPD gene
Physicochemical properties of all identified MPD amino acid sequences are shown in Table 5. The average length of MPD amino acid sequences was 420 amino acids. The relative molecular weight ranged between 45,586.00 and 919.55, whereas the isoelectric point ranged between 5.89 and 8.58. The hydrophilic range was between −0.207 and   Table 6. Among the Apiales plants, the MPD amino acid sequences of P. ginseng and C. sativum were selected for modeling because the MPD amino acid sequence of D. carota was too short to be used for comparison. The MPD amino acid sequence of E. senticosus (AFM77982.1), which has a known sequence and structure, was used as the template sequence in the model. Modeling results are shown in Figure 4 and Table S2. The proportions of α-helix, β-turn, irregular coil, and extended chain are shown in Table 6. The proportions ranged between 35 and 40%, 4 and 5.5%, 35 and 40%, and 17 and 19%, respectively. BLAST analysis showed that the similarity of all sequences was more than 75%, and most sequences had a similarity of more than 80%. Three-dimensional structure prediction showed that the four MPD protein sequences were homodimers, and the ligands were 2 × DP6:

Motif analysis
Structural differences among the MPD amino acid sequences were subtle. Most of the motif structures were highly similar, with only a few showing significant differences (Figure 5).

Chromosome location analysis
The chromosome/scaffold length and MPD gene location of each species were obtained from the chromosome/scaffold annotation files of each species. The results were then analyzed using TBtools (Figure 6). Notably, among the different species, only D. carota, C. sativum, L. japonica, and A. thaliana had complete chromosome-level assembly information. The other species were mapped at the scaffold level. The numbers of chromosomes, scaffolds, and sequences, together with the corresponding species information, are shown in Table 7.

Collinearity analysis
The MCScanX tool was used to analyze the genomes of D. carota and C. sativum to identify any collinearity. The findings showed that the MPD gene on chromosome 2 of D. carota was collinear with that on chromosome 1 of C. sativum (Figure 7).

Prediction of cis-acting elements
TBtools was used to obtain the 2,000 bp sequences upstream of the MPD initiation codons of 13 species. These sequences were used to predict cis-acting elements of the promoters using PlantCARE software (Figure 8). The findings showed that all sequences had the core cis-acting elements CAAT-box and TATA-box. Moreover, 24 components including G-box, MRE, MYB-binding site, and Box 4 were identified in the sequences. Other elements identified in most of the sequences included abscisic acid response elements WUN-motif and ABRE, salicylic acid homeostasis elements (TCA-element and SARE), auxin response elements (TGA-element and AuxRR-core), MeJA response elements (TGACG-motif and CCGTCA-motif), W-box of glucose metabolism and plant stress signal elements, CCGTCC-motif and CAT-box of meristem, MYB and DRE stress elements, low temperature response elements (as-1 and LTR), anaerobic induction element (ARE), drought stress elements (MBS and MYC), salt stress element (STRE),

Discussion
In the current study, 21 MPD gene sequences of 11 species of Campanulids were searched and retrieved by local BLAST. Analysis of physicochemical properties showed that MPD proteins are hydrophilic proteins without a transmembrane region. The average length of MPD proteins was approximately 420 amino acids, with subtle differences in relative molecular weights and isoelectric points. Motif analysis further showed a total of 10 motifs in the 27 sequences. Most sequences showed high motif similarity. Adaptive evolution analysis of the MPD gene with A. thaliana and H. brasiliensis as outgroups showed, with high reliability, that the MPD gene has been under significant negative selection during evolution. Moreover, the MPD gene on chromosome 2 of D. carota was collinear with that on chromosome 1 of C. sativum. This finding indicates a close genetic relationship between the two species and was consistent with the findings from phylogenetic analysis. Analysis of cis-acting elements of each MPD gene sequence showed the presence of core elements (CAAT-box and TATA-box) upstream of each MPD sequence and several light response elements such as G-box. In addition, several cis-acting elements such as abscisic acid response elements, salicylic acid homeostasis elements, and auxin response elements were observed. Taken together, the protein's physical and chemical properties and the results of the adaptive evolution, motif, and cis-acting element analyses showed that selection pressure differed among the branches of the MPD gene. However, all sequences were under purifying selection, i.e., ω < 1, which indicated that the MPD gene may have been under significant negative pressure during evolution. The structure under this selection is extremely conserved [19]. Notably, dammarenediol synthase, an essential enzyme in the ginsenoside biosynthesis pathway, is also highly conserved. The P. ginseng and P. notoginseng proteins differ by only 8 amino acids, and their similarity is approximately 98% [23]. This is attributed to the conservation of the enzyme during evolution, which maintains the stability of its structure and function as it is an important enzyme in the synthesis of triterpenoids. Moreover, the limited variation among the genes analyzed and compared may reflect adaptive evolution that occurred in an earlier period. Therefore, the evolutionary signals may have been masked by the prevailing neutral or purifying selection [24].
The findings showed that residues 150 to 350 of the MPD protein were highly conserved, whereas NCBI-CDD identification showed that the conserved domain (domain accession: PLN02407) of the MPD protein comprises residues 10 to 350. MEC analysis showed that there was no dark purple site after residue 350 of the MPD protein. This finding indicates that there was no strong negative selection site after this residue, implying that the first 350 amino acids of the MPD protein, mainly the amino acids at positions 150-350, were highly conserved. Further, prediction of the three-dimensional structure shows that this conserved domain may be responsible for binding to 2 × DP6: (3R)-3-hydroxy-5-{[(R)-hydroxy(phosphonooxy)phosphoryl]oxy}-3-methylpentanoic acid. Therefore, it forms the active site of the MPD protein.
Analysis of cis-acting elements showed the presence of several elements involved in light responses. Studies report that expression of the MPD gene may be affected by light. The TGACG-motif and the CCGTCA-motif were present in most MPD gene promoter sequences. Previous studies report two types of methyl jasmonate (MeJA)   [26]. The current study shows that the MPD gene was expressed in all parts of H. brasiliensis. The cis-acting elements of the MPD gene in H. brasiliensis were zein, seed-specific regulatory, and endosperm expression elements. This finding was consistent with previous studies, which additionally reported that expression of MPD in female flowers was higher than in other parts. This result was similar to the finding by Xing et al. that the expression level of MPD in female E. senticosus plants was significantly higher than that in male plants [27]. These findings indicate that the expression level of MPD differs between male and female parts. Tolerance of female plants to environmental stress is lower than that of male plants [28]. In a damage study using H. brasiliensis, expression of MPD increased after the plants were damaged by knocking. Several elements implicated in stress response were detected among the cis-acting elements of H. brasiliensis after damage [26]. These findings imply that the stress elements in the promoter sequence of the MPD gene modulate the expression of MPD.

Conclusion
The findings from the current study show that the MPD gene is highly conserved. This property is a possible characteristic of all essential enzymes in the biosynthesis pathways of medicinal plants. The MPD gene is significantly affected by environmental factors, which subsequently modulate its expression. The findings of the current study provide key information and a reference for follow-up studies on the MPD gene and essential enzymes in other medicinal plants.