Bioactive Terpenes and Their Derivatives as Potential SARS-CoV-2 Proteases Inhibitors from Molecular Modeling Studies

The coronavirus disease 2019 (COVID-19) pandemic is caused by a novel coronavirus; the Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2). Millions of cases and deaths to date have resulted in a global challenge for healthcare systems. COVID-19 has a high mortality rate, especially in elderly individuals with pre-existing chronic comorbidities. There are currently no effective therapeutic approaches for the prevention and treatment of COVID-19. Therefore, the identification of effective therapeutics is a necessity. Terpenes are the largest class of natural products that could serve as a source of new drugs or as prototypes for the development of effective pharmacotherapeutic agents. In the present study, we discuss the antiviral activity of these natural products and we perform simulations against the Mpro and PLpro enzymes of SARS-CoV-2. Our results strongly suggest the potential of these compounds against human coronaviruses, including SARS-CoV-2.


Methodology
The present study was carried out based on the literature review of terpenes and human coronavirus. The search, performed in the PubMed database, concerning studies published until March 2020, used the following keywords: coronavirus, terpenes, Middle East Respiratory Syndrome Virus, 229E, NL63, OC43, HKU1, SARS-CoV, MERS-CoV. or SARS-CoV-2 (2019-nCoV or . The scientific publications on terpenes and derivatives against human coronaviruses were selected from studies published in English and discussed in this manuscript.

Molecular Docking
The crystal structures of the SARS-CoV-2 Main protease (M pro ) and Papain-like protease (PL pro ) were obtained from the Protein Data Bank database [21]. The structures of M pro in complex with an α-ketoamide inhibitor (PDB code 6Y2G) [22] and that of PL pro in complex with a peptide inhibitor (PDB code 6WX4) [23] were selected for modeling studies. One three-dimensional conformer was generated for each ligand, and am1-bcc partial atomic charges were added to them using OpenEye's Omega [24] and Molcharge [25], respectively.
Molecular docking was performed with the Gold software [26] following the protocol described in our previous research [27,28]. The inhibitors cocrystallized with the enzymes were used as a reference for defining the binding pockets. Primary docking of each compound was performed with the CHEMPLP scoring function to generate 30 docking solutions. Each of these ligand poses were then rescored with the GoldScore, ChemScore, and ASP scoring functions. The most probable binding modes of each compound to the investigated receptors were selected according to the consensus scoring protocol previously described [27,28]. Any ligand conformation with a consensus score greater than 1 was selected for further analyses

Molecular Dynamics and Estimation of the Free Energies of Binding
MD simulations and the estimation of the free energies of binding were carried out with Amber 18 [29]. For MD simulations for M pro we set up with one ligand present on each of the two active sites present in the dimer. These calculations proceed as previously described [30,31]. In summary, all the ligand-receptor complexes selected after the molecular docking calculations underwent the same modeling process. This protocol included systems preparation, energy minimization, equilibration, and production runs. All MD simulations took place in explicit solvent.
The equilibrated systems were used to seed five short (2 ns) production runs, each of which were initialized with different random initial atomic velocities. The free energies of binding of the ligands to M pro and PL pro were estimated with the MM-PBSA method as implemented in Amber 18. For this, 100 MD snapshots (one every 100 ps) were evenly extracted from the 10 ns of production MD simulations. For M pro , the ligand with the lowest ∆G of binding among those bound to the two monomers was analyzed.

Results and Discussion
The antiviral activity of plant terpenes has been evaluated on different CoVs. The studies show the anticoronavirus potential of several subtypes of terpenes isolated from different species, genera, and botanical families. For example, in 2012, Chang et al. [32] documented the significant inhibitory activity of friedelane-type triterpenoids, present in the ethanol extract from fresh leaves of Euphorbia neriifolia L., on HCoV-229E. Among the isolated friedelane derivatives, four terpenoid compounds exhibited higher anti-HCoV-229E activity, as indicated by the percentage of infected cell viability of 132.4, 80.9, 109.0, and 111.0%, shown by 3-β-friedelanol (1), 3-β-acetoxy friedelane (2), friedelin (3), and epitaraxerol (4), respectively. Of note, the antiviral activities of the tested compounds have been highly dependent on structural differences, as evidenced by significant variation of antiviral activity between 3-β-friedelanol and 3-α-friedelanol, which is the epimer with the inverted configuration on carbon-3. Also, the presence of acetyl group negatively affected the antiviral activity of 3-β-acetoxy friedelane, when compared to 3-β-friedelanol. Furthermore, 3-β-friedelanol showed antiviral activity against hepatitis B virus (HBV) through selective inhibition of HBeAg secretion [33].
For a long time, the antiviral activities of glycyrrhizin (GL) (34), a triterpene saponin glycoside isolated from Glycyrrhiza spp., have been reported against different viruses including herpes, influenza A, human immunodeficiency virus-1, hepatitis B, and vesicular stomatitis virus [17,55,56]. Diverse mechanisms of action have been attributed to the antiviral activities of glycyrrhizin, such as reduction of the transport to the membrane, inhibition of fusion of the viral membrane, induction of interferon-gamma in T-cells, and reduction of viral latency [57]. The effect of glycyrrhizin in inhibiting the replication of SARS-associated coronavirus has also been recently investigated. In 2003, Cinatl et al. [58] assessed the antiviral activity of glycyrrhizin in Vero cells infected with two clinical isolates of CoV from patients with SARS-CoV admitted to the clinical center of Frankfurt University, Germany. Glycyrrhizin showed a significantly potent inhibition of SARS-CoV adsorption (EC 50 = 600 mg/L; CC 50 > 20,000 mg/L; SI = 33), penetration (EC 50 = 300 mg/L; CC 50 > 20,000 mg/L; SI = 33), and replication (EC 50 = 2400 mg/L; CC 50 >20,000 mg/L; SI = 8.3). According to the authors, the anti-CoV activity of glycyrrhizin is associated with the induction of nitrous oxide synthase, leading to an increase of the intracellular levels of nitric oxide and inhibition of the SARS-CoV replication in Vero cells. In accordance, Chen et al. also reported that glycyrrhizin exhibited anti-CoV activity against 10 strains of SARS-CoV in fetal rhesus kidney-4 (fRhK-4) and Vero E-6 cell lines by neutralization test (IC 50 > 400 µM; CC 50 > 400 µM) [59].
Moreover, glycyrrhizin was bioactive against the varicella-zoster virus [60], herpex simplex virus [61], hepatitis C virus [62], and dengue virus [63]. The chemical structures of the compounds are shown in Figure 1. Table 1 shows the anti-CoV activity of terpenes, and Table 2 shows the antiviral activity of terpenes against viruses other than CoVs. Figure 2 presents the main mechanisms of action of terpenes against CoVs.     Tanshinone IIA (9)

Molecular Modeling
Despite that some of the above-listed terpenes have been assayed against the SARS-CoV virus and its enzymes, many of them lack studies regarding the inhibition of any of the M pro and PL pro enzymes in neither the SARS-CoV nor the SARS-CoV-2 viruses. Regardless of the high similarity of both proteases in the two viruses, it is worth exploring if the already known SARS-CoV protease inhibitors could maintain their inhibition capabilities against the SARS-CoV-2 proteins. The docking results of the 34 terpenes under investigation are presented as Supplementary Materials in Tables S1 (M pro ) and S2 (PL pro ). Molecular docking solutions were found for all the 34 compounds in both receptors and filtered following the procedure described in the Methods section (Section 4) to keep those most probable for each compound. This filtering step included the visualization of the predicted ligand poses to discard those falling outside the binding cavity and sets of highly similar conformations of the same compound. This resulted in 65 and 49 potential terpene-M pro and terpene-PL pro complexes, respectively. The larger number of complexes obtained for M pro can be due to its larger binding cavity compared to PL pro .
According to the consensus scoring criterion employed for selecting the most probable binding modes of the ligands, compounds 13, 7, 14, 24, and 9 scored the best in M pro . On the other hand, compounds 29, 28, 2, 29, and 34 are predicted as the less probable M pro inhibitors in docking calculations. The same analysis for PL pro reveals that compounds 19, 27, 10, 13, and 14 scored the highest, while the worst-scoring PL pro inhibitor candidates are 8, 17, 5, 6, and 34. It has been shown that the postprocessing of molecular docking predictions using the estimation of the free energies of binding from molecular dynamics (MD) simulations increases the reliability of structure-based modeling workflows [27,64]. For this reason, MD simulations and MM-PBSA calculations aiming at the more accurate stability evaluation of the docking predicted ligand-receptor complexes were performed as described above.
The detailed results of the free energies of binding calculated for the evaluated terpenes against M pro and PL pro are provided in the Supplementary Materials Tables S3 and S4, respectively, and summarized in Figure 3. Only seven compounds (5, 6, 8, 18, 29, 33, and 34) out of 34 are predicted with positive free energies of binding to the two targets, which indicates that the predicted complexes with M pro and PL pro are unfeasible. Twenty-three compounds are predicted to possess negative values of ∆G of binding to M pro , and 22 fulfill the same criterion for PL pro . This suggests that some terpenes may serve as promising compounds for the development of inhibitors of M pro and PL pro . Notably, when the lists of the five top-scoring compounds provided by the docking and MM-PBSA calculations are compared, it is seen that only compound 13 appears in both lists for PL pro .
Given that the calculation of the free energies of binding from MD simulations provide more accurate estimations of the feasibility of ligand-receptor complexes than molecular docking alone, from here on all discussion will be based on the results presented in Figure 3. According to these, compounds 11 (methyl tanshinonate), 22 (sugiol), and 31 (α-cadinol) are the top three candidates for M pro inhibition among the 34 terpenes evaluated. The predicted orientation of these compounds in the M pro binding cavity as well as the diagrams of their interactions with the receptor are depicted in Figure 4. The structure used for depiction corresponds to the centroid of the largest cluster resulting from the clustering of the ligand conformations along the 100 MD snapshots used for MM-PBSA calculations. The figures were produced with UCSF Chimera [65] and LigPlot+ [66]. Only the interactions observed in at least 50% of the analyzed MD snapshots are represented in the figures, and the interaction frequencies were analyzed with Cytoscape [67]. Given that the calculation of the free energies of binding from MD simulations provide more accurate estimations of the feasibility of ligand-receptor complexes than molecular docking alone, from here on all discussion will be based on the results presented in Figure 3. According to these, compounds 11 (methyl tanshinonate), 22 (sugiol), and 31 (α-cadinol) are the top three candidates for M pro inhibition among the 34 terpenes evaluated. The predicted orientation of these compounds in the M pro binding cavity as well as the diagrams of their interactions with the receptor are depicted in Figure 4. The structure used for depiction corresponds to the centroid of the largest cluster resulting from the clustering of the ligand conformations along the 100 MD snapshots used for MM-PBSA calculations. The figures were produced with UCSF Chimera [65] and LigPlot+ [66]. Only the interactions observed in at least 50% of the analyzed MD snapshots are represented in the figures, and the interaction frequencies were analyzed with Cytoscape [67]. The predicted conformations of these compounds to M pro show that compounds 11 and 22 bind to the same region of the pocket that comprises its S2, S3, and S4 subcavities [22]. On the other hand, 31 only occupies the S2 subregion. None of the three compounds interact with the catalytic C145 in more than 50% of the analyzed MD snapshots. Compound 31 occasionally interacts (in 26% of the snapshots) with this residue. In contrast, the three ligands directly interact with the catalytic H41 amino acid, thus potentially blocking the access of the substrates to it. Compound 11 is predicted to hydrogen bond the side chain of E166 in most of the studied snapshots, while less frequent interactions of this type are predicted with Q189 and Q192. This ligand forms a network of hydrophobic and van der Waals interactions with M pro that include, besides the aforementioned residues, M49, Y54, H164, M165, P168, D187, R188, and T190. The predicted conformations of these compounds to M pro show that compo and 22 bind to the same region of the pocket that comprises its S2, S3, and S4 su [22]. On the other hand, 31 only occupies the S2 subregion. None of the three com interact with the catalytic C145 in more than 50% of the analyzed MD snapsho pound 31 occasionally interacts (in 26% of the snapshots) with this residue. In the three ligands directly interact with the catalytic H41 amino acid, thus po blocking the access of the substrates to it. Compound 11 is predicted to hydrog the side chain of E166 in most of the studied snapshots, while less frequent intera this type are predicted with Q189 and Q192. This ligand forms a network of hydr and van der Waals interactions with M pro that include, besides the aforementio dues, M49, Y54, H164, M165, P168, D187, R188, and T190.
The second-best candidate for M pro inhibition, compound 22, is predicted to in hydrogen bonding through its hydroxyl substituent to the side chains of T190 an The second-best candidate for M pro inhibition, compound 22, is predicted to engage in hydrogen bonding through its hydroxyl substituent to the side chains of T190 and Q192, as well as to the backbone of the former. In addition, one hydrogen bond is observed between the ligand carbonyl group and the backbone of E166 in 47% of the analyzed MD snapshots. The rest of the contacts of this compound with M pro are mainly with H41, M49, H164, M165, D187, R188, and Q189. The overlap of the lists of residues interacting with compounds 11 and 22 highlights their highly similar binding modes to M pro . Finally, compound 31 is predicted to make contact with T25, H41, C44, S46, M49, Y54, D187, R188, and Q189. Although no hydrogen bond is predicted between this compound and M pro in most of the analyzed snapshots, this type of interaction is observed in 46% of them with Q189.
Regarding PL pro , the best inhibitory terpenes are compounds 24 (8-β-hydroxyabieta-9(11),13-dien-12-one), 21 (dehydroabieta-7-one), and 20 (ferruginol). These compounds share high structural similarity, and the predicted binding mode of compound 20 is almost identical to that of 21. To get additional insights into the possible mechanism of action of terpenes against PL pro , we investigated the theoretical complexes of compounds 24, 21, and 13 (Tanshinone I, ranked in the fourth position) with this SARS-CoV-2 enzyme. The predicted poses of these compounds in the PL pro active site and the diagrams of their interactions with the receptor are depicted in Figure 5.
Q189. Although no hydrogen bond is predicted between this compound and M pr of the analyzed snapshots, this type of interaction is observed in 46% of them wit Regarding PL pro , the best inhibitory terpenes are compounds 24 (8-β-hydrox 9(11),13-dien-12-one), 21 (dehydroabieta-7-one), and 20 (ferruginol). These com share high structural similarity, and the predicted binding mode of compound most identical to that of 21. To get additional insights into the possible mechanis tion of terpenes against PL pro , we investigated the theoretical complexes of compo 21, and 13 (Tanshinone I, ranked in the fourth position) with this SARS-CoV-2 The predicted poses of these compounds in the PL pro active site and the diagrams interactions with the receptor are depicted in Figure 5. The active site of PL pro can be subdivided into four regions S1-S4 associa substrate recognition. Among these, S2 is a narrow channel that connects S1 wit S4 [3]. The terpenes herein evaluated are bulky compounds and no ligand confo could be obtained that simultaneously bind to the S1 and S3/S4 subsites. Instead conformations binding to the S1-S2 sites and S2-S3-S4 subpockets were obtained modeling workflow. Among the best candidates for PL pro inhibition, compound 13 are predicted to bind at the S1-S2 region, whereas compound 21 binds to t Figure 5. Predicted binding modes of 24 (8-β-hydroxyabieta-9(11),13-dien-12-one), 21 (dehydroabieta-7-one), and 13 (tanshinone I, bottom) to the SARS-CoV-2 PL pro (left) and diagrams of the observed ligand-receptor interactions (right). The receptor surface is colored by atom type: gray for carbon, red for oxygen, yellow for sulfur, and blue for nitrogen. All atoms are represented only for amino acids forming hydrogen bonds with the ligands in the interaction diagrams. In these, hydrogen bonds are represented by dashed lines, carbon atoms are represented in black, and the rest of the atoms are colored as in the left images.
The active site of PL pro can be subdivided into four regions S1-S4 associated with substrate recognition. Among these, S2 is a narrow channel that connects S1 with S3 and S4 [3]. The terpenes herein evaluated are bulky compounds and no ligand conformation could be obtained that simultaneously bind to the S1 and S3/S4 subsites. Instead, ligand conformations binding to the S1-S2 sites and S2-S3-S4 subpockets were obtained from the modeling workflow. Among the best candidates for PL pro inhibition, compounds 24 and 13 are predicted to bind at the S1-S2 region, whereas compound 21 binds to the S2-S3 subsites. Compound 24 is predicted to hydrogen bond W106 and G271, the former being a critical residue for the stabilization of the negatively charged intermediates at the oxyanion hole during catalysis in SARS-CoV [68]. Less frequent hydrogen bonds are formed (in 24% of the analyzed MD snapshots) with the catalytic C111, although contacts with this residue are predicted in all the snapshots. This compound also directly interacts with H272 that is part of the catalytic triad. The rest of the interactions of compound 24 with the receptor are predicted to occur with N109, Y112, L162, G163, C270, H272, and Y273.
The modeling of compound 21 (and 20) reveals a binding mode in which it does not interact with any of the PL pro catalytic residues. However, its positioning at the S3-S4 sites would prevent substrate binding and block access to the active site of the enzyme. This ligand is predicted to bind in a region mainly flanked by hydrophobic residues that include A164, R166, M208, A246, P247, P248, Y264, Y268, Y273, and T301. It is interesting to note that compounds 24, 21, and 20 are highly similar, but they are predicted to bind to different regions of PL pro . This can be explained by their subtle structural differences. While compound 24 is predicted to hydrogen bond the receptor through its carbonyl and hydroxyl substituents, these groups are at different positions in compounds 21 and 20, which hinders the formation of these hydrogen bonds with the later compounds. The loss of these two hydrogen bonds with the receptor causes a large decrease in the ∆G of the binding of compounds 21 and 20 to PL pro .
The last candidate inhibitor of PL pro is compound 13. Its predicted binding mode overlaps with that of compound 24, with the possibility of directly hydrogen bonding the side chain of the catalytic C11 amino acid. Furthermore, it forms hydrogen bonds with the side chain of W106 in 44% of the selected MD snapshots. The network of contacts that this compound makes with the receptor is completed by N109, L162, G163, Y264, Y268, Q269, C270, G271, and Y273. Another interesting result derived from our calculations is that some of the evaluated terpenes could have dual M pro -PL pro inhibition activity. This is the case of compounds 21 and 31 that are predicted with ∆G < −5 kcal/mol for both receptors. The first of these ranks fifth and second among all molecules according to their predicted free energies of binding to M pro and PL pro , respectively. Likewise, 31 ranks third in the M pro and fifth in the PL pro rankings.
The predicted binding modes of the top candidates identified for the inhibition of the SARS-CoV-2 proteases provide useful insights for their future modification with the objective of improving their affinity with the receptors. For instance, none of compounds 11, 22, and 31 exploits the S1 subpocket of M pro . The introduction of modifications capable of occupying this region and complementary with the receptor could improve the stability of the predicted complexes. Given the number of amino acids' side chains capable of donating and/or accepting hydrogen bonds in the large binding cavity of M pro , it is possible to modify its candidate inhibitors with substituents favorably positioned to hydrogen bond the receptor. Thus, more stable complexes could be obtained. Similar analyses are also possible for the PL pro enzyme.
No experiments have been devoted so far to study the inhibition of the SARS-CoV-2 M pro and PL pro enzymes by terpenes. However, some of the compounds herein studied have been evaluated against the highly similar M pro and PL pro enzymes of the SARS-CoV virus [39,50,[69][70][71]. Our results are in line with this previously published evidence. Methyl tanshinonate (11), which ranked first as an M pro inhibitor candidate, is a confirmed inhibitor of this enzyme in SARS-CoV (IC 50 = 21.1 µM) [39]. The second and third in this list are Sugiol (22) and α-Cadinol (31), which were screened for their SARS-CoV M pro inhibitory activity [50] and were found inactive at concentrations below 100 µM. In the same report, ferruginol (20), which is highly similar to Sugiol and ranks eighth in our list of potential M pro inhibitors, was also assayed with the same result as compounds 22 and 31. However, a posterior re-evaluation of compound 20 concluded that it inhibits the enzymatic activity of the SARS-CoV M pro enzyme with IC 50 = 49.6 µM [71]. Unfortunately, sugiol (22) and α-Cadinol (31) were not re-assayed in these more recent experiments. In light of these data, we believe that it is worth re-evaluating these compounds against the SARS-CoV-2 M pro enzyme.
Among our best SARS-CoV-2 PL pro inhibitor candidates, tanshinone I (13) is a confirmed inhibitor of this enzyme in SARS-CoV with IC 50 = 8.8 µM [39]. The inhibition of PL pro by our first two candidates against this enzyme, 8-β-hydroxyabieta-9(11),13-dien-12one (24) and dehydroabieta-7-one (21), has been assayed in neither the SARS-CoV nor the SARS-CoV-2 viruses. However, their inhibition of the SARS-CoV virus has been confirmed through some mechanism not involving action on the M pro enzyme [50]. Our results suggest that the inhibition of PL pro might be the antiviral mechanism of action of these compounds against coronaviruses.

Conclusions
This study provides a detailed compilation and evidences that plant terpenes and their derivatives must be considered as promising sources for the discovery of effective anti-CoV agents, which can be used as treatments or as adjuvants to conventional COVID-19 therapies. The structural diversity of the investigated compounds makes it difficult to establish a structure-antiviral activity relationship. However, the most promising compounds can be used as prototypes for the discovery of effective drugs against CoVs. In addition, further investigations are needed to establish pharmacokinetic and pharmacodynamic parameters for these compounds. Computational models that included a total of 1.14 µs time of molecular dynamics simulations lead to the identification of promising terpenes for the inhibition of the proteases from the SARS-CoV-2 virus. Methyl tanshinonate (11), Sugiol (22), and α-Cadinol (31) are predicted as the best candidates for M pro inhibition, while 8-β-hydroxyabieta-9(11),13-dien-12-one (24), Dehydroabieta-7-one (21), and Tanshinone I (13) are better positioned as candidate inhibitors of PL pro of the SARS-CoV-2 virus. Interestingly, some of the studied chemicals have the potential to inhibit both protease enzymes. Altogether, our results show that terpenes and their derivatives should be considered in the search for therapeutic alternatives against the SARS-CoV-2. Finally, this class of compounds can be promising chemical scaffolds in the discovery of lead compounds to feed the anti-SARS-CoV-2 drug discovery pipelines.
Supplementary Materials: The following are available online at https://www.mdpi.com/2218-273 X/11/1/74/s1, Table S1: Docking results of the investigated compounds against the SARS-CoV-2 Mpro enzyme, Table S2: Docking results of the investigated compounds against the SARS-CoV-2 PLpro enzyme, Table S3: Estimated free energies of binding of the ligand-Mpro complexes selected from docking calculations, Table S4: Estimated free energies of binding of the ligand-PLpro complexes selected from docking calculations.

Data Availability Statement:
The data presented in this study are available as Supplementary Materials. These include Tables S1 and S2 with the detailed docking results for the 34 compounds against the M pro and PL pro enzymes, respectively. Tables S3 and S4 contain the full results of the MM-PBSA calculations performed for all the studied complexes between the compounds and the M pro and PL pro enzymes, respectively. The predicted complexes of compounds 11, 22 and 31 with M pro and of 13, 21 and 24 with PL pro are provided in PDB format within the SM_Structures.zip file.

Conflicts of Interest:
The authors declare no conflict of interest. Open reading frame 1a ORF1b