Druggability for COVID19 – In silico discovery of Potential Drug Compounds against Nucleocapsid (N) Protein of SARS-CoV-2

Background: Coronavirus disease 2019 (COVID-19) has caused havoc throughout the world, with widespread mortality and morbidity. The RNA binding domain in the nucleocapsid (N) protein of SARS-CoV-2 is a potential drug target, as the N protein serves multiple critical functions during the viral life cycle, especially in viral replication. The unavailability of vaccines and proper antiviral drugs encourages researchers to identify potential antiviral drug compounds against the N protein of SARS-CoV-2 for the current scenario. While vaccine development might take some time, the identification of a drug compound might decrease the widespread deaths and suffering. Method: This study analyzed the phylogenetic relationship and sequence divergence of the N protein with 49 other CoV species and identified the conserved regions according to protein families through a conserved domain search. In addition, good structural binding affinities of some natural/synthetic phytocompounds and drugs against the N protein were found using molecular docking approaches. Result: The reported antiviral properties, the predicted binding affinities and the higher numbers of hydrogen bonds of the selected compounds indicate their druggability. Among them, the established antiviral drug Glycyrrhizic acid and the phytochemical Theaflavin can be considered putative drug compounds against the target protein of SARS-CoV-2, as they showed all the properties of a potential drug. Conclusion: The findings of this study might lead to the development of a drug for the disease and help to reduce the risk of deadly infection of host cells by SARS-CoV-2.


Introduction
The outbreak of novel coronavirus infection has drastically affected the lives of the human population worldwide. The infection started as a respiratory illness/pneumonia of unknown origin in Wuhan city, China, at the end of 2019. The organism was identified and termed novel on 7th January 2020. The World Health Organization (WHO) declared it a public health emergency of international concern as the disease spread to other regions of the world 1. Sequencing of SARS-CoV-2 (NCBI Reference Sequence: NC_045512.2) indicated a higher sequence-level identity with SARS-CoV (79%) than with MERS-CoV (50%). Besides, higher levels of transmissibility and pandemic risk of COVID-19 at an early stage have been reported in many studies 1. Similarly, the SARS-CoV-2 N protein is a multifunctional RNA binding protein, necessary for viral RNA transcription, replication and/or assembly of the virus 6. Interestingly, a unique N-terminal RNA binding domain of the SARS-CoV-2 N protein has been identified as a novel antiviral drug target site 7. The viral N protein packages the genome into long, flexible, helical RNP complexes called nucleocapsids, which protect the SARS-CoV-2 virion structure 5. Additionally, the N protein contributes significantly to the timely replication and reliable transmission of SARS-CoV-2 during its life cycle. Therefore, the N protein (PDB ID: 6VYO) can be considered a novel drug target of SARS-CoV-2.
The SARS-CoV-2 infection has created a dangerous pandemic situation owing to its quick transmission and deadly nature. It has tremendously affected both the health and the economy of the human population across the globe. Many ongoing research efforts aim to develop vaccines to control this situation, but all are in various phases of trials. Thus, the present study focused on the in silico discovery of potent leads against SARS-CoV-2 infection from several antiviral drugs and compounds of plant origin. The present study may shed light on the discovery of an antiviral drug against SARS-CoV-2.

Sequence retrieval and construction of phylogenetic tree
Nucleocapsid protein sequences of a total of 49 coronavirus species and/or strains, including SARS-CoV-2, were retrieved in FASTA format from the National Center for Biotechnology Information (NCBI) web server (https://www.ncbi.nlm.nih.gov/) on 30th March 2020. Two N proteins, from Ebola and H1N1 viruses, were included in the study to examine evolutionary divergence across species. The total of 51 N protein sequences were then aligned using the MUSCLE algorithm of the Molecular Evolutionary Genetics Analysis 7 (MEGA 7) package 8. The resulting alignment was used to generate a phylogenetic tree using the Neighbour Joining (NJ) method of MEGA 7 with 1000 bootstrap replicates.
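To illustrate what bootstrap replicates of an alignment look like, the following is a minimal sketch and not the MEGA 7 implementation. It resamples alignment columns with replacement and computes a pairwise p-distance matrix, the kind of input a distance-based method such as NJ takes. The toy sequences, the function names and the choice of p-distance are assumptions made purely for illustration.

```python
import random


def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Pairwise p-distances for a {name: aligned_sequence} mapping."""
    names = list(alignment)
    return {(m, n): p_distance(alignment[m], alignment[n])
            for m in names for n in names}


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}


# Toy aligned sequences (hypothetical, not taken from the study's data set).
toy_alignment = {
    "seqA": "MSDNGPQNQR",
    "seqB": "MSDNGPQSQR",
    "seqC": "MADN-PQSNR",
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_replicate(toy_alignment, rng) for _ in range(1000)]
matrices = [distance_matrix(rep) for rep in replicates]
```

In a full analysis, a tree would be built from each replicate matrix, and the bootstrap support of a clade would be the fraction of replicate trees that contain it.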

Conserved domain search
Functional domains of SARS-CoV-2 N protein (YP_009724397.2) were identified by searching the NCBI Conserved Domain Database (CDD) (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi). The CDD is a collection of domain models that imports information from Pfam, SMART, COG, and NCBI to provide a more accurate assessment of neighbor relationships between protein sequences 9.

Retrieval and preparation of 3D structure
The available N-terminal domain structure (PDB ID: 6VYO) of SARS-CoV-2 N protein was retrieved from the Protein Data Bank (PDB) (https://www.rcsb.org/). Initially, all water and other hetero molecules were removed, and hydrogen atoms were then added to the protein structure. Further, energy minimization was performed using the Discovery Studio 3.5 suite to obtain a properly optimized structure of the target protein.
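The removal of water and other hetero molecules can be sketched directly on the text records of a PDB file. This is a minimal illustration and not the Discovery Studio 3.5 procedure: the helper names and the toy records are hypothetical (the coordinates are not taken from 6VYO), and hydrogen addition and energy minimization are not reproduced here.

```python
def make_record(record, serial, atom, res, chain, resseq, x, y, z):
    """Build a fixed-column PDB coordinate line (toy helper for this sketch)."""
    return (f"{record:<6}{serial:>5} {atom:<4} {res:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def strip_hetero(pdb_lines, chain=None):
    """Keep only ATOM records, optionally restricted to one chain.

    Waters and other hetero molecules are stored as HETATM records,
    so keeping ATOM records alone drops them.
    """
    kept = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        # The chain identifier sits in column 22 (index 21) of the PDB format.
        if chain is not None and line[21] != chain:
            continue
        kept.append(line)
    return kept


# Toy records with made-up coordinates, for illustration only.
toy_pdb = [
    make_record("ATOM", 1, "N", "THR", "A", 49, 1.0, 2.0, 3.0),
    make_record("ATOM", 2, "CA", "THR", "A", 49, 1.5, 2.5, 3.5),
    make_record("ATOM", 3, "N", "THR", "B", 49, 9.0, 8.0, 7.0),
    make_record("HETATM", 4, "O", "HOH", "A", 301, 4.0, 5.0, 6.0),
    make_record("HETATM", 5, "ZN", "ZN", "A", 401, 0.0, 0.0, 0.0),
]

protein_only = strip_hetero(toy_pdb)
chain_a_only = strip_hetero(toy_pdb, chain="A")
```

The optional `chain` argument shows how a single chain could be selected from a multi-chain entry at the same step.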

Drug binding cavity prediction
In the absence of knowledge of the exact drug binding site, the probable binding cavity within SARS-CoV-2 N protein was predicted using metaPocket 2.0 (https://projects.biotec.tudresden.de/metapocket/). The metaPocket tool identifies cavities on the protein surface for drug binding site prediction using multiple computational approaches 11 such as PASS11, LIGSITE, Fpocket, SURFNET, GHECOM, and ConCavity.

Selection of ligand molecules
Different natural compounds of plant origin reported to have antiviral, anti-inflammatory, anti-influenza, anti-HIV, and anti-hepatic properties were shortlisted from the literature. In addition, a few FDA-approved and investigational antiviral drugs were selected from the DrugBank (https://www.drugbank.ca/) database for further investigation.

Ligand structure retrieval and correction
Three-dimensional structures of the natural ligands were retrieved from the PubChem (https://pubchem.ncbi.nlm.nih.gov/) database in SDF format and converted into PDB format using the Discovery Studio 3.5 suite. Similarly, PDB structures of the antiviral drugs were collected from DrugBank (https://www.drugbank.ca/). Further, the structures of all ligands were optimized and their protonation states assigned using the Discovery Studio 3.5 suite.

Molecular docking
Molecular docking was performed separately between each selected ligand (phytochemicals and antiviral drugs) and the drug target (N protein, PDB ID: 6VYO) in order to identify the most efficient inhibitor against SARS-CoV-2. AutoDock 4.2 (http://autodock.scripps.edu/) and AutoDockTools 4 (ADT) 12 were used to perform the molecular docking study. The N-terminal RNA binding domain of SARS-CoV-2 N protein was observed as a homotetramer; therefore, only chain A of the available crystal structure was employed for docking analysis. Prior to docking, Kollman charges and polar hydrogen atoms were added to the target structure. Both ligand and receptor structures were prepared using ADT and converted to pdbqt format before docking. A virtual grid box was set around the drug binding cavity of the target structure with a size of 74, 78, 74 Å in the x, y, z directions and a spacing of 0.375 Å. Semi-flexible docking was performed by keeping the target structure rigid and allowing flexibility of the ligand molecules within the drug-binding pocket 13. The Lamarckian genetic algorithm (LGA) was used with 25,000,000 energy evaluations for each docking run. AutoDock generated ten conformers, ranked by free energy of binding, for each protein-ligand complex. The most energetically favorable (lowest energy) binding complex was considered for analysis. Further analysis and presentation of atomic interactions within the docked complexes were performed using the PyMol molecular graphics tool (www.pymol.org).
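Two small pieces of arithmetic in the docking setup can be sketched as follows. The first relates grid-point counts to the physical box edge, under the assumption that the reported 74, 78, 74 correspond to the `npts` values of an AutoGrid parameter file (edge length = points × spacing); the text itself states the size in Å, so this reading is an assumption. The second selects the lowest-energy conformer from ten runs. The energies in `toy_energies` are placeholders, not results of this study, and the function names are hypothetical.

```python
def box_extent(npts, spacing):
    """Edge lengths (in Å) of a grid box from per-axis point counts and spacing."""
    return tuple(n * spacing for n in npts)


def best_conformer(energies):
    """Return the conformer id with the lowest (most favorable) binding energy."""
    return min(energies, key=energies.get)


# Assumption: 74, 78, 74 are grid-point counts and 0.375 Å is the grid spacing.
extent = box_extent((74, 78, 74), 0.375)

# Placeholder binding energies in kcal/mol for ten conformers (illustrative only).
toy_energies = {
    1: -6.1, 2: -7.4, 3: -5.9, 4: -8.2, 5: -7.0,
    6: -6.6, 7: -8.0, 8: -5.2, 9: -7.7, 10: -6.3,
}
best = best_conformer(toy_energies)
```

Under that assumption the box would span 27.75 × 29.25 × 27.75 Å, and for the placeholder energies conformer 4 would be the one taken forward for analysis.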

Molecular phylogeny ascertained sequential divergence of SARS-CoV-2 N protein
A total of 49 N proteins of different CoV species, including SARS-CoV-2 (Table 1), were retrieved to construct the phylogenetic tree. In addition, protein sequences of two distant homologues of SARS-CoV-2, from Ebola (Accession: SCD11531.1) and H1N1 (Accession: YP_009118629.1) viruses, were included in the tree in order to establish the pattern of sequence divergence across species.
The phylogenetic tree was constructed using Neighbour Joining (NJ) method 14

Functional domain identified for SARS-CoV-2 N protein
The complete sequence of SARS-CoV-2 N protein (Accession: YP_009724397.

Structural elements of SARS-CoV-2 N protein
In the absence of a full-length structure, the secondary structural elements of SARS-CoV-2 N protein were predicted from its primary sequence using the PSIPRED web server. Secondary structural elements comprising 2 long, 8 medium and 2 short helical regions, and 2 medium and 9 short β-sheets, were predicted within the complete sequence of SARS-CoV-2 N protein (Fig. 3). Most of the NTD region (residues 50-175) was predicted as β-sheets and coils. In contrast, helices, β-sheets and coils were all observed within the CTD region (residues 258-359) (Fig. 3). Further, highly disordered regions of SARS-CoV-2 N protein, with scores above the cut-off (0.5), were observed at amino acid positions 1-50, 180-250, and 350-419 (Fig. 4A). However, significantly disordered portions were absent from both the NTD (50-175) and CTD (258-359) regions (Fig. 4A). According to the MEMSAT-SVM algorithm, the SARS-CoV-2 nucleocapsid NTD was predicted to be cytoplasmic, whereas a small C-terminal transmembrane region was noticed at amino acids 302-317 (Fig. 4B).
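The way disordered stretches are read off a per-residue score profile with a 0.5 cut-off can be sketched as below. This is a minimal illustration, not the predictor used in the study; the function name, the optional `min_length` filter and the toy scores are assumptions.

```python
def disordered_regions(scores, cutoff=0.5, min_length=1):
    """Return 1-based (start, end) residue ranges whose score exceeds the cutoff."""
    regions = []
    start = None
    for pos, score in enumerate(scores, start=1):
        if score > cutoff:
            if start is None:
                start = pos  # a new disordered stretch begins here
        elif start is not None:
            # The stretch ended at the previous residue.
            if pos - start >= min_length:
                regions.append((start, pos - 1))
            start = None
    # Close a stretch that runs to the end of the sequence.
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions


# Toy per-residue disorder scores (hypothetical, not from the study).
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.6, 0.95, 0.4, 0.85]

all_regions = disordered_regions(toy_scores)
long_regions = disordered_regions(toy_scores, min_length=2)
```

A minimum-length filter of this kind is one way to ignore isolated residues that cross the cut-off when summarizing longer disordered segments.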

Structure preparation and active site identification of N protein NTD
A homology search using the BLASTP algorithm revealed that the available structure of the N-terminal RNA binding domain covers 30% of the SARS-CoV-2 N protein (Accession: YP_009724397.2) sequence with 100% identity. Therefore, the three-dimensional structure of SARS-CoV-2 N protein was retrieved and processed for structural correction and optimization. The possible drug-binding cavity of SARS-CoV-2 N protein was predicted computationally in the absence of evidence from the literature. The metaPocket algorithm generated the top three hits after clustering the results of PASS11, LIGSITE, Fpocket, SURFNET, GHECOM, and ConCavity. Of these three, the large active pocket was considered the possible drug-binding cavity (Fig. 5).

Structure preparation of natural/synthetic ligands against SARS-CoV-2 N protein
Based on the literature, a total of eight natural compounds of plant origin and three synthetic compounds (Table 2) were identified as having antiviral properties and were therefore prepared for docking against SARS-CoV-2 N protein. In addition, seven antiviral drugs (Table 3) were included in the study to discover potent inhibitors of the N protein of SARS-CoV-2. Finally, 3D structures of a total of eighteen ligands were extracted from online databases (PubChem/DrugBank) and prepared for the docking study.

Molecular docking identified efficient ligand against SARS-CoV-2 N protein
Molecular docking is an efficient technique to estimate the binding affinity of a drug compound for a drug target 15,16. Therefore, all possible inhibitors were docked separately against SARS-CoV-2 N protein to discover effective ligands and important atomic interactions between  (Table 4, Table 5, Fig. 6A-6J). To its support, few  (Table 4, Fig. 7A-6G). Overall, the docking study indicated the binding potential of the discussed phytochemicals and drugs against the drug target, the nucleocapsid protein of SARS-CoV-2.

Discussion
The SARS-CoV-2 (COVID-19) pandemic has created an alarming situation owing to severe infection and high death rates worldwide. Researchers all over the world are working to identify novel drug/vaccine targets and to develop drugs/vaccines to combat the disease. In this context, the current study conducted critical analyses of an important drug target, the nucleocapsid (N) protein of SARS-CoV-2, and focused on the in silico discovery of potent natural/synthetic compounds against the virus. The N protein has been reported to play a vital role in the survival and growth of SARS-CoV-2; the authors therefore focused on discovering potential natural or synthetic compounds that block its regular mechanism. Primary sequence analysis identified two crucial functional domain regions, at the N and C termini of the SARS-CoV-2 N protein. Interestingly, the NTD comprises the RNA binding site, which signifies its importance for the viral cellular mechanism. To its support, the available crystal structure of NTD SARS-CoV-2 N protein was retrieved and
Today, the number of reported COVID-19 deaths from different corners of the globe is increasing drastically owing to the absence of an effective antiviral drug. To overcome this situation, eighteen compounds, including natural compounds of plant origin and antiviral drugs, were docked into the drug-binding cavity of the N protein to identify potential ligands against SARS-CoV-2. This study estimated the binding efficiency of a few phytochemicals (Theaflavin, curcumin, ladanein) and a few drug compounds (glycyrrhizic acid, ethyl brevifolincarboxylate, and quercitrin) against the N protein of the virus, which may inform their potential as treatment options for SARS-CoV-2. The antiviral effects of phytochemicals such as Theaflavin, curcumin, and ladanein against many pathogenic viruses have already been well studied and reported. Theaflavin is known to protect against influenza virus by inhibiting its replication 18. Similarly, curcumin has antiviral properties against H1N1 influenza and FIPV 19, and the inhibitory effect of ladanein against hepatitis C virus infection is also well studied 20. Thus, these compounds may be useful as anti-infective agents against COVID-19.
Antiviral drugs such as glycyrrhizic acid, ethyl brevifolincarboxylate, and quercitrin have inhibitory effects against hepatitis B and C viruses 21,22. However, glycyrrhizic acid and quercetin are associated with severe side effects such as hypokalemia, oedema, rhabdomyolysis or myoglobinuria, mitochondrial toxicity and mutagenicity 23,24. Nevertheless, based on the resulting binding affinities and the presence of H-bonds, glycyrrhizic acid and Theaflavin can be considered suitable drug compounds against SARS-CoV-2 N protein. Given the toxicity associated with glycyrrhizic acid, the natural compound Theaflavin may be more effective against COVID-19. Apart from the natural/synthetic compounds mentioned above, a few others, such as Diosgenin 20, U18666A 25, Apigenin (Ocimum sanctum) 26, Resveratrol (Vitis labrusca) 27, Berberine (Berberis vulgaris) 28, Emodin (Radix et Rhizoma Rhei, Radix Polygoni Multiflori) 29, and Tenofovir (Phyllanthus niruri) 22, have shown stable binding interactions with SARS-CoV-2 N protein. Hence, they may also be studied for further validation.

Conclusion
The COVID-19 outbreak has caused havoc throughout the world, changing the course of human lives. Researchers are trying to design a vaccine against SARS-CoV-2, but that might take some time. This study attempts to find a drug for treating the disease, which would help to save human lives and mitigate the suffering of millions of people infected by the virus worldwide.
Some antiviral phytocompounds and synthetic drugs that would target the N protein, which is responsible for replication of SARS-CoV-2 in the host body, have been analyzed in this in silico study.
Of all the compounds in this study, glycyrrhizic acid and Theaflavin can be considered candidate antiviral drugs, as they showed a higher binding affinity for the target protein. They might be effective in inhibiting the viral effects and preventing infection of host cells, and might serve as a treatment for the disease.