Viral Metagenomics Revealed Sendai Virus and Coronavirus Infection of Malayan Pangolins (Manis javanica)

Pangolins are endangered animals in urgent need of protection. Identifying and cataloguing the viruses carried by pangolins is a logical approach to evaluating the range of potential pathogens and supporting conservation. This study provides insight into the viral communities of Malayan pangolins (Manis javanica) and into the molecular epidemiology of the dominant pathogenic viruses shared between Malayan pangolins and other hosts. A total of 62,508 contigs were assembled de novo, and a BLAST search showed that 3600 of them (≥300 nt) were related to viral sequences; 68 of these contigs had a high level of sequence similarity to known viruses, and the dominant viruses were Sendai virus and Coronavirus. This is the first report on the viral diversity of pangolins; it expands our understanding of the virome in endangered species and provides insight into the overall diversity of viruses that may be capable of directly or indirectly crossing over into other mammals.


Introduction
The Malayan pangolin (Manis javanica), a representative mammal species of the order Pholidota, is one of only eight pangolin species worldwide. Four of them are from Asia (M. javanica, M. pentadactyla, M. crassicaudata and M. culionensis), whereas the other four are from Africa (M. tricuspis, M. tetradactyla, M. gigantea and M. temminckii) [1]. Unlike other placental mammals, pangolins have skin covered by large, overlapping keratinized scales [2]. Because of the huge demand for their meat as a delicacy and for their scales in traditional medicines, pangolins are the most poached and trafficked mammals in the world. For this reason, all eight pangolin species are listed in the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES). Concerted efforts have been made in China to conserve and rescue these species in captivity because of their threatened status and the continuing decline of wild populations. At the same time, poor health and low immunity are important obstacles to the rescue of pangolins. A previous study reported a complete genome sequence of Parainfluenza Virus 5 (PIV5) from a Sunda pangolin (another name for the Malayan pangolin) in China, which broadened the known host spectrum of PIV5 [3] and implied that pangolins not only face great harm from humans but are also at risk from infectious diseases. Recently, a large number of viral metagenomic studies have found pathogenic viruses carried by humans, pigs, cows, bats, cats, horses, chickens and other animals [4][5][6][7][8][9][10], and some of these studies successfully isolated new virus strains. However, we still know little about the diseases of rare and threatened terrestrial vertebrates such as pangolins, or about the etiologies of those diseases.
Viruses are infectious agents that replicate only inside living cells and have the ability to infect a variety of hosts [11]. There has been a lot of discussion within the virology community regarding the best method to determine viral infectivity, pathogenicity, and effects on the host microbiome. Virologists use a variety of methods to gain understanding of infection, replication, pathogenicity, and, more recently, the evolution of the viral genome. Unbiased sequencing of nucleic acids from environmental samples has great potential for the discovery and identification of diverse microorganisms [12][13][14][15]. We know this technique as metagenomics, or random, agnostic or shotgun high-throughput sequencing. In theory, metagenomics techniques enable the identification and genomic characterization of all microorganisms present in a sample with a generic laboratory procedure [16]. The approach has gained popularity with the introduction of next-generation sequencing (NGS) methods that provide more data in less time at a lower cost than previous sequencing techniques. While initially mainly applied to the analysis of the bacterial diversity, modifications in sample preparation protocols allowed characterization of viral genomes as well. Researchers have seized the opportunity to expand our knowledge in the fields of virus discovery and biodiversity characterization [12,13,15,17].
The Guangdong Wildlife Rescue Center received 21 live Malayan pangolins from the Anti-smuggling Customs Bureau on 24 March 2019; most individuals, including adults and subadults, were in poor health, and their bodies were covered with skin eruptions. All of these Malayan pangolins were taken into care by the Guangdong Wildlife Rescue Center; however, 16 died despite extensive rescue efforts. Most of the dead pangolins had swollen lungs that contained a frothy liquid and showed signs of pulmonary fibrosis, and in a minority of the dead individuals we observed hepatomegaly and splenomegaly. We collected 21 organ samples of lung, lymph, and spleen with obvious symptoms from 11 dead Malayan pangolins and used a viral metagenomic approach to uncover the virus diversity and the molecular epidemiology of the viruses that are potential etiological agents. This study will benefit pangolin disease research and subsequent rescue operations.

Ethics Statement
The study design was approved by the ethics committee for animal experiments at the Guangdong Institute of Applied Biological Resources (reference number: GIABR20170720, 20 July 2017) and followed basic principles outlined by this committee.

Library Preparation and Sequencing
In our study, organ samples of lung, lymph and spleen were collected from dead Malayan pangolins at the Guangdong Wildlife Rescue Center. Viral-like particles were prepared as described in a previously published paper [18]. Total nucleic acid was extracted from viral-like particles using a MagPure Viral DNA/RNA Mini LQ Kit (R6662-02; Magen, Guangzhou, China). Double-stranded cDNA was synthesized by reverse transcription from single-stranded and double-stranded RNA viral nucleic acids using a REPLI-g Cell WGA & WTA Kit (150052; Qiagen, Hilden, Germany), while single-stranded DNA viral nucleic acids were converted to double-stranded DNA and purified with the same kit. Amplified DNA was randomly sheared by ultrasound sonication (Covaris M220) to produce fragments of ≤ 800 bp; sticky ends were then repaired and adapters added using T4 DNA polymerase (M4211, Promega, Madison, WI, USA), Klenow DNA Polymerase (KP810250, Epicentre), and T4 polynucleotide kinase (EK0031, Thermo Scientific-Fermentas, Glen Burnie, MD, USA). Fragments of approximately 350 bp were collected with beads after electrophoresis. After amplification, libraries were pooled and subjected to 150 bp paired-end sequencing on the NovaSeq 6000 platform (Illumina, San Diego, CA, USA). High-throughput sequencing was conducted by the Magigene Company (Guangzhou, China). The data supporting this study are openly available in the NCBI Sequence Read Archive (SRA) under BioProject PRJNA573298.

Quality Control
As raw sequencing reads always include some low-quality data, they must be processed to improve the accuracy of the reads used in follow-up analyses. To this end, we used SOAPnuke version 1.5.6 [19] to remove adapter sequences and to discard reads (i) containing more than 5% Ns; (ii) in which 20% or more of the bases had quality values below 20; (iii) arising from PCR duplication; or (iv) containing a polyA sequence.
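The per-read criteria above can be illustrated with a minimal Python sketch. This is not SOAPnuke's implementation or its command line; the function names, the reading of criterion (ii) as "at least 20% of bases below Q20", the poly-A run length of 20, and the use of exact sequence identity to flag PCR duplicates are all assumptions made for illustration.

```python
def passes_quality_filter(seq, quals, max_n_frac=0.05, low_q=20,
                          max_low_q_frac=0.20, polya_len=20):
    """Return True if a read survives the N-content, base-quality and poly-A filters.

    seq   -- read sequence as a string
    quals -- list of integer Phred scores, one per base
    polya_len is a hypothetical threshold; the text does not state one.
    """
    if not seq or len(seq) != len(quals):
        return False
    seq = seq.upper()
    # (i) more than 5% ambiguous bases
    if seq.count("N") / len(seq) > max_n_frac:
        return False
    # (ii) 20% or more of the bases with quality below 20 (assumed reading)
    low_frac = sum(q < low_q for q in quals) / len(quals)
    if low_frac >= max_low_q_frac:
        return False
    # (iv) contains a poly-A run
    if "A" * polya_len in seq:
        return False
    return True


def deduplicate(reads):
    """(iii) Drop reads whose sequence was already seen (a crude duplicate proxy)."""
    seen = set()
    for name, seq, quals in reads:
        if seq in seen:
            continue
        seen.add(seq)
        yield name, seq, quals
```

A production pipeline would apply these tests to read pairs and would typically define duplicates on both mates together; the sketch treats single reads only.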

Removal of Host Contamination
To avoid confusion caused by ribosomal and host sequences, all clean reads that passed quality control were mapped to the ribosomal database (SILVA) and to the host reference genome of M. javanica (NCBI Project ID: PRJNA256023) using BWA version 0.7.17 [20]; only the unmapped reads were used in subsequent analyses.
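Retaining only unmapped reads can be sketched as a filter over SAM-format alignment records, in which bit 0x4 of the FLAG field marks a read as unmapped. The function name below is hypothetical, and the sketch assumes plain-text SAM input with the standard eleven mandatory columns; it does not reproduce the commands used in the study.

```python
def unmapped_records(sam_lines):
    """Yield (name, sequence, quality) for SAM records flagged as unmapped."""
    for line in sam_lines:
        if line.startswith("@"):      # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])         # column 2: bitwise FLAG
        if flag & 0x4:                # bit 0x4: segment unmapped
            yield fields[0], fields[9], fields[10]
```

For paired-end data one would usually also inspect bit 0x8 (mate unmapped) and keep a pair only when neither mate aligns to the host or ribosomal references; that refinement is omitted here.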

Rapid Identification of Virus Species
Clean reads free of ribosomal and host sequences were mapped to an in-house virus reference dataset separated from the GenBank non-redundant nucleotide (NT) database to obtain a preliminary identification of virus reads. Reads were classified into virus families according to the annotation information in the NCBI taxonomy database. To improve accuracy, we removed alignment results with a coverage below 5 reads.
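The classification and coverage cut-off described above can be sketched as follows. The sketch assumes that "coverage below 5 reads" means a reference sequence supported by fewer than five mapped reads; the function name, the input structures and the "unclassified" label are hypothetical.

```python
from collections import Counter


def family_read_counts(read_hits, ref_to_family, min_reads=5):
    """Count reads per virus family, ignoring weakly supported references.

    read_hits     -- iterable of (read_name, reference_id) pairs
    ref_to_family -- dict mapping reference_id to a family name
    min_reads     -- references with fewer mapped reads are discarded
    """
    per_reference = Counter(ref for _, ref in read_hits)
    per_family = Counter()
    for ref, n_reads in per_reference.items():
        if n_reads >= min_reads:
            per_family[ref_to_family.get(ref, "unclassified")] += n_reads
    return dict(per_family)
```

In this sketch the threshold is applied per reference before reads are summed by family, so a family is reported only if at least one of its references reaches the cut-off.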

Read Assembly and Species Identification
Clean reads were de novo assembled using MEGAHIT version 1.0 [21]. BWA version 0.7.17 [20] was used to align clean reads to the assembled contigs. Host contigs were identified with BLAST version 2.7.1 and removed if they satisfied either of the following conditions: (1) length of the matched region ≥ 500 bp and alignment similarity ≥ 90%; (2) length of the matched region accounting for more than 80% of the total contig length and alignment similarity ≥ 90%. Then, CD-HIT version 4.6 [22] was used to cluster the assembled virus contigs from each Malayan pangolin sample. Contigs were then classified by BLASTx against the NT database using alignment similarity ≥ 80%, length of matched region ≥ 500 bp and e-value ≤ 10⁻⁵. Contigs with significant BLASTx hits were confirmed as virus sequences.
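The two host-removal conditions and the thresholds for confirming a viral contig can be written as simple predicates. This is a sketch of the stated rules only; the function and parameter names are hypothetical, and the values are assumed to come from parsed BLAST tabular output.

```python
def is_host_contig(match_len, contig_len, identity_pct):
    """True if a contig's best host hit meets condition (1) or (2)."""
    if identity_pct < 90.0:           # both conditions require >= 90% similarity
        return False
    long_match = match_len >= 500                    # condition (1)
    covers_contig = match_len / contig_len > 0.80    # condition (2)
    return long_match or covers_contig


def is_confirmed_viral_hit(match_len, identity_pct, evalue):
    """True if a hit meets all three thresholds used to confirm a virus contig."""
    return identity_pct >= 80.0 and match_len >= 500 and evalue <= 1e-5
```

Condition (2) matters for short contigs: a 350 bp contig can never reach a 500 bp match, yet it is still removed when more than 80% of its length aligns to the host at 90% similarity or higher.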

Phylogenetic Analysis
Whole genome sequences of virus strains from different hosts, belonging to the same species as the dominant viruses in Malayan pangolins, were downloaded from the ViPR database (https://www.viprbrc.org/brc/home.spg?decorator=vipr). Virus sequences from Malayan pangolins and other hosts were aligned using MAFFT version 7.427 [23] with the auto alignment strategy. The best substitution models were selected and maximum likelihood (ML) trees were inferred with IQ-TREE version 1.6.9 [24] using 1000 bootstrap replicates. All ML trees were then visualized and exported as vector diagrams with FigTree version 1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/).

Viral Metagenomics
A total of 21 organ samples of lung, lymph and spleen from 11 dead Malayan pangolins that could not be rescued by the Guangdong Wildlife Rescue Center were used to reveal the viral diversity of pangolins. Viral nucleic acids were deep sequenced, yielding a total of 227.32 GB of data (757,729,773 valid reads, 150 bp in length). In total, 233,587 reads were best matched with viral proteins available in the NCBI NR database (~0.03% of the total sequence reads). The number of viral-associated reads in each sample varied from 2856 to 78,052 (Table 1). In aggregate, 28 families of viruses were identified (Table S1). The most widely distributed virus families were Herpesviridae and Paramyxoviridae, and the reads related to these families accounted for ~85% of the total viral sequence reads (Figure 1). Contig sequences were then generated by de novo assembly using MEGAHIT version 1.0 [21], yielding 62,508 unique contigs with a maximum length of 13,503 bp (Table 2, Figure S1). Taxonomic assignment of these contigs was performed on the basis of BLAST analysis. At this stage, 68 contigs were confirmed as virus species, accounting for about 0.1% of the total number of contigs (Table 2). Assigning these contigs to different types of viral genomes identified 20.59% as DNA viruses and 79.41% as RNA viruses, among which 14.71% were assigned to phages. Another 3532 contigs were suspected to belong to virus species (Table S2). Of these, DNA viruses accounted for 66.53% and RNA viruses for 33.47%, and 37.06% of these contigs were assigned to phages. Among all the unique contigs, the 30 with the highest read abundance were assigned to the families Paramyxoviridae and Flaviviridae and to the order Caudovirales (Figure 2).
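The proportions reported above can be re-derived from the stated counts. The short check below uses only numbers given in the text; the contig counts of 14, 54 and 10 in the final comment are inferred from the reported percentages of the 68 confirmed contigs and are an assumption, not figures stated by the authors.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Viral reads as a share of all valid reads (reported as ~0.03%)
viral_read_pct = percent(233_587, 757_729_773)

# Confirmed viral contigs as a share of all unique contigs (reported as ~0.1%)
confirmed_contig_pct = percent(68, 62_508)

# Confirmed plus suspected viral contigs
virus_related_contigs = 68 + 3532

# Inferred (assumed) split of the 68 confirmed contigs:
# 14 DNA-virus, 54 RNA-virus and 10 phage contigs reproduce 20.59%, 79.41% and 14.71%.
inferred_split = [round(percent(n, 68), 2) for n in (14, 54, 10)]
```

The sum of confirmed and suspected contigs equals the 3600 virus-related contigs quoted in the abstract.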

Sendai Virus
Sendai virus was identified in 6 of the 11 Malayan pangolin individuals, making it the most commonly identified virus. For several of these pangolin samples, larger Sendai virus contigs were produced (Table 2). In one case, a contig of 13,232 bp assembled from the lung tissue of individual 19 was identified, which is about 86% of the whole genome sequence length (15,384 bp). This contig showed relatively high sequence identity (89.76%) to the whole genome sequence of a Sendai virus strain isolated from humans (GenBank accession: AB005795). The other contigs confirmed as Sendai virus ranged in length from 608 to 7027 bp (Table 2).
Whole genome sequences of Sendai virus from humans, mice and monkeys were downloaded from the ViPR database (https://www.viprbrc.org/brc/home.spg?decorator=vipr). After sequence alignment with MAFFT version 7.427 [23], the best substitution model selected by IQ-TREE was GTR+F+I. Phylogenetic analysis revealed that the 13,232 bp contig from the Malayan pangolin was most closely related to the Sendai virus strain isolated from humans (AB005795.1) and distant from strains isolated from mice (Figure 3). We then inferred the phylogenetic relationships for each gene sequence. The six trees differed slightly, but in each the genetic distance between the Sendai virus from the Malayan pangolin and that from humans (AB005795.1) was the smallest (Figure 4), consistent with the relationship inferred from whole genome sequences.

Coronavirus
One or several members of the family Coronaviridae were identified in 2 of the 11 M. javanica individuals (individuals 07 and 08). For these pangolin samples, larger contigs were produced, ranging in length from 503 to 2330 bp (Table 3). Although a high species variety of Coronavirus was detected, SARS-CoV was the most widely distributed (Table 3). Whole genome sequences of strains belonging to four genera (Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and Deltacoronavirus) isolated from different hosts were downloaded from the ViPR database (https://www.viprbrc.org/brc/home.spg?decorator=vipr). Together with the 16 contigs confirmed as Coronavirus in this study, all the sequences were aligned using MAFFT version 7.427 [23]. The best substitution model selected by IQ-TREE was GTR+F+R7, and the phylogenetic analysis showed multiple relationships between the Coronavirus contigs and the four Coronavirus genera (Figure 5).

Discussion
Pangolins are important wildlife resources in imminent danger of extinction. Great efforts have been made to rescue trafficked pangolins; however, most of the individuals intercepted by customs were in poor health and died within a few days. Investigating the potential pathogens carried by pangolins may help to rescue them. Our viral metagenomic analysis revealed a high diversity of viruses carried by dead Malayan pangolins. Sendai virus and Coronaviruses were the dominant virus species confirmed by assembled contigs and might be related to the death of the Malayan pangolins. Recently, the prediction of viral zoonosis epidemics has become a major public health issue. A profound understanding of the viral population in key animal species acting as reservoirs represents an important step towards this goal. Bats are natural hosts for a large variety of zoonotic viruses. In a recent study, up to 47 different virus families were detected in bat fecal samples [25]. Over 130 virus species had been detected in bats as of 2017 [26], including several emergent human pathogens [27][28][29][30][31][32][33][34][35][36][37][38]. For domesticated animals, comparing the viromes of sick and healthy individuals could help to identify pathogens or characterize virus diversity [8,9,[39][40][41]. Our study showed that viral metagenomic analysis can also reveal the viral diversity and potential pathogens of rare and threatened terrestrial vertebrates such as pangolins.
Sendai virus was the most widely distributed pathogen in the 11 dead Malayan pangolins and was one of the potential causes of their death. The whole genome and individual gene phylogenies for the Sendai virus sequences assembled in this study all showed that the Sendai virus from the Malayan pangolin was most closely related to the strain isolated from humans (AB005795.1), which strongly suggests the possibility that Sendai virus is transmitted between pangolins and humans. Sendai virus is a member of the paramyxovirus subfamily Paramyxovirinae, genus Respirovirus, members of which primarily infect mammals. The scientific community considers Sendai virus the archetype organism of the Paramyxoviridae family because most of the basic biochemical, molecular and biologic properties of the whole family were derived from its characteristics [42]. Sendai virus-associated disease has a worldwide distribution and has been found in mouse colonies in Asia [43], North America [44] and Europe [45]. The virus is responsible for a highly transmissible respiratory tract infection in mice, hamsters, guinea pigs, rats, and occasionally pigs and bats [46,47], with infection through both airborne and direct contact routes. Epizootic infections of mice are usually associated with a high mortality rate, while enzootic disease patterns suggest that the virus is latent and can be cleared over the course of a year. This is the first report of a wild pangolin dying possibly due to Sendai virus infection, which further broadens the known host spectrum of Sendai virus. Because no healthy individuals were available as controls, we could not determine whether the Sendai virus carried by the pangolins was acquired by infection from other hosts or was inherited.
Besides Sendai virus, Coronaviruses were also detected as potential pathogens of Malayan pangolins. The phylogeny of the assembled Coronavirus sequences and strains from the four Coronavirus genera demonstrated complex genetic relationships and a high species diversity of Coronavirus in Malayan pangolins. Coronaviruses can cause a variety of severe diseases, including gastroenteritis and respiratory tract diseases, and have been identified in mice, rats, chickens, turkeys, swine, dogs, cats, rabbits, horses, cattle and humans [48,49]. Sometimes, but not often, a coronavirus can infect both animals and humans. Human coronaviruses were first described in the 1960s in patients with the common cold. Since then, more have been discovered, including those that cause severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS), two pathogens that can cause fatal respiratory disease in humans [50,51]. It was recently discovered that dromedary camels in Saudi Arabia harbor three different human coronavirus species, including a dominant MERS HCoV lineage that was responsible for the outbreaks in the Middle East and South Korea during 2015 [52]. The different types of SARS-CoV detected in this study may also be related to the death of the Malayan pangolins. Considering that the SARS outbreak was transmitted to humans by masked palm civets from the natural reservoir in bats [29,53,54], Malayan pangolins could be another host with the potential to transmit the SARS coronavirus to humans. Consequently, the viral metagenomic study of Malayan pangolins is meaningful both for the conservation of rare wild animals and for public health.

Conclusions
We found high viral diversity in dead Malayan pangolins, and Sendai virus and Coronavirus may be the dominant pathogens responsible for their death. The Sendai virus from the Malayan pangolin was closely related to the strain isolated from humans, whereas the Coronavirus sequences showed a high species diversity. Further investigations are required to compare the incidence of these viruses in healthy and diseased pangolin individuals in order to better elucidate their pathogenic role. To date, this is the first metagenomic study of virus diversity in pangolins in China. This study expands our understanding of the viral diversity in endangered species and of the capability of these viruses to cross over, directly or indirectly, into other mammals.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4915/11/11/979/s1, Figure S1: Length distribution of all the unique contigs generated by de novo assembly. The horizontal axis indicates length of contigs, while the vertical axis indicates number of contigs at different stages of length, Table S1: Number of sequencing reads assigned to different virus families in each Malayan pangolin sample, Table S2

Conflicts of Interest:
The authors declare that the research described herein was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.