Pan-Enterovirus Amplicon-Based High-Throughput Sequencing Detects the Complete Capsid of a EVA71 Genotype C1 Variant via Wastewater-Based Epidemiology in Arizona

We describe the complete capsid of a genotype C1-like Enterovirus A71 variant recovered from wastewater in a neighborhood in the greater Tempe, Arizona area (Southwest United States) in May 2020 using a pan-enterovirus amplicon-based high-throughput sequencing strategy. The variant seems to have been circulating for over two years, but its sequence has not been documented in that period. As the SARS-CoV-2 pandemic has resulted in changes in health-seeking behavior and overwhelmed pathogen diagnostics, our findings highlight the importance of wastewater-based epidemiology (WBE ) as an early warning system for virus surveillance.


Introduction
Enterovirus A71 (EVA71) is a member of Enterovirus A species in the genus Enterovirus, family Picornaviridae, order Picornavirales. Like other EVs, the EVA71 genome is a plussense, single-stranded, monopartite RNA with a length of~7500 nts. Until recently, EV genomes were known to have one polyprotein (>2000 aa) encoding open reading frame (ORF) whose protein product is proteolytically cleaved first into three protein precursors (P1, P2 and P3) and subsequently into 11 proteins; VP1-VP4 (the capsid proteins from P1), 2A-2C (from P2) and 3A-3D (from P3). However, it has been shown that EV genomes also often encode a small ORF upstream of and overlapping the polyprotein ORF (ppORF) [1]. The upstream ORF (uORF) encodes a 56-76 aa peptide (UP) that has been shown to modulate gut infection [1]. Based on sequence diversity of the VP1 protein, seven (7) EVA71 genotypes (A-G) and several sub-genotypes (e.g., C1-C5) have been delineated [2]. EVA71 has been associated with hand, foot and mouth disease (HFMD) globally for over two decades [3] and more recently (alongside EVD68) with a polio-like illness called Acute Flaccid Myelitis (AFM). Since 2012, AFM epidemiology has shown biennial (2012,2014,2016 and 2018) peaks [3] and there have been over 600 confirmed cases in the USA with fourteen (1, 6, 2 and 5 in 2014, 2016, 2017 and 2018, respectively) of them in Arizona [4]. Despite the occurrence of AFM cases, there has been limited EVD68 and EVA71 genomic sequence data from Arizona in GenBank. The authors are only aware of one partial EVA71 VP1 sequence from Arizona recovered in 1994 (AF009553) [5]. Here, we describe the complete capsid of a genotype C1-like EVA71 variant from wastewater in a
Phylogentic analysis of the P1 and P2 regions from MT952340 suggests that it shares a common ancestor with the lineage that contains C1-like variants associated with severe neurological complications in Europe (from 2015-2016) and America (2018) (Figures 1 and 2A). However, we found contrary results when examining VP1. Phylogenetic analysis using the complete gene suggests MT952340 is most closely related to a 2016 variant (KY888026) recovered in the USA ( Figure 2B). This could be due to selection bias since most of the eighty (80) sequences analyzed here have a complete genome (or at least complete P1) but most sequences in GenBank have only VP1. Specifically, we searched GenBank using the query "Enterovirus A71 complete genome" or "Enterovirus A71 VP1" and found 1001 and 10,622 hits, respectively. To rule out this bias, we used the complete VP1 sequence of our variant (MT952340) in a BLASTn search of GenBank and selected the top 100 hits. We removed 39 sequences that were already in our dataset and used the 61 remaining, plus our dataset of 80 sequences, to construct a new ML tree. The result ( Figure S1) shows a similar topology as Figure 2B with MT952340 clustering with the 2016 variant (KY888026) recovered in the USA. These results, taken together, suggest that MT952340 has recombination occurring within the P1 genomic region. To further examine this hypothesis, we performed similarity and bootscan analysis in Simplot [11]. Simplot showed that MT952340 was indeed a C1-like EVA71 both in the P1 and P2 genomic regions ( Figure 3A). Bootscan analysis however suggest that MT952340 might have a complex recombination history even within the C1-like sub-genogroup ( Figure 3B). Specifically, it suggested that P1 was (at least) a double recombinant. We tested this hypothesis by making ML trees of VP1, VP2 and VP3 and these confirmed the results of bootscanning by showing different topologies per gene ( Figures S2-S4). In addition, pairwise distance analysis using complete VP1 sequences showed that all the EVA71 variants in GenBank were >2% divergent from MT952340 (data not shown). In poliovirus surveillance parlance, this would have been classified an 'orphan virus' [12], suggesting gaps in case-based surveillance. Going by the molecular clock of EVs [13,14], the fact that all the EVA71 variants in GenBank were >2% divergent from MT952340 suggests that a predecessor variant circulated and consequently accumulated mutations for over two years to result in the sequence MT952340 (the variant described here). More importantly, none of the variant sequences spanning the over two-year evolution from the predecessor variant to MT952340 have been made publicly available, if sequenced. Hence, as of the time of writing, analysis of MT952340 using publicly available sequence data in GenBank shows that there is a two-year gap of undescribed circulation of EVA71 C1-like variants related to MT952340. Whether these strains have been detected by different research groups, but are yet to be deposited in public databases such as GenBank, remains to be determined. This could be long enough to accumulate the complex recombination history suggested by the results of Bootscan and phylogenetic analysis (Figures 1-3, Figures S1-S4).   This recombination pattern could have been as a result of template switching during PCR amplification. This would imply the presence of more than one variant in our sample. However, we found no single nucleotide variants or INDELS in the reads (data not shown). Furthermore, we Sanger-sequenced a 350 bp portion of VP1, and viewed the chromatogram which had clean peaks and the same sequence as MT952340 ( Figure S5). This further confirmed our virus consists of one variant and not an assemblage of variants. Interestingly, the only difference at the VP1 amino acid (aa) level between MT952340 and virus sequences from the European and United States is an asparagine to serine substitution at position 282 (N282S) in the VP1 protein ( Figure S6). It is not clear how this change might impact the phenotype of the virus but changes (usually N282D) in this VP1 aa position has been associated with increased virus replication in both human (rhabdomyosarcoma) and nonhuman primate (kidney; Vero) cell lines [15,16]. Also, there are six (6) substitutions (V28I, Y35H, N59S, K65R, L70P and V73A) in the UP that seem unique to MT952340 (data not shown). Considering UP's relatively new discovery, it is not clear how these amino acid substitutions in the UP protein might impact its function.

Discussion
We describe an EVA71 C1-like variant (MT952340) detected in wastewater in a neighborhood in the greater Tempe, Arizona area (Southwest United States) in May 2020. MT952340 has a complex recombination pattern that suggests it has regions from variants in Europe and USA and seems to have been circulating for over two (2) years undetected. Recently, Ngagas et al. [10] showed that four (4) recombination modules (5 -UTR, P1, P2 and P3) might exist in the EVA71 C1-like genome. Our data ( Figure 3B) suggests the number of recombination modules might be more. Specifically, while the four (4) modules detailed in Ngagas et al. [10] might be true for inter-genotype recombination, our data suggests that intra-subgenotypically, the P1 module in EVA71 C1-like genomes might function as at least three modules as has been previously shown for P1 segments of EV-A, EV-B and EV-C [17][18][19]. Altogether, the results of this study suggest several things. First is that more EVA71 C1-like variants exist than is currently documented by case-based surveillance (CBS). This is not surprising as CBS usually only captures only 0.5-1% of people infected with EVs [20]. Secondly, our data suggests that coinfection with multiple variants of EVA71 C1-like happens as this seems the only way the type of recombinant described here could have evolved. Thirdly, as has been previously documented for EV-A, EV-B and EV-C [17][18][19], it seems the P1 genomic region of EVA71 C1-like genomes also function as multiple recombination modules within the subgenotype.
Despite the SARS-CoV-2 lockdown, the CDC has confirmed 29 cases of AFM across the USA in 2020 [4] (none in Arizona); suggesting there was EV activity during this time. Considering the biennial peak pattern of AFM, 2020 was supposed to be an AFM peak-year. It is, therefore, possible that the lockdown of schools and daycare centers, which are the major transmission hubs of EVs, and the consequent reduced host mobility intended to curb the spread of SARS-CoV-2, might have unexpectedly also helped to do the same for EVA71. It is also likely that the overwhelming burden of SARS-CoV-2 in the United States has resulted in changes in health-seeking behavior (see [21]) and testing bias towards COVID-19 as opposed to other possible infections. It is, therefore, possible that there had been silent circulation of EVA71 C1-like in the greater Tempe, Arizona area in 2020 which did not show up on the case-based surveillance radar.
Our findings highlight how valuable WBE can be for enhancing our ability to catalogue EVA71 C1-like diversity and consequently, our understanding of its evolutionary dynamics. Furthermore, it also shows the importance of wastewater-based epidemiology as an earlywarning system for detecting the introduction or circulation of viruses within communities.

Materials and Methods
We analyzed wastewater as part of routine wastewater-based epidemiology (WBE) via the Human Health Observatory (HHO) [22], Biodesign Institute, Arizona State University (ASU). The HHO represents a network of greater than 300 cities across the globe [23] that routinely provides wastewater and sludge samples for analysis and research. The specific sample described here was a composite sample collected using a time-weighted automated sampler which aliquots approximately 60 mL of wastewater every 15 min over a 24-h period. The wastewater collection station sampled in this study of a neighborhood in Arizona in the greater Tempe area serves the total population of 6500 residents. The sample was then transported on ice to the wastewater processing laboratory at the Biodesign Institute, ASU where 400 mL of the sample was subjected to virus concentration using polyethylene glycol (PEG) 8000 and sodium chloride precipitation. Subsequently, RNA was extracted from the concentrate and subjected to the one-step reverse-transcriptase polymerase chain reaction (RT-PCR) assay recently described [24] using the SSIII, one-step RT-PCR kit with Platinum Taq (Invitrogen, Carlsbad, CA, USA). Part of the amplicon was used as template in a nested PCR assay using primers AN89 and AN88 as detailed in Nix et al. [25]. Both amplicons were submitted to the ASU Genomics Core. The first round PCR amplicon (~4 kb) was used for library preparation (KAPA Hyperplus Library Kit) followed by paired end sequencing (2 × 250 bp) using the MiSeq system V2 (Illumina, San Diego, CA, USA). The nested PCR amplicon (~350 bp) was Sanger-sequenced using primers AN89 and AN88.
The Illumina raw reads were viewed using FASTQC v0.11.5, trimmed using Trimmomatic v0.36 and de novo assembled using MetaSpades v3. 13.0 (all using default parameters) on the KBase platform [26]. The EV contig in the assembly was identified using the enterovirus genotyping tool [27] and then used for variant analysis in Geneious Prime (using the built-in "Find Variations/SNPs" program) [28].
For phylogenetic analysis, the EV contig recovered in this study was used for a BLASTn search of the GenBank database. The top 100 hits were downloaded from GenBank and further screened using the enterovirus genotyping tool [27] to remove those that did not belong to the same subgenotype. Subsequently, we had eighty (80) sequences (including ours) in a local database (see Table S1 for a list of all sequences analyzed in this study). These 80 sequences were subjected to multiple sequence alignment using the ClustalW program in MEGA X software [29]. Subsequently, maximum-likelihood (ML) trees were constructed using 1000 bootstrap replicates in MEGA X [29] for the P1 + P2, P2, P1, VP2, VP3 and VP1 genomic regions. The pairwise distance calculator in MEGA X [29] was used for VP1 pairwise distance estimation. This was done using the Kimura (2-parameter) model and 1000 bootstrap replicates. The complete EV contig was also subjected to similarity and bootscanning analysis using the Kimura (2-parameter) model in SimPlot version 3.5.1 [11] with a sliding window of 800 base pairs moving in steps of 80 nucleotides.
The sequence described in this study has been submitted in GenBank under the accession number MT952340. The raw reads have been deposited under SRA accession number SRX9713168.
Supplementary Materials: The following are available online at https://www.mdpi.com/1999-4 915/13/1/74/s1, Figure S1: Phylogenetic tree of EVA71 genotype C1; Figure S2: Phylogenetic tree on an alignment of VP1 nucleotide sequences of the 80 EVA71 genotype C1 sequences detailed in Table S1; Figure S3: Phylogenetic tree based on an alignment of VP3 nucleotide sequences of the 80 EVA71 genotype C1 sequences detailed in Table S1; Figure S4: Phylogenetic tree based on an alignment of VP2 nucleotide sequences of the 80 EVA71 genotype C1 sequences detailed in Table  S1; Figure S5: Chromatogram of Sanger sequencing results of a 350 bp amplicon of the VP1 gene for MT952340; Figure S6: Amino acid translation of 11 complete VP1 sequences (297 aa) from select isolates in Europe and the United States; Table S1: Sequences analyzed in this study.   Table S1, we provide GenBank accession numbers of the sequences analyzed in this study.