Assessing Multiplex Tiling PCR Sequencing Approaches for Detecting Genomic Variants of SARS-CoV-2 in Municipal Wastewater

ABSTRACT Wastewater-based genomic surveillance of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus shows promise to complement genomic epidemiology efforts. Multiplex tiling PCR is a desirable approach for targeted genome sequencing of SARS-CoV-2 in wastewater due to its low cost and rapid turnaround time. However, it is not clear how different multiplex tiling PCR primer schemes or wastewater sample matrices impact the resulting SARS-CoV-2 genome coverage. The objective of this work was to assess the performance of three different multiplex primer schemes, consisting of 150-bp, 400-bp, and 1,200-bp amplicons, as well as two wastewater sample matrices, influent wastewater and primary sludge, for targeted genome sequencing of SARS-CoV-2. Wastewater samples were collected weekly from five municipal wastewater treatment plants (WWTPs) in the Metro Vancouver region of British Columbia, Canada during a period of increased coronavirus disease 19 (COVID-19) case counts from February to April 2021. RNA extracted from clarified influent wastewater provided significantly higher genome coverage (breadth and median depth) than primary sludge samples across all primer schemes. Shorter amplicons appeared to be more resilient to sample RNA degradation but were hindered by greater primer pool complexity in the 150-bp scheme. The identified optimal primer scheme (400 bp) and sample matrix (influent) were capable of detecting the emergence of mutations associated with genomic variants of concern, for which the daily wastewater load significantly correlated with clinical case counts. Taken together, these results provide guidance on best practices for implementing wastewater-based genomic surveillance and demonstrate its ability to inform epidemiology efforts by detecting genomic variants of concern circulating within a geographic region. IMPORTANCE Monitoring the genomic characteristics of the SARS-CoV-2 virus circulating in a population can shed important insights into epidemiological aspects of the COVID-19 outbreak. Sequencing every clinical patient sample in a highly populous area is a difficult feat, and thus sequencing SARS-CoV-2 RNA in municipal wastewater offers great promise to augment genomic surveillance by characterizing a pooled population sample matrix, particularly during an escalating outbreak. Here, we assess different approaches and sample matrices for rapid targeted genome sequencing of SARS-CoV-2 in municipal wastewater. We demonstrate that the optimal approach is capable of detecting the emergence of SARS-CoV-2 genomic variants of concern, with strong correlations to clinical case data in the province of British Columbia. These results provide guidance on best practices on, as well as further support for, the application of wastewater genomic surveillance as a tool to augment current genomic epidemiology efforts.

the application of wastewater genomic surveillance as a tool to augment current genomic epidemiology efforts.
KEYWORDS COVID-19, RNA, SARS-CoV-2, epidemiology, variants, wastewater, wholegenome sequencing G enomic surveillance of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus plays a critical role in tracking its evolution during the current global coronavirus disease 2019 (COVID- 19) pandemic (1)(2)(3). Recently, several emerging lineages of SARS-CoV-2, so-called variants of concern (VoCs), have been associated with increased levels of transmission (4), disease severity (5), and/or immune escape (6,7). These VoCs have originated from various locations globally (4,8), but they are spreading within new geographic regions due to travel-associated and local transmission (9). Providing rapid detection of VoC infections within a population could thus help to inform effective public health outbreak mitigation strategies.
Since the SARS-CoV-2 virus is shed in feces during infection (10), viral genome fragments can be detected in municipal wastewater and have been associated with clinical case numbers within contributing regions (11)(12)(13)(14). Previous work has demonstrated the potential to sequence SARS-CoV-2 fragments in municipal wastewater and detect single-nucleotide variants (SNVs) that correspond to clinical cases in the contributing sewershed (15)(16)(17). As SARS-CoV-2 titers in wastewater are relatively low (11,13), an enrichment step is typically needed prior to sequencing to improve sensitivity (15). The two main approaches for enriching SARS-CoV-2 RNA in wastewater include oligonucleotide-based capture (15) and multiplex tiling PCR-based targeted amplification (16,17). The latter approach is promising for wastewater-based viral genomic surveillance due to its lower reagent cost and the potential to be deployed rapidly and in remote locations (18). An important consideration for applying multiplex tiling PCR is the average amplicon length, as this can impact assay sensitivity in the case of RNA degradation (19). This could be particularly important for its application to wastewater-based epidemiology, as SARS-CoV-2 particles and free RNA can undergo variable levels of degradation (20,21) and may differ based on the type of wastewater sample matrix (e.g., influent versus primary sludge) (22). We therefore hypothesized that there may be an optimal tiling PCR amplicon size and an optimal wastewater sample matrix type that enable adequate genome coverage of SARS-CoV-2 for the identification of genomic VoCs.
SARS-CoV-2 genome coverage is greater with influent wastewater ultrafiltration than with direct sludge extraction and is impacted by multiplex tiling PCR amplicon length. We sequenced a total of 96 wastewater samples collected between 7 February and 18 April 2021 across five municipal WWTPs in Vancouver, Canada, using the following three different primer schemes for multiplex tiling PCR of SARS-CoV-2: Swift Bioscience's 150-bp amplicon scheme (n = 10 total, 3 sludge and 7 influent), the Artic 400-bp amplicon scheme (23) (n = 62 total, 8 sludge and 54 influent), and the Freed/midnight 1,200-bp amplicon scheme (24) (n = 24 total, 4 sludge and 20 influent) (detailed methods are given in Text S1 in the supplemental material). Sludge samples failed to produce libraries with over 32% breadth of genome coverage across all primer schemes and sample cycle thresholds (C T ) ( Fig. 1A to C). Conversely, influent wastewater samples produced libraries that had significantly higher breadth of coverage across all primer schemes (P , 0.01, Tukey's test; Fig. 1). One possible explanation for this finding could be that the sludge matrix was inhibitory to reverse transcription-PCR (RT-PCR) (11); however, no inhibition of reverse transcription-quantitative PCR on sludge RNA extracts was detected using internal controls (see Text S1 in the supplemental material and Table S2 at https://doi.org/10 .6084/m9.figshare.16416528). Another potential reason for the lower genome coverage in sludge was that SARS-CoV-2 was more nonintact or its RNA was more degraded with the direct sludge extraction compared to ultrafiltration of influent wastewater, as has been previously hypothesized (22). A third potential cause of discrepancies in genome coverage between sludge and influent wastewater samples could be higher off-target amplification in sludge extracts. Correspondingly, sample type significantly impacted the fraction of on-target reads for all schemes after accounting for C T values (P , 0.01, two-way analysis of covariance [ANCOVA]), with mean mapping rates of sludge samples being over 100 times lower than that of influent samples (0.01% versus 11.3%, respectively; Table S1). Therefore, ultrafiltration of influent wastewater provided more suitable RNA extracts for multiplexed tiling PCR of SARS-CoV-2 than did direct extraction from wastewater sludge, likely due to a combination of greater SARS-CoV-2 RNA degradation and greater off-target amplification in sludge.
If the level of RNA degradation within a wastewater sample impacts the resulting SARS-CoV-2 genome coverage, we would expect to see less of a drop-off in coverage at high C T values for schemes with shorter amplicons. Indeed, we detected a significant effect of amplicon length on the breadth of genome coverage as a function of C T (P , 0.01, two-way ANCOVA). The median genome coverage with the 150-bp amplicon scheme spanned one order of magnitude within influent wastewater samples that had C T values ranging from 31 to 37 (Fig. 1D), while those from the 400-bp and 1,200-bp schemes spanned 3.2 and 3.0 orders of magnitude, respectively ( Fig. 1E and F). Improvements with the 400-bp scheme versus the 1,200-bp scheme were marginal, yet 83% of paired influent samples with C T values over 32.5 (10 of 12) showed higher breadth of coverage with the 400-bp scheme (Fig. S1). Thus, shorter amplicon schemes may be more robust to sample RNA degradation at higher C T values. However, there was a trade-off between amplicon length and genome coverage, as the magnitudes of the median genome coverage and breadth of coverage obtained with the 150-bp scheme and influent samples were significantly lower than those obtained with the  400-bp scheme (P = 0.022 and 5.0 Â 10 29 , respectively; Tukey's test). This result likely cannot be attributed to library preparation and/or the sequencing platform, as we found no significant difference in the breadth of genome coverage obtained on a subset of 10 400-bp amplicon samples sequenced with both Illumina MiSeq and Oxford Nanopore Technologies MinION platforms (P = 0.5, Tukey's test; Fig. S2). The lower breadth of coverage observed with the 150-bp scheme could thus have been caused by more primer-primer interactions with a larger number of primers or by the characteristics of the primer pool design (19). Such effects were also indicated by sequencing a synthetic SARS-CoV-2 RNA genome as a positive control, as the 150-bp primer scheme showed more uneven coverage across the genome compared to that of the 400-bp and 1,200-bp schemes (Fig. S3). Therefore, the 400-bp primer scheme appears to strike a balance between resilience to sample RNA degradation and mitigation of issues around primer pool complexity and multiplex amplicon balancing.
SARS-CoV-2 whole-genome sequencing from wastewater captures the emergence of genomic variants in a geographic region. The sequence data produced via the 400-bp primer scheme and influent wastewater samples was used to measure the frequency of VoC-associated SNVs (see Table S3 at https://doi.org/10.6084/m9 .figshare.16416528) across the five WWTPs over the study period. SNVs associated with the VoC lineages B.1.1.7 and P.1 both increased to a maximum mean frequency of 60% across all WWTPs ( Fig. 2A and Fig. S4 to S6), while the frequency of B.1.351 did not substantially increase (Fig. S7). These findings align with the results of clinical screening and sequencing of patient samples over the same period within the province of British Columbia, during which P.1 and B.1.1.7 became the dominant lineages, while B.1.351 did not appreciably spread (25) (Fig. 2B and Fig. S5 and S7). At the time of publishing, VoC frequency data for clinical cases was only available at the provincial level, yet the health service areas corresponding to the 5 WWTP sewersheds accounted for 74% of total cases in the province during the study period (25). The log-transformed, flow-normalized daily loads of P.1 and B.1.1.7 across all WWTPs (in genome copies/day) were strongly correlated with clinical case counts of those lineages within the province for the corresponding epidemiological weeks (R 2 = 0.86 and 0.85, respectively; Fig. 2C and Fig. S5). The log-transformed mean frequencies of VoC-associated SNVs in wastewater were also significantly correlated with that of VoC clinical case counts within the region (P , 0.01; Fig. S8). Therefore, the frequency of VoC-associated SNVs within influent wastewater measured with multiplex tiling PCR is a suitable measure to monitor community transmission of genomic variants within a sewershed. The onset of P.1-and B.1.1.7-associated SNVs within influent wastewater followed different patterns for the five WWTPs, providing additional support that wastewater SARS-CoV-2 sequencing can illuminate localized spread of genomic variants on a regional scale (15,17). The rapid turnaround time (here, ;3 days from sampling to data generation), low capital cost, and high portability of nanopore sequencing combined with highly multiplexed tiling PCR for SARS-CoV-2 sequencing of wastewater shows great promise to complement genomic epidemiology efforts during the COVID-19 pandemic by detecting the emergence of VoC within a pooled population sample.
Data availability. The raw reads associated with all samples are available in the Sequence Read Archive under BioProject accession number PRJNA731975. The accession numbers for each sample are also provided in Table S1 in the supplemental material, along with the sample metadata. P.1 genomes summed across all five wastewater treatment plants (WWTPs) and the total P.1 clinical cases in the province of BC observed within the same epidemiological week. The wastewater P.1 daily load (genome copies/day) was approximated by normalizing copies of the SARS-CoV-2 N1 gene (copies/liter) by daily flow rates (liters/day) to obtain N1 loads (copies/day) for all WWTPs and multiplying those by the mean frequency of P.1-associated SNVs in each WWTP across all sample dates. For each date, the cumulative P.1 daily load was determined by summing the P.1 loads across all five WWTPs. The P.1 clinical case counts by week were estimated from reference 25 by multiplying total provincial COVID-19 case counts by the frequency of P.1 in clinical provincial cases.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. TEXT S1, PDF file, 0.02 MB.