Genomic analysis of Lassa virus from the 2018 surge in Nigeria

In early 2018 Nigeria experienced an unprecedented increase in Lassa fever cases with widespread geographic distribution. We report 77 Lassa virus genomes generated from patient samples, 14 of them from 2018, to investigate whether recent changes in the virus genome contributed to this surge. Our data argue that the surge is not attributable to a single Lassa virus variant, nor has it been sustained by human-to-human transmission. We observe extensive viral diversity structured by geography, with major rivers appearing to act as barriers to migration of the rodent reservoir. Together, our results indicate that the 2018 Lassa fever surge was driven by cross-species transmission of multiple viral variants, from different lineages, from local rodent populations.


INTRODUCTION
Lassa fever is a viral hemorrhagic disease endemic to parts of Western Africa that causes over 300,000 cases and 3,000 fatalities per year [1]. It has been recognized by the World Health Organization (WHO) and the Coalition for Epidemic Preparedness Innovations (CEPI) as a significant threat to global health and in need of urgent R&D attention [2][3][4]. Despite the burden of Lassa virus, there is currently no approved vaccine, and the only available pharmacologic therapy is early intravenous administration of the antiviral ribavirin [5][6][7]. In early 2018 there was a marked increase in Lassa fever cases in Nigeria: by early March, Nigeria had more confirmed cases (394) than in any previous year. Confirmed cases were observed in 19 Nigerian states, with an estimated case fatality rate of approximately 25% [8]. The factors underlying this increase were not known, raising concern among public health officials that something had fundamentally changed about this endemic disease.
In a presumed Lassa fever outbreak, genomic analysis of contemporaneous Lassa virus in samples from infected patients can complement conventional epidemiological data by determining whether changes to intrinsic properties of the virus explain the increase in cases.
In particular, viral genomic analysis can rapidly assess whether a novel variant, a specific viral lineage, or a change in viral transmission route is associated with the case surge. Most human Lassa virus infections result from contact with infected Mastomys natalensis (the major natural reservoir [9]) or their excreta, but human-to-human transmission has been documented in hospital settings and is a focus of public health monitoring [10][11]. Previous retrospective investigation of the genomic epidemiology of Lassa virus in Nigeria between 2008 and 2014 showed extensive genetic diversity across the region and provided support for predominantly reservoir-to-human transmission [12].

Genomic data analysis
We analyzed sequencing data using our publicly available software viral-ngs v1.19.2 [17][18] implemented on the DNAnexus cloud-based platform. Briefly, we demultiplexed individual libraries, removed reads mapping to the human genome and to other known technical contaminants (e.g. sequencing adapters), and filtered the remaining reads against previously published Lassa virus genomes. We performed de novo assembly using Trinity [19] and scaffolded contigs against one of three Lassa virus reference genomes (KM821997-8, GU481072-3, KM821772-3), representing the major viral lineages (II, III and IV). We used Kraken v0.10.6 [20] in viral-ngs to identify other viral taxa present in the samples. To do so, we first built a database that encompassed the known diversity of all viruses that infect humans (similar to that described elsewhere [21], but without insect species). We searched for viral species detected in the samples with a read count at least 1.5x greater than that of any viral taxon identified in negative control samples and manually investigated any potential hits. We detected intra-host variants in samples from 2018 using V-Phaser 2 [22] implemented in viral-ngs v1.19.2 using default parameters. To do so, we leveraged data from independently prepared replicate sequencing libraries for 13 of the 14 samples.
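As a rough illustration of the negative-control filter described above, the sketch below keeps only taxa whose read count is at least 1.5 times the largest count seen for any taxon in the negative controls. The function name, the dictionary layout and the example counts are hypothetical and are not part of viral-ngs or Kraken; this is one possible reading of the threshold, not the pipeline's actual implementation.

```python
from typing import Dict, List


def taxa_above_control(sample_counts: Dict[str, int],
                       control_counts: List[Dict[str, int]],
                       fold: float = 1.5) -> Dict[str, int]:
    """Keep taxa whose read count is >= `fold` times the control background.

    Assumption: the background is the single highest read count of any
    taxon across all negative controls.
    """
    # Highest count observed for any viral taxon in any negative control.
    background = max(
        (count for ctrl in control_counts for count in ctrl.values()),
        default=0,
    )
    threshold = fold * background
    # Taxa with zero reads are never reported, even with empty controls.
    return {taxon: n for taxon, n in sample_counts.items()
            if n > 0 and n >= threshold}


# Made-up read counts, for illustration only.
sample = {"Lassa mammarenavirus": 1200, "taxon_X": 12, "taxon_Y": 40}
controls = [{"taxon_X": 10}, {"taxon_Y": 20}]
candidates = taxa_above_control(sample, controls)
```

With these invented counts the background is 20 reads, so the threshold is 30 reads; `taxon_X` is dropped and the remaining taxa would go forward to manual review.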
In order to construct the phylogenetic tree of Lassa virus, we performed a multiple sequence alignment of our new genomes with a set of 193 previously published Lassa virus genomes from Nigeria, Sierra Leone, Liberia, and Côte d'Ivoire [12]. We performed codon-based multiple sequence alignments of the NP and GPC sequences using MAFFT [23]. We estimated maximum likelihood phylogenies of concatenated alignments of NP and GPC using IQ-TREE v1.5.5 [24][25] using a GTR+Γ substitution model and ultrafast bootstrapping. To create time-aware phylogenies for the Nigerian lineage II sequences, we then performed Bayesian phylogenetic analyses using the program BEAST v1.8.4 [26], incorporating the collection date for each sequence. We included GPC and NP lineage II alignments as separate partitions. We used a model consisting of an SRD06 codon-aware nucleotide substitution model [27], an uncorrelated relaxed clock with a lognormal distribution, and a Bayesian SkyGrid coalescent tree prior. All of the Bayesian analyses were run for 200 million MCMC steps, sampling parameters and trees every 5,000 generations.
Maximum-clade credibility trees summarizing all MCMC samples were generated using TreeAnnotator v1.8.4 with a burn-in rate of 10%.
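The chain length, sampling interval and burn-in stated above determine how many posterior samples feed the maximum-clade credibility tree. The short calculation below makes that arithmetic explicit. The function name is hypothetical, and the count ignores the initial state that BEAST also logs, so the real log files may hold one more entry.

```python
def retained_samples(chain_length: int, sample_every: int, burnin_percent: int):
    """Return (total, discarded, kept) sample counts for an MCMC run."""
    total = chain_length // sample_every        # states written to the log
    discarded = total * burnin_percent // 100   # states removed as burn-in
    return total, discarded, total - discarded


# Values taken from the text: 200 million steps, every 5,000, 10% burn-in.
total, discarded, kept = retained_samples(200_000_000, 5_000, 10)
print(total, discarded, kept)  # 40000 4000 36000
```

Under these assumptions each analysis logs 40,000 states, of which 4,000 are discarded and 36,000 are summarized.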

Lassa fever case burden at ISTH in 2018
The ISTH Lassa ward, with 16 beds, is the largest Lassa fever facility in Nigeria and a major diagnostic referral center, receiving suspected Lassa fever patient samples from across the country. From January 1 to March 13, 2018, ISTH tested over 1,500 clinically suspected Lassa fever cases, of which 368 were RT-qPCR-positive for Lassa virus (Fig. 1A & 1B). This number, which represents the majority of confirmed cases in Nigeria during this period, is markedly higher than that observed in previous years (Fig. 1A). There is a wide distribution of ages (Fig. S1A) and geographic source of confirmed cases (Fig. S1B), as previously observed for Lassa fever [28]. We did observe an approximate 2:1 male-to-female ratio among confirmed cases, in contrast to previous conclusions that Lassa fever does not exhibit sex disparity [11], though it would be difficult to determine whether this reflects a true difference, given the sampling bias inherent in clinical surveillance. Patients included healthcare workers, farmers, lawyers and students, demonstrating the broad reach of the 2018 surge.

Lassa virus sequencing of patient samples from 2018 surge
To investigate the viral population underpinning this surge, we performed unbiased sequencing and assembled Lassa virus genomes on a subset of RT-qPCR-positive patient samples (Fig. 1B). We obtained complete or high-quality partial Lassa virus genomes from 14 out of 26 RT-qPCR-positive patient samples. Table S1 summarizes sequence and assembly quality metrics for these samples. The mean unambiguous assembly length of these genomes was 9,039 bases (range 4,450-10,610) and mean coverage depth was 193x (range 1-1,834). The remaining twelve samples did not readily produce high-quality Lassa virus genomes. We did not find evidence consistent with other pathogenic viral infections in any of the samples from 2018, with the depth of sequencing available.
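The "unambiguous assembly length" reported above can be read as the number of positions with a definite base call. The sketch below counts A, C, G and T and ignores N, IUPAC ambiguity codes and gaps. This definition and both function names are assumptions made for illustration; viral-ngs may compute the metric differently, and the example sequence is invented.

```python
from typing import Iterable, Tuple


def unambiguous_length(sequence: str) -> int:
    """Count positions called as A, C, G or T (case-insensitive)."""
    return sum(1 for base in sequence.upper() if base in "ACGT")


def mean_and_range(lengths: Iterable[int]) -> Tuple[float, int, int]:
    """Summarize per-genome lengths as (mean, minimum, maximum)."""
    values = list(lengths)
    return sum(values) / len(values), min(values), max(values)


# Invented toy assembly: 8 definite calls, 4 Ns, 2 ambiguity codes, 1 gap.
toy_assembly = "ACGTNNNNacgtRY-"
toy_length = unambiguous_length(toy_assembly)
```

Applying `mean_and_range` to the per-sample lengths in Table S1 would give a summary in the same form as the one reported in the text.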
The 14 patients from whom we assembled Lassa virus genomes were reflective of the demographic characteristics of the larger cohort, including age (Fig. S1A), sex (Table 1) and geographic distribution (Fig. S1B). Clinically, the picture is of a nonspecific febrile illness that sometimes develops into a bleeding diathesis. Hemorrhage was documented in 2 of the 3 patients who died and in at least 3 of the 9 who recovered, suggesting a range of disease severity [29]. This is broadly consistent with clinical descriptions of Lassa fever: patients typically present with nonspecific symptoms, including fever, headache, malaise and general weakness, often indistinguishable from malaria or common viral diseases. Case fatality rates, though challenging to determine, are estimated at 15-20% among hospitalized cases [11], though a recent study estimated case fatality rates in Nigeria during 2015-2016 to be 60% [30].
To look for evidence of a novel viral genetic variant or sustained human-to-human transmission driving the 2018 case surge, we performed phylogenetic analysis of these 14 genomes from 2018. A maximum likelihood phylogeny shows that the 2018 genomes fall within previously known Lassa virus diversity in Nigeria (Fig. 2A) and do not display substantial clustering by date of sampling, consistent with multiple zoonotic transmissions. Estimated dates for the branch points of closely related 2018 samples in this small dataset, which are in the range of years, do not support a surge in human-to-human transmission in 2018 (Fig. S2). We also identified several intra-host single nucleotide variants at a minor allele frequency >5% in 5 of the 14 patient samples, indicating some virus evolution and de novo mutation within hosts. However, none of these variants were in coding regions and only 1 was shared between samples (Table S2).
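To make the >5% minor allele frequency criterion concrete, the sketch below computes the minor allele and its frequency from per-site allele counts and accepts a site only if the same minor allele exceeds the threshold in every replicate library. The replicate-concordance rule, the function names and the example counts are assumptions for illustration. The text states only that replicate libraries were used and that V-Phaser 2 made the calls, so this stands in for a simple post-filter rather than the actual calling procedure.

```python
from typing import Dict, List, Optional, Tuple


def minor_allele(allele_counts: Dict[str, int]) -> Tuple[Optional[str], float]:
    """Return (allele, frequency) for the second most common allele."""
    depth = sum(allele_counts.values())
    ranked = sorted(allele_counts.items(), key=lambda kv: kv[1], reverse=True)
    if depth == 0 or len(ranked) < 2:
        return None, 0.0
    allele, count = ranked[1]
    return allele, count / depth


def is_replicated_isnv(replicates: List[Dict[str, int]],
                       min_maf: float = 0.05) -> bool:
    """True if one and the same minor allele exceeds min_maf in all replicates.

    Note: near 50% frequency the major and minor alleles can swap between
    libraries; this simple sketch does not handle that case.
    """
    calls = [minor_allele(counts) for counts in replicates]
    alleles = {allele for allele, _ in calls}
    return (len(calls) > 0
            and len(alleles) == 1
            and None not in alleles
            and all(freq > min_maf for _, freq in calls))


# Invented allele counts at one site in two replicate libraries.
site_a = [{"A": 90, "G": 10}, {"A": 180, "G": 20}]   # G at 10% in both
site_b = [{"A": 90, "G": 10}, {"A": 99, "G": 1}]     # G at 10% and 1%
```

Here `is_replicated_isnv(site_a)` is true and `is_replicated_isnv(site_b)` is false, since the second library at `site_b` supports the variant at only 1%.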

Genomic epidemiology of Lassa virus in Nigeria
We next assessed these genomes in the context of the recent history of Lassa virus diversity in Nigeria, to determine whether the larger picture showed patterns that could help explain the recent surge. To do so, we extended our dataset to include 63 new Lassa virus genomes from RT-qPCR-positive patient samples collected at ISTH between August 2015 and November 2016 (BioProject accession PRJNA436552; Table S3). The patients resided in 11 states, with most (68%) coming from Edo and Ondo. This combined dataset considerably expands and updates previous phylogenetic trees of Lassa virus in Nigeria.
Samples from 2015-2018 cluster geographically on the phylogenetic tree. All eleven samples sequenced here from northern Nigeria fall into lineage III (Fig. 2B), increasing our sampling of this lineage more than threefold. These samples confirm the high genetic diversity of this lineage and make clear that it is a regionally defined variant of Lassa virus. Our dataset further identifies a separation in lineage II between samples from southwestern and eastern states, with samples from the eastern states of Ebonyi, Taraba and Anambra forming a distinct sublineage (Fig. 2B). This pattern of distinct regional lineages, each internally diverse, indicates that Lassa virus has remained stably separated in the rodent populations of these regions; for example, the most recent common ancestor of lineage II occurred around 235 years ago (95% CI: 187-283).
The data reported here also improve our understanding of Lassa virus genetic diversity across Nigeria, revealing clear geographic population structure and extensive diversity in regions that have previously been poorly sampled. Intriguingly, we see substantial genetic divergence between regions demarcated by two major rivers, suggesting the importance of established, local rodent populations in sustaining Lassa virus transmission [13]. Together, these results reaffirm the need for widespread geographic sampling of Lassa virus in Nigeria, including more extensive sampling from the rodent reservoir, in order to better understand its genetic diversity. A comprehensive knowledge of this diversity is critical for development of urgently needed Lassa fever diagnostics and vaccines [2][3].
Viral genomes from the 2018 Lassa fever cases in this study were sequenced locally in Nigeria, leveraging long-term investments to establish local, responsive genomics laboratory capacity. These data were then rapidly shared with key public health organizations, who recognized the value of genomic data to inform case tracking and management. Continued development of local genomics capacity and growth of these collaborations will facilitate a more agile and integrated approach to outbreaks. We envision a model for genomics-informed outbreak investigation in which locally generated sequence data are rapidly integrated with traditional epidemiological data to refine response strategies.