DNA Sequencing Predicts 1st-Line Tuberculosis Drug Susceptibility Profiles

Abstract
Background
The World Health Organization recommends universal drug susceptibility testing for Mycobacterium tuberculosis complex to guide treatment decisions and improve outcomes. We assessed whether DNA sequencing can accurately predict antibiotic susceptibility profiles for first-line anti-tuberculosis drugs.

Methods
Whole-genome sequences and associated phenotypes to isoniazid, rifampicin, ethambutol and pyrazinamide were obtained for isolates from 16 countries across six continents. For each isolate, mutations associated with drug-resistance and drug-susceptibility were identified across nine genes, and individual phenotypes were predicted unless mutations of unknown association were also present.
To identify how whole-genome sequencing might direct first-line drug therapy, complete susceptibility profiles were predicted. These were predicted to be pan-susceptible if predicted susceptible to isoniazid and to other drugs, or contained mutations of unknown association in genes affecting these other drugs.
We simulated how negative predictive value changed with drug-resistance prevalence.

Results
10,209 isolates were analysed. The greatest proportion of phenotypes were predicted for rifampicin (9,660/10,130; 95.4%) and the lowest for ethambutol (8,794/9,794; 89.8%). Isoniazid, rifampicin, ethambutol and pyrazinamide resistance was correctly predicted with 97.1%, 97.5%, 94.6% and 91.3% sensitivity, and susceptibility with 99.0%, 98.8%, 93.6% and 96.8% specificity, respectively. 5,250 (89.5%) drug profiles were correctly predicted for 5,865/7,516 (78.0%) isolates with complete phenotypic profiles. Among these, 3,952/4,037 (97.9%) predictions of pan-susceptibility were correct. The negative predictive value for 97.5% of simulated drug profiles exceeded 95% where the prevalence of drug-resistance was below 47.0%.

Conclusions
Phenotypic testing for first-line drugs can be phased down in favour of DNA sequencing to guide anti-tuberculosis drug therapy.


Mycobacterium tuberculosis killed more people than any other pathogen in 2016, when over 10 million active cases were estimated, and 1.7 million patients died. 1 In 2014, the World Health Organization (WHO) set a target to 'END TB' by 2035, acknowledging that success depends on the development of better preventative, diagnostic and therapeutic interventions. The global emergence of antimicrobial resistance poses a major challenge. Despite a call for universal access to drug susceptibility testing to direct individualised therapies, high costs and skills shortages mean it is unavailable in many countries with greatest need. Consequently, only 22% of an estimated 600,000 patients requiring treatment for multidrug-resistant tuberculosis were diagnosed and treated in 2016, 1 facilitating the onward transmission of multidrug-resistant strains. 2 The Xpert MTB/RIF (Cepheid, Sunnyvale, California, USA) assay has partially eased the global diagnostic need. It uses polymerase chain reaction technology to identify both M. tuberculosis complex and mutations in the rpoB gene (predictive of multidrug resistance) directly from clinical samples. 3 However, as it targets only a few potential resistance-conferring mutations, antimicrobial susceptibility cannot be reliably inferred from a negative result. 4 To direct individualised therapies, a diagnostic assay is needed to determine which drugs to give, in addition to which to avoid.
Advances in whole-genome sequencing mean it is now the most promising solution to the need for universal drug susceptibility testing. It is faster, more scalable, and likely to become cheaper than phenotypic testing. 5 As the number of genomic sites that whole-genome sequencing covers is virtually unrestricted, it should be possible to infer M. tuberculosis antimicrobial susceptibility from the absence of resistance-conferring mutations. 6 Here we assess how well this performs for first-line anti-tuberculosis drugs, considering WHO target product profiles for new molecular assays, 7 and whether whole-genome sequencing can be used to accurately direct anti-tuberculosis therapy.

Sample selection
Collections of M. tuberculosis complex isolates unenriched for resistance and largely sequenced prospectively for routine diagnostic reasons, or for disease surveillance, were included from Germany, Italy, the Netherlands and the UK. Collections enriched for antimicrobial resistance were included from across six continents (Table 1, Supplement S1). Analyses of both the unenriched and complete collection were planned.

Sequencing
Isolates were sequenced on Illumina platforms and reads processed by the Public Health England bioinformatics pipeline at Genomics England, 8 as described. 6 Reads were mapped to the pan-susceptible M. tuberculosis reference genome (Genbank NC_000962.2) using Stampy (v.1.0.17), 9 with repetitive regions masked. SAMtools mpileup (v.0.1.18) 10 made variant-calls based on a minimum depth of 5X and at least one read on each strand. Mixed-calls were assigned where minority alleles composed >10% of read depth. Insertions and deletions were determined using Cortex (v.1.0.5.21). 11

Drug susceptibility testing and prediction
Phenotypic drug susceptibility testing was performed locally using MGIT 960 (Becton Dickinson, New Jersey, USA), 7H10 or Löwenstein-Jensen agar, or by microscopic-observation drug-susceptibility (MODS), with method-specific critical concentrations for isoniazid (MGIT 0.1-0.2µg/mL; Agar 0.2µg/mL; MODS 0.4µg/mL), rifampicin (MGIT 1.0µg/mL; Agar 40µg/mL), ethambutol (MGIT 5.0µg/mL; Agar 0.2µg/mL), and pyrazinamide (100µg/mL). Not all laboratories routinely tested all agents (S1).
Genotypic predictions were based on mutations in, or upstream of, genes associated with resistance to isoniazid (ahpC, inhA, fabG1, katG), rifampicin (rpoB), ethambutol (embA, embB, embC), and pyrazinamide (pncA). 6 A knowledgebase of mutations predicting antimicrobial resistance, or not, was informed by (i) the molecular targets of WHO-recommended line-probe assays (MTBDRplus, MTBDRsl v1.0, HAIN Lifesciences, Germany), (ii) a systematic literature review, 12 (iii) the CDC, Atlanta, USA, panel and (iv) two recent studies, with no isolates in common with this study (S2), 6,13 of which one became available after this study commenced. 13 Isolates containing resistance-mutations were predicted phenotypically resistant, whereas isolates containing only wild-type sequence, phylogenetic mutations, 6 or mutations considered consistent with susceptibility, were predicted susceptible.
Predictions were withheld for isolates containing mutations affecting target genes but of unknown association, or where no nucleotide-call could be determined at a resistance-associated site. In these circumstances, the genotype was reported 'unknown' or 'failed', respectively. Using phenotypic results as a gold-standard, sensitivity, specificity, negative and positive predictive value were calculated for the correct assignment of susceptibility or resistance. Primary analyses excluded phenotypes without a prediction.
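The prediction rule and the accuracy measures described above can be illustrated with a minimal Python sketch. It is not the study's pipeline: the function names, the single-letter class labels, and the precedence used when several mutation classes co-occur (resistance first, then a failed call, then unknown) are assumptions made for illustration only.

```python
def predict_phenotype(mutation_classes, has_missing_call=False):
    """Predict one drug phenotype for one isolate.

    mutation_classes: knowledgebase class of each mutation found in the
        genes relevant to the drug: 'R' (resistance-associated),
        'S' (phylogenetic or consistent with susceptibility) or
        'U' (unknown association). An empty list means wild-type.
    has_missing_call: True if no nucleotide call could be made at a
        resistance-associated site.
    Returns 'R', 'S', 'U' (unknown) or 'F' (failed).
    """
    classes = set(mutation_classes)
    if "R" in classes:          # assumed precedence, not stated in the text
        return "R"
    if has_missing_call:
        return "F"
    if "U" in classes:
        return "U"
    return "S"


def diagnostic_metrics(pairs):
    """Compute accuracy measures with the phenotype as gold standard.

    pairs: iterable of (predicted, phenotype) tuples. Predictions of
    'U' or 'F' are ignored, mirroring the primary analysis.
    """
    tp = fp = tn = fn = 0
    for predicted, phenotype in pairs:
        if predicted == "R" and phenotype == "R":
            tp += 1
        elif predicted == "R" and phenotype == "S":
            fp += 1
        elif predicted == "S" and phenotype == "S":
            tn += 1
        elif predicted == "S" and phenotype == "R":
            fn += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

Here "resistant" is treated as the positive class, so sensitivity is the proportion of phenotypically resistant isolates predicted resistant, and negative predictive value is the proportion of susceptible predictions that were phenotypically susceptible.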
Laboratory error was assumed where three or more phenotypes were discordant with an isolate's genotype, or where susceptible phenotypes were recorded despite the presence of high-level resistance katG S315T mutations for isoniazid, or rpoB S450L mutations for rifampicin. 14 Such isolates were excluded from further analysis.
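The exclusion rule can be sketched as follows. This is an illustrative reading of the rule in Python, not the code used in the study; the mutation identifiers (`katG_S315T`, `rpoB_S450L`), the dictionary layout and the function name are hypothetical.

```python
# Hypothetical identifiers for the two high-level resistance mutations
# named in the text, keyed by the drug they affect.
HIGH_LEVEL_MUTATIONS = {
    "isoniazid": "katG_S315T",
    "rifampicin": "rpoB_S450L",
}


def is_presumed_lab_error(predictions, phenotypes, mutations):
    """Return True if an isolate would be excluded as a presumed laboratory error.

    predictions: dict drug -> 'R', 'S', 'U' or 'F' (genotypic prediction)
    phenotypes:  dict drug -> 'R' or 'S' (drugs not tested are absent)
    mutations:   set of mutation identifiers detected in the isolate
    """
    # Rule 1: three or more phenotypes discordant with the genotype.
    discordant = sum(
        1
        for drug, predicted in predictions.items()
        if predicted in ("R", "S")
        and drug in phenotypes
        and phenotypes[drug] != predicted
    )
    if discordant >= 3:
        return True

    # Rule 2: susceptible phenotype despite a high-level resistance mutation.
    for drug, marker in HIGH_LEVEL_MUTATIONS.items():
        if marker in mutations and phenotypes.get(drug) == "S":
            return True
    return False
```

Withheld predictions ('U' or 'F') are not counted as discordant in this sketch, which is an assumption.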
Analysis was performed using STATA (Texas, USA, v13.1). No institutional review board approval was required except in Thailand, where it was granted through Mahidol University (Si029/2557).
The study was first designed by TMW, TEAP and DWC, with subsequent contributions from others (supplement). Data were gathered at participating centres. Initial analysis was performed by TMW, TEAP, ASW, ZI, MH, SL, DW, PF and PM, with later input from others (supplement). TMW wrote the first draft. TMW vouches for the analysis and had full access to the data; all authors agreed to publication.

As some collections included clustered isolates, the analysis was repeated after randomly selecting one representative among genomically indistinguishable isolates, and again from isolates within five single nucleotide polymorphisms of another. No significant change in sensitivity or specificity was observed for any drug (p>0.1, S4).

To reflect the emerging practice of routinely sequencing isolates for clinical care, the analysis was repeated for the subset of 4,397 isolates from German, Italian, Dutch and UK collections that were not enriched for resistance. Among these isolates, 335 (7.6%) were isoniazid resistant and 125 (2.8%) multidrug-resistant. For each drug, specificity and negative predictive values increased, whilst positive predictive values (the proportion of concordant resistant predictions) decreased relative to the overall results. There was no significant change in sensitivity (Table 2c).

Predicting complete phenotypic profiles
For DNA sequencing to help individualise therapy, a minimum requirement is that all first-line antimicrobial phenotypes are predicted. Phenotypic profiles were thus predicted for 7,516 isolates with phenotypic data available for all first-line drugs (S1&6). 'Unknown' or 'failed' was reported for at least one drug for 1,651 (22.0%) profiles. 5,865 (78.0%) were predicted completely, of which 5,250 (89.5%) were predicted correctly (S5). Among the 5,865 profiles, 4,007 were phenotypically pan-susceptible, of which 3,952 (98.6%) were predicted correctly (Table 4).
As the proportion of incompletely predicted profiles was substantial (22.0%), we assessed whether pan-susceptibility could be accurately predicted for some of these isolates anyway. Because isoniazid susceptibility predicts susceptibility to other first-line drugs, 15 we maximised confidence in isoniazid predictions by conditioning predictions on the absence of 'unknown' mutations in isoniazid-related genes. 'Unknown' mutations relevant to other drugs were permitted. Doing this, pan-susceptibility was correctly predicted for 4,481/4,582 (97.8%) isolates, including 545/1,651 (33.0%) previously incompletely predicted profiles (Table 4). Among the collections unenriched for resistance, 3,439/3,450 (99.7%) profiles were thereby correctly predicted pan-susceptible (S7).
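The relaxed rule for calling a profile pan-susceptible can be written as a short Python sketch. The function name and dictionary layout are hypothetical, and treating a 'failed' call for another drug as disqualifying is an assumption, since the text only states that 'unknown' mutations were permitted.

```python
OTHER_FIRST_LINE_DRUGS = ("rifampicin", "ethambutol", "pyrazinamide")


def predict_pan_susceptible(predictions):
    """Return True if a profile would be predicted pan-susceptible.

    predictions: dict drug -> 'R', 'S', 'U' (unknown) or 'F' (failed).
    The isoniazid prediction must be an unambiguous 'S'; for the other
    first-line drugs, 'S' or 'U' is accepted. A return value of False
    means the profile is not called pan-susceptible, either because
    resistance is predicted or because the prediction is withheld.
    """
    if predictions.get("isoniazid") != "S":
        return False
    return all(
        predictions.get(drug) in ("S", "U") for drug in OTHER_FIRST_LINE_DRUGS
    )
```

For example, a profile with isoniazid 'S', rifampicin 'S', ethambutol 'U' and pyrazinamide 'S' would be called pan-susceptible under this rule, whereas the same profile with isoniazid 'U' would not.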
To simulate how this approach would perform in settings with differing burdens of antimicrobial resistance, we assessed the decline in negative predictive value with increasing prevalence of resistance to individual drugs, and with prevalence of any resistance within drug profiles. We randomly sub-sampled 1,000 isolates to represent every 1% increment in antimicrobial-resistance prevalence between 10% and 90%, repeating this 1,000 times for each drug and for complete drug profiles. Negative predictive value declined further for ethambutol and pyrazinamide than for complete drug profiles, but declined least for isoniazid and rifampicin. Below 47.0% prevalence of resistance to any drug, the simulated negative predictive value remained above 95% for 97.5% of drug profiles (Figure 1).
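A sub-sampling simulation of this kind could be sketched in Python as below. This is an illustration under stated assumptions, not the analysis code (which was run in STATA): sampling with replacement, the function names, and the simple lower-percentile helper are all assumptions.

```python
import random


def simulate_npv(resistant_preds, susceptible_preds, prevalence,
                 n=1000, repeats=1000, seed=0):
    """Simulate negative predictive value at a fixed resistance prevalence.

    resistant_preds:   predictions ('R' or 'S') for phenotypically
                       resistant isolates
    susceptible_preds: predictions ('R' or 'S') for phenotypically
                       susceptible isolates
    prevalence:        proportion of resistant isolates in each sub-sample
    Returns one NPV per repeat (repeats with no 'S' prediction are skipped).
    """
    rng = random.Random(seed)
    n_resistant = round(n * prevalence)
    n_susceptible = n - n_resistant
    npvs = []
    for _ in range(repeats):
        # Sampling with replacement is an assumption of this sketch.
        res = rng.choices(resistant_preds, k=n_resistant)
        sus = rng.choices(susceptible_preds, k=n_susceptible)
        false_negatives = res.count("S")
        true_negatives = sus.count("S")
        total_negatives = true_negatives + false_negatives
        if total_negatives:
            npvs.append(true_negatives / total_negatives)
    return npvs


def lower_bound(values, q=0.025):
    """Return the value below which roughly a fraction q of results fall."""
    ordered = sorted(values)
    return ordered[int(q * (len(ordered) - 1))]
```

Calling `simulate_npv` for each prevalence from 0.10 to 0.90 in steps of 0.01 and taking `lower_bound` of each result gives the level that 97.5% of simulated values exceed, which can then be compared with the 95% threshold.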

Discrepancy analyses
In Australia, eleven ethambutol-susceptible isolates containing embB mutations were rephenotyped. Three repeat assays failed, but seven of the remaining eight yielded resistant phenotypes, now consistent with the genotype. In Peru, 10 of 16 repeated assays remained phenotypically susceptible by MODS despite fabG1 C-15T or G-17T mutations. In isolates from the Netherlands, six resistant phenotypes predicted susceptible were identified as clerical errors, and three susceptible phenotypes predicted resistant tested phenotypically resistant by alternative phenotypic assays (S8). Although additional rephenotyping was not possible, we conducted a 'per mutation' analysis to further assess discrepancies.
Of the 322 resistant phenotypes predicted susceptible, 290 (90.1%) had no mutations affecting targeted genes, and 32 (9.9%) had one or more of 15 mutations per isolate, each previously characterised as consistent with antimicrobial susceptibility. Supporting this, across all isolates in which these 15 mutations occurred as the sole mutation, they correctly predicted isoniazid susceptibility in 286/293 (97.6%) isolates and ethambutol susceptibility in 95/119 (79.8%) isolates. The one mutation relevant to pyrazinamide was seen in two isolates, both of which were phenotypically resistant. None of these mutations were relevant to rifampicin (S9).
Fourteen of 17 (82.4%) mutations leading to rifampicin resistance predictions in phenotypically susceptible isolates were in the genetic region targeted by Xpert MTB/RIF and MTBDRplus.
Laboratory sample mislabelling probably also contributed to discrepant results. This was estimated for each collection from the proportion of isolates excluded because of katG S315T or rpoB S450L mutations and susceptible phenotypes, the collection's discrepancy rate, and the prevalence of antimicrobial resistance (S10). Overall, about 43% of isoniazid and 12% of rifampicin discrepancies were thereby attributable to mislabelling.

Discussion
This analysis of over 10,000 M. tuberculosis isolates collected from 16 countries across six continents, and representing all major lineages, demonstrates that whole-genome sequencing can now characterise susceptible first-line anti-tuberculosis drug profiles sufficiently accurately for clinical use.
The importance of this is twofold. First, it demonstrates that the genomic approach can be used to tailor individual treatment regimens. Extended to all drugs, individualised therapies promise to improve cure rates over those achieved by semi-empiric regimens directed by more limited diagnostic tests. 1 Second, it is now possible to reduce the phenotypic workload where routine whole-genome sequencing is performed.
The WHO's target product profiles for new molecular assays for M. tuberculosis require over 90% and 95% sensitivity and specificity, respectively. 7 Overall, both these targets were met for all drugs with the exception of specificity for ethambutol (93.6%). This is no surprise as phenotyping is an imperfect gold standard, in particular for isolates with embB mutations. 6,13,16 For the collections unenriched for resistance, however, all drugs did meet these targets, as did the predictions of pan-susceptibility in all collections. Only categorical agreement was assessed for complete drug profile predictions because of the number of permutations. These met the external quality assurance criteria (>80% concordance) for the European TB reference laboratory network. 17
There are three reasons why pan-susceptibility predictions were particularly accurate. First, the knowledgebase included both resistance-associated genomic mutations, and mutations compatible with phenotypic susceptibility. Second, anti-tuberculosis drug susceptibility phenotypes are not independent of one another, allowing the use of isoniazid susceptibility to predict susceptibility to other drugs. Third, no predictions were attempted for isolates containing genomic variation of unknown association in genes affecting isoniazid. This maximised confidence in the isoniazid predictions that were made. Consequently, the prediction of drug profiles performed better than the per-drug analysis for ethambutol and pyrazinamide, and although there was a slight corresponding decline in performance for isoniazid and rifampicin, simulations showed that the prevalence of resistance would have to exceed that seen in most of the worst affected countries in the world before these predictions no longer satisfied the WHO targets. 1
Our findings showed substantial improvements over the in-silico predictions for the sensitivity of WHO-recommended PCR-based assays because whole-genome sequencing is able to identify many more mutations. These additional mutations were, however, simultaneously responsible for the losses in specificity, largely because of the number of mutations for which a minority of isolates did not manifest a resistant phenotype. A typical example is the rpoB I491F mutation, which frequently gives a susceptible rifampicin result in liquid culture but has been linked to treatment failure. 4,18,19 The broader discrepancy analysis highlighted the same phenomenon. Whilst the predictive performance of individual mutations, whether probed by WHO-recommended assays or not, was good, each mutation has an error rate, occasionally leading to an unexpected phenotype in a minority of isolates. This is most likely where a mutation elevates the minimum drug concentration required to inhibit bacterial growth to close to the concentration above which an isolate is considered resistant.
Canonical ethambutol mutations are a classic example, 20 but there are many others including the mutations missed by the MODS assay in Peru. 16,21,22 Such phenomena are thus likely to explain the majority of isolates that were predicted resistant, yet were phenotypically susceptible. They are also the most likely reason why predicting pan-susceptible drug profiles was more accurate than predicting profiles apparently resistant to one or more drugs.
One study limitation is that the scale and cost of repeat sequencing and phenotyping of isolates meant that we could not definitively resolve most discrepancies. This was most concerning for phenotypically resistant isolates predicted susceptible. For these, possible explanations include phenotypic error, resistant minority bacterial populations undetected by sequencing, mechanisms of resistance linked to genes we did not interrogate, or laboratory labelling error.
More work remains to be done before predictions can be extended to second and third-line drugs, and to newer compounds. However, following external review, Public Health England has already decided to stop phenotyping isolates predicted pan-susceptible to first-line drugs (personal communication, Derrick Crook, Director, National Infection Service). Similar moves are expected in the Netherlands (Dick van Soolingen, Rijksinstituut voor Volksgezondheid en Milieu) and New York (Kimberlee Musser, Wadsworth Center, New York State Department of Health). For low and middle-income countries without easy access to phenotyping, there is now the prospect that emerging mobile sequencing platforms could be used to implement sequence-directed therapies, a potential solution to the call for universal susceptibility testing. Portable platform sequencing directly from spiked samples has been achieved, although real-world systematic evaluation is still required. 23
Should whole-genome sequencing perform as well for second and third-line drugs as for first-line, a clinical trial could be needed to assess the performance of individualised over standardized treatment regimens in countries with a high drug-resistant disease burden. 24 Individualised therapies would be expected to reduce the amplification of resistance (to other drugs) in individual patients, side-effects, and the likelihood of onward transmission, and to exert a weaker selection pressure on strains at a population level, which is key where empiric regimens have been targeted on the basis of very narrow data on antimicrobial susceptibility. 4 Welcome public health benefits could result from monitoring transmission using the very same sequences. 2 The current investment in whole-genome sequencing in high-income countries is likely to help accelerate implementation in lower-income, higher-burden countries where the potential benefit is greatest. 25
These data demonstrate how our understanding of the molecular determinants of resistance to first-line anti-tuberculosis drugs is now sufficiently good to start using DNA sequencing to guide therapy. Similar performance must now be replicated for the remaining drugs.

* More than one collection was derived from Italy and the UK, some enriched and some not enriched for resistance. See supplement for details.

U=mutation of unknown association present; F=genotypic prediction failed due to missing data around a genomic resistance locus. All % results are based on R/S genotypic predictions only, excluding U and F, except where marked *, for which the denominator includes R, S, U and F. †p≤0.001, ‡p≤0.01, and §p≤0.05 comparing sensitivity, specificity, NPV and PPV for each drug for (b) and (c) against (a), and comparing (d) against (c); p>0.05 for all results not marked †, ‡ or §. In silico predictions of resistance for Xpert and HAIN assays were based on the presence of non-wild type sequence within the genomic regions interrogated by these assays. 'F' was reported in the presence of minority alleles at relevant sites, just as for WGS predictions.

Figure 1. Negative predictive values shown for individual drugs and complete drug profiles, according to simulated prevalence of resistance to each drug, or within each drug profile ('any resistance'). For each percentage prevalence between 10% and 90%, 1,000 isolates were randomly selected, 1,000 times. Lines indicate the median with shaded areas showing the 95% confidence intervals.