Genome Sequencing of an Extended Series of NDM-Producing Klebsiella pneumoniae Isolates from Neonatal Infections in a Nepali Hospital Characterizes the Extent of Community- versus Hospital-Associated Transmission in an Endemic Setting

NDM-producing Klebsiella pneumoniae strains represent major clinical and infection control challenges, particularly in resource-limited settings with high rates of antimicrobial resistance. Determining whether transmission occurs at a gene, plasmid, or bacterial strain level and within hospital and/or the community has implications for monitoring and controlling spread. Whole-genome sequencing (WGS) is the highest-resolution typing method available for transmission epidemiology. We sequenced carbapenem-resistant K. pneumoniae isolates from 26 individuals involved in several infection case clusters in a Nepali neonatal unit and 68 other clinical Gram-negative isolates from a similar time frame, using Illumina and PacBio technologies. Within-outbreak chromosomal and closed-plasmid structures were generated and used as data set-specific references. Three temporally separated case clusters were caused by a single NDM K. pneumoniae strain with a conserved set of four plasmids, one being a 304,526-bp plasmid carrying blaNDM-1. The plasmids contained a large number of antimicrobial/heavy metal resistance and plasmid maintenance genes, which may have explained their persistence. No obvious environmental/human reservoir was found. There was no evidence of transmission of outbreak plasmids to other Gram-negative clinical isolates, although blaNDM variants were present in other isolates in different genetic contexts. WGS can effectively define complex antimicrobial resistance epidemiology. Wider sampling frames are required to contextualize outbreaks. Infection control may be effective in terminating outbreaks caused by particular strains, even in areas with widespread resistance, although this study could not demonstrate evidence supporting specific interventions. Larger, detailed studies are needed to characterize resistance genes, vectors, and host strains involved in disease, to enable effective intervention.

Water purification by reverse osmosis and the use of purified water in ventilator humidifiers
6. Paper towels to replace cloth towels in all neonatal units
7. Routine change of staff and visitor gowns every morning
8. Hand washing implemented for all visitors; importance of hand hygiene emphasized to all staff on the neonatal units
9. Use of disposable tubing for ventilators
10. Cleaning protocols for ventilators implemented
11. Bedside chlorhexidine hand cleansing solution (Microshield) deployed in all neonatal units (equivalent to 0.5% w/v chlorhexidine gluconate + 70% v/v ethanol)
12. Implementation of enhanced microbiological surveillance
13. Closure of original clean nursery and repair of a leaking toilet in the ward above, which was being used by women undergoing vesico-vaginal fistula repair surgery

Supplementary Section 3. Details of species identification using Kraken
Illumina reads from each isolate were compared to the complete bacterial, archaeal, and viral genomes in RefSeq (as of March 30, 2014); for each sample, the species-level match with the highest number of hits was reported.
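The "highest number of hits" rule can be illustrated with a short Python sketch. It assumes the classification results are available as a standard tab-separated Kraken report (percentage, clade reads, direct reads, rank code, taxon ID, name); whether the report or the per-read output was used here is not stated, and the function name and the example counts below are hypothetical.

```python
def top_species(report_text):
    """Return (species name, clade read count) for the best species-level hit.

    Assumes Kraken report columns: percent, clade reads, direct reads,
    rank code, NCBI taxon ID, indented scientific name.
    """
    best = None
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        if fields[3].strip() != "S":
            continue  # keep species-level rows only
        reads = int(fields[1])
        name = fields[5].strip()
        if best is None or reads > best[1]:
            best = (name, reads)
    return best


# Illustrative report rows only; these counts are invented for the example.
EXAMPLE_REPORT = "\n".join(
    "\t".join(row)
    for row in [
        ("5.00", "500", "500", "U", "0", "unclassified"),
        ("95.00", "9500", "10", "G", "570", "    Klebsiella"),
        ("90.00", "9000", "9000", "S", "573", "      Klebsiella pneumoniae"),
        ("4.00", "400", "400", "S", "571", "      Klebsiella oxytoca"),
    ]
)

if __name__ == "__main__":
    print(top_species(EXAMPLE_REPORT))
```

Ranking on clade-level rather than direct read counts is a design choice in this sketch, so that reads assigned to strains below a species still count toward that species.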

Supplementary Section 5. Details of PhyML model
We used PhyML version 3.0, with a generalized time-reversible (GTR) nucleotide substitution model, a gamma distribution with four rate categories to estimate among-site variation in substitution rates, and 100 bootstrap replicates. Sites where at least one sample had a null/missing call were excluded from the input. For the input alignment, the variant sites derived from mapping to the MGH78578 reference were "padded" with invariant sites in a proportion consistent with the GC content and length of the reference genome (5.69 Mb, 57.1% GC content).
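The padding step can be sketched as a calculation of how many invariant A, C, G, and T columns to append to the variant-site alignment. This is a minimal Python illustration rather than the script used in the study: the function name is hypothetical, the number of variant sites in the example is a placeholder, and G/C and A/T are assumed to be split evenly within the GC and AT fractions.

```python
def invariant_site_counts(genome_length, gc_fraction, n_variant_sites):
    """Counts of invariant A/C/G/T columns needed to pad an alignment.

    Assumes (for illustration) G = C and A = T within the GC and AT
    fractions; any odd remainder is assigned to C or T.
    """
    n_invariant = genome_length - n_variant_sites
    gc_total = round(n_invariant * gc_fraction)
    at_total = n_invariant - gc_total
    g = gc_total // 2
    c = gc_total - g
    a = at_total // 2
    t = at_total - a
    return {"A": a, "C": c, "G": g, "T": t}


if __name__ == "__main__":
    # Reference length and GC content from the text; 1000 variant sites
    # is a placeholder value, not a figure from the study.
    counts = invariant_site_counts(5_690_000, 0.571, 1000)
    print(counts, sum(counts.values()))
```

Each invariant column carries the same base in every sample, so the padded alignment has the length of the reference genome and the substitution model sees realistic base frequencies.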

Supplementary Section 6. Details of BEAST analysis, and sites excluded from the analysis
Three separate runs on the dataset were undertaken using a strict molecular clock model with the following settings: (i) a GTR nucleotide substitution model with estimated base frequencies; (ii) a discrete gamma distribution with four categories to account for variable substitution rates at each site; (iii) a constant population size; (iv) a random starting tree; and (v) a Markov chain Monte Carlo (MCMC) length of 30,000,000 with sampling logged every 1,000 iterations. The output of the three runs with respect to mixing and convergence was compared using Tracer v1.5 [33]; good mixing and convergence were observed, and effective sample sizes for all parameters were above 300. Log and tree files for the respective runs were combined with down-sampling using LogCombiner v1.7.5; mutation rates and the phylogeny were determined from these. TreeAnnotator v1.7.5 was used to select the maximum clade credibility tree. The figure below represents the BEAST phylogeny. Isolates shown in the same color were sampled longitudinally from the same individual. Blue bars around each node represent the 95% credibility interval around the node height and, in this time-scaled context, the uncertainty around the time to most recent common ancestor (TMRCA). Starred nodes have posterior support values >98%.
For the input alignment, 51 variant sites derived from mapping to the PacBio-derived chromosomal reference for the outbreak strain (excluding positions 3126261 and 3137776, which had been affected by the large deletion in PMK13b) were "padded" with invariant sites (in proportions consistent with the ACGT content of the reference chromosome) to the length of the called genome (5,038,898 of 5,317,001 bases, i.e., the sites where bases were called in all sequences). The molecular clock rate estimated by BEAST was then multiplied by the called genome length to give a mutation rate per genome per year.
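The final conversion is a single multiplication, shown here as a short Python sketch. The two genome lengths are taken from the text; the function names are hypothetical, and the per-site clock rate in the example is an arbitrary placeholder, not the rate estimated in this study.

```python
CALLED_SITES = 5_038_898      # sites with a base call in all sequences (from the text)
REFERENCE_LENGTH = 5_317_001  # length of the outbreak chromosomal reference (from the text)


def called_fraction(called=CALLED_SITES, total=REFERENCE_LENGTH):
    """Proportion of the reference chromosome included in the alignment."""
    return called / total


def rate_per_genome_per_year(rate_per_site_per_year, called_sites=CALLED_SITES):
    """Scale a per-site clock rate to a per-genome rate over the called sites."""
    return rate_per_site_per_year * called_sites


if __name__ == "__main__":
    print(f"called fraction: {called_fraction():.4f}")
    # 1e-6 substitutions/site/year is a placeholder input for illustration.
    print(f"per-genome rate: {rate_per_genome_per_year(1e-6):.3f}")
```

Because the scaling uses the called genome length rather than the full reference length, the resulting rate describes the roughly 95% of the chromosome where every isolate had a base call.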