Temporal stability and genetic diversity of 48-year-old T-series phages

T-series phages have been model organisms for molecular biology since the 1940s. Given that these phages have been stocked, distributed, and propagated for decades across the globe, there exists the potential for genetic drift to accumulate between stocks over time. Here we compared the temporal stability and genetic relatedness of laboratory-maintained phage stocks with a T-series collection from 1972. Only the T-even phages produced viable virions. We obtained complete genomes of these T-even phages, along with two contemporary T4 stocks. Using comparative genomics, we found 12 and 16 nucleotide variations in the genomes of T2 and T6, respectively, whereas there were ~172 nucleotide variations between the T4 sublines when compared with the NCBI RefSeq genome. To account for the possibility of artefacts in the NCBI RefSeq, we used the 1972 T4 stock as a reference and compared genetic and phenotypic variations between T4 sublines. Genomic analysis predicted nucleotide variations in genes associated with DNA metabolism and structural proteins. We did not, however, observe any differences in growth characteristics or host range between the T4 sublines. Our study highlights the potential for genetic drift between individually maintained T-series phage stocks, yet after 48 years this has not resulted in phenotypic alterations in these important model organisms.

Importance
T-series bacteriophages have been used throughout the world in molecular biology research that was critical for establishing the fundamentals of the field, from the structure of DNA to advanced gene-editing tools. These model bacteriophages help keep research data consistent and comparable between laboratories. However, we observed genetic variability when we compared contemporary sublines of T4 phage to a 48-year-old stock of T4. This may affect the comparability of results obtained using T4 phage.
Here, we highlighted the genomic differences between T4 sublines and examined phenotypic differences in phage replication parameters. We observed limited genomic changes but no phenotypic variations between T4 sublines. Our research highlights the possibility of genetic drift in model bacteriophages.


Introduction
Bacteriophages, viruses that infect bacteria, were discovered in the early 20th century. The antimicrobial properties of phages initially sparked the interest of the early phage pioneers, and phages were quickly used to treat bacterial infections such as dysentery and cholera (Rohwer and Segall 2015). However, the ambiguous efficacy of phage therapy, along with the introduction of more efficient chemotherapeutics, led to a decline in interest in their use as therapeutic agents (Salmond and Fineran 2015). Nevertheless, since their discovery, phages have become key model organisms for understanding various aspects of modern molecular biology. For example, understanding of the basis of mutation (Luria and Delbruck 1943), recombination (Luria and Human 1952), the genetic nature of DNA and its replication (Crick et al 1961), and the sequencing of genes and genomes (Sanger et al 1977) were all founded upon phage biology. Furthermore, the study of prokaryotic phage-resistance mechanisms led to the discovery of the CRISPR/Cas system, which has become a key technique for targeted mutagenesis and gene editing (Barrangou et [...]

[...] the T-series phages in the 1940s by Delbruck and colleagues, the so-called 'phage group'. This enabled phage researchers to compare results between different laboratories, [...]

[...] of T4 (hereafter referred to as T4 sublines for simplicity) from phage laboratories in Australia and the USA. We then compared the genetic and phenotypic differences of these sublines to Hancock's 1972 stock of T4.

Figure 1: Photographs of historical stocks of T-series phages, stored in 1972. Each chloroform-sealed glass ampule contained 10 mL of lysate at a titre of 1-5 × 10^9 plaque-forming units per mL. The lysates have been stored at 4 °C since 1972.

Methods
Bacteriophage stock:
The stock of T-series bacteriophages in this study was obtained from Prof. Peter Reeves of The University of Sydney, Australia. The stock comprised lysates of T1, T2, T3, T4, T6 and T7 phages, purified and stored in sealed glass ampules with chloroform in 1972. These phages were used in several studies by Robert E. Hancock ([...] Hancock and Braun 1976). T1 phage was not used in this study due to its potential contamination hazard for our laboratory. These stocks have been stored in a cold room (~4 °C) since the 1970s. In addition, we used two contemporary strains of T4: one received from the Bacteriophage T4 lab at the Catholic University of America, led by Prof. Venigalla B. Rao (named T4-Rao henceforth), and one from the phage collection of our lab (named T4-Barr henceforth).

Bacteriophage revival and propagation:
Lysates from the old stocks were each recovered in 15 mL Falcon tubes. One mL of lysate was mixed with a susceptible host (E. coli B for T2, T3, T4 and T6, and E. coli BL21 for T7) and plated onto Lysogeny Broth (LB) agar plates using the soft agar overlay method, followed by incubation for 18-24 h at 37 °C. Each plating was done in duplicate and repeated with lysates from at least three different vials. Where no plaques formed, amplification was attempted using the original lysate and its host in broth culture, followed by replating. Lysates with viable phages were propagated following the Phage-on-Tap protocol (Bonilla et al 2016).
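Plaque counts from overlay plates such as these convert to a titre by dividing the count by the volume plated and by the dilution of the plated sample. The following is a minimal Python sketch of that arithmetic; the function name, its parameters and the diluted example are illustrative assumptions and are not taken from the protocol used in this study.

```python
def titre_pfu_per_ml(plaques: int, volume_ml: float, dilution: float = 1.0) -> float:
    """Return a titre in PFU/mL from a plaque count.

    plaques   -- plaques counted on the plate
    volume_ml -- volume of (diluted) lysate plated, in mL
    dilution  -- fraction of original lysate in the plated sample
                 (1.0 = undiluted, 1e-2 = a 1:100 dilution)
    """
    if plaques < 0:
        raise ValueError("plaque count cannot be negative")
    if volume_ml <= 0:
        raise ValueError("plated volume must be positive")
    if not 0 < dilution <= 1:
        raise ValueError("dilution must be in the interval (0, 1]")
    return plaques / (volume_ml * dilution)


# Undiluted plating of 1 mL, as described for the historical lysates:
# 5 plaques on the plate corresponds to 5 PFU/mL.
undiluted = titre_pfu_per_ml(plaques=5, volume_ml=1.0)

# Hypothetical diluted plating (values invented for illustration only):
# 42 plaques from 0.1 mL of a 1:100 dilution.
diluted = titre_pfu_per_ml(plaques=42, volume_ml=0.1, dilution=1e-2)

print(f"undiluted: {undiluted:.1f} PFU/mL")
print(f"diluted:   {diluted:.2e} PFU/mL")
```

Plating undiluted lysate keeps the detection limit as low as the plated volume allows, which matters when only a few plaques per mL are expected.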

Phage DNA extraction and sequencing:
High titre lysate (>10^9 PFU/mL) was used for DNA extraction. To remove contaminating host nucleic acids, lysates were treated with DNase (1 mg/mL) and RNase A (12.5 mg/mL) for 2 h at 37 °C, followed by an inactivation step of 5 min at 75 °C. DNA was extracted using the Norgen phage DNA extraction kit (Norgen Biotek, Ontario, Canada), following the manufacturer's instructions. The extracted DNA was vacuum dried into a pellet for transport. Sequencing was performed using the Illumina® HiSeq 150 bp paired-end platform at the Genewiz® facilities (Suzhou, China).

Bioinformatics analysis:
The Illumina HiSeq platform generated five to six million raw reads from each sample. [...] RefSeqs. The genomes were then annotated manually by copying the annotation from their respective NCBI RefSeqs using Geneious v9.1.8 (Kearse et al), provided that the coding regions had a sequence similarity of at least 98%, followed by manual curation where required.

To identify nucleotide variants, the filtered raw reads were mapped to either the NCBI RefSeqs or our assembled genomes using Snippy v4.2 (https://github.com/tseemann/snippy), with the minimum number of reads covering a site set to 100 and the minimum VCF variant call quality also set to 100. To compare nucleotide identity, the complete genomes were aligned by pairwise sequence alignment as implemented in Pyani (https://huttonics.github.io/pyani/). The average nucleotide identity (ANI) percentages obtained from the analysis were converted into a matrix and visualised using pheatmap (Kolde 2012 [...]

[...] LB and allowed to grow until the optical density at 600 nm (OD600) reached 0.2 (~2 h), which was then infected with phage at a multiplicity of infection (MOI) of 0.01. Phage was allowed to adsorb for 5 min at 37 °C with orbital agitation at 120 rpm. The mixture was then pelleted (4,000 g, 2 min, room temperature), resuspended in fresh LB broth and returned to incubation at 37 °C and 120 rpm. Samples (100 µL) were taken every 5 or 10 min for a total period of 1 h, transferred into chloroform-saturated PBS, serially diluted, and plated to determine PFU. The number of bacteria in the initial inoculum was determined by plate count. PFU per infected cell was calculated by dividing PFU by the initial density of bacteria. The experiment was repeated on at least three different occasions.

All data were analysed and visualised using GraphPad Prism v8. The average and standard deviation of all replicates were calculated and compared using a t-test to obtain p values. To infer statistical significance, the threshold was set at p < 0.05.

Results

Stability of T-series phages in prolonged storage
To examine the viability of the 48-year-old T-series phage stocks, we plated 1 mL of lysate from T2, T3, T4, T6 and T7 phages with their respective hosts using the soft-agar overlay technique. We did not open or plate T1 phage vials due to concerns about its persistence and its history of contaminating laboratory stocks of E. coli. The titres in plaque-forming units (PFU) per mL were recorded from at least three different vials for each strain (Table 1). We did not observe any plaques from T3 and T7 lysates. To verify whether any active phage particles remained at low titre in the T3 or T7 stocks, we propagated the entire vial of the original lysates with E. coli B and E. coli BL21, respectively, overnight in an attempt to recover viable phages. However, no phages could be recovered following overnight amplification. Lysates of T2 and T4 showed between 4 and 6 plaques per mL on top-agar, while T6 phage had between 10^4 and 10^5 PFU/mL (Table 1). Our results revealed that the T-even series could be revived from lysates stored approximately 48 years ago, while the T-odd series (T3 and T7) could not be recovered from the 1970s stock.

[...] RefSeq genome, and we therefore used T4-Hancock as a historical reference genome in our subsequent analysis.

Comparative genomics of T4 sublines
To compare the complete genomes of T-even phages, we assembled the Illumina HiSeq reads using Unicycler v0.4.3 with a minimum contig length of 1000 bp, which produced a single contig of a complete genome for each strain. The genome sizes of the T-even Hancock strains were broadly similar to those of the respective NCBI RefSeqs (Table 2). The complete genomes were aligned by pairwise sequence alignment as implemented in Pyani (https://huttonics.github.io/pyani/) to obtain the average nucleotide identity (ANI). The analysis revealed that the genomes of the T-even phages were closely related to each other, with more than 96% nucleotide sequence similarity between T2, T4 and T6 phages (Supplementary Figure 1). Among the T4 sublines, T4-Barr and T4-Rao were more similar to each other than either was to T4-Hancock (Figure 2).
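To make the identity comparison concrete, the following Python sketch computes percent identity over the aligned columns of pre-aligned sequences and collects the pairwise values into a matrix. It is a simplified illustration only: Pyani derives ANI from whole-genome alignments produced by external aligners, which this sketch does not reproduce, and the function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where both sequences carry a base.

    Both inputs must come from the same alignment (equal length, '-' for gaps).
    Columns with a gap in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: not counted in this simplified measure
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


def identity_matrix(aligned: dict) -> dict:
    """Return {(name_x, name_y): percent identity} for every ordered pair."""
    names = sorted(aligned)
    return {
        (x, y): percent_identity(aligned[x], aligned[y])
        for x in names
        for y in names
    }


# Toy, invented sequences (not phage data), already aligned to equal length.
toy = {
    "subline_1": "ACGTACGTAC",
    "subline_2": "ACGTACGTAT",
    "subline_3": "ACGAACG-AT",
}

for (x, y), value in identity_matrix(toy).items():
    if x < y:  # print each unordered pair once
        print(f"{x} vs {y}: {value:.2f}%")
```

A matrix in this form can be passed to any heatmap routine; skipping gap columns is one convention among several, so values from dedicated ANI tools will differ.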

Next, we assessed the nucleotide variations (SNPs and indels) between T4 sublines. To obtain an unbiased annotation, we manually annotated the complete genome of each subline using information from the NCBI T4 RefSeq, with an identity threshold of at least 98%. T4-Barr had two insertions (frameshift) and six SNPs (five missense and one synonymous) when compared to T4-Hancock. Our analysis showed five additional variants (one insertion and four missense SNPs) in T4-Rao, taking the total number of variants to 13 (Table 3).

[...] On the other hand, many missense variations were observed in genes associated with essential structures, such as two subunits of topoisomerase II (gp39 and gp52), which functions in DNA metabolism, and the baseplate wedge initiator (gp7), tail sheath protein (gp18) and long tail fibres (gp34 and gp37), all associated with phage adsorption to the bacterial host. Of the five additional variants in T4-Rao, three were observed in essential genes: tail sheath protein (gp18), distal subunit of long tail fibre (gp37) and medium subunit of topoisomerase (gp52).
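The variant tallies above rest on two simple operations: keeping only calls that pass the depth and quality thresholds stated in Methods (100 for each) and sorting each retained call by type. The Python sketch below illustrates both steps under stated assumptions: the record fields, function names and example calls are hypothetical and do not reproduce Snippy's output format or its internal logic.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Simplified variant record (hypothetical fields for illustration)."""
    pos: int      # 1-based position on the reference
    ref: str      # reference allele
    alt: str      # alternate allele
    depth: int    # reads covering the site
    qual: float   # variant call quality


MIN_DEPTH = 100    # minimum reads covering a site, as stated in Methods
MIN_QUAL = 100.0   # minimum variant call quality, as stated in Methods


def variant_type(v: Variant) -> str:
    """Classify a call by comparing reference and alternate allele lengths."""
    if len(v.ref) == len(v.alt):
        return "snp" if len(v.ref) == 1 else "mnp"
    return "insertion" if len(v.alt) > len(v.ref) else "deletion"


def tally_variants(variants: Iterable[Variant]) -> Counter:
    """Count variant types among calls that pass both thresholds."""
    kept = [v for v in variants if v.depth >= MIN_DEPTH and v.qual >= MIN_QUAL]
    return Counter(variant_type(v) for v in kept)


# Invented example calls; positions and values are not from this study.
calls = [
    Variant(1200, "A", "G", 350, 4500.0),    # passes: SNP
    Variant(5400, "C", "CT", 280, 3900.0),   # passes: insertion
    Variant(9100, "GA", "G", 310, 4100.0),   # passes: deletion
    Variant(15000, "T", "C", 40, 900.0),     # dropped: depth below 100
    Variant(20000, "G", "A", 500, 60.0),     # dropped: quality below 100
]

print(dict(tally_variants(calls)))
```

The reported counts are internally consistent: two insertions plus six SNPs give eight variants in T4-Barr, and the five additional variants in T4-Rao bring the total to 13.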

Growth characteristics of T4 sublines in E. coli B and E. coli K12
We then sought to assess whether the identified mutations had effects on phage replication [...]

Furthermore, to examine the rate of bacterial killing by phages, we performed growth kill assays. The three sublines of T4 were mixed at a multiplicity of infection (MOI) of 0.01 with actively growing (OD600 ~ 0.2) E. coli B and K12, and growth was measured by optical density (OD600) at 5 min intervals for 16 hours. All three sublines of T4 were able to suppress the growth of E. coli B and E. coli K12 within one hour, with a sharp reduction in OD observed in E. coli K12 with all three T4 sublines (Figure 5). The growth patterns differed between the two E. coli strains, but there were no significant differences among the T4 sublines.

Finally, we examined the host range of the T4 sublines on a selection of laboratory and pathogenic strains of E. coli. We used three standard laboratory strains and eight pathogenic strains for this assay. In these spot plate assays, we observed that T4 phages had lytic activity against six out of 11 strains, but there was no difference in host range between the T4 sublines (Table 4).

[...] To address this, we investigated the genetic and phenotypic changes between a 48-year-old T-series phage stock and contemporary laboratory strains. Our analysis of three T4 sublines revealed minor genetic differences and no detectable variation in growth characteristics in their usual hosts, E. coli B and E. coli K12. We did, however, observe substantial differences between the nucleotide sequence of the NCBI RefSeq (Nucleotide accession no. AF158101) and our three T4 sublines, highlighting the need for an updated reference genome for T4 phage.

The complete genome of T4 was initially established by sequencing small fragments obtained through cloning and direct PCR (Miller et al 2003). According to the GenBank record, the first information on T4's genome sequence was recorded in 1981 and its latest update was listed in 2003 (AF158101.6). The relatively high genetic variation between the GenBank RefSeq and the T4 sublines of this study indicates that there may be artefacts in the original GenBank reference sequence. We therefore propose T4-Hancock as an updated reference genome for the field, and we used this genome as the reference in our analysis. The complete genome of T4-Hancock was 5 base pairs longer than the T4 RefSeq. This difference in genome length may be related to the fact that the genome of T4 is terminally redundant and circularly permuted, potentially altering the total length of the chromosome between generations (Streisinger et al 1967).

We further examined the genetic divergence between the historical and the two contemporary stocks of T4 (T4-Barr and T4-Rao). Our analysis showed the insertion of 13 bp in T4-Barr and 14 bp in T4-Rao when compared with T4-Hancock. This finding is in line with the total chromosome length of each subline that we obtained in our study. The average nucleotide identity, although the difference was very small, showed that the two contemporary sublines, T4-Barr and T4-Rao, were more similar to each other than either was to T4-Hancock. Interestingly, both contemporary sublines also shared the same nucleotide variations, with T4-Rao having five additional variations. The majority of nucleotide variations were in essential genes, including genes that encode long tail fibres (gp34 and gp37), the baseplate wedge initiator (gp7), the tail sheath (gp18) and enzymes associated with DNA metabolism. Although we could not track the history of our contemporary T4 phages, it is likely that these phages have gone [...]

We did not, however, observe any noticeable differences in the growth characteristics between T4 sublines under our experimental conditions. Nevertheless, mutations in the essential structural genes, such as the long tail fibres, could potentially affect the phage's adsorption to its host. [...]

This study also examined the stability of T-series phages in prolonged storage. All the T-even series lysates had active phages present, which could be propagated in the laboratory after 48 years in storage. However, we were not able to recover any viable phages from the lysates of the T-odd series (T3 and T7). Although we do not have complete information on the long-term storage conditions of these phages over the last ~50 years, our results suggest that the T-even series is more stable than the T-odd series in prolonged storage.
This finding is further supported by the close genetic relatedness of T-even phages, compared with the diversity seen across the T-odd phages (Abedon 2000).

In conclusion, our analyses suggest that individually maintained T4 sublines undergo continuous genetic drift that may cause micro-evolution in model bacteriophage stocks. The genetic variation in T4 sublines did not produce a difference in our phenotypic analysis, which is a favourable finding for the phage community, and the rationale proposed by Delbruck and co-workers for using model bacteriophages to keep results comparable between laboratories remains valid (Anderson 1992). However, the magnitude of genetic changes may vary between laboratories, which highlights the need for a larger-scale comparative study of model bacteriophage sublines.