Stability of the Parainfluenza Virus 5 Genome Revealed by Deep Sequencing of Strains Isolated from Different Hosts and following Passage in Cell Culture

ABSTRACT The strain diversity of a rubulavirus, parainfluenza virus 5 (PIV5), was investigated by comparing 11 newly determined and 6 previously published genome sequences. These sequences represent 15 PIV5 strains, of which 6 were isolated from humans, 1 was from monkeys, 2 were from pigs, and 6 were from dogs. Strain diversity is remarkably low, regardless of host, year of isolation, or geographical origin; a total of 7.8% of nucleotides are variable, and the average pairwise difference between strains is 2.1%. Variation is distributed unevenly across the PIV5 genome, but no convincing evidence of selection for antibody-mediated evasion in hemagglutinin-neuraminidase was found. The finding that some canine and porcine, but not primate, strains are mutated in the SH gene, and do not produce SH, raised the possibility that dogs (or pigs) may not be the natural host of PIV5. The genetic stability of PIV5 was also demonstrated during serial passage of one strain (W3) in Vero cells at a high multiplicity of infection, under conditions of competition with large proportions of defective interfering genomes. A similar observation was made for a strain W3 mutant (PIV5VΔC) lacking V gene function, in which the dominant changes were related to pseudoreversion in this gene. The mutations detected in PIV5VΔC during pseudoreversion, and also those characterizing the SH gene in canine and porcine strains, predominantly involved U-to-C transitions. This suggests an important role for biased hypermutation via an adenosine deaminase, RNA-specific (ADAR)-like activity.
IMPORTANCE Here we report the sequence variation of 16 different isolates of parainfluenza virus 5 (PIV5) that were isolated from a number of species, including humans, monkeys, dogs, and pigs, over 4 decades. Surprisingly, strain diversity was remarkably low, regardless of host, year of isolation, or geographical origin. Variation was distributed unevenly across the PIV5 genome, but no convincing evidence of immune or host selection was found. This overall genome stability of PIV5 was also observed when the virus was grown in the laboratory, and the genome stayed remarkably constant even during the selection of virus mutants. Some of the canine isolates had lost their ability to encode one of the viral proteins, termed SH, suggesting that although PIV5 commonly infects dogs, dogs may not be the natural host for PIV5.

A virus that was first isolated almost 6 decades ago from rhesus and crab-eating (cynomolgus) macaque kidney cells was originally named simian virus 5 (SV5) because it was believed that monkeys were its natural host (1,2). However, wild monkeys do not have antibodies against SV5 and appear to be infected in captivity after contact with humans (3–5), who can be infected naturally (4,6). Subsequently, it was shown that SV5 also causes kennel cough in dogs, and as a consequence, it is often referred to in veterinary circles as canine parainfluenza virus. In addition, the virus has been isolated from pigs, and there is some evidence that cats, hamsters, and guinea pigs can be infected (4,7,8). Because SV5 has been isolated from numerous species, and its natural host remains unidentified, it is now named parainfluenza virus 5 (PIV5).
Two issues have complicated studies on defining the host range and prevalence of PIV5. First, the virus can appear as a contaminant of tissue culture cells, raising the possibility that some strains may have been isolated accidentally. However, limited studies of sequence diversity among strains suggest that this is unlikely to have occurred frequently, if at all (7). Second, antigenic cross-reactivity occurs between PIV5 and human parainfluenza virus 2 (PIV2) (9,10). This led to an early suggestion that PIV5 (or SV5, as it was called at the time) should be classified as PIV2 of monkeys (11). However, sequencing studies confirmed that PIV5 and PIV2 belong to distinct species (Parainfluenza virus 5 and Human parainfluenza virus 2, respectively; genus Rubulavirus, subfamily Paramyxovirinae, family Paramyxoviridae, order Mononegavirales) (12). Thus, for example, PIV5 hemagglutinin-neuraminidase (HN) shares only 43% amino acid sequence identity with its PIV2 ortholog (13).
PIV5 has a nonsegmented, negative-sense RNA genome of 15,246 nucleotides (nt). The genome contains seven genes, which encode eight proteins and are flanked by 3′-leader and 5′-trailer sequences at the genome ends. From the 3′ end, the genome encodes the nucleocapsid protein (N), V protein (V), phosphoprotein (P), matrix protein (M), fusion protein (F), small hydrophobic protein (SH), HN, and large protein (L) or RNA polymerase (8,12). V and P are unusual in being encoded by a single gene (V/P) and sharing the same initiation codon and 5′-coding region. However, they differ in their 3′-coding regions, with the V mRNA being a faithful copy of the genomic sequence and the P mRNA being a frameshifted version generated during transcription by the pseudotemplated addition of 2 extra G residues in a G tract. The outcome is that the V and P proteins share an N-terminal domain of 164 amino acid residues but differ in their C-terminal domains (58 and 228 residues, respectively).
Complete PIV5 genome sequences have been deposited in public databases for the simian strain W3 (also called W3A), the human strain cryptovirus, and, recently, three canine strains and one porcine strain (Table 1). We have assessed PIV5 diversity by resequencing strain W3 and also determining the sequences of five human strains, three canine strains, and a porcine strain. In addition, we have investigated the stability of the PIV5 genome during passage in cell culture at a high multiplicity of infection (MOI) of strain W3 and a W3-derived mutant lacking a functional V gene. We found that the PIV5 genome exhibits low diversity both in vivo and in vitro and also that SH is not essential for infection of dogs or pigs.
Preparation and sequencing of PIV5 RNA. Viruses were grown in Vero cell monolayers at 37°C in Dulbecco's modified Eagle's medium supplemented with 10% fetal bovine serum (17). Cells grown in roller bottles were infected with PIV5 at an MOI of 10 PFU/cell. At 2 days postinfection, nucleocapsids in the form of ribonucleoproteins were purified from infected cells by a series of centrifugation steps, and genomic RNA was isolated, as described previously (17).
Viral RNAs were reverse transcribed, and the DNA was sequenced by standard approaches, using an Illumina GAIIx instrument. Reads with a length of 73 nt were aligned initially to the published strain W3 genome sequence (GenBank accession number AF052755) by using Maq (18), and the assembly was viewed by using Tablet (19). The reads for each strain were then aligned to the appropriate, derived consensus sequence, and the final assemblies were confirmed by inspection. Details of the read data and the GenBank accession numbers of the sequenced genomes are listed in Table 1.
Data on single-nucleotide polymorphisms (SNPs) within each genome sequence were extracted by using Maq, assessed to ensure their representation on both strands of the DNA sequence, and checked by viewing the alignments. The percentage of genomes containing each SNP was calculated by using a script (read_cleaner; D. Gatherer, unpublished data) to count the numbers of reads containing the variant nucleotide flanked on each side by 10 nt (20). In a few instances, adjustments were made to the sequence or positioning of this 21-nt region in order to cater for the location of SNPs close to each other or near the 3′ end of the genome.
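The counting step described above can be illustrated with a minimal Python sketch. It is not the unpublished read_cleaner script; the function name `snp_frequency` and its parameters are hypothetical, and the sketch assumes that all reads are already in the same orientation as the consensus (reverse-complement reads are ignored).

```python
def snp_frequency(reads, ref_window, variant_base, flank=10):
    """Fraction of informative reads that carry the variant base.

    reads        -- iterable of read sequences (same sense as the consensus)
    ref_window   -- consensus sequence of length 2 * flank + 1 centred on the SNP
    variant_base -- the minor-variant nucleotide at the central position
    """
    if len(ref_window) != 2 * flank + 1:
        raise ValueError("window must be 2 * flank + 1 nt long")
    reads = list(reads)  # allow generators, since the reads are scanned twice
    # Build the variant window by swapping only the central nucleotide.
    var_window = ref_window[:flank] + variant_base + ref_window[flank + 1:]
    n_ref = sum(ref_window in read for read in reads)
    n_var = sum(var_window in read for read in reads)
    total = n_ref + n_var
    # Reads that do not span the whole 21-nt window are not counted at all.
    return n_var / total if total else float("nan")
```

Requiring an exact match to the full 21-nt window means that a read with a sequencing error elsewhere in the window is discarded rather than miscounted, which is one plausible reason for using flanking sequence in this way.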
Bioinformatic analyses. Sixteen of the sequences listed in Table 1 were aligned by using ClustalW implemented in MEGA 5.2 (23); that of the original version of strain W3 [W3(cl)] (see below) was excluded. A maximum likelihood tree was drawn in MEGA by using a general time-reversible, gamma-distributed (GTR+Γ) nucleotide substitution model. Nucleotide substitution was quantified in DnaSP (24) by using the Jukes-Cantor substitution model over a sliding window of 500 nt. Codeml, as implemented in the PAML suite (25), was used to identify residues that have potentially evolved under positive selection and to calculate the ratio of nucleotide transitions to transversions (κ). Sequence distance matrices were calculated in MEGA by using the maximum compositional likelihood model with a gamma rate variation distribution.
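As an illustration of the sliding-window calculation, the following Python sketch applies the Jukes-Cantor correction, d = −(3/4) ln(1 − 4p/3), to the proportion p of differing sites in each window. It is a simplified pairwise version written for this purpose, not a reimplementation of DnaSP (which averages over all sequences in an alignment); the function names and the 100-nt step size are assumptions.

```python
import math

def jukes_cantor(p):
    """Convert a proportion of differing sites into a Jukes-Cantor distance."""
    if p >= 0.75:
        return float("inf")  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def sliding_jc(seq_a, seq_b, window=500, step=100):
    """Return (window_start, distance) pairs along two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(seq_a) - window + 1, step):
        # Skip alignment columns that contain a gap in either sequence.
        pairs = [(x, y)
                 for x, y in zip(seq_a[start:start + window],
                                 seq_b[start:start + window])
                 if x != "-" and y != "-"]
        if not pairs:
            continue
        p = sum(x != y for x, y in pairs) / len(pairs)
        results.append((start, jukes_cantor(p)))
    return results
```

At the low divergence reported here the correction is small: an uncorrected difference of 2.1% corresponds to a Jukes-Cantor distance only slightly above 0.021.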
Immunological detection of PIV5 SH. Vero cell monolayers in 25-cm² plates were infected with strain W3, CPI+, CPI−, or SER at an MOI of 10 PFU/cell. Cell lysates were prepared at 16 h postinfection, and proteins were separated by SDS-PAGE and transferred onto nitrocellulose membranes. Membranes were probed by using a polyclonal peptide antiserum raised against the N-terminal 16 residues of strain W3 SH.
Strain W3 was passaged serially in Vero cells at a high MOI, as described previously (17). Briefly, Vero cell monolayers in 75-cm² flasks were infected at an MOI of 5 PFU/cell with a low-passage-number stock of strain W3 (W3vM0). The culture medium was harvested every 2 to 3 days; half was used for analysis, and the other half was used to infect another flask of cells. Twelve passages of strain W3 (W3vM1 to W3vM12) were performed. To monitor pseudoreversion of PIV5VΔC (originally called rSV5VΔC [27]), the virus was passaged 6 times in Vero cells from a working stock to generate PIV5VΔCp6.
RNA was isolated from purified nucleocapsids from the later strain W3 passages (W3vM8 to W3vM12), and sequence data were derived as described previously (17). Reads with a length of 73 nt were aligned to the W3vM0 sequence (GenBank accession number JQ743318) (Table 1), and the alignments were visualized by using Tablet. Similarly, reads with a length of 76 nt obtained from PIV5VΔCp6 RNA were aligned to the PIV5VΔC genome sequence. We have utilized the sequence data for W3vM0 and W3vM12 to characterize the DI populations in these virus stocks (17). SNPs were identified, and the proportions of variant nucleotides were determined, as described above. SNPs in a small region of the V open reading frame (ORF), in which pseudoreversion occurred, were investigated by identifying reads containing the flanking sequences and counting the numbers containing various combinations of variants in the intervening sequence.
Nucleotide sequence accession numbers. The sequences determined in this study were deposited in the GenBank database under accession numbers JQ743318 to JQ743328.

Sequence relationships among PIV5 strains.
We generated a total of 11 new PIV5 genome sequences by deep sequencing of genomic RNA isolated from purified nucleocapsids derived from infected cells (Table 1). The consensus sequences of the newly analyzed PIV5 strains were compared with each other and with the six PIV5 sequences available from GenBank (Table 1). The newly derived sequence for strain W3 was generated from an uncloned virus stock that we originally obtained from Robert Lamb (Northwestern University, USA). It differs from the sequence deposited previously in GenBank (21), which was derived from cloned cDNA and which we refer to as W3(cl), by four synonymous substitutions (nt 505, 5882, 8164, and 9844) and two nonsynonymous substitutions (nt 4210 in M, resulting in G rather than V at residue 357, and nt 4599 in F, resulting in T rather than A at residue 24). From comparisons with the other strains, it was apparent that the differences at nt 505, 4210, 5882, 8164, and 9844 are unique to the strain W3(cl) sequence. The difference at nt 4599 is unique to the newly derived W3 sequence and is also represented as an SNP, with the minor variant being identical to the W3(cl) sequence. From this point in the study, we used only the new consensus sequence for strain W3.
The length of the alignment of the 16 genome sequences was 15,252 nt, of which 1,192 positions (7.82%, including a 6-nt insertion in some strains) were variable. Among the variable positions, 842 positions represent differences in >1 sequence and are phylogenetically informative. Consistent with the level of positional variability over the entire alignment, the average pairwise difference is 2.1%, and no two sequences are >5% divergent (Table 2). The most divergent pair are the human strain LN and the canine strain D277.
A phylogenetic tree based on an alignment of the 16 genome sequences is shown in Fig. 1. The primate strains cluster together with 100% confidence, with the human cryptovirus sequence being the outlier in this group. The two porcine strains SER and KNU-11, which were isolated in Germany and South Korea, respectively (Table 1), are closely related to each other, with 100% confidence, and cluster with 98% confidence with the canine viruses CPI+ and CPI−, which were isolated in Germany (Table 1). Strain CPI− is formally a variant of strain CPI+, which was isolated from the brain of a dog with temporary posterior paralysis and caused acute encephalitis when injected intracranially into gnotobiotic dogs (16). The variant was isolated by those authors from one such experimentally infected dog at 12 days postinfection. Unlike strain CPI+, variant strain CPI− fails to block interferon (IFN) signaling because of 3 amino acid substitutions in V, leading to a loss of the ability to target STAT1 for proteasome-mediated degradation (28). The greatest genetic diversity is apparent among the canine strains, which were isolated in Germany, South Korea, and the United Kingdom (Fig. 1 and Table 1). However, these strains cluster separately from the primate strains. Registering the caveat that confidence is <70% at the deepest nodes, the longest path is between the cluster of human strains and two of the South Korean canine strains (08-1990 and D277). In interpreting the data on sequence relationships, we cannot rule out the possibility that a degree of convergence may have occurred during propagation of the strains in Vero cells, which are unable to produce IFN. However, even if this were the case, it seems unlikely that it was significant, as host-dependent phylogenetic clustering of isolates was observed.
Sequence diversity among PIV5 strains. In addition to illuminating the relationships among PIV5 strains, the sequences also provide information on the extent of variation along the genome, thus potentially yielding insights into regions in which this factor may relate to function.
Diversity was analyzed in terms of nucleotide substitutions per site, expressed as π (29), and was generally highest at gene junctions. In particular, three regions of relatively high variation were identified as peaks (Fig. 2). The first centers on nt 1700 and represents the C-terminal region of N (approximately residues 416 to 509), the 3′ untranslated region (UTR) of the N mRNA, the 5′ UTR of the V/P mRNA, and the N-terminal region of V and P (residues 1 to 76). The second centers on nt 4400 and represents the C terminus of M (residues 366 to 377), the 3′ UTR of the M mRNA, the 5′ UTR of the F mRNA, and the N terminus of F (residues 1 to 24). The third centers on nt 6300 and represents the 3′ UTR of the F mRNA, the SH gene (examined in more detail below), and the 5′ UTR of the HN mRNA.
A region of relatively moderate variation centers on nt 8300 and maps to the C-terminal region of HN (residues 524 to 565) and the 3′ UTR of the HN mRNA (Fig. 2). Since, owing to immune selection, variation might be expected in the viral glycoproteins, the amino acid substitutions in HN were scrutinized. Unique residues are apparent in certain groups of strains, such as most or all of the primate strains (L22, S49, G57, A254, S318, T460, and T536), the primate and porcine strains (D447), and the porcine strains (N120, T288, and S524). However, the level of variation observed overall is modest and not notably greater than that in other PIV5 proteins. Thus, although the substitutions observed might reflect a degree of selection for antibody-mediated evasion, the evidence is not compelling. A similar conclusion was drawn previously for F (7), and our more extensive analysis did not reveal any additional insights, with the exception that the E132K change, which was observed previously only in nonprimate strains, is not present in canine strain 1168-1. Also, strain W3 has seven substitutions absent from HN in other strains (A10, S114, M148, F209, G312, R368, and T491), which may be the result of extensive laboratory adaptation or indicative of a simian origin. A second simian isolate is required to distinguish between these two possibilities.
Three areas of relatively low variation are also evident (Fig. 2). The first centers on nt 2600 and corresponds to the C-terminal regions of P and V (residues 165 to 351 and 165 to 222, respectively). This part of the genome contains the overlapping protein-coding regions of the V/P gene and the part of the P gene encoding residues that interact with L (30). In the part of the genome that encodes L, two regions are particularly well conserved. These regions encompass approximately nt 10000 to 11200, which encode the region (amino acid residues 716 to 931) responsible for RNA-dependent RNA polymerase activity (31), and nt 12800 to 13500 (residues 1463 to 1696), which are of undetermined function.
All of the variation among PIV5 strains occurs as substitutions, except for a 6-nt insertion in the region between the SH- and HN-coding regions in strains LN and RQ, which are closely related to each other (Fig. 1 and Table 2). In most strains, the SH transcriptional termination site is located after nt 6502 to 6515, which (in the antigenome or mRNA sense) has the sequence 5′-UUUUAAAGAAAAAA-3′, followed by an intergenic U residue. Strains LN and RQ have an extended poly(A) site in this region (5′-UUUUAAAGAAAAAGAAAAAA-3′, which is 6 nt longer). The functional significance of this additional sequence is not known. However, the size of the insertion ensures that the strain LN and RQ genomes, like those of the other strains, adhere to the rule of six, whereby the genome sizes of certain paramyxoviruses occur in multiples of 6 nt (32).
In order to analyze the mode of PIV5 evolution, pairwise comparisons of the ratio of nonsynonymous to synonymous nucleotide substitutions (ω) were made (Table 2). A value of 0 indicates that nonsynonymous substitutions have been completely suppressed, a value of 1 indicates that neutral accumulation of substitutions has occurred, and values of >1 indicate that positive selection may have taken place. Except in a few instances in which sequences are very closely related and ω values are statistically unsound as a result (data in boldface type in Table 2), ω values for PIV5 strains isolated from various hosts are well below 1 (averaging 0.19), indicating that moderate to strong constraint has been the prevailing selective mode. Values of ω for individual ORFs (Table 3) range from those indicating strong selective constraint in L to those indicating near neutrality in SH. An analysis of positive selection of specific amino acid residues revealed none at a significance level of P < 0.01, and three candidate residues (residues 447 in HN, 60 in M, and 31 in SH) were detected at a lower significance level (P < 0.05). However, we consider that the evidence for positive selection of specific PIV5 proteins, or residues therein, is not compelling.
The ratio of nucleotide transitions to transversions (κ) for individual ORFs is shown in Table 3. The value for the combined protein-coding sequences is 8.17, which is somewhat higher than that found thus far for other viruses in the subfamily Paramyxovirinae, such as measles virus, for which the value is 5.1 (33,34). This overall value may have been enhanced by the high value for SH, for which special considerations may apply (as discussed below). The significance of the variation in κ for the other genes is not known.
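For readers unfamiliar with the distinction, the following Python sketch counts transitions (purine to purine or pyrimidine to pyrimidine) and transversions (purine to pyrimidine or the reverse) between two aligned sequences. It is only a raw count: the values reported in Table 3 were estimated by codeml as a rate-ratio parameter under a substitution model, which this sketch does not attempt to reproduce. The function name is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}

def count_ts_tv(seq_a, seq_b):
    """Return (transitions, transversions) between two aligned RNA/DNA strings."""
    ts = tv = 0
    # Treat T and U as the same base so that DNA and RNA input both work.
    a = seq_a.upper().replace("T", "U")
    b = seq_b.upper().replace("T", "U")
    for x, y in zip(a, b):
        if x == y:
            continue
        if not ({x, y} <= PURINES | PYRIMIDINES):
            continue  # ignore gaps and ambiguity codes
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1
        else:
            tv += 1
    return ts, tv
```

A U-to-C change, the substitution type emphasized for the SH gene, is counted here as a transition, which is why an excess of such changes raises the overall ratio.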
As well as providing a view of diversity among the consensus PIV5 sequences, the sequence data also yielded information on variation within each of the 11 newly sequenced strains, detectable as SNPs, which were recorded when the minor variant was present in >2% of the sequence population (Table 4). The total number was 111, and the number in each genome ranged from 0 in strain MEL to 19 in strain DEN. With the exception of one SNP, it was possible to infer from the genome sequence alignment which of the two variant nucleotides at each SNP most likely corresponds to the original and which corresponds to the mutation. Most minor variants (93/110; 85%) were inferred as representing mutations, and the proportions of instances in which this is not the case varied from genome to genome (e.g., they were more common in strain MIL than in strain DEN). Among the SNPs located in the ORFs, most (73/103; 71%) are nonsynonymous, perhaps indicating that the cognate variants may have been selected to detectable levels because they conferred a growth advantage. Only a single instance of an SNP shared among strains was discovered (nt 10755 in strains DEN and H221). It is notable that many of the SNPs in strain MIL map in the region at nt 1326 to 1950, which corresponds to the first peak of variation among strains (Fig. 2). There appears to be an underrepresentation of SNPs in the M ORF (observed/expected = 0.38; P = 0.05) and an overrepresentation in the HN ORF (observed/expected = 1.62; P = 0.02). The other ORFs and the nontranslated regions are represented as expected from a random distribution of SNPs.
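The observed/expected ratios quoted above can be understood with a short Python sketch, under the assumption (not stated explicitly in the text) that the expected number of SNPs in a region is proportional to its share of the genome length. The function name and the example numbers in the usage note are hypothetical, and the statistical test behind the reported P values is not reproduced here.

```python
def observed_over_expected(observed, region_length, total_snps, genome_length):
    """Ratio of observed SNPs in a region to the number expected by length.

    Assumes SNPs fall uniformly at random along the genome, so that the
    expected count is total_snps * region_length / genome_length.
    """
    if region_length <= 0 or genome_length <= 0 or total_snps <= 0:
        raise ValueError("lengths and SNP total must be positive")
    expected = total_snps * region_length / genome_length
    return observed / expected
```

For example, a region making up one-tenth of a genome that carries 100 SNPs in total would be expected to contain 10 of them; observing 5 gives a ratio of 0.5, indicating underrepresentation.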
Functional analysis of the PIV5 SH gene. As described above, one of the regions of relatively high variation maps to the SH gene (Fig. 2). This is due largely to a high level of U-to-C substitutions in the genomes of most nonprimate strains relative to those of primate strains (Fig. 3a). In this region, the primate strains may be placed into three groups, since the human strains MEL, MIL, DEN, LN, and RQ are identical. The nonprimate strains fall into six groups, since the porcine strains SER and KNU-11 are identical, as are the canine strain CPI+ and its variant CPI− and the canine strains 78524 and H221. The canine strains 08-1990 and D277 differ at a single nucleotide. Relative to strain MEL/MIL/DEN/LN/RQ, various numbers of the 60 U residues in this region are replaced by C residues in the other strains, as follows: 1 in W3, 3 in cryptovirus, 23 in SER/KNU-11, 20 in CPI+/CPI−, 2 in 1168-1, 3 in 78524/H221, and 21 in 08-1990/D277.
One of the U-to-C transitions in strains CPI+/CPI− and SER/KNU-11 is located in the SH AUG initiation codon, resulting in ACG. Moreover, the UAA termination codon is also mutated, appearing as CAG in strain CPI+/CPI− and as CAA in strain SER/KNU-11. The hypothesis that SH is not expressed by these strains, as was adduced in relation to strain KNU-11 (22), was tested by immunoblotting of lysates of Vero cells infected with strain W3, strain CPI+, variant strain CPI−, or strain SER using a polyclonal peptide antiserum raised against the N-terminal 16 residues of strain W3 SH. The sequence of this region is well conserved among strains (Fig. 3a), with the peptides in strains SER and CPI+/CPI− differing from those in strain W3 by 0 and 1 residues, respectively (not including the first residue). The results indicate that strain W3 expressed SH, whereas strain CPI+, variant strain CPI−, and strain SER did not express an SH-related protein of a similar size (Fig. 3b) or larger (data not shown).
Sequence diversity of PIV5 passaged in cell culture. To determine whether the relative stability of the PIV5 genome observed among strains is also a feature of the virus passaged in vitro, the strain W3 genome was examined during sequential passaging at a high MOI in Vero cells. We generated this passage series previously, in order to analyze the production of PIV5 DI genomes (mostly of the trailer copyback variety), which are powerful inducers of the interferon response (17). Sequence data were obtained for the input virus (W3vM0; the same data were used to obtain the new sequence of strain W3) (Table 1) and passages 8 to 12 in the series (W3vM8 to W3vM12). The major DI genome in W3vM12 accounted for 94% of all trailer copyback genomes (the major variety of DI genome) and was identified previously as resulting from the joining of nt 15062 (on the 5′-to-3′ strand) to nt 14496 (on the 3′-to-5′ strand) (17). The proportion of non-DI genomes in the population was estimated by aligning the reads separately to the nondefective and defective regions of the genome and deriving average read coverage values. This analysis indicated that W3vM8 to W3vM12 consisted of approximately 89, 86, 89, 94, and 98% DI genomes, respectively. For this exercise, W3vM0 was assumed essentially to lack DI genomes, and indeed, no reads representing the novel junction in the major DI genome were detected.
A total of 11 SNPs reached an abundance of >5% in at least one of the passages monitored (Table 5). Replacement of the U residue at nt 14 by C was linked to replacement of the C residue at nt 25 by U. These SNPs are located in the promoter at the 3′ end of the genome, and strikingly, the emerging variants dominated the viral population in W3vM12. Some of the other SNPs present in W3vM0 increased in abundance (at nt 2482, 10442, 11967, and 13261), and some decreased (at nt 4599, 5807, 5879, 7702, and 14927; the last was present in the major DI genome, and it was not possible to ascertain whether it was also present in the non-DI genome population). These results show that, even over 12 passages in vitro and under conditions where nondefective genomes were subjected to competition from an overwhelming proportion of DI genomes, the nondefective sequences exhibited little diversity.
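The coverage-based estimate of the DI proportion can be sketched in Python as follows. The exact calculation is described elsewhere (17) and is not reproduced in this article, so the sketch rests on a simplifying assumption: the region retained in DI genomes is covered by reads from both full-length and DI genomes, whereas the remainder of the genome is covered by full-length genomes only. It also ignores the fact that copyback genomes carry part of the trailer region in both orientations. The function name and the coverage values in the usage note are hypothetical.

```python
def di_fraction(cov_defective_region, cov_nondefective_region):
    """Estimate the fraction of genomes that are defective interfering (DI).

    cov_defective_region    -- mean read depth over the region present in DI genomes
    cov_nondefective_region -- mean read depth over the region absent from DI genomes
    """
    if cov_defective_region <= 0:
        raise ValueError("coverage of the defective region must be positive")
    # Depth outside the DI region reflects full-length genomes only.
    full_length = min(cov_nondefective_region / cov_defective_region, 1.0)
    return 1.0 - full_length
```

With hypothetical mean depths of 1,000 over the DI region and 20 elsewhere, the estimate is 98% DI genomes, which is the order of magnitude reported for W3vM12.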
In a further investigation of the stability of PIV5 in cell culture, we utilized PIV5VΔC, which is a mutant derived from a molecular clone of strain W3. In this mutant, two translational termination codons have been introduced into the V ORF via substitutions of 3 nt that do not affect the amino acid sequence of P but result in expression of a truncated version of V lacking the C-terminal region (27) (Fig. 4a and b). Passaging of this virus in various cell lines, including Vero, has been shown to result in rapid mutation to a derivative that expresses full-length V, via substitutions in both of the introduced termination codons (27). To monitor this process of pseudoreversion in greater detail, the virus was passaged 6 times in Vero cells to generate a stock named PIV5VΔCp6. Seven SNPs were detected in PIV5VΔCp6. Three were located outside the V ORF (Table 4), and four were located in the V ORF (nt 2357, 2362, 2363, and 2372). The latter SNPs were U-to-C substitutions (sub1, sub2, sub3, and sub4) and are listed in Fig. 4c. Two (sub1 and sub3) are the same as those reported previously (27). The original sequence plus all arrangements of the four substitutions may result, in principle, in 16 different sequence constellations. However, only five (PIV5VΔC, sub13, sub134, sub123, and sub1234) were detected in >0.2% of PIV5VΔCp6 genomes, together summing to 99.58% (Fig. 4d to h). The two changes (sub1 and sub3) in the termination codons resulted in full-length V containing an amino acid substitution (F170Q), and a third change (sub4) added a second amino acid substitution in V (Y175H), to generate the major variant (sub134) (Fig. 4f). In addition, a substitution in P was detected at lower levels (sub2, as represented in sub123 and sub1234) (Fig. 4g and h). This experiment again demonstrated the relative stability of the PIV5 genome during passage at a high MOI and also showed that adaptation can nonetheless occur rapidly when a selection pressure is exerted. The two substitutions (sub2 and sub4) not associated with pseudoreversion of the termination codons were not detected in a previous experiment (27), and in our experiment, it is possible that they were not selected in their own right but were subsidiary to sub1 and sub3.
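The figure of 16 possible constellations follows from each of the four sites being either original or substituted (2⁴ = 16). The Python sketch below enumerates them by using the naming scheme of the text (e.g., sub134) and tallies reads that have already been scored at each site. The functions and the boolean input format are assumptions made for illustration; they do not represent the script used in the study.

```python
from collections import Counter
from itertools import combinations

ORIGINAL = "PIV5VΔC"  # name used in the text for the unsubstituted sequence

def constellation_names(n_sites=4):
    """List every combination of substituted sites, from none to all."""
    names = []
    for k in range(n_sites + 1):
        for combo in combinations(range(1, n_sites + 1), k):
            names.append("sub" + "".join(str(i) for i in combo)
                         if combo else ORIGINAL)
    return names

def classify(flags):
    """Name one read from booleans marking which sites carry the substitution."""
    hits = [str(i) for i, flag in enumerate(flags, start=1) if flag]
    return "sub" + "".join(hits) if hits else ORIGINAL

def tally(read_flags):
    """Count how many reads fall into each constellation."""
    return Counter(classify(flags) for flags in read_flags)
```

Dividing each count returned by `tally` by the total number of reads spanning all four sites would give the kind of percentage reported for each constellation.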

DISCUSSION
We generated 11 genome sequences for 10 PIV5 strains and a variant of one of these strains (Table 1). The incorporation of published genome sequences for other strains facilitated the phylogenetic analysis of a total of 15 strains and the variant. In the tree (Fig. 1), the primate strains cluster together, as do the porcine strains, but there is much greater diversity among the canine strains. Nonetheless, there is remarkable sequence conservation overall, even though the strains originated from a variety of hosts and were isolated from a wide geographical area during a period of several decades (Tables 2 to 4). Surprisingly, the level of variation in HN and F, which are targets for antibody-mediated virus neutralization, was not significantly greater than that observed in most other viral proteins. Amino acid residues 342, 437, and 457 in HN have been linked to promoting resistance to neutralizing antibodies (35). However, with the exception of a Q342K substitution in two of the South Korean canine isolates, variation in these, or intervening, residues was not observed. Overall, these results suggest that antibody-mediated selection has not played a major role in the evolution of the PIV5 strains analyzed, perhaps because the virus is not particularly immunogenic or because cell-mediated immunity is more important in controlling PIV5 infections.
The RNA polymerases of paramyxoviruses, like those of most other RNA viruses, lack proofreading mechanisms and therefore have high error rates, resulting in the continual production of large numbers of mutants during replication. Consequently, when a selection pressure is sufficiently strong, for example, in the case of PIV5VΔC, which lacks a functional V ORF, mutations are generated with facility. In this context, it is striking that the PIV5 genome, as represented by the range of strains studied, exhibits so little diversity.
Indeed, even the level of synonymous mutations among strains is impressively low and similar to what has been reported for another rubulavirus, mumps virus (36,37), and a morbillivirus, measles virus (38). The low level of variation in PIV5 is also similar to that observed in human PIV3 isolates. However, the level of variation observed between human and bovine PIV3 isolates is about six times greater than that observed between PIV5 strains isolated from different species. The reasons for the low levels of diversity in paramyxoviruses are not understood. Constraints potentially operate at several levels, affecting recognition by innate immune responses or the level and rate of protein translation, including biases against specific dinucleotides (39), biases in codon usage (recently shown to be important in HIV-1 replication) (40), and codon pair context (41,42). Codon usage appears to be able to exert an unexpectedly large effect on translational efficiency in some microbial organisms (43) and also appears to be virus specific, rather than host specific, in the subfamily Paramyxovirinae (B. K. Rima, unpublished data). Since paramyxovirus genomes are always encapsidated, secondary structures in the genomes or antigenomes are unlikely to provide a significant constraint on synonymous mutations. However, mRNA secondary structure can influence the rate of protein translation and thus affect indirectly the proper folding of nascent proteins (44). Finally, it must be registered that any RNA virus enters the RNA world of the host cell and thus must avoid the formation of double-stranded RNA (dsRNA) structures with complementary cellular RNAs, including microRNAs (miRNAs).
The rule of six (32) appears to be an important factor in the fitness of members of the subfamily Paramyxovirinae. Its relevance is demonstrated in our data by strains RQ and LN, which have an insertion of 6 nt at the 3′ end of the SH gene. Since the SH gene of PIV5 is dispensable for growth in cell culture (21), the effect of this insertion cannot be evaluated easily. Toleration of a certain level of diversity at the 3′ end of rubulavirus mRNAs may be seen in a variation in the mumps virus F gene, which led to the insertion of oligo(G) tracts immediately prior to the poly(A) tail without affecting viral growth in vivo (37).
The rule of six has been interpreted as indicating that all nucleotides in a genome must be attached to N molecules in groups of six (32). This, in turn, has established the concept of "phase," that is, the position of a nucleotide within a specific group of six. The phases (phases 1 to 6) of the 5′ nucleotides of each of the mRNAs in all the PIV5 genomes show strong overall conservation, similar to other paramyxoviruses (45). If phase is an important feature of rubulaviruses, it might operate as a constraint on the occurrence of synonymous mutations. However, most of the variation in PIV5 strains is clustered in three regions between (i) the ORFs of the N and V/P proteins, (ii) the M and F ORFs, and (iii) the F and HN ORFs, including the entire SH gene (Fig. 2). The implication that constraint has operated primarily on protein-coding sequences is consistent with the observation that following the loss of protein function, many synonymous and (potentially) nonsynonymous mutations have accumulated, as observed for the SH gene in strains CPI+/CPI− and SER.
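The arithmetic behind the rule of six and the notion of phase can be illustrated with a minimal sketch. It is not part of the analysis reported here. It assumes that positions are 1-based and counted from the end of the genome at which encapsidation is taken to begin; the function names and the positions used in the usage note are hypothetical.

```python
HEXAMER = 6  # nucleotides assumed to be bound by each N molecule


def obeys_rule_of_six(genome_length: int) -> bool:
    """Return True if the genome length is an exact multiple of six."""
    return genome_length % HEXAMER == 0


def phase(position: int) -> int:
    """Return the phase (1 to 6) of a 1-based nucleotide position.

    Assumption: numbering starts at the genome end where the first
    group of six nucleotides is bound.
    """
    if position < 1:
        raise ValueError("position must be 1-based")
    return (position - 1) % HEXAMER + 1


def phase_after_insertion(position: int, insert_at: int, insert_len: int) -> int:
    """Return the phase of a nucleotide after an insertion.

    Nucleotides at or beyond `insert_at` are shifted by `insert_len`;
    nucleotides before it keep their original position.
    """
    shifted = position + insert_len if position >= insert_at else position
    return phase(shifted)
```

Under these assumptions, an insertion of 6 nt (as in strains RQ and LN) leaves the phase of every downstream nucleotide unchanged: `phase_after_insertion(100, 50, 6)` equals `phase(100)`, whereas an insertion of, say, 4 nt would shift it.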
Analysis of variation in the PIV5 SH gene indicated a preponderance of U-to-C substitutions in many of the nonprimate strains. As a mechanism, this suggests biased hypermutation due to an adenosine deaminase, RNA-specific 1 (ADAR1)-like activity that deaminates A residues in dsRNA to I residues, which then pair preferentially with C residues in subsequent rounds of replication (46). At some point, mutations in strains CPI+/CPI− and SER/KNU-11 have led to a change in the AUG initiation codon for SH to ACG (Fig. 3). Although ACG can act as an initiation codon in certain circumstances, for example, in expression of the Y proteins in Sendai virus (47), in PIV5 strains CPI+/CPI− and SER, this change, perhaps in combination with others, has led to a lack of expression of SH. The function of SH is unclear, but recent studies have reported that it inhibits tumor necrosis factor alpha (TNF-α)-induced apoptosis (48-50). In certain cell types, such as bovine kidney (MDBK) cells, PIV5 causes little cytopathic effect, but when MDBK cells were infected with recombinant PIV5 lacking the SH gene, an increased cytopathic effect was observed (48). Our results indicate that the SH gene was lost some time ago, as strains CPI+/CPI− and SER were isolated in the 1970s and 1990s, respectively (Table 1), and this absence has been maintained in strain KNU-11, which was isolated in 2011 (22). In regard to strain CPI+/CPI−, it seems very likely, given the genetic stability of PIV5, that the loss of the ability of variant strain CPI− to block IFN signaling was selected in vivo, possibly because IFN-sensitive viruses may be better able to establish prolonged or persistent infections (28).
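The link between A-to-I editing and the observed U-to-C substitutions can be made concrete with a short sketch, which is illustrative only and not the method used in this study. It tallies substitution types between two aligned RNA sequences of equal length and converts a change on one strand into the change seen on the complementary strand; an A-to-I edit is treated as A>G because I pairs preferentially with C. The function names are hypothetical.

```python
from collections import Counter

# Watson-Crick partners for RNA bases
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def substitution_spectrum(reference: str, variant: str) -> Counter:
    """Count substitution types (e.g. 'U>C') between two aligned RNAs.

    Assumption: the sequences are already aligned, gap free, and of
    equal length; positions with non-ACGU symbols are ignored.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to equal length")
    counts = Counter()
    for ref_nt, var_nt in zip(reference.upper(), variant.upper()):
        if ref_nt != var_nt and ref_nt in COMPLEMENT and var_nt in COMPLEMENT:
            counts[f"{ref_nt}>{var_nt}"] += 1
    return counts


def on_complementary_strand(change: str) -> str:
    """Express a substitution as it appears on the complementary strand."""
    ref_nt, var_nt = change.split(">")
    return f"{COMPLEMENT[ref_nt]}>{COMPLEMENT[var_nt]}"
```

For the change in the SH initiation codon described above, `substitution_spectrum("AUG", "ACG")` reports a single `U>C` event, and `on_complementary_strand("A>G")` returns `U>C`, which is the signature expected if editing occurred on the strand complementary to the one being compared.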
Our study also shows that PIV5 exhibits little diversity when passaged in cell culture. Thus, serial passaging of strain W3 at a high MOI, under conditions of competition with large proportions of DI genomes, generated only a modest number of mutations (Table 5). Although it would not be expected that competition with DIs would necessarily lead to the selection of nonsynonymous mutations, it was striking that, despite the fact that the PIV5 genome is not codon optimized, the only synonymous mutations selected were in the region of the leader promoter and that no other synonymous mutations were coselected with these promoter mutations. Similar observations were made for the mutant lacking V function (PIV5VΔC), in which mutations were confined largely to pseudoreversion events in the V ORF (Fig. 4). As has been reported previously (27), it is clear that there is strong selection to remove the termination codons and allow expression of intact V, in which the C-terminal zinc finger motif is the most conserved element in members of the subfamily Paramyxovirinae. Pseudoreversion occurred even though the virus was passaged in Vero cells, which are not able to induce IFN. Thus, the pressure to regain V function cannot be related to blocking of the IFN response and is presumably due to a restoration of some other important function of V, for example, in controlling viral transcription or replication. A number of mutations could potentially have arisen in PIV5VΔC that would have led to the loss of the termination codons, but the only pseudorevertants detected in our study and in a previously reported experiment (27) involved replacement of U by C residues (sub1 and sub3). In our experiment, two other U-to-C transitions also occurred closely adjacent to the termination codons (sub2 and sub4). These findings again suggest the operation of biased hypermutation, although (as discussed above for SH) an intrinsic bias of the RNA polymerase cannot be ruled out.
In conclusion, the sequences of PIV5 strains indicate that the genome exhibits little diversity in vivo and in vitro. However, the identity of the main animal reservoir of PIV5 remains unclear. SH appears to be dispensable for growth in dogs and pigs (although further experimentation will be needed to determine the consequences, if any, of the loss of SH on PIV5 pathogenesis in dogs and pigs), and the absence of SH from some strains isolated from these animals may imply that they are not the natural hosts. Moreover, the fact that PIV5 is not known to cause detectable disease in humans, whereas infections in dogs may be severe, could be interpreted as indicating that humans are the reservoir host from which PIV5 has transferred into dogs and pigs. If so, this might explain why monkeys brought into captivity apparently become infected by PIV5 through contact with humans. To resolve the question of the origins of PIV5, many more strains from various sources need to be analyzed, particularly from geographical locations where dogs are not immunized against PIV5.
Finally, the findings based on the sequence information presented here raise questions that will form the basis of a number of future mechanistic studies, including (i) investigating the role of ADAR and the mechanisms that underlie biased hypermutation in viruses with encapsidated genomes; (ii) determining why there are relatively few synonymous mutations in viruses isolated over decades and from different species, given that the PIV5 genome is not codon optimized; (iii) elucidating what the selection pressures are on PIV5VΔC to make it revert so quickly in the absence of a host cell IFN response; and (iv) determining why antibody-mediated immune selection appears not to be a major driving force in PIV5 evolution.