Broadening the targeting range of Staphylococcus aureus CRISPR-Cas9 by modifying PAM recognition

Utility of the small Staphylococcus aureus Cas9 is improved by broadening its targeting range through molecular evolution of its PAM sequence. CRISPR-Cas9 nucleases target specific DNA sequences using a guide RNA but also require recognition of a protospacer adjacent motif (PAM) by the Cas9 protein. Although longer PAMs can potentially improve the specificity of genome editing, they limit the range of sequences that Cas9 orthologs can target. One potential strategy to relieve this restriction is to relax the PAM recognition specificity of Cas9. Here we used molecular evolution to modify the NNGRRT PAM of Staphylococcus aureus Cas9 (SaCas9). One variant we identified, referred to as KKH SaCas9, showed robust genome editing activities at endogenous human target sites with NNNRRT PAMs, thereby increasing SaCas9 targeting range by two- to fourfold. Using GUIDE-seq, we show that wild-type and KKH SaCas9 induce comparable numbers of off-target effects in human cells. Our strategy for evolving PAM specificity does not require structural information and therefore should be applicable to a wide range of Cas9 orthologs.

Originally discovered as an essential component of the bacterial clustered, regularly interspaced, short palindromic repeat (CRISPR) immune system, the CRISPR-associated protein 9 (Cas9) has become a widely used customizable nuclease for genome editing [1][2][3]. Cas9 cleavage specificity is directed by two short RNAs known as the CRISPR RNA (crRNA) and trans-activating CRISPR RNA (tracrRNA) 4,5, which can be fused into a single guide RNA (sgRNA) 4,6,7. The 5′ end of the sgRNA (derived from the crRNA) can base pair with the target site DNA, thereby permitting straightforward reprogramming of site-specific cleavage by the Cas9:sgRNA complex 4,8. However, Cas9 must also recognize a specific PAM that lies proximal to the DNA that base pairs with the sgRNA 4,[9][10][11][12], a requirement that is needed to initiate sequence-specific recognition 13 but that can also constrain the targeting range of these nucleases for genome editing. The widely used S. pyogenes Cas9 (SpCas9) recognizes a short NGG PAM 4,14, which occurs once in every 8 bps of random DNA. By contrast, other Cas9 orthologs characterized to date can require longer PAMs 12,[15][16][17][18]. For example, S. aureus Cas9 (SaCas9), one of several smaller Cas9 orthologs that are better suited for viral delivery 12,17,18, recognizes a longer NNGRRT PAM, which is expected to occur once in every 32 bps of random DNA. Broadening the targeting range of Cas9 orthologs is important for various applications, including the modification of small genetic elements (e.g., transcription factor binding sites 19,20) or performing allele-specific alterations by positioning sequence differences within the PAM 21.
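The PAM frequencies quoted above (once per 8 bps for NGG, once per 32 bps for NNGRRT) can be reproduced with a short calculation. The sketch below is illustrative only and rests on assumptions that are not stated in the text: bases are taken to be independent and equally likely, and a PAM on either DNA strand counts as an occurrence. The function names are hypothetical.

```python
from fractions import Fraction

# Number of concrete bases (out of A, C, G, T) matched by each IUPAC code
# that appears in the PAMs discussed in the text.
IUPAC_DEGENERACY = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "H": 3, "N": 4}


def pam_probability(pam: str) -> Fraction:
    """Probability that a random position matches `pam` on one strand.

    Assumes independent, equiprobable bases (an idealization of "random DNA").
    """
    probability = Fraction(1)
    for code in pam:
        probability *= Fraction(IUPAC_DEGENERACY[code], 4)
    return probability


def expected_spacing_bp(pam: str) -> Fraction:
    """Expected number of bp per PAM occurrence, counting both strands."""
    return 1 / (2 * pam_probability(pam))


if __name__ == "__main__":
    for pam in ("NGG", "NNGRRT"):
        print(pam, "occurs about once every", expected_spacing_bp(pam), "bp")
```

Under these assumptions the script reports 8 bp for NGG and 32 bp for NNGRRT, matching the figures in the text.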
One potential strategy for improving the targeting range of orthogonal Cas9s that recognize extended PAMs is to alter their PAM recognition specificities. In previous work 22 , we demonstrated the feasibility of changing the PAM specificity of SpCas9 using a combination of structure-guided design and directed evolution performed with a bacterial cell-based selection system. A limitation of this approach is the need to evolve a separate variant for each potential PAM sequence, a challenge that becomes even greater for Cas9 orthologs that specify longer PAMs. An alternative strategy for such orthologs might be to evolve variants that have relaxed or partially relaxed specificities for certain positions within the PAM. The capability to engineer such variants would expand the utility of Cas9 orthologs that specify longer PAMs.
We devised an unbiased genetic approach for engineering Cas9 variants with relaxed PAM recognition specificities that does not require structural information. We tested this strategy using SaCas9, for which no structural data was available at the time we initiated these studies. In an initial step, we sought to conservatively estimate the PAM-interacting domain for SaCas9 by sequence comparisons with the structurally well-characterized SpCas9 [23][24][25] . Although SpCas9 and SaCas9 differ substantially at the primary sequence level ( Fig. 1a and Supplementary Fig. 1), alignment of both with ten additional orthologs enabled us to conservatively define a predicted PAM-interacting domain for SaCas9 (Online Methods and Supplementary Figs. 1 and 2).
Nature Biotechnology, Volume 33, Number 12, December 2015
Because the guanine at the third position in the SaCas9 PAM is the most strictly specified base 17, we randomly mutagenized the predicted PAM-interacting (PI) domain and used our previously described bacterial cell-based method 22 to attempt to select for mutants capable of cleaving sites with each of the three other possible nucleotides at the third PAM position (i.e., NN[A/C/T]RRT PAMs (NNHRRT); Supplementary Fig. 3a). All but one of the surviving variants from the selections against sites containing NNARRT and NNCRRT PAMs harbored an R1015H mutation (Supplementary Fig. 4), whereas we did not obtain any variants from the selections with NNTRRT PAMs. These results strongly suggested that R1015 might participate in recognition of the guanine at the third position of the SaCas9 PAM. Indeed, in our alignments we found that R1015 of SaCas9 is in the vicinity of SpCas9 R1335 (Supplementary Fig. 2), a residue previously implicated in recognition of the third base position of the PAM 22,24. Consistent with this, we found that mutation of R1015 to an alanine or glutamine substantially decreased SaCas9 activity on a target site containing an NNGRRT PAM (Fig. 1b) when tested in our bacterial selection system (Supplementary Fig. 3b). Alanine or glutamine substitutions of other positively charged residues in the vicinity of R1015 did not have as strong an effect on SaCas9 activity (Fig. 1b and Supplementary Fig. 2).
Our bacteria-based selection results also suggested that the R1015H mutation might at least partially relax the specificity of SaCas9 at the third position of the PAM. However, we found that the R1015H single mutant had suboptimal activity in our previously described human cell-based EGFP disruption assay 26 when tested against sites with any nucleotide at the third position of NNNRRT PAMs (Fig. 1c). Because this suggested that additional mutations might be required to increase or optimize the activity of the R1015H mutant in human cells, we randomly mutagenized a region encompassing the predicted PI domain of an SaCas9 variant that also harbored an R1015Q mutation. We then selected for variants from this library that could cleave target sites with each of the three different NNHRRT PAMs using our bacterial selection system. We used R1015Q because, unlike R1015H, this mutant did not show activity in bacteria (Fig. 1b, and data not shown, respectively). Although no surviving clones were again observed when selecting against NNTRRT PAMs, selections with the R1015Q variant against NNARRT or NNCRRT yielded mutations at E782, K929, N968, and a mutation of the Q at 1015 to H (Supplementary Fig. 5).
Combined with the selection results from wild-type SaCas9, the most frequent missense mutations identified across all selections were E782K, K929R, N968K and R1015H (Fig. 1d and Supplementary Figs. 4 and 5), suggesting that a combination of these mutations might permit efficient cleavage of sites that contain an A or C at the third position of the SaCas9 PAM. We therefore tested SaCas9 variants containing different combinations of these mutations using the human cell-based EGFP disruption assay with sgRNAs targeted to sites harboring each of the four bases at the third position of the PAM (i.e., on NNNRRT PAMs) (Fig. 1e and Supplementary Fig. 6). We found that the variants with the triple mutant combinations E782K/N968K/R1015H and E782K/K929R/R1015H were highly active at sites with NNNRRT PAMs (Fig. 1e and Supplementary Fig. 6), whereas the quadruple mutant variant containing all four mutations (E782K/K929R/N968K/R1015H) had generally lower activities on these sites (Supplementary Fig. 6). We chose the E782K/N968K/R1015H variant (hereafter referred to as KKH) for further characterization, and verified, using our human cell-based EGFP disruption assay, that all three substitutions comprising the KKH variant were required for activity (Fig. 1e).
To more comprehensively define the PAM specificities of KKH and wild-type SaCas9, we used our previously described bacterial cell-based site-depletion assay 22 (Supplementary Fig. 7). This method yields Cas9 PAM specificity profiles by identifying the relative cleavage (and therefore depletion in bacterial cells) of DNA plasmids bearing randomized PAMs, quantified as a postselection PAM depletion value (PPDV). We performed site-depletion experiments with both wild-type and KKH SaCas9 using libraries with two different spacer sequences, each with eight randomized bases in place of the PAM (Supplementary Fig. 7). Control experiments using catalytically inactive SaCas9 showed little depletion of any PAM sequence (Supplementary Fig. 8a), enabling us to establish a threshold for statistically significant depletion as a PPDV of 0.794 (Supplementary Fig. 8b). Previous experiments have shown that PAMs with PPDVs of <0.2 in our bacterial site-depletion assay can be efficiently cleaved in our human cell-based EGFP disruption assay 22. With wild-type SaCas9, the most depleted PAMs (based on mean PPDVs obtained from the two libraries) were, as expected, the four NNGRRT PAMs (Fig. 1f and Supplementary Fig. 8c).
Notably, other PAMs with mean PPDVs < 0.1 included those of the form NNGRRN (Supplementary Fig. 8d), suggesting that for some spacer sequences the last position of the PAM may not be fully specified as a T in our bacteria-based assay (although a previous report demonstrated by an in vitro PAM depletion assay, ChIP-seq and targeting of endogenous human sites that a thymine at the sixth position of the PAM was highly preferred 17). By contrast, with the KKH variant, PAMs with mean PPDVs of <0.2 included not only the NNGRRT PAMs but also all four NNARRT, all four NNCRRT, and three of the four NNTRRT PAMs (Fig. 1f and Supplementary Fig. 8c,e). These results suggested that KKH SaCas9 has a broadened PAM targeting range relative to its wild-type counterpart, and perhaps an enhanced preference for thymine at the sixth position of the PAM.
To assess the robustness of the KKH SaCas9 variant in human cells, we tested its activity on 55 different endogenous gene target sites containing a variety of NNNRRT PAMs (Fig. 2a) Fig. 9a). KKH SaCas9 functioned efficiently on spacer lengths of 21-23 nucleotides (Fig. 2d), spacer sequences with variable GC content (Supplementary Fig. 9b) and PAMs with variable GC content (Supplementary Fig. 9c). Sequence logos derived from sites cleaved with low, medium and high efficiencies (0-10%, 10-30% and >30% mean mutagenesis frequencies, respectively) revealed little sequence preference across the entire target site other than at the fourth and fifth positions of the NNNRRT PAM, and perhaps a slight preference for guanine at the second PAM position on sites cleaved with high efficiencies (Supplementary Fig. 9d).
To demonstrate that the KKH variant enables modification of PAMs that cannot be targeted by wild-type SaCas9, we performed direct comparisons of these nucleases in human cells on sites bearing various NNNRRT PAMs. Assessment of 16 sites using our EGFP disruption assay and 16 endogenous human gene targets (Fig. 2e,f, respectively) showed that KKH SaCas9 robustly modified target sites bearing NNNRRT PAMs whereas wild-type SaCas9 efficiently targeted only sites with NNGRRT PAMs. For all 24 sites with NNHRRT PAMs, the KKH variant induced substantially higher rates of mutagenesis than wild-type SaCas9. On the eight sites with NNGRRT PAMs, KKH SaCas9 induced comparable or slightly lower levels of mutagenesis compared with wild type (Fig. 2e,f). These results collectively demonstrate that the KKH variant can cleave sites with NNNRRT PAMs, thereby enabling targeting of sites with NNHRRT PAMs that currently cannot be efficiently altered by wild-type SaCas9 in human cells.
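To make the difference between the two PAM requirements concrete, the following sketch lists candidate target sites in a DNA string for a given degenerate PAM, so that the sites available with NNNRRT can be compared with those available with NNGRRT. It is a minimal illustration, not the procedure used to choose the sites in this study. The function name `find_sites`, the default 21-nt spacer and the coordinate convention are assumptions of the example.

```python
import re

# Regular-expression fragments for the IUPAC codes used in the PAMs above.
IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "H": "[ACT]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, pam: str = "NNNRRT", spacer_len: int = 21):
    """List (strand, start, spacer, pam) tuples for sites followed by `pam`.

    The PAM is taken to lie immediately 3' of the spacer. `start` is the
    0-based spacer start on the scanned strand; for "-" it is relative to
    the reverse-complemented sequence (an arbitrary choice for this sketch).
    """
    pam_pattern = "".join(IUPAC_REGEX[code] for code in pam)
    # A lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})({pam_pattern}))")
    sites = []
    forward = seq.upper()
    for strand, strand_seq in (("+", forward), ("-", reverse_complement(forward))):
        for match in pattern.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites


if __name__ == "__main__":
    example = "A" * 21 + "CCCAGT"  # hypothetical sequence with an NNCRRT PAM
    print("NNNRRT sites:", find_sites(example, "NNNRRT"))
    print("NNGRRT sites:", find_sites(example, "NNGRRT"))
```

In this toy sequence the single site is found with the NNNRRT pattern but not with NNGRRT, because the third PAM base is a C.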
To assess the impact of the KKH mutations on the genome-wide specificity of SaCas9, we used the GUIDE-seq (genome-wide unbiased identification of double-stranded breaks enabled by sequencing) method to directly compare the off-target profiles of wild-type and KKH SaCas9 with the same sgRNAs 27. When tested with sgRNAs targeted to six endogenous human gene sites containing NNGRRT PAMs, we observed that wild-type and KKH SaCas9 induced nearly identical GUIDE-seq tag integration rates and on-target cleavage frequencies for all six sites (Supplementary Fig. 10a,b, respectively). Furthermore, wild-type and KKH SaCas9 induced mutations at similar numbers of off-target sites with each of the six sgRNAs (Fig. 3a,b). Off-target sites for the KKH variant generally adhered to the NNNRRT PAM motif, and off-target sites for wild-type SaCas9 adhered to an NNGRR[T>G] motif (Fig. 3b). With one of the sgRNAs, which induced the highest number of off-target sites among the six sgRNAs tested, we observed a similar number of off-target sites with wild-type and KKH SaCas9. However, the off-target sites were only partially overlapping between wild-type and KKH SaCas9, as might be expected given their different PAM specificities (Fig. 3b,c). Although we would not advocate the use of the KKH variant for targeting sites with NNGRRT PAMs (because wild-type SaCas9 can show higher on-target activities than KKH for these sites), these results suggest that KKH SaCas9 mostly cleaves off-target sites with the expected PAMs and generally induces numbers of off-target sites comparable to those observed with wild-type SaCas9.
To further examine the genome-wide specificity of KKH SaCas9, we tested five additional sgRNAs targeted to sites containing NNHRRT PAMs (Fig. 3d,e). Off-target sites detected by GUIDE-seq were generally low in number (comparable to the numbers observed with wild-type SpCas9 and SpCas9 variants in previously published experiments 22,27), displayed potential DNA- and RNA-bulged off-target sites 28, and contained expected PAMs. Taken together, our experiments demonstrate that the genome-wide specificities of wild-type and KKH SaCas9 are similar and generally show low numbers of off-target cleavage sites in human cells as judged by GUIDE-seq.
Although wild-type SaCas9 remains the optimal choice for targeting NNGRRT PAMs, the KKH SaCas9 variant we describe here can robustly target sites with NNARRT and NNCRRT PAMs and has a reasonable success rate for sites with NNTRRT PAMs. Thus, we conservatively estimate that the KKH variant increases the targeting range of SaCas9 by nearly two- to fourfold in random DNA sequence (Supplementary Note 1), thereby improving the prospects for more broadly utilizing SaCas9 in a variety of different applications that require highly precise targeting. Using GUIDE-seq, we demonstrated that KKH SaCas9 induces similar numbers of off-target mutations as wild-type SaCas9 when targeted to the same sites that contain NNGRRT PAMs. Also, KKH SaCas9 induces only a small number of off-target mutations when targeted to sites bearing NNHRRT PAMs. Although KKH SaCas9 recognizes a modified PAM sequence relative to wild-type SaCas9, our findings are not entirely surprising given that the total combined length of the protospacer and PAM is still long enough with the KKH variant (24-26 bps) to be reasonably orthogonal to the human genome. Furthermore, it is possible that modifying PAM recognition can improve specificity by altering the energetics of Cas9:sgRNA interaction with its target site (similar to the previously proposed mechanisms for improved specificities of truncated sgRNAs 29 or the D1135E SpCas9 mutant 22).
Our results suggested that R1015 in wild-type SaCas9 contacts the G in the third PAM position, a finding confirmed by structures of SaCas9 bound to target DNA sites that were published as we were finalizing this work for submission 30 . We speculate that the R1015H substitution removes this contact and relaxes specificity at the third position; however, loss of the R1015 to G contact could also conceivably reduce the energy associated with target site binding, which may explain why the R1015H mutation alone is not sufficient for robust activity at NNNRRT sites in human cells. Because the E782K and N968K substitutions both add positive charge, it is possible that they may make nonspecific interactions with the DNA phosphate backbone to compensate energetically for the loss of the R1015 to guanine contact. The recently published SaCas9 structure 30 supports this hypothesis, because E782 is in the vicinity of the target strand DNA backbone (near the spacer-PAM junction), and N968 is near the nontarget strand backbone within the PAM.
The genetic approach described here does not require structural information and therefore should be applicable to many other Cas9 orthologs. The only requirement to evolve Cas9 nucleases with broadened PAM specificities is that they function in our bacteria-based selection. Although previous studies demonstrated that PAM recognition can be altered by swapping the PAM-interacting domains of highly related Cas9 orthologues 25 , it remains to be determined whether this strategy is generalizable or effective when using more divergent orthologs. By contrast, the evolution strategies we have described here and in a previous study 22 can, in principle, be used to engineer PAM recognition specificities beyond those encoded within naturally occurring Cas9 orthologs. We envision that our overall strategy can be employed to expand the targeting range and extend the utility of the numerous Cas9 orthologs that exist in nature.

METHODS
Methods and any associated references are available in the online version of the paper.

Plasmids and oligonucleotides. A list and sequences of plasmids used in this study can be found in Supplementary Note 2. Oligonucleotides are listed in Supplementary Table 1 and sgRNA target sites are listed in Supplementary Table 2. A selection of the new plasmids in this study will be deposited with the non-profit plasmid repository Addgene: http://www.addgene.org/crispr-cas (Supplementary Note 2).
Bacterial Cas9/sgRNA expression plasmids were used to express both a human codon optimized version of SaCas9 and the sgRNA, each expressed under a separate T7 promoter. Bacterial expression plasmids used in the selections were derived from BPK2101 (ref. 22) whereas those used in the site-depletion assay were modified to express a sgRNA with a shortened repeat:anti-repeat sequence (see below). All sgRNAs in these bacterial expression plasmids included two guanines at the 5′ end of the spacer sequence for proper expression from the T7 promoter.
To generate libraries of SaCas9 variants, amino acids M657-G1053 of SaCas9 were randomly mutagenized using Mutazyme II (Agilent Technologies) at a frequency of ~5.5 mutations/kilobase. Both wild-type and R1015Q SaCas9 were used as starting templates for mutagenesis, resulting in two libraries with estimated complexities of greater than 6 × 10^6 clones.
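As a rough guide to what this mutagenesis frequency implies per clone, the sketch below converts the M657-G1053 window into nucleotides and multiplies by the stated rate. The Poisson estimate of the unmutated fraction is an added assumption for illustration (it treats mutations as independent and uniformly distributed) and is not a figure reported in the study. The function names are hypothetical.

```python
import math


def mutagenized_length_bp(first_residue: int, last_residue: int) -> int:
    """Length in bp of the coding region spanning the given residues (inclusive)."""
    return (last_residue - first_residue + 1) * 3


def expected_mutations(length_bp: int, mutations_per_kb: float) -> float:
    """Mean number of nucleotide mutations per clone at the given rate."""
    return mutations_per_kb * length_bp / 1000.0


if __name__ == "__main__":
    length = mutagenized_length_bp(657, 1053)  # M657 through G1053
    mean = expected_mutations(length, 5.5)     # ~5.5 mutations/kilobase
    # Assumption: mutation counts per clone follow a Poisson distribution.
    unmutated_fraction = math.exp(-mean)
    print(f"{length} bp, about {mean:.1f} mutations per clone")
    print(f"Poisson estimate of unmutated clones: {unmutated_fraction:.4f}")
```

The window spans 397 codons (1,191 bp), so the stated rate corresponds to roughly 6.5 nucleotide changes per clone, not all of which alter the amino acid sequence.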
Positive selection plasmids were assembled by ligating oligonucleotide duplexes encoding target sites into XbaI/SphI-digested p11-lacY-wtx1 (ref. 31). For the site-depletion experiments, two separate libraries containing different spacer sequences were generated. For each library, an oligonucleotide containing eight randomized nucleotides adjacent to the spacer sequence (in place of the PAM) was complexed with a bottom strand primer and filled in using Klenow(exo) (refer to Supplementary Table 1). The resulting product was digested with EcoRI and ligated into EcoRI/SphI-digested p11-lacY-wtx1. Estimated complexities of the two site-depletion libraries were greater than 4 × 10^6 clones.
For human cell experiments, human codon-optimized wild-type and variant SaCas9s were expressed from a plasmid containing a CAG promoter. sgRNA expression plasmids (containing a U6 promoter) were generated by ligating oligonucleotide duplexes encoding the spacer sequence into BsmBI-digested VVT1 (ref. 22) or BPK2660 (containing the full-length 120-nt crRNA:tracrRNA sgRNA or an 84-nt shortened repeat:anti-repeat version, respectively). All sgRNAs used in this study for human expression included one guanine at the 5′ end of the spacer to ensure proper expression from the U6 promoter, and also used a shortened sgRNA (Supplementary Fig. 11) similar to that previously described 17.
Bioinformatic analysis of Cas9 ortholog sequences. Similar to alignments performed in previous studies 15,17,24, Cas9 orthologs similar to both SpCas9 and SaCas9 were aligned using ClustalW2 (http://www.ebi.ac.uk/Tools/msa/clustalw2/). The resulting phylogenetic tree and protein alignment were visualized using Geneious version 8.1.6 and ESPript (http://espript.ibcp.fr/ESPript/ESPript/).

Bacteria-based positive selection assay. The bacterial positive selection assays were performed as previously described 22. Briefly, Cas9:sgRNA plasmids were used to transform Escherichia coli BW25141(λDE3) 32 containing a positive selection plasmid. Transformations were plated on both nonselective (chloramphenicol) and selective (chloramphenicol + 10 mM arabinose) conditions. Cas9 cleavage of the selection plasmid was estimated by calculating the percent survival: (no. of colonies on selective plates/no. of colonies on nonselective plates) × 100. To select for SaCas9 variants capable of recognizing alternative PAMs, the wild-type and R1015Q libraries with mutagenized PI domains were used to transform competent E. coli BW25141(λDE3) containing positive selection plasmids with NNAAGT, NNAGGT, NNCAGT, NNCGGT, NNTAGT or NNTGGT PAMs. Approximately 1 × 10^5 clones were screened by plating on selective conditions, and surviving colonies containing SaCas9 variants presumed to cleave the selection plasmid were mini-prepped (MGH DNA Core). All variants were re-screened individually in the positive selection assay, and those with greater than ~20% survival were sequenced to determine the mutations required for recognition of the alternate PAM.
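The percent-survival formula and the ~20% re-screening threshold described above can be written out as a short sketch. The function names and the colony counts in the example are hypothetical, and treating the threshold as a strict "greater than 20%" comparison is an assumption, since the text gives it only approximately.

```python
def percent_survival(selective_colonies: int, nonselective_colonies: int) -> float:
    """Percent survival = (colonies on selective / colonies on nonselective) x 100."""
    if nonselective_colonies <= 0:
        raise ValueError("need at least one colony on the nonselective plate")
    return 100.0 * selective_colonies / nonselective_colonies


def passes_rescreen(selective_colonies: int, nonselective_colonies: int,
                    threshold_percent: float = 20.0) -> bool:
    """True if a re-screened variant exceeds the survival threshold.

    The default of 20.0 mirrors the "~20% survival" cut-off in the text;
    the strict inequality is an assumption of this sketch.
    """
    return percent_survival(selective_colonies, nonselective_colonies) > threshold_percent


if __name__ == "__main__":
    # Hypothetical colony counts, for illustration only.
    print(percent_survival(50, 200))
    print(passes_rescreen(50, 200))
    print(passes_rescreen(10, 200))
```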
Bacteria-based site-depletion assay. The site-depletion experiments were done as previously described 22. Briefly, the randomized PAM libraries were electroporated into competent E. coli BW25141(λDE3) containing either wild-type, catalytically inactive (D10A/H557A) or KKH variant SaCas9/sgRNA plasmids. Greater than 1 × 10^5 colonies were plated on chloramphenicol/carbenicillin plates, and selection plasmids with PAMs resistant to Cas9 targeting contained within the surviving colonies were isolated by maxiprep (Qiagen). The region of the plasmid containing the spacer sequence and PAM was PCR-amplified using the primers listed in Supplementary Table 1. The KAPA HTP library preparation kit (KAPA BioSystems) was used to generate a dual-indexed Tru-seq Illumina sequencing library using ~500 ng purified PCR product from each site-depletion condition before an Illumina MiSeq high-throughput sequencing run at the Dana-Farber Cancer Institute Molecular Biology Core. The data from the site-depletion experiments were analyzed as previously described 22, with the exception that the script was modified to analyze eight randomized nucleotides. The counts of all possible 8-nt strings for each site-depletion treatment can be found in Supplementary Table 3. The ability of Cas9 to recognize PAMs was determined by calculating the PPDV of any given PAM: the ratio of the post-selection frequency of that PAM to the pre-selection library frequency. A control experiment using catalytically inactive SaCas9 was used to establish that a PPDV of 0.794 represents statistically significant depletion relative to the input library.
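The PPDV definition above (post-selection frequency divided by pre-selection frequency) can be illustrated with a minimal sketch. This is not the analysis script used in the study, which is not shown in the text. The function names, the handling of PAMs absent from the input library, and the toy counts are assumptions. The 0.794 and 0.2 thresholds are taken from the text.

```python
SIGNIFICANT_DEPLETION_PPDV = 0.794  # threshold from the inactive-SaCas9 control
EFFICIENT_CLEAVAGE_PPDV = 0.2       # PPDV below which human-cell cleavage was efficient


def compute_ppdv(pre_counts: dict, post_counts: dict) -> dict:
    """Return PPDV per PAM: post-selection frequency / pre-selection frequency.

    PAMs with zero reads in the input library are skipped (an assumption of
    this sketch, made to avoid division by zero).
    """
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())
    values = {}
    for pam, pre in pre_counts.items():
        if pre == 0:
            continue
        pre_frequency = pre / pre_total
        post_frequency = post_counts.get(pam, 0) / post_total
        values[pam] = post_frequency / pre_frequency
    return values


def classify(ppdv: float) -> str:
    """Label a PPDV using the two thresholds quoted in the text."""
    if ppdv < EFFICIENT_CLEAVAGE_PPDV:
        return "strongly depleted"
    if ppdv < SIGNIFICANT_DEPLETION_PPDV:
        return "significantly depleted"
    return "not depleted"


if __name__ == "__main__":
    # Toy read counts for two PAMs, for illustration only.
    pre = {"AAGAGT": 100, "AATTTT": 100}
    post = {"AAGAGT": 10, "AATTTT": 190}
    for pam, value in compute_ppdv(pre, post).items():
        print(pam, round(value, 3), classify(value))
```

In the study, PPDVs from the two spacer libraries were averaged before interpretation. The sketch handles a single library for brevity.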
Human cell culture and transfection. U2OS cells obtained from our collaborator T. Cathomen (Freiburg) and U2OS.EGFP cells harboring a single integrated copy of an EGFP-PEST reporter gene 33 were cultured in Advanced DMEM medium (Life Technologies) with 10% FBS, penicillin/streptomycin, and 2 mM GlutaMAX (Life Technologies) at 37 °C with 5% CO2. Cell line identities were validated by STR profiling (ATCC) and deep sequencing, and cells were tested bi-weekly for mycoplasma contamination. U2OS.EGFP culture medium was additionally supplemented with 400 µg/mL G418. Cells were co-transfected with 750 ng Cas9 plasmid and 250 ng sgRNA plasmid using the DN-100 program of a Lonza 4D-nucleofector following the manufacturer's instructions.
Human cell EGFP disruption assay. EGFP disruption experiments were performed as previously described 26,33. Approximately 52 h post-transfection, a Fortessa flow cytometer (BD Biosciences) was used to measure EGFP fluorescence in transfected U2OS.EGFP cells. Negative control transfections of Cas9 and empty U6 promoter plasmids were used to establish background EGFP loss at ~2.5% for all experiments (represented as a red dashed line in figures).

T7E1 assay. T7E1 assays were performed as previously described 33 to quantify Cas9-induced mutagenesis at endogenous loci in human cells. Approximately 72 h post-transfection, genomic DNA was isolated using the Agencourt DNAdvance Genomic DNA Isolation Kit (Beckman Coulter Genomics). Target loci were PCR-amplified from ~100 ng of genomic DNA using the primers listed in Supplementary Table 1. Following an Agencourt Ampure XP clean-up step (Beckman Coulter Genomics), ~200 ng purified PCR product was denatured and hybridized before digestion with T7E1 (New England BioLabs). Following a second clean-up step, mutagenesis frequencies were quantified using a Qiaxcel capillary electrophoresis instrument (Qiagen).

GUIDE-seq experiments. GUIDE-seq experiments were performed and analyzed as previously described 27. Briefly, U2OS cells were transfected as described above with Cas9 and sgRNA plasmids, as well as 100 pmol of a phosphorylated, phosphorothioate-modified, double-stranded oligodeoxynucleotide (dsODN) with an embedded NdeI site. Restriction fragment length polymorphism (RFLP) analyses were performed to determine dsODN-tag integration frequencies 22,27, and T7E1 assays were performed to quantify on-target Cas9 mutagenesis frequencies. dsODN tag-specific amplification and library preparation 27 were performed before high-throughput sequencing using an Illumina MiSeq Sequencer. When mapping potential off-target sites, the cut-off for alignment to the on-target spacer sequence was set at eight mismatches for 21-nucleotide spacers, nine mismatches for 22-nucleotide spacers, and ten mismatches for 23-nucleotide spacers. Off-target sites with potential DNA- or RNA-bulges 28 were identified by manual alignment.
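The length-dependent mismatch cut-offs described above can be expressed as a small lookup. This sketch is not the GUIDE-seq analysis software. It only counts mismatches between equal-length, ungapped sequences, and it assumes that the stated cut-off is inclusive (a candidate with exactly eight mismatches to a 21-nt spacer is kept). The function names are hypothetical.

```python
# Maximum mismatches to the on-target spacer, keyed by spacer length (nt),
# as stated in the text.
MISMATCH_CUTOFF = {21: 8, 22: 9, 23: 10}


def count_mismatches(spacer: str, candidate: str) -> int:
    """Count positions at which two equal-length sequences differ (no gaps)."""
    if len(spacer) != len(candidate):
        raise ValueError("sequences must be the same length")
    return sum(a != b for a, b in zip(spacer.upper(), candidate.upper()))


def within_cutoff(spacer: str, candidate: str) -> bool:
    """True if `candidate` is within the cut-off for this spacer length.

    Assumes the cut-off is inclusive; spacer lengths other than 21-23 nt
    raise KeyError because the text gives no cut-off for them.
    """
    return count_mismatches(spacer, candidate) <= MISMATCH_CUTOFF[len(spacer)]


if __name__ == "__main__":
    spacer = "A" * 21  # hypothetical 21-nt spacer
    print(within_cutoff(spacer, "C" * 8 + "A" * 13))
    print(within_cutoff(spacer, "C" * 9 + "A" * 12))
```

Sites with DNA or RNA bulges are not handled here. In the study those were identified by manual alignment.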