In-planta Gene Targeting in Barley using Cas9, with and without Geminiviral Replicons

Advances in the use of RNA-guided Cas9-based genome editing in plants have been rapid over the last few years. A desirable application of genome editing is gene targeting (GT), as it allows a wide range of precise modifications; however, this remains inefficient, especially in key crop species. Here we describe successful, heritable gene targeting in barley using an in-planta strategy, but we failed to achieve the same using a wheat dwarf virus replicon to increase the copy number of the repair template. Without the replicon, we were able to delete 150 bp of the coding sequence of our target gene whilst simultaneously fusing in-frame mCherry in its place. Starting from 14 original transgenic plants, two plants appeared to have the required gene targeting event. From one of these T0 plants, three independent gene targeting events were identified, two of which were heritable. When the replicon was included, 39 T0 plants were produced and shown to have high copy numbers of the repair template. However, none of the 17 lines screened in T1 gave rise to significant or heritable gene targeting events, despite screening twice the number of plants in T1 compared to the non-replicon strategy. Investigation indicated that the high copy numbers of repair template created by the replicon approach cause false positive PCR results which are indistinguishable at the sequence level from true GT events in the junction PCR screens widely used in GT studies. In the successful non-replicon approach, heritable gene targeting events were obtained in T1 and subsequently the T-DNA was found to be linked to the targeted locus. Thus, physical proximity of target and donor sites may be a factor in successful gene targeting.

Genome editing has expanded rapidly in recent years due to advances in programmable nucleases which allow a double-stranded DNA break to be created at a predefined locus. First on the scene were zinc-finger nucleases (Kim et al., 1996), followed by transcription activator-like effector nucleases (TALENs) (Christian et al., 2010) and, more recently, clustered regularly interspaced short palindromic repeats (CRISPR) systems, especially SpCas9 (Jinek et al., 2012), which was the first CRISPR nuclease reported to function in plants (Feng et al., 2013; Li et al., 2013; Nekrasov et al., 2013; Shan et al., 2013; Xie and Yang, 2013). Although insertion of exogenously supplied DNA into plant genomes has been possible for many years via Agrobacterium-mediated transformation or physical delivery, the location of insertion was impossible to control precisely. Some success has been reported in inserting DNA in a precise manner by homologous recombination in rice without creating a double strand break (DSB) at the target site, although it was necessary to use a positive/negative selection system (Terada et al., 2002), which was later shown to produce no successful modifications in barley (Horvath et al., 2017). The value of creating a DSB at the target site to initiate DNA repair and facilitate insertion by homologous recombination was shown early on in plants with the non-programmable I-SceI meganuclease (Fauser et al., 2012), so it was a natural progression to repurpose Cas9 for precise insertional modifications.
Many DSBs are repaired by non-homologous end joining (NHEJ) mechanisms, which are error prone but have been shown to be capable of inserting an exogenously supplied DNA template at the break point in plants (Salomon and Puchta, 1998). Gene targeting (GT) can be defined as the introduction of a precise predefined modification into a plant genome, either an insertion, deletion or replacement, via the introduction of a supplied repair template using homology dependent recombination (HDR) and usually a DSB at the target site. By making available a repair template containing the required modification flanked by sequence homologous to each side of the DSB, a precise change can be introduced into the genome. This change can be either small, for example a single amino acid conversion (Budhagatapalli et al. [...]

Whilst GT is able to address both large and small precise modifications, it is usually much harder to achieve than a knockout, and so researchers have sought ways in which rare events can be screened for easily and means to boost the frequency at which they occur. Early GT efforts in crops focused on creating a precise change resulting in resistance to a herbicide or antibiotic, which can then be used to select for resistant plants containing the desired GT event. ALS (acetolactate synthase) is a plant gene essential for the production of branched-chain amino acids and a target for inhibitors used as herbicides; it has been extensively used in plants for GT experiments (Svitashev et al. [...]). Sometimes a visual marker has been used in the screen, such as insertion of a 35S promoter upstream of ANT1 leading to a purple phenotype (Cermak et al., 2015), or restoration of gl1 leading to trichome production in Arabidopsis (Hahn et al., 2018). This approach, however, means that the modification is restricted to genes which allow such a selectable or visible phenotype, which many editing targets do not.

Many crop plants may only be transformed at efficiencies of a few percent or less, which, when combined with the low efficiency of GT makes regeneration of T0 gene targeted plants hugely labour intensive or just inconceivable. One way around this is to adopt an in-planta strategy whereby just a few primary transgenics containing the editing reagents are created, but the numbers required to retrieve the rare GT events are generated by the plants themselves through the normal process of flowering and seed production (Fauser et al., 2012;Schiml et al., 2014;Schiml et al., 2017). Each progeny plant may give rise to successful GT events, perhaps just as somatic sectors, but these can enter the germ line and prove to be heritable in subsequent generations. In this approach, all the editing reagents can be included on a single T-DNA with a selection cassette to allow transgenic production, a nuclease programmed to create a DSB at the target site and a repair template containing the desired modification with flanking sequence homologous to each side of the target site DSB. Recognition sequences for the nuclease can also be added to the ends of the repair template to allow cutting and its transfer to the target site (Schiml et al., 2014;Zhao et al., 2016). Here the screen can be based on the genotype rather than the phenotype, plants containing the required edits being detected by PCR for example. A widely adopted approach is to PCR screen using one primer within the modified region of the repair template and the second primer outside of the repair template in the sequence flanking the target site. In this way the PCR must cross the junction where the repair template stops, and the flanking genomic sequence begins.
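The junction PCR logic described above can be illustrated with a small in-silico check. This is only a sketch under stated assumptions: all sequences below are short invented stand-ins (not the barley locus, mCherry or the actual screening primers), primer binding is modelled as an exact string match, and the function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon_length(template: str, fwd: str, rev: str):
    """Product length if both primers bind the template exactly, else None.

    fwd matches the top strand; rev is written 5'->3' and binds downstream
    of fwd at the site given by its reverse complement.
    """
    start = template.find(fwd)
    if start == -1:
        return None
    rev_site = reverse_complement(rev)
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return end + len(rev_site) - start


# Toy sequences, invented for illustration only.
LEFT_FLANK = "ATGGCCTTAGGCATCGATCC"   # genomic sequence outside the repair template
LEFT_ARM = "GGTACCTTGACGTCAAGCTA"     # left homology arm
DELETED = "CCCGGGAAATTTCCCGGGAA"      # region removed by the GT event
INSERT = "ATGGTGAGCAAGGGCGAGGA"       # stands in for the reporter insert
RIGHT_ARM = "TTCGAACTGCAGGATATCGG"    # right homology arm
RIGHT_FLANK = "CATATGCGGCCGCTCTAGAA"  # genomic sequence outside the repair template

wild_type = LEFT_FLANK + LEFT_ARM + DELETED + RIGHT_ARM + RIGHT_FLANK
gt_allele = LEFT_FLANK + LEFT_ARM + INSERT + RIGHT_ARM + RIGHT_FLANK
donor = LEFT_ARM + INSERT + RIGHT_ARM  # repair template alone, no flanks

# Left-junction pair: forward primer in the flank, reverse primer in the insert.
fwd = LEFT_FLANK[:12]
rev = reverse_complement(INSERT[-12:])

for name, template in [("wild type", wild_type),
                       ("GT allele", gt_allele),
                       ("donor only", donor)]:
    print(name, amplicon_length(template, fwd, rev))
```

In this idealised model only the GT allele carries both primer sites, so only it yields a product. The model deliberately ignores PCR artefacts, so it says nothing about products that might arise when both the unmodified target and the repair template are present in the same reaction.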

It has been suggested that one major constraint on successful GT is the availability of repair template sequence at the correct time and in sufficient quantity for it to be incorporated as intended. In order to address this, Geminivirus replicons have been utilised [...] (Budhagatapalli et al., 2015) and the second a stable modification of a non-functional hptII transgene to a functional form (Watanabe et al., 2016). The former was identified in 3 epidermal cells and the latter were one-sided GT events, in which one side of the repair was by HDR and the other by NHEJ.
Our aim was to achieve heritable Cas9 GT in barley which would modify a locus of interest that was not a transgene and not chemically or visibly selectable, thus we chose to create a partial deletion of a native barley gene of interest, simultaneously fusing an in-frame reporter to the remaining part. To keep the number of transgenics required to a minimum and to potentially make the approach suitable for genotypes more recalcitrant to transformation, we used an in-planta strategy and attempted to increase efficiency by incorporating the repair template within a Geminivirus replicon. We present efficiencies using strategies with and without inclusion of the replicon.

GT Construct Design
In our design strategy, high-efficiency introduction of DSBs was considered important, as the benefits of DSBs to GT have been previously reported (Fauser et al., 2012). Therefore, for our selected native barley target (HORVU4Hr1G061310), two protospacers were identified that gave good results and would allow for a strategy to delete around 150 bp of the coding sequence of this single-exon gene whilst simultaneously fusing in-frame mCherry in its place (figure 1.1a-c). Protospacer A was able to create indels, as detectable by Sanger sequencing of PCR amplicons covering the target site, in 9/18 (50%) of independent transgenic lines and protospacer B in 16/18 (89%) of the same lines. To maximise the chance of success, we decided to incorporate both guides into our design, as two DSBs at the target site might be better than one. In the repair template, homology to the target site was maximised by continuing the right and left homology arm sequences fully up to the Cas9 cut sites, i.e. 3 bp from the native PAM. This allowed omission of the PAM on the left arm and the protospacer on the right arm of the repair template, preventing the Cas9 from cutting within it both before and after GT (figure 1.1b). Target [...] (figure 1.2b) such that it would allow rolling circle replication of the repair template already present in construct A. Previously, such replicons have often been shown to be functional in terms of replicative ability by using PCR to detect the circular replicating form of the linearly supplied unit. We chose to develop a qPCR copy number assay using amplicon/probe combinations in the repair template, hygromycin selection cassette (figure 1.2) and a single copy barley gene to enable quantification of replication in stable transgenic lines.
Construct C (figure 1.2c) was identical to B other than lacking the Cas9 and sgRNA cassettes, and so was able to amplify the repair template but unable to induce the site-specific DSBs. This was to test the importance of targeted DSBs in GT, which have been shown to be beneficial (Fauser et al., 2012) although not always essential (Terada et al., 2002).

Design of assay for detection of GT events
Construct D (figure 1.2d) was produced as a means of optimising the PCR screening strategy for GT detection, as high sensitivity and specificity would be vital due to the rarity of GT events and the expectation that we could be searching for somatic sectors which might represent a small proportion of the cells within leaf samples taken for analysis (Schiml et al., 2017). Construct D contains the repair template as found in constructs A, B and C; however, the homology arms have been extended for a few hundred nucleotides with the native HORVU4Hr1G061310 genomic sequence to include the binding sites for the F1, R2 and R3 primers (figure 1.2d). By creating a single copy transgenic line with construct D, as determined by qPCR assay, a more realistic scenario to derive template for optimisation was possible than by using plasmid alone. In order to allow distinction from true GT events, polymorphisms were introduced at the junctions of the extended flanks and the homology arms which would not be present in the predicted true GT events. Various PCR conditions were tried and the best (see methods) were found to work well with primer combinations F1/R1, F2/R2 and F1/R3. The most sensitive were found to be junction PCRs F1/R1 and F2/R2, which would identify GT events at the left or right junction respectively. By serially diluting 30 ng of construct D genomic DNA, and considering the 5.3 Gbp haploid barley genome and the average weight of 650 Daltons per base pair, it was possible to calculate the number of template copies in each PCR reaction and thus determine the threshold sensitivity. This was found to be in the region of 40 copies for the F1/R1 primer pair (Supplementary file 1), so the assay is theoretically capable of identifying a somatic sector containing the same number of cells with a GT event.
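The copy number arithmetic behind this sensitivity estimate can be sketched as follows. The genome size and the 650 Daltons per base pair are taken from the text; the function names are hypothetical, and the sketch assumes one template copy per haploid genome equivalent, which is a simplification for illustration rather than a statement about the zygosity of the construct D line.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
DALTONS_PER_BP = 650.0     # average mass of one base pair (from the text)
BARLEY_HAPLOID_BP = 5.3e9  # haploid barley genome size (from the text)


def haploid_genome_mass_g(genome_bp: float = BARLEY_HAPLOID_BP) -> float:
    """Mass in grams of one haploid genome copy."""
    return genome_bp * DALTONS_PER_BP / AVOGADRO


def genome_copies(dna_ng: float, genome_bp: float = BARLEY_HAPLOID_BP) -> float:
    """Haploid genome equivalents contained in dna_ng nanograms of DNA."""
    return dna_ng * 1e-9 / haploid_genome_mass_g(genome_bp)


def dna_ng_for_copies(copies: float, genome_bp: float = BARLEY_HAPLOID_BP) -> float:
    """Nanograms of genomic DNA holding the given number of genome equivalents.

    Assumption: one template copy per haploid genome equivalent.
    """
    return copies * haploid_genome_mass_g(genome_bp) * 1e9


print(f"one haploid genome: {haploid_genome_mass_g() * 1e12:.2f} pg")
print(f"genome equivalents in 30 ng: {genome_copies(30):.0f}")
print(f"DNA holding 40 genome equivalents: {dna_ng_for_copies(40):.3f} ng")
```

Under these assumptions a haploid barley genome weighs about 5.7 pg, so 30 ng holds roughly 5,240 genome equivalents and a threshold of about 40 copies corresponds to roughly 0.23 ng of genomic DNA.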
PCR with primers F1/R3, although covering the entire GT event over both left and right junctions, was less sensitive, presumably due to the greater amplicon size and the competitive tendency of the smaller WT allele to amplify and dominate the products (see figure 2 [...]
To check the identity and fidelity of these PCR products, F1/R1 and F2/R2 products were purified and Sanger sequenced for the lines 2158-9-1, 2158-14-1, 1826-5-2 and 1826-8-1 and found to be identical and as expected for perfect GT events (Supplementary file 3). As expected, construct C lines (2291 prefix) also generated many copies of repair template, but unexpectedly they also produced correctly sized PCR products with primers F1/R1 and F2/R2, as shown in supplementary file 2. In fact, 8/16 (50%) of the 2291 lines gave both left and right junction PCR products of the size indicative of a GT event and, furthermore, when purified and sequenced these gave exactly the same sequence as seen with the 1826 and 2158 lines. Looking at the relation between mCherry copy number and the presence/absence of F1/R1 and F2/R2 PCR products (supplementary file 2), it was apparent that high numbers of repair template and PCR success were linked. Whilst this could mean that increasing the number of repair template copies was causing GT, it could also indicate that false PCR positives were being triggered by the high number of repair templates produced by the replicon.
To test this latter idea, plasmid DNA containing the repair template was mixed with wild type Golden Promise DNA (where GT could not have occurred) and F2/R2 PCR was performed. Initially, 30 ng of barley DNA (as used in all other screening PCRs described) was mixed with around 7.72 × 10^9 copies of repair template, and this resulted in the production of the 1047 bp F2/R2 band. This plasmid was then titrated against the 30 ng of wild type barley DNA (representing 5240 target site copies) to determine the minimum number of repair template copies per target site necessary to trigger the false positive when 30 ng of barley DNA was used as template. This is shown in file S4 and was found to be in the region of 700 copies per target site, based on the 5.3 Gbp genome size and an average weight of 650 Daltons per base pair. This result can be related to the qPCR copy number determinations for mCherry (repair template) in the replicon lines, where the numbers in supplementary file 2 relate to copies per haploid genome or, in other words, per target site (there is one copy of HORVU4Hr1G061310 per haploid genome). Looking at supplementary file 2, it is evident that F1/R1 and F2/R2 products begin to appear in 2158 and 2291 lines at around 600 or 700 copies of mCherry per genome/target site, meaning it is likely that many of the PCR bands produced in replicon lines are false positives. This was further confirmed by sequencing a band from the plasmid titration test (file S4) in the lane labelled 736641, which proved identical in sequence to the F2/R2 bands obtained for the 2158, 2291 and 1826 lines. Presumably, by increasing the number of repair template copies with the replicon we had inadvertently also increased the likelihood of partial primer extension from within the repair template. For example, R1 could in one cycle of PCR be partially extended from within mCherry to somewhere in the left homology arm.
After denaturation, the partially extended product would be free to anneal at its 5' end with the homologous site in the target region (template switching) where it could then be extended beyond the position of the F1 primer binding site. F1 could then prime against this site and extend to produce double stranded DNA of sequence identical to the predicted GT event and allow exponential amplification and production of the false positive.
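The titration reasoning can be expressed as a ratio of repair template copies to target site copies. The sketch below uses only figures quoted in the text (30 ng of barley DNA, a 5.3 Gbp genome, 650 Daltons per base pair, about 7.72 × 10^9 plasmid copies, a threshold near 700 copies per target site); the function names and the use of a hard cut-off are illustrative assumptions, since the text describes an approximate region rather than a sharp boundary.

```python
AVOGADRO = 6.02214076e23
FALSE_POSITIVE_THRESHOLD = 700.0  # approx. copies per target site (from the text)


def target_site_copies(dna_ng: float = 30.0,
                       genome_bp: float = 5.3e9,
                       daltons_per_bp: float = 650.0) -> float:
    """Target site copies in the reaction, one site per haploid genome."""
    genome_mass_g = genome_bp * daltons_per_bp / AVOGADRO
    return dna_ng * 1e-9 / genome_mass_g


def template_per_target(template_copies: float, dna_ng: float = 30.0) -> float:
    """Repair template copies per target site for plasmid spiked into WT DNA."""
    return template_copies / target_site_copies(dna_ng)


def junction_pcr_at_risk(copies_per_target_site: float,
                         threshold: float = FALSE_POSITIVE_THRESHOLD) -> bool:
    """Flag a sample whose repair template load reaches the assumed threshold."""
    return copies_per_target_site >= threshold


# Initial spike described in the text: about 7.72e9 plasmid copies in 30 ng WT DNA.
print(f"initial ratio: {template_per_target(7.72e9):.3g} copies per target site")

# Plasmid copies needed to reach the threshold in a 30 ng reaction.
print(f"copies at threshold: {FALSE_POSITIVE_THRESHOLD * target_site_copies():.3g}")

# qPCR values are already per haploid genome, so they compare directly.
for copies in (2, 650, 5000):  # 2 is the highest value reported for 1826 lines
    print(copies, junction_pcr_at_risk(copies))
```

Because the qPCR assay reports mCherry copies per haploid genome, the same threshold can be applied to each transgenic line directly, which is how low-copy lines can be separated from lines in which a junction PCR band is not informative.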
The 1826 lines all had relatively low copy numbers of repair template (the highest was 2), far below 600 per target site, and so our testing indicated that the lines 1826-8-1 and 1826-5-2 would be true positives. Of course, the false positives in the replicon lines could be masking true positives in the background, so the 39 individual 2158 lines were subjected to F1/R3 PCR along with the 14 individual 1826 lines; however, none produced a band (data not shown), although this was unsurprising due to the low sensitivity of this large amplicon PCR. Accordingly, lines 1826-8-1 and 1826-5-2 were sown out for T1 screening as likely true positive GT lines, whilst 17 F1/R1 and F2/R2 positive 2158 lines were selected for T1 screening on the assumption that some true positive GT events may be masked by false positives created via replicon amplification.

Analysis of the T1 generation and beyond
Because of the false positive PCR issue, and to detect GT events and somatic sectors of significant size likely to become heritable, it was decided to screen T1 plants with the less sensitive F1/R3 primer pair. For each of the 17 selected T0 2158 lines, approximately 70 siblings were sown out, giving a total of around 1200 plants, from which no F1/R3 positives were identified. T0 line 1826-5-2 produced 228 seeds; all were sown and screened, and none produced an F1/R3 positive band. T0 line 1826-8-1 was, however, more productive and yielded 467 seeds, and from these, 3 T1 plants produced a band of 2.2 kb indicative of the sought-after GT event as well as a second band of 1.6 kb corresponding to the wild type allele.

Discussion
Figure 3 summarises the key findings described above for all plants analysed. Heritable GT was confined to line 1826-8-1, with the event in 1826-8-1_A occurring either in T0 or very early T1 and the 1826-8-1_C events occurring in T1 or T2. Additionally, a significant event leading to detection with the low sensitivity primer pair F1/R3 was recovered in 1826-8-1_B but lost by T2, so it must have occurred in T1. This shows that the 1826-8-1 family tree had diverged before the origin of these independent GT events, and so for some reason the line 1826-8-1 was relatively prolific in terms of GT. A comparable line, 1826-5-2, showed somatic GT in T0 but did not go on to result in subsequent heritable GT. This may be related to the T-DNA containing the repair template being linked to the target site in 1826-8-1 but not in 1826-5-2. It was previously reported that if the repair template and target site were present on the same chromosome then GT was around twice as frequent as when they were on different chromosomes (Fauser et al., 2012). Successful GT in line 1826-8-1 also makes sense in light of evidence that DNA repair by HDR using a sister chromatid template is common in barley (Vu et al., 2014). Being on the same chromosome is likely to affect the physical proximity of target and donor site. It was recently reported in rice that using a Cas9-VirD2 fusion to direct the repair template to the target site had a beneficial effect on GT (Ali et al., 2020). It is also reported that the zygosity of the repair template has a similar impact (Puchta et al., 1995), where a homozygous transgene was 50% more likely to lead to intrachromosomal HR-based gene repair than a hemizygous one. In line with this, all three 1826-8-1 T1 siblings of interest were homozygous for the T-DNA, whilst the overall T1 T-DNA inheritance in this line showed 3:1 segregation.
HORVU4Hr1G061310 fusion created was functional as screening all 19 1826-8-1_A F1 plants produced showed that they still contained the T-DNA based repair template (data not shown). Similarly, 6 GT positive T3 plants from each of 1826-8-1_C1, 1826-8-1_C2 and 1826-8-1_C3 all contained the T-DNA based repair template (data not shown). This repair template contains the promoter region, mCherry and much of the HORVU4Hr1G061310 CDS so may well have given a fluorescent signal despite not being integrated at the target locus by GT. This would not allow distinction of a signal arising from GT at the target locus and a signal from the repair template still located in the T-DNA.
2158 lines had very high copy numbers of the repair template, although none of the 17 T0 lines screened in T1 gave rise to significant or heritable F1/R3 detectable GT events despite screening twice the number of plants in T1 for the 2158 lines compared to 1826 lines. 2158 repair template amplification was detected in T0 and again in T1 (supplementary file 7), where replication co-segregated with the T-DNA. Despite there being no issue affecting heritability of a replicationally functional T-DNA, we have no data to show whether the repair template copy number was increased in the cells giving rise to the germ line. With this in mind, it could be possible that the replicon had a positive effect on GT in somatic cells not giving rise to the germ line. However, titration of repair template plasmid against wild type Golden Promise DNA in vitro indicated that the GT activity detected in T0 2158 lines was potentially a PCR artefact, as junction PCR bands begin to appear at around 700 copies of repair template per target site, which is very close to the ratio seen in-planta with the replicon where the junction PCR began yielding product. Future GT experiments utilising high copy numbers of repair template should be aware that such an approach is liable to produce false positive PCR results and would benefit from strategies to prevent them. One way to do this may be to reduce the length of homology arms to a minimum, thus reducing the size of the region in which partial primer extension may occur before template switching during PCR. Whilst reducing the length of homology arms may result in a decrease in overall GT efficiency, relatively short homology arms of 196 bp and 74 bp have been shown to function in rice (Li et al., 2020).
Another way to reduce false positive junction PCR may be to simply increase the size of the amplicon by moving the primer in the flanking non-repair template region further out, which may in turn reduce the chances of a partially extended product being fully extended after template switching. From our repair template titration experiments it would clearly be a good idea to test the potential for false positive junction PCR by mixing WT genomic DNA and repair template in vitro before PCR.
It may be that in our experiments, GT was occurring in replicon lines in the T0 generation but was indistinguishable from the false positive junction PCRs being triggered and was perhaps at a low mosaic density, or not in pre-germinal tissue, and so insufficient to allow its inheritance into T1. Although we screened a greater number of T1 progeny (1200 versus 695) from a greater number of T0 parents (17 versus 2) for the replicon (2158) compared to the non-replicon (1826) lines, we cannot be sure that this is a valid replicon/non-replicon comparison, as indel formation at target sites may have been unequal for some reason. Using Cas9 has the potential disadvantage that indel formation is likely to mutate the "seed region" of the target site such that further DSBs are not possible, as the relevant guide no longer matches the site. We know that guides A and B were able to induce indels in 50% and 89% of T0 lines respectively, as detectable by PCR and direct Sanger sequencing, which is a relatively insensitive assay; this may well indicate that many target sites would no longer be available for GT. As we only had one non-replicon (1826) T0 transgenic that yielded heritable GT events, a larger number of primary transgenics and GT events would need to be investigated in order to make a replicon/non-replicon comparison. Additionally, the target sites in the T0, T1, T2, etc. lines created could be sequenced to gain more insight into the remaining availability of WT target sites.
Recently it has been shown in Arabidopsis that timing the occurrence of DSBs to the egg cell greatly increases GT efficiency (Miki et al., 2018; Wolter et al., 2018). Similarly, GT efficiency was increased by using Cas12a instead of Cas9 (Wolter and Puchta, 2019). Two features here address the potential lack of available WT target sites that may be shutting down DSB formation in our experiment.
Firstly, restricting DSBs to egg cells would mean each female gamete has the potential for DSBs to occur and in turn undergo GT, rather than a reduced or non-existent fraction resulting from indels formed earlier during development under ubiquitous Cas9 expression. Secondly, Cas12a cuts outside of its seed region and would be expected to resist a certain amount of indel formation and may therefore keep creating DSBs for an increased length of time compared to Cas9, giving more potential for GT to occur. It will be interesting to see if the benefits to GT of egg cell specific Cas12a can be translated to crops.
A previous report of in-planta GT in Arabidopsis (Hahn et al., 2018) found no beneficial effect from including the repair template within a replicon, whilst a single copy repair template (similar to our construct A) gave rise to heritable GT. However, this study investigated the progeny of just three primary transformant lines per DNA construct and may also suffer from indels shutting down target sites. In tomato, bean yellow dwarf virus-based replicons have been shown to result in heritable GT events (Cermak et al. [...]
Our work in barley has extended what has previously been shown in this species, as we created the first heritable true GT events at a native locus. However, we were unable to segregate away the editing reagents on the T-DNA, possibly due to an inadvertent selection for linkage. Whilst it may be possible to separate the two loci by searching for meiotic recombinants, this probably represents an unreasonable amount of work. Increasing the number of heritable GT events detected will probably allow the isolation of unlinked versions, which would in turn be easier if GT efficiency was boosted in other ways, such as egg cell Cas12a expression. Additionally, a pooling strategy may enable more plants to be screened, which should increase the numbers of GT events recovered.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Author Contributions
TL designed the experiments, conducted molecular work and analysis and prepared the manuscript. AH carried out barley transformations. MC carried out the crossing programme. YM assisted with molecular analysis and crossing. WH contributed to the study design and manuscript preparation. All authors read and approved the final manuscript.

Guide efficiency is shown as % of lines in which indels were detected. Complete protospacer/PAM sequences are absent from A and C to prevent cleavage by Cas9. A successful event leads to a partial deletion of the HORVU4Hr1G061310 CDS with the remainder being fused in-frame to mCherry. Forward (F) and reverse (R) screening primers are indicated as black horizontal arrows.

Constructs for plant transformation used in this study: RB right border, LB left border, HygR hygromycin resistance cassette, Cas9 Hs codon optimised SpCas9 driven by ZmUbiquitin promoter, sgRNA guide RNA expression cassette for two guides (A & B), Left arm left homology arm, mCherry mCherry reporter CDS, Right arm right homology arm, LIR long intergenic region, SIR short intergenic region, REP replicase proteins, qPCR probe used for copy number determination. Thick black vertical bars indicate target sites for the Cas9/guides. Construct A is the basic GT version; construct B is the same except the repair template is contained between replicon sequences. Construct C is the same as B but lacks the Cas9/gRNA and the ability to introduce DSBs. Construct D has no Cas9/gRNA or replicon, but has the repair template with extended homology arms. It was transformed into barley and used to establish a sensitive PCR assay. F1, R1, F2, R2, R3 are PCR primers used for screening.

Gel showing segregation of GT event in 1826-8-1_A T2. Bands are the products of F1/R3 primers. + indicates GT allele, - indicates WT allele.