Longitudinal tracking reveals sustained polyclonal repopulation of human-HSPC in humanized mice despite vector integration bias

Clonal repopulation of human hemopoietic stem and progenitor cells (HSPC) in humanized mouse models remains only partially understood due to the lack of a quantitative clonal tracking technique for low sample volumes. Here, we present a low-volume vector integration site sequencing (LoVIS-Seq) assay that requires a mere 25μl mouse blood for quantitative clonal tracking of HSPC. Using LoVIS-Seq, we longitudinally tracked 897 VIS clones—providing a first-ever demonstration of clonal dynamics of both therapeutic and control vector-modified human cell populations simultaneously repopulating in humanized mice. Polyclonal repopulation of human cells became stable at 19 weeks post-transplant indicating faster clonal repopulation than observed in humans. Multi-omics data of human fetal liver HSPC revealed that in vivo repopulating clones have significant vector integration bias for H3K36me3-enriched regions. Despite this bias the repopulation remains normal, underscoring the safety of gene therapy vectors. LoVIS-Seq provides an efficient tool for exploring gene therapy and stem cell biology in small-animal models.

change in HSC differentiation 7. However, generating a barcode library for every therapeutic test vector is both cost-prohibitive and impractical; additionally, low DNA availability, lack of a universal barcode counting method, and small barcode library size limit the accuracy of barcoding techniques. Finally, these techniques cannot identify the genomic location of vector integration in host genomes.

In each transduced HSPC, the vector randomly integrates into the host genome, creating a unique vector-host DNA junction sequence, or VIS clone. A high-throughput integration site (IS) sequencing assay can simultaneously identify and track multiple VIS as well as detect probable mutagenic insertions. A quantitative high-throughput VIS assay revealed that of all HSPC transplanted in rhesus macaques, ~0.01% are long-term HSC and start contributing >1. cytotoxicity. An in vitro study using activated human CD34+ HSC also found lentiviral vector integration preference for active genes 23. These in vitro studies have tracked the impact of vector integration on cell fate over short periods; however, the long-term impact is only partially explored. Moreover, in human HSPC, and more specifically in human fetal liver HSPC (FL-HSPC), vector integration preference for epigenetic features is unexplored. Importantly, the influence of VIS-proximal epigenetic features on in vivo survival, proliferation, and differentiation of gene-modified HSPC is unknown.

In this study, we present LoVIS-Seq, a combined multiple displacement amplification (MDA) and VIS assay for low-volume samples. Using this assay, we longitudinally track hundreds of clones in two different gene-modified cell populations simultaneously repopulating in hu-BLT mice. This polyclonal repopulation resembles typical after-transplant HSPC expansion in macaques and humans. In FL-HSPC, we found that vector integration in clones detected in vivo is biased toward actively transcribed regions. Our method provides an efficient tool to study clonal repopulation in murine and humanized-mouse models used for stem cell and gene therapy research.

Results:
Minimum 10,000 bone marrow cells or 25µl blood is sufficient for LoVIS-Seq:
To test our new assay, we collected bone marrow (BM) cells from a hu-BLT (bone marrow-liver-thymus) mouse (m860). Fetal liver CD34+ cells transduced with sh1005(anti-CCR5 shRNA)-EGFP vector or control mCherry vector were mixed in equal ratio and transplanted into the mouse (Figure 1a-b, details in Methods). We estimated clonal composition of the EGFP-WT (WT LTR-index) and mCherry-H1 (H1 LTR-index) populations using unamplified bulk DNA, in triplicate, as described previously 8. A total of 300 ±42 SD (216 ±22 SD mCherry-H1 VIS and 84 ±20 SD EGFP-WT) VIS were detected. The polyclonal profile in mouse bone marrow (Figure 1c) resembled that found in hu-BLT mice 8 and in autologously transplanted mice 6, nonhuman primates 5,24, and humans 4,25. Next, to test our LoVIS-Seq assay (Figure 1d), which combines MDA with the VIS assay, we first performed MDA directly on 81,000, 27,000, 9,000, 3,000, and 1,000 bone marrow cells, each in duplicate (Supplementary figure 1A, details in Methods section). Equal amounts of MDA-amplified DNA and unamplified bulk DNA were used for the VIS assay (Supplementary table 1).

We found high reproducibility of clonal profiles in different MDA samples and within-MDA replicates of 81,000 to 9,000 cells (avg. Pearson's r value >0.91) (Figure 1e and Supplementary figure 1B-F); for fewer than 9,000 cells, the reproducibility dropped (avg. Pearson's r=0.87 for 3,000 and 0.73 for 1,000 cells). Importantly, reduced cell numbers caused a modest reduction in VIS detection (Supplementary figure 1G). These data validate that MDA-amplified DNA from >10,000 bone marrow cells is sufficient for LoVIS-Seq.

Next, to test the accuracy of LoVIS-Seq with hu-BLT mouse blood, we collected 100µl blood at weeks 13, 15, and 17 and ~1ml of whole blood at week 19 post-transplant.
The hu-BLT mice were transplanted with an equal mix of human CD34+ cells transduced with anti-HIV EGFP-WT vector and control mCherry-H5 vector (Figure 1f). Cells from 50µl blood were used for flow cytometry and the remaining cells were used for MDA in duplicate, each from 25µl of blood (>10,000 human cells). High correlation (median Pearson's r =0.93) of mCherry-H5 and EGFP-WT VIS clonal frequencies between unamplified and MDA-amplified DNA from blood cells (Figure 1g) suggests that the clonal profile of entire mouse blood can be captured with 25µl of blood. Importantly, the MDA replicates also showed high reproducibility (median Pearson's r>0.95, Supplementary figure 3A). In conclusion, our LoVIS-Seq assay accurately captured the clonality of two vector-modified cell populations in hu-BLT mouse blood using a mere 25µl of blood or as few as 10,000 cells.

LoVIS-Seq for simultaneous clonal tracking of therapeutic vector-modified and control vector-modified

Clonal sharing between organs reveals normal repopulation and unique tissue distribution pattern:
Our VIS data show normal clonal repopulation in blood of hu-BLT mice; however, in nonhuman primates, the early post-transplant clonal expansion patterns differ between blood and organs 29. We performed VIS analysis on bulk cells from bone marrow (BM) and spleens of our hu-BLT mice to investigate whether the tissue/organ clonal expansion pattern differed from blood. We found a very similar clonal expansion pattern (avg. Pearson's r =0.94) between blood and spleen (Figure 3); however, clonal expansion in bone marrow differed from blood and spleen. Interestingly, we observed that in all three tissue compartments, persistent clones contributed the most to repopulation. These results show normal clonal repopulation among the three tissue compartments with substantial clonal sharing.
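
The comparisons reported above rest on Pearson's r between the VIS clonal frequency profiles of two samples (replicates, or blood versus spleen). A minimal sketch of that calculation follows. The clone identifiers and read counts are made up for illustration, and treating a clone that is absent from one sample as having frequency zero is an assumption of this sketch, not something stated in the text.

```python
import math

def relative_frequencies(counts):
    """Convert per-clone sequence counts into relative frequencies."""
    total = sum(counts.values())
    return {vis: c / total for vis, c in counts.items()}

def pearson_r(freq_a, freq_b):
    """Pearson's r between two clonal frequency profiles.

    Assumption: a clone missing from one sample gets frequency 0 there.
    """
    clones = sorted(set(freq_a) | set(freq_b))
    x = [freq_a.get(c, 0.0) for c in clones]
    y = [freq_b.get(c, 0.0) for c in clones]
    n = len(clones)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical VIS read counts for two tissues (not data from this study).
blood = {"chr1:1000": 50, "chr2:2000": 30, "chr3:3000": 20}
spleen = {"chr1:1000": 45, "chr2:2000": 35, "chr3:3000": 15, "chr4:4000": 5}

r = pearson_r(relative_frequencies(blood), relative_frequencies(spleen))
print(f"Pearson's r between blood and spleen profiles: {r:.3f}")
```

Because the correlation is taken over the union of clones, a clone detected in only one tissue lowers r rather than being ignored.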

Influence of genomic location and proximal genes on clonal growth:
For each VIS, our assay provided both relative frequency and genomic location of integration, allowing us to monitor abnormal growth arising from mutagenic insertions. Similar to the HIV-1 integration pattern 17, our VIS data from in vivo repopulating clones showed preference for high gene density chromosomes

Our data show VIS preference for genic regions; other studies using cell lines and primary cells have reported similar vector integration bias for active genes with low to moderate expression 18. However, the transcriptional state and expression level of the VIS-proximal genes are unknown in human FL-HSPC prior to vector integration. To address this, we analyzed transcriptomic (RNA-seq) and functional genomic (ATAC-seq and ChIP-seq) data from uncultured human FL-HSPC 30 isolated and processed using a protocol identical to the one used in our study (see Methods). Owing to the direct biological relevance of human FL-HSPC to our humanized BLT mouse models, the multi-omics data are well suited to investigate the impact of vector integration on stemness of vector-modified HSPC.

The gene expression (RNA-seq) data show that of all detected clones, including the persistent clones and top 10 VIS clones, >77% of VIS are within ±1Kb of transcriptionally active genes (FPKM>1) (Figure 4b outer donut chart). This is significantly higher than the ~27% of random IS proximal to active genes (p< 0.001,

HSPC.
A recent study demonstrated use of CRISPR/Cas to introduce barcodes into long-term HSPC and longitudinally tracked a very limited number of HSPC clones 34. By comparison, using LoVIS-Seq we have tracked ~10 times more HSPC clones per animal with high accuracy and reproducibility. It is pertinent to note that to enable insertion of barcoded donor DNA into the host genome, HSPC need to undergo in vitro preconditioning and incubation before transplantation. The double-stranded breaks introduced by CRISPR/Cas activate DNA damage responses, causing significant delays in HSPC proliferation and affecting their in vivo repopulation 45. Additionally, off-target gene editing by CRISPR/Cas remains a concern. In contrast, LoVIS-Seq does not require preconditioning of HSPC and provides a ready-to-use high-throughput clonal tracking assay for small-animal models. Furthermore, LoVIS-Seq has wider applicability owing to its adaptability to many lentiviral vectors commonly used to insert transgenes or reporter genes such as GFP.

Supplementary Tables 1 and 2).
DNA samples were subjected to extension PCR using LTR-specific biotinylated primers /5BiotinTEG/CTGGCTAACTAGGGAACCCACT 3' and /5BiotinTEG/CAGATCTGAGCCTGGGAGCTC 3'. The extension PCR product was then digested using CviQI and RsaI restriction enzymes, and biotin primer-bound DNA was isolated using streptavidin-agarose Dynabeads and a magnetic separator as per the manufacturer's instructions. The vector-host junctions captured on streptavidin beads were processed by linker-mediated PCR (LM-PCR) as described previously 47,48. The linker-ligated vector-host junction DNA was subjected to two-step PCR. First-step amplification was done using primer 5' CTGGCTAACTAGGGAACCCACT 3' and first linker primer GTGTCACACCTGGAGATAT. We removed the internal vector sequence by restriction enzyme (SfoI) digestion.
The digested product of the first PCR was then amplified using primer 5'ACTCTGGTAACTAGAGATCC 3' and second linker primer 5' GGAGATATGATGCGGGATC 3'. Since the LTR index sequence is included in the vector-host junction, we obtain unbiased amplification of all the H1, H5 and/or WT VIS sequences. Lentiviral vectors used in this study are derived from the FG12-mCherry lentiviral vector 46 and all the primers are designed accordingly. A detailed protocol for the VIS assay is provided in the supplementary text. The amplicon libraries were prepared using custom-made Illumina sequencing primers for the Illumina MiSeq (m860 samples) or iSeq100 (m599, m599, and m591 samples) sequencer. Sequences with a virus-host junction with the 3' end LTR, including both the 3'-end U5 LTR DNA and ≥25 bases of host DNA (with ≥95% homology to the human genome), were considered true VIS read-outs. Sequence mapping and counting were performed as described previously 8. In brief, sequences that matched the 3'-end LTR sequence joined to genomic DNA as well as LTR-indexes (H1, H5 or WT) were identified using a modified version of the SSW library in C++ 49. Reads were classified as H1, H5, or WT VIS based on the LTR barcodes used in the experiment. VIS sequences were mapped onto the human genome (version hg38 downloaded from https://genome.ucsc.edu/) using Burrows-Wheeler Aligner (BWA) software. Mapped genomic regions were then used as reference and VIS reads were remapped using BLAST to further remove poorly mapped reads and obtain an accurate estimate of sequence count. Final VIS counting was done after correcting for VIS collision events and signal crossover as described previously. VIS with a final sequence count less than the total number of samples analyzed per animal were removed.
VIS clones with maximum frequency values below the 1st quartile were classified as "low frequency", clones with maximum frequency values above the 3rd quartile were classified as "high frequency", and clones with maximum frequency between the 1st and 3rd quartiles were designated "medium frequency". VIS clones that were detected with frequency >0 at every week from 13-19 are termed "persistent clones". The 10 highest-frequency VIS clones at each timepoint were selected as top 10 VIS. All the VIS data and the list of VIS-proximal genes are provided in the supplementary file.
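
The clone categories defined above can be illustrated with a short sketch. The clone names and frequencies are toy values (a subset of clones, so columns need not sum to 1), and the quartile estimator (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption, since the text does not say how quartiles were computed.

```python
from statistics import quantiles

def classify_clones(freqs):
    """Label each clone low/medium/high by its maximum frequency over time."""
    max_freq = {clone: max(values) for clone, values in freqs.items()}
    # Assumption: inclusive quartile estimator; the source does not specify one.
    q1, _, q3 = quantiles(list(max_freq.values()), n=4, method="inclusive")
    labels = {}
    for clone, m in max_freq.items():
        if m < q1:
            labels[clone] = "low frequency"
        elif m > q3:
            labels[clone] = "high frequency"
        else:
            labels[clone] = "medium frequency"
    return labels

def persistent_clones(freqs):
    """Clones detected with frequency > 0 at every sampled timepoint."""
    return {clone for clone, values in freqs.items() if all(f > 0 for f in values)}

def top_n(freqs, timepoint_index, n=10):
    """The n highest-frequency clones detected at one timepoint."""
    ranked = sorted(freqs, key=lambda c: freqs[c][timepoint_index], reverse=True)
    return [c for c in ranked[:n] if freqs[c][timepoint_index] > 0]

# Toy frequencies at weeks 13, 15, 17, and 19 (hypothetical values).
freqs = {
    "A": [0.40, 0.35, 0.30, 0.32],
    "B": [0.30, 0.30, 0.35, 0.30],
    "C": [0.20, 0.25, 0.20, 0.25],
    "D": [0.10, 0.00, 0.10, 0.10],
    "E": [0.00, 0.05, 0.05, 0.05],
}

labels = classify_clones(freqs)
persistent = persistent_clones(freqs)
top2_week19 = top_n(freqs, timepoint_index=3, n=2)
print(labels, persistent, top2_week19)
```

Clones whose maximum frequency equals a quartile boundary fall into the medium class here, following the strict "below" and "above" wording of the definition.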

Random integration sites
Random integration sites were generated in silico using a custom Python script. To mimic our VIS assay, we randomly selected 1000 integration sites that were within ±1500bp of the nearest CviQI/RsaI (GTAC) site in the human genome (hg38).
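
The custom script itself is not reproduced in the text. The following is only a sketch of one way to draw sites under the stated constraint. It assumes rejection sampling of uniformly drawn positions and runs on a short random toy sequence rather than hg38; the function names, the seed, and the toy sequence are all hypothetical.

```python
import bisect
import random

def find_sites(seq, motif="GTAC"):
    """Return sorted start positions of every occurrence of the motif."""
    sites = []
    i = seq.find(motif)
    while i != -1:
        sites.append(i)
        i = seq.find(motif, i + 1)
    return sites

def random_integration_sites(seq, n_sites, max_dist=1500, motif="GTAC", seed=0):
    """Draw positions uniformly and keep those near a motif site.

    Assumption: simple rejection sampling; the original script may differ.
    """
    rng = random.Random(seed)
    sites = find_sites(seq, motif)
    if not sites:
        raise ValueError("sequence contains no motif site")
    picked = []
    while len(picked) < n_sites:
        pos = rng.randrange(len(seq))
        k = bisect.bisect_left(sites, pos)
        # Only the neighbours on either side of pos can be the nearest site.
        nearest = min(abs(pos - s) for s in sites[max(k - 1, 0):k + 1])
        if nearest <= max_dist:
            picked.append(pos)
    return picked

# Toy stand-in for a chromosome sequence (not hg38).
toy_rng = random.Random(1)
toy_genome = "".join(toy_rng.choice("ACGT") for _ in range(20000))

picked = random_integration_sites(toy_genome, n_sites=1000)
print(len(picked), "random integration sites drawn")
```

On a real genome the same idea would be applied per chromosome, with chromosomes weighted by length so that positions remain uniformly distributed genome-wide.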

Clonal diversity analysis
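For diversity analysis, we used Rényi's diversity/entropy 26 of order $\alpha$, defined as follows:

$$H_\alpha = \frac{1}{1-\alpha}\,\ln\left(\sum_{i=1}^{n} p_i^{\alpha}\right)$$

where $p_i$ is the relative frequency of the $i$-th VIS clone and $n$ is the number of clones; the values at $\alpha = 1$ (taken as the limit $\alpha \to 1$) and $\alpha = 2$ correspond to the Shannon and 1/Simpson indexes, respectively. We calculated Rényi's diversity using the R package BiodiversityR (https://cran.r-project.org/web/packages/BiodiversityR/index.html). For the above analysis, we used raw sequence counts from two replicates without distinguishing between mCherry-H5 VIS and EGFP-WT VIS.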
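
For readers who want to see the calculation spelled out, the sketch below computes the Rényi entropy $H_\alpha = \frac{1}{1-\alpha}\ln\sum_i p_i^\alpha$ from raw clone counts, using the Shannon entropy as the $\alpha \to 1$ limit. It is an independent Python illustration with made-up counts, not the BiodiversityR code used in the study.

```python
import math

def renyi_entropy(counts, alpha):
    """Rényi entropy of order alpha from raw per-clone sequence counts."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if alpha == 1:
        # Limit alpha -> 1 gives the Shannon entropy.
        return -sum(x * math.log(x) for x in p)
    return math.log(sum(x ** alpha for x in p)) / (1 - alpha)

# Hypothetical sequence counts for four VIS clones.
counts = [50, 30, 15, 5]

profile = {alpha: renyi_entropy(counts, alpha) for alpha in (0, 1, 2)}
for alpha, value in profile.items():
    print(f"alpha={alpha}: H={value:.4f}")
```

At $\alpha = 0$ the value is the log of the number of detected clones, and at $\alpha = 2$ it is the log of the inverse Simpson index, so the profile gives progressively more weight to dominant clones as $\alpha$ increases.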

RNA-seq data analysis
Raw sequence data of uncultured FL-HSPC (in triplicate) were pre-processed for quality using FastQC. Trimmomatic was used to remove adaptors and for quality trimming. After this, reads were aligned onto human genome hg38 using the RNA STAR aligner 50. SAMtools was used to remove reads with low mapping scores (< 20)

ATAC-seq analysis
Raw sequence data of uncultured FL-HSPC (in triplicate) were pre-processed for quality using FastQC. Adaptor removal and quality trimming were done using Trimmomatic. After this, reads were mapped onto
remove reads with low mapping (<20) scores, blacklisted regions 52, and to generate BAM files. The Picard toolkit was used to remove duplicate reads. We used Genrich, a paired-end peak caller, to identify ATAC peaks. deepTools 53 was used to generate coverage (.bw) files and for visualization of open DNA in genes and VIS-proximal regions.

ChIP-seq data analysis
Raw sequence data of uncultured FL-HSPC for histones, RNApolII, and input were pre-processed for quality using FastQC. Trimmomatic was used to remove adaptors and for quality trimming. After this, reads were mapped onto human genome hg38 using Bowtie2 with parameter --local. SAMtools was used to remove reads with low mapping (<20) scores, blacklisted regions 52, and to convert SAM to BAM format. The Picard toolkit was used to remove duplicates. MACS2 was used to call peaks for all histone marks and RNApolII using the input sample as control. deepTools 53 was used to generate coverage (.bw) files and for visualization of histone/RNApolII in genes and VIS-proximal regions.