Mechanisms regulating interindividual epigenetic variability at transposable elements

Transposable elements (TEs) are mobile genetic elements that make up a large fraction of mammalian genomes. While select TEs have been co-opted in host genomes to have function, the vast majority of these elements are epigenetically silenced in somatic cells. However, some TEs in the mouse genome, including the Intracisternal A-particle (IAP) subfamily of retrotransposons, have been shown to display interindividual variation in DNA methylation. While recent work has identified that IAP sequence differences and strain-specific KRAB zinc finger proteins (KZFPs) may influence the methylation state of these IAPs, the mechanisms underlying the establishment and maintenance of interindividual variability in DNA methylation still remain unclear. Here we report that sequence divergence and genomic context influence the likelihood that IAPs become variably methylated. IAPs that have diverged from the consensus IAP sequence have altered KZFP recruitment that can lead to decreased KAP1 recruitment when in a euchromatic environment. These variably methylated loci have a high CpG density and are bound by ZF-CxxC proteins, providing a potential mechanism to maintain this permissive chromatin environment and protect from DNA methylation. These observations indicate that variably methylated IAPs escape silencing through both attenuation of KZFP binding and recognition by CxxC-containing proteins to maintain a hypomethylated state.


Background
Sequences derived from transposable elements (TEs) make up a large fraction of mammalian genomes (Bourque et al., 2018). TEs are silenced by both histone modifications and DNA methylation to prevent these elements from having any deleterious impact on the host genome.
These loci have been reported to be sensitive to environmental stimuli such as diet and exposure to toxins (Dolinoy et al., 2007b;Morgan et al., 1999;Rosenfeld et al., 2013;Waterland et al., 2007;Waterland and Jirtle, 2003). The most studied metastable epiallele is found in the Agouti viable yellow mouse, where genetically identical mice display phenotypic diversity that manifests as a range of coat colors and covariant susceptibility to obesity (Bernal et al., 2011;Rakyan et al., 2002). This variable coat color has been linked to the methylation state of an Intracisternal Aparticle (IAP) retrotransposon (Dolinoy et al., 2010). When hypomethylated, this IAP acts as a novel promoter element that drives aberrant transcription of the agouti gene across all tissues, and subsequently, yellow coat color, hyperphagia and hyperinsulinemia. Elucidating the mechanisms underlying the establishment and maintenance of these variably methylated TEs is important to further our understanding of non-Mendelian phenotypic variability and epigenetic plasticity (Ecker et al., 2018(Ecker et al., , 2017Tejedor and Fraga, 2017).
Recent work profiling interindividual epigenetic variability in mice identified roughly 50 loci that are variably methylated in a manner similar to the IAP that drives the agouti mouse phenotype . They identified putative metastable epialleles as TEs that displayed 25% methylation variation between the second-highest and second-lowest average methylation level between biological replicates using the WGBS-seq datasets from purified immune cells in mice . These metastable epialleles were later confirmed by targeted profiling in multiple tissues, indicating that they are true metastable epialleles . The majority of these variably methylated loci are IAP retrotransposons Faulk et al., 2013;Kazachenka et al., 2018). IAPs are silenced in a sequence-specific manner by KRAB-domain containing zinc finger proteins (KZFPs) (Coluccio et al., 2018). KZFPs recruit KAP1, also known as TRIM28, which, in turn, recruits repressive protein complexes to deposit H3K9me3 and DNA methylation at these loci (Bulut-Karslioglu et al., 2014;Ecco et al., 2017;Schultz et al., 2002;Turelli et al., 2014). These KZFPs evolve in clusters through imperfect gene duplication and spontaneous mutations that can allow for the recognition of novel TEs Kauzlaric et al., 2017). Over time, TEs can mutate and escape from KZFP targeted silencing Kauzlaric et al., 2017). It has been proposed that these KZFPs are engaged in an evolutionary arms race with TEs to maintain their silencing (Jacobs et al., 2014).
IAPs have an unusually high CpG content that is more similar to CpG islands (CGIs) than to the rest of the mouse genome Kazachenka et al., 2018). CGIs are generally hypomethylated and bound by ZF-CxxC proteins, which help maintain a permissive chromatin environment (Clouaire et al., 2012;Gu et al., 2018;Long et al., 2013;van de Lagemaat et al., 2018). Notable CxxC domain containing proteins include CFP1, which is associated with protein complexes that deposit H3K4me3, as well as TET1/TET3, which are responsible for the active removal of methylation from CpG dinucleotides to maintain a hypomethylated state (Gu et al., 2018;Long et al., 2013). Mammalian genomes in general have an underrepresentation of CpG dinucleotides as a result of deamination at methylated cytosines over evolutionary time (Duncan and Miller, 1980;Lewis et al., 2016;Shen et al., 1994). The "richness" of CpG dinucleotides at IAPs is potentially due to the fact these elements are recently evolved TEs that have yet to undergo 'evolutionary' deamination at methylated CpG dinucleotides. It has been shown that recently evolved CpG dense TEs have the potential to be hypomethylated when there is no targeted suppression of these elements (Jacobs et al., 2014;Long et al., 2016;Ward et al., 2013).
Based on this evidence, we postulated that variably methylated CpG dense IAPs can have sequence variants that allow for escape from KZFP-mediated silencing (Bertozzi et al., 2021;Faulk et al., 2013;Kazachenka et al., 2018) and subsequently be bound by ZF-CxxC proteins, thereby preventing stable repression. Focusing on the IAPLTR1_Mm subfamily, we performed multiple sequence alignment and hierarchical clustering of these elements and confirmed that the VM-IAPs have a unique sequence that is distinct from other IAPLTRs. These 'divergent' IAPLTRs have altered KZFP binding and diminished KAP1 recruitment. Relative to silenced IAPs -that are found in gene poor regions or proximal to tissue specific transcriptswe find that VM-IAPs are over-represented proximal to constitutively expressed genes. Importantly, we find that CpG dense TEs (regardless of subfamily) have increased recruitment of ZF-CxxC proteins such as CFP1 and TET1 in the absence of KZFP-mediated silencing. This work indicates that it is binding-site competition between repressive (KZFP/KAP1) and protective (ZF-CxxC) factors that leads to the establishment of variable methylation at TEs.

IAPLTR sequence influences KZFP recruitment and VM-IAPs establishment
Recent work has identified novel VM loci, which are largely IAPLTR1_Mms (or IAPLTR1s) and IAPLTR2_Mms (or IAPLTR2s), across tissues in C57BL/6J mice . As KZFPs are responsible for silencing IAP elements in a sequence-specific manner (Coluccio et al., 2018), we examined if the previously identified VM-IAPs have shared sequence variants that could allow for escape from KZFP-mediated silencing. We performed multiple sequence alignment and hierarchical clustering of all IAPLTR1s and IAPLTR2s greater than 300 bps long using the ete3 pipeline (Huerta-Cepas et al., 2016). This profiling resulted in the characterization of four sequence 'clades' for IAPLTR1s ( Figure 1A and Figure Faulk et al., 2013;Kazachenka et al., 2018), these results demonstrate that sequence can be a contributing factor in the establishment of VM-IAPs.
To determine whether these identified IAPLTR clades have altered KZFP recruitment, we used published KZFP ChIP-seq libraries (31) to survey the occupancy of KZFPs at IAPLTRs. We identified that IAPLTR1 elements that belong to clade 1 have evidence of binding by ZFP989 and Gm21082, while IAPLTR1 elements that belong to clade 2 have evidence of binding by ZFP429 ( Figure 1A). However, IAPLTR1 clade 3 and clade 4 elements do not show evidence of binding by these KZFPs and have decreased KAP1 binding ( Figure 1A

Chromatin environment can influence establishment of a VM-IAP
While it is clear that sequence is a factor in the establishment of variable methylation, not all IAPLTRs with sequences identical to VM-IAPs are variably methylated ( Figure 1A), as has been reported previously (Kazachenka et al., 2018). We therefore examined the potential influence of genomic context, such as being in a euchromatic environment, on the establishment of variable methylation. We used proximity of the IAP to constitutively expressed genes (FPKM > 5 in all tissues profiled (Li et al., 2017)) and ENCODE annotated enhancer elements (ENCODE Project Consortium et al., 2020) as our proxy for a euchromatic environment. We found that 83% of the VM-IAPLTR1s were within 50kb of a constitutively expressed gene, while the remaining 17% of VM-IAPLTR1s were less than 1kb from an annotated enhancer element. In contrast, only 12% of the non-VM IAPs were proximal to a constitutively expressed gene and 7% were proximal to an enhancer ( Figure 1C). This indicates that while sequence allows these elements to have the potential to become variably methylated, VM-IAPs must be in a permissive chromatin environment to escape being silenced, as heterochromatin spreading from neighboring loci could silence these elements.

IAPEz-int sequence variants influence KZFP recruitment
As previously reported, 93% of VM-IAPLTR1s are full length IAPs, while 83% of the VM- As the majority of IAPLTR2s exist as solo LTRs this indicates that these elements can be repressed through binding of KZFPs to the IAP internal sequence. The 5' end of IAP internal sequence IAPEz-int, which is flanked by IAPLTR1s and IAPLTR2s, has been shown to be bound by the KZFP Gm14419 (Wolf et al., 2020). In order to investigate the potential of sequence variants in the IAP internal element influencing variable methylation, we performed multiple sequence alignments and hierarchical clustering of the first 150 bps of IAPEz-ints that are flanked by IAPLTR1 or IAPLTR2 elements. We identified three sequence variants and observed that IAPEz-int clade3 elements have the weakest Gm14419 binding signal (

Conservation of IAPLTR1 elements between mouse strains
IAP elements are a potential source of novelty between mouse strains and VM-IAPs in particular have been shown to be highly polymorphic (Gagnier et al., 2019;Kazachenka et al., 2018;Nellåker et al., 2012;Rebollo et al., 2020). In order to investigate the relationship between IAP sequence, potential for variable methylation and conservation across strains, we leveraged structural variant differences between mouse strains identified by the Sanger mouse genome project (Keane et al., 2011). We found that clade 3-4 IAPLTR1s, which have the lowest KAP1 binding of the IAPLTR1 clades ( Figure 1B), are the least conserved across mouse strains ( Figure   2C). We furthermore found that clade1 IAPLTR1 elements containing the clade 3 IAPEz-int elements are more strain specific than those elements with the clade 2 IAPEz-ints ( Figure 2C).

IAPs that have loss of KZFP binding have recruitment of the ZF-CxxC containing proteins TET1 and CFP1.
It has previously been reported that IAPLTR1s and IAPLTR2s have a significantly higher CpG density than the genomic average Kazachenka et al., 2018). Given the prevalence of variably methylation at these elements, we wanted to assess the relationship between CpG content and variable methylation directly. Utilizing a previously identified list of all VM-TEs in C57BL/6J mice (Kazachenka et al., 2018), we observed that TE subfamilies with a higher average CpG density had a larger percentage of elements that are variably methylated, with IAPLTRs being the most CpG dense and having the largest percentage of elements being variably methylated ( Figure 3A). The CpG content of VM-IAPLTRs, however, is similar to non-VM-IAPLTRs, indicating that high CpG content alone is not sufficient to induce variable methylation ( Figure 3B). As CpG dense loci are capable of being recognized and bound by ZF-CxxC proteins (Clouaire et al., 2012;Gu et al., 2018;Long et al., 2013;van de Lagemaat et al., 2018), we utilized a publicly available ChIP-seq dataset for the ZF-CxxC-domain containing protein TET1 in ES cells (Gu et al., 2018) to examine ZF-CxxC binding across all IAPLTR1s. We identified increased TET1 binding at clade3 IAPLTR1s elements, which is enriched for VM-IAPs ( Figure 3C). Additionally, as the ZF-CxxC-domain containing protein CFP1 is a subunit of the SET1 complex responsible for methylating H3K4, we examined available H3K4me3 ChIP-seq data from B cells  to determine if H3K4me3 is enriched at VM-IAPs. We found that VM-IAPs were more likely to have H3K4me3 signal compared to non-variable IAPs To determine if recently evolved CpG-dense TEs are also variably methylated in humans, we used a previously identified list of loci that display interindividual epigenetic variation in humans (Gunasekara et al., 2019). Profiling the observed distribution of variably methylated loci in humans compared to a random sampling of the genome, we observed that recently evolved TEs were overrepresented in the variably methylated dataset ( Figure 4A). TE subfamily ages were obtained from DFAM (Hubley et al., 2016). We also found a correlation between extent of variable methylation and the average CpG density for in each TE subfamily, with the highly CpG dense transposable element subfamily LTR12C having both one of the highest CpG densities and one of highest percentage of elements that are variably methylated ( Figure 4B and Figure 4 -figure supplement 1A). LTR12Cs are recently evolved TEs that have previously been shown to have latent regulatory potential (Ward et al., 2013) and can act as promoter elements in cancer (Babaian and Mager, 2016). Together our results highlight the potentially conserved importance of high CpG density in the establishment of variable methylation.

CpG dense TEs are hypomethylated and recruit ZF-CxxC proteins in the absence of KZFPmediated silencing
To further investigate the relationship between KZFP/KAP1 silencing, DNA methylation, and ZF-CxxC binding to CpG dense TEs, we profiled DNA methylation and ZF-CxxC binding in Tc1 mouse livers. This "transchromosomic" mouse carries the majority of human chromosome 21 (Hsa21) possesses human specific transposable elements without corresponding KZFPs (O'Doherty et al., 2005) and has previously been used to identify the latent regulatory potential of recently evolved TEs (Jacobs et al., 2014;Long et al., 2016;Ward et al., 2013). We performed nanopore sequencing to profile DNA methylation of human chromosome 21 in Tc1 mouse liver and compared this with existing DNA methylation data from HepG2 cells (WGBS from ENCODE, ENCSR881XOU (Davis et al., 2018)) ( Figure 5A,B). We found that TE-derived CGIs were hypomethylated in the absence of KZFP mediated silencing, consistent with previous reports (Jacobs et al., 2014;Long et al., 2016;Ward et al., 2013). In contrast, TE-derived CGIs that can be targeted by KZFPs shared between mice and humans were largely methylated ( Figure 5figure supplement 1A). These silenced TE-derived CGIs are largely Alu elements that are derived from the same ancestral sequence as mouse B2 SINE elements and can be silenced by the KZFP ZNF182, which is present in both mice and humans . To profile ZF-CxxC occupancy at these hypomethylated TEs, we performed CUT&RUN for CFP1 in Tc1 mice and compared this with existing CFP1 ChIP-seq data from human erythroblasts (van de Lagemaat et al., 2018). We observed that CpG dense TEs that are not targeted by KZFPs displayed CFP1 recruitment, while the silenced TEs, regardless of CpG content, were largely unbound ( Figure   5C). This CFP1 binding was also unique to the Tc1 mice as there was no CFP1 binding at these loci in humans ( Figure 5D), while non-TE CGIs in both mice and humans have CFP1 binding ( Figure 5 -figure supplement 1B). Together this work demonstrates that in the absence of targeted KZFP mediated silencing, TEs with high CpG content can be hypomethylated and bound by ZF-CxxC proteins.

Trim28 haploinsufficiency activates evolutionary young and CpG dense TEs
As Trim28/KAP1 is a key protein involved in the establishment of heterochromatin at TEs, we profiled Trim28 haploinsufficient mice to determine if decreased abundance of Trim28 allows for CpG dense TEs to escape silencing. Trim28 haploinsufficiency has previously been shown to impact the Agouti viable yellow mouse coat color (Daxinger et al., 2016) and drive a bimodal obesity phenotype in FVB/NJ mice (Dalgaard et al., 2016). We performed both H3K4me3 ChIPseq and CFP1 CUT&RUN in the livers of Trim28 haploinsufficient (Trim28 +/D9 ) mice and their wild type littermates (Methods; Figure 6A). We identified 69 loci that had novel H3K4me3 signal in the Trim28 +/D9 mice compared to control (Methods; Figure 6 -figure supplement 1A). The majority of these loci with novel H3K4me3 recruitment were recently evolved transposable elements ( Figure   6B). These loci also had a significant increase in CFP1 signal in Trim28 +/D9 mice relative to control mice ( Figure 6C and Figure 6 -figure supplement 1B). Additionally, we profiled changes in expression of TE subfamilies in the Trim28 +/D9 mice compared to wild type mice using our previously generated RNA-seq datasets (Dalgaard et al., 2016). We identified that TEs which have a significant increase in gene expression in the Trim28 +/D9 mice compared to wild type mice have significantly higher CpG density than expected when compared to non-responsive TEs ( Figure 6D). Together this work further demonstrates that loss of KAP1 mediated silencing allows for CpG dense transposable elements to be bound by ZF-CxxC proteins and escape targeted silencing.

Discussion
This work provides the foundation for a deeper understanding of how mammalian genomes are shaped during evolution. It has been proposed that genomes are locked in an "evolutionary arms race" between endogenous retrovirus and KZFPs to maintain proper silencing of the genome. It appears as though the variable methylation observed at IAPs could be the result of the IAPs winning this "arms race". We observed that VM-IAPs have unique sequence variants that allow them to have diminished KZFP recruitment. This provides an explanation as to why previous works stated there is a sequence bias for variably methylated IAPs in mice Faulk et al., 2013;Kazachenka et al., 2018). We also found that IAPs with the same sequence variants as the VM-IAPs are more likely to be polymorphic between individual mouse strains.
Interestingly, we find that the IAPLTRs flanking the "master" IAP that has been proposed to be responsible for an expansion of IAPs in C3H/HeJ mice (Rebollo et al., 2020) has the same sequence as the clade 3 IAPLTR1s (Figure 7 -figure supplement 1). Together this implies that these VM-IAPs may be more active in certain mouse strains and may be propagating due to a lack of repression.
Our work also highlights the importance of CpG density to the activation of the endogenous retroviruses. We find that ZF-CxxC recruitment can play a role in the protection from DNA methylation at CpG-dense TEs, while CpG poor TEs remain silenced. Additionally, we observed that recently evolved CpG-dense TEs were most likely to be variably methylated in humans. In the absence of targeted silencing, these CpG dense TEs have the potential to become hypomethylated and recruit CxxC domain containing proteins. These results illustrate how deamination of methylated loci could be a mechanism for driving permanent silencing of transposable elements over time. Overall, this work profiles the mechanisms involved in regulating variable methylation in mammalian genomes.
Based on our results, we propose a model that VM-IAP loci are established as a result of partial KZFP mediated silencing at CpG dense TEs, which are protected from silencing through recruitment of ZF-CxxC containing proteins such as TET1 (Figure 7). This potentially results in a stochastically silenced locus, which is then inherited by all tissues during development. Together our work has significant implications in furthering our understanding of how these recently evolved transposable elements can alter both the genomes and epigenomes of mammals.

Multiple sequence alignment and hierarchical clustering
IAPLTR1_Mm, IAPLTR2_Mm, and IAPEz-int elements were defined by RepeatMasker (SMIT and A., 2004) in the mm10 genome. Intact elements were selected as elements at least 75% the length of the consensus IAP. IAPs were processed using the ete3 pipeline (Huerta-Cepas et al., 2016) using mafft with default settings (Katoh and Standley, 2013), and bases that were found in less than 10% of the samples were trimmed from the multiple sequence alignment (Capella-Gutiérrez et al., 2009). The multiple sequence alignments were then hierarchically clustered using phyml (Guindon et al., 2010). Sequence clades were empirically selected. For IAPEz-ints, only IAPEz-int elements flanked by either IAPLTR1_Mm or IAPLTR2_Mm elements were selected for profiling. The first 150bps of the 5' end of the IAPEz-int were extracted due and processed using the ete3 pipeline as previously stated.
Heatmaps across the IAP multiple sequence alignments by using a custom script to find the read coverage at each base and were visualized using pheatmap https://cran.rproject.org/package=pheatmap. Heatmaps and aggregate plots of loci that extend beyond the IAP element were generated using deeptools (Ramírez et al., 2016). Regions of enrichment were called using MACS2 callpeaks (Zhang et al., 2008) with a q-val threshold of 0.001.

Chromatin context profiling
We used proximity to constitutively expressed genes or annotated enhancer elements as a proxy for profiling a euchromatic environment. Constitutive expressed genes were identified as genes that have a fragments per kilobase of transcript per million mapped reads (FPKM) greater than 5 in more than 90% of the samples profiled in the mouse gTEX project (Li et al., 2017), while enhancers were all ENCODE annotated enhancer elements (ENCODE Project Consortium et al., 2020).

Polymorphism analysis between mouse strains
Coordinates of the structural variants between mouse strains were obtained from the Sanger mouse genome project (Keane et al., 2011). Deletions between mouse strains were extracted for each of the profiled mouse strains. Bedtools intersect (Quinlan and Hall, 2010) with the -f 1 option was used to determine if an IAP was fully deleted in each mouse strain relative to C57BL/6J.
Phylogeny of the mouse strains was obtained from previous work (Nellåker et al., 2012). For branches where multiple mouse strains are similarly diverged from C57BL6/J, the IAP was considered to be present if the IAP existed in any of profiled mice at that branch.

MinION sequencing and alignment
Livers from B6129S-Tc(HSA21)1TybEmcf/J mice were purchased from JAX Laboratory. Purified DNA from one Tc1 mouse liver sample was prepared using the SQK-LSK109 ligation sequencing kit and run on R9 flow cells. 1ug of DNA was end repaired with NEBNext FFPE DNA repair (NEB) and Ultra II End-prep (NEB). The repaired DNA was cleaned with KAPA Pure Beads (KAPA).
Adapters were ligated using NEBNext quick T4 ligase (NEB). Prepared libraries were mixed with Sequencing Buffer from the SQK-LSK109 kit and approximately 200ng were loaded onto the MinION flowcell. Reads were aligned using bwa mem with the options -x ont2d -t 100 (Li, 2013) to a custom assembly that included all of the mm10 genome and human chromosome 21, and then sorted using samtools sort (Li et al., 2009). Reads were processed using minimap2 with the options -a -x map-ont (Li, 2018). Methylation state of CpGs was called using nanopolish with the options call-methylation -t 100 (Loman et al., 2015). Only loci with greater than 5x coverage were considered in the analysis. CpG islands were obtained from the UCSC table browser (Gardiner-Garden and Frommer, 1987;Karolchik et al., 2004). Methylation percentage was averaged across CpG islands.

CFP1 CUT&RUN sequencing and alignment
CUT&RUN was performed as previously described (Skene and Henikoff, 2017) using the Millipore CFP1 monoclonal antibody (ABE211). Briefly, unfixed permeabilized cells are incubated with the CFP1 antibody fused to A-Micrococcal Nuclease. Fragmented DNA was isolated and sequenced on an Illumina HiSeq 2500 System. We assessed standard QC measures on the FASTQ file using FASTQC (Andrews and Others, 2010) and adapters were trimmed using Trimgalore ("Babraham Bioinformatics -Trim Galore!," n.d.; Martin, 2011). Trimmed reads were aligned to either the mm10 genome or a custom assembly that included both the mm10 genome and human chromosome 21 using bowtie1 retaining only reads that could be mapped to unambiguously to a single loci using the -m 1 option (Langmead et al., 2009). Aligned reads were sorted using samtools (Li et al., 2009) and filtered for duplicate reads using the MarkDuplicates function of Picardtools http://broadinstitute.github.io/picard/. Regions of histone enrichment were called using MACS2 callpeaks (Zhang et al., 2008) with a q-val threshold of 0.001. Only peaks found in two animals were retained to remove biological noise. Additionally, due to small fragment size, loci that could not be mapped unambiguously by 90 bp fragments were removed from consideration using the deadzones function from RSEG (Song and Smith, 2011).

Animals
The generation of Trim28+/D9 has been described elsewhere (Blewitt et al., 2005). Animals were kept on a 12 hr light/dark cycle with free access to food and water and housed in accordance with international guidelines. Livers were isolated at 2 weeks of age and immediately snap-frozen for further processing.

ChIP-seq analysis of H3K4me3 ChIP-seq
Chromatin immunoprecipitation was performed with an H3K4me3 antibody (ab8580, Abcam) as previously described (Leung et al., 2013). Isolated DNA was sequenced on an Illumina HiSeq 2500 System. We assessed standard QC measures on the FASTQ file using FASTQC (Andrews and Others, 2010) and adapters were trimmed using Trimgalore ("Babraham Bioinformatics -Trim Galore!," n.d.; Martin, 2011). Reads were aligned to the mm10 genome using bowtie1 retaining only reads that could be mapped to unambiguously to a single locus using the -m 1 option (Langmead et al., 2009). Aligned reads were sorted using samtools (Li et al., 2009) (Quinlan and Hall, 2010). BigWigs were generated using the UCSC tool bedGraphToBigWig (Karolchik et al., 2004). Regions of enrichment were called using MACS2 callpeaks (Zhang et al., 2008) with a q-val threshold of 0.001. Loci that displayed enrichment of H3K4me3 in at least two animals were retained to remove biological noise. The loci with increased H3K4me3 as shown in Fig 6B-C were identified by finding peaks present in Trim28 +/D9 mice that were absent from WT mice and also had 3x more H3K4me3 signal in Trim28 +/D9 mice compared to WT. Heatmaps and aggregate plots were generated using deeptools (Ramírez et al., 2016).

Differential expression analysis of repetitive element subfamilies.
RNA-seq data was downloaded from the european nucleotide archive. Adapters were trimmed using Trimgalore ("Babraham Bioinformatics -Trim Galore!," n.d.; Martin, 2011). Trimmed reads were aligned to the mm10 genome using Bowtie2 (Langmead and Salzberg, 2012), and further processed using the RepEnrich2 package (Criscione et al., 2014). This package totals the number of reads mapping to each repetitive element subfamily. Differentially expressed transcripts were identified using DEseq2 (Love et al., 2014) as with a p-val less than 0.05 between Trim28 +/D9 and wild type mice.

Funding and Acknowledgements
Funding was provided by the National Institutes of Health, grants R01DK112041 and             and loss of KZFP binding have the potential to recruit ZF-CxxC proteins to protect these TEs from being silenced. However elements with high CpG density but strong KZFP recruitment or loss of KZFP binding and low CpG density will remain methylated.
Figure 7 -Supplemental 1: Comparison of the IAPLTR associated with the "master" IAP element identified in the C3H mice compared to the consensus sequence for each clade. Only a section of the IAPLTR that contains identifiable sequence variants for each clade is shown.