Coupled protein synthesis and ribosome-guided piRNA processing on mRNAs

PIWI-interacting small RNAs (piRNAs) protect the germline genome and are essential for fertility. piRNAs originate from transposable element (TE) RNAs, long non-coding RNAs, or 3´ untranslated regions (3´UTRs) of protein-coding messenger genes, with the last being the least characterized of the three piRNA classes. Here, we demonstrate that the precursors of 3´UTR piRNAs are full-length mRNAs and that post-termination 80S ribosomes guide piRNA production on 3´UTRs in mice and chickens. At the pachytene stage, when other co-translational RNA surveillance pathways are sequestered, piRNA biogenesis degrades mRNAs right after pioneer rounds of translation and fine-tunes protein production from mRNAs. Although 3´UTR piRNA precursor mRNAs code for distinct proteins in mice and chickens, they all harbor embedded TEs and produce piRNAs that cleave TEs. Altogether, we discover a function of the piRNA pathway in fine-tuning protein production and reveal a conserved piRNA biogenesis mechanism that recognizes translating RNAs in amniotes.

Compared to our understanding of the piRNA pathway in fruit flies, it remains unclear what marks a transcript for piRNA processing in mammals. In fruit flies, piRNA loci in germ cells are epigenetically marked by the heterochromatin-bound factor rhino [30][31][32]. piRNA precursors are derived from promoter-independent, unspliced cryptic transcripts in highly repetitive regions that harbor diverse TEs 33,34. piRNA precursors are then directly channeled for piRNA processing on perinuclear RNA granules that are proximal to nuclear pores 35. Maternally deposited piRNAs further recognize the piRNA precursors post-transcriptionally to trigger a cascade of piRNA processing 36,37. Unlike in fruit flies, mouse germ cells are induced from somatic cells 38,39, making parental piRNA deposition unlikely. The majority of mammalian piRNAs come from 5´ capped and 3´ polyadenylated, long, continuous, non-coding RNAs (lncRNAs) that are depleted of TEs.
The transcription factor A-MYB directs the expression of these lncRNAs from euchromatin regions without identifiable epigenetic markers distinguishing mouse piRNA loci, along with mRNAs that do not produce piRNAs, during pachynema (the pachytene stage of meiosis) 8,40. Although these precursors are annotated as lncRNAs, we recently demonstrated that, rather than being directly channeled to RNA granules for processing, their upstream open reading frames (uORFs) are translated by ribosomes. Thus, the transcription and translation of mammalian piRNA precursors use conventional machineries, leaving the features that discriminate them from other lncRNAs and mRNAs elusive.
In eukaryotes, translation-dependent RNA quality control mechanisms identify and degrade aberrant transcripts that carry a premature termination codon (nonsense-mediated decay, NMD), lack an in-frame stop codon (non-stop decay), or contain an elongation-inhibitory structure (no-go decay). These quality control pathways are coupled with translation and facilitate the recycling of the ribosomes trapped on the faulty transcripts [41][42][43][44]. The finding of ribosome-guided piRNA biogenesis downstream of uORFs on lncRNAs 45 raises the possibility that aberrant translation intermediates could act as substrates for piRNA biogenesis. It nevertheless remains unclear how 80S ribosomes assemble on the non-coding regions of piRNA precursors, which contain frequent stop codons, and how these ribosomes escape the translation-dependent surveillance mechanisms that recycle ribosomes and degrade RNAs 44. The detection of ribosomes downstream of uORFs on lncRNA piRNA precursors is reminiscent of the re-initiation of ribosomes downstream of uORFs present in the 5´UTRs of canonical mRNAs when initiation factors have not yet dissociated from the 40S subunit 46. Given that short uORF length is an important factor for translation of the main ORFs on mRNAs 46, which explains why eukaryotic mRNAs are generally monocistronic, the presence of a long ORF on a piRNA precursor should inhibit ribosome-guided piRNA biogenesis. It is thus mechanistically important for both piRNA biogenesis and translational regulation to test whether ribosome-guided biogenesis can only occur after the translation of a short ORF.
To test whether a short uORF is required for ribosome-guided piRNA biogenesis, we studied a specific class of piRNAs, namely 3´UTR piRNAs. In mice, piRNAs are divided into three major classes based on their origin 47,48 : (i) piRNAs from transposable elements (TE piRNAs); (ii) intergenic piRNAs derived from lncRNAs (pachytene piRNAs); and (iii) genic piRNAs that map to 3´UTRs of protein coding regions in the sense orientation (3´UTR piRNAs). TE piRNAs protect the animal germline genome from TEs and are essential for animal fertility, a function that is evolutionarily conserved in bilateral animals [1][2][3]5,49,50 . Pachytene piRNAs have only been reported in mammals thus far and have been shown to trigger the decay of mRNAs required for sperm maturation [51][52][53][54] .
3´UTR piRNAs have been detected in fruit flies, frogs, and diverse mammalian species [55][56][57], but we currently do not know their function(s), how their production is regulated, or whether their precursors are full-length mRNAs or cryptic transcripts corresponding exclusively to the 3´UTR 56. This lack of understanding has hindered our efforts to identify common features that mark such transcripts for piRNA biogenesis and/or a machinery that sorts diverse RNAs for piRNA biogenesis.
Here, we characterize the biogenesis of 3´UTR piRNAs in mice, demonstrating that their precursors are full-length protein-coding mRNAs. We further show that piRNA biogenesis from these precursors is coupled with efficient translation and that ribosomes guide piRNA precursor fragmentation on mRNA 3´UTRs. We demonstrate that this tight coupling of ribosome binding and piRNA biogenesis fine-tunes protein synthesis from mRNAs. Ribosome-guided piRNA processing occurs at the meiotic stage when ribosome recycling factors and NMD are temporally inhibited. Lastly, we demonstrate that 3´UTR ribosome-guided piRNA processing also occurs in chickens. Although 3´UTR piRNAs are derived from distinct sets of genes in mice and chickens, we found the presence of TE sequences to be a shared feature that serves to produce antisense TE piRNAs that cleave TEs post-transcriptionally, indicating that TE suppression is a conserved evolutionary force driving 3´UTR piRNA production. Taken together with our previous studies, we find that a general and conserved piRNA biogenesis pathway recognizes translating RNAs regardless of their ORF length.

3´UTR piRNAs are produced from full-length mRNAs
To test whether 3´UTR piRNAs are derived from full-length mRNAs or cryptic transcripts with alternative transcription start sites (TSSs), we set out to identify the structure of the precursor RNAs by blocking their transcription and inhibiting the post-transcriptional processing of 3´UTR piRNAs. We previously defined a group of 3´UTR piRNAs that increase from 12.5 days postpartum (dpp) to 17.5 dpp in mice 8, when spermatocytes enter pachynema (Fig. S1a). Thus, we hypothesized that these 3´UTR piRNA precursors may be controlled by the transcription factor A-MYB, which also promotes the synthesis of pachytene piRNA precursors 8. Chromatin immunoprecipitation sequencing (ChIP-seq) revealed that A-MYB binding sites are far from the 3´UTRs of mRNAs that come from 3´UTR piRNA-producing loci (uppl) (median distance >14 kb, Fig. 1a), but close to the TSSs of uppl mRNAs (median distance 81.5 nt, Fig. 1a). These sites are also enriched for MYB consensus sequences (Fig. 1b). To verify that A-MYB regulates uppl mRNAs, we studied mice carrying an A-Myb hypomorphic mutation 58, which displayed significantly decreased levels of uppl mRNAs based on RNA-seq data from 14.5 dpp (p = 4.7 × 10^-2) and 17.5 dpp testes (p = 1.0 × 10^-3, Fig. 1c, left, Fig. S1b). Therefore, as it does for pachytene piRNA precursors, A-MYB directly regulates the transcription of uppl mRNAs.
To test whether uppl mRNAs are 3´UTR piRNA precursors, we examined the effect of the loss of transcription of uppl mRNAs on 3´UTR piRNA biogenesis in A-Myb mutant mice. Like uppl mRNAs, 3´UTR piRNA abundance significantly decreased in mutant testes (Fig. 1c, right, p ≤ 4.8 × 10^-10; Fig. S1b). The 3´UTR piRNA depletion associated with the loss of uppl mRNA transcription could reflect an indirect effect of the meiotic arrest caused by the A-Myb mutation 58 or the lack of piRNA biogenesis factors, the transcriptional activation of which requires A-MYB 8. To test these possibilities, we blocked piRNA processing but not meiosis by conditionally knocking out (CKO) Mov10l1 in spermatocytes (Fig. S1c) 11.
In the following analyses, we defined a group of non-piRNA-producing mRNAs that display expression dynamics in mouse testes similar to those of uppl mRNAs as the developmentally matched control mRNAs for comparison (Table S1). These control mRNAs remained unchanged in Mov10l1 CKO mutants compared to control littermates (Figs. S1d, S1e). We found that 3´UTR piRNAs are depleted in Mov10l1 CKO mutants (Fig. 1d, upper), whereas uppl mRNAs showed significant accumulation in Mov10l1 CKO mutants (Figs. 1d, S1e, p = 4.1 × 10^-11). We ruled out the possibility that the increased steady-state abundance of uppl mRNAs is due to their transcriptional activation in Mov10l1 CKO mutants (Fig. S1f, p = 0.13). Together with the cross-linking immunoprecipitation (CLIP) data 13 that indicate MOV10L1 specifically binds to the entire length of uppl mRNAs (Fig. S1g), our results suggest that the increased steady-state abundance of uppl mRNAs is due to the lack of 3´UTR piRNA biogenesis. Overall, our results indicate that 3´UTR piRNAs come from full-length uppl mRNAs, and not from isoforms derived from uppl 3´UTRs.

3´UTR piRNAs are derived from processed mRNAs
A recent study suggests that unspliced transcripts are substrates for piRNA processing 59. However, our data indicated that 3´UTR piRNAs were disproportionately produced after intron removal: >99% of piRNAs mapped to exons, compared to <0.1% that mapped to introns (17,200 ppm uniquely mapping 3´UTR piRNA reads from adult testes). After correcting for the length of exons and introns in piRNA-producing primary transcripts, exon-derived piRNAs were enriched by 630-fold compared to intron-derived piRNAs (Fig. S2a). We detected piRNAs that failed to map to the genome, but instead mapped to exon-exon junctions (Fig. 2a), further indicating that piRNAs were produced after intron removal and exon-exon joining. In uppl mRNA genes, 98% of the introns (207 out of 211) contained canonical GT-AG splice sites, not significantly different from the introns in control mRNA genes (326 out of 329 mRNA loci, χ² test, p = 0.55), suggesting that uppl mRNAs were spliced conventionally. Furthermore, the density of piRNAs fell off sharply after the 3´ end of the transcript, i.e., the site of polyadenylation (Fig. S2a). Taken together with our previous piRNA precursor studies based on a combination of cap analysis of gene expression (CAGE) and polyadenylation site sequencing (PAS-Seq) 8, we conclude that 3´UTR piRNAs were produced after their precursor transcripts were fully processed: capped, spliced, and polyadenylated.
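The splice-site comparison above can be checked from the reported counts alone. The sketch below is illustrative and not the authors' analysis code: it assumes a 2x2 χ² test with Yates' continuity correction (the text does not state whether a correction was applied) and treats the control count as 326 canonical out of 329 introns. Under those assumptions the p-value comes out close to the reported 0.55. The function name `chi2_2x2_yates` is hypothetical.

```python
import math


def chi2_2x2_yates(a, b, c, d):
    """Chi-squared test (1 d.f.) with Yates' correction for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Continuity-corrected numerator; clamp at zero when |ad - bc| < n / 2.
    diff = max(abs(a * d - b * c) - n / 2.0, 0.0)
    chi2 = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: uppl introns, control introns. Columns: canonical GT-AG, non-canonical.
chi2, p = chi2_2x2_yates(207, 211 - 207, 326, 329 - 326)
print(f"chi2 = {chi2:.3f}, p = {p:.2f}")
```

Without the continuity correction the same counts give a smaller p-value, so the choice of correction is the main assumption to verify against the original methods.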

uppl mRNA precursors harbor extensive 3´UTRs
We used our recent reconstruction of the mouse testis transcriptome nt, similar to that of typical vertebrate mRNAs 61. Since exon length distribution is associated with relative position in the transcripts, we further separated exons into first, middle, and last exons. The last exons for the control mRNAs and uppl mRNAs are the longest among the three categories, whereas the middle exons are the shortest (Fig. 2c). While there is no significant difference in the length distribution of the middle exons (p = 0.65), both the first and the last exons of uppl mRNAs are significantly longer than those of the control mRNAs (p ≤ 4.2 × 10^-8, Fig. 2c). Long first exons were recently reported to be a conserved feature of pachytene piRNA precursors 40, suggesting that the unique features of piRNA precursors can be traced back to exon-intron structures and the selection of poly(A) cleavage sites.
Since 5´UTRs and 3´UTRs are typically localized to the first and last exons, respectively, we further separated the spliced transcripts according to their longest open reading frame (ORF). Compared to mRNAs in other tissues, testis mRNAs are reported to have shorter 3´UTRs 62-65. Consistent with these reports, the 3´UTRs of the control mRNAs have a median length of 392 nt, whereas the 3´UTRs of the uppl mRNAs are significantly longer, with a median length of 4,583 nt (Fig. 2d; p = 1.5 × 10^-11). While the lengths of the ORFs are similar (Fig. 2d, 1,315.5 nt vs 945 nt; p > 0.01), the 5´UTRs of uppl mRNAs are also significantly longer than those of the control mRNAs (Fig. 2d, 346 nt vs 77 nt; p = 3.3 × 10^-7). Therefore, regulatory sequences (UTRs), rather than ORFs, account for the majority of the length difference between the spliced transcripts of uppl mRNAs and control mRNAs (Fig. 2b), and allow more diverse translational regulation of uppl mRNAs.

3´ UTR ribosomes guide piRNA formation
The biogenesis of piRNAs from mRNA 3´UTRs suggests that the cellular translation machinery may be used by the piRNA biogenesis machinery to distinguish between 3´UTRs and ORFs. To test whether uppl mRNAs are translated, we performed ribosome profiling (Ribo-seq), in which RNA fragments protected from RNase A and T1 digestion are isolated from 80S fractions and sequenced 45. We found that ribosome-protected fragments (RPFs) from uppl mRNA ORFs displayed a three-nucleotide periodicity (Fig. 2e), indicating that elongating ribosomes translate these ORFs. We calculated their translational efficiency (RPF reads normalized to transcript abundance) and found that uppl mRNAs have a slightly lower translational efficiency compared to control mRNAs (Fig. 2f, 59% of the median translational efficiency of control mRNAs, p = 0.037), as expected given that their longer 5´UTRs (Fig. 2c) take longer to scan through before the next 40S subunit starts 66 and may also contain more upstream initiation sites and/or more complex secondary structures that can reduce translational efficiency 67. We tested codon usage, another key determinant of translational efficiency 68, and found that uppl mRNAs and control mRNAs have similar codon usage (Figs. S2e, S2f). Overall, uppl mRNAs are efficiently translated on their main ORFs without any sign of aberrant initiation or elongation that distinguishes them from other mRNAs that do not produce piRNAs. Taken together, uppl mRNAs function to produce both piRNAs and proteins.
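The two quantities used in this paragraph, translational efficiency and three-nucleotide periodicity, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the use of ppm-scaled abundances, and the representation of RPFs as P-site positions relative to the start codon are all assumptions made for illustration, and the numbers are toy values.

```python
from collections import Counter


def translational_efficiency(rpf_ppm, rna_ppm):
    """RPF abundance normalized to transcript abundance (both in ppm here, an assumption)."""
    if rna_ppm <= 0:
        raise ValueError("transcript abundance must be positive")
    return rpf_ppm / rna_ppm


def frame_fractions(p_site_positions):
    """Fraction of RPFs whose P-site falls in frame 0, 1, or 2.

    Positions are hypothetical P-site coordinates relative to the first
    nucleotide of the start codon, so in-frame footprints give position % 3 == 0.
    """
    counts = Counter(pos % 3 for pos in p_site_positions)
    total = sum(counts.values())
    return [counts.get(frame, 0) / total for frame in range(3)]


# Toy data: 8 of 10 footprints are in frame 0, as expected for elongating ribosomes.
toy_positions = [0, 3, 6, 9, 1, 12, 15, 2, 18, 21]
print(frame_fractions(toy_positions))
print(translational_efficiency(rpf_ppm=50.0, rna_ppm=100.0))
```

A strong excess of one frame over the other two is the signature referred to as three-nucleotide periodicity; a flat distribution across frames would instead suggest footprints that do not come from elongating ribosomes.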
To test whether ribosomes also guide piRNA biogenesis from uppl mRNAs as they do from lncRNAs 45, we identified RPF signals coming from uppl mRNA 3´UTRs (Fig. 3a). 3´UTR RPF signals were not seen in the control mRNAs (Figs. S3a, S3b, p = 1.1 × 10^-8), indicating that the distribution of RPFs on uppl mRNA 3´UTRs is RNA-specific, rather than characteristic of typical testicular mRNAs. The in vitro RNase digestion used to obtain RPFs removed >98% of the mature 3´UTR piRNAs in testis lysates (Fig. S3c). We performed Ribo-seq on adult RiboTag mice after activating expression of hemagglutinin (HA)-tagged RPL22 (a ribosomal large subunit protein) in germ cells 45 and affinity-purified RPFs before sequencing. After IP, sequences from mitochondrial coding transcripts were also depleted (Fig. S3d), given that they are translated by untagged 55S mitochondrial ribosomes. In contrast, uppl mRNA ORF and 3´UTR sequences were retained, similar to RPFs from control mRNAs (Fig. S3d). Overall, these results indicate that the 3´UTR RPF signals are bona fide ribosomal footprints.
To test whether ribosome-bound 3´UTRs are recognized as precursors for 3´UTR piRNA production, we performed partial correlation analyses 72 between RPF abundance and piRNA abundance in uppl 3´UTRs while controlling for the abundance of uppl mRNAs, as measured by RNA-seq. These analyses can distinguish a biogenic relationship between RPFs and piRNAs from independent correlations of each with their uppl mRNA precursors. We found that the abundances of RPFs and piRNAs at 3´UTRs are directly correlated with each other (Fig. 3b, right, r = 0.69, p = 2.9 × 10^-5). Thus, uppl 3´UTRs bound by ribosomes are processed into piRNAs.
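The logic of controlling for mRNA abundance can be made concrete with the standard first-order partial correlation formula, r_xy.z = (r_xy − r_xz r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)). The sketch below is a minimal illustration with toy values, not the analysis used in the study; the variable names and the choice of Pearson correlation are assumptions. In the toy data, RPF and piRNA abundance each track mRNA abundance with unrelated noise, so they correlate on the surface but the partial correlation vanishes.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy values (hypothetical, log scale): both x and y equal z plus unrelated noise.
mrna = [1.0, 3.0, 5.0, 7.0]   # z: precursor mRNA abundance
rpf = [2.0, 2.0, 4.0, 8.0]    # x: 3'UTR RPF abundance
pirna = [0.0, 6.0, 2.0, 8.0]  # y: 3'UTR piRNA abundance

print(round(pearson(rpf, pirna), 3))                         # naive correlation is positive
print(abs(partial_correlation(rpf, pirna, mrna)) < 1e-9)     # but it disappears given mRNA
```

A partial correlation that stays high after this adjustment, as reported for uppl 3´UTRs, is what argues for a direct relationship rather than a shared dependence on precursor abundance.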
To test whether ribosomes guide the degradation of 3´UTRs for piRNA production, we analyzed the position of the RPFs on uppl mRNA 3´UTRs. We aligned piRNAs to RPFs and plotted the 5´-ends of RPFs that overlapped with piRNAs (Fig. 3c). For RPFs, their first nucleotide overlapped with the first nucleotide of piRNAs significantly more than with nucleotides residing 50 nucleotides upstream or downstream (Fig. 3c, right, Z score = 36±1; Z scores indicate how many standard deviations an element is from the mean; Z score > 3.3 corresponds to p < 0.01). We ruled out the possibility that this overlap could occur by chance or be due to ligation bias (Figs. S3e, S3f). Consistent with their 5´ overlap, we found that the 5´-ends of RPFs from 3´UTRs displayed a uridine bias at the 5´-most position (1U) (Fig. 4a), reminiscent of the 1U bias in piRNAs. These results indicate that 3´UTR ribosomes dwell at the sites that represent future piRNAs.
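One way to compute a 5´-overlap Z score of the kind described above is sketched here. It is an assumption-laden illustration rather than the authors' method: it treats all reads as lying on one strand of one transcript, counts RPF/piRNA pairs at each 5´-to-5´ distance within a 50-nt window, and compares the count at distance 0 with the mean and standard deviation of the counts at all other distances. The function name and the toy coordinates are hypothetical.

```python
import statistics
from collections import Counter


def five_prime_overlap_z(rpf_starts, pirna_starts, window=50):
    """Z score of the 5'-to-5' distance-0 count against the other distances in the window."""
    pirna_counts = Counter(pirna_starts)
    histogram = Counter()
    for rpf_pos, n_rpf in Counter(rpf_starts).items():
        for distance in range(-window, window + 1):
            # distance = piRNA 5' end minus RPF 5' end, on the same strand.
            n_pirna = pirna_counts.get(rpf_pos + distance, 0)
            if n_pirna:
                histogram[distance] += n_rpf * n_pirna
    background = [histogram.get(d, 0) for d in range(-window, window + 1) if d != 0]
    mean_bg = statistics.mean(background)
    sd_bg = statistics.pstdev(background)
    if sd_bg == 0:
        raise ValueError("background has zero variance; Z score is undefined")
    return (histogram.get(0, 0) - mean_bg) / sd_bg


# Toy 5'-end coordinates: most RPF 5' ends coincide with piRNA 5' ends.
toy_rpf = [100, 100, 200, 300, 310]
toy_pirna = [100, 200, 300, 305]
z = five_prime_overlap_z(toy_rpf, toy_pirna)
print(z > 3.3)
```

Real data would require strand-aware handling and per-transcript coordinates, and the controls for chance overlap and ligation bias mentioned in the text are not modeled here.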
Given that the in vitro RNA digestion (RNase T1&A) used to obtain RPFs does not yield a 1U bias, we tested the possibility that these RPFs are processed in vivo by the piRNA processing machinery. We modified the conventional Ribo-seq procedure (which detects both 5´P and 5´OH RPFs) to specifically capture 5´P RPF species and 5´OH RPF species separately (Fig. S4a). 5´P species are products of in vivo enzymatic cleavage, whereas the 5´-hydroxyl (5´OH) species arise mainly from in vitro RNase treatment (RNase T1&A in our procedure). To prevent mature piRNA contamination, we used the RPFs from affinity-purified 80S ribosomes for library construction (Fig. S4a). The 5´OH RPFs showed the expected in vitro digestion signature, even at uppl 3´UTRs (Fig. S4b). However, the 5´P RPFs are >32-fold enriched relative to 5´OH RPFs at uppl 3´UTRs in comparison to uppl ORFs (Fig. 4b, p = 1.0 × 10^-6). This indicates that RPFs in 3´UTRs predominantly carry 5´P ends at steady state, suggesting that in vivo cleavage occurs efficiently on ribosome-bound 3´UTRs.
Consistent with these sites representing hot spots for efficient in vivo cleavage, 3´UTR ribosomes are significantly enriched in the monosome fractions relative to polysome fractions in comparison to ORFs of the control mRNAs (Fig. 4c, p = 2.5 × 10^-10), as measured by Ribo-seq performed on purified monosome and polysome fractions as we reported previously 45. The 1U signature of 5´P RPFs occurs specifically at the uppl 3´UTRs but not at the ORFs of uppl mRNAs or the ORFs of control mRNAs (Fig. S4b), indicating that 3´UTR RPFs are cleaved by the piRNA processing machinery. Altogether, the piRNA processing machinery generates the 5´P in vivo cleavage products with a ribosome bound at their 5´ extremities, and these 5´P ends become the 5´ ends of future piRNAs.

3´UTR piRNA biogenesis is coupled with upstream translation
The translation of uppl mRNAs could either be coupled or uncoupled with piRNA

Biphasic biogenesis before and after the stop codon
uppl ORFs also produce authentic piRNAs with a 1U bias (Fig. 4a), although piRNAs derived from 3´UTRs are >31-fold more abundant than they are in ORFs, which equates to an 8-fold difference when normalized to 3´UTR and ORF length respectively ( Consistent with the lack of in vivo processing of ORF RPFs, we found that ORF RPFs have a similar distribution between monosome and polysome fractions when compared to RPFs from the control mRNA ORFs (Fig. 4c, p = 0.45). Furthermore, the abundance of RPFs and piRNAs at ORFs do not correlate with each other when controlling for the abundance of uppl mRNAs (Fig. 3b, left, r = -0.3, p = 0.1), indicating a lack of a biogenic relationship between ribosome-bound ORFs and piRNAs. Therefore, our results indicate that ribosome-guided piRNA processing occurs at uppl 3´UTRs, but not at uppl ORFs.
The lack of a biogenic relationship between RPFs and piRNAs at uppl ORFs could be due to: (1) a different mechanism of piRNA processing where only a translationally suppressed subpopulation of uppl mRNA ORFs is processed into piRNAs, with the majority protected from processing; or (2) the mRNA ORFs are processed but do not generate piRNAs efficiently. To distinguish between these two possibilities, we monitored the fragmentation process on ribosomes. Using degradome sequencing of long RNAs (>200 nt) from affinity-purified ribosomes, we captured ribosome-bound 5´P RNAs. We found that the abundance of ribosome-bound 5´P RNAs from 3´UTRs was not higher than that from ORFs (p = 0.3, Fig. S4e), arguing against the possibility that the majority of ORFs were protected from processing. We then tested whether these 5´P RNAs have their 5´-ends aligned with the 5´-ends of piRNAs, which would suggest that the cleavage process forms the piRNA 5´ ends. We found that the 5´P RNAs from uppl ORFs had a significantly lower 5´-overlap (Z score = 3.2±0.4) with the 5´-ends of uppl ORF piRNAs compared to the 5´-overlap between 5´P RNAs from uppl 3´UTRs and uppl 3´UTR piRNAs (Z score = 18±1, Student's t test, p = 1.8 x 10 -5 , Fig. S4f). Consistent with this, neither ribosome-bound 5´P RNAs nor RPFs from uppl ORFs have the 1U bias, unlike that of the 5´P RNAs and RPFs from uppl 3´UTRs (Fig. 4a). Thus, although the uppl ORFs are cleaved, the lack of 5´overlap between the cleavage products and uppl ORF piRNAs indicates that the cleavage products are inefficiently processed into piRNAs.
To determine if auxiliary factors facilitate efficient processing of piRNAs at uppl 3´UTRs, we tested for TDRD5 (tudor domain containing 5) function, the disruption of which impacts pachytene piRNA production 45,78 . We found that piRNAs derived from uppl ORFs were still produced, but 3´UTR piRNAs were depleted in Tdrd5 mutants (Figs. 4f, S4f), indicating that uppl ORF piRNAs do not require TDRD5 for their production, whereas 3´UTR piRNAs do. Therefore, 3´UTR piRNAs derived from uppl 3´UTRs and ORFs have linked but distinct biogenic requirements. Taken together, our results indicate that efficient piRNA processing at uppl 3´UTRs requires both ribosomes and TDRD5, and that post-transcriptional processing differences at 3´UTRs and ORFs explain why even though the entire length of uppl mRNAs generates piRNAs, >96% of uppl piRNAs are derived from uppl 3´UTRs. These results are similar to the biphasic piRNA biogenic mechanism we identified previously in pachytene piRNAs from lncRNA piRNA precursors 45 . Therefore, combined with our previous work on lncRNAs 45 , our study reveals a general ribosome-guided mechanism by which piRNA precursors, regardless of their source, are converted into piRNA sequences (Fig. S4g).

Inhibition of co-translational surveillance pathways
Ribosomes that fail to recycle at the 3´UTRs of mRNAs should be rescued/resolved by ribosome recycling factors such as PELOTA (the mouse homolog of yeast DOM34) 79-84. To understand how 3´UTR ribosomes avoid recycling by PELOTA, we performed immunostaining on squashed spermatocytes and spermatids. We found that PELOTA localized to the nuclei of pachytene spermatocytes, but not to the nuclei of round spermatids (Fig. 5a). Similar nuclear localization, specifically at the pachytene stage, has been reported for other translational control proteins 85. Considering that PELOTA is the major player in no-go decay 86 and non-stop decay 87, the sequestration of ribosome recycling factors in the nucleus suggests that their associated mRNA decay pathways are inhibited at pachynema.
Although uppl mRNAs, which have extensive 3´UTRs (Fig. 2d), should represent conventional substrates to elicit NMD 88, the spreading of ribosomes on 3´UTRs should inhibit NMD, as demonstrated by polycistronic viruses or inhibition of recycling factors 89,90. To test this idea, we performed IP in testis lysates using anti-UPF1 and anti-phosphorylated UPF1, followed by RNA-seq. UPF1 plays a role in NMD target recognition and elimination, and the phosphorylation of UPF1 is required to activate NMD 74,91. We found that uppl mRNAs are enriched in UPF1 IP (Fig. 5b, p = 3.9 × 10^-8), but this uppl mRNA-associated UPF1 did not show increased levels of phosphorylation compared to control mRNAs (Fig. 4b, lower, p = 0.15), indicating that NMD is not activated on the uppl mRNAs even though they appear to be NMD substrates. The NMD pathway is further repressed globally at pachynema in testis 92. The lack of NMD activity, in conjunction with the nuclear localization of PELOTA/DOM34 at the pachytene stage, may be a prerequisite for piRNA biogenesis during normal development.
Taken together, ribosome binding to the piRNA precursor uppl mRNAs is temporally staggered with other translation-dependent mRNA decay pathways, allowing piRNA production at the pachytene developmental stage.

piRNA biogenesis fine-tunes protein production
Given that other translation-dependent mRNA decay pathways target endogenous mRNAs to fine-tune protein production 93, we tested whether the piRNA biogenesis pathway impacts protein production from uppl mRNAs. We performed quantitative mass spectrometry using Mov10l1 CKO mutants and their littermate controls. We observed a significant increase in steady-state protein levels from uppl mRNAs, with a median increase of 13%, in comparison to the control mRNAs (p = 3.2 × 10^-2, Fig. 5c (solute carrier family 2, member 3, also known as Glut3) are haploinsufficient 116,117, indicating that a sufficient dosage of these gene products is essential for their normal functions.
As expected with increased miRNA-guided AGO2-mediated cleavage, we detected significantly decreased RPF abundance from these target RNAs in comparison to the control mRNAs, with a median decrease of 29% (Fig. S5c, p = 1.8 × 10^-5). Given the sensitivity to gene dosage and the essential functions of these target genes, increased miRNA-guided AGO2-mediated cleavage of their transcripts and decreased protein synthesis may contribute to the infertility of Mov10l1 CKO mutants. Considering that AGO2 is just one of the uppl mRNA protein products, our data support the biological significance of 3´UTR piRNA biogenesis in fine-tuning protein abundance during normal development.

The biogenesis of 3´UTR piRNAs is evolutionarily conserved
To test whether the biogenic mechanism for 3´UTR piRNAs also operates in other amniotes, we identified and annotated 3´UTR piRNAs in roosters (Gallus gallus). We used RNA-seq from roosters to assemble the testis-specific transcriptome and then aligned piRNAs to annotated mRNAs. To identify precursor transcripts, we required a piRNA abundance of >100 ppm and ≥ 90% of piRNAs mapping to 3´UTRs (a median percentage of 96.3% of mouse 3´UTR piRNAs derived from 3´UTRs). To ensure mRNAs are translated in rooster testes, we also required an RPF abundance ≥ 1 ppm from their ORFs. Using these criteria, we detected, in total, 23 transcripts that both produce piRNAs (Fig. 6a) and code for proteins (as shown by 3-nucleotide periodicity, Fig. 6b), thus representing uppl mRNAs. The transcriptional start sites of these transcripts (but not their 3´UTRs) have a nearby H3K4me3 ChIP-seq peak, a signature of RNA Pol II transcription start sites 118 (Fig. 6a), arguing against the existence of 3´UTR-specific isoforms, and 21 out of 23 H3K4me3 ChIP-seq peaks completely overlap with A-MYB ChIP-seq peaks (Fig. S6a), indicating that 3´UTR piRNAs also exist in chickens and that they are derived from full-length mRNA precursors. We found that RPFs also extended into the 3´UTRs of chicken uppl mRNAs (Fig. 6a).
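The three selection criteria stated above (piRNA abundance >100 ppm, ≥ 90% of piRNAs mapping to the 3´UTR, and ORF RPF abundance ≥ 1 ppm) amount to a simple filter. The sketch below encodes them as written; the `Transcript` record, its field names, and the toy transcripts are hypothetical and only illustrate how the thresholds combine.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    name: str
    pirna_ppm: float            # total piRNA abundance mapped to the mRNA
    utr3_pirna_fraction: float  # fraction of those piRNAs mapping to the 3'UTR
    orf_rpf_ppm: float          # RPF abundance on the ORF


def is_uppl_candidate(t, min_pirna_ppm=100.0, min_utr3_fraction=0.90, min_orf_rpf_ppm=1.0):
    """Apply the thresholds from the text: >100 ppm piRNAs, >=90% in 3'UTR, >=1 ppm ORF RPFs."""
    return (
        t.pirna_ppm > min_pirna_ppm
        and t.utr3_pirna_fraction >= min_utr3_fraction
        and t.orf_rpf_ppm >= min_orf_rpf_ppm
    )


# Toy transcripts (hypothetical values).
toy = [
    Transcript("geneA", pirna_ppm=450.0, utr3_pirna_fraction=0.97, orf_rpf_ppm=12.0),
    Transcript("geneB", pirna_ppm=450.0, utr3_pirna_fraction=0.60, orf_rpf_ppm=12.0),
    Transcript("geneC", pirna_ppm=80.0, utr3_pirna_fraction=0.99, orf_rpf_ppm=5.0),
    Transcript("geneD", pirna_ppm=300.0, utr3_pirna_fraction=0.95, orf_rpf_ppm=0.2),
]
candidates = [t.name for t in toy if is_uppl_candidate(t)]
print(candidates)
```

Only the first toy transcript passes all three thresholds; the others fail on 3´UTR fraction, piRNA abundance, and ORF translation, respectively.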
Authentic piRNAs with a 1U bias are produced from uppl ORFs (Fig. 6d). Unlike 3´UTR RPFs, the abundance of uppl ORF RPFs does not correlate with piRNA abundance (r = -0.09, p = 0.69, Fig. S6d, right), and the ORF RPFs did not display a signature of in vivo cleavage (Fig. 6d) nor correspond to future piRNA sites (Fig. 6c, left). Thus, chicken uppl ribosomes also guide endonucleolytic cleavages that generate piRNA 5´-ends in 3´UTRs, but do not do so in ORFs. In sum, although the mRNAs that produce piRNAs do not overlap between mice and chickens, the existence of ribosome-guided piRNA biogenesis from mRNA 3´UTRs in both mice and chickens suggests that an evolutionarily conserved biogenic mechanism predates the divergence of mice and chickens approximately 330 million years ago 119.

Transposon fragments are embedded in precursor mRNAs
To determine the common features of 3´UTR piRNA precursors in chicken and mice, we revisited the debate over whether 3´UTR piRNAs harbor TE sequences We computed the percentages of exonic and intronic nucleotides that are annotated as part of a TE for each locus and found that a significantly higher fraction of uppl mRNA exons harbor SINE elements (p ≤ 2.7 × 10^-8) in comparison to the control mRNA genes that contained no exonic nucleotides corresponding to TEs (Fig. 7a). The intronic regions of uppl mRNA genes and the control mRNA genes contained a similar fraction of TEs (p ≥ 0.02, Fig. S7b). Thus, TEs are enriched in spliced uppl mRNAs in mice.
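The per-locus TE fraction described above, the share of exonic (or intronic) nucleotides annotated as part of a TE, reduces to an interval-overlap calculation. The following sketch assumes half-open genomic intervals on a single chromosome and strand-agnostic TE annotations; the function names and coordinates are hypothetical and do not reflect the annotation sources used in the study.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals [start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def te_fraction(features, te_annotations):
    """Fraction of feature nucleotides (e.g. exons) covered by at least one TE annotation."""
    features = merge_intervals(features)
    tes = merge_intervals(te_annotations)  # merging avoids double counting nested TEs
    total = sum(end - start for start, end in features)
    if total == 0:
        raise ValueError("features cover no nucleotides")
    covered = 0
    for f_start, f_end in features:
        for t_start, t_end in tes:
            covered += max(0, min(f_end, t_end) - max(f_start, t_start))
    return covered / total


# Toy locus: two exons of 100 nt each, with overlapping TE annotations.
exons = [(0, 100), (200, 300)]
te_hits = [(50, 120), (90, 130), (250, 260)]
print(te_fraction(exons, te_hits))
```

Running the same function on intron intervals gives the intronic fraction, which is the comparison the text uses to show that TE enrichment is specific to the spliced transcripts.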
In adult mouse testis, we detected TE piRNAs produced from uppl mRNAs (Fig. 7b) that uniquely map to SINEs embedded in uppl mRNA 3´UTRs (Fig, S7c) , we compared their expression around the developmental stage when uppl piRNAs are produced. We found SINEs are more highly expressed compared to DNA, LINE, and LTR transposons (Fig. 7d). Therefore, our data indicate that a subset of piRNAs produced from mouse uppl mRNAs post-transcriptionally silence TEs.
To test whether the function of producing antisense TE piRNAs is biologically significant during evolution, we performed a similar analysis in chickens and found that the processing of chicken 3´UTR piRNA precursors also generates antisense TE piRNAs.
While SINEs are largely absent from the chicken genome 123, we found significantly more LINEs embedded in chicken uppl mRNA exons (p = 2.5 × 10^-4), whereas control mRNA genes contained no exonic nucleotides corresponding to TEs (Fig. 7e). Unlike mice, where both sense and antisense TEs are embedded in uppl mRNAs, an antisense bias of TEs was detected in chickens (sense median = 0.0%, antisense median = 2.1%). LINE piRNAs that manifest a 1U bias and not a 10A bias (Fig. 7f) are produced in chicken adult testes. These piRNAs, uniquely mapping to uppl mRNAs, can also target TEs in trans, post-transcriptionally generating secondary piRNAs (Fig. 7g). Although neither uppl genes nor TE families (Figs. 7a, 7e) are conserved between mice and chickens, which may accommodate rapidly changing populations of TEs, our data show that the use of mRNAs embedded with TEs to produce antisense TE piRNAs that cleave TEs post-transcriptionally is a common strategy in amniotes.

DISCUSSION
Here, we systematically characterized 3´UTR piRNAs (which should be called genic piRNAs, as ORF regions also produce piRNAs). By defining the transcription factors associated with piRNA biogenesis and characterizing mutants with transcriptional and post-transcriptional processing defects, we demonstrate that full-length mRNAs are the precursors of 3´UTR piRNAs. These mRNAs undergo pioneer rounds of translation that are followed by the production of piRNAs. This coupling is mediated by post-termination 80S ribosomes on 3´UTRs that guide the endonucleolytic cleavages and generate the 5´ ends of piRNAs. Together with our previous studies on piRNA biogenesis from lncRNA precursors 45, we find that ribosomes guide piRNA biogenesis downstream of ORFs regardless of their length. Similar to other mRNA decay pathways, piRNA processing from mRNAs fine-tunes the protein products from these mRNAs. This co-translational processing of piRNAs is found in both mice and chickens. Therefore, we reveal a general and conserved mechanism by which post-termination ribosomes guide piRNA 5´-end formation from non-protein-coding regions of RNAs in amniotes.
TE silencing in mice is thought to be carried out principally by prenatal piRNAs 124,125 . However, TE surveillance is still required throughout spermatogenesis. The murine PIWI protein MILI is not detected after prenatal piRNA expression ends and before pachytene piRNA expression begins 126 .
[...] Drosophila piRNA biogenesis 59 , both lncRNA piRNA precursors and mRNA piRNA precursors exhibit longer first exons, suggesting that a distinctive exon-intron structure could be one defining feature. Furthermore, given that the translation of short upstream ORFs is insufficient to mark a transcript for piRNA biogenesis, and that the uppl mRNAs do not exhibit faulty translation on their main ORFs, translation intermediates with post-termination ribosomes on long 3´UTRs are a prime candidate for further testing. Last but not least, if the TE-rich prenatal piRNAs target and initiate the processing of the 3´UTR piRNA precursors, the embedded TEs could also serve as a feature that marks a transcript for piRNA processing. Thus, the study of 3´UTR piRNAs allows a more comprehensive investigation into the unique features defining piRNA precursors.

The linked but distinct biogenic requirements before and after the stop codon of uppl main ORFs suggest that post-transcriptional processing of piRNAs can be further divided into two phases: substrate recognition and efficient processing. Because uppl ORFs, unlike those of other cellular mRNAs, are still processed into piRNAs, uppl mRNAs appear to be recognized as substrates for piRNA processing as entire transcripts before endonucleolytic cleavage occurs. Otherwise, if substrate recognition were coupled with piRNA processing, then as soon as the initial cleavage occurred on the 3´UTR, the ORF portion of the transcript would display no unique feature compared to other translating mRNAs.
Our finding that the ORF portion of the transcripts still enters piRNA processing, but inefficiently, suggests that post-termination ribosomes streamline piRNA production, that translating ribosomes block the piRNA processing machinery from accessing the RNAs, or both. The requirement of TDRD5 for 3´UTR piRNA biogenesis suggests that TDRD5 may function by coordinating post-termination ribosomes with the piRNA processing machinery. TDRD5, which is detected in amniotes (which harbor a homolog with > 50% protein sequence identity to mouse TDRD5) but not in fish (< 10% for potential homologs) 135 , may have co-evolved with the appearance of ribosome-guided piRNA biogenesis.
In summary, we reveal a conserved and general piRNA biogenesis mechanism that recognizes translating RNAs regardless of whether they harbor a long ORF. The assembly of 80S ribosomes on non-coding regions of RNAs is not restricted by the length of the upstream ORFs and is temporally staggered with translation-dependent RNA quality control pathways, suggesting compromised ribosome recycling. The coupling of piRNA biogenesis with translation fine-tunes the abundance of proteins that are critical for spermatogenesis in both mice and chickens.

Animals
Mice were maintained and used according to guidelines for animal care of the NIH and

Small RNA sequencing library construction
Small RNA libraries were constructed and sequenced, as previously described 45

Polysome gradient
Fresh testes were lysed in 1 ml lysis buffer (

RNA sequencing
Strand-specific RNA-seq libraries were constructed following the TruSeq RNA sample preparation protocol, as previously described 45 . rRNAs were depleted from total RNAs with complementary DNA oligomers (IDT) and RNase H (Invitrogen, Waltham, MA, USA) 138,139 .
Generally, one mismatch was allowed for genome mapping. For mouse transcriptome annotation, 30 uppl mRNAs defined in our previous studies with mm9 8 were converted to mm10 coordinates using liftOver 143 with minor manual correction (Table S1). We selected 43 control mRNAs (Table S1) from our recently reconstructed mouse testis transcriptome 60 based on expression dynamics similar to those of uppl mRNAs from 10.5 dpp to 20.5 dpp and on their lack of any piRNA production. For genes with alternative transcripts, the most abundant transcript of that gene was selected. We reassembled mRNAs using RNA-seq data from rooster testes and defined 3´UTR piRNA precursor mRNAs as described in the text. Statistics pertaining to the high-throughput sequencing libraries constructed for this study are provided in Table S1.
For small RNA sequencing, libraries were normalized to the sum of total miRNA reads; spike-in RNA was used to normalize the libraries from each fraction of polysome profiling. Uniquely mapping reads >23 nt were selected for further piRNA analysis. Ribo-seq analysis followed the modified small RNA pipeline, including junction-mapping reads, as previously described 45 . Uniquely mapping reads between 26 nt and 32 nt were selected for further analysis, except for the analysis of 40S footprints, for which reads of 18 nt to 80 nt were selected. RPFs and 80S footprints from different developmental stages were normalized to the sum of reads mapping to mRNA protein-coding regions, assuming that mRNA translation was largely unchanged during spermatogenesis.
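The normalization described above amounts to scaling raw counts by a reference read total (miRNA reads for small RNA libraries, coding-region reads for RPFs). The following is a minimal sketch of that idea, not the pipeline used in this study: the function names, the per-million scale, and the dictionary input format are all assumptions made for illustration.

```python
def scale_factor(reference_total, per=1_000_000):
    """Return the multiplier that expresses counts per `per` reference reads.

    `reference_total` is the library's reference read sum, e.g. total miRNA
    reads or reads in mRNA protein-coding regions. The per-million scale is
    an assumption; the text only states which sum is used.
    """
    if reference_total <= 0:
        raise ValueError("reference read total must be positive")
    return per / reference_total


def normalize_counts(raw_counts, reference_total, per=1_000_000):
    """Scale a {feature: raw read count} mapping by the reference total."""
    factor = scale_factor(reference_total, per)
    return {feature: count * factor for feature, count in raw_counts.items()}


def keep_pirna_sized(reads, min_len=24):
    """Keep read sequences longer than 23 nt (the piRNA size cutoff)."""
    return [read for read in reads if len(read) >= min_len]
```

With this layout, libraries from different stages become comparable only to the extent that the chosen reference class is itself stable across those stages, which is the assumption stated in the text.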
Libraries from harringtonine treatment were further normalized to the sum of reads mapping to mitochondrial coding sequences as previously described 146 . 40S footprints were normalized to the sum of reads mapping to mRNA 5´UTRs, as it has been shown that the 40S binds 5´UTRs in a cap-tethered fashion and thus 40S ribosomes do not accumulate upon harringtonine treatment 66 .
ChIP-seq reads were analyzed as previously described 8 . Multiple mapping reads were apportioned randomly to each location (-k 1 switch) and one mismatch was allowed (-v 1). ChIP peaks were identified using MACS2 147 with default arguments and input as the control.
Degradome reads and CLIP sequencing reads were aligned to the genome using TopHat 2.0.12 148 . Reads were mapped uniquely using the '-g 1' flag. Uniquely mapping reads were selected for further analysis. Libraries were normalized to the sum of reads mapping to mRNA protein-coding regions, assuming that mRNA cleavage was largely unchanged during spermatogenesis. We analyzed a published anti-HA IP'ed degradome library from adult wild-type testis (GSM4160721) 128 and a MOV10L1 CLIP library from mouse wild-type testis (PRJNA230507) 13 .
Statistical analyses were performed in R 3.5.0 149 . The significance of differences was calculated by the Wilcoxon rank sum test except as indicated in the text. The significance of correlations was assessed using partial correlation analysis in addition to simple correlations 72 .
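To illustrate what partial correlation adds to a simple correlation, the sketch below computes a first-order partial correlation between two variables while controlling for a third. It is written in Python for illustration only (the analyses here were run in R), and the choice of Pearson coefficients and the function names are assumptions; the text does not specify the correlation coefficient used.

```python
import math


def pearson(x, y):
    """Simple Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z.

    Uses the standard first-order formula:
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))
```

A simple correlation between two measurements can be driven entirely by a shared covariate (for example, transcript abundance); in that case the partial correlation drops toward zero even though the simple correlation remains high.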

Nucleotide periodicity
Nucleotide periodicity was computed as previously described 45 . We first aligned the RPFs to each other using 5´-end overlap analysis and reported the distance spectrum. An annotated ORF was not a prerequisite for this analysis as the distance spectrum of RPFs from mRNAs already showed a 3-nt periodicity pattern. We then transformed the distance spectrum using the "periodogram" function from the GeneCycle package 150 with the "clone" method. The relative spectral density was calculated by normalizing to the value at the first position.
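The two steps above can be sketched as follows. This is an illustrative approximation rather than the code used in this study: it assumes the distance spectrum counts pairs of reads on the same strand by the offset between their 5´ ends, and it substitutes a plain discrete Fourier transform periodogram for the GeneCycle "clone" method. The function names and the input format (a mapping from 5´-end position to read count) are hypothetical.

```python
import cmath


def distance_spectrum(five_prime_counts, max_dist=30):
    """Count read pairs whose 5' ends lie d nt apart, for d = 1..max_dist.

    `five_prime_counts` maps a 5'-end coordinate on one transcript strand
    to the number of reads starting there.
    """
    spectrum = []
    for d in range(1, max_dist + 1):
        pairs = sum(
            n * five_prime_counts.get(pos + d, 0)
            for pos, n in five_prime_counts.items()
        )
        spectrum.append(pairs)
    return spectrum


def periodogram(signal):
    """Return (period, power) pairs from a mean-centered plain DFT."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]
    result = []
    for k in range(1, n // 2 + 1):
        coef = sum(
            v * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, v in enumerate(centered)
        )
        result.append((n / k, abs(coef) ** 2 / n))
    return result


def relative_density(spectrum_powers):
    """Normalize powers to the value at the first position."""
    first = spectrum_powers[0][1]
    return [(period, power / first) for period, power in spectrum_powers]


def dominant_period(spectrum_powers):
    """Period with the highest spectral power."""
    return max(spectrum_powers, key=lambda item: item[1])[0]
```

For footprints from translating ribosomes, the dominant period of the distance spectrum would be expected at 3 nt, without reference to any annotated reading frame.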

Generating simulated sequences as negative controls
We generated a random pool of 28-mer sequences using a sliding window of 1 nucleotide from 5´ to 3´ of the piRNA precursors. We then sampled from this 28-mer pool to match the first nucleotide composition of the real reads. These simulated sequences from piRNA precursors were used as random controls for piRNAs (source code available upon request).
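A minimal sketch of this control is given below, for illustration only and not the source code referred to above. It assumes sampling with replacement and a fixed random seed, and the function names are hypothetical. Because per-nucleotide quotas are rounded, the output size can differ slightly from the requested number.

```python
import random
from collections import Counter


def kmer_pool(precursor, k=28):
    """All k-mers from a precursor sequence, sliding 1 nt from 5' to 3'."""
    return [precursor[i:i + k] for i in range(len(precursor) - k + 1)]


def sample_matching_first_nt(pool, real_reads, n, rng=None):
    """Draw about n k-mers whose first-nucleotide composition matches real reads.

    Sampling is done with replacement (an assumption). First nucleotides
    absent from the pool are skipped.
    """
    rng = rng or random.Random(0)
    by_first = {}
    for kmer in pool:
        by_first.setdefault(kmer[0], []).append(kmer)

    first_nt = Counter(read[0] for read in real_reads)
    total = sum(first_nt.values())

    sampled = []
    for nt, count in sorted(first_nt.items()):
        candidates = by_first.get(nt, [])
        if not candidates:
            continue
        quota = round(n * count / total)
        sampled.extend(rng.choices(candidates, k=quota))
    return sampled
```

Matching the first-nucleotide composition matters because piRNAs carry a strong 1U bias; an unmatched random pool would differ from real piRNAs in base composition alone, confounding any positional comparison.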

Mass Spectrometry Sample Preparation and Analysis
After testes lysis, protein concentration was determined by BCA (Thermo Scientific). Samples were then diluted to 1 mg/mL in 5% SDS, 50 mM triethylammonium bicarbonate (TEAB). 25 µg of protein from each sample was reduced with dithiothreitol to 2 mM, followed by incubation at 55°C for 60 minutes. Iodoacetamide was added to 10 mM and incubated in the dark at room temperature for 30 minutes to alkylate the proteins. Phosphoric acid was added to 1.2%, followed by six volumes of 90% methanol, 100 mM TEAB. The resulting solution was added to S-Trap micros (Protifi) and centrifuged at

(c) Distance spectrum of 5´-ends of RPFs from uppl ORFs (left) and from 3´UTRs (right) that overlap piRNAs in adult rooster testes (n = 3). Data are mean ± standard deviation.
(d) Sequence logos depicting nucleotide bias at 5´-ends and 1 nt upstream of 5´-ends of the following species from adult rooster testes. Top to bottom: RPF species, and piRNAs, which map to control mRNAs, uppl ORFs and 3´UTRs, respectively.