A putative de novo evolved gene required for spermatid chromatin condensation in Drosophila melanogaster

Comparative genomics has enabled the identification of genes that potentially evolved de novo from non-coding sequences. Many such genes are expressed in male reproductive tissues, but their functions remain poorly understood. To address this, we conducted a functional genetic screen of over 40 putative de novo genes with testis-enriched expression in Drosophila melanogaster and identified one gene, atlas, required for male fertility. Detailed genetic and cytological analyses showed that atlas is required for proper chromatin condensation during the final stages of spermatogenesis. Atlas protein is expressed in spermatid nuclei and facilitates the transition from histone- to protamine-based chromatin packaging. Complementary evolutionary analyses revealed the complex evolutionary history of atlas. The protein-coding portion of the gene likely arose at the base of the Drosophila genus on the X chromosome but was unlikely to be essential, as it was then lost in several independent lineages. Within the last ~15 million years, however, the gene moved to an autosome, where it fused with a conserved non-coding RNA and evolved a non-redundant role in male fertility. Altogether, this study provides insight into the integration of novel genes into biological processes, the links between genomic innovation and functional evolution, and the genetic control of a fundamental developmental process, gametogenesis.


The evolution of new genes is integral to the extensive genotypic and phenotypic diversity observed across species. The best-characterized mechanism of novel gene emergence is gene duplication [1,2]; however, rapid expansion in high-quality genomic resources has provided mounting evidence of lineage-specific sequences and the existence of alternative modes of new gene origination. One such mechanism is de novo evolution, the birth of new genes from previously non-genic or intronic regions, which is now a widely acknowledged source of protein-coding and RNA genes [3-5]. Although de novo origination was once considered an unlikely event, catalogs of de novo genes have now been published for an expansive range of species [6-13]. Multiple models explain how protein-coding de novo genes may acquire both an open reading frame (ORF) and regulatory sequences permitting transcription [14-17]. Interrogation of the biochemical and biophysical properties of the proteins encoded by de novo genes has offered initial insight into the mechanisms of emergence and functional potential of these genes [17-20].

The capacity of protein-coding de novo genes to evolve important functions is a topic of interest from evolutionary, physiological and molecular perspectives [21]. In the last couple of decades, the products of de novo genes have been shown to play diverse roles in a variety of organisms. For example, de novo genes function in fundamental molecular processes in yeast, Drosophila genus, the gene has evolved a critical function in D. melanogaster. These results underscore the importance of detailed functional and evolutionary characterization in understanding the origins of new protein-coding genes and the selective forces that affect their subsequent evolution.

An RNAi screen identifies a putative de novo gene essential for Drosophila male fertility

A previous pilot screen of 11 putative de novo evolved, testis-expressed genes identified two genes that are critical for male fertility in Drosophila melanogaster [46]. This result, and other recent work [e.g., 40, 47, 48], suggested that lineage-specific, newly evolved genes can rapidly become important for fertility, perhaps by gaining interactions with existing protein networks. To determine more comprehensively the frequency with which potential de novo evolved genes become essential for fertility, we identified de novo or putative de novo evolved genes with testis-biased expression. A previous computational analysis identified genes that are detectable only within the Drosophila genus, lack identifiable protein domains, and show no homology to other known proteins through BLASTP and TBLASTN searches [19]. We filtered these genes to identify those expressed exclusively or predominantly in the testis, a common site of de novo gene expression in animal species [27,34,38,49]. This resulted in a set of 96 target genes.

Rivard, Ludwig, Patel et al.

replicates. Knockdown of goddard was used as a positive control. B) A single-mating, single-pair fertility assay confirms the observed defect when males are knocked down for CG13541, as knockdown males showed significantly reduced fertility (control fertility (mean ± SEM): 109.0 ± 5.3; knockdown fertility: 0.2 ± 0.1; two-sample t-test assuming unequal variances, p = 5.6 x 10^-13).
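The "two-sample t-test assuming unequal variances" reported for the single-pair assay corresponds to Welch's t-test. The following minimal sketch shows how the test statistic and the Welch-Satterthwaite degrees of freedom are computed. The function name `welch_t` and the progeny counts are hypothetical illustrations; the raw per-male counts are not given in the text, and this is not the authors' analysis code.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test.

    Each sample needs at least two observations, and at least one sample
    must have non-zero variance.
    """
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(sample_a) / len(sample_a)
    se2_b = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical progeny counts per male (NOT data from the study).
control = [100, 115, 98, 120, 112]
knockdown = [0, 1, 0, 0, 0]
t_stat, dof = welch_t(control, knockdown)
print(f"t = {t_stat:.2f}, df = {dof:.2f}")
```

Converting the statistic to a p-value requires the t-distribution, which a statistics package would normally supply.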

We used testis-specific RNA interference to screen these genes for roles in male fertility. We obtained RNAi lines from the Vienna Drosophila Resource Center (VDRC) and the Transgenic RNAi Project (TRiP) and constructed additional lines using the TRiP-style pValium20 vector [50], which induces efficient knockdown in the male germline. We tested an RNAi line for each of 57 genes by using the Bam-GAL4 driver, which is expressed in the male germline and which we enhanced with a copy of UAS-Dicer2. RT-PCR confirmed at least partial knockdown in lines representing 42 genes (see example in Fig. S1). We then screened knockdown males for fertility by allowing groups of 7 knockdown males to mate with 5 wild-type females for 2 days. Progeny counts were standardized to the number of progeny produced by concurrently mated groups of 7 control males and 5 wild-type females. The results are shown in Fig. 1A. This initial screen identified CG13541, whose knockdown severely reduced male fertility. We confirmed the result for CG13541 by performing single-pair mating fertility assays (Fig. 1B). Consistent with our previous convention of naming testis-expressed genes after American rocketry [46], we will from here on refer to CG13541 as atlas. While RNAi transgenes designed to knock down CG43072 and CG33284 caused full and consistently partial sterility, respectively, we do not further consider these genes because subsequent gene knockout using CRISPR genome editing indicated that neither gene plays a role in male fertility. In these cases, the RNAi phenotypes might have been due to off-target knockdown.
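The standardization used in the primary screen, in which progeny from knockdown males are expressed relative to progeny from concurrently mated controls, can be sketched as below. The function name `relative_fertility` and the example counts are hypothetical; the sketch only illustrates the ratio described in the text.

```python
def relative_fertility(knockdown_progeny, control_progeny):
    """Progeny of a knockdown group as a fraction of the concurrent control group."""
    if control_progeny <= 0:
        raise ValueError("control group must produce progeny to standardize against")
    return knockdown_progeny / control_progeny


# Hypothetical pupal-case counts from one knockdown vial and its paired control vial.
print(relative_fertility(50, 200))
```

A value near 1 indicates fertility comparable to controls, while a value near 0 indicates a severe fertility defect.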

We validated the observed fertility defect by using CRISPR/Cas9-based genome editing to construct putative loss-of-function alleles for atlas (Fig. S2). The principal allele we used for validation and the functional studies described below was a null allele that completely deletes the atlas genomic locus. This allele was generated by targeting each end of the locus with a gRNA. We made three additional frameshift alleles by inducing double-stranded breaks at a gRNA target site just downstream of the atlas start codon, which induced non-homologous end joining. Males homozygous for the atlas deletion allele have the same fertility defect as knockdown males (Fig. 2A). Males homozygous for any of three frameshift alleles showed significantly reduced, but non-zero, fertility (Fig. 2B). It is possible that residual atlas function may be present in these animals, perhaps due to translation initiation at a downstream start codon, which could generate a shorter protein with partial function. Each frameshift allele retains the possibility of encoding an N-terminally truncated, but otherwise in-frame, version of Atlas protein that would lack the first 60 amino acids (out of 172; see Fig. S2). Alternatively, it is possible that the residual fertility of the frameshift alleles is caused by the gene's intact 3' UTR, a topic we discuss in more detail below. Finally, we constructed a genomic rescue construct carrying both the full atlas transcribed region and its native regulatory sequences. atlas null males that carried a single copy of the rescue construct had fully restored fertility (Fig. S3). Overall, these data demonstrate that atlas loss, and not an RNAi or CRISPR off-target, causes nearly complete male sterility.

Atlas is required for proper spermatid nuclear condensation

We next examined how atlas loss of function impacted male fertility at the cellular level. Dissection and phase-contrast imaging of atlas deletion null or knockdown male reproductive tracts revealed that while the pre-meiotic and meiotic stages of spermatogenesis appeared normal, sperm accumulated at the basal end of the testes, rather than in the seminal vesicles with the observed conglomeration of sperm tails at the basal testes, SVs from either atlas null or knockdown males contained fewer mature sperm (Fig. 3B, Fig. S4C). The nuclei of sperm from null males also appeared wider and less elongated than those of controls. Together, these data suggest that atlas is required after meiosis, as developing spermatids take on their final structures.

We next examined two post-meiotic processes: individualization of 64-cell spermatid cysts into mature sperm, and spermatid nuclear condensation. Individualization initiates when an actin-rich individualization complex (IC) associates with the bundle of spermatid nuclei. The IC then proceeds down the sperm tails, expelling cytoplasmic waste and remodeling cell membranes to form 64 individual sperm. We visualized this process in males 0-1 days old, when spermatogenesis occurs at high levels, by staining whole mount testes for actin (Fig. 3C-D). Although ICs associated with nuclear bundles present at the basal end of the testes in both control and atlas null males, we observed significantly fewer nuclear bundle-associated ICs in nulls (Fig. 3C-D). While control testes typically had several ICs progressing down sperm tails, we saw a significantly reduced proportion of progressed bundles in nulls (Fig. 3C-D). In some null testes, we also observed individual investment cones dissociated from progressing ICs (Fig. 3C).

The ability of ICs to assemble at nuclear bundles and progress down sperm tails may be reduced if nuclear condensation is aberrant [reviewed in ref. 52]. During Drosophila spermiogenesis, round spermatid nuclei undergo a series of stepwise, morphological changes that are the product of two distinct, but related processes: changes in the chromatin packaging of DNA, and changes in nuclear shape [53-55]. The end result is thin, condensed nuclei. We quantified this process in testes dissected from newly eclosed wild-type and atlas null males expressing Mst35Bb-GFP, which marks the final stages of condensation. We shredded the post-meiotic region of the testes in the presence of a fixative and counted the number of nuclear bundles that exhibited each of five stages of condensation [53]: round nuclei, early canoe-stage (unmarked with Mst35Bb-GFP), late canoe-stage (marked with Mst35Bb-GFP), elongated nuclei, and fully condensed nuclei (Fig. 4). Condensation of the nuclear bundles in atlas null testes progressed at similar rates to controls through the late canoe stage (Table S1). However, in atlas null males, all nuclear bundles that progressed past the canoe stage (which included ~60% [range: 26-100%] of all observed bundles) showed an aberrant "curled" phenotype (Fig. 4; Table S1). These data suggest that Atlas protein is required during the later stages of nuclear condensation and are consistent with the idea that the loss of atlas affects nuclear condensation in a way that reduces IC assembly and sperm individualization (see Fig. 3C).

Table S1, nuclear bundles from atlas null testes consistently took on a curved shape after the canoe stage, though the degree of curvature was variable, as exemplified by the two examples of elongated nuclei from atlas null testes above.

That condensing spermatid nuclei are misshapen in the absence of atlas suggests the possibility that Atlas protein is critical for nuclear condensation. This idea is further supported by its predicted biochemical properties. Previously characterized spermatid chromatin binding proteins are small and highly basic [53,56,57], as the excess of positively charged amino acid side chains facilitates ionic interactions with negatively charged DNA. Many such proteins (i.e., Tpl94D, Mst35Ba, Mst35Bb, Prtl99C and Mst77F) also contain a conserved protein domain, the high-mobility-group box (HMG-box) domain [55,58-64], suggesting that this type of chromatin binding protein could have originated through gene duplication and divergence. Consistent with its putative de novo origin, Atlas lacks a detectable HMG-box domain. However, Atlas is otherwise similar to these other sperm chromatin binding proteins: the ~20 kDa protein has a highly basic predicted isoelectric point of 10.7, and its primary sequence contains the sequence KRDK, which matches the canonical consensus sequence for nuclear import, K(K/R)X(K/R) [65]. To test the hypothesis that Atlas is nuclear localized, and could thus bind DNA, we generated an atlas-GFP transgene under UAS control and expressed it ubiquitously using tubulin-GAL4 and in the early male germline using Bam-GAL4. In both larval salivary glands and early male germline cells, Atlas-GFP appeared to be nuclear localized (Fig. S5).
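The match between KRDK and the consensus K(K/R)X(K/R) can be checked with a simple pattern search. The sketch below is illustrative only: the function name `find_nls_motifs` and the toy peptide are hypothetical, and the toy peptide is not the Atlas sequence, which is not reproduced in the text.

```python
import re

# K(K/R)X(K/R): lysine, then lysine or arginine, then any residue, then lysine or arginine.
# The lookahead lets overlapping matches be reported as well.
NLS_PATTERN = re.compile(r"(?=(K[KR][A-Z][KR]))")


def find_nls_motifs(protein_sequence):
    """Return (0-based position, motif) for every match to K(K/R)X(K/R)."""
    sequence = protein_sequence.upper()
    return [(m.start(), m.group(1)) for m in NLS_PATTERN.finditer(sequence)]


# Toy peptide containing KRDK (hypothetical; NOT the Atlas protein sequence).
print(find_nls_motifs("MAKRDKLL"))
```

A motif match of this kind only indicates a candidate nuclear localization signal; localization itself has to be tested experimentally, as described above.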

While these results were consistent with Atlas protein localizing to the nucleus, they did (Fig. 5C). Actin-based ICs were also observed in the basal testes, but generally did not co-localize with Atlas-GFP, suggesting that Atlas-GFP is present in condensing nuclei before IC association (Fig. 5C). This result, taken together with the aberrant nuclear condensation in the absence of atlas (Fig. 4), is consistent with the idea that Atlas is a transition protein. Transition proteins are chromatin components that act transiently during spermatid nuclear condensation. A series of transition proteins first replace histones as the primary DNA binding proteins in the nucleus and then give way to protamines, the proteins that package chromatin in mature sperm [53,55,58].

To further elucidate the role of atlas in nuclear condensation, we next examined Atlas-GFP localization in the presence of either an early spermatid nuclear marker, histone H2Av-RFP [55,71], or Mst35Bb-dsRed [51], a marker of nuclei from the late canoe stage through final condensation. Atlas-GFP showed no co-localization with H2Av-RFP, suggesting that Atlas functions after histone removal (Fig. 5B). In contrast, some GFP-positive bundles co-localized with Mst35Bb-dsRed, but others did not (Fig. 5D and Fig. S7B). These data suggest that Atlas

The whole testes from which the basal portions are shown in panels C and D are shown in Fig. S7.

To determine the stage(s) of nuclear condensation at which atlas functions, we analyzed the shape of fixed nuclear bundles from shredded testes isolated from atlas-GFP males on the day of eclosion. Based on the stage of the defect in atlas null males (Fig. 3-4) and the pattern of Atlas-GFP-positive bundles in whole-mount testes (Fig. 5), we hypothesized that Atlas-GFP would localize to the later stages of nuclear condensation. Consistent with this hypothesis, we did not detect Atlas-GFP in round or early canoe stage bundles (Fig. 6A-B). Atlas-GFP co-localized with DNA in late canoe stage bundles (Fig. 6C). Interestingly, when nuclei elongated further, GFP was detected not in the nucleus, but as puncta basal to the nuclei (

6D) represents a mechanism for removing transition proteins from the nucleus after they exert their functions. We observed above that some Mst35Bb-GFP also appears to be removed from

Evolutionary origins of atlas

To better understand the evolutionary origin of atlas and its evolution since emergence, we used a combination of BLAST- and synteny-based approaches to identify atlas orthologs throughout the genus [46,72]. One notable feature of this two-exon gene is that the protein-coding region (519 nucleotides) is contained entirely within the first exon (622 nt); the longer, second exon (910 nt) appears to be entirely non-coding (Fig. 7). Surprisingly, the second exon is more widely conserved. BLASTN detected significant matches to this region (range of hit (Table S2). In D. ananassae, however, the protein-coding region is found on the X than Atlas, suggesting they may be lineage-specific paralogs of other genes (Fig. S9). In sister species D. pseudoobscura and D. persimilis, we detected a male-expressed transcript predicted to encode a protein with a pI > 10 in the region syntenic to the location of atlas in D. virilis, but the predicted protein sequences showed no significant BLASTP similarity to atlas orthologs (Table S2). While this predicted protein may represent a divergent atlas ortholog, the abSENSE method predicts low probabilities of BLASTP detection failure when searching for Atlas protein (Table S2). The ortholog status of this predicted protein is also unclear, but because of its dramatically altered size and pI, it is unlikely to have a functional role equivalent to that of D. melanogaster Atlas.

To investigate whether the protein-coding region may have reproductive functions in other species, we used sex-specific RNA-seq data from numerous Drosophila species curated by the Genomics Education Partnership [72; thegep.org] and verified several of these results by RT-PCR (Fig. 7, Fig. S10). In all species in which atlas was detected, the protein-coding region is expressed specifically in males regardless of its genomic location (Fig. 7). Interestingly, the non-coding region shows male-specific expression in species lacking an unambiguous, orthologous coding region, such as D. pseudoobscura and D. mojavensis. Conversely, while D. yakuba and D. erecta express the protein-coding region robustly, we found no RNA-seq evidence to support expression of the non-coding second exon, in spite of its sequence conservation (Fig. S8). Based on its high level of sequence conservation, consistent genomic location and expression in a variety of species, it is possible that what we now consider to be the 3' untranslated region of atlas from D. melanogaster was, ancestrally, a non-coding RNA.

The FlyBase database reports two transcript isoforms of atlas in D. melanogaster: the atlas-RA isoform is 986 nucleotides, while the atlas-RB isoform is 1528 nt. These isoforms differ in how much of the second, non-coding exon is included in the transcript. We used RT-PCR of whole male cDNA to assess the presence of these isoforms and their relative abundances. Primers designed to amplify a region present in both isoforms produced products that appeared more abundant than primers designed to amplify only the long isoform, even though both primer pairs appeared to amplify genomic DNA with equal efficiency. Based on RT-PCR band intensities and controlling for product size and genomic PCR band intensities, we estimated that the short isoform is about 3-fold more abundant. This difference in abundance is mirrored in available RNA-seq data, which show approximately 3- to 4-fold higher levels of expression in the upstream part of exon 2 (Fig. S8), a pattern that also appears in D. simulans and D. sechellia. Evaluating the potential significance of this finding awaits functional characterization of the non-coding region.

As we have observed for other putative de novo genes with essential male reproductive functions [46], the pattern of atlas protein-coding sequence presence/absence across the phylogeny is difficult to explain parsimoniously. If we assume that gene birth events are less frequent than gene deaths, since the latter can occur through many possible mutational events and can happen separately along multiple phylogenetic lineages, our data support the hypothesis of a single origin of the protein-coding sequence at the base of the genus, followed by independent loss events on the lineages leading to D. grimshawi, D. mojavensis and D. willistoni, and potentially also D. pseudoobscura/persimilis (Table S2). We summarize these findings for 12 representative species of Drosophila in Fig. 7. The general patterns of loss do not change when all species of Table S2 are considered, though an additional loss in the melanogaster group is likely, given the absence of a detectable ortholog in D. kikkawai and D. serrata. As noted above, the pattern of gene loss can also appear due to orthology detection failure [74], for which we tried to account with our additional search methods described above. We also note, however, that the probability of BLASTP-based ortholog detection failure is relatively low for some Drosophila species that lack atlas, including D. pseudoobscura

Here, we screened 42 putatively de novo evolved genes for major effects on male D. melanogaster reproduction. Our primary screen identified three genes whose knockdown

The null deletion and frameshift alleles of atlas all caused significantly reduced fertility (Fig. 2), but the deletion allele resulted in essentially complete sterility, while residual fertility remained in males homozygous for each frameshift allele. We noted above and depicted in Fig. S2 how the frameshift alleles have the potential to encode an N-terminally truncated form of Atlas that would contain amino acids 61-172 of the wild-type protein, if these alleles allow translation initiation at the methionine-encoding codon 61. Such a truncated protein would be 35 percent shorter than wild-type, lack the predicted nuclear localization sequence, and have a reduced isoelectric point of 7.1. Each of these factors could contribute to reduced functionality.
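The length figures quoted for the putative truncated protein follow from simple arithmetic on numbers stated in the text, as the sketch below illustrates. The function and constant names are hypothetical, and the sketch assumes that the 519-nucleotide coding region includes the stop codon, which the text does not state explicitly.

```python
CDS_LENGTH_NT = 519     # protein-coding region length reported in the text
FULL_LENGTH_AA = 172    # wild-type Atlas length reported in the text
ALT_START_CODON = 61    # methionine codon proposed as a downstream start


def truncated_length(full_length_aa, alt_start_codon):
    """Number of residues retained when translation starts at alt_start_codon."""
    return full_length_aa - alt_start_codon + 1


def percent_shorter(full_length_aa, truncated_length_aa):
    """Percent reduction in length relative to the full-length protein."""
    return 100.0 * (full_length_aa - truncated_length_aa) / full_length_aa


# Assumption: the CDS length includes the stop codon, so codons - 1 = residues.
codons_including_stop = CDS_LENGTH_NT // 3
retained = truncated_length(FULL_LENGTH_AA, ALT_START_CODON)

print(codons_including_stop - 1)   # residues encoded by the coding region
print(retained)                    # residues 61-172 retained
print(round(percent_shorter(FULL_LENGTH_AA, retained), 1))
```

Under these assumptions, the coding region length, the 172-residue protein, and the reported 35 percent reduction are mutually consistent.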

The broad sequence conservation of the 3' UTR across the Drosophila genus, including in species that lack the atlas coding sequence, suggests the alternative possibility that the 3' UTR also contributes to fertility. In one scenario, the 3' UTR could act as part of the atlas locus

Based on the current evidence, however, we think that the primary way in which atlas impacts fertility is through its protein-coding sequence. This conclusion is supported by: the ~60-90 percent reduction in fertility in even the frameshift mutants; the stability of the Atlas-GFP fusion protein and its presence in the spermatid nuclear condensation stages that immediately precede the timing of the null mutant phenotype; and the observation that the protein-coding region shows a more highly conserved expression pattern than the UTR region within the melanogaster group (Fig. 7 and Fig. S8).

Atlas is an essential transition protein

Several lines of evidence suggest that Atlas is a transition protein that facilitates the change from histone-based to protamine-based chromatin packaging in spermatid nuclei. Atlas localizes throughout spermatid nuclei (Fig. 6) and has biochemical properties consistent with direct DNA interaction. The protein appears specifically at the late canoe stage of nuclear compaction (Fig. 5B-D). Its lack of overlap with testis-specific histones (Fig. 5B), partial overlap with Mst35Bb (Fig. 5D), removal from needle-stage nuclei (Fig. 6) and absence from mature [87-89]. These latter two possibilities are illustrated by some of the other spermatid chromatin binding proteins previously characterized as "non-essential" (e.g.,

The movement of the atlas protein-coding sequence to chromosome 2 also created the two-exon gene observed in D. melanogaster, in which the longer second exon appears to be entirely non-coding. This second exon is highly conserved across the genus in both sequence and genomic location, and it shows male-specific expression in several species that lack the protein-coding sequence upstream (Fig. 7 and Fig. S8). These patterns of conservation suggest that the second exon might originally have been a non-coding RNA, a class of molecule whose importance in Drosophila male reproduction has recently become recognized [109,110]. which it is encoded from the X chromosome, or the genetic ablation of the conserved region in species lacking the protein-coding sequence will provide additional insights.

A final issue raised by our results is the exact timing and mechanism of origin for the atlas protein-coding sequence. The bioinformatic screen [19] that identified atlas and the other genes tested in Fig. 1 was designed to identify both "de novo" genes, defined as protein-coding regions in Drosophila that had recognizable, but non-ORF-maintaining, TBLASTN hits in outgroup species, and "putative de novo" genes, which had no TBLASTN hits in outgroup . Hence, in addition to using synteny to search for orthologs, we used HMMER, which employs hidden Markov models and builds a sequence profile of the target protein using information from multiple orthologs. Since HMMER also did not detect orthologs outside of Drosophila, we hypothesize that atlas evolved de novo at the base of the genus. However, since we remain unable to identify the non-protein-coding sequence from which atlas arose, we continue to refer to atlas as a putative de novo gene [5].

Overall, we find that while many putative de novo evolved genes are expressed in the D. melanogaster testes, few have major, non-redundant effects on fertility. However, several such genes have evolved critical roles at distinct stages of spermatogenesis and sperm function. We showed previously that the putative de novo gene saturn is required for maximal sperm production, as well as for the ability of transferred sperm to migrate successfully to sperm storage organs in females [46]. Another putative de novo gene, goddard, is required for sperm production and encodes a cytoplasmic protein that appears to localize to elongating axonemes [20,46]. Loss of goddard impairs the individualization of spermatid bundles [20], thus exerting an effect that appears to be upstream of those observed for saturn and atlas. Here, we report another novel function for a putative de novo gene: encoding an essential transition protein that is necessary for proper nuclear condensation in spermiogenesis. Taken together, these results demonstrate that while many de novo genes may play subtle roles or share functional redundancy with other genes, de novo genes can also become essential players in complex cellular processes that mediate successful reproduction.

De novo and putative de novo genes inferred to be no older than the Drosophila genus were identified previously [19]. We filtered these genes with publicly available RNA-seq data

Houston, TX, USA) and crossed into a y v background to screen for v+. We attempted at least two rounds of transgenic production for each gene. In total, we were able to obtain and test RNAi lines for 57 of the 96 identified genes. Table S3 shows all RNAi lines used and lists the short hairpin sequences cloned for the TRiP lines we constructed.

We initially screened males knocked down for each candidate gene for major fertility defects by crossing groups of 7 knockdown or control males to 5 virgin Canton S females, letting the adults lay eggs for ~48 hours, and then discarding adults and quantifying the resulting progeny by counting the pupal cases, as previously described [46]. To assess the degree of knockdown achieved, 10 whole males of each line were homogenized in TRIzol reagent (Life Technologies, Carlsbad, CA). RNA isolation, DNase treatment, cDNA synthesis and semi-quantitative RT-PCR with gene-specific primers were performed as previously described; amplification of RpL32 was used as a positive control [46]. We evaluated knockdown efficiency by agarose gel electrophoresis of RT-PCR products as one of four levels: "complete" if no product from the knockdown cDNA sample was visible via agarose gel electrophoresis; "near complete" for a very faint knockdown product that was also much less abundant than the control product; "partial" for a more robust knockdown product that was still visibly less intense than control; and "not knocked down" if the product intensity for the knockdown sample equaled or exceeded that of the control. Any gene that did not show at least partial knockdown was discarded from further analysis, leaving a total of 42 genes successfully screened.

We also constructed three frameshift, expected loss-of-function alleles for atlas by using CRISPR to induce non-homologous end joining at a single PAM site just downstream of the atlas start codon. Vasa-Cas9 embryos were co-injected and screened for w- progeny as described above. We then used squish preps to isolate DNA from G1 flies and used a PCR-RFLP assay to detect mutations. PCR products spanning the gRNA-targeted site were digested with BfaI (New England Biolabs (NEB), Ipswich, MA); undigested products in which the expected BfaI site was lost indicated a mutation, which was balanced and then confirmed by PCR and sequencing of homozygous mutant lines.

We used scarless CRISPR editing and homology-directed repair (HDR) to insert the GFP protein-coding sequence in-frame at the end of the atlas protein sequence (see Fig. S6

To assess the level of fertility conferred by the atlas-GFP allele, we crossed atlas-GFP and w1118 flies to Δatlas/SM6. Males with genotypes atlas-GFP/Δatlas and +/Δatlas were compared using the single-pair fertility assay described above.

To observe the production of sperm in knockdown or mutant males, we introduced the Mst35Bb-GFP marker into these males, which labels mature sperm and late-stage spermatid nuclei with GFP [51]. Samples were prepared, imaged and analyzed as described previously

We used the Gateway cloning system (Thermo) to construct an atlas-GFP transgene expressed under UAS control (primers in Table S4). The atlas protein-coding sequence in pENTR was recombined with pTWG (Drosophila Genomics Resource Center, T. Murphy) as described above. The resulting plasmid was then inserted into w- flies using P-element-mediated transposition (Rainbow Transgenics), w+ G1s were selected, and several independent insertions were balanced. We crossed male UAS-atlas:GFP flies to females from two different driver lines: tubulin-GAL4 (to drive ubiquitous expression) and Bam-GAL4 (to drive expression in the early germline). We dissected larval salivary glands of the tub>atlas:GFP males, since these cells are exceptionally large and ideal for visualizing subcellular localization.

Earlier nuclear stages were visualized with histone H2AvD-RFP (BDSC stock #23651), which is present in round spermatid nuclei and the earliest stages of nuclear elongation [53,121]. Images with H2AvD-RFP were obtained with epifluorescence microscopy, since we lack an appropriate confocal laser for RFP.

To examine spermatid nuclei at various stages of condensation, we visualized nuclear

stacks were taken on a Leica SP5 microscope, images were captured by LASAF, and ImageJ was used to flatten stacks into a single, two-dimensional image. All intact nuclear bundles were counted for each dissection.

For the experiments measuring nuclear condensation stage (Table S1), a sample size of N = 10 for each genotype was selected based on the magnitude of the atlas null phenotype and the consistent differences observed in previous dissections of these genotypes with Mst35Bb-GFP. Likewise, for the IC-nuclear bundle association and IC progression analysis (Fig. 3C-D

We analyzed the molecular evolution of the atlas protein-coding sequence by obtaining