Adaptive Evolution of Animal Proteins over Development: Support for the Darwin Selection Opportunity Hypothesis of Evo-Devo

Abstract A driving hypothesis of evolutionary developmental biology is that animal morphological diversity is shaped both by adaptation and by developmental constraints. Here, we have tested Darwin’s “selection opportunity” hypothesis, according to which high evolutionary divergence in late development is due to strong positive selection. We contrasted it to a “developmental constraint” hypothesis, according to which late development is under relaxed negative selection. Indeed, the highest divergence between species, both at the morphological and molecular levels, is observed late in embryogenesis and postembryonically. To distinguish between adaptation and relaxation hypotheses, we investigated the evidence of positive selection on protein-coding genes in relation to their expression over development, in fly Drosophila melanogaster, zebrafish Danio rerio, and mouse Mus musculus. First, we found that genes specifically expressed in late development have stronger signals of positive selection. Second, over the full transcriptome, genes with evidence for positive selection trend to be expressed in late development. Finally, genes involved in pathways with cumulative evidence of positive selection have higher expression in late development. Overall, there is a consistent signal that positive selection mainly affects genes and pathways expressed in late embryonic development and in adult. Our results imply that the evolution of embryogenesis is mostly conservative, with most adaptive evolution affecting some stages of postembryonic gene expression, and thus postembryonic phenotypes. This is consistent with the diversity of environmental challenges to which juveniles and adults are exposed.


Introduction
There are two main models to explain the relationship of development and evolutionary divergence. The early conservation model suggests that embryonic morphology between different species within the same group progressively diverges across development (Von-Baer 1828); such groups are usually understood to be phyla in a modern context. In contrast, the hourglass model proposes that middle development (the morphological "phylotypic" period) has the highest morphological similarity (Duboule 1994;Raff 1996). On the basis of recent genomic studies, both models have some level of molecular support. Some studies support the early conservation model (Roux and Robinson-Rechavi 2008;Artieri et al. 2009), while most recent ones support the hourglass model (Kalinka et al. 2010;Irie and Kuratani 2011;Levin et al. 2012;Quint et al. 2012;Drost et al. 2015;Hu et al. 2017;Zalts and Yanai 2017). And in fact the two models may not be mutually exclusive (Piasecka et al. 2013;Liu and Robinson-Rechavi 2018).
Both the early conservation and hourglass models predict that late development has high evolutionary divergence. This high divergence of late development has been interpreted as a consequence of relaxed developmental constraints, that is, weaker negative selection. For example, Garstang (1922) and Riedl (1978) suggested that the development of later stages is dependent on earlier stages, so higher divergence should be found in the later stages of development (cited in Irie and Kuratani 2014). Indeed, many studies have found evidence for relaxed purifying selection in late development (Castillo-Davis and Hartl 2002;Roux and Robinson-Rechavi 2008;Artieri et al. 2009;Kalinka et al. 2010;Liu and Robinson-Rechavi 2018). An alternative explanation, however, known as Darwin's "selection opportunity" hypothesis (Darwin 1871) (cited in Artieri et al. 2009), proposed that highly divergent late development could also be driven by adaptive evolution (positive selection), at least in part. This could be due to the greater diversity of challenges to which natural selection needs to respond in juvenile and adult life than in early and middevelopment. Notably, weaker negative and stronger positive selections are not mutually exclusive. For example, Cai and Petrov (2010) found the accelerated sequence evolution rate of primate lineage specific genes driven by both relaxed purifying selection and enhanced positive selection. Necsulea and Kaessmann (2014) suggested that the high evolution rate of testis transcriptome could be caused by both sex-related positive selection and reduced constraint on transcription.
As far as we know, few studies have tried to distinguish the roles of adaptation versus relaxation of constraints in late development (Artieri et al. 2009), and no evidence has shown stronger adaptive evolution in late development. Yet there is an intuitive case for adaptation to act on phenotypes established in late development, because they will be present in the juvenile and adult, and interact with a changing environment.
In the case of detecting individual gene adaptation, one of the best established methods is using the ratio x of nonsynonymous (dN) to synonymous (dS) substitutions (Yang and Nielsen 1998;Hurst 2002). Because synonymous changes are assumed to be functionally neutral, x > 1 indicates evidence of positive selection. As adaptive changes probably affect only a few codon sites and at a few phylogenetic lineages, branch-site models allow the x ratio to vary both among codon sites and among lineages (Yang and Nielsen 2002;Zhang et al. 2005). Polymorphism-based methods such as frequency spectrum, linkage disequilibrium and population differentiation can also be used to identify changes due to recent positive selection (Vitti et al. 2013).
As several genes with slight effect mutations can act together to have a strong effect, adaptive evolution can act on the pathway level as well (Daub et al. 2013;Berg and Coop 2014). In the case of polygenic adaptation, a gene set enrichment test has successfully been applied to detect gene sets with polygenic adaptive signals (Daub et al. 2013;Daub et al. 2017). This gene set enrichment analysis allows to detect weak but consistent adaptive signals from whole genome scale, unlike traditional enrichment tests which only consider top scoring genes with an arbitrary significance threshold.
In order to estimate the contribution of positive selection to the evolution of highly divergent late development, we have adopted three approaches. First, we used modularity analysis to obtain distinct sets of genes (modules) which are specifically expressed in different meta developmental stages (Piasecka et al. 2013;Levin et al. 2016), and compared the signal of positive selection across modules. Second, we applied a modified "transcriptome index" (Domazet-Loso and Tautz 2010) to measure evolutionary adaptation on the whole transcriptome level. Finally, we used a gene set enrichment approach to detect polygenic selection on pathways and studied the expression of these gene sets over development. Each approach was applied to developmental transcriptomes from Danio rerio, Mus musculus, and Drosophila melanogaster and to results of the branch-site test for positive selection in lineages leading to these species. All the analyses found a higher rate of adaptation in late and in some stages of postembryonic development, including adult.

Results
In order to characterize the signal of positive selection, we used the log-likelihood ratio test statistic (DlnL) of H 1 to H 0 models with or without positive selection, from the branchsite model (Zhang et al. 2005) as precomputed in Selectome on filtered alignments ) and as used in Roux et al. (2014) and Daub et al. (2017). Briefly, DlnL represents the evidence for positive selection, thus a branch in a gene tree with a higher value indicates higher evidence for positive selection for this gene over this branch.

Modularity Analysis
For the modularity analysis, we focused on different sets of specifically expressed genes (modules) in each developmental period. Our expectation is that genes in each module have specific involvement during embryonic development (Piasecka et al. 2013), so different adaptation rates of these genes should reflect a stage-specific impact of natural selection. In addition, as the modules decompose the genes into different meta development stages, they allow to avoid the potential bias caused by imbalanced time points in each meta development stage from our transcriptome data sets; for example, many more "late development" samples in fly than in the other two species studied. For D. rerio, we obtained seven modules from our previous study (Piasecka et al. 2013 Because not all genes have any evidence for positive selection, we first compared the proportion of genes either with strong evidence (q-value < 0.2) or with weak evidence (no threshold for q-value; DlnL > 0) of positive selection across modules. For strong evidence, the proportion is not significantly different across modules in M. musculus and D. melanogaster (supplementary fig. S2, Supplementary Material online). In D. rerio, however, there is a higher proportion in the juvenile and adult modules. For the weak evidence, D. melanogaster has a higher proportion in pupae and adult modules, but there is no significant difference in D. rerio and M. musculus (supplementary fig. S3, Supplementary Material online).
We then compared the values of DlnL for genes with weak evidence of positive selection ( fig. 1). In order to improve the normality of nonzero DlnL, we transformed DlnL with fourth root (Hawkins and Wixley 1986;Roux et al. 2014;Daub et al. 2017).
In D. rerio, we detected an hourglass pattern of DlnL, at its highest in late modules. Specifically, in the juvenile module, the mean DlnL is significantly higher than the mean DlnL for all genes (P-values reported in table 1). We note that the adult module also has higher mean DlnL, even though it is not significant. In the pharyngula module, the mean DlnL is significantly lower than the mean DlnL for all genes, as expected under the hourglass model. In the other modules, the mean DlnL is not significantly different from the mean for all genes.
In M. musculus, similarly, we found an hourglass pattern of DlnL. The late embryo module has a higher mean DlnL than all genes, while the middle embryo module has a lower mean DlnL than all genes.
In D. melanogaster, however, we observed an early conservation pattern of DlnL. Specifically, in the early embryo module, the mean DlnL is lower than the mean DlnL for all genes. In the adult module, the mean DlnL is higher than the mean Adaptive Evolution of Animal Proteins . doi:10.1093/molbev/msy175 MBE DlnL for all genes. There is no significant difference for the other modules.
It should be noted that the patterns reported in this modularity analysis are relatively weak, especially in D. melanogaster. After multiple test correction, some of the reported differences are not significant anymore (supplementary table S1, Supplementary Material online).
Overall, these findings suggest that positive selection is stronger on genes expressed in late development or in adult than in early and middle development. It also indicates that DlnL on gene modules in different phyla supports different evolutionary developmental biology (Evo-Devo) models (hourglass vs. early conservation).

Transcriptome Index Analysis
Although modularity analysis guarantees independence between the sets of genes which are compared, it only considers a subset of genes. This leaves open whether the higher adaptive evolution in late development and adult holds true for the whole transcriptome as well, or just for these modular genes. Additionally, while trends were detected, significance is weak. To consider the composition of the whole transcriptome and to increase our power to detect a signal of positive selection in development, we used a modified "Transcriptome Age Index" (Domazet-Loso and Tautz 2010) to calculate the weighted mean of DlnL for the transcriptome. Notably, all expression levels were log-transformed before use, unlike in Domazet-Loso and Tautz (2010). See discussion in Piasecka et al. (2013) and Liu and Robinson-Rechavi (2018), but briefly log-transformation provides insight on the overall transcriptome rather than a small number of highly expressed genes. We named this modified index "Transcriptome Likelihood Index" (TLI). A higher index indicates that the transcriptome has higher expression of transcripts from genes with high DlnL between models with and without positive selection.
In D. rerio, generally, the pattern resembles an hourglasslike pattern ( fig. 2). The TLI first decreases and reaches a minimum in the late stage of gastrula (8 h), and then progressively increases until adult (ninth month), with finally a slight decline. In addition, in the adult stage, female has higher TLI than male, although the difference is weak. To test whether TLIs are different between developmental periods, we compared the mean TLI of all stages within a period, between each pair of periods (see Materials and Methods). We found that middle development has low TLI, early development has medium TLI, late development and maternal stage have very similar high TLI, and adult has the highest TLI. Except late development and maternal stage (P ¼ 0.24), all pairwise comparisons are significant: P < 5.7e-07.
In M. musculus, we observed a clear hourglass-like pattern of TLI. For the mean TLI comparison, we found low TLI in middle development, medium TLI in early development, high TLI in late development, and the highest TLI in maternal stage (all pairwise comparisons are significant: P < 2e-16). Of note, unlike in D. rerio, the "late development" here only contains late embryo stages, but no postembryo stages. This may explain why late development has lower TLI than the maternal stage in this data set.
In D. melanogaster, we found the TLI progressively increasing over development, suggesting an early conservation model. Unlike in D. rerio, we found that male has higher TLI than female in the adult stage. For the mean TLI comparison, early development has low TLI, middle development has medium TLI, late development has high TLI, and adult has the highest TLI (all pairwise comparisons are significant: P < 2e-16). Red (respectively blue) dots indicate modules for which the mean of DlnL is significantly (P < 0.05) higher (respectively lower) than the mean of DlnL from all modules. The green dashed line denotes the mean value of DlnL from all modular genes.

MBE
As in the modularity analysis, but with much stronger signal, both D. rerio and M. musculus support the hourglass model, while D. melanogaster follows an early conservation model. Again, from whole transcriptome level, these results indicate that genes with evidence for positive selection are more highly expressed in late development and adult. Interestingly, the maternal stage has a comparable high TLI to late development. This could be related to the maternal stage being dominated by adult transcripts (Tadros and Lipshitz 2009). In this respect (transcriptome evolution), the maternal stage should maybe be regarded as a special adult stage rather than as an early embryonic stage.

Polygenic Selection Analysis
Positive selection can be detected at the biological pathway level, even when individual genes within the pathway only fix small effect mutations (Daub et al. 2013(Daub et al. , 2017Berg and Coop 2014). Thus, we searched for such signals of positive selection on pathways. Briefly, we calculated the sum of DlnL (SUMSTAT statistic) for a pathway and inferred the significance of this SUMSTAT with an empirical null distribution (Tintle et al. 2009;Daub et al. 2013Daub et al. , 2017. In total, we identified 10, 4, and 9 pathways with a significant signal of positive selection, respectively, in lineages leading to D. rerio, M. musculus, and D. melanogaster (q-value < 0.2, table 2).
The function of these pathways, while not our primary focus, is consistent with adaptive evolution of juvenile or adult phenotypes. First, we found metabolism-related pathways in all three species, suggesting pervasive adaptation, possibly related to diet; this is consistent with previous results in primates (Daub et al. 2017). Second, in D. rerio and D. melanogaster, several pathways are involved in morphogenesis and remodeling of organs (e.g., laminin interactions, extracellular matrix [ECM], and ECM-receptor interaction), suggesting potential adaptive evolution of morphological development. Third, there are several pathways involved in aging in D. melanogaster and M. musculus (e.g., reactive oxygen detoxification, longevity regulation, and mitochondrial translation), suggesting potential role of natural selection on modulating lifespan or on metabolic activity. Forth, in D. rerio, we detected one pathway related to environmental adaptation: Visual phototransduction; adaptations in vision are expected for aquatic species which under a wide variety of visual environments (Sabbah et al. 2010).
If late development and adult are under stronger positive selection at the pathway level as well, we expect genes involved in pathways with a signal of positive selection to be more highly expressed at these periods. Thus, we computed the ratio of median expression between positively selected pathway genes and genes included in pathways not positively selected. As the median expression in the first time point of M. musculus is 0, we removed it from our analysis.
In D. rerio, the ratio of median expression keeps increasing until the juvenile stage. Then, it slightly decreases ( fig. 3). In M. musculus, except the first time point, the ratio of median expression also progressively increases. In D. melanogaster, there is a small peak in the first time point, but it quickly decreases to minimum within the same developmental Adaptive Evolution of Animal Proteins . doi:10.1093/molbev/msy175 MBE period. Then, it keeps increasing until the middle of the larval stage. Finally, for the last development stages, it resembles a wave pattern: Decrease, increase, and decrease again. Again, we also tested the difference between male and female in adult stages for D. rerio and D. melanogaster. Unlike the observation in the transcriptome index analysis, here we found that male has higher ratio of median expression than female in both species.
Overall, consistent with previous results, we found that late development and adult tend to express genes involved in pathways enriched for signal of positive selection, indicating that adaptive evolution at the pathway level mainly affects these stages. While there is some signal of early development adaptive evolution on single genes, the later developmental signal is more consistent at the pathway level. Because pathways link genes to phenotypes (Müller 2007;

Correcting Confounding Factors
As some nonadaptive factors (such as gene length, tree size [number of branches], and branch length) can be correlated with DlnL and affect our results (Daub et al. 2017), we investigated the correlation between DlnL and these potential confounding factors. Generally, we found a small correlation between DlnL and tree size, but a larger correlation between DlnL and gene length or branch length (supplementary fig. S4, Supplementary Material online). One explanation for this high correlation between DlnL and gene length is that long genes could accumulate more mutations than short genes, so we have more power to detect positive selection with higher number of mutations (Fletcher and Yang 2010;Gharib and Robinson-Rechavi 2013). So, we checked the influence of gene length on our results. Because branch length is inferred from the number of mutations, and higher branch length can be driven by higher evolutionary rate due to positive selection, we did not check further the correlation between DlnL and branch length. In order to investigate whether gene length might have affected our results, for modularity and TLI analysis, we tested whether patterns purely based on gene length are similar to those based on DlnL or not. Surprisingly, we found an opposite pattern of gene length, relative to DlnL. For modularity analysis, the modules with higher DlnL have significantly lower mean gene length than all genes (supplementary fig.  S5, Supplementary Material online). For transcriptome index analysis, the stages with higher TLI trend to have lower transcriptome index for gene length (supplementary fig. S6, Supplementary Material online), suggesting that these stages trend to express shorter genes. These findings imply that the detection of higher positive selection in late development is not driven by gene length.
Immune system genes can bias positive selection analyses, as they evolve under pervasive positive selection (Flajnik and Kasahara 2010). To control for this, we also confirmed our findings after removing immune genes from our analysis (supplementary fig. S7, Supplementary Material online).

Developmental Constraint Hypothesis and Darwin's Selection Opportunity Hypothesis
Despite the repeated observation that late development is highly divergent for diverse genomic properties (sequence evolution, duplication, gene age, and expression divergence) in diverse animal species (Roux and Robinson-Rechavi 2008;Domazet-Loso and Tautz 2010;Kalinka et al. 2010;Irie and Kuratani 2011;Levin et al. 2012;Piasecka et al. 2013;Drost et al. 2015;Liu and Robinson-Rechavi 2018), the underlying evolutionary forces driving such a pattern remain obscure. The "developmental constraint" hypothesis (Raff 2000;Brakefield 2006) suggests that this high divergence is due to relaxed purifying selection, whereas Darwin's "selection opportunity" hypothesis proposes stronger positive selection (as discussed in Artieri et al. 2009;Kalinka and Tomancak 2012).
Several studies have found evidence, direct or indirect, to support the importance of developmental constraints (Castillo-Davis and Hartl 2002; Roux and Robinson-Rechavi 2008;Artieri et al. 2009;Kalinka et al. 2010). For example, we (Roux and Robinson-Rechavi 2008) found that genes expressed earlier in development contain a higher proportion of essential genes, and Uchida et al. (2018) found strong embryonic lethality from random mutations in early development. Weaker purifying selection in late development would imply that genes expressed in this period have less fitness impact, which is consistent with the paucity of essential genes. Here and in Liu and Robinson-Rechavi (2018), the branch-site codon model allows us to isolate the contribution of purifying selection to coding sequence (CDS) evolution. We found indeed that genes under weaker purifying selection on the protein sequence trend to be expressed in late development (Liu and Robinson-Rechavi 2018). This provides direct evidence of relaxed purifying selection in late development.
To the best of our knowledge, there has been no direct test of Darwin's "selection opportunity" hypothesis. One such study, in D. melanogaster, was proposed by Artieri et al. (2009). Unfortunately, they only had relatively poor expression data (expressed sequence tags) and limited time points (embryonic, larval/pupal, and adult), and they did not find any direct evidence of higher positive selection in late development. As they noticed that the accelerated sequence evolution of genes expressed at adult stage was confounded by male-biased genes, they argued that the rapid evolution observed in late development could be due to specific selective pressures such as sexual selection. A recent study, in D. melanogaster, provides indirect evidence: Using in situ expression data and population genomic data to map positive selection to different embryonic anatomical structures, Salvador-Mart ınez et al. (2018) found larva stage enriched with signal of positive selection. Our results clearly provide a quantitative test which supports a role of positive selection in the high divergence of late development. While our sampling is very far from covering the diversity of developmental modes of animals, we show consistent patterns in a placental mammal, a direct development ray-finned fish, and a holometabolous insect. While it is possible that other patterns will be found in species with different development, this shows that adaptation in late development is not limited to one model. We show that this is not due to testis-expressed genes (supplementary fig. S8, Supplementary Material online). In addition, in vertebrates, we also found some evidence of adaptive evolution in early development on single genes. This indicates that some changes in early development might be adaptive consequences to diverse ecological niches, as proposed by Kalinka and Tomancak (2012). It should be noted that our results also provide counter evidence to the adaptive penetrance hypothesis, which argues that adaptive evolution mainly occurs in the middle development (Richardson 1999

Reunification of Structuralist and Functionalist Comparative Biology
There have been two major approaches to comparative biology since the late 18th century: The structuralist approach (which gave rise to Evo-Devo) emphasizes the role of constraints and often focuses on investigating spatial and timing variations of conserved structures in distantly related species. In a modern context, the focus is often on comparing developmental genes' expression between species. The functionalist or adaptationist approach (which gave rise to the Modern Synthesis and most of evolutionary biology) emphasizes the role of natural selection. In a modern context, the focus is often on investigating adaptive mutations. It has been suggested that these two approaches could not be reconciled (Amundson 2007), as the former underscores how mutations generate morphological diversity, while the later underscores whether mutations are fixed by positive selection or not. A good example of the differences between structuralist and adaptationist comes from the debate between Hoekstra and Coyne (2007) and Carroll (2008). As a structuralist, Carroll suggested that mutations affecting morphology largely occur in the cis-regulatory regions. However, as adaptationists, Hoekstra and Coyne argued that this statement is at best premature. Their main argument was that they did not find that adaptive evolution was more likely to occur in cis-regulatory elements, but rather in protein-coding genes, from both genome-wide surveys and single-locus studies. It is important to note that Carroll's theory is specific to morphological evolution, but not directly related to evolutionary adaptation. Basically, both sides could be correct and were mostly discussing different things.
As both adaptation and structure are part of biology, we should be able to explain both in a consistent manner. Here, we try to bridge positive selection and morphological evolution by combining developmental time-series transcriptomes, positive selection inference on protein-coding genes, modularity analysis, transcriptome index analysis, and gene set analysis. From both modularity analysis and transcriptome index analysis, we found that genes highly expressed in late development and adult have higher evidence for positive selection. From polygenic analysis, we found that the expression of positively selected pathways is higher in late development and adult. Overall, these results suggest that higher morphological variation in late development could be at least in part driven by adaptive evolution. In addition, CDS evolution might also make a significant contribution to the evolution of morphology, as suggested by Hoekstra and Coyne (2007) and Burga et al. (2017). This is also supported by the observation of tissue-specific positive selection in D. melanogaster development (Salvador-Mart ınez et al. 2018). It should be noted that we do not test here whether regulatory sequence evolution plays a similar or greater role, as we do not have equivalent methods to test for positive selection in regulatory regions.

Materials and Methods
Data files and analysis scripts are available on our GitHub repository: https://github.com/ljljolinq1010/Adaptive-evolution-in-late-development-and-adult.

Expression Data Sets
For D. rerio, the log-transformed and normalized microarray data were downloaded from our previous study (Piasecka et al. 2013). These data include 60 stages from egg to adult, which originally come from Domazet-Loso and Tautz (2010). For M. musculus, the processed RNA-seq (normalized but nontransformed) data were retrieved from Hu et al. (2017). These data include 17 stages from 2 cells to E18.5. We further transformed it with log 2 .
For D. melanogaster, we obtained processed (normalized but nontransformed) RNA-seq data from http://jsb.ucla.edu/ software-and-data, accessed July 2016 (Li et al. 2014), which originally come from Graveley et al. (2011). These data have 27 stages from embryo to adult. For the last three stages, as data were available for male and female, we took the mean. We further transformed it with log 2 .

Branch-Site Likelihood Test Data
The log-likelihood ratio (DlnL) values of a test for positive selection were retrieved from Selectome ), a database of positive selection based on the branch-site likelihood test (Zhang et al. 2005). One major advantage of this test is allowing positive selection to vary both among codon sites and among phylogenetic branches. The branch-site test contrasts two hypotheses: The null hypothesis is that no positive selection occurred (H0) in the phylogenetic branch of interest, and the alternative hypothesis is that at least some codons experienced positive selection (H1). The log-likelihood ratio statistic (DlnL) is computed as 2*(lnLH1 À lnLH0). Importantly, in order to mitigate false positives due to poor sequence alignments, Selectome integrates filtering and realignment steps to exclude ambiguously aligned regions.
We used DlnL from the Clupeocephala branch, the Murinae branch, and the Melanogaster group branch for D. rerio, M. musculus, and D. melanogaster, respectively. One gene could have two DlnL values in the focal branch because of duplication events. In this case, we keep the value of the branch following the duplication and exclude the value of the branch preceding the duplication.

Pathways
We downloaded lists of 1,683 D. rerio gene sets, 2,269 M. musculus gene sets, and 1,365 D. melanogaster gene sets of type "pathway" from the NCBI Biosystems Database (Geer et al. 2010). This is a repository of gene sets collected from manually curated pathway databases, such as BioCyc (Caspi et al. 2014), KEGG (Kanehisa et al. 2014), Reactome (Croft et al. 2014), The National Cancer Institute Pathway Interaction Database (Schaefer et al. 2009), and Wikipathways (Kelder et al. 2012).

CDS Length
We extracted CDS length from Ensembl version 84 (Yates et al. 2016) using BioMart (Kinsella et al. 2011). For genes with several transcripts, we used the transcript with the maximal CDS length. Liu and Robinson-Rechavi . doi:10.1093/molbev/msy175 MBE Testis-Specific Genes Testis-specific genes for M. musculus and D. melanogaster were obtained from a parallel study (Liu and Robinson-Rechavi 2018). The testis-specific genes were defined as genes with highest expression in testis and with tissue specificity value !0.8.

Immune Genes
To control for the impact of immune system genes, we downloaded all genes involved in the "immune response" term (GO:0006955) from AmiGO (Carbon et al. 2009) (accessed on April 25, 2018, and repeated analyses with these genes excluded.

Module Detection
For D. rerio, we obtained seven modules from our previous study (Piasecka et al. 2013). This is based on the Iterative Signature Algorithm, which identifies modules by an iterative procedure (Bergmann et al. 2003). Specifically, it was initialized with seven artificial expression profiles, similar to presented in supplementary figure S11, Supplementary Material online. Each profile corresponds to one of the zebrafish meta developmental stages. Next, the algorithm will try to find genes with similar expression profiles to these artificial ones through iterations until the processes converges. This method has proven to be very specific, but lacks power with medium or small data sets (<30 time points). For M. musculus and D. melanogaster, the sample size is not enough, so we used the method introduced by Levin et al. (2016). Firstly, we generated standardized gene expression for each gene by subtracting its mean (across all stages) and dividing by its standard deviation. Next, we calculated the first two principal components of each gene based on the standardized expression across development. As the expression was standardized, the genes form a circle with scatter plot (supplementary fig.  S9, Supplementary Material online). Then, we computed the four-quadrant inverse tangent for each gene based on its principal components and sorted these values to get gene expression order from early to late (supplementary fig. S10, Supplementary Material online). Next, we performed Pearson correlation of the standardized expression and idealized expression profile of each module (supplementary fig. S11, Supplementary Material online). Finally, for each module, we defined genes with correlation coefficient rank in top 10% as modular genes. Clearly, the genes in earlier modules have higher gene orders (supplementary fig. S9, Supplementary Material online).

Randomization Test of Modularity Analysis
For each module, we randomly chose the same number of DlnL from all modular genes (genes attributed to any module in that species) without replacement and calculated the mean value. We repeated this 10,000 times and approximated a normal distribution for the mean value of DlnL. The P-value that the mean value of interested module is higher (or lower) than the mean value from all modular genes is the probability that the randomly sampled mean value of DlnL is higher (or lower) than the original mean value of DlnL. In the same way, we also estimated the P-value of the median DlnL value.

Transcriptome Index of Log-Likelihood Ratio (TLI)
The TLI is calculated as: where s is the developmental stage, DlnL i is the value of loglikelihood ratio for gene i, n is the total number of genes, and e is is the log-transformed expression level of gene i in developmental stage s. Here, we used all DlnL values without applying any cut-off on DlnL or the associated P-value. For genes with DlnL < 0, we replaced it with 0. For M. musculus, we calculated the TLI from a merged data set, instead of computing it on two data sets separately.

Polynomial Regression
For polynomial regression analysis, we keep increasing the degree of polynomial model until no further significant improvement (tested with ANOVA, P < 0.05 as a significant improvement). For M. musculus, as the development time points in transcriptome data set are close to uniformly sampled, we used the natural scale of development time for regression. For Caenorhabditis elegans, D. melanogaster, and D. rerio, however, we used the logarithmic scale to limit the effect of postembryonic time points.

Detection of Polygenic Selection
We performed a gene set enrichment approach to detect polygenic signals of positive selection on pathways (Ackermann and Strimmer 2009;Daub et al. 2013Daub et al. , 2017. For each pathway, we calculated its SUMSTAT score, which is the sum of DlnL of all genes within this pathway. The DlnL Adaptive Evolution of Animal Proteins . doi:10.1093/molbev/msy175 MBE values were fourth-root transformed. This approach makes the distribution of nonzero DlnL approximate normal distribution (Canal 2005;Roux et al. 2014;Daub et al. 2017). So, with fourth-root transformation, we limit the risk that the significant pathways we found be due to a few outlier genes with extremely high DlnL. The SUMSTAT score of a pathway is calculated as: where p represents a pathway, and DlnL i represents the value of log-likelihood ratio for gene i within pathway p. Pathways <10 DlnL values were excluded from our analysis. Like in TLI analysis, we used all DlnL values and replaced <0 values with 0.

Empirical Null Distribution of SUMSTAT
We used a randomization test to infer the significance of the SUSMTAT score of a pathway. To correct for the potential bias caused by gene length, we firstly created bins with genes that have similar length (supplementary fig. S12, Supplementary Material online). Secondly, we randomly sampled (without replacement) the same number of genes from each bin, to make the total number of genes equal to the pathway being tested. Thirdly, we computed the SUMSTAT score of the randomly sampled DlnL values. We repeated the second and third processes 1 million times. Fourthly, we approximated a normal distribution for SUMSTAT score of the interested pathway. Finally, the P-value was calculated as the probability that the expected SUMSTAT score is higher than the observed SUMSTAT score.
Removing Redundancy in Overlapping Pathways (Pruning) Because some pathways share high DlnL value genes, the identified significant pathways might be partially redundant. In other words, shared genes among several pathways can drive all these pathways to score significant. We therefore removed the overlap between pathways with a "pruning" method (Daub et al. 2013(Daub et al. , 2017. Firstly, we inferred the Pvalue of each pathway with the randomization test. Secondly, we removed the genes of the most significant pathway from all the other pathways. Thirdly, we ran the randomization test on these updated gene sets. Finally, we repeated the second and third procedures until no pathways were left to be tested. With this pruning method, the randomization tests are not independent and only the high-scoring pathways will remain, so we need to estimate the false discovery rate (FDR) empirically. To achieve this, we applied the pruning method to pathways with permuted DlnL scores and repeated it for 300 times. So, for each pathway, we obtained one observed P-value (P*) and 300 empirical P-values. The FDR was calculated as follows: where p 0 represents the proportion of true null hypotheses, V(P*) represents the estimated number of rejected true null hypotheses, and R(P*) represents the total number of rejected hypotheses. For p 0 , we conservatively set it equal to 1 as in Daub et al. (2017). ForV(P*), in each permutation analysis, we firstly calculated the proportion of P-value (from permutation analysis) P*. Then, the value ofV(P*) was estimated by the mean proportion of P-value (from permutation analysis) P* for the 300 permutation tests. For R(P*), we defined it as the number of P-value (from original analysis) P*. For q-value, we determined it from the lowest estimated FDR among all Pvalues (from original analysis) ! P*.

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.