Evolution of regulatory networks controlling adaptive traits in cichlids

Seminal studies in vertebrate protein evolution concluded that gene regulatory changes likely drive anatomical innovations. Even in the unparalleled East African cichlid fish radiation, we demonstrate cis-regulatory divergence as a contributor to phenotypic diversity. To further investigate this mechanism, we extended the Arboretum algorithm, initially designed for yeast adaptation to study the evolution of regulatory expression divergence in complex vertebrate species. For the first time, we reconstruct tissue-specific gene regulatory networks underpinning evolutionary adaptations from multiple vertebrate species in a phylogeny. Our framework consists of identifying ancestral reconstructed and extant species gene co-expression modules and integrating their associated regulators to investigate gene regulatory network evolution contributing to traits of phenotypic diversity in the East African cichlids. Along the phylogeny, we identify tissue-specific co-expression patterns across six tissues of five cichlids species that were predicted to be regulated by divergent suites of regulators. We report striking cases of rapid network rewiring for adaptive trait genes, such as the visual system. In regulatory regions of visual opsin genes e.g. sws1, in vitro assays confirm that single nucleotide polymorphisms (SNPs) in transcription factor binding sites (TFBSs) have driven network rewiring between species sharing the same visual palette. SNPs overlapping TFBSs segregate according to phylogeny and ecology suggesting ecotype-associated network rewiring in East African cichlid radiations. Our unique integrative approach inferring multispecies regulatory networks allowed us to identify regulatory changes associated with traits of phenotypic diversity in radiating cichlids.


Introduction
Early studies analyzing protein evolution in vertebrates concluded that evolutionary changes in 'regulatory systems' likely drive anatomical innovation [1][2][3] . Central to this regulatory system is the binding of TFs to cis-regulatory elements (promoters and enhancers) that flank coding genes and control the spatial and temporal expression pattern of nearby/distant genes in developing organisms. As TFs can regulate several genes (nodes) independently, these regulatory interactions (edges) form gene regulatory networks (GRNs). Changes to GRNs can often be an important source of evolutionary innovation 4 ; defined as 'GRN rewiring' and characterized as regulatory edges present in one or more (rewired in) species but lost in any of the other species where the orthologous TF has another regulatory edge. Rewiring can occur through mutations within cis-regulatory regions, altering the TF binding capability and target gene expression; this is opposed to trans changes affecting the protein-coding sequence of TFs that are under stronger purifying selection as it would impair function and regulation of downstream targets 5 . Given all forms of evidence, there is no doubt that the evolution of cis-regulatory modifications driving gene expression evolution and GRN rewiring events plays a major role in species adaptation. In order to unravel the genetic basis of functional and physiological diversification we need to determine the relative combined contribution of coding sequences, regulatory sequence evolution, and gene expression in regulatory network evolution. Several studies have integrated some of these datasets, largely focusing on evolution of gene regulatory network inference in unicellular prokaryotes, E. coli 6 and several non-vertebrate eukaryotes, including yeast 7,8 , plants 9 , fruit-fly 10 and echinoderms 10,11 . Whilst there are efforts to collate and integrate several datasets for model vertebrates, including human and mouse 12 , very little is known about the phenotypic effect of genome-wide regulatory network divergence, especially amongst non-model vertebrates 13 .
In vertebrates, ray-finned fishes are the largest radiation of any group and therefore a powerful model group to focus on genome-wide patterns of regulatory network evolution.
Ray-finned fishes comprise ~30,000 species 14 , the vast majority of which are teleosts (~29,000) exhibiting a large diversity of body plans, behaviors and habitats. Amongst teleosts and even all vertebrates, the East African cichlid radiations represent arguably the most dramatic examples of adaptive speciation. In the great lakes of East Africa (Victoria, Malawi and Tanganyika) and within the last few million years 15,16 , one or a few ancestral lineages of haplochromine cichlid fish have given rise to over 1500 species.

Gene co-expression is tissue-specific and highlights functional evolutionary trajectories across cichlids
Using RNA-Seq data of five species and across 6 tissues in the Arboretum 7 algorithm, we identified ten modules of 12,051-14,735 co-expressed genes, with a per species average of 1,205-1,474 genes per module (Fig. 1a). Modules of coexpressed genes across the five species have varying expression levels ranging from strongly induced e.g. module 1 (eye), no change e.g. module 5 and weakly to strongly repressed e.g. module 3 (heart, kidney and muscle) (

State changes and variation in gene expression across tissues and species underpins divergence of regulatory networks
In our approach, Arboretum 7 module assignment is drawn from a prior distribution at the last common ancestor (LCA), allowing us to follow the evolution of gene co-expression and regulation based on ancestral assignments of the species tree 7 . Orthologous genes of each species can be assigned to non-orthologous modules (referred to as 'state changes') ( Fig S-R1a), indicative of potential co-expression divergence and transcriptional rewiring from the LCA. By examining module state changes along the phylogeny, we identified 655 unique state changes along ancestral nodes (Fig. 1b). In recently rapidly radiating haplochromines (M. zebra and P. nyererei) and cichlid lineages, this highlights expression divergence of regulatory TFs and developmental genes such as rxrb (nuclear receptor), shh and ihh (morphogenesis). Many important cellular and developmental TFs e.g. foxo1, hoxa11 and lbx1 have switched transcriptional programmes between the ancestral nodes (51 -Anc4/3; 20 -Anc3/2; 34 -Anc2/1). Using a measure of gene expression tissue-specificity, tau [54], we show that genes with no state change in module assignment (green bars) have an even, narrow to midintermediate breadth of expression whereas state changed genes (red bars) have a narrow to broad expression breadth (Fig. 1c), representative of orthologs clustering in non-orthologous modules (state changes). Cis-regulation underlying some of these transcriptional programmes was analyzed based on enrichment of TFBS motifs in gene promoter regions. Regulatory TFs are enriched in module gene promoters according to tissue-specific function; for example, promoters of module 1 genes (eye-specific expression) are significantly enriched (FDR <0.05) for TF motifs involved in retina-and lens-related development/functions e.g. CRX, PITX3 and OTX1 19 (Fig. S-R1c).
Variability in enrichment of retina/lens related TFs motifs e.g. RARα/β/γ and RXRα/β/γ 20 in all species module 1 gene promoters except N. brichardi, highlights differential gene regulatory programmes and networks underlying traits under selection in cichlids, like the visual system 21 .

Fine scale nucleotide variation at binding sites likely drives functional regulatory divergence in cichlids
Central to gene expression regulation are cis-regulatory elements (including promoters and enhancers) that harbor several TFBSs and mutations thereof, can drive GRN evolution, altering target gene transcription without affecting the expression pattern of genes co-regulated by the same TF. The impact of noncoding sequence variation on gene expression was tested based on the evolutionary rate of 4622 1:1 orthologous gene promoter sequences against synonymous (fourfold degenerate) sites of protein coding regions, used as a proxy for neutral evolution. In the five cichlid genomes, there is no significant increase in evolutionary rate at promoter regions compared to fourfolddegenerate sites (Fig. S-R2aA, C, E, and F), with also no difference in promoter evolutionary rate between 1) state-changed and non-state changed genes; and 2) genes with conserved and divergent expression (Fig. S-R2b). We identify very few outlier genes with significantly higher evolutionary rate at promoter regions than corresponding fourfold sites at ancestral nodes (12- GO enrichment analysis of cichlid pairwise SNPs overlapping gene regulatory regions highlight associations with key molecular processes e.g. signal transduction -non-state changed promoter TFBSs ( Fig. S-R2d). These findings imply that discrete nucleotide variation at regulatory binding sites likely drive functional gene co-expression variation in cichlids through GRN rewiring events.

Cichlid adaptations are likely driven through gene regulatory network rewiring events facilitated by discrete changes in regulatory binding sites
To study patterns of core and divergent regulatory interactions that could be associated with cichlid phenotypic diversity, we applied an integrated framework ( hence, disrupted binding sites will offer insights into TF-TG rewiring between species; 2) gene orthology is well characterized (as opposed to other regulators, like miRNAs); and 3) direct correlation to tissue co-expression patterns can be made (Fig. 1a). TF-TG sets of 1-to-1 orthologs allow network comparison along the whole phylogeny and are >59% conserved between species (Table S-R3b-c), likely representing core functional GRNs.
In comparison, when all TF-TG edges are included, the percentage conservation is lower (<41%) between species (Table S- To determine whether differentially co-expressed (state changed) TFs are rewired in TF-TG interactions, we first focus on 1-to-1 orthologous genes in TF-TG interactions, termed 'TF-TG 1-to-1 edges', along the five cichlid tree. These edges and focusing on TFs in particular, are highly associated with morphogenesis and cichlid traits under selection e.g. eye and brain development ( To determine the phylogenetic and ecological association of discrete regulatory changes, we analyzed patterns of SNP divergence in candidate gene promoter TFBSs identified in the five cichlids and 73 Lake Malawi species 18 (Fig. S-R3d). Depending on flanking sequence conservation, species with the same genotype as M. zebra may harbor the same gene promoter TFBS and if different, likely diverged. For example, we note scenarios of 1) SNP-TFBS genotype divergence in distantly-related clades e.g. NKX2.1-sws1; and 2) SNP-TFBS genotype conservation in mainly same/closely-related clades e.g. EGR2-cntn4. This shows that genes associated with traits under selection and cichlid phenotypic diversity e.g. visual systems 21 and morphogenesis 17 , harbor SNP genotypes overlapping TFBSs that segregate according to phylogeny and ecology of radiating lake species.

Regulatory networks are rewired in genes associated with traits under selection and underpinned by discrete cis-regulatory changes
Through our integrative approach, we examine the regulatory network topology of several genes associated with unique cichlid traits 24,25 that overlap our six studied tissues. One such trait involves visual system adaptation through the differential utilization of diversely expressed complements of seven cone opsin genes responsible for color vision 21 . The evolution of largely unexplored cichlid GRNs and diverse palettes of co-expressed opsins can induce large shifts in adaptive spectral sensitivity of adult cichlids and thus, we hypothesize that opsin expression diversity is the result of rapid adaptive evolution in cichlids. Indeed, by focusing on species utilizing the same wavelength visual palette and opsin genes, we previously note that several visual opsin genes (rh2b, sws1, sws2a and rho) have highly rewired regulatory networks (Extended Data Table S brichardi and M. zebra and in both species networks, we identify common regulators associated with retinal ganglion cell patterning e.g. SATB1 26 as well as several unique regulators associated with nuclear receptor signaling e.g. RXRB and NR2C2 27 and retinal neuron synaptic activity e.g. ATRX 28 (Fig. 3a). Overall, there are substantially more predicted unique TF regulators of sws1 in M. zebra (38 TFs) as compared to N. brichardi (6 TFs) (Fig. 3a). Such diverse regulation can increase sws1 expression and in turn, increase spectral sensitivity to UV light and the ability for M. zebra to detect/feed on UV-absorbing phytoplankton and algae, as previously shown for Lake Malawi cichlids 29 .
Also, tight TF-based regulation of N. brichardi sws1 could induce rapid shifts in expression and spectral shift sensitivities between larger peak ƛmax of 417 nm in N.
brichardi single cones 30 compared to 368 nm of M. zebra Sws1 31 . Diverse regulation of sws1 is likely based on TFBS variation as we identify a SNP that has likely broken the M. zebra NR2C2/RXRB shared motif that is otherwise predicted 2kb upstream of the N. brichardi sws1 transcription start site (TSS) (Fig. 3b). Functional validation via EMSA confirms that NR2C2 and not RXRB binds to the predicted motif (Fig. 3b) in the N. brichardi sws1 promoter, forming a complex, and the SNP has likely disrupted binding, and possibly regulation of M. zebra sws1 (Fig. 3c-d). Based on these results and regulatory mutation effect on cichlid opsin expression 32 , point mutations in TFBSs are likely driving their evolution and GRN rewiring events in traits that are under selection in radiating cichlids.

Regulatory network rewiring events are linked to phylogeny and ecology of East African cichlid radiations
To validate whether GRN rewiring, as a result of TFBS variation, can be associated with phylogeny and ecology of lake species, we looked further into examples of SNP-TFBS genotypes that are diverged between M. zebra and other available Lake Malawi species 18 (Fig. S-R3d). The homozygous SNP (T|T) that breaks binding of NR2C2 to M. zebra sws1 promoter (Fig. R4a) is 1) conserved with the fellow algae eater, T. tropheops that also utilizes the same short-wavelength palette; 2) heterozygous segregating (P. genalutea -C|T and I. sprengerae -T|C) in closely related Mbuna species; and 3) homozygous segregated (C|C) in distantly related Mbuna species (C. afra, C. axelrodi and G. mento) and most other Lake Malawi species of which, some utilize the same short-wavelength palette and are algae eaters e.g. H. oxyrhynchus (Fig. 4). This suggests that in species closely related to M. zebra with similar diet and habitat, sws1 may not be regulated by NR2C2, whilst other species could be, similar to N. brichardi ( Fig. 3, Fig. 4). In another example, regulation of rho by GATA2, and not its duplicate, GATA2A, may be sufficient for dim-light vision response in some rock dweller species (M. zebra and possibly P. genulatea, T. tropheops and I. sprengerae) but both gata2 copies could regulate rho in many other Lake Malawi species (79% with C|C genotype) as well as O. niloticus and A. burtoni (Fig. S-R4d). This highlights potential differential usage of a duplicate TF in dim-light vision regulation. Based on these examples of SNPs overlapping TFBSs that segregate according to phylogeny and ecology, it appears that ecotype-associated network rewiring is a key driver of adaptation in East African cichlid radiations.

Discussion
Gene regulatory network divergence can serve as a substrate for the evolution of phenotypic diversity and adaptation. In both unicellular and multicellular organisms, various mechanisms of regulatory and gene expression divergence underlying phenotypic diversity have been elucidated: horizontal gene transfer and regulatory reorganization in bacteria 33 ; gene duplication in fungi 34 ; cis-regulatory expression divergence in flies 35 ; variable gene co-expression in worms 36 ; dynamic rewiring of TFs in plant leaf shape 9 ; cis-regulatory mutations in stickleback fish 37 ; alternative splicing 38 and differential rate of gene expression evolution shaped by various selective pressures 39,40 in mammals. Despite these changes to genomic and regulatory architecture, some of these groups have remained virtually unvaried for millions of years of evolution, whereas certain organisms, like the near 1,500 species of East African cichlid fish, have rapidly radiated and diversified in an explosive manner. Alongside ecological opportunity 16

, East
African cichlid diversification has been shaped by complex evolutionary and genomic forces largely based on a canvas of low genetic diversity between species 18 , gene duplications and divergent selection acting upon regulatory regions 17 ; all of which imply the rapid evolution of regulatory networks underlying traits under selection. However, very little is known about the evolution of regulatory networks (genotype) and their potential phenotypic effect across ecologically-diverse cichlid species (ecotype) 41 . To study the various mechanisms of regulatory divergence towards phenotypic diversity as shown in other organisms 9,[33][34][35][36][37][38][39][40]42 and cichlids 17,41 , we developed a novel framework to 1) identify ancestral and extant species co-expression modules and 2) integrate associated regulators (cis-regulatory elements, transcription factors and miRNAs) to dissect gene expression and regulatory contribution to cichlid phenotypic diversity.
Along the phylogeny, our analyses identify gene co-expression modules with tissuespecific patterns and differential trajectories across six tissues of five cichlids; a trend previously noted in non-developmental programmes of worms 36 and shown to be under regulatory control in cichlids 17 . The differential gene co-expression trajectories across cichlids are predicted to be regulated by divergent suites of regulators, including TFs that are state-changed in co-expression module assignment. Similar to the rewiring of TFs associated with plant leaf shape 9 , this suggests transcriptional rewiring events and differential gene expression evolution linked to cichlid phenotypic diversity.
Cis-regulatory elements (including promoters and enhancers) are central to cichlid gene expression regulation (Brawand et al. 2014) and harbor several TFBSs that when mutated, can drive GRN evolution. Similar to that shown in the adaptive evolution of threespine sticklebacks 37 , we show that discrete nucleotide variation at binding sites likely drives functional regulatory divergence through gene regulatory network rewiring events in cichlids. When this is analyzed across species networks, we report striking cases of rapid network rewiring for genes known to be involved in traits under natural and/or sexual selection, such as the visual system, shaping cichlid adaptation to a variety of ecological niches. In regulatory regions of visual opsin genes e.g. sws1, in vitro assays confirm that SNPs in TFBSs (NR2C2) have driven network rewiring between species sharing the same visual palette. The impact of gene duplications, also implicated in fungi adaptation 34  Our unique integrative approach has created a rich resource documenting gene coexpression, regulatory and network divergence as drivers of cichlid adaptive evolution and possibly speciation events. In this study we largely focus on cis-regulatory mechanisms of GRN rewiring however, our resource will allow for studies on the regulatory effect of other mechanisms e.g. miRNAs and gene duplications on network topology during cichlid evolution. Whilst it appears that cichlids utilize an array of regulatory mechanisms that are also shown to drive phenotypic diversity in other organisms 9,34-37,42 , we conclude that discrete changes at regulatory binding sites, driving functional regulatory divergence and network rewiring events is likely to be a major source of evolutionary innovation, contributing to phenotypic diversity in radiating cichlid species. Beyond visual systems, we identify network rewiring of genes associated with several cichlid adaptive traits, including runx2 associated with jaw morphology 43 ; ednrb1 in pigmentation and egg spots 17,44 ; and egr1 implicated in behavioral phenotypes 45 .
Based on this 'catalogue' of trait-associated GRNs, genotype-phenotype-ecotype relationships contributing to cichlid phenotypic diversity can be functionally validated by 1) high-throughput in vitro assays -testing polymorphic TF-TG interactions; 2) tol2 transgenesis 46,47 -testing cis-regulatory ability to recapitulate and promote divergence of TG expression; 3) genome editing techniques 48 -phenotyping the regulatory genotypes linked to ecology; and 4) genotyping parents and mutants -confirm, compare and correct trait-associated genotypes in GRNs.

A systematic comparative framework to study the evolutionary dynamics of tissue-specific regulatory networks in cichlids
We developed a systematic comparative framework (Fig. S-M1)

Construction of cichlid gene trees
By considering the gene tree of 18,799 orthologous groups (orthogroups), Arboretum 7 is able to generate module assignments reflecting many-to-many relationships between orthologs resulting from gene duplication and loss. To construct gene trees with different levels of duplication, we obtained protein sequences of longest transcripts from five cichlids as well as stickleback, spotted gar and zebrafish as outgroups. Spotted gar was added as it predates the teleost-specific genome duplication event (3R) and zebrafish, as a model teleost to leverage known molecular interactions as an initial prediction of functional interactions/associations in cichlids based on orthology. We applied OrthoMCL 49 followed by TreeFix 50 to learn the reconciled gene trees. We noticed that several of the trees exhibited incomplete lineage sorting for the cichlid specific subtree but disappeared once the tree was relearned using the cichlid only species. We therefore relearned gene trees for the cichlid only species -in total, we reconstructed 17858 gene families of which, 108 had gene duplication events. A fraction of these (29 gene families) also exhibited incomplete lineage sorting. Incomplete lineage sorting was also observed for gene groups without gene duplications, of the 17756 gene families that had no duplication, 810 exhibited incomplete lineage sorting (ILS).

Inference of multi-and single-tissue transcriptional modules in five cichlids
We ran Arboretum 7 , an algorithm for identifying modules of co-expressed genes on gene expression values of six tissues (brain, eye, heart, kidney, muscle, testis) from five cichlid species, namely O. niloticus (On), N. brichardi (Nb), A. burtoni (Ab), P. nyererei (Pn) and M. zebra (Mz) 17 . Gene expression values were based on RNA sequencing of several adult individuals; the sample source, tissue isolation, RNA and library preparation, sequencing, assembly and annotation are described previously 17 . To ensure equality in n-fold change of expression, the gene expression values were log transformed as: log(x+1), where x is the raw expression value 17 , and "log" is the natural logarithm, and then expression was normalized across each gene to have mean zero to be used as input for Arboretum 7 . Selection of the six tissues allowed us to study tissuespecific associated traits under natural and/or sexual selection in cichlids: Brain (development, behavior and social interaction); Eye (adaptive water depth/turbidity vision); Heart (blood circulation and stress response); Kidney (hematopoiesis and osmoregulation associated with water adaptation); Muscle (size, shape and movement associated with dimorphism and agility); and Testis (sexual systems associated with behavior and dimorphism).
In total, 18,799 orthogroups (see Methods: Construction of cichlid gene trees) and their associated expression data and gene tree information were inputted into Arboretum 7 . In total, this represents 59-68% of all protein-coding genes in the five cichlid genomes 17 .
Certain annotated cichlid genes could not be included for a few reasons: 1) Lack of tissue expression data for all five species; 2) No mapped reads for selected tissues; 3)

Lack of co-expression with other genes; and 4) Use of single development stage (adult).
We selected the number of modules using a combination of strategies. First, we tried to identify the optimal number of multi-tissue modules automatically from the data by scoring a model based on the penalized log likelihood. This gave us the optimal k of 19, however, we also looked at lower values of k, for example, k = 7 and k = 15 for observation of co-expression clustering patterns in the next step. Secondly, we manually inspected the modules to see if increases of k yield patterns of expression that we have not seen before or generate recurring patterns. Finally, we devised a metric for the top three random initializations, based on a silhouette index, orthology overlap, and crossspecies cluster mean dissimilarity; selecting the optimal k stable to the initialization.
Based on our strategy we found k = 10 modules to be optimal. Using a similar approach, this time for single tissues clustering, we found k = 5 modules to be optimal.
Handling ILS in Arboretum. The Arboretum algorithm internally tries to reconcile a tree that is not obeying the species tree by adding additional duplication and loss events. An alternate approach is to use a different species trees each representing the different ILS types and estimating the parameters of each such tree. However, there are many different cases of ILS, as identified previously 17 and the number of gene trees in each category varied significantly. However, estimating the conditional distributions for each branch in each ILS type would not be feasible as there are not enough example trees.

Functional and transcription factor binding site enrichment in modules
We use the FDR-corrected hypergeometric P-value (q-value) to assess enrichment of Gene Ontology (GO) terms and TFBSs (motifs) in a given gene set. We summarize the enrichment of terms/motifs with q<0.05 statistical significance and conservation in all extant and ancestral species. GO terms for the five cichlids were from those published previously 17 . To study cis-regulatory elements likely driving tissue-specific expression patterns, we defined gene promoter regions using up to 5 kb upstream of the transcription start site (TSS) of the gene. Motif enrichment in cis-regulatory regions was carried out using TFBSs obtained by the method below.

Transcription factor (TF) motif scanning
TFBSs of known vertebrate transcription factors (TFs) were obtained from the JASPAR vertebrate core motif (2018 release) 51 . Binding peak information from ChiP-seq experiments of various human and mouse TFs were retrieved from GTRD v17.04 12 and associated to protein coding genes within a vicinity of 10kb. Using core motif sequences available from JASPAR 51 or alternative databases like UniPROBE 52 and HOCOMOCO 53 , uniform length motifs were identified within the TF binding peaks. In cases where the core motifs were not available, they were predicted de novo from the peaks themselves using MEME 54 with default settings. The aforementioned steps provided a list of transcription factor-target gene (TF-TG) interactions with the exact coordinates of the corresponding binding site(s). Cichlid sites were extrapolated based on 1) orthology; 2) minimum 70% sequence similarity 55,56 ; and 3) functional domain overlap as derived using Interpro scan 5 57 to both source organisms (human, mouse). Cichlid extrapolated sites were used to construct cichlid-species specific (CS) Position Specific Scoring Matrices (PSSMs) for each TF using the info-gibbs script from the RSAT tool suite 58 . In cases where the number of extrapolated sites per species was less than three, we aggregated the sites to construct generic cichlid-wide (CW) PSSMs. Using the PSSMs for each TF, we scanned the gene promoters and CNEs with FIMO 59 using either 1) an optimal calculated p-value for each TF PSSM; or 2) FIMO 59 default p-value (1e-4) for JASPAR 51 PSSMs and PSSMs for which an optimal p-value could not be determined. Statistically significant motifs were called using a q-value (False Discovery Rate, FDR) <0.05 and grouped in confidence levels and scores of: 1a) overlap of mouse and human to cichlid extrapolated -0.3; 1b) mouse to cichlid extrapolated -0.2; 1c) human to cichlid extrapolated -0.15; 2a) FIMO 59 scans using extrapolated CS matrices -0.125; 2b) FIMO 59 scans using extrapolated CW matrices -0.110; and 2c) FIMO 59 scans using JASPAR 51 matrices -0.115.

Calculating tissue-specificity index (tau)
As a measure for tissue specificity of gene expression, we calculated τ (Tau) 60 using logtransformed and normalized gene expression data (as inputted to run Arboretum): The values of tau vary from 0 to 1; ubiquitous or broad expr (τ ≤ 0.5); intermediate expr (0.5 < τ < 0.9); and tissue-specific or narrow expr (τ ≥ 0.9) [54]. Amongst existing methods, τ has been shown to be the best for calculating tissue-specificity 61 . Testis normally express far more genes than any other tissue, generally displaying a tissuespecific pattern of expression. As tau was used to assess genome-wide expression levels across all tissues, but between species, testis expression data was included for each species to obtain a true representation of variation in transcriptional programs.

Variation and evolutionary rate at coding and non-coding regions
We Otherwise, we estimated evolutionary rate for each branch and ancestral node in the species tree at promoter regions and fourfold degenerate sites, using 1:1 promoter and cds alignments in baseml and codeml programs in the PAML package 64 , requiring that at least 10% of the alignment contains nucleotides and that at least 100 nucleotides are present for each species.

Reconstructing regulatory networks
To infer essential drivers of tissue-specific expression in cichlids, we constructed regulatory and functional interaction/association networks through the integration of several datasets and approaches (Fig. S-M1). All genes and their interaction/association networks were, in most cases, constrained by Arboretum module assignments to correlate to their respective patterns of tissue-specific expression and co-expression module analysis. This maintains a structured and connected network approach.
We first used species-and module-specific gene expression levels to infer an expression-based network. For this, we merged the cichlid gene expression data into a single 30 (Five species, six tissues) dimensional dataset to learn cichlid-specific TF-Target gene relationships using the Per Gene Greedy (PGG) method, a prior expressionbased network inference method 65 . We projected the network into species-specific networks by taking into account edges that would not be present due to gene loss. We then integrated predicted TF-TG interactions (see Methods: Transcription Factor motif scanning) based on TFBS prediction in module gene promoter regions.

Functional landscape of reconstructed regulatory networks
We use the FDR-corrected hypergeometric P-value to assess enrichment of GO terms for genes in reconstructed networks. We used GO terms for the published five cichlids 17 and carried out enrichment analysis as previously done for Arboretum module genes (see Methods: Functional and transcription factor binding site enrichment in modules).

Transcription Factor (TF) -Target Gene (TG) network comparisons between species
To assess degree distribution of nodes in networks, we used the igraph package 66 in R.
Network connectivity of orthologous gene networks was assessed by creating a gene-bygene adjacency matrix for the five species TF-TG regulatory networks. To assess conservation and variation across the whole phylogeny and/or considering any species/clade -specific novelty, three sets of nodes (genes) were considered: a) node edges of 1-to-1 orthologs (presence in all five species networks); b) node edges of 1-to-1 orthologs in species pairwise comparisons, and; c) a superset of all node edges present in networks of any of the five species. For each gene-specific network investigated, we sum up the adjacency matrices for the intersection and superset of genes. Nodes are ranked according to a score where the sum of all columns is divided by the number of occurrences (entries which are non-zero), ultimately giving a measure of conservation.

TF-TG gain and loss rates
Gain and loss rate analyses was similar to that performed previously 8

Regulatory rewiring analysis of gene sets
The DyNet package 22   system.

Electrophoretic mobility shift assay (EMSA) validation of predicted TF-TG
interactions EMSA was carried out using double-stranded Cy5 fluorophore 5'-modified (IDT) DNA probes, in vitro expressed DBDs (see above) and the Gel Shift Assay Core System (Promega). Double-stranded DNA probes were generated by annealing sense and antisense oligonucleotides (see Table S