The Phenotype Paradox: Lessons From Natural Transcriptome Evolution on How to Engineer Plants

Plants have evolved genome complexity through iterative rounds of single gene and whole genome duplication. This has led to substantial expansion in transcription factor numbers following preferential retention and subsequent functional divergence of these regulatory genes. Here we review how this simple evolutionary network rewiring process, regulatory gene duplication followed by functional divergence, can be used to inspire synthetic biology approaches that seek to develop novel phenotypic variation for future trait based breeding programs in plants.


INTRODUCTION
Single nucleotide variants are amongst the most prevalent modifications in genomes (Altshuler et al., 2010). Furthermore, classical genetics focuses on the use of non-synonymous/synonymous mutation rate ratios to infer a baseline level of selection on gene sequences. Whilst this can be useful to infer how protein sequence variants may contribute to phenotypes, it remains incredibly challenging to infer how point mutations might give rise to novel phenotypes that are a culmination of the coordinated action of tens to thousands of genes. In 1970 Susumu Ohno (Ohno, 1970) suggested how gene duplication might help drive the evolution of new phenotypes. He reasoned that purifying selection acting on essential genes could be circumvented by sequence duplication allowing evolution of redundant protein sequences, giving rise to novel functionality. However, Ohno also recognised that novel phenotypes could simply be achieved through evolution at regulatory sites in duplicate gene sequences. This could lead to altered spatiotemporal expression facilitating evolution of novel traits. This might help to overcome the negative impact of gene dosage effects, where increasing protein abundance destabilizes networks or pathways (Voordeckers et al., 2015). This premise also extends beyond a single gene, to multi-gene and whole genome duplication (WGD). Here duplicate cellular pathways or metabolic processes are free to evolve along different spatiotemporal expression trajectories. Thus, diversity and cellular plasticity are attained purely through differential regulation of duplicate gene sets. This could occur through sequential evolution of cistronic transcription factor binding sites (TFBS) that bring target genes under coordinated regulatory control, as may have been the case for certain metabolic pathways in plants (Shoji, 2019). For example, two paralogs, QPT1 and QPT2, encode an enzyme involved in nicotinamide adenine dinucleotide and nicotine biosynthesis in tobacco. QPT1 is expressed at basal levels whilst QPT2 exhibits coordinated expression with nicotine biosynthesis genes. Furthermore, the promoter of QPT2 contains three sequence motifs that ERF189, a positive regulator of nicotine biosynthesis, binds to in vitro. These three motifs provide graded positive activation of QPT2. Overall this suggests that TFBS bound by ERF189 evolved within the promoter of QPT2 facilitating its integration into the nicotine biosynthesis regulon (Shoji and Hashimoto, 2011). Alternatively, genes encoding transcription factors (TFs) might be duplicated with altered expression and/or functionality of TFs culminating in pleiotropic regulatory cascades, thereby impacting entire pathways and cellular subsystems ultimately driving phenotype evolution.
Gene duplication could reduce selective pressure on redundant sequences allowing neutral evolutionary processes to generate novel phenotypic plasticity that might subsequently serve as an evolutionary advantage (Ohno, 1970;Wilson et al., 1977). In plants, gene duplicates experience a relatively relaxed period of selection before they are either silenced or take on novel, redundant or semi-redundant roles (Lynch and Conery, 2000;Blanc and Wolfe, 2004;Maere et al., 2005;Jiao et al., 2011). During this evolutionary filtering process it is noteworthy that regulatory genes are often preferentially retained whilst their paralogs often undergo gene expression divergence (Blanc and Wolfe, 2004;Maere et al., 2005). This highlights the role that gene duplication plays in driving transcriptome network evolution.
Contemporary evolutionary studies have understandably focused on prokaryotes with short generation times. Genome sequencing of bacterial strains grown under the same environmental conditions for over 50,000 generations revealed how bacterial lineages gained mutations in regulatory genes allowing them to functionally diverge and occupy concurrent niches within a continuous culture (Plucain et al., 2014). Directed evolution in bacteria has identified solutions that modify gene expression, including TFs, with functionality arising from non-functional gene networks (Crameri et al., 1997;Yokobayashi et al., 2002). Additionally, synthetically rewiring TF networks in bacteria and yeast has generated novel phenotypes under stressful conditions (Isalan et al., 2008;Windram et al., 2017). This again suggests that TF gene expression evolution can aid in the generation of phenotypic novelty.
In this perspective we will highlight how evolution by gene duplication has shaped plant genomes. In particular, we will illustrate how evolution of duplicate TF gene expression, through modification of cistronic promoter sequences, helps to drive the generation of phenotypic novelty via cascading pleiotropic regulation effects on target genes. Furthermore, we show how this process can be used to inspire the development of synthetic regulatory constructs that alter plant responses to environmental stress. We highlight how network structure can be used to select regulators for transcriptional rewiring (Figure 1). We show how this synthetic biology approach offers a novel way to optimise plant responses to environmental stimuli.

Evidence for Transcriptional Rewiring Driving Plant Evolution and Domestication
Phylogenetic studies suggest that all flowering plants are palaeopolyploids having undergone at least two WGD events (Jiao et al., 2011). Although the fate of most duplicate genes is death by gene silencing (Lynch and Conery, 2000), it appears that transcriptional regulators are often preferentially retained, with some duplicates appearing to shape the developmental regulation that gave rise to seed bearing and flowering plants (Blanc and Wolfe, 2004;Maere et al., 2005;Jiao et al., 2011;Jiang et al., 2013). In Arabidopsis it seems that WGD drove TF numbers to increase by more than 90%. Duplicate gene expression rapidly diverges after these WGD events in some cases with entire, nonhomologous, co-regulated gene expression networks diverging alongside each other away from their cognate paralogs (Blanc and Wolfe, 2004). This coordinated divergence in expression of co-regulated genes suggests that upstream regulators may be undergoing evolution at the protein sequence or gene expression level culminating in altered expression of target genes. Furthermore, quantitative trait loci in promoters are selectively enriched within TFBS (Weirauch et al., 2014). Overall, this suggests that WGD and subsequent gene expression divergence drives functional divergence of gene duplicates.
Processes governing environmental stress response are known to involve complex transcriptional networks containing large TF families (Kreps et al., 2002;Windram et al., 2012;Lewis et al., 2015). These families have arisen through various forms of whole genome and single-gene duplication (Riechmann and Ratcliffe, 2000;Eulgem et al., 2000;Feller et al., 2011;Lehti-Shiu et al., 2017). Furthermore, genes involved in biotic stress response also appear to be preferentially retained after small scale and WGD events (Maere et al., 2005). It has also been noted that many historic WGD events in plants appear to have occurred during periods of major environmental stress and instability (Vanneste et al., 2014).
Gene duplication and expression divergence have influenced the genomes of many important crop species. For instance, gene duplication has shaped the evolution of metabolic pathways that affect the flavor and aroma of tea. Gene duplication has expanded gene families associated with synthesis of secondary metabolites in lipids, carotenoids, terpenoids, and shikimate, which serve as precursors to compounds that confer tea aroma and flavor, and gene families associated with the synthesis of catechins, which are responsible for the astringent taste found in tea (Wei et al., 2018). Gene duplication and subsequent expression divergence have also driven capsaicin biosynthesis evolution in peppers, where neofunctionalization of capsaicin synthase (CS), the enzyme responsible for the final step of capsaicin synthesis, followed a recent duplication event and granted CS a role in capsaicinoid synthesis (Kim S. et al., 2014). Triads of homoeologs from wheat's three subgenomes exhibit striking relative expression differences across different tissue types (Ramírez-González et al., 2018). Thus it appears that expression bias within homoeolog triads influences tissue specific transcriptome networks. Also, these dynamic triads were enriched for genes involved in defence, environmental responses and secondary metabolism. Swanson-Wagner and colleagues showed that maize co-expression networks have diverged significantly from those of maize's wild ancestor teosinte (Swanson-Wagner et al., 2012). Genes actively involved in this rewiring included TFs, while a number of genes involved in defence processes were differentially expressed between maize and teosinte. Similarly, differentially expressed paralogs in the seedlings of tomato and its wild relatives include genes involved in stress responses and defence responses (Koenig et al., 2013).
TF gene expression could evolve in several ways. Perhaps the most obvious is simple sequence perturbation via random point mutations within TFBS. This may have been how promoter evolution in a set of TF genes gave rise to both cold and drought tolerance in Arabidopsis (Haake et al., 2002). Also, a single nucleotide polymorphism in the regulatory region of the TF gene qSH1 is responsible for the loss of seed shattering during rice domestication (Konishi et al., 2006). Alternatively, random insertion of transposable elements (TEs) might also significantly influence gene promoter activity. TEs have been responsible for amplifying E2F TFBS by 85% in Brassica species (Hénaff et al., 2014). TEs also make up a substantial portion of many eukaryotic genomes (Wendel et al., 2016), up to 85% in the case of maize. TEs are often activated under periods of stress (Grandbastien, 1998) and appear to drive expression divergence in newly constructed synthetic wheat allotetraploids (Kashkush et al., 2002). Similarly, dynamically expressed homoeologs in wheat more frequently contained TEs in their promoters and showed lower conservation of TFBS (Ramírez-González et al., 2018). Insertion of a TE into the regulatory region of the TF teosinte branched1 (tb1) drives apical dominance in single stemmed maize by enhancing expression of tb1 (Doebley et al., 1997;Studer et al., 2011).
Overall we see that WGD has significantly influenced the evolution of plant transcriptome networks. Whilst it has been observed that TF duplicates are more often retained after WGD than after smaller duplication events (Maere et al., 2005), studies looking at domestication traits reveal a plethora of underlying single TFs with altered promoter sequences that appear to drive TF expression divergence (Swinnen et al., 2016). This includes several TF genes with large TE insertions in their regulatory regions, suggesting that substantial regulatory rewiring can help to drive rapid TF expression divergence and trait evolution.

Simulating Transcriptome Networks
Overall evolutionary studies suggest that TFs represent useful and logical targets for crop trait development using directed evolution. However, one major challenge is identification of key TFs to focus on. Plant genomes contain thousands of TFs, whilst several hundred might be involved in responses to an individual stimulus (Windram et al., 2012;Lewis et al., 2015). In this section we outline how modelling of transcriptome networks can be used to identify key transcriptional regulators in plant transcriptome networks.
There are many approaches to inferring gene regulatory networks (GRNs) from expression data (Table 1). Information theory based approaches (Zhang et al., 2012;Villaverde et al., 2014) use measures such as correlation and mutual information to establish relationships between genes. These approaches are suitable for handling large amounts of expression data due to their relative simplicity and thus lower computational demands (Hecker et al., 2009), but their application is limited to steady state data. The networks built using such approaches are typically undirected, meaning that although relationships between genes are established, the regulator in each inferred interaction is unknown. The loss of this information is critical, as establishing the directionality of the relationships can give insight into how information flows through the network.
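As an illustration of the information theory based approach described above, the following is a minimal Python sketch of a mutual information "relevance network". It is not the method of any cited tool: the gene names, the toy expression values, the equal-width binning into three bins and the threshold of 1 bit are all assumptions chosen only to keep the example small.

```python
import math
from collections import Counter
from itertools import combinations

def discretize(values, bins=3):
    """Equal-width binning of one expression profile into integer labels."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    # The maximum value would fall into bin index `bins`, so clamp it.
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(x, y):
    """Mutual information (in bits) between two discrete label sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        mi += (count / n) * math.log2(count * n / (px[a] * py[b]))
    return mi

def relevance_network(expression, threshold, bins=3):
    """Return undirected edges (gene_a, gene_b, MI) with MI >= threshold."""
    labels = {gene: discretize(vals, bins) for gene, vals in expression.items()}
    edges = []
    for a, b in combinations(sorted(labels), 2):
        mi = mutual_information(labels[a], labels[b])
        if mi >= threshold:
            edges.append((a, b, mi))
    return edges

# Hypothetical steady state expression values across six samples.
expression = {
    "TF_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "GENE_B": [2.1, 4.0, 6.2, 8.1, 9.9, 12.0],  # tracks TF_A
    "GENE_C": [5.0, 1.0, 5.0, 1.0, 5.0, 1.0],   # unrelated to TF_A
}
edges = relevance_network(expression, threshold=1.0)
for a, b, mi in edges:
    print(f"{a} -- {b}: {mi:.3f} bits")
```

Note that the edge linking TF_A and GENE_B carries no direction: the score is symmetric, so the sketch cannot say which gene is the regulator, which is the limitation of undirected networks noted above.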
To reconstruct directed networks, inference approaches such as Dynamic Bayesian Models, ordinary differential equation (ODE)-based models and machine learning-based models are often used (Delgado and Gómez-Vela, 2019). These approaches can take advantage of time series expression data to infer dynamical and causal relations between genes, with each having different limitations (Table 1). The selection of an appropriate approach depends on the biological system in question, and it has been demonstrated that combining predictions from different approaches produces better reconstructions of networks (Marbach et al., 2012). Although causal inference approaches are able to generate directed network models, a limitation shared by many state of the art algorithms is scalability. These approaches are typically very computationally intensive, and their application is generally limited to small GRNs. Certain algorithms attempt to tackle this issue by using prior knowledge; for example, the causal structural identification network inference method (Penfold et al., 2015) allows the selection of specific genes as potential TFs in order to reduce the number of computations. There have also been developments in algorithms specifically for large scale reconstructions (Thiagarajan et al., 2017;Liu et al., 2017), but their use has only been demonstrated on network sizes between 500 and 1,000 genes.
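One simple way to combine predictions from several inference methods is to average the rank each method assigns to an edge. The Python sketch below illustrates this idea only; the method labels ("dbn", "ode", "ml"), the gene names, the confidence scores and the choice to give unreported edges the worst rank are all assumptions made for the example, not details taken from the studies cited above.

```python
def rank_edges(scores):
    """Map each edge to its rank within one method (1 = most confident)."""
    ordered = sorted(scores, key=lambda edge: (-scores[edge], edge))
    return {edge: i + 1 for i, edge in enumerate(ordered)}

def consensus_network(method_scores):
    """Average edge ranks across methods; a lower mean rank means more support."""
    all_edges = set().union(*(s.keys() for s in method_scores.values()))
    worst = len(all_edges)  # assumption: unreported edges get the worst rank
    per_method = {}
    for name, scores in method_scores.items():
        ranks = rank_edges(scores)
        per_method[name] = {e: ranks.get(e, worst) for e in all_edges}
    mean_rank = {
        e: sum(per_method[m][e] for m in per_method) / len(per_method)
        for e in all_edges
    }
    return sorted(mean_rank.items(), key=lambda kv: (kv[1], kv[0]))

# Hypothetical directed edges (regulator, target) with confidence scores.
method_scores = {
    "dbn": {("TF_A", "GENE_B"): 0.9, ("TF_A", "GENE_C"): 0.4, ("TF_D", "GENE_C"): 0.2},
    "ode": {("TF_A", "GENE_B"): 0.7, ("TF_D", "GENE_C"): 0.6},
    "ml": {("TF_A", "GENE_C"): 0.8, ("TF_A", "GENE_B"): 0.5, ("TF_D", "GENE_C"): 0.1},
}
consensus = consensus_network(method_scores)
for (regulator, target), rank in consensus:
    print(f"{regulator} -> {target}: mean rank {rank:.2f}")
```

Working on ranks rather than raw scores avoids having to put the confidence values of different algorithms on a common scale.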
To interpret simple and complex networks, network measures can be calculated for each gene and examined. Network measures provide a numerical representation of how a gene controls information flow within the network, and so can often indicate the importance of a gene. Degree centrality is a measure of the number of interactions that a gene forms in a network. This can be separated into in-degree, the number of regulators a gene has, and out-degree, the number of target genes a TF gene has (Figures 1B, C). Key genes typically have high out-degree, as a higher number of target genes indicates greater regulatory influence, and such genes are more likely to influence multiple biological processes (Jeong et al., 2001;Barabási and Oltvai, 2004;Yu et al., 2008). Betweenness centrality measures how often a gene mediates the shortest path between other gene pairs. High betweenness genes function as bridges between otherwise distant network modules. Thus the removal of such genes could severely disrupt information flow in the network (Yu et al., 2007) (Figures 1B, C). Hierarchy can also reflect importance, as influential genes are more likely to occupy higher positions, where they can exert greater control over the network through regulation of downstream TFs, which allows changes to propagate through the network (Bhardwaj et al., 2010) (Figures 1B, C). Existing network inference methods, together with new methods that can deal with both directionality and scalability, can therefore be used to identify genes key to certain biological processes.
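The three measures described above can be sketched in a few lines of standard library Python. The toy network and gene names below are hypothetical. Betweenness is computed with Brandes' algorithm on a directed, unweighted graph and left unnormalised. The hierarchy tier is defined here, as a simplifying assumption, as the shortest distance from any gene that has no regulators (tier 0 being the top).

```python
from collections import deque

def degrees(edges):
    """Return (out_degree, in_degree) dicts for directed (regulator, target) edges."""
    nodes = {n for edge in edges for n in edge}
    out_deg = {n: 0 for n in nodes}
    in_deg = {n: 0 for n in nodes}
    for regulator, target in edges:
        out_deg[regulator] += 1
        in_deg[target] += 1
    return out_deg, in_deg

def successors(edges):
    succ = {n: [] for edge in edges for n in edge}
    for regulator, target in edges:
        succ[regulator].append(target)
    return succ

def betweenness(edges):
    """Unnormalised betweenness centrality (Brandes' algorithm)."""
    succ = successors(edges)
    score = {n: 0.0 for n in succ}
    for s in succ:
        order, dist = [], {s: 0}
        sigma = {n: 0 for n in succ}   # number of shortest paths from s
        sigma[s] = 1
        preds = {n: [] for n in succ}
        queue = deque([s])
        while queue:                   # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in succ[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {n: 0.0 for n in succ}
        for w in reversed(order):      # accumulate dependencies back towards s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

def hierarchy_tier(edges):
    """Tier = shortest distance from any gene with no regulators (tier 0 = top)."""
    succ = successors(edges)
    _, in_deg = degrees(edges)
    tier = {n: 0 for n in succ if in_deg[n] == 0}
    queue = deque(tier)
    while queue:
        v = queue.popleft()
        for w in succ[v]:
            if w not in tier:
                tier[w] = tier[v] + 1
                queue.append(w)
    return tier

# Hypothetical network: TF1 sits at the top, TF4 is both a hub and a bridge.
edges = [("TF1", "TF2"), ("TF1", "TF3"), ("TF2", "TF4"), ("TF3", "TF4"),
         ("TF4", "G1"), ("TF4", "G2"), ("TF4", "G3")]
out_deg, in_deg = degrees(edges)
bc = betweenness(edges)
tier = hierarchy_tier(edges)
for gene in sorted(out_deg):
    print(gene, out_deg[gene], in_deg[gene], bc[gene], tier[gene])
```

In this toy network TF4 has the highest out-degree and betweenness although it sits two tiers below TF1, which shows that the three measures capture different aspects of a gene's position.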

Engineering the Transcriptome Using Genetic Rewiring
In this final section we seek to outline how targeted experimental interventions can be used to develop novel phenotypes using genetic rewiring. Specifically, we suggest how TFs identified through network analysis serve as pragmatic targets for plant trait creation. One way to artificially engineer the transcriptome network is to introduce an expression modified TF duplicate to effectively rewire the network (Figure 1). To do this the ORF of a TF gene is fused to the promoter region of a second gene. This rewiring of regulation allows signals to flow differently through the network, altering the spatiotemporal expression of the rewired TF and potentially its target genes (Isalan et al., 2008) (Figure 1). Experimental rewiring of transcriptional networks in bacteria and yeast has revealed rewiring solutions that allowed these organisms to adapt to stressful environments (Isalan et al., 2008;Windram et al., 2017). Furthermore, studies of regulatory networks in plants suggest that stress response networks may be less tightly controlled and less complex than developmental networks (Jin et al., 2015). These plant stress networks appear to have shorter regulatory paths and lower interconnectivity. Moreover, our previous studies in yeast (Windram et al., 2017) further suggest that synthetic network rewiring that shortens hierarchies, through fusion of top tier hierarchy gene promoters to lower tier ORFs with high out-degree and/or high betweenness centrality, generates rewired networks with enhanced stress response phenotypes. In plants, we could similarly "flatten" the regulatory hierarchy to improve the responsiveness of stress networks. With knowledge gained from network analysis we can select plant promoters that are at the top of hierarchies, and TF ORFs with high degree and betweenness centrality.
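The selection rule outlined above, pairing promoters of top tier genes with ORFs of lower tier TFs that have high out-degree and/or betweenness, can be sketched as a simple filter. The Python example below is illustrative only: the TF names, metric values, thresholds, the "pX::Y" label for a promoter-ORF fusion and the ordering of candidates by the number of tiers bypassed are all assumptions, not a published protocol.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class TFMetrics:
    name: str
    tier: int           # 0 = top of the regulatory hierarchy
    out_degree: int     # number of target genes
    betweenness: float  # unnormalised betweenness centrality

def candidate_fusions(tfs, min_out_degree, min_betweenness):
    """List promoter-ORF fusions that would shorten ("flatten") the hierarchy."""
    promoters = [t for t in tfs if t.tier == 0]
    orfs = [
        t for t in tfs
        if t.tier > 0
        and (t.out_degree >= min_out_degree or t.betweenness >= min_betweenness)
    ]
    pairs = list(product(promoters, orfs))
    # Assumed ordering: fusions that bypass more tiers first, then larger hubs.
    pairs.sort(key=lambda po: (-(po[1].tier - po[0].tier), -po[1].out_degree,
                               po[0].name, po[1].name))
    return [f"p{p.name}::{o.name}" for p, o in pairs]

# Hypothetical metrics, for example taken from a prior network analysis.
tfs = [
    TFMetrics("TF1", tier=0, out_degree=2, betweenness=0.0),
    TFMetrics("TF2", tier=1, out_degree=1, betweenness=2.0),
    TFMetrics("TF3", tier=1, out_degree=1, betweenness=2.0),
    TFMetrics("TF4", tier=2, out_degree=3, betweenness=9.0),
]
fusions = candidate_fusions(tfs, min_out_degree=3, min_betweenness=2.0)
print(fusions)
```

Each label such as pTF1::TF4 denotes the promoter of TF1 driving a second copy of the TF4 ORF, so that TF4 would respond directly to the signals that activate TF1 while the native TF4 gene is left intact.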
By rewiring networks through the introduction of synthetic promoter-ORF fusions, an outcome akin to neofunctionalization of duplicated genes can be achieved. That is, the synthetic fusion expresses a second ORF in addition to the native one, but the synthetic ORF is regulated differently in space and time due to having a different promoter. As such, in an applied context, engineering plant phenotypes using transcriptome rewiring could provide interesting solutions to improve plant stress response. Rewiring could bypass the limitations of engineering plant phenotypes using genetic knockouts and constitutive overexpression of genes. These methods might strongly perturb signal flow through the transcriptomic network. As many TFs in regulatory networks form cooperative assemblies (protein-protein-DNA), a strong perturbation in TF protein levels might interfere with these assemblies, impeding network function. Constitutive overexpression of TFs may outcompete other regulatory proteins that bind to target gene promoters, or titrate out rare cofactors (Rydenfelt et al., 2014). By comparison, TF knockouts directly reduce connectivity of the regulatory network, and TF absence might also prevent certain transcriptional assemblies from being formed. This strong biasing of, or reduction in, connectivity in the regulatory network might lead to a decreased range of effective stress responses (Mittler, 2006;Atkinson and Urwin, 2012).
In the Arabidopsis immune network, it has been shown that wrky4 mutants have reduced susceptibility to the biotrophic bacterial pathogen Pseudomonas syringae, but an increased susceptibility to the necrotrophic fungal pathogen Botrytis cinerea (Lai et al., 2008). It has also been shown that although constitutive overexpression of wrky31 in rice reduces susceptibility towards the fungus Magnaporthe grisea, it also reduces lateral root elongation and formation (Zhang et al., 2008). These examples highlight how gene knockout and overexpression can have both beneficial and deleterious effects under different conditions. Because rewiring allows fine manipulation of the spatiotemporal regulation within the network, directed engineering to improve the plant against a specific type of stress may be possible, without substantially compromising the tunability of the network to deal with other types of stress (Tsuda et al., 2009;Tsuda and Katagiri, 2010).

CONCLUSION
Plants have revealed the tremendous potential for TF duplication and expression divergence to drive phenotype evolution. Similarly, for thousands of years crop breeders have sought out phenotypes that enhance yield, with many of these traits driven by TF rewiring. Advances in genomics and systems biology now afford us the tools to study plant transcriptomes in tremendous detail, and early experimental rewiring reveals a commonality in TFs that make good rewiring targets. The fascinating and complex polyploid genomes of crops, such as wheat, demonstrate not only a tolerance to TF rewiring but also offer up multiple TF sequences that can be targeted to drive selective improvement of such crops to specific environmental stresses.

AUTHOR CONTRIBUTIONS
JL, KN and OW all wrote the manuscript. JL and KN contributed equally to this work.

FUNDING
This work was supported by the Natural Environment Research Council (NE/M018768/1).