Bayesian genome scale modelling identifies thermal determinants of yeast metabolism

The molecular basis of how temperature affects cell metabolism has been a long-standing question in biology, where the main obstacles are the lack of high-quality data and methods to associate temperature effects on the function of individual proteins as well as to combine them at a systems level. Here we develop and apply a Bayesian modeling approach to resolve the temperature effects in genome scale metabolic models (GEM). The approach minimizes uncertainties in enzymatic thermal parameters and greatly improves the predictive strength of the GEMs. The resulting temperature constrained yeast GEM uncovers enzymes that limit growth at superoptimal temperatures, and squalene epoxidase (ERG1) is predicted to be the most rate limiting. By replacing this single key enzyme with an ortholog from a thermotolerant yeast strain, we obtain a thermotolerant strain that outgrows the wild type, demonstrating the critical role of sterol metabolism in yeast thermosensitivity. Therefore, apart from identifying thermal determinants of cell metabolism and enabling the design of thermotolerant strains, our Bayesian GEM approach facilitates modelling of complex biological systems in the absence of high-quality data and therefore shows promise for becoming a standard tool for genome scale modeling.


Introduction Results
GETCool: Using Bayesian statistical learning to integrate temperature dependence in enzyme-constrained GEMs In this study, we developed a novel approach for incorporating temperature dependence into an enzyme-constrained GEM (ecGEM) 16 (Fig 1) with the resulting model termed enzyme and temperature constrained GEM (etcGEM). The approach combined the following steps: (i) etcGEM construction (Fig   1a-d), (ii) flux balance analysis (FBA) and (iii) Bayesian statistical learning (Fig 1e). The ecGEM, which includes, besides the traditional stoichiometric matrix, also enzyme abundances and activities, provided an excellent template to directly integrate the enzyme temperature effects. Firstly, for a given reaction, the flux cannot exceed the capability of the enzyme, which is defined as the product of the functional enzyme concentration [ ] ! and its "#$ . Secondly, the total amount of enzymes that the cell can afford is also limited 27 . Inclusion of temperature constraints into ecGEM was thus achieved by making [ ] ! and "#$ temperature dependent, and by incorporating the additional cost of enzymes in the denatured state (Fig 1a, Method M1). Three thermal parameters were required for each enzyme in the resulting etcGEM, including (i) the melting temperature % (Fig 1b), (ii) the heat capacity change ∆ & ‡ (Fig 1c) and (iii) the optimal temperature (&$ (Fig 1d Method M2). Moreover, to capture the temperature effects on the energy cost of non-growth associated maintenance (NGAM), a temperature dependent NGAM expression term was estimated from experimental data and included in the model.
To resolve the challenges arising from the uncertainties in the parameter values, we used Bayesian statistical learning 26 , which is a probabilistic framework that has been successfully applied for quantifying and reducing uncertainties in various fields including deep learning 28 , ordinary differential equations 29 and biochemical kinetic models 30 . The approach uses experimental observations ( ) to update Prior distributions ( ( )) of model parameters to Posterior ones ( ( | )) (Fig 1e). We refer to the model equipped with sampled from ( ) or ( | ) as a Prior or Posterior etcGEM, respectively.
The resulting Posterior etcGEMs provided a more reliable platform to study the thermal dependence of cell metabolism, with an inherent benefit that the uncertainty in the interpretation and prediction from the improved Posterior etcGEMs could also be quantified.   32 and then applied for all enzymes. As a result, the etcYeast model was obtained with an expansion of 2,292 temperature-associated parameters for a total of 764 metabolic enzymes (Fig 1a).
The temperature dependence of NGAM was inferred from experimental data (Methods M4, Fig S1).
We observed that etcYeast predictions made using the initial parameter values could not correctly recapitulate experimental observations (Fig S2, (Fig S3). We then used all three datasets to sample 100 Posterior etcGEMs, where each model achieved an average ) higher than 0.9 on all three datasets ( Fig S4) and could therefore accurately describe the observed measurements (Fig 2a-c and Fig S5). The increased performance on all three datasets clearly demonstrated the need to update the parameter Prior distribution to a Posterior one.
Next, we explored which parameters had been updated in the Bayesian approach. Principal component analysis of all 21,504 parameter sets generated in the approach showed how the Priori distributions were gradually updated to distinct Posterior distributions (Fig 2d). Further comparison between Prior and Posterior distributions revealed that in all three parameter categories, a reduced variance in the updated parameters was more likely than a change in mean values (Fig 2e, protein-wise comparison shown in Fig S6). Particularly for enzyme ( Fig S6). Importantly, we observed that the approach tended to change the enzyme (&$ rather than its % and ∆ & ‡ parameters (Fig 2e). In addition, a machine learning approach (Methods M6) further revealed that, out of all three parameter types, the largest contribution to the improved Posterior etcGEM performance during the Bayesian approach was from enzyme (&$ s (Fig 2f).

Yeast growth rate is explained by temperature effects on its enzymes
With the Posterior etcGEMs capable of describing various experimental observations (Fig 2a-c), we analysed how the temperature effects on each of the three processes -NGAM, "#$ and the protein denaturation process -contribute to whole cell growth (Fig 3a). We observed that, at temperatures below 29 °C, the temperature dependent "#$ was the only factor that affected the cell growth rate. In the range between 29 and 35 °C, both "#$ and NGAM determined the growth rate. The contribution of enzyme denaturation to the temperature dependence of cell growth, however, was observed only at temperatures higher than 35 °C, with the denaturing effect becoming the dominant effect at ~40 °C and lack of cell growth at 42 °C. Therefore, in contrast to previous reports indicating that an over 10-fold increase in NGAM cost with the temperature change from 30 °C to 33 °C was the major limiting factor to cell growth 5,35 , our modelling approach showed that the increased NGAM has a merely moderate effect on growth rate (Fig 3a).
Interestingly, the temperature dependence of enzyme "#$ s alone could explain the temperature We further observed that, even though the model contained only 764 enzymes from a total of ~6,700 proteins 38 , protein denaturation alone could still explain termination of cell growth at 42 °C (Fig 3a).
However, in the Posterior etcGEMs, only 9 enzymes (1%) with a mean melting temperature below 42 °C were present (ERG1, ATP1, ALA1, KRS1, SER1, HEM1, PDB1, ADH1 and TRP3) (Fig S7), of which three (ATP1, HEM1, PDB1) are located in the mitochondria 39 . The other enzymes remained in the native state even at temperatures several degrees higher than 42°C (Fig 3d), though they were enzymatically active only in the temperature window of cell growth between 10 °C and 42 °C (Fig 3f), due to the low values beyond this temperature range (Fig 3e). Metabolic shifts are explained by temperature-induced proteome constraints Published reports show that at temperatures above 37°C in chemostat cultures with a dilution rate of 0.1 h -1 , yeast shifts its metabolism from a completely respiratory one to a partly fermentative one, which is also accompanied by a large increase in glycolytic flux 23 . Since our updated Posterior etcGEMs are able to simulate this metabolic shift (Fig 2c and Fig S5), we used them to further explore the mechanisms behind the observed process. We observed that the shift occurs due to a proteome constraint, meaning that the total protein level in the cell reaches an upper bound (Fig 4). The proteome constraint occurs due to the decrease in enzyme specific activities with increasing temperature (Fig 3f) and since the maximal protein amount in the cell is limited 27 . As a result, the cell has to synthesize more enzymes to maintain cell growth at the given growth rate (Fig 4) until the enzyme amount hits the upper bound. This is also consistent with earlier studies showing that the activation of the Crabtree effect in chemostat cultures at 30°C is due to a proteome constraint 16,40 . When the temperature increases above 36 °C, ATP production by glycolysis is dramatically increased, while ATP production by the mitochondria decreases (Fig 4). Even though the respiratory pathway produces more ATP per glucose amount, the fermentative pathway produces more ATP per protein mass and therefore becomes more energetically efficient when the cell reaches a proteome constraint 40 . In addition, three key mitochondrial enzymes (ATP1, HEM1 and PDB1) (Fig S7) were found to be unstable, which make the respiratory pathway even more resource-inefficient for ATP production. etcGEM uncovers growth rate-limiting enzymes To investigate which enzymes limit the cell growth at superoptimal temperatures, the flux sensitivity coefficient of each enzyme was calculated (Methods M5). Among all the enzymes in the model, the squalene epoxidase ERG1 displayed an order of magnitude higher median flux sensitivity coefficient than other enzymes, indicating that it is the most flux-controlling enzyme at 40 °C (Fig 5a) and above (Fig S8).
Furthermore, removal of the temperature constraint on ERG1 increased the simulated specific growth rate from 0.09 to 0.14 h -1 (Fig 5b). We therefore evaluated the impact of replacing the wild-type ERG1 gene with ERG1 from the thermotolerant yeast Kluyveromyces marxianus (kmERG1, Methods M7). At first, at the lethal temperature of 42 °C, only a small improvement in growth rate (from 0.01 to 0.06 h -1 ) was predicted and no significant growth difference was detected between the wildtype and the strain with kmERG1 ( Fig   S9). However, already after 2 generations of adaptation at 40 °C, the strain with KmERG1 indeed showed significantly better growth than the wild type (Fig 5c).
The reduced growth rate at 42 °C is likely caused by an impaired function of several different enzymes, and rescuing a single enzyme is insufficient to improve the growth rate. Therefore, in order to characterize the set of growth rate-limiting enzymes at 42 °C, we gradually removed the temperature constraints on enzymes (set kcat and denaturation temperature independent) in the order of decrescent flux sensitivity coefficient values in each of the Posterior etcGEMs. Interestingly, in the case of recovering the cell growth rate to 0.2 h -1 , we found an agreement among all Posterior etcGEMs that 10 enzymes are required to be fully functional at 42 °C (Fig 5d). Since each model predicted a different subset of such enzymes, an ensemble approach was used to count the number of models (votes) in which an enzyme is predicted to be one of 10 such enzymes (Fig 5e). In total, 82 enzymes were predicted by at least one Posterior etcGEM, and only 24 (out of 82) enzymes were each predicted by more than 10% of the Posterior etcGEMs (Fig 5e, inset). Among these 24 enzymes 12 enzymes were engaged with Glycolysis and 3 enzymes were involved in sterol biosynthesis: ERG1, and HMG1,2 catalyzing the flux-controlling steps in sterol biosynthesis 41 . The remaining enzymes were mainly involved in DNA or protein synthesis related pathways. Inset shows the names and pathways of genes predicted by more than Posterior etcGEMs 10% they are involved in.

Discussion
Here, we present a Bayesian genome scale modelling approach to resolve the temperature dependence of cellular metabolism, termed GETCool. Using an enzyme-constrained GEM 16 as a template, we modelled the temperature effects on each individual enzyme by including temperature dependent terms for the independent processes of denaturation as well as catalysis (Fig 1A). Due to the high level of uncertainty and low accuracy associated with the initial thermal parameter values (Fig S6), which were a result of experimentally measured noise or variability arising from machine learning or theoretical predictions (Methods M5), the model predictions initially could not correctly recapitulate experimental observations (Fig   2a-c and Fig S2). We therefore used Bayesian statistical learning that enabled updating our Prior guess of the highly uncertain thermal parameters to a more accurate Posterior estimation of these parameters according to observed phenotypic data (Fig 1e). The resulting Posterior etcGEMs accurately describe the experimental observations (Fig 2a-c)  in the enzyme catalytic process (Fig 1c), which leads to a negative curvature for temperature dependence of enzyme activity in the absence of denaturation 32 . We found that with this theory, temperature dependence of kcats acts as a major contributor to the cell growth rate at all temperatures, which can especially explain the decline in cell growth rate right after the optimal growth temperature (Fig 3a). Yeast enzymes only maintain high kcats in the temperature window of cell growth (Fig 3e), which means that the metabolism becomes inefficient at superoptimal temperatures due to the general decrease in enzyme turnover without denaturations (Fig 3d-f).
Using the Bayesian genome scale modelling approach to quantitatively depict the temperature effects on yeast metabolism led to insights into the long-standing discussion on the roles of different cellular factors in cellular fitness under heat stresses 4,5,7,23,42 . For instance, protein denaturation has been suspected as one of the main causes of the decline in cell growth beyond the optimal growth temperature point. However, recent high throughput measurements of melting temperatures (Tm) for 707 S. cerevisiae proteins revealed a Tm distribution with a mean value of 52 °C and a minimum of 40 °C 7 , which suggests that protein denaturation alone might not be sufficient to explain the decline of yeast cell growth between 30°C (optimal growth temperature, OGT) and 42°C (lethal temperature point). An alternative explanation is provided by the evidence of a significant increase of non-growth associated ATP maintenance (NGAM) observed with yeast cells grown in anaerobic chemostat cultivations at high temperatures (33-40°C) compared to ones grown at low temperatures (5-31 °C) 5 , which suggests an imbalance in cellular energy allocation in the superoptimal temperature range. Quantitative assessment using our modelling approach revealed that the impaired cell growth is caused by a combination of decreased kcat values, increased NGAM costs and protein denaturation (Fig 3). Furthermore, between 30 and 35 °C, the combined decrease in kcats and increase in NGAM explains the decline in cell growth, whereas with temperatures above 35 °C, protein denaturation becomes the dominant factor, causing cell death at 42°C. However, in accordance with published findings that cellular proteomes have a broad distribution of protein stability with only proteins at the tail of the distribution being problematic 43 , using our approach we identified only ~1% unstable enzymes denatured at the lethal point (Tm lower than 42 °C , Fig 3d).
We identified two interesting metabolic pathways involved in yeast thermotolerance: sterol metabolism and mitochondrial energy metabolism. With sterol metabolism (Fig 5d), it is known that high sterol levels help yeast cells survive under heat stress 44 and changes of the sterol composition of the yeast membrane from ergosterol to fecosterol 45 can significantly increase yeast thermotolerance. However, yeast was found to downregulate its whole ergosterol biosynthesis at both transcription and translation levels when increasing the temperature from 30°C to 36 °C (Fig S10). Our modeling approach identified three problematic enzymes (Fig 5d: HMG1,2 and ERG1) in the sterol metabolism, which are also flux-controlling enzymes in the sterol biosynthesis pathway 46 . We experimentally confirmed that replacement of ERG1 with its ortholog in the thermotolerant yeast K. marxianus can significantly improve the cell growth at 40 °C (Fig 5c). We thereby hypothesize that, since those three enzymes are problematic at superoptimal temperatures, there is no need for the cell to maintain high expression and translation levels of other enzymes in the same pathway.
Instead, it has to downregulate its whole ergosterol biosynthesis to save resources and increase fitness.
With mitochondria, previous studies have indicated that the mitochondrial genome plays an important role in yeast thermal adaptation [47][48][49] . We found that out of the 9 unstable enzymes identified with the Posterior etcGEMs (with a Tm lower than 42 °C, Fig S7), three (ATP1, HEM1 and PDB1) belonged to the mitochondrial energy metabolism. Simulation of chemostat data (Fig 4) revealed that at superoptimal temperatures, yeast prefers to produce ATP via the glycolysis metabolism instead of the mitochondrial energy metabolism in the mitochondria. Furthermore, mitochondria only exists in eukaryotes and almost all of them have evolved to have an optimal growth temperature below 40 °C 3 . All these findings indicate that mitochondria are not evolved to be functional at very high temperatures (e.g. >42 °C). Since mitochondrial energy metabolism is not essential for yeast cell growth, as there are alternative energy pathways (Fig 4), this also explains why we could not successfully predict mitochondrial enzymes to be engineering targets for the recovery of cell growth at 42 °C (Fig 5d), despite the existence of three unstable enzymes in the mitochondrial energy metabolism.
In conclusion, we demonstrate the usefulness of a Bayesian genome scale modeling approach for reconciling temperature dependence of yeast metabolism. Describing the link between temperature and cell physiology is of industrial importance, e.g. for finding optimized production of biochemicals 24,50-52 , but also in medicine, e.g. to understand the effects of temperature on human metabolism [53][54][55] . Furthermore, based on its success here, we foresee that our method can be integrated into genome scale modelling approaches in general. This approach can also become a staple of GEM modeling in order to resolve uncertainties present in the data, which can be important as GEMs have become a widely used platform for integration of various biological data, such as transcriptomics and proteomics data that are associated with large uncertainties 56 .

M1. A temperature dependent enzyme-constrained genome scale metabolic model (etcGEM)
The central concepts of an enzyme constrained model 16 in which kB is the Boltzmann constant, h is Planck's constant, is the universal gas constant, and ∆ ‡ ( ) is the free energy difference between the ground state and the transition state. The latter can be expanded as where ∆ 1 ) ‡ , ∆ 1 ) ‡ and ∆ & ‡ are the differences in enthalpy, entropy and heat capacity change between the transition and ground states, respectively, and 4 is the reference temperature. This theory has been successfully applied to study the temperature dependence of enzyme activity 32,33 and evolution 36 .
Since there is not enough detailed information regarding the heat-induced denaturation process of yeast proteins, a simple two-state model denaturation was assumed as in many other studies 20,21,31 . In such a  difference between the denatured state and the native state and can be expressed as where ∆ 9 ( ) and ∆ 9 ( )are the enthalpy and entropy changes between the denatured and native states at temperature . It has been found that convergence temperatures : * (373.5 K) and < * (385 K) exist for ∆ 9 and ∆ 9 respectively 42,57,58 . At such temperatures, the ∆ 9 and ∆ 9 converge to a common value of ∆ * and ∆ * . Thereby, in which ∆ &,9 is the difference in heat-capacity change between the denatured and native states.
In summary, the values of ∆ ‡ ( ) and ∆ 9 ( ) need to be determined in order to model the temperature dependence of enzyme activities, and they can be associated with six unknown parameters: and ∆ * , ∆ * and ∆ &,9 for ∆ 9 ( ).

M2. Computation of thermal parameters
Since it is difficult to directly measure those six thermal parameters (∆ 1 ) for each enzyme, indirect measurements have to be used to approximate the larger set of thermal parameters. As there are six free variables in the system, six different equations are required to solve for those parameters.
1) At the protein melting temperature % : 2) At the enzyme optimal temperature (&$ , the enzyme activity is maximized: 3) "#$ at the enzyme optimal temperature (&$ is known: 4) ∆ & ‡ value can be approximate from temperature dependence of cell growth rate 32 5) We found that there is a very strong linear correlation ( ) = 0.998, Pearson's correlation) between ∆ * and ∆ * of 116 proteins from Sawle L et al 42 (Fig S9) ∆ * = 299.58∆ * + 20008 J/mol (10) 6) For some enzymes, A4 , where a 90% possibility exists that an enzyme molecule is in the denatured state, is experimentally measured: As a result, the six thermal parameters ∆ 1 ) 9 can be obtained by solving the above equations.
In the case of lacking A4 or failed to obtain a positive ∆ &,9 , protein sequence length was used to estimate ∆ * and ∆ * 42 as below: ∆ * = 13.27 + 448 (13)

M3. Sequential Monte Carlo based Approximate Bayesian Computation (SMC-ABC)
Approximate Bayesian Computation 34 was applied to infer parameter sets from Posterior distributions.
Given an observed dataset and a model specified by / sampled from the Prior distribution ( ), if the distance between simulated data U and observed is less than a given threshold , then this / is accepted as the one sampled from W W , U Z < Z. W W , U Z < Z is often used to approximate the Posterior ( | ) when is sufficiently small. In case of high-dimensional parameter space and/or when the ( ) is very different from ( | ), the acceptance rate would be very low and thus this approach becomes computationally expensive to generate a population of / from W W , U Z < Z. In this work, a sequential Monte Carlo approach was designed as follows to generate a population of / sampled from W W , U Z < Z:

M4. Collection and estimation of enzyme thermal parameters in etcYeast7.6
The enzyme-constrained model for yeast with minimal medium was taken from 16 .

Melting temperatures
Among the 764 enzymes included in ecYeast7, the % (melting temperature) and A4 (the temperature at which 90% of the protein is in the denatured state) for 266 yeast proteins have been reported previously 7 . For enzymes lacking an experimentally measured % , a melting temperature of 51.9 °C (the average of existing % s of 707 yeast proteins) was assumed. In the original paper 7 , the 95% confidence interval was reported for peptides measured in the experiments and the average standard error was estimated at 3.4 °C. This same value was used as the uncertainty measure for the experimentally determined % s, since the standard error for protein % was not available. The % of the 266 enzymes was then described with a normal distribution ( %,-, 3.4), in which %,-is the experimentally measured melting temperature of protein . For enzymes that uses the mean % of 707 proteins 7 as % estimation, the corresponding uncertainty is described as the the standard deviation of the the 707 % s, equalling 5.9 °C. Thereby, a normal distribution (51.9, 5.9) was used.
Enzyme optimal temperature (&$ values of all enzymes in this study were calculated using a previously described machine learning method 22 , which predicts enzyme (&$ based on primary sequences. This model has a coefficient of determination (R2 score) of 0.5 on the test dataset. Root-mean-squared error (RMSE) of the prediction was then estimated with: