Quantitative approaches to energy and glucose homeostasis: machine learning and modelling for precision understanding and prediction

Obesity is a major global public health problem. Understanding how energy homeostasis is regulated, and can become dysregulated, is crucial for developing new treatments for obesity. Detailed recording of individual behaviour and new imaging modalities offer the prospect of medically relevant models of energy homeostasis that are both understandable and individually predictive. The profusion of data from these sources has led to an interest in applying machine learning techniques to gain insight from these large, relatively unstructured datasets. We review both physiological models and machine learning results across a diverse range of applications in energy homeostasis, and highlight how modelling and machine learning can work together to improve predictive ability. We collect quantitative details in a comprehensive mathematical supplement. We also discuss the prospects of forecasting homeostatic behaviour and stress the importance of characterizing stochasticity within and between individuals in order to provide practical, tailored forecasts and guidance to combat the spread of obesity.


Introduction
In this supplementary document we provide the basic mathematical details of each model discussed in the manuscript. We give a brief overview of each model, detailing its most important features, before presenting the equations that characterise it. We then summarise the main findings of each model, including experimental comparison where appropriate. Further details, such as bifurcation analyses or derivations of optimal controls, are omitted for brevity but summarised at the end of each section; readers are directed to the original papers for the full derivations. Although we have altered some notation for ease of reading (especially where variable names were several characters long), we have remained as close as possible to the notation of the original papers, making the transition to the primary literature as simple as possible.

Endocrine models
Models in this section describe the dynamics of one or more endocrine regulatory systems in response to external perturbations. The most extensively modelled regulatory system is glucostasis - the regulation of blood glucose by insulin and glucagon. In this case the external stimuli are infusions of glucose. This class of models has a surprisingly rich phenomenology, especially when considered over long timescales and extended to include the dynamics of pancreatic β cells (which secrete insulin, and whose growth depends nonlinearly on glucose concentration). Insights from glucostasis have been used to investigate other mechanisms such as leptin-mediated control of feeding, and models of the interaction between multiple regulatory systems and behaviour are beginning to be investigated.

Bergman - Minimal model of insulin secretion
Bergman's famous 'minimal model' [1] aims to describe the response of the body to the intravenous glucose tolerance test (IVGTT). In the IVGTT glucose is injected as a bolus into the system, and a first-phase insulin secretion is also modelled as a bolus arriving along with the glucose. This model improves on the work of Bolie by considering a second-phase insulin secretion at the pancreas I rather than having a single insulin variable representing the average concentration in the body. This incorporates the fact that it takes time for insulin to travel from the pancreas to the rest of the body, an idea which is developed further in delay differential equation models [SI 2.3]. Changes in I occur in response to the dynamics of glucose G and of insulin in the 'remote compartment' X, where insulin only acts once it has made its way to the remote compartment. This leads to a set of 3 ODEs (following the notation of de Gaetano and Arino [2]):

\frac{dG(t)}{dt} = -[P_1 + X(t)]G(t) + P_1 G_b

\frac{dX(t)}{dt} = -P_2 X(t) + P_3 [I(t) - I_b]

\frac{dI(t)}{dt} = P_4 [G(t) - P_5]^+ t - P_6 [I(t) - I_b]

where [z]^+ = \max(z, 0). The baseline levels of glucose and insulin are given by G_b and I_b respectively, P_1 is the insulin-independent rate of glucose uptake by the body, P_2 the rate of decrease in tissue glucose uptake and P_3 is the insulin-dependent rate. P_4 is the rate of pancreatic release of insulin per mg/dl of glucose, P_5 is the target glycemia in the pancreas and P_6 is the decay rate of insulin in the blood. By fitting time series to this model, a number of metabolic parameters representing glucose tolerance can be obtained, allowing classification of individuals into groups of high and low tolerance.
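As a concrete illustration, the three ODEs above can be integrated directly. The sketch below uses forward-Euler integration with purely illustrative parameter values (not Bergman's fitted ones) and checks that glucose relaxes back towards its baseline G_b after an IVGTT-style bolus.

```python
# Forward-Euler simulation of the Bergman minimal model. All parameter values
# are hypothetical, chosen only to show the qualitative return to baseline.
P1, P2, P3, P4, P5, P6 = 0.03, 0.02, 1e-5, 5e-4, 80.0, 0.3
Gb, Ib = 90.0, 10.0            # baseline glucose (mg/dl) and insulin (muU/ml)

def minimal_model_step(G, X, I, t, dt):
    dG = -(P1 + X) * G + P1 * Gb
    dX = -P2 * X + P3 * (I - Ib)
    dI = P4 * max(G - P5, 0.0) * t - P6 * (I - Ib)   # [G - P5]^+ t term
    return G + dG * dt, X + dX * dt, I + dI * dt

# IVGTT: start from an elevated glucose bolus and a first-phase insulin spike.
G, X, I = 300.0, 0.0, 60.0
dt = 0.1
for step in range(3000):        # 300 minutes of simulated time
    G, X, I = minimal_model_step(G, X, I, step * dt, dt)

final_G = G                     # should be close to the baseline Gb
```

The time step is small relative to the fastest rate constant (P_6), so the explicit scheme is stable here.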

Bolie - Simple insulin and glucose dynamics
In 1961, Bolie defined a two-component ODE model of insulin-mediated glucose regulation [3]. In a system of volume V, insulin X and glucose Y are modelled as a pair of coupled ODEs with arbitrary clearance functions F_i:

V \frac{dX}{dt} = \dot{I} - F_1(X, Y), \qquad V \frac{dY}{dt} = \dot{G} - F_2(X, Y)

where \dot{I} and \dot{G} are the rates of arrival of insulin and glucose into the system. Normalising by the volume V and linearising the functions F_i around baseline levels, with deviation variables x = X_0 - X and y = Y_0 - Y and constant insulin and glucose arrival rates p and q, gives

\frac{dx}{dt} = p - \alpha x + \beta y, \qquad \frac{dy}{dt} = q - \gamma x - \delta y

This system of equations captures the rate of insulin secretion as depending on a baseline rate p, on insulinase with sensitivity α, and on glucose-dependent pancreatic secretion with rate constant β. Glucose enters the system at rate q, is stored as glycogen at rate γx and is used by tissues at rate δy. The system resembles a damped harmonic oscillator, and can be solved for an initial bolus injection of insulin D_I or glucose D_G. If the coefficients are set so as to prevent oscillations, the response to a glucose tolerance test is a smooth return towards baseline. Bolie also derives dynamics for an insulin tolerance test and general equations where the critical damping criterion is relaxed.
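Whether the linearised system returns smoothly to baseline or oscillates is decided by the eigenvalues of the Jacobian of the (x, y) system, which is the content of Bolie's damping criterion. A minimal numerical check, with illustrative coefficients (a, b, g, d standing in for α, β, γ, δ) chosen to land in the non-oscillatory, overdamped regime:

```python
import numpy as np

# Jacobian of the linearised Bolie system dx/dt = p - a*x + b*y,
# dy/dt = q - g*x - d*y. Coefficients are illustrative, not Bolie's fits.
a, b, g, d = 1.0, 0.1, 0.1, 0.2

J = np.array([[-a,  b],
              [-g, -d]])
eigvals = np.linalg.eigvals(J)

# Complex eigenvalues would mean damped oscillations in glucose and insulin;
# real negative eigenvalues give the smooth return to baseline Bolie selects.
oscillatory = bool(np.iscomplex(eigvals).any())
stable = bool((eigvals.real < 0).all())
```

Because the trace is negative and the determinant positive, both eigenvalues have negative real part, and the positive discriminant keeps them real.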

Li, Kuang and Mason - Delay differential equation models of glucostasis
Li, Kuang and Mason [4] write delay differential equations for glucose G and insulin I to model the delays associated with remote insulin secretion in the pancreas, rather than incorporating a separate 'remote' compartment. They use the same functions f_1-f_5 as Sturis and Tolić [SI 2.13], leading to two coupled delay differential equations

\frac{dG}{dt} = G_{in} - f_2(G(t)) - f_3(G(t)) f_4(I(t)) + f_5(I(t - \tau_2))

\frac{dI}{dt} = f_1(G(t - \tau_1)) - d_i I(t)

where τ_1 is the delay between elevated glucose and insulin secretion, τ_2 the delay in hepatic glucose production, and d_i the insulin degradation rate. Simulating this system numerically shows the existence of self-sustaining oscillations in glucose and insulin levels. This system is similar to that of Sturis and Tolić, with the hard delay kernels in Li, Kuang and Mason serving the same purpose as the auxiliary variables x_1-x_3 in Sturis and Tolić. A similar model has also been developed for the IVGTT, where the remote compartment is replaced by a delay kernel [5].
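Delay differential equations of this kind can be integrated with a simple fixed-step scheme that keeps a history buffer for the delayed terms. The sketch below replaces f_1-f_5 with linear stand-ins and uses entirely illustrative constants, so it demonstrates the numerical technique rather than the published model:

```python
import numpy as np

# Toy two-delay system in the spirit of the model above:
#   dG/dt = Gin - a*G - b*G*I + c*I(t - tau2)
#   dI/dt = s*G(t - tau1) - d*I
# Linear stand-ins replace f1..f5; all constants are illustrative.
Gin, a, b, c, s, d = 1.0, 0.1, 0.02, 0.05, 0.08, 0.2
tau1, tau2, dt = 5.0, 15.0, 0.01
n1, n2 = int(tau1 / dt), int(tau2 / dt)

steps = 60000                        # 600 time units
G = np.full(steps + 1, 10.0)         # constant history before t = 0
I = np.full(steps + 1, 5.0)
for k in range(max(n1, n2), steps):
    G_del = G[k - n1]                # delayed glucose seen by the pancreas
    I_del = I[k - n2]                # delayed insulin seen by the liver
    G[k + 1] = G[k] + dt * (Gin - a * G[k] - b * G[k] * I[k] + c * I_del)
    I[k + 1] = I[k] + dt * (s * G_del - d * I[k])

G_final, I_final = G[-1], I[-1]      # trajectories stay bounded and positive
```

The buffer indices k - n1 and k - n2 implement the hard delay kernels; a gamma-kernel delay would instead be realised by a chain of auxiliary ODEs.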

Roy and Parker - Extending Bergman's model
Roy and Parker [6] extend the model of Bergman [SI 2.1] to include free fatty acids (FFAs), which are another source of energy for the body and are particularly heavily used by skeletal muscle, and infusion protocols other than a simple bolus. Plasma FFA is modelled similarly to glucose in the Bergman minimal model, with a remote compartment Z and plasma compartment F. A further variable Y represents the action of insulin to store or utilise FFAs. The ODEs of the model, given in full in [6], are parametrised by rate constants p_i, apart from p_9 which is a function of G. G_b, I_b and F_b are the baseline values of the respective time-varying concentrations, and V_G, V_F are the volumes in which glucose and free fatty acid are distributed. Endocrine concentrations vary in response to exogenous stimulation: u_1(t) is the exogenous insulin infusion, u_2(t) is the arrival rate of glucose and u_3(t) is the arrival rate of lipid. Roy and Parker fit the model to experimental data and find R² ≥ 0.8756 after the fitting procedure. High fatty acid concentrations impair glucose uptake in both the model and experiments. The model further predicts that insulin has an antilipolytic effect, which was verified using experimental data. The addition of the extra compartments was justified using the Akaike Information Criterion [7], with the improved fit outweighing the penalty from increased model complexity.

Dalla Man - Nonlinear glucose absorption model used in the Type 1 Diabetes simulator
Dalla Man et al [8] model the stomach as a three-compartment ODE system, which attempts to reproduce the biphasic nature of gastric emptying. Movement between the stomach compartments is linear, whereas gastric emptying contains a nonlinear term. The variables q_sto1 and q_sto2 model the amount of glucose in compartments 1 and 2 of the stomach respectively, q_gut is the amount of glucose in the gut and Ra is the rate of glucose absorption into the bloodstream. The system of equations for a given glucose dose D is

\dot{q}_{sto1} = -k_{21} q_{sto1} + D \delta(t)

\dot{q}_{sto2} = k_{21} q_{sto1} - k_{empt}(q_{sto}) q_{sto2}

\dot{q}_{gut} = k_{empt}(q_{sto}) q_{sto2} - k_{abs} q_{gut}

Ra(t) = f k_{abs} q_{gut}(t)

where k_21 is the rate of movement from compartment 1 to compartment 2, δ(t) is a delta function, k_abs is the absorption rate and f is the fraction of intestinal absorption that reaches the blood. The stomach emptying rate k_empt varies with the stomach contents q_sto according to

k_{empt}(q_{sto}) = k_{min} + \frac{k_{max} - k_{min}}{2}\left[\tanh\left(\alpha(q_{sto} - bD)\right) - \tanh\left(\beta(q_{sto} - cD)\right) + 2\right]

where the stomach contents q_sto is the sum of the two compartments, q_sto = q_sto1 + q_sto2, and b and c are the fractions of the dose at which the rate switches between its extremes. The dual-sigmoid form of the emptying rate causes the stomach to empty more slowly at intermediate fill levels. A simplified model without this nonlinear emptying rate failed to accurately reproduce the data.
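The three-compartment system is easy to simulate once a dual-sigmoid emptying rate is written down. In the sketch below the rate constants and sigmoid placements (k_min, k_max, b, c and the steepness factors) are illustrative stand-ins rather than Dalla Man's estimates; the check exploits the fact that the explicit scheme conserves total glucose mass exactly:

```python
import math

# Three-compartment gastric model with a dual-sigmoid emptying rate.
# All parameter values are illustrative, not Dalla Man's fitted ones.
D = 75000.0                          # glucose dose, mg
k21, kabs, f = 0.08, 0.06, 0.9
kmin, kmax = 0.01, 0.05
bfrac, cfrac = 0.8, 0.2              # sigmoid centres as fractions of the dose
alpha = 5.0 / (2.0 * D * (1.0 - bfrac))
beta = 5.0 / (2.0 * D * cfrac)

def k_empt(q_sto):
    # fast emptying near full/empty stomach, slow at intermediate contents
    return kmin + (kmax - kmin) / 2.0 * (
        math.tanh(alpha * (q_sto - bfrac * D))
        - math.tanh(beta * (q_sto - cfrac * D)) + 2.0)

q1, q2, qgut, absorbed = D, 0.0, 0.0, 0.0   # bolus enters compartment 1
dt = 0.05
for _ in range(40000):                       # 2000 minutes
    ke = k_empt(q1 + q2)
    dq1 = -k21 * q1
    dq2 = k21 * q1 - ke * q2
    dqgut = ke * q2 - kabs * qgut
    absorbed += kabs * qgut * dt             # cumulative appearance in blood
    q1 += dq1 * dt
    q2 += dq2 * dt
    qgut += dqgut * dt

recovered = q1 + q2 + qgut + absorbed        # total mass is conserved
```

Multiplying the absorbed flux by f would give the rate Ra(t) reaching the bloodstream.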

Lehmann and Deutsch - Trapezoidal gastric emptying kinetics
Lehmann and Deutsch [9] formulated a model of gastric emptying with a single gut compartment q_gut. The gut compartment dynamics and the rate of glucose arrival to the bloodstream Ra(t) are given by

\dot{q}_{gut}(t) = G_{empt}(t) - k_{abs} q_{gut}(t), \qquad Ra(t) = f k_{abs} q_{gut}(t)

and the fraction of glucose arrival f is calibrated to each individual using the total glucose dose D. This model posits trapezoidal emptying dynamics for G_empt(t):

G_{empt}(t) = \begin{cases} V_{max} t / T_{up} & 0 \le t < T_{up} \\ V_{max} & T_{up} \le t < T_{up} + T_{max} \\ V_{max}\left(1 - \frac{t - T_{up} - T_{max}}{T_{down}}\right) & T_{up} + T_{max} \le t < T_{up} + T_{max} + T_{down} \\ 0 & \text{otherwise} \end{cases}

where T_up is the length of time spent in the rising part of the trapezoid, T_down the length of the falling period and T_max the time spent at maximum emptying speed. V_max is the rate of emptying needed to process all the ingested glucose in the assigned time. Although the kinetics are in principle relatively simple, this trapezoidal structure leads to a relatively large number of parameters.
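Since V_max is fixed by requiring that the area under the trapezoid equals the dose D, integrating G_empt over time should recover D exactly. A short sketch with illustrative timings:

```python
# Trapezoidal emptying profile; V_max is set so the trapezoid's area equals
# the ingested dose D. Timings are illustrative.
D = 50.0                                  # grams of glucose
T_up, T_max, T_down = 15.0, 30.0, 15.0    # minutes

# area = V_max * (T_max + (T_up + T_down)/2) must equal D
V_max = D / (T_max + (T_up + T_down) / 2.0)

def G_empt(t):
    if t < 0:
        return 0.0
    if t < T_up:                                  # rising edge
        return V_max * t / T_up
    if t < T_up + T_max:                          # plateau
        return V_max
    if t < T_up + T_max + T_down:                 # falling edge
        return V_max * (1.0 - (t - T_up - T_max) / T_down)
    return 0.0

# Riemann-sum integral of the profile should recover the full dose D.
dt = 0.001
total = sum(G_empt(k * dt) * dt for k in range(int(80.0 / dt)))
```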

Elashoff model - Modified power exponential gastric emptying
The gut dynamics used by Elashoff [10] are identical to the Lehmann and Deutsch model, apart from the emptying kinetics, which are modelled as a power exponential: the cumulative amount of glucose delivered to the duodenum is

q_{duo}(t) = D\left(1 - e^{-kt}\right)^{\beta}

where k is the emptying rate and β controls the lag phase of gastric emptying. The rate of gastric emptying is the time derivative of q_duo, i.e. G_{empt}(t) = \dot{q}_{duo}(t). This is similar to the model of stomach fullness used by Siegel et al. [11], where the fraction of stomach fullness is given by q_{frac} = 1 - (1 - e^{-kt})^{\beta}. Although this model gives a worse fit than the Dalla Man model, it is substantially more parsimonious, with only 2 free parameters.
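A quick numerical sketch of the power-exponential kinetics (with illustrative k and β) confirms that the emptied fraction runs from 0 to 1 and that β > 1 produces an initial lag phase in the emptying rate:

```python
import math

# Power-exponential emptying: cumulative delivery q_duo(t) = D*(1-e^{-kt})^beta.
# k and beta values are illustrative.
D, k, beta = 50.0, 0.05, 2.0

def q_duo(t):
    return D * (1.0 - math.exp(-k * t)) ** beta

def G_empt(t, h=1e-4):
    # central-difference derivative of q_duo: the instantaneous emptying rate
    return (q_duo(t + h) - q_duo(t - h)) / (2.0 * h)

start_fraction = q_duo(0.0) / D       # 0: nothing emptied at t = 0
late_fraction = q_duo(300.0) / D      # -> 1: everything emptied eventually
lag_rate = G_empt(0.5)                # beta > 1 makes the early rate small
peak_rate = max(G_empt(t / 10.0) for t in range(1, 2000))
```

With β = 1 the lag disappears and the model reduces to simple first-order emptying.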

Neural networks - Overview
Neural networks aim to approximate some nonlinear function f * (x) with input x. Typically x is a large vector, for instance a series of blood glucose and insulin measurements.
Deep networks calculate f through the composition of simpler functions, f(x) = f^{(2)}(f^{(1)}(x)). These functions are parametrised by weights (denoted W) and constants c, which we want to learn from data, and apply a nonlinearity called an activation function. For instance, the commonly used rectified linear unit uses the activation function g(z) = \max\{0, z\}, applied elementwise to the affine transform W^T x + c. Although simple composition of these nonlinear functions may seem restrictive, with a sufficiently large network it is possible to approximate arbitrary functions. Application of this operation across two layers, with no restriction on the weights W_i or constants c_i at layer i, gives the network function

f(x) = W_2^T \max\{0, W_1^T x + c_1\} + c_2

This yields a vector, which can be converted into a scalar prediction (for instance blood glucose at some future time point) by another layer. Layers which are not generating a prediction are referred to as hidden layers.
Recurrent neural networks are networks whose structure is optimised for prediction of sequential data. They receive inputs at a series of time points x_t, which are used to compute output observations o_t based on hidden units h_t. Hidden units are updated at each time step: at time t, the hidden units h_t depend only on h_{t-1} and the observations x_t, with loss being calculated from the output o_t and the data y_t. This encoding of sequential structure makes them ideal for time series prediction tasks, as well as natural language processing. Neural networks are an extremely active area of current research, and an excellent introductory textbook has recently been published [12].
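The two-layer network function above can be written out in a few lines of numpy. The weights here are random, so this only demonstrates the shapes and the effect of the ReLU nonlinearity, not a trained predictor:

```python
import numpy as np

# Forward pass of f(x) = W2^T max{0, W1^T x + c1} + c2 with random weights.
rng = np.random.default_rng(0)
n_in, n_hidden = 4, 8            # e.g. 4 recent glucose readings as input

W1 = rng.normal(size=(n_in, n_hidden))
c1 = rng.normal(size=n_hidden)
W2 = rng.normal(size=(n_hidden, 1))
c2 = rng.normal(size=1)

def relu(z):
    return np.maximum(0.0, z)    # activation function g(z) = max{0, z}

def network(x):
    hidden = relu(x @ W1 + c1)   # hidden layer: affine transform + ReLU
    return hidden @ W2 + c2      # final layer maps to a scalar prediction

x = rng.normal(size=n_in)
prediction = network(x)
hidden_vals = relu(x @ W1 + c1)  # hidden activations are never negative
```

Training would adjust W_1, c_1, W_2, c_2 by gradient descent on a loss over observed data.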

Time series methods - Overview
Time series analysis attempts to separate the stochastic and deterministic parts of the dynamics under regulation and learn the parameters of the deterministic part. In general, time series models assume discretised timesteps, where the value at each timestep Y_t is a function of the previous values and some noise ε_t:

Y_t = f(Y_{t-1}, Y_{t-2}, \ldots; \epsilon_t)

Given this very broad specification, there are two possible approaches: attempt to learn a nonlinear function f using a range of function approximation techniques (for instance neural networks), or assume a more tractable model structure and learn the parameters of that model directly. We discuss the latter approach here, as it has been widely used in applications to glucostasis. One common tractable model class is linear processes, where the value at each timestep is a linear function of the preceding ones. Moving average models (MA(q)) are linear functions of the noise at the previous q timesteps:

Y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q}

so this is a weighted average of the last q noise observations. An autoregressive model (AR(p)) uses a number p of past observations:

Y_t = c + \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + \epsilon_t

where we want to learn the coefficients φ_i. This can be rewritten in terms of the ε_i as a general linear process. These two are combined to form the autoregressive moving average (ARMA) model. The other important component is the addition of exogenous variables - in this case Y_t is also a function of other variables X_t whose dynamics are unknown and which we are simply given. This could be energy expenditure or food intake, for instance. As with support vector machines, we do not cover inferring the parameters θ_i and φ_i from observed time series; extensive resources are available [13,14,15].
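Fitting an AR(p) model reduces to linear least squares on lagged observations. The sketch below generates a synthetic AR(2) series with known coefficients and recovers them, which is the essence of the parameter-learning step we do not cover in detail:

```python
import numpy as np

# Generate Y_t = phi1*Y_{t-1} + phi2*Y_{t-2} + eps_t and recover phi1, phi2
# by least squares. Coefficients and noise scale are illustrative.
rng = np.random.default_rng(1)
phi1, phi2 = 0.6, -0.3
n = 5000

y = np.zeros(n)
for t in range(2, n):
    y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + rng.normal(scale=0.5)

# Design matrix of lagged observations; each row is (Y_{t-1}, Y_{t-2}).
X = np.column_stack([y[1:-1], y[:-2]])
target = y[2:]
phi_hat, *_ = np.linalg.lstsq(X, target, rcond=None)
```

Exogenous variables would simply add further columns to the design matrix X.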

Support Vector Regression - Overview
Support vector machines (SVMs) are a computationally-efficient method for prediction that allows predictions to be nonlinear functions of the data. The basic idea of an SVM is that although a linear function of the data may not yield accurate predictions, it may be possible to transform the data into some new space where a linear function of the transformed data is a good predictor. SVMs are often used for classification, where the function to be learned is the boundary between different classes; however, we will discuss their application to regression problems [16]. We follow the introductory material given in a comprehensive tutorial [17], but do not cover the optimisation methods necessary to actually learn the parameters of an SVM. Begin by attempting to learn a linear predictor f using data x and parameters w, b:

f(x) = \langle w, x \rangle + b

where \langle \cdot, \cdot \rangle denotes the inner product in the data space (typically R^n, although other inner product spaces are possible). The problem is typically phrased in terms of convex optimisation, with the goal being to minimise the norm of the weights (\|w\|^2 = \langle w, w \rangle) such that the regression line never has more than ε error with respect to the data y, leading to the constraints

y_i - \langle w, x_i \rangle - b \le \epsilon, \qquad \langle w, x_i \rangle + b - y_i \le \epsilon

This differs from conventional linear regression, where the aim is to minimise the sum of squared errors and (unless regularisation is being used) there is no constraint on w. This optimisation problem can be unsolvable for some values of ε, as the data may simply be broadly distributed. To deal with data outside of the margin ε, we introduce slack variables ξ_i and ξ_i^* that penalise the distance of each data point from the 'allowed region' within ε of the regression line. This leads to the optimisation problem

\text{minimise } \frac{1}{2}\|w\|^2 + C \sum_i (\xi_i + \xi_i^*)

subject to the constraints

y_i - \langle w, x_i \rangle - b \le \epsilon + \xi_i, \qquad \langle w, x_i \rangle + b - y_i \le \epsilon + \xi_i^*, \qquad \xi_i, \xi_i^* \ge 0

This introduces a tradeoff between weight minimisation and error tolerance, which is controlled by the parameter C. Because this formulation allows the tools of convex optimisation to be used, the problem can be solved efficiently.
However, if there are nonlinear patterns in the data, we will fail to learn them with this model. We would like to transform the data x → Φ(x) such that a linear fit in the transformed space is adequate; however, for complex and high-dimensional data (e.g. images) computing such a transform explicitly is computationally infeasible. The problem is resolved using the 'kernel trick'. This relies on the insight that we only use the data via its inner product, never directly. Functions that satisfy a series of properties (stated in [17]) allow us to compute the inner product in the transformed space without needing to calculate the coordinate transform explicitly. This renders the problem tractable, replacing the inner product \langle \cdot, \cdot \rangle with the kernel k(\cdot, \cdot) in the optimisation equations above. The effectiveness of SVMs depends strongly on the kernel used, although a range of kernels exist, including for more complex data structures such as graphs.
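The kernel trick can be demonstrated in miniature with the homogeneous polynomial kernel k(x, y) = ⟨x, y⟩² on R², whose explicit feature map Φ(x) = (x_1², √2 x_1 x_2, x_2²) is small enough to write down: the kernel evaluated in the data space equals the inner product in the transformed space, without ever constructing Φ.

```python
import numpy as np

# Polynomial kernel k(x, y) = (x . y)^2 and its explicit feature map on R^2.
def kernel(x, y):
    return np.dot(x, y) ** 2

def phi(x):
    # feature map into R^3: (x1^2, sqrt(2) x1 x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

x = np.array([1.5, -0.7])
y = np.array([0.3, 2.0])

via_kernel = kernel(x, y)               # computed without leaving R^2
via_features = np.dot(phi(x), phi(y))   # explicit trip through feature space
```

For kernels like the Gaussian RBF the feature space is infinite-dimensional, so the shortcut on the left is the only computable route.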
Topp et al. - Multiple-timescale analysis of glucostasis reveals the routes to beta cell extinction
Topp et al. [18] formulate a multiple-timescale model incorporating fast-timescale regulation of glucose G by insulin I and slow dynamics of pancreatic beta cell mass β. Glucose is described by the ODE

\frac{dG}{dt} = R_0 - (E_{G0} + S_I I) G

where the rate of production at G = 0 is R_0, E_{G0} is the effectiveness of glucose at I = 0, and S_I is insulin sensitivity. The insulin clearance rate is taken to be linear in insulin concentration with rate k, whereas production is linear in β and sigmoidal in G with a maximal rate σ:

\frac{dI}{dt} = \frac{\sigma \beta G^2}{\alpha + G^2} - k I

where α sets the glucose level of half-maximal secretion. The glucose-insulin model evolves on a much faster timescale than the beta cell dynamics, which are given by the balance of replication R = (r_{1r} G - r_{2r} G^2)\beta and death D = (d_0 - r_{1a} G + r_{2a} G^2)\beta, giving

\frac{d\beta}{dt} = (-d_0 + r_1 G - r_2 G^2)\beta

The baseline beta cell death rate is d_0, and r_1 = r_{1r} + r_{1a} and r_2 = r_{2r} + r_{2a} are the net rate constants for glucose-dependent beta cell growth or death. The system has two stable steady states, one corresponding to a normal, non-pathological state and another at a pathological state where glucose concentration is elevated and beta cell mass diminishes to zero. These are separated by a saddle point. Analysis of this system predicts a number of possible pathways into diabetes: movement of the non-pathological fixed point to a hyperglycemic state, a bifurcation eliminating the saddle point and non-pathological fixed point, and an interaction between fast and slow subsystems driving glucose levels up faster than beta cell mass adaptation can cope with.
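On the fast timescale the beta cell mass is effectively constant, so the glucose-insulin subsystem can be simulated with β held fixed. The parameter values below are illustrative, chosen only to be of plausible magnitude rather than taken from [18]; the run checks convergence to the fast-subsystem steady state, where G* = R_0/(E_{G0} + S_I I*):

```python
# Fast glucose-insulin subsystem with fixed beta-cell mass:
#   dG/dt = R0 - (EG0 + SI*I) G
#   dI/dt = sigma*beta*G^2/(alpha + G^2) - k*I
# All parameter values are illustrative (per-day units).
R0, EG0, SI = 800.0, 1.0, 0.5
sigma, alpha, k = 40.0, 20000.0, 400.0
beta_mass = 300.0                  # beta-cell mass frozen on the fast timescale

G, I = 200.0, 5.0
dt = 1e-4
for _ in range(200000):            # 20 "days" of fast dynamics
    dG = R0 - (EG0 + SI * I) * G
    dI = sigma * beta_mass * G ** 2 / (alpha + G ** 2) - k * I
    G += dG * dt
    I += dI * dt

# At steady state the glucose equation gives G* = R0 / (EG0 + SI * I*).
residual = abs(G - R0 / (EG0 + SI * I))
```

Letting β evolve slowly on top of this fast equilibrium is what produces the saddle structure and the pathways into diabetes described above.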
Wang, Khan and van den Berg - Multiscale modelling of glucostasis shows the importance of short-term differences
Building on both of the preceding ideas, Wang et al. develop a multiscale model with oscillatory glucose input [19]. They extend Sturis and Tolić's model [SI 2.13] by adding a time-varying glucose source ψ_in(t), leading to an inhomogeneous system of ODEs (given in full in [19]) in which ψ_II is insulin-independent uptake, ψ_ID is insulin-dependent uptake, ψ_GR is glucose release from hepatic glycogen stores, and ψ_GX is glucose excretion. The parameter φ is a rate constant for the exchange of insulin between the blood and interstitial spaces, which have volumes V_p and V_i respectively. The mean lifetime of insulin is τ_i in interstitial space and τ_p in the blood. The delay constant τ_d governs how long it takes for glucose to be released from hepatic glycogen. The differences between this model and the model of Topp et al. are the time-varying glucose influx, the addition of insulin dependence in pancreatic beta cell dynamics, and a glucose- and insulin-dependent pancreatic beta cell neogenesis term ψ_NG. Wang et al. separate the dynamics into slow and fast systems, and analyse the behaviour of the system. They find that β cell mass can be significantly altered by the way that glucose is delivered - the same amount of glucose can lead to much worse consequences if delivered in several short bursts than if it were continuously infused, showing that short-timescale dynamics can affect the long-term onset of pathology.

Sturis and Tolić - Oscillatory insulin delivery enhances its hypoglycemic effect
This is an extension of Sturis' earlier work [20] and models interstitial insulin I_i, plasma insulin I_p and glucose G, with a gamma-kernel delay of order n = 3 between plasma insulin and its effect on glucose production, given a constant rate of glucose infusion G_in [21]. This model is specified by the set of equations

\frac{dI_p}{dt} = f_1(G) - E\left(\frac{I_p}{V_p} - \frac{I_i}{V_i}\right) - \frac{I_p}{t_p}

\frac{dI_i}{dt} = E\left(\frac{I_p}{V_p} - \frac{I_i}{V_i}\right) - \frac{I_i}{t_i}

\frac{dG}{dt} = G_{in} + f_5(x_3) - f_2(G) - f_3(G) f_4(I_i)

\frac{dx_1}{dt} = \frac{3}{t_d}(I_p - x_1), \qquad \frac{dx_2}{dt} = \frac{3}{t_d}(x_1 - x_2), \qquad \frac{dx_3}{dt} = \frac{3}{t_d}(x_2 - x_3)

where E is the transfer rate between plasma and interstitial space, V_i and V_p are the volumes of the interstitial space and plasma respectively, and t_i, t_p and t_d are time constants. Insulin production in the pancreas is determined by glucose concentration according to the sigmoidal function

f_1(G) = \frac{R_m}{1 + \exp\left((C_1 - G/V_g)/a_1\right)}

where R_m is the maximal release rate, V_g is the glucose distribution volume and C_1 and a_1 are constants fitted from previous work [22,23]. The function f_2(G) models insulin-independent glucose utilisation by cells,

f_2(G) = U_b\left(1 - e^{-G/(C_2 V_g)}\right)

where U_b is the baseline uptake rate and C_2 is a constant fitted to prior experimental data [24]. Brain and nerve cells consume glucose in an insulin-independent manner, whereas the consumption of muscle and fat is determined by both glucose and insulin availability through the product f_3(G) f_4(I_i), where f_3 is linear in G and f_4 is a sigmoidal function of I_i rising from U_0 to U_m (see [21] for the full forms); U_0 and U_m are rate constants and C_4 is fitted to data. Finally, insulin affects hepatic glucose production according to

f_5(x_3) = \frac{R_g}{1 + \exp\left(\alpha(x_3/V_p - C_5)\right)}

where R_g is the maximum rate and α, C_5 are constants. Tolić et al. simplify their model by replacing the functions f_i with polynomial approximations and analyse its properties. They find that oscillatory insulin delivery has a greater hypoglycemic effect than continuous delivery.
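The role of the auxiliary variables x_1-x_3 can be seen by simulating the linear chain on its own: driven by an impulse, the output x_3 traces the order-3 gamma kernel (3/t_d)³ t² e^{-3t/t_d}/2, which peaks at t = 2t_d/3. The t_d value below is illustrative:

```python
import math

# Impulse response of the three-stage chain x1 -> x2 -> x3, each stage
# relaxing at rate 3/t_d; x3 follows the order-3 gamma (Erlang) kernel.
t_d = 36.0                      # delay time constant, illustrative
rate = 3.0 / t_d
dt = 0.01

x1, x2, x3 = rate, 0.0, 0.0     # x1(0+) = 3/t_d after a unit impulse input
peak_t, peak_val = 0.0, 0.0
t = 0.0
for _ in range(int(300.0 / dt)):
    x1 += dt * (-rate * x1)
    x2 += dt * rate * (x1 - x2)
    x3 += dt * rate * (x2 - x3)
    t += dt
    if x3 > peak_val:
        peak_val, peak_t = x3, t

# Analytic gamma(3) kernel: rate^3 t^2 exp(-rate t)/2, peaking at t = 2 t_d / 3.
analytic_t = 2.0 * t_d / 3.0
analytic_val = rate ** 3 * analytic_t ** 2 * math.exp(-rate * analytic_t) / 2.0
```

Replacing the chain by the explicit kernel recovers the hard-delay formulation of Li, Kuang and Mason discussed above.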
Leptin dynamics and leptin resistance
This model couples fat mass, leptin and leptin receptor dynamics; the fat mass dynamics use a partition function Ω given by Forbes' law. The leptin production rate is modelled as linearly dependent on fat mass with rate constant γ_L, with leptin degraded at a constant rate. The rate of change of the leptin receptor density R is assumed to vary quadratically in L, with γ_R the basal production rate, λ_1 the leptin-dependent growth rate, δ_R the basal death rate and λ_R2 the leptin-dependent degradation rate. Finally, food intake is given as a saturable function of the leptin concentration, in which the function Φ_R(L) modelling the activation of leptin receptors is given by a Hill function. A bifurcation analysis shows the existence of stable solutions; as γ_I and λ_R2 are varied, these move from a single healthy solution, through a bistable regime where obese or healthy steady states are both possible, to a single stable equilibrium in an obese state. Oscillating food inputs can lead to the onset of leptin resistance and subsequent obesity. Comparison of the model to experiments where leptin was continuously injected shows good agreement with the data.

Energy balance models
Energy partition models are typically deterministic ODE models which consist of a number of body compartments (such as fat and lean tissue) and a set of functions for partitioning energy intake between them. They aim to forecast body composition changes over time given data for calorie intake and expenditure, but generally make no attempt to predict behaviour.

Forbes, Hall's updated Forbes' Law - Lean and fat mass partitioning
From a regression study of women across a wide range of ages and levels of adiposity, Forbes found the following regression relationship between lean body mass L and fat mass F (both in kg):

L = 10.4 \ln F + 14.2

Under this model, changes of body composition are taken to lie along the regression line described above. This does not account for changes in body composition due to weight training, for example, but it appears to provide a good model for weight loss or gain from dietary causes in typical females. Assuming that the body is comprised of two compartments L and F such that total body weight W = L + F, an infinitesimal change in body weight leads to a fat-dependent infinitesimal change in lean body mass according to

\frac{dL}{dF} = \frac{10.4}{F} \quad \Rightarrow \quad \frac{dL}{dW} = \frac{10.4}{10.4 + F}

Hall generalised Forbes' law to macroscopic changes in body weight from an initial fat mass F_i to a final fat mass F_f [26] by using the fact that ΔW = ΔF + ΔL to write

\Delta W = F_f - F_i + \Delta L

By expressing ΔL as L_f - L_i and using Forbes' relationship, this can be rewritten as

\Delta W = F_f - F_i + 10.4 \ln(F_f / F_i)

After some algebra, the final fat mass (and hence the macroscopic change in lean mass ΔL = ΔW - ΔF) for a given change in body weight is given by an expression involving the Lambert W function \mathcal{W}:

F_f = 10.4\, \mathcal{W}\!\left[\frac{F_i}{10.4} \exp\!\left(\frac{F_i + \Delta W}{10.4}\right)\right]

Forbes' law thus predicts that lean mass will increase with increasing body weight, but that higher adiposity will diminish the rate of increase. Thus energy will be partitioned to lean or fat tissue differently depending on the current level of adiposity, contrary to the model of Dugdale and Payne [SI 3.4] in which a constant ratio (called the P-ratio) is assumed.
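Hall's closed-form expression is straightforward to evaluate. The sketch below implements the principal branch of the Lambert W function with a small Newton iteration (rather than relying on a library routine) and verifies that the resulting fat and lean changes sum to the prescribed ΔW; the starting fat mass and weight change are illustrative:

```python
import math

def lambert_w(z, tol=1e-12):
    # Principal branch of the Lambert W function for z > 0, via Newton's
    # method on w*exp(w) - z = 0.
    w = math.log(1.0 + z)                 # reasonable starting guess
    for _ in range(100):
        ew = math.exp(w)
        w_next = w - (w * ew - z) / (ew * (w + 1.0))
        if abs(w_next - w) < tol:
            return w_next
        w = w_next
    return w

F_i = 30.0       # initial fat mass, kg (illustrative)
dW = -10.0       # total body weight change, kg (a 10 kg loss)
F_f = 10.4 * lambert_w((F_i / 10.4) * math.exp((F_i + dW) / 10.4))

# Consistency check: dW must split into the fat change plus the
# Forbes-implied lean change dL = 10.4 * ln(F_f / F_i).
dF = F_f - F_i
dL = 10.4 * math.log(F_f / F_i)
```

As Forbes' law predicts, most of the loss here comes from the fat compartment, with a smaller lean-mass contribution.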

Hall et al. - Energy balance predicts body composition dynamics
Hall and coauthors have published a number of papers on the subject of energy homeostasis, including predicting the dynamics of body weight, composition and fuel selection in humans [27,28,29] and mice [30,31]. These models rely on the principle of energy conservation, and split the organism's body into two compartments - fat and lean. Energy balance can be specified in terms of the energy flow to and from each of these compartments:

\rho_F \frac{dF}{dt} + \rho_L \frac{dL}{dt} = I - E

where ρ_F and ρ_L denote the energy densities of fat and lean tissue respectively, and E and I are energy expenditure and intake. Applying Forbes' relationship dL/dF = 10.4/F and defining α = F/10.4 gives

\frac{dF}{dt} = \frac{\alpha (I - E)}{\alpha \rho_F + \rho_L}, \qquad \frac{dL}{dt} = \frac{I - E}{\alpha \rho_F + \rho_L}

Accounting for all sources of energy expenditure provides a model for E,

E = K + \beta I + \gamma_F F + \gamma_L L + \lambda W + \eta_F \frac{dF}{dt} + \eta_L \frac{dL}{dt}

where K is the cost of thermogenesis, β a factor accounting for the thermic effect of feeding, γ_F and γ_L the metabolic costs of fat and lean tissue, and λ the cost of physical activity, which is assumed to vary with body weight W. The parameters η_F and η_L are the biochemical efficiencies of synthesising fat and lean tissue respectively; K and λ are determined from model calibration, but β, γ_L, γ_F, η_F and η_L are fixed from previous experiments. Physical activity is taken to be partly determined by diet; in order to determine this relationship, mice were switched to a different diet at timepoint t_1 and switched back at t_2. This data was used to calibrate λ for a variety of diets. Fuel selection is another important quantity predicted by this model - meals with different macronutrient compositions but the same number of calories may have different effects.
Accounting for fat, protein and glucose intakes separately as I_F, I_P and I_G respectively, and assuming nutrients are either immediately oxidised (at rates O_F, O_P and O_G respectively), used in de novo lipogenesis (DNL) or used in gluconeogenesis (GNG), gives a set of ODEs for the macronutrient stores (given in full in [30]). It can be shown [30] that these lead to equations for the net oxidation of each macronutrient group, where net oxidation is defined as oxidation excluding de novo lipogenesis and gluconeogenesis. However, these quantities cannot be measured directly in experimental settings. A measurement which can be calculated from gas exchange data is the respiratory quotient RQ, which uses the fact that fat and carbohydrate oxidation consume oxygen and produce carbon dioxide in different proportions to determine the balance of energy production from these sources, and which can be written in terms of the net macronutrient oxidation rates. Comparing this with the food quotient FQ, which measures the oxygen consumption and carbon dioxide production from combusting the food outside the body, provides a measure of macronutrient imbalance, which can be used to predict body weight and composition changes. Once fitted to experimental data, this theory appears to have widespread applicability in predicting body weight and composition changes for several organisms and in a range of circumstances (see above).

Alpert - Body composition dynamics in hyper- and hypophagia
Alpert considers a two-compartment energy partition model with fat store f and fat-free mass l, of energy densities α and β respectively [32]. Food is ingested at a rate Ṗ, a fraction of which is digested, giving rise to an energy balance equation in which energy is lost to the resting metabolic rate RMR (which depends on lean mass l and growth g) and to total energy expenditure with activity coefficient δ. A number of further assumptions, including constant or slowly-varying partitioning of excess energy between fat and fat-free mass (denoted by x), lead to a pair of ODEs for body composition in hyperphagia, with Δ denoting changes from initial values (e.g. Ṗ = Ṗ_0 + ΔṖ). These have closed-form solutions, and fitting to experimental data gives x = 0.87 ± 0.06. The model makes a number of surprising predictions: during hyperphagia, fat increase is initially independent of activity levels, increasing amounts of energy are partitioned towards fat-free mass, and the fat store is asymptotically bounded. Although Alpert cites some experimental evidence for these claims, the evidence given is not conclusive and seems at odds with an intuitive understanding of body composition during hyperphagia.

Dugdale and Payne - Stochastic energy partition model
Dugdale and Payne define a computational energy partition model with stochastic energy intake and expenditure [33,34]. The model is discretised at a one-day resolution. Expenditure E and intake I are normally distributed with means μ_E and μ_I and standard deviations σ_E and σ_I respectively. The compartments are tissue fat T, structural fat S, 'fast' lean tissue F and 'slow' lean tissue L, and these are used to derive the energy required for tissue maintenance M according to their masses, where ρ is the energy consumption per kg of a given compartment and a prefactor is taken from a WHO Technical Report estimating minimum energy requirements. The daily calorie balance is then B = I - E - M. Each individual is characterised by the fraction P of excess energy deposited as lean tissue, with the remainder being deposited as fat tissue. Energy storage is assumed to be inefficient, with only proportions E_1 and E_2 of the energy allocated to lean and fat tissue creation respectively actually being deposited. For a day with an energy surplus, tissue deposits are updated by rules (given in full in [33]) involving Q = 1 + E_1 P + E_2 (1 - P) and a factor of 4680 kcal/kg converting between energy and body mass. For a negative balance the situation is similar, but energy retrieval is taken to be completely efficient and storage fat can also be metabolised. Finally, the transfers between fast and slow lean tissue are calculated according to the change in the ratio between them: for initial values L_0 and F_0, K_1 = F_0/L_0 and K_2 = F/L. Dugdale and Payne compare their model to a number of underfeeding and overfeeding experiments and find reasonable quantitative agreement, especially in the case of underfeeding obese subjects.
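The stochastic daily bookkeeping can be sketched in a few lines. The version below keeps only the core of the scheme - Gaussian intake and expenditure, a fixed P-ratio split, and the 4680 kcal/kg conversion - and omits the maintenance term and the storage-efficiency factors E_1 and E_2; all numbers are illustrative:

```python
import random

# Minimal stochastic energy-balance sketch in the spirit of Dugdale and Payne.
random.seed(42)
mu_I, sigma_I = 2500.0, 300.0     # kcal/day intake
mu_E, sigma_E = 2400.0, 200.0     # kcal/day expenditure
P = 0.3                           # fraction of surplus stored as lean tissue
kcal_per_kg = 4680.0              # energy-to-mass conversion used in the text

lean, fat = 55.0, 15.0            # starting compartment masses, kg
for day in range(365):
    balance = random.gauss(mu_I, sigma_I) - random.gauss(mu_E, sigma_E)
    lean += P * balance / kcal_per_kg          # P-ratio split of the surplus
    fat += (1.0 - P) * balance / kcal_per_kg   # (deficits reverse the flow)

weight = lean + fat
```

With a mean surplus of 100 kcal/day the expected annual gain is 36500/4680 ≈ 7.8 kg, with day-to-day noise producing spread between simulated individuals.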

Westerterp and Speakman - An alternative to set- and settling-point theories
In addition to set- and settling-point theories, Speakman has put forward a model known as the dual intervention point model [35,36]. In set-point and settling-point models, control of energy homeostasis in scenarios of energy excess and energy deprivation occurs via the same mechanism. In the dual intervention point model, energy homeostasis is weakly regulated between an upper and a lower limit, where settling-point behaviour occurs due to environmental factors, and strongly regulated above or below these limits, where set-point control acts. This has the advantage of explaining the asymmetry of body weight drift: the lower limit is set by physical and physiological concerns, and cannot go lower without endangering the organism. In prehistory the upper weight limit is considered to have been irrelevant, as in practice high body weights would be selected against by food shortages and predation. However, once predatory pressure and food shortages became less of a consideration, regulation of unhealthily high body weight was left to physiological mechanisms of body weight regulation, which are expected to vary considerably and drift between generations. A mathematical model of this theory has yet to be developed, although Speakman and Westerterp have formulated a model of weight loss under total starvation that captures some aspects of the theory [37]. It has some similarities to the combination model of Tam et al. [SI 4.2]; if the mechanism of energy sensing is endocrine, then formulating a model in terms of two endocrine regulatory systems, one acting on each control threshold, would give this model physiological relevance and test theories of endocrine control. It may also account for the disparity of timescales in endocrine regulation - going below the lower threshold is an immediate threat to life, whereas going above the upper threshold is suboptimal but not immediately dangerous.

Kozusko - Empirically-derived nonlinear energy partitioning
Kozusko [38] attempts to account for the observed decrease in energy expenditure per unit of body weight by allowing energy expenditure E to depend on body weight W through a coefficient α that accounts for the aforementioned variation. In Kozusko's model, α is linear in W between a setpoint (α_0, W_0) and starvation levels (α_s, W_s), which can be rewritten in the form α = α_0 β_1 + β_2. Kozusko uses an empirical function of the fat-free mass ratio R, defined as fat-free weight divided by total body weight, to predict an individual's value of β_2, with parameters f and m to be determined by fitting. The model is compared to Kleiber's scaling law for energy expenditure [39] and the Harris-Benedict equations [40] when applied to data from the Minnesota starvation experiment. The adjustment for reduced energy expenditure allows the model to more accurately capture metabolic adaptation leading to lower weight loss, whereas models without this behaviour tend to dramatically overestimate weight lost under caloric restriction. Kozusko's model still overestimates expected weight loss, but is a significant improvement. The free parameters f and m allow fitting to data in order to account for individual variation, which may lead to more accurate prediction for individuals and may thus be of clinical use. The purely empirical basis of this model makes it less informative about mechanisms underlying adaptation; although it describes and may predict individual behaviour, it does not speculate on its physiological basis, nor account for situations with ad libitum feeding.

Antonetti - Energy balance incorporating thermic effect of feeding
Antonetti [41] formulates a similar energy balance model in terms of the change of internal energy ∆U, energy from food E_f, energy excreted E_w, environmental heat loss Q_L, and energy expended on work W_k, with a fraction α of food energy allocated to specific dynamic action (the increase in heat generation following feeding). Balancing these terms over a period of days ∆Θ and collecting them yields an ODE for body weight, where γ is a constant of proportionality relating internal energy U to body weight W. Writing Q_L and W_k as scaling with body weight to some power, Antonetti obtains an ODE with proportionality constants K_A and K_B obtained from data, which is solved numerically. This yields predictions for body weight change under different activity and calorie intake conditions. Antonetti states that, at the time, clinical data were insufficient to provide a valid test of this equation, although he does attempt some comparisons.
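The structure of such an energy balance ODE can be sketched numerically. In this minimal illustration the weight evolves as dW/dΘ = [(1 − α)E_f − K·W^b]/γ; the 3/4 scaling exponent and all parameter values are our own assumptions for illustration, not Antonetti's fitted constants:

```python
def simulate_weight(w0, intake, days, alpha=0.1, k=350.0, b=0.75, gamma=29000.0):
    """Euler-integrate dW/dt = ((1 - alpha) * intake - k * W**b) / gamma.

    w0     : initial body weight (kg)
    intake : food energy per day (kJ/day)
    alpha  : fraction of food energy lost to specific dynamic action
    k, b   : expenditure scaling with weight (illustrative values)
    gamma  : energy-to-weight conversion factor (kJ/kg, illustrative)
    """
    w = w0
    for _ in range(days):
        w += ((1.0 - alpha) * intake - k * w ** b) / gamma
    return w
```

Because expenditure falls as weight falls, the predicted weight loss under a fixed deficit slows over time, which is the qualitative behaviour the model was built to capture.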

DEB - Generalised energy flow model
Dynamic Energy Budget (DEB) theory attempts to model a generalised organism and its allocation of energy. A review of the basics of DEB theory is presented by Sousa et al. [42], which we attempt to summarise here. DEB theory considers the flows of energy in a generalised organism, which can invest energy into structure V (which acts as a measure of the 'volume' of the organism), energy reserve E, and maturation E_H. Structure and reserve are referred to as 'generalised compounds', which represent the many biological compounds making up parts of the organism. Different types of organism are represented by different parametrisations of the model, particularly the parameters representing power allocations to different compartments.
Food intake flow is p_X, of which p_A is assimilated into the reserve and the remainder is excreted. The organism consumes reserve at rate p_C to fuel its activities. Energy flux allocated to growth at rate p_G increases structure V at a rate dependent on the specific cost of growth. Energy can also be allocated to somatic maintenance p_M and maturity maintenance p_J. Energy is allocated in pairs according to an allocation function κ(V, E) such that p_M + p_G = κ p_C and p_J + p_R = (1 − κ) p_C, where p_R is the allocation to maturation or reproduction. Maintenance costs are paid first in this model, with the remainder being allocated to maturity, reproduction, or growth. Feeding is dependent on surface area, with maximum surface-area-specific rate J_Xm, and on food density X through a response function f(X) that scales feeding rate by food availability. Organisms are assumed to have two 'life-history events', which are signalled by the maturity variable E_H passing set values: birth at E_H = E_H^b and puberty at E_H = E_H^p. After puberty, maturation stops and no more energy is allocated to maturation; maturation energy allocation is accounted for by a corresponding balance equation. Along with a number of other assumptions, this model can recover a general expression for growth. If abundant food and no change in life stage are assumed, von Bertalanffy's law for the 'size' of the organism L [43] can also be derived: L(t) = L_∞ − (L_∞ − L_0) e^{−r_B t}, where r_B is a function of maintenance rate, food availability, and the fraction of energy invested into growth. This is a well-known growth law obeyed by many species. DEB theory has been developed beyond these foundations to account for a wide range of situations and organisms, as documented on the DEB theory website.
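The derived von Bertalanffy growth law is easy to evaluate directly; a minimal sketch, with illustrative parameter values, is:

```python
import math

def von_bertalanffy(t, l_inf=50.0, l_0=5.0, r_b=0.02):
    """von Bertalanffy growth: L(t) = L_inf - (L_inf - L_0) * exp(-r_b * t).

    l_inf : asymptotic size; l_0 : size at t = 0; r_b : growth rate constant
    (all values illustrative, in arbitrary units).
    """
    return l_inf - (l_inf - l_0) * math.exp(-r_b * t)
```

Size rises monotonically from l_0 and saturates at l_inf, with r_b setting the timescale of approach.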

Chow and Hall - Dynamical systems analysis of the Energy Balance model
Chow and Hall investigate the fixed points of a class of energy partition models based around the Energy Balance model [44], [SI 3.2]. They begin with a three-compartment flux balance model for fat F, glycogen G, and protein P, where ρ_F, ρ_G and ρ_P are the energy densities of each compartment, I_F, I_C, I_P the intake rates, and f_F, f_C the fractions of expenditure derived from the fat and carbohydrate compartments. By assuming that glycogen is constant on long timescales and accounting for the additional mass due to the water content of each compartment, this can be rewritten as a two-variable model for fat mass F and lean tissue mass L. The function f = f(I_F, I_L, F, L) governs how energy is drawn from each compartment to meet expenditure requirements. For instance, imposing Forbes' law [25] yields f = α/(1 + α), where α = ρ_F F/(10.4 ρ_L) as before. If intakes I_F and I_L are held constant, the function f determines the position and nature of the fixed points of the model, which occur at the intersection of the F and L nullclines. A stability analysis of the energy partition model given by Forbes' law reveals the existence of a marginally stable attracting invariant manifold rather than one or more isolated fixed points. This indicates that any temporary change in intake or expenditure will cause body composition to move to a new point on the manifold, rather than return to its value before the perturbation. Chow and Hall point out that experiments involving weight change through aerobic exercise or changes in energy intake will be unable to distinguish between different classes of model, and that experiments that directly alter body composition or fat utilisation fraction are necessary to investigate this, but have not yet been carried out.
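The long-run behaviour of such a two-compartment partition model can be sketched numerically. This illustration assumes a linear expenditure model and a Forbes-law partition fraction p = C/(C + F) for lean mass, with C = 10.4 ρ_L/ρ_F; the expenditure coefficients are illustrative, not the paper's values:

```python
# Illustrative constants (not fitted values from the paper)
RHO_F = 39.5e3   # energy density of fat (kJ/kg)
RHO_L = 7.6e3    # energy density of lean tissue (kJ/kg)
C = 10.4 * RHO_L / RHO_F   # Forbes constant (kg)

def expenditure(F, L):
    """Assumed linear expenditure model (illustrative coefficients, kJ/day)."""
    return 2000.0 + 13.0 * F + 92.0 * L

def simulate(F0, L0, intake, days, dt=1.0):
    """Euler-integrate fat mass F and lean mass L under constant intake.

    The energy imbalance is split between compartments by the Forbes
    p-ratio: a fraction p = C/(C + F) goes to lean tissue, the rest to fat.
    """
    F, L = F0, L0
    for _ in range(int(days / dt)):
        imbalance = intake - expenditure(F, L)   # kJ/day
        p = C / (C + F)                          # share of imbalance to lean
        L += dt * p * imbalance / RHO_L
        F += dt * (1 - p) * imbalance / RHO_F
    return F, L
```

Run for long enough, the trajectory settles where intake matches expenditure; the final composition depends on the path taken, consistent with an invariant manifold of steady states rather than a unique fixed point.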

Control theoretic models
Models using control theory attempt to find 'optimal' behaviour policies for an organism in response to a given environment. These typically revolve around choosing a policy to optimise a cost (or reward) function that tracks how well the policy copes with the environment. Policies can be deterministic or stochastic, and can act in discrete or continuous time; these choices determine how the optimal control can be found.

Davis and Levine - Proportional-integral control of a single feeding bout
Davis and Levine construct a control theoretic negative feedback model for the control of fluid meal size by gut filling, based on experimental findings which they review in the introduction to their paper [45]. Intake I(t) is modelled as a proportional-integral controller based on a drinking rate d and an ingestion signal ε(t) such that I(t) = d ε(t). The ingestion signal is taken to have the form ε(t) = p g(t) − k r ∫_0^t I(s) ds, where p is a parameter representing the significance of the food for the animal (for instance providing salty food to a sodium-depleted animal) and g(t) is the concentration of the flavour signalling the relevance of the food. The product gp is referred to as palatability. The parameter r is called the retention coefficient and represents the time taken for fluid to be absorbed from the gut; a small value of r indicates rapid absorption. Finally, k is a proportionality constant that determines how much gut fullness contributes to the decrease in eating rate. Assuming that g(t) is a step function, the control system can be solved using Laplace transforms to find the ingestion rate I(t) and the cumulative intake C(t), which is used to suggest that palatability plays a crucial role in the total amount of feeding that occurs in a meal. The model fits previous experimental data well over ingestion time ranges of 30 minutes, and can reproduce behaviour from both sham feeding and altered food palatability, but does not predict an ending time for the feeding bout.
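Under the step-palatability assumption, the feedback loop yields an exponentially decaying intake rate and a saturating cumulative intake. A small sketch of these solution curves, assuming the ingestion signal takes the form ε(t) = p g − k r ∫ I ds (which gives decay constant d·k·r); all parameter values are illustrative:

```python
import math

def intake_rate(t, d=1.0, p=2.0, g=1.5, k=0.4, r=0.3):
    """Ingestion rate after a step in palatability: I(t) = d*p*g*exp(-d*k*r*t)."""
    return d * p * g * math.exp(-d * k * r * t)

def cumulative_intake(t, d=1.0, p=2.0, g=1.5, k=0.4, r=0.3):
    """Cumulative intake, saturating at p*g/(k*r) as t grows large."""
    return (p * g) / (k * r) * (1.0 - math.exp(-d * k * r * t))
```

Note that the asymptotic meal size p·g/(k·r) scales with palatability p·g, which is the model's central claim about the role of palatability in total intake.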

Tam et al. - Leptin-mediated control of body weight
Tam et al. [46] consider leptin-mediated regulation of energy homeostasis on a timescale of weeks in order to distinguish between set-point and settling-point control of bodyweight, and to investigate the effects of leptin resistance. Leptin production into the bloodstream (which has volume V) is assumed to be proportional to fat mass F with a synthesis rate R_syn, and leptin is removed from the blood by the kidneys in a concentration-dependent manner with rate R_clear. Plasma leptin L_p enters the brain through both saturable and non-saturable pathways, leading to changes in brain leptin L_b; the same leptin equations (with some condensing of notation) are used for both the set-point and settling-point models. The rate constants R_syn, R_clear, k_1, k_2, and k_3 are all derived from literature values rather than fitted. Body mass M is given by fat mass F plus lean mass M_L, with changes in body mass occurring solely through changes in fat mass. Fat mass changes according to the imbalance between intake I and expenditure E, with ρ_F, the energy density of fat, converting energy imbalance into mass change.
In the settling-point model, energy expenditure E and intake I are given by saturable functions of brain leptin concentration. Rate constants k_4, k_6 and k_7 are derived from literature values, with k_5 and k_8 fitted from data.
The set-point model describes energy intake and expenditure through a proportional-integral controller which scales E and I to return L_b to a set point L*. Constants c_1 and c_2 are derived from intake and expenditure at steady state in the settling-point model. The controller gains a_1 to a_4 could not be fitted directly from data, so were chosen arbitrarily to prevent oscillations; however, they do not affect the steady-state values of the set-point model. Leptin resistance was modelled by changing the value of the rate constant k_2, which modifies blood-brain transport of leptin. Due to a lack of experimental data, the authors used a phenomenological equation involving the Heaviside step function Θ, a scaling factor k_9, and a threshold k_10, the leptin concentration at which resistance begins. Tam et al. propose a combination model which incorporates features of both the set-point and settling-point models, as neither fully accounts for observed experimental data; the set-point model is too effective in preventing obesity, whereas the settling-point model does not provide a strong enough defence of body weight.
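One plausible reading of the brain-leptin influx terms (a saturable plus a non-saturable transport pathway, with a Heaviside-type onset of resistance acting on k_2) can be sketched as follows; the functional form of the resistance term and all parameter values are our assumptions, not the paper's:

```python
def theta(x):
    """Heaviside step function."""
    return 1.0 if x > 0 else 0.0

def brain_influx(L_p, k1=1.0, k2=5.0, k3=0.01, k9=2.0, k10=10.0):
    """Rate of leptin entry into the brain from plasma concentration L_p.

    Saturable transport k1*L_p/(k2_eff + L_p) plus a non-saturable leak
    k3*L_p. Leptin resistance is sketched as an increase in k2 (reduced
    saturable transport) once L_p exceeds the threshold k10; this is a
    guess at the paper's phenomenological form, with illustrative values.
    """
    k2_eff = k2 * (1.0 + k9 * theta(L_p - k10))
    return k1 * L_p / (k2_eff + L_p) + k3 * L_p
```

Above the threshold the saturable pathway is blunted, so influx grows more slowly with plasma leptin, mimicking resistance at high leptin levels.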

Jacquier et al. - Endocrine control of body composition using the Energy Balance model
Jacquier et al. [47] construct a model of coupled ODEs linking energy balance ∆E to changes in food intake and energy expenditure, with food intake being controlled by variables representing leptin, glucose, and ghrelin. Energy expenditure is determined by fat mass S and fat-free mass W, as well as an activity parameter ξ and rate parameter R. Food intake is given as the minimum of hunger h and available food a, giving an energy balance equation in which ρ_S and ρ_W are the energy densities of fat and fat-free tissue respectively. The evolution equations for S and W are as in the Hall model, with x = dW/dS = ζ + ψ exp(κS) governing the partitioning of energy between fat and lean mass; in this case the parameters are estimated during model fitting. Plasma leptin l is produced at rate γ_2 and cleared at rate γ_1. Plasma glucose u is governed by a similar equation involving food intake, with production rate µ_1 and clearance rate µ_2. Ghrelin e is inhibited by food intake, and has a corresponding rate equation. These three variables are combined to determine the hunger variable h, given in terms of the number of joules required. Allowing hunger to depend on endocrine levels, which in turn depend on metabolic state, makes this model more physiologically relevant, and casts food intake as a feedback control system. Finally, the rate parameter R is allowed to vary to simulate adaptation to different food intakes by comparing food consumed in a recent period to that eaten over a long time period [48,49].

Booth et al. - Simulation model of feeding behaviour
The Booth model simulates feeding down to a fine time resolution while also modelling long-timescale dynamics by allowing fat reserves to modulate feeding behaviour. It incorporates circadian effects and a simple model for energy expenditure due to time-varying metabolic rate, as well as stochastic feeding behaviour. These features make it probably the most fully-realised model for energy homeostasis developed to date; however, as it predates many modern endocrinological discoveries, it is not formulated in terms of specific endocrine mechanisms. The model is formulated as a block diagram, which we reproduce below (Figure 1), before outlining a number of the most important dynamical equations in the model. The central regulator of feeding in the Booth model is the energy flow E, which comprises absorption of energy from the gut A, flow to/from lipid stores L, metabolic rate M, and a gut satiety signal. Energy flow triggers feeding (or its cessation) through a two-threshold model: if F = 0 and energy flow falls below a hunger threshold E_H, the feeding rate F becomes some positive constant. Feeding then continues until a satiety threshold E_S is reached, at which point F = 0. These thresholds can be constant; however, Booth et al. also consider thresholds undergoing a mean-reverting random walk. Food enters the stomach, whose fullness G depends on the feeding rate F as well as absorption A(t). The absorption rate into the bloodstream depends on a time-varying rate R(t) and the energy content of the stomach G, so that A(t) = R(t)G(t).
A number of other factors such as the thermic effect of feeding and sensory qualities of food are included in the model (see Figure 1). Simulating the model generates predictions for feeding bouts, metabolic rates, and body mass, and can reproduce experimental results to a reasonable degree of accuracy [50].

McFarland and Sibly - A general framework for optimal behaviour
McFarland and Sibly [51,52] attempt to formulate behaviour in terms of an optimality principle. Denoting all environmental and physiological factors relevant to the behaviour under consideration as a vector x (referred to as causal factor space), and behaviour sequences as movement within causal factor space, the fitness of a behaviour sequence u(t) over a time interval [0, T] is defined as the cost C(x, u) accumulated over the interval. The Pontryagin minimum principle gives the optimal behaviour sequence by extremising a Hamiltonian-like equation, where N is the dimension of causal factor space. This can be solved analytically for quadratic cost functions, which yields an exponentially decaying intake rate and an exponentially satiating cumulative intake: the same prediction as Davis and Levine's control theory based model. This indicates that their mechanistic approach, which agreed well with behavioural observations, is also optimal according to this cost function.
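The quadratic-cost case can be worked through in a few lines. As a sketch, assume a scalar causal factor x (for example a nutritional deficit) depleted by intake u, with dynamics and cost chosen for illustration:

```latex
\begin{align*}
\dot{x} &= -u, \qquad C(x, u) = x^2 + k u^2,\\
H &= x^2 + k u^2 - \lambda u,\\
\frac{\partial H}{\partial u} &= 2 k u - \lambda = 0
  \;\Rightarrow\; u = \frac{\lambda}{2k},\\
\dot{\lambda} &= -\frac{\partial H}{\partial x} = -2x
  \;\Rightarrow\; \ddot{x} = -\dot{u} = -\frac{\dot{\lambda}}{2k} = \frac{x}{k}.
\end{align*}
```

Taking the decaying solution x(t) = x_0 e^{−t/√k} gives an intake rate u(t) = −ẋ = (x_0/√k) e^{−t/√k} and cumulative intake x_0(1 − e^{−t/√k}): an exponentially decaying rate and a saturating cumulative intake, matching the Davis and Levine prediction.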

McNamara and Houston - Dynamic programming for optimal behaviour
McNamara and Houston [53] consider a behavioural sequence over a finite interval [0, T], discretised into time periods t = 0, 1, 2, . . . , T. The interval is typically assumed to be a short period of time with no opportunity for reproduction within it. The fitness of a behaviour sequence is indicated by how beneficial it is to the organism's chance of reproduction at the end of the time interval, denoted R(x) for final state x. Actions can have stochastic payoffs, so the state at time t is a random variable X_t, with x representing a definite state in a discrete state space. Policies (series of actions a) are denoted by π. Because both time and state are discretised, there is a finite set of combinations of x and t, and the problem can be solved for the optimal policy using dynamic programming. The central object of this approach is the value function V(x, t), which gives the expected future reproductive success from (x, t) if the organism follows the optimal policy. This provides a way to choose the optimal action a* for a given (x, t), and also a way of calculating the cost of performing a suboptimal action a′: c(x, a′, t) = H(x, a*, t) − H(x, a′, t), where H(x, a, t) denotes the expected future reproductive success of taking action a at (x, t) and behaving optimally thereafter.
This framework is applied to the example of a small bird that can perform one of several foraging actions at several time points during the day, or rest and expend no energy. The bird expends energy overnight and cannot forage then, so it must end the day with enough energy to survive. The dynamic programming approach allows the calculation of optimal policies for this problem. Using the equation for the cost of a suboptimal action, McNamara and Houston offer an explanation for the 'dawn chorus': it is typically less costly to perform a non-foraging task in the first time period of the day given a relatively small level of reserves, as foraging for the rest of the day is likely to be sufficient to allow survival.
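Backward induction for a toy version of the bird example can be sketched as follows. The state space, action set, and all numerical values are invented for illustration; the terminal reward is survival of the night:

```python
# States: energy reserves 0..MAX_E; time periods 0..T-1; overnight cost at day end.
MAX_E, T = 10, 6
NIGHT_COST = 3

# Actions: (name, energy cost, list of (probability, energy gained))
ACTIONS = [
    ("rest",   0, [(1.0, 0)]),
    ("forage", 1, [(0.5, 0), (0.5, 3)]),   # risky: may find nothing
]

def terminal_reward(e):
    """The bird survives the night iff reserves cover the overnight cost."""
    return 1.0 if e >= NIGHT_COST else 0.0

def solve():
    """Backward induction: V[t][e] is survival probability under optimal play."""
    V = [[0.0] * (MAX_E + 1) for _ in range(T + 1)]
    policy = [[None] * (MAX_E + 1) for _ in range(T)]
    for e in range(MAX_E + 1):
        V[T][e] = terminal_reward(e)
    for t in range(T - 1, -1, -1):
        for e in range(MAX_E + 1):
            best, best_a = -1.0, None
            for name, cost, outcomes in ACTIONS:
                if e < cost:
                    continue  # cannot afford this action
                h = sum(prob * V[t + 1][min(MAX_E, e - cost + gain)]
                        for prob, gain in outcomes)
                if h > best:
                    best, best_a = h, name
            V[t][e] = best
            policy[t][e] = best_a
    return V, policy
```

With full reserves, resting all day already guarantees survival, so resting early in the day is no costlier than foraging, echoing the dawn-chorus argument.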

Niyogi et al. - Optimal stochastic policies in working for a reward
Niyogi et al. formulate a model of mouse behaviour at the level of individual activity bouts, motivated by a brain stimulation reward [54]. Although not directly modelling feeding behaviour, this model has the potential to be adapted to this end, as both feeding and working for a stimulation reward are settings in which some activity (feeding, lever pressing) is carried out for some reward (food, brain stimulation). The main difference is that brain stimulation does not appear to 'satiate', whereas the utility of food decreases significantly as more is eaten. Their model is formulated in economic terms; brain stimulation reward and leisure are both modelled as possessing some level of utility. A bout of brain stimulation reward has utility RI, and leisure time has a reward which varies as a function of leisure time duration, C_L(τ) for leisure time τ. Niyogi et al. investigate reward functions that are linear, supralinear, and weighted combinations of linear and supralinear in τ. In order to obtain the reward, subjects are required to work for a duration called the 'price' P. Work in the pre-reward state consists of bouts of work interspersed with bouts of leisure. Once the accumulated work time w reaches P, the brain stimulation reward is administered. Following the reward there is a further leisure bout of length τ_Pav + τ_L (see [54] for details on the origin of the term τ_Pav) before the model transitions to the pre-reward state again. The state space S is therefore a combination of a discrete component in {pre, post} and a continuous component w ∈ [0, P), and as such has some similarities to a class of stochastic models known as piecewise deterministic Markov processes [55,56].
The subject chooses between actions depending on a stochastic policy π, which determines the total average reward rate ρ_π, where π([L, τ_L]|post) and π([L, τ_L]|pre, w) are the probabilities of engaging in leisure for a time τ_L in the post- and pre-reward states respectively, and n_{L|pre,w} is the (random) number of leisure periods in the pre-reward interval. The policy for a given action and its duration is state-dependent, and is assumed to be a softmax policy, so greater expected returns are preferred but suboptimal choices are sometimes made. For a subject in state s, the softmax policy takes the form π([a, τ_a]|s) ∝ µ_a(τ_a) exp(β Q_π(s, [a, τ_a])), for reward values Q_π(s, [a, τ_a]) of engaging in action a for duration τ_a given current state s. The free parameter β specifies the degree of stochasticity, and the function µ_a(τ_a) is a prior over durations. The value functions for leisure in the pre- and post-reward states each comprise three terms, representing the utility of leisure, the opportunity cost of leisure (given by the average reward value), and the long-run value V_π of transitioning to the following state. The Q-function for working in the pre-reward state is more complex, as there are two possible outcomes: either accumulated work is sufficient to earn the reward, or it is not, and these are represented by separate terms. Niyogi et al. consider low, medium, and high-payoff situations, and find that in the high-payoff setting subjects work as much as possible, whereas in low-payoff situations almost no work is done. In the medium-payoff case the model predicts multiple work bouts with short leisure bouts, followed by a leisure bout of duration greater than the experiment's length.
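The softmax choice rule itself is compact enough to sketch directly. Here actions and durations are discretised, and the Q-values, priors, and β are invented for illustration:

```python
import math

def softmax_policy(q_values, priors, beta):
    """Softmax choice over (action, duration) pairs:
    pi(a, tau | s) proportional to mu_a(tau) * exp(beta * Q(s, [a, tau])).

    q_values, priors : dicts keyed by (action, duration) tuples
    beta             : degree of stochasticity (0 = ignore Q entirely)
    Returns a dict of normalised choice probabilities.
    """
    weights = {key: priors[key] * math.exp(beta * q)
               for key, q in q_values.items()}
    z = sum(weights.values())
    return {key: w / z for key, w in weights.items()}
```

As β grows the policy concentrates on the highest-valued choice; at β = 0 it reduces to the duration prior, reproducing the exploration/exploitation trade-off described above.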

Yamaguchi et al. - Learning rewards from behaviour
Yamaguchi et al. formulate a reinforcement learning (RL) model of thermotactic behaviour in C. elegans [57] based on maximum entropy inverse reinforcement learning [58]. In RL, individuals act to maximise reward, which they accomplish by trading off exploration (in order to find rewarding states) and exploitation (using the high-reward states already discovered). In inverse RL, we assume that these rewards have already been completely discovered by the agent, which is acting to maximise them. We do not know the agent's reward function, however, and seek to determine it from data. An important tool for this is the value function V(s), which maps states s to a number indicating the long-term reward expected from that state. For an agent with some decision-making policy π that maps states to actions (which we will return to later), the value of a state s under π is given by V_π(s) = E[R|s, π], where R is the sum of future rewards r_t, discounted at some rate γ ∈ [0, 1]: R = Σ_t γ^t r_t. The aim of RL is to find the optimal policy π* that maximises the return. We will not go into methods for doing so; an excellent introductory textbook is available [59]. In order to infer the rewards r, we first infer V, but to do so we need a model of the agent's decision policy. Yamaguchi et al. model thermotaxis as having a preference for both temperature T and its derivative dT, meaning that climbing or descending the temperature gradient as appropriate is explicitly rewarded. Worms are modelled as having both 'passive' stochastic dynamics p(s′|s), giving the uncontrolled motion of the worms, and decision-making behaviour given by π. The passive dynamics capture both inertial and 'diffusive' motion, given by P((T′, dT′)|(T, dT)) = N(T′|T + dT δt, σ_T) N(dT′|dT, σ_dT), where N(x|µ, σ) is the likelihood of x under a normal distribution of mean µ and variance σ.
Under the assumption that rewards are penalised by their divergence from the passive dynamics, the policy π* is given by π*(s′|s) = P(s′|s) exp(−v(s′)) / Σ_{s′} P(s′|s) exp(−v(s′)), which gives the likelihood of the observed state transitions s_t → s_{t+1}: L = Π_t π*(s_{t+1}|s_t).
This allows inference of the value function (and thus the rewards) from observations of behaviour. They find that inverse RL can correctly recover the rewards used to generate synthetic data, and also learns meaningful rewards and values from observed data.
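The policy construction (though not the full inference procedure) can be sketched on a discretised state space. The two-state example and the value numbers below are invented for illustration; the paper works with continuous (T, dT) states:

```python
import math

def linearly_solvable_policy(P, v):
    """Construct pi*(s'|s) = P(s'|s) exp(-v(s')) / sum_s'' P(s''|s) exp(-v(s'')).

    P : row-stochastic passive transition matrix (list of lists)
    v : list of state values, acting here as a cost-to-go, so low-v
        states are preferred by the resulting policy.
    """
    policy = []
    for row in P:
        weights = [p * math.exp(-v[j]) for j, p in enumerate(row)]
        z = sum(weights)
        policy.append([w / z for w in weights])
    return policy

def log_likelihood(policy, trajectory):
    """Log-likelihood of an observed state sequence under the policy."""
    return sum(math.log(policy[s][s_next])
               for s, s_next in zip(trajectory, trajectory[1:]))
```

Given observed trajectories, maximising this likelihood over v is what lets the value function, and hence the reward, be read off from behaviour.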
Fulcher, Phillips, and Robinson - Neural mass theoretic treatment of the sleep/wake cycle quantitatively predicts behaviour
Neural mass theory is a simple ODE model of neuronal dynamics in response to external stimuli. Each distinct neuronal population (for example monoaminergic neurons) is treated as a single variable representing the average firing rate of that population. When two neuronal populations are connected, the mean cell-body potential of the downstream population changes as a sigmoid function of the firing rate of the upstream population. External stimuli are represented as drives altering the mean cell-body potential of each population directly. Fulcher, Phillips, and Robinson [60] build on a previously developed homeostatic model of sleep regulation arising from the interaction between the mutually inhibitory wake-promoting monoaminergic (MA) and sleep-promoting ventrolateral preoptic nucleus (VLPO) neuronal populations [61]. In neural mass theory the firing of each population of cells is averaged into a single mean firing rate, denoted Q_M for MA cells and Q_V for VLPO cells. The firing rate Q of a population is a sigmoidal function of its mean cell-body potential V, Q(V) = Q_max/(1 + exp(−(V − θ)/σ)), where Q_max is the maximum firing rate, θ is the mean firing threshold relative to resting, and σπ/√3 is the standard deviation of the firing threshold distribution of the population. All of these quantities are in principle measurable from electrophysiological experiments, and may be measurable by two-photon imaging techniques. The two populations inhibit one another and respond to exogenous drives D_V and D_M according to standard neural mass dynamics (as reviewed in [62]).
The response timescale of a given neuronal population j is τ_j, and ν_jk is the strength of the effect of population k on population j, which can be negative (inhibitory) or positive (excitatory). Fulcher, Phillips, and Robinson explore the effects of homeostatic and circadian sleep drives H and C on VLPO, a constant drive A on MA, and impulsive drives ∆D_V, ∆D_M representing stimuli such as loud noises. The drive to VLPO combines H and C, while the drive to MA is set by A and any impulsive component. The dynamics of H are governed by clearance of the somnogen at rate χ and production at rate µ while MA neurons are firing (indicating an awake state); C is circadian, so ω = (2π/24) h⁻¹.
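The mutual-inhibition dynamics can be sketched numerically. The coupling strengths, drives, and firing parameters below are illustrative rather than the paper's fitted values; the sketch shows the flip-flop behaviour in which whichever population receives the stronger drive suppresses the other:

```python
import math

Q_MAX, THETA, SIGMA = 100.0, 10.0, 3.0   # illustrative firing parameters

def firing_rate(v):
    """Sigmoidal population firing rate Q(V) = Q_max / (1 + exp(-(V - theta)/sigma))."""
    return Q_MAX / (1.0 + math.exp(-(v - THETA) / SIGMA))

def simulate(d_v, d_m, t_end=3600.0, dt=0.1, tau=10.0, nu_vm=-0.2, nu_mv=-0.2):
    """Euler-integrate the mutually inhibitory VLPO/MA pair under constant
    drives d_v, d_m (coupling strengths and time constants illustrative).
    Returns the final firing rates (Q_V, Q_M)."""
    v_v = v_m = 0.0
    for _ in range(int(t_end / dt)):
        q_v, q_m = firing_rate(v_v), firing_rate(v_m)
        v_v += dt / tau * (-v_v + nu_vm * q_m + d_v)   # VLPO inhibited by MA
        v_m += dt / tau * (-v_m + nu_mv * q_v + d_m)   # MA inhibited by VLPO
    return firing_rate(v_v), firing_rate(v_m)
```

With the sleep drive dominant the VLPO population fires and silences MA (a sleep-like state); swapping the drives gives the wake-like state, which is the basic flip-flop the homeostatic and circadian drives then switch between.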
Comparison to experimental data reveals that the model predicts the level of auditory stimulus required to wake sleeping subjects given the time since sleep onset. The similarity of this system to the AGRP/POMC feeding control circuit is striking: a mutually inhibitory pair of neuronal populations with homeostatic and circadian drives, whose dynamics in turn cause behaviour that regulates the homeostatic state. It is possible that an application of neural mass theory to the AGRP/POMC system may also yield quantitative predictions of behaviour.