Parameter Estimation in Epidemiology: from Simple to Complex Dynamics

We revisit the parameter estimation framework for population biological dynamical systems, and apply it to calibrate various models in epidemiology with empirical time series, namely influenza and dengue fever. When it comes to more complex models like multi‐strain dynamics to describe the virus‐host interaction in dengue fever, even most recently developed parameter estimation techniques, like maximum likelihood iterated filtering, come to their computational limits. However, the first results of parameter estimation with data on dengue fever from Thailand indicate a subtle interplay between stochasticity and deterministic skeleton. The deterministic system on its own already displays complex dynamics up to deterministic chaos and coexistence of multiple attractors.


INTRODUCTION
A major contribution to stochasticicty in empirical epidemiological data is population noise which is modelled by time continuous Markov processes or master equations [1,2,3]. In some cases the master equation can be analytically solved and from the solution a likelihood function be given [4]. The likelihood function gives best estimates via maximisation or can be used in the Bayesian framework to calculate the posterior distribution of parameters [3]. Here we start with an example of a linear infection model which can be solved analytically in all aspects and then generalize to more complex epidemiological models which are relevant for the description of e.g. influenza or dengue fever [5], on the cost of having to perform more and more steps by simulation to obtain the likelihood function by complete enumeration [6] or even just to search for the maximum in extreme cases.
Recent applications to a multi-strain model applied to empirical data sets of dengue fever in Thailand, where the model displays such complex dynamics as deterministic chaos in wide parameter regions [5,7], stretch the presently available methods of parameter estimation well to its limits. Finally, the analysis of scaling of solely population noise indicates that very large world regions have to be considered in data analysis in order to be able to compare the fluctuations of the stochastic system with the much easier to analyze deterministic skeleton, which however can already show deterministic chaos [8,9], here via torus bifurcations [10].

AN ANALYTICALLY SOLVABLE CASE
The simplest epidemiological model, the SI system, where infection is only aquired from the outside, leads to a master equation which is not only linear in probability but also in the state variables, leading to a linear mean field approximation for the dynamics of the expectation values [3]. Though very simple in its set-up, it can be applied to real world data of influenza in certain stages of the underlying SIR model, when considering the cumulative number of infected cases during the outbreak [4], which can besolved using the characteristic function or approximated by Kramers-Moyal expansion to preserve eventual attarctor switching in more complicated models, relevant for e.g. influenza [11].

Solving the master equation
For suitable initial conditions the master equation can be solved to obtain the transition probabilites needed to construct the likelihood function. The dynamics of the characteristic function, obtained from the master equation is giving the partial differential equation using β * := (β /N)I * , and initial condition p(I,t 0 ) = δ I,I 0 , hence g(κ,t 0 ) = e iκI 0 , with solution

Likelihood function from master equation
>From the transitions probabilities we can construct the likelihood function, i.e. the joint probability to find all data point from our empirical time series interpreted as a function of the model parameters p(I n ,t n , I n−1 ,t n−1 , ..., as shown in Fig. 1 b) for the likelihood function and its maximum as best estimatorβ for the parameter β . Here the best estimator can be calculated analytically as well [3].

Bayesian framework
The Bayesian framework starts by using the likelihood function L(β ) = p(I|β ) with I = (I 0 , I 1 , ...I n ) and constructs from it the probability of the parameters conditioned on the present data, here from the time series of the epidemic system, the Bayesian posterior p(β |I). This can be only achieved by imposing a prior p(β ), a probability of plausible parameter sets, hence we have For the linear infcetion model all steps can still be performed analytically. The posterior is given explicitly by prior parameters a and b. Fig.  2 a) shows the comparison of a histogram of best estimates from many realizations of the stochastic process (red), an information which is not given sufficiently in most empirical systems, often we only have a single realization, and the results from the parameter estimation, a Gaussian approximation from the best estimate and the Fisher information (green), and finally the Bayesian posterior (black) obtained from a conjugate prior (blue), which is here nicely broad, not imposing much restriction to the considered parameter values.

NUMERICAL LIKELIHOOD VIA STOCHASTIC SIMULATIONS
In cases in which not all steps or even no step can be performed analytically, a comparison of stochastic simulations, depending on the model parameters, with the empirical data is the only available information on the parameters. In Fig. 2 b) an SIR model is fitted to empirical data from influenza, here due to the relative simplicity of the stochastic model, still a numerical enumeration of the whole relevant parameter space is computationally possible (equivalent to a flat but cut-off prior. In more complicated models, e.g. dengue fever models and data from e.g. Thailand, even a complete enumeration of the parameter space is not computationally possible, though only six parameters and nine initial conditions have to be estimated. Particle filtering, i.e. an often quite restricted distribution of parameters and initial conditions, a hard prior in Bayesian language, is stochastically integrated and compared to the empirical time series, selecting the best performers as maximum of the likelihood function.

Application to more complex models
In such cases, in which the model can display deterministic chaos, like the one in dengue we investigate [5,7], the short term predictability and long term unpredictability (as measured by the largest Lyapunov exponent) even prohibits the comparison of stochastic simulations with the entire time series. Hence iteratively short parts of the time series are compared with the stochastic particles, and via simulated annealing the variability of the particles is cooled down, a)   priors narrowed in Bayesian language. The final method is called "maximum likelihood iterated filtering" (MIF). We apply this method to dengue data from Thailand, and display here bifurcation diagrams obtained from the best estimates, see Fig. 3, extrapolating finally to larger population sizes, see Fig. 4.