Modelling modal gating of ion channels with hierarchical Markov models

Many ion channels spontaneously switch between different levels of activity. Although this behaviour known as modal gating has been observed for a long time it is currently not well understood. Despite the fact that appropriately representing activity changes is essential for accurately capturing time course data from ion channels, systematic approaches for modelling modal gating are currently not available. In this paper, we develop a modular approach for building such a model in an iterative process. First, stochastic switching between modes and stochastic opening and closing within modes are represented in separate aggregated Markov models. Second, the continuous-time hierarchical Markov model, a new modelling framework proposed here, then enables us to combine these components so that in the integrated model both mode switching as well as the kinetics within modes are appropriately represented. A mathematical analysis reveals that the behaviour of the hierarchical Markov model naturally depends on the properties of its components. We also demonstrate how a hierarchical Markov model can be parametrized using experimental data and show that it provides a better representation than a previous model of the same dataset. Because evidence is increasing that modal gating reflects underlying molecular properties of the channel protein, it is likely that biophysical processes are better captured by our new approach than in earlier models.


IS, 0000-0002-0706-6307
Many ion channels spontaneously switch between different levels of activity. Although this behaviour known as modal gating has been observed for a long time it is currently not well understood. Despite the fact that appropriately representing activity changes is essential for accurately capturing time course data from ion channels, systematic approaches for modelling modal gating are currently not available. In this paper, we develop a modular approach for building such a model in an iterative process. First, stochastic switching between modes and stochastic opening and closing within modes are represented in separate aggregated Markov models. Second, the continuous-time hierarchical Markov model, a new modelling framework proposed here, then enables us to combine these components so that in the integrated model both mode switching as well as the kinetics within modes are appropriately represented. A mathematical analysis reveals that the behaviour of the hierarchical Markov model naturally depends on the properties of its components. We also demonstrate how a hierarchical Markov model can be parametrized using experimental data and show that it provides a better representation than a previous model of the same dataset. Because evidence 2016 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/ by/4.0/, which permits unrestricted use, provided the original author and source are credited.

Introduction
Ion channels regulate the flow of ions across the cell membrane by stochastic opening and closing. As soon as it became possible to detect currents generated by the movement of charged ions through the channel via the patch-clamp technique [1], Colquhoun & Hawkes [2] developed the theory of modelling single ion channels with continuous-time Markov models which describe the time-course of opening and closing that is reflected in single-channel currents by stochastic jumps between zero (closed) and one or more small non-zero current levels in the pA range (open). The activity of an ion channel is usually measured by its open probability P O . But by 1979, Patlak et al. [3] had already observed spontaneous changes of channel activity in glutamate-activated channels, shortly afterwards followed by Magleby & Pallotta [4,5], who made similar observations in the calcium-activated potassium channel. Since then this phenomenon, known as modal gating, has been ubiquitously observed across a wide range of ion channels but the significance of modal gating has remained unclear. See Siekmann et al. [6] for a more comprehensive review of the experimental literature. Colquhoun & Hawkes [7] modified their general theory from Colquhoun & Hawkes [2] for the analysis of bursts. Bursts are defined as 'closely spaced openings, separated by longer shut periods' [7, p. 4], which means that they are related to modes with a high level of activity. Thus, the papers Colquhoun & Hawkes [2,7] contain a comprehensive theory for calculating various statistical properties of the channel kinetics from a given Markov model. However, the problem of constructing models that capture spontaneous changes of channel activity in a systematic way has, so far, not been addressed in the literature.
In this study, we present a general framework for building data-driven models of ion channels that account for modal gating. This is essential for accurately representing the dynamics of an ion channel-instead of producing a misleading constant intermediate open probability P O , a model should represent the switching between highly different levels of activity characteristic of each mode. This is illustrated in figure 1 where data points labelled M 1 form a segment characterized by a low open probability, whereas the segment labelled M 2 is characterized by a high open probability. In a realistic time series, the changes between M 1 and M 2 occur on a time scale so slow that directly fitting a model (even if they have a sufficient number of open and closed states) to the data will not be able to resolve the infrequent switching between high and low open probabilities but instead will most likely lead to a model with a constant intermediate open probability. Moreover, modes of an ion channel have been associated with biophysical properties of the channel protein [6]. Therefore, a model accounting for modal gating is more likely to appropriately relate the dynamics of ion channels to underlying biophysical states of the channel protein.
Blatz & Magleby [8] presented an early modelling study of three modes observed in a chloride channel. They chose segments representative of an inactive, an active and a flicker mode and went through a thorough model selection process. In this way, they obtained models for each of the three modes. They estimated the order of magnitude of the transitions between these modes and presented a qualitative model structure that illustrates the transitions between the three modes. The model that will be developed here can be regarded as a quantitative development of the idea by Blatz & Magleby [8].
After this early study, modal gating has only rarely been considered in ion channel models. But recently, shortly after the discovery of modal gating in the inositol-trisphosphate receptor (IP 3 R) by Ionescu et al. [9]-an observation that has been received with great interest in the IP 3 R community, Mak & Foskett [10]-Ullah et al. [11] and Siekmann et al. [12] independently proposed two different models that represent modal gating in the IP 3 R. Both models are discussed in more detail in §5. The model by Ullah et al. [11] has most recently been used for investigating  T k : stochastic processes S k and T k Figure 1. After a statistical analysis of modal gating [6], experimental data are partitioned into segments based on different levels of open probability P O by inferring changepoints j n . For the small section of data shown in (a), the channel spontaneously jumps at t ≈ 3.55 s from a low P O close to zero (M 1 ) to a high level of activity with P O ≈ 75% (M 2 ). At t ≈ 3.575 s, the channel leaves the highly active mode M 2 and returns to the low level of activity characteristic for M 1  the influence of modal gating on calcium puffs [13] and for studying the impact of increased IP 3 R activity in Alzheimer's disease [14]. One difficulty in appropriately representing modal gating of ion channels in a model is the fact that for a time series of measurements collected from an ion channel, it is impossible to infer directly in which mode the channel is at a given point in time. However, Siekmann et al. [6] have shown how this information can be obtained by statistical changepoint analysis (figure 1). Previously, segments representative of different modes were either selected by visual inspection or by estimating the open probability using moving averages. Ionescu et al. [9] presented a heuristic algorithm that segments the data based on an analysis of burst durations and burst-terminating gaps. Siekmann et al. [6] detected mode changes by identifying significant changes of the open probability between adjacent segments in a time series recorded from an ion channel. In contrast with previous approaches, the uncertainty of the inferred changepoints where mode switching has supposedly occurred can be comprehensively assessed because Siekmann et al. [6] calculated probability distributions for the changepoint locations. As a result, after this analysis has been carried out, for each point in the time series it is not only known if the channel is open (O) or closed (C), but also-with an associated level of uncertainty calculated by the method-in which of the modes M 1 , M 2 , . . . the channel is. Previously, we observed stochastic switching between a nearly inactive mode M 1 and a highly active mode M 2 in data from the IP 3 R [6]. In this paper, we will represent the stochastic process of switching   [15], empirical histograms suggest that the sojourn time distribution f M 1 (t) within mode M 1 is not exponential (see figs 5 and 6 in Siekmann et al. [6] and figures 2a and 5). For this reason, in general, more than one state is needed for accurately representing the process of switching between modes. This means that modal sojourn times are represented by phase-type distributions, a class of distributions which is defined by the time a Markov chain spends in a set of transient states until exiting to an absorbing state [16,17]. We assume that the infinitesimal generatorM representing the switching between modes M i , i = 1, . . . n M , has the following block structure: where the block matricesM i,i ∈ R m i ×m i , m i ∈ N, on the diagonal describe transitions between states that represent the same mode M i , whereas the off-diagonal blocksM i,j ∈ R m i ×m j represent transitions between states representing different modes M i and M j , i = j. An example for a model for switching between two modes M 1 and M 2 is shown in figure 3a.
Our modal gating analysis illustrated in figure 1 not only enables us to represent the stochastic process of switching between modes M i but by studying the dynamics within representative  inter-modal transitions (a) Figure 3. Modular components of a model for modal gating. (a) An example for an aggregated Markov modelM representing inter-modal dynamics, the stochastic switching between two modes, M 1 and M 2 . M 1 is modelled by an aggregate of two states, whereas M 2 is represented by one state. The rates m 23 and m 32 stand for transitions between both modes. Note thatM may, in general, represent transitions between more than two modes, therefore, the statesM i j are numbered consecutively by subscripts j, whereas the superscripts i indicate the mode M i . (b) Models Q 1 and Q 2 representing the stochastic opening and closing that is characteristic of mode M 1 or M 2 , respectively. The states C i k and O i k are numbered similar to theM i j . Note that k = 1, . . . , n i for each mode M i in contrast with the statesM i j where the index j runs from 1 to the total number of states. In figure 4, we show how M and the Q i s are combined in a model that accurately represents both inter-modal transitions as well as intra-modal kinetics. (a) Inter-modal transitions and (b) intra-modal dynamics.
segments we can investigate the processes of stochastic opening and closing characteristic of each mode. For the example in figure 1, the dynamics within mode M 2 can be analysed by considering the sequence of open and closed events between j k and j k+1 . The dynamics within a mode M i can be represented by a Markov model with infinitesimal generator Q i which is obtained by fitting to representative segments of the same mode [12]. Similar to the sojourn times in the modes M i , the open and closed time distributions f O (t) and f C (t), respectively, are non-exponential and more than one open or closed state may be needed for accurately representing the dynamics. For the example shown in figure 1, we obtain two models with infinitesimal generators Q 1 and Q 2 (figure 3b).
In this paper, we develop a new mathematical model, the continuous-time hierarchical Markov model, that accounts simultaneously for both transitions between modes as well as the stochastic opening and closing within modes. A hierarchical Markov model in discrete time has been previously described by Fine et al. [18] but because we are not aware of a continuous-time version discussed in the literature, we develop the mathematical theory in detail and prove some fundamental properties. For ion channel modelling a continuous-time representation of the For transitions between modes it is decided stochastically in which state the target mode is entered. The transitions are determined by initial distributions over the states of the models Q i . Thus, for our example, we have to choose two stochastic vectors p 1 = (p 1 1 , p 1 2 ) and p 2 = (p 2 1 , p 2 2 , p 2 3 ) that give the initial distributions over the states of Q 1 and Q 2 . In order to ensure that the states are indeed entered with the chosen initial distribution, the rates m 23 exiting M 1 and m 32 exiting M 2 are weighted with p 1 and p 2 .
dynamics is more appropriate because it is commonly assumed that ion channels are able to make faster transitions than currently resolved by experiments. For the example of modal gating, we assume that switching between modes M i is a top-level process that regulates the bottom-level process, the opening and closing of the channel characteristic of a particular mode M i . This is illustrated in figure 3.
The statesM i j are numbered consecutively by subscripts j, whereas the superscripts i indicate the mode M i . While the model is in mode M 1 or analogously within one of the statesM 1 1 orM 1 2 (figure 3a), its opening and closing is described by the infinitesimal generator Q 1 (figure 3b). As soon as M 1 is left to stateM 2 3 , the current state of model Q 1 is vacated and a state of model Q 2 is entered. Now, opening and closing is accounted for by Q 2 until the stateM 2 3 and mode M 2 is left and stateM 1 2 is entered. The transitions between modes described viaM and the dynamics within modes captured by Q i illustrated in figure 3 can be represented in a Markov model with infinitesimal generator M that is derived from the individual componentsM and Q i . The idea is illustrated in figure 4 and developed formally in §2.
In order to account for the statesM i j as well as the states O i k and C i k representing the opening and closing within M i , the state space of the full model consists of the Cartesian products of thẽ M 1 1 andM 1 2 , two 'copies' of Q 1 appear in the full model, whereas there is only one 'copy' of Q 2 which is represented by only one state inM.
For transitions between modes, it is decided stochastically in which state the target mode is entered. The transitions are determined by initial distributions over the states of the models Q i . Thus, for our example, we have to choose two stochastic vectors p 1 = (p 1 1 , p 1 2 ) and p 2 = (p 2 1 , p 2 2 , p 2 3 ) that give the initial distributions over the states of Q 1 and Q 2 , respectively. For simplicity, we assume that this initial distribution does not depend on the state from which the transition originates so that-independent of the originating state-each state in the target model is entered with the same probability. In order to ensure that the states of Q i are indeed entered with the chosen initial distribution, the transition rates have to be 'split' accordingly. For our example, the rates m 23 exiting M 1 and m 32 exiting M 2 are weighted with the stochastic vectors p 1 and p 2 . The mathematical details of the construction of this model are presented in §2.
It is a strength of our approach that it enables us to build data-driven models of modal gating in a modular way. After segmenting ion channel data with the method by Siekmann et al. [6], we obtain a stochastic sequence of events M i that describes the time course of transitions between different modes. The infinitesimal generatorsM and the Q i can then be parametrized from these data. We demonstrate the practical implementation of this approach in §3 using experimental data by Wagner & Yule [15] and compare the results with our previously published model of the same dataset [12].
We investigate the mathematical structure of the continuous-time hierarchical Markov model in more detail in §4. In particular, we show that many important properties of the infinitesimal generator M of the full model can be derived from the generatorsM and Q i . We expect that similar to its discrete-time counterpart [18], the continuous-time hierarchical Markov model will have a variety of applications beyond the modelling of modal gating considered here.
We discuss our approach to modal gating in §5. In particular, we explain why our new modelling framework provides a representation of ion channel dynamics that is likely to provide a structure that realistically captures biophysical processes.

Material and methods (a) Preliminaries
We now develop formally the hierarchical Markov model illustrated graphically in figures 3 and 4. First, let us describe the structure of the probability distribution p over the states of the hierarchical Markov model. Let v = (v 1 ; v 2 ; . . . ; v n M ) denote a state probability distribution of the modelM. That is, for i = 1, . . . , n M , v i is the probability distribution of the states in mode M i . In general, we will allowM to be an aggregated Markov model so that each of the components v i of the vector v may itself be a vector. With the term aggregated Markov model, we refer to a model where possibly multiple rather than one Markov states are used for representing the same experimental observation. Multiple states of the same aggregate cannot be directly distinguished based on experimental observations. Aggregated Markov models are capable of accounting for observations whose dwell times are distributed according to a mixture of exponentials rather than the exponentially distributed sojourns of single Markov states. We make the convention that components v i and v j that are meant to refer to a vector are separated by semicolons, whereas components of a vector are separated by commas. Let us first assume for simplicity that all modes M i are represented by only one state so that the components v i are scalars. Then the distribution p over the states of the full model M is a weighting of the distributions w i over the distributions over the states of the models Q i . Thus, we obtain p := (v 1 · w 1 ; . . . ; v i · w i ; . . . ; v n M · w n M ). Here '·' denotes scalar multiplication of vectors w i with scalars v i . If more than one state is needed for representing the modes M i , we must generalize appropriately the 'weighting' of a vector w i with a vector v i . Such a generalization is provided by the tensor product '⊗'.  2.1 (Kronecker product ⊗). We will only need the special case of the tensor product for matrices, the Kronecker product. Let A ∈ R m×n , B ∈ R p×r . Then The Kronecker product also applies to vectors by identifying column vectors with (m × 1)-and row vectors with (1 × m)-matrices.
where id m and id n are the identity matrices of the respective dimensions.
For some properties of Kronecker product and sum that we require for our analysis of the hierarchical Markov model ( §4), we refer to appendix A. For a distribution v over the states of an aggregated Markov model, subvectors that represent the distributions over the states of the same mode M i can be naturally described by partitions.

Definition 2.3 (Partitioned vectors, multi-indices). A multi-index is any vector
and How distributions p over the states of a hierarchical Markov model relate to distributions over the states ofM and Q i can be clarified by the tensor product of partitioned vector spaces.

Definition 2.4 (Tensor product
Then the tensor product u m·n of d-partitioned vectors v m and w n is defined by -Vectors u m·n ∈ R m ⊗ m,n R n can be written as linear combinations where m · n again denotes the component-wise product of m and n.

(b) A hierarchical Markov model for modal gating
Based on the block structure (1.1) ofM, we now show how a transition matrix for the full model can be calculated from its components ((m 0 ,M), . Let m and n be the multiindices defined above. The transitions within the modes M i are represented in the full model by block Moreover, we define the matrix of initial conditions for a transition from Q i to Q j by where the row vector p j ∈ R 1×n j is the initial condition for Q j from definition 2.5, and u T n i ∈ R n i ×1 is a column vector of ones. We observe that P i,j ∈ R n i ×n j so that, for i = j we have M i,j =M i,j ⊗ P i,j ∈ R m i n i ×m j n j . We can now define the components of a continuous-time hierarchical Markov model and calculate its infinitesimal generator: Then the infinitesimal generator M of the aggregated model for modal gating is calculated as follows: It is straightforward to generalize this definition recursively to an arbitrary number of hierarchies. From definition 2.4 and (2.3), we know that an arbitrary distribution p over the states of the full model can be represented by a linear combination of tensor products of the form (2.3). We now require for initial distributions that they should arise from a single tensor product of initial distributions over the states ofM and initial distributions over the states of the Q i .  Let v m be the initial distribution over the states of the top-level modelM and w n , a vector whose components w i are initial distributions over the states of the models Q i . Then the initial distribution p 0 m·n over the states of the full model M is calculated by the tensor product '⊗ m,n ' introduced in definition 2.4: Remark 2.2. We make some remarks regarding the interpretation of definition 2.6: -Note that whereas v m is a stochastic vector, w n is not. It is easy to see that p 0 m·n is a stochastic vector. -Algebraically, definition 2.6 constrains initial distributions to so-called pure tensors which can be written as a single tensor product rather than a linear combination of tensor products. -Statistically, definition 2.6 says that for the initial distribution the probabilities of being in a stateM i j and a state Q i k are stochastically independent: the joint probability of being inM i j and Q i k is the product of the individual probabilities (2.7).
It is an interesting question if the time-dependent solution p m·n (t) or the stationary distribution of the full model M remain in the form p m·n (t) = v m (t) ⊗ m,n w n (t) for t > 0. In fact, this is generally not the case.

Remark 2.3 (Caution).
In most situations, p m·n (t) cannot be written as a pure tensor p m·n (t) = v m (t) ⊗ m,n w n (t) for t > 0. As discussed in proposition 4.4, we obtain a solution (v m (t) ⊗ m,n π n ) for a solution v m (t) ofM and a vector π n of stationary solutions π i of Q i if and only if we choose initial conditions p i = π i for all Q i .

(c) Example
As an example for the construction of the infinitesimal generator M from the components , we present a model that will be used in §3 for experimental data from the inositol trisphosphate receptor (IP 3 R).
Let the infinitesimal generator for the switching between modes bẽ and the models representing the intra-modal kinetics with initial conditions with R := m 31 + m 32 .

(d) Parametrizing the model with experimental data
In order to parametrize the components ((m 0 ,M), (p i , Q i ) n M i=1 ) of our model, the infinitesimal generatorsM and Q i have to be inferred from ion channel data. We assume that the original data, a sequence of current measurements recorded with a constant sampling interval τ , have been statistically analysed so that they have the form of figure 1. Apart from visual inspection, mode changes have been investigated based on calculating the open probability within a window of a certain number of data points. One problem with these methods based on moving averages is that-depending on the window size-instantaneous jumps are transformed to gradual transitions so that the transitions between modes cannot be localized very accurately. By contrast, the heuristic method by Ionescu et al. [9] localizes switching events at specific data points but the uncertainty of the segmentation into different modes cannot be quantified. By contrast, the method by Siekmann et al. [6] calculates probability distributions for the position of each transition between different modes so that for each detected transition between different modes comprehensive information on the uncertainty is available. After a time series has been segmented each measurement is classified as open (O) or closed (C) and it has also been determined in which mode M i the channel was at this point in time. From the results of a probabilistic method such as Siekmann et al. [6] rather than assigning a particular mode to each data point, it is possible to calculate a probability distribution for the different modes. This may improve the results for datasets where mode changes cannot be localized very accurately. The Markov modelM is then inferred from the sequence S k of modes M i , whereas the models Q i are parametrized from sequences of T k that are representative of a particular mode. For example, in figure 1, the five data points between j n and j n+1 could be used for inferring the model Q 2 representing the stochastic opening and closing within mode M 2 .
All models are parametrized with the Bayesian method developed in Siekmann et al. [19,20] or, alternatively, any other algorithm for fitting Markov models to single channel data. For inferring the infinitesimal generatorM the likelihood has the form where (S k ) is a sequence of observations of modes M i separated by the sampling interval τ ,M is the infinitesimal generator of an aggregated Markov model,μ is the stationary distribution ofM and u T is a column vector of ones. The matrices P S k project to the states of the model that represent the mode observed at data point k. For example,

Data-driven modelling of modal gating
Our new framework enables us to easily construct and parametrize models for modal gating following a transparent iterative process: InferringM and Q i using the Bayesian approach briefly described in §2d ensures that the resulting model will be highly parsimonious because at each step a model with the optimal number of parameters for representing stochastic switching between modes, and opening and closing within modes, is determined. We demonstrate the practical implementation of this process using data collected by Wagner & Yule [15] and compare the results with our previously published model of the same dataset [12]. Previously, we have statistically analysed mode switching exhibited in the data by Wagner & Yule [15] and found two modes, the nearly inactive mode M 1 with a very low open probability and the highly active mode M 2 with P O ≈ 70% (see Siekmann et al. [6] for details). As illustrated in figure 1, we have a stochastic sequence of events M 1 and M 2 that are separated by a sampling interval τ = 0.05 ms. We have results from two types of the inositol trisphosphate receptor (type I IP 3 R and type II IP 3 R) for various calcium concentrations (Ca 2+ ), 0.01 µM, 0.05 µM and 5 µM, at fixed concentrations of 10 µM inositol trisphosphate (IP 3 ) and 5 mM adenosine trisphosphate (ATP). Empirical histograms of the sojourn times in M 1 and M 2 for all except one dataset indicate that whereas time spent in the active mode M 2 may be represented satisfactorily by one state, accurately representing sojourn times in the nearly inactive mode M 1 seems to require at least two states (e.g. figure 2). Whereas one state accounts for the support of the sojourn time density in mode M 2 (figure 2b), the more widespread sojourn time density in mode M 1 is better approximated by two states (figure 2a). Thus, for five of our six datasets we parametrizeM with the structure of (2.8). For one dataset (type II IP 3 R at 0.05 µM Ca 2+ ), the histograms suggest that we need a model with two states representing M 1 and two states representing M 2 (figure 5). Thus, for these data we use the following infinitesimal generator:   In our previous study [12], we have already fitted a model with two states to representative segments of the inactive mode M 1 and a model with four states for representing M 2 , see (2.9) for the form of the infinitesimal generators Q 1 and Q 2 . Interestingly, we could show that Q 1 and Q 2 were independent of the concentrations of IP 3 , ATP and Ca 2+ . The parameter values from the Supplementary Material of Siekmann et al. [12] are reproduced here for convenience (electronic supplementary material, table S3).
(d) Step (iv): the generator M of the full model After the modelsM, Q 1 and Q 2 have been obtained, we finally need to specify the initial distributionsm 0 , p 1 and p 2 . Consistent with the experimental assumption that recording of the data was started when the channel had reached steady state, we setm 0 =μ, p 1 = π 1 and p 2 = π 2 , whereμ, π 1 and π 2 are the stationary distributions ofM, Q 1 and Q 2 , respectively. After all

(e) Results
Owing to the problems with fitting the infinitesimal generatorM (2.8) mentioned in §3b, one may ask if a simpler two-state model representing the dynamics of modal gating would be preferable. However, the ability of a three-state model to approximate the sojourn distribution of the nearly inactive mode M 1 more accurately (figure 2a) was found to be crucial for obtaining a better fit of the closed time distribution in comparison with the model from Siekmann et al. [12] (figure 2c). That the model structure of the hierarchical model proposed here is better able to capture the properties of the entire time series data seems even more convincing because it has-unlike the original model from Siekmann et al. [12]-been built without directly fitting to the time series at any step of its construction.
In electronic supplementary material, figure S2, we show that the bimodal closed time distribution observed for some combinations of ligand concentrations arises due to the mixing of the closed time distributions within nearly inactive mode M 1 and active mode M 2 both of which only have one distinct maximum.
Stronger differences between both models are observed for a dataset collected from type II IP 3 R for 10 µM IP 3 , 5 mM ATP and 0.05 µM Ca 2+ . For this experimental condition, the effect of modal gating can be observed without statistical analysis (electronic supplementary material, figure S3a). Figure 5 shows that both modes M 1 and M 2 exhibit a widespread distribution of sojourn times which can only approximately be captured by a four-state model with two states each for both M 1 and M 2 . Whereas the new hierarchical model can approximate the empirical distributions of both modes relatively well, the model from Siekmann et al. [12] fails due to the fact that only one characteristic sojourn time for each mode can be captured by the pair of transition rates accounting for modal gating in this model ( figure 5). Owing to the failure to account for the modal sojourn time distributions, we expect the model from Siekmann et al. [12] to reproduce the kinetics observed in the data much less accurately than the new hierarchical model. In order to illustrate this, we simulated both the Siekmann et al. [12] model and the new model and compared them with a segment of experimental data of the same length ( figure 6). The sample path was plotted in blue when the channel was in mode M 1 , whereas it was plotted in brown when the channel was in mode M 2 . The same colours were used for colouring the data based on the results of the statistical analysis from Siekmann et al. [6]. In the data segment shown here, both dwell times in the active mode M 2 of about 0.2-0.5 s are observed as well as very brief sojourns of a few milliseconds. Consistent with the dwell time distribution (figure 2), the long but not the short sojourns in the active mode M 2 are captured by the model from Siekmann et al. [12], whereas the hierarchical model developed in this study reproduces both long and short sojourns in this mode. Interestingly, as we show in the electronic supplementary material, for this particular dataset the channel seems to change its behaviour at an even slower time scale by spontaneously increasing the observed prevalence in M 2 for an extended period of time before returning to the initial level of activity (electronic supplementary material, figure S3a).

Mathematical analysis of the hierarchical Markov model
In the previous section, we demonstrated that the hierarchical Markov model introduced in §2 provides a statistically efficient framework for systematically building models for modal gating.

(a) Eigenvalues
Before we calculate the eigenvalues for general infinitesimal generators M of the full model, we remark that in most cases relevant for ion channel modelling we may assume that the matrices M and Q i appearing in our model are diagonalizable-this is implied by the so-called detailed balance conditions: where π is the stationary distribution of an infinitesimal generator Q = (q ij ). A matrix Q = (q ij ) with (4.1) is diagonalizable with real eigenvalues because by choosing the transformation matrix diag(π ) 1/2 it is similar to a symmetric real matrix. Detailed balance is usually assumed to hold for ion channel models because it can be related to thermodynamic reversibility of the transitions between different states in the model. Note that (  Proof. Detailed balance implies thatM and the Q i are diagonalizable with real eigenvalues. In particular, all matrices have full sets of eigenvectors. This enables us to construct eigenvectors of the infinitesimal generator M of the full model from the eigenvectors ofM and the Q i . Using the compatibility condition of matrix multiplication and tensor product (A 2) we calculate Noting that Q i u T n i = 0 and P i,k u T n k = u T n i , we finally get Because this holds for all blocks we obtain the desired result. (ii) All except for the ith block of w are zero, so we get Becausew i is an eigenvector ofM i,i ⊕ Q i we know thatw i (M i,i ⊕ Q i ) = νw i . For w to be an eigenvector, it remains to be shown that all other blocks vanish. Let u be a left eigenvector ofM i,i associated with the eigenvalueζ and v a left eigenvector of Q i associated with the eigenvalue λ. Thenw i can be written asw i = u ⊗ v according to (A 3). Substituting this and The term vu T n i is the standard scalar product v T , u T n i of the vectors v T and u T n i . Because the row sums of Q i are zero, u T n i is in the right nullspace of Q i . By assumption, v is an eigenvector associated with any eigenvalue λ = 0. This means that v is not in the left nullspace of Q i , so it must be orthogonal to any vector in the right nullspace. It follows that (4.2) vanishes as required.
be the Schur decompositions ofM andM i,i ⊕ Q i . Letū T n i = 1/ √ n i u T n i be the vectors obtained by normalizing the vectors of ones u T n i .
(i) The matrices W i may be chosen so that they have the form be row-partitioned according to the block structure ofM from (1.1). Then the matrix is unitary. Proof.
(i) Because the row sums of Q i vanish, the vectorū T n i is a right eigenvector of Q i associated with the eigenvalue zero. Without loss of generality, we can chooseū T n i as the first column of W i . (ii) By construction, all column vectors of S are normalized. Thus, it remains to show that they are also pairwise orthogonal. By definition, any two distinct column vectors appearing in the same block of S are orthogonal. It is trivial that column vectors from different blocks are orthogonal unless one of the two appears in the first block of S. Thus, let θ T be a column vector of Θ and v T i ⊗w T i be a column vector of any V i ⊗ W i . With the shorthand for tensor products consistent with partitions (2.3) introduced in definition 2.3, the scalar product ·, · of the two columns is and due to the zeroes in all except for the ith block, all other summands vanish. Noting that u, v = u(v * ) T = u T v * can be interpreted as a special case of matrix multiplication (where ' * ' denotes component-wise complex conjugation) we can use (A 2): But becauseū T n i appeared as a column in the original unitary matrix W i , thew T i are all orthogonal toū T n i so that the above scalar product vanishes. Thus, the matrix S is unitary. Proof. We demonstrate that with the matrix S from (4.3), we obtain a Schur decomposition of the matrix M. We need to show that A = S * MS is upper triangular. The block structure of S is rectangular with n M × (n M + 1) blocks which means that S * has an (n M + 1) × n M block structure. Thus, the resulting matrix A will have (n M + 1) × (n M + 1) blocks and its diagonal will consist of the eigenvalues ofM in the upper left block followed by the remaining eigenvalues from the submatricesM i,i . We show that all blocks A i,j are upper triangular which implies that A is indeed upper triangular. First, a lengthy calculation shows that A 1,1 is a block-wise expanded form of Θ * M Θ and thus upper triangular. One can see directly that the remaining elements on the block diagonal are and, therefore, all upper triangular. It remains to show that the lower diagonal blocks A i,j with i > j vanish. We will demonstrate that the A i,j vanish provided thatW * iū T Equation (4.4) is just another way of saying thatū T n i is orthogonal to all columns ofW i . But this is true because from Lemma 4.1(i) we know thatū T n i is the first column of W i , so it must be orthogonal to all column vectors ofW i .
We now calculate the subdiagonal blocks A i,j , i > j. First, we calculate the blocks A ·,1 on the first block column. We observe that Because S * is block diagonal below the first row, we can calculate because in the row (k + 1)th row of S * for k = 1, . . . , n M only the kth block is non-zero. By taking advantage of (A 2), we obtain where we have used Q kūT n k = 0 and P k,jūT n j =ū T n k . Again using (A 2), we calculate This vanishes due to (4.4) as explained above. For the remaining blocks A k+1,l+1 , k > l = 1, . . . , n M − 1, we simply calculate Replacing P k,l byū T n k ⊗ p l (2.5), we get where-due to the termW * kū T n k -we again conclude with (4.4) that A k+1,l+1 vanishes.

(b) Sojourn times in modes
We will now investigate the sojourn times within the states that represent the modes M i . The switching between modes is represented by a model with infinitesimal generatorM and  Proof. For simplicity we only treat the case of two aggregates of states, M 1 and M 2 . For the sojourn time within M 1 we have where p 0 = p 0M 2 ⊗ p 0 Q 2 is a suitably normalized initial state probability distribution. Substituting from (2.6), we obtain for where we have used (A 4) for calculating the matrix exponential. Now, according to the compatibility of tensor and matrix product (A 2) which will be used repeatedly below. Also note that exp(Q 1 t) · P 1,2 = P 1,2 . Multiplying this on the right by u T where we have evaluated P 1,2 u T n 2 = u T n 1 in the right-most term. Analogous calculations will be carried out automatically below. The above result is now multiplied on the left by M 2,1 =M 2,1 ⊗ P 2,1 : (M 2,1 ⊗ P 2,1 )[exp(M 1,1 t)M 1,2 u T m 2 ] ⊗ u T n 1 = [M 2,1 exp(M 1,1 t)M 1,2 u m 2 ] ⊗ u T n 2 . Finally, we multiply the preceding result on the left by p 0 = p 0M 2 ⊗ p 0 Q 2 and compute Now, because (p 0 Q 2 u T n 2 ) = 1, we obtain the desired result:

(c) Full solution for p i = π i
If we choose initial conditions p i = π i , where the π i are stationary distributions of the models Q i , the solution of the full model has a particularly simple form.

Proposition 4.4 (Full solution for
Let v m (t) be the time-dependent solution for the initial condition w 0 n andμ n be the stationary solution of the infinitesimal generatorM with their partition m. Let π i , i = 1, . . . , n M be the stationary distributions of Q i or written as a partitioned vector, π n with its partition n. If for each generator Q i we set p i = π i and we choose an initial distribution p 0 m · n = v 0 m ⊗ m,n π n consistent with definition 2.6, the solution p m · n (t) of the full model is p m · n (t) = v m (t) ⊗ m,n π n = (v 1 (t) ⊗ π 1 ; . . . ; v i (t) ⊗ π i ; . . . ; v n M (t) ⊗ π n M ). (4.5) By taking the limit t → ∞, we obtain the stationary distribution μ m · n =μ m ⊗ m,n π n = (μ 1 ⊗ π 1 ; . . . ;μ i ⊗ π i ; . . . ;μ n M ⊗ π n M ).  The stationary distribution (4.6) is independent of the initial distribution p 0 m·n , so, for p i = π i , we converge to the stationary distribution (4.6) also for p 0 m·n = (v 0 m ⊗ m,n w 0 n ) with w 0 n = π n and even for arbitrary initial conditions p 0 m·n that are inconsistent with definition 2.6.
Proof. That (4.5) is a solution can be shown by substituting p m·n (t) = v m (t) ⊗ m,n π n into dp(t) dt = p(t)M, (4.7) where M is the generator of the full model (2.6). First, we calculate the left-hand side: where the last equality (4.8) follows because v m (t) is a solution of the model generated byM. We now show that we also obtain (4.8) from the right-hand side of (4.7). For the ith component For the first summand, the contribution of Q i vanishes because of π i Q i = 0 (4.9) Because of π j P j,i = π i , the second summand simplifies to With (4.9) and (4.10), we derive for each component: This means that the right-hand side of (4.7) is indeed of the form (4.8) which confirms that (4.5) is a solution.

Conclusion
We have proposed a new model for representing modal gating, the spontaneous switching of ion channels between different levels of activity. The model is suitable for modelling channels with an arbitrary number of modes and is capable of representing both the probabilistic opening and closing within modes as well as the stochastic switching between modes that regulates these dynamics.

(a) Modular representation of modal gating
In comparison with previous studies, the model presented here incorporates modal gating in a much more transparent way. Ullah et al. [11] developed their model of the IP 3 R from a binding scheme. First, the authors determined the set of open and closed model states from a statistical model selection criterion. Second, they determined which of these states should account for which of the three modes observed by Ionescu et al. [9]. by optimizing a likelihood that accounted for various sources of single channel data including statistics of modal gating. This treats the parameter space of their model as a black box from which a suitable set of parameters capable of accounting for all datasets is selected by optimization. We expect such an approach to be statistically less efficient than a model whose structure incorporates modal gating more explicitly.
Siekmann et al. [12] used modal gating as the underlying construction principle of their model by separating the inference of parameters related to dynamics within modes from estimation of parameters related to switching between modes. First, models for the inactive mode M 1 and the active mode M 2 were inferred by fitting segments of data representative of each of the two modes-in fact, the same models were reused in the present study. However, because at that time rigorous statistical techniques for segmenting ion channel data by modes were not available, the time scales of the switching between both modes was inferred by connecting the submodels for M 1 and M 2 with a pair of transition rates whose values were then determined from a fit to complete traces of single channel data. Similar to Ullah et al. [11] modal gating was thus incorporated into the model without explicitly considering its stochastic dynamics apparent in the data.
In this study using our previously developed method Siekmann et al. [6], we were able to explicitly account for transitions between different modes inferred from experimental data. The method partitions a time series into segments based on the open probability P O of the channel. In this way, spontaneous changes of channel activity can be detected. Because the analysis is based on Bayesian statistics, comprehensive information on the uncertainty of the results is available via the posterior distribution. For the IP 3 R data used here the inferred times at which the channel made a transition to another mode had very low estimated standard deviations (less than one data point up to a few data points). Whereas for this study, it was therefore sufficient to use point estimates of the change times, the full posterior distribution can be used for channels whose modes cannot be distinguished with similar accuracy.
Thus, the statistical method from Siekmann et al. [6] enables us to fit a modelM directly to the stochastic process of mode switching inferred from the experimental data instead of arbitrarily introducing transition rates between modes as in our previous study [12]. Therefore, we can accurately represent mode switching, only adding exactly as many parameters as required. In comparison with our previous model, the new model described here requires only two additional parameters. Inspection of the sojourn time histograms show that these two parameters are essential in order to account for the fact that sojourns in the nearly inactive mode M 1 exhibit two different time scales which cannot be represented by a model with less parameters.
Because Siekmann et al. [6] distinguished modes based on their open probabilities, it may be difficult to distinguish modes with similar characteristic open probabilities P O but with different kinetics, like, for example, an active mode with high P O and a flicker mode with similiar P O but faster transitions between open and closed states. This possibility can be excluded by fitting models Q i to the observed open and closed events in segments representative for each mode. For the IP 3 R dataset used for this study we could not only show that Q 1 and Q 2 , respectively, did not differ significantly for segments from the same time series but were also ligand-independent.
The hierarchical Markov model developed in this study allows us to combine the 'modules' Q i accounting for opening and closing within modes withM representing mode switching to a representation of both aspects of the single channel dynamics. It is important to note that  For analysing statistical properties of modal gating, an advantage of our model is that in addition to representing the channel being open or closed each state is also associated with a mode. The analysis of bursts according to Colquhoun & Hawkes [7] depends on the selection of open and 'short-lived' closed states that represent a burst and a class of 'long-lived' closed states that account for gaps between bursts. This not only requires additional assumptions but it is also unclear how the state space of an unstructured Markov model should be partitioned if a channel has multiple modes. No such difficulties arise in our model because each mode is represented by an aggregate of open and closed states. It is, therefore, clear how the relevant states should be chosen so that various properties such as, for example, the open and closed times of each mode, can be calculated using the theory of Colquhoun & Hawkes [2] or Colquhoun & Hawkes [7]. Because here, mode switching is defined as spontaneous changes between different open probabilities rather than clusters of open events, the methods from Colquhoun & Hawkes [2] seem more suitable. The theory of Colquhoun & Hawkes [7] leads to more complicated calculations due to the assumption that bursts must commence with an opening of the channel, whereas this does not necessarily have be the case for a sojourn in a mode. In summary, this means that an additional benefit of our modelling approach is that statistical properties of modes can be calculated more easily from our model than from a general Markov model.
The modular structure of our hierarchical model which separates the representation of transitions between modes (inter-modal kinetics) from the dynamics within modes (intra-modal kinetics) not only provides a more parsimonious representation than previous models but, most notably, evidence is accumulating that in channels that exhibit different modes the switching between modes may be more important for their physiological function than intra-modal kinetics. This is strongly suggested by recent studies of the IP 3 R. In a study of insect type I IP 3 R, Ionescu et al. [9] observed three modes with essentially identical kinetic properties across different ligand concentrations, whereas the overall dynamics of the channel was determined by the highly ligand-dependent prevalence of the channel in these modes. Thus, Ionescu et al. [9] proposed that modal gating is the major mechanism of ligand regulation in the IP 3 R. This was confirmed for mammalian type I and type II IP 3 R data by Siekmann et al. [6] and led to the interpretation that ion channel kinetics is restricted to a fixed repertoire of modes which have to be mixed appropriately in order to respond to given ligand concentrations. Ligand-dependent switching between ligandindependent modes suggests that physiological function may depend more strongly on the slow time scale of switching between modes rather than the fast opening and closing of the channel within a mode. This was indeed recently shown in two studies of the role of IP 3 R in intracellular calcium dynamics. Cao et al. [26] showed that the essential features of calcium oscillations in airway smooth muscle could be preserved after iteratively simplifying the model from Siekmann et al. [12] to a two-state model that only accounted for switching between the two modes neglecting the kinetics of transitions between multiple open and closed states within the modes. Siekmann et al. [27] applied similar reduction techniques to demonstrate that also the stochastic dynamics of small clusters of IP 3 R s can be captured by a two-state model reduced to the dynamics of mode switching. In our new hierarchical model, inter-modal and intra-modal kinetics are represented separately so that the model representation with the right level of detail can be chosen based on the requirements of a specific application.

(b) Biophysical implications of modal gating
Although modal gating has been observed for a long time it has rarely been accounted for in ion channel models. The crucial importance of modal gating has only recently been appreciated among investigators of the IP 3 R channel and it is now widely recognized in the community [10]. Various independent sources of evidence indicate that modal gating must be accounted for, both for understanding IP 3 R function as well as for gaining insight into biophysical properties of the channel molecule. As mentioned in the previous section, the role of IP 3 R in intracellular calcium dynamics is defined by its behaviour on the slow time scale of transitions between different modes rather than the fast time scale of opening and closing [26,27]. Previously, Ionescu et al. [9] discovered that the IP 3 R adjusts its level of activity depending on ligands such as calcium by regulating the proportion of time that the channel spends in different modes. This was subsequently confirmed by the statistical analysis by Siekmann et al. [6]. These results reveal the major functional implications of modal gating, so one may ask if any insight can be gained into the underlying biophysics. In their early model of modal gating in a chloride channel, Blatz & Magleby [8] postulated that different modes may be related to different conformations of the channel protein. Direct experimental evidence into how different modes arise from biophysical constraints of the channel protein is accumulating. Two examples include a thorough analysis of the potassium channel KscA discussed in more detail below [28][29][30] and a more recent study by Vij et al. [31] on the acethylcoline receptor. Also see the commentary by Geng & Magleby [32]. This suggests that modes form a fixed repertoire of possible behaviours defined by the molecular properties of the channel. Being constrained to a few different modes, ion channels overcome these limitations by switching between modes.
This implies that methods for identifying different modes in single channel data not only provide us with more accurate insight into the channel dynamics but may also reveal the transitions between different biophysical states of the channel. As mentioned above there are strong indications that each mode is reflected by a different three-dimensional arrangement of the channel protein, known as a conformational state. Thus, the aggregates of states in ion channel models that account for different modes M i correspond to different conformations at the level of the channel protein. In such a model, the transitions between states representing different modes reflect the rates of conformational changes.
This direct correspondence between aggregates of states and underlying biophysics is important to note because interpreting individual states in Markov models for ion channels is problematic in general, at least without additional experiments. For the simplest possible representation of a gating ion channel is a two-state Markov model with only one open and one closed state it is, of course, obvious that these two different model states at the same time correspond to different biophysical states of the channel protein. This 'mechanistic' interpretation explains the popularity of this type of model. The Markov assumption implies that the open and closed times of a two-state model are exponentially distributed which means that durations of channel openings and closings both have characteristic time scales τ O and τ C given by the parameters of the exponential sojourn time distributions f O (t) and f C (t). However, many ion channels exhibit multiple characteristic open and closed times that cannot be represented by exponential distributions. Whereas an open ion channel must be in a different conformation than a closed ion channel distinguishing only two conformational states is a very coarse description of the complicated deformations of channel proteins that can be identified by molecular dynamics models. Nevertheless, if our goal is to base our models on rigorous statistical analysis, for some data we may not be able to identify more than two states.
Non-exponential open and closed times can often be represented satisfactorily by aggregated continuous-time Markov models where more than one state is used for representing the channel being open or closed. These models provide a simple generalization of the two-state Markov model and account for more than just one characteristic open or closed time scale τ O and τ C . By definition, the sojourn times in the open or closed class of an aggregated Markov model are distributed according to a phase-type distribution, a class of distributions representing the time a Markov chain spends in a set of transient states until exiting to an absorbing state [16,17]. As with the two-state model it is tempting to also associate the individual states of an aggregated Markov model with different biophysical states of the channel protein. The multiple open and closed states of an aggregated Markov model could be interpreted to resolve in more detail the series of conformational changes that the channel goes through while it opens. If this interpretation was valid one could hope to discover details of the molecular structure of ion channels beyond the trivial distinction between an open and a closed state once the 'best' aggregated Markov model for a given dataset has been found.
Unfortunately, this 'mechanistic' interpretation of aggregated Markov models has several flaws. Whereas it can be directly inferred from single channel data if the channel is open or closed and in which of its modes M i it is, distinguishing different open or closed Markov states requires additional experiments and is possibly ill-defined. First, the only reason that a particular model consists of multiple open and closed states is that multiple characteristic open and closed times were observed. It is an assumption to be empirically confirmed that for each observed exponentially distributed sojourn time the channel must necessarily be in a distinct conformational states-so more Markov states may appear in the model than can be distinguished biophysically at the level of the channel protein. By contrast, it is likely that some conformational states may not have a strong enough influence on the dynamics that they are represented by a state in a model inferred from the data. But even if we assume that each Markov state should, in principle, reflect a distinct underlying biophysical state, it is challenging both experimentally as well as theoretically to identify, for example, a three-dimensional configuration of the channel protein that corresponds to a model state with a short open time and distinguish it from another conformational state that is characterized by a long open time.
Second, and more importantly, aggregated Markov models are only defined up to equivalence [20,[33][34][35][36] with other models having the same number of open and closed states. In particular, it can be shown that models with completely different adjacency matrices can describe the same process [35] although there is a canonical phase-type description, given, for example, by its Laplace-Stieltjes transform. Thus, interpreting the graphical structure of an aggregated Markov model as a description of possible transitions between different conformational states is not necessarily meaningful without further data. A related problem is the fact that some adjacency matrices lead to non-identifiable models, in particular, certain types of cyclic models are nonidentifiable. Whereas it is unlikely that transitions between conformational states underlie any fundamental restrictions of this kind, only some of these transitions would be identifiable from experimental data. It is important to note that the described challenge of relating aggregated Markov models with biophysical processes does not restrict in any way their capability of statistically capturing the stochastic dynamics of ion channels. This only demonstrates that aggregated Markov models are a more abstract representation than they may appear to be at first glance.
In summary, because it is much less problematic to associate aggregates of states with different underlying biophysical states than individual states within an aggregate, interpreting mode switching as transitions between distinct biophysical states does not suffer from these difficulties. Chakrapani et al. [28][29][30] were able to restrict the KscA channel to one of its normally four modes by mutating a particular site of the amino acid sequence of the channel protein. Combining crystallography imaging and molecular dynamics modelling they could further demonstrate that the four modes were related to different conformational states of the channel. It is therefore likely that switching between distinct characteristic dynamical patterns in single channel data can be directly associated with the transition from one to another conformation of the channel protein. This implies that models which accurately represent mode switching can also be used to infer the time scales of transitions between biophysical states associated with these modes. This opens up the exciting possibility that we can gain insight into biophysical processes involved in ion channel gating by statistical analysis and modelling of single channel data rather than having to rely on more time-consuming experimental techniques such as crystallography or more laborious modelling techniques such as molecular dynamics.
Data accessibility. Data and code used for this study have been published on github. The most current version can be obtained from https://github.com/merlinthemagician/icmcstat.