Joint modelling of repeated measurements and time-to-event outcomes: flexible model specification and exact likelihood inference

Random effects or shared parameter models are commonly advocated for the analysis of combined repeated measurement and event history data, including dropout from longitudinal trials. Their use in practical applications has generally been limited by computational cost and complexity, meaning that only simple special cases can be fitted by using readily available software. We propose a new approach that exploits recent distributional results for the extended skew normal family to allow exact likelihood inference for a flexible class of random-effects models. The method uses a discretization of the timescale for the time-to-event outcome, which is often unavoidable in any case when events correspond to dropout. We place no restriction on the times at which repeated measurements are made. An analysis of repeated lung function measurements in a cystic fibrosis cohort is used to illustrate the method.


Introduction
There is a close relationship between modelling longitudinal data subject to dropout and modelling survival time data in the presence of imprecisely observed time varying covariates. In both cases we have a vector of repeated measurements Y and a time to event S. In the survival context S would normally be considered to be measured in continuous time, though possibly right censored. In the dropout context S usually corresponds to a discrete interval between scheduled measurement times. Typically the occurrence of the event terminates observation of the repeated measurements.
A common approach for both the survival time and the dropout problems is to assume conditional independence between Y and S given underlying random effects U. An important early reference is Wulfsohn and Tsiatis (1997), who assumed that Y was linear in a Gaussian random effect U and took a proportional hazards model for S, conditional on U. Tsiatis and Davidian (2004) have given an excellent review of work in the area to that date, and Albert and Follmann (2009) and Diggle et al. (2009) have given additional references including later developments. More recent contributions include Geskus (2012), Gueorguieva et al. (2012), Proust-Lima et al. (2012) and Rizopoulos (2011Rizopoulos ( , 2012. The random-effects or shared parameter approach to jointly modelling repeated measurement and event time data is conceptually attractive in many settings, but its routine application is hampered by computational difficulties. Fast but approximate methods have been developed for some forms of joint model (see for example Rizopoulos (2012) and references therein) but implementation remains difficult unless the random-effects component is of low dimension. Wulfsohn and Tsiatis (1997), for instance, assumed that U was bivariate Gaussian and adopted a Laird and Ware (1982) random intercept and slope model for the longitudinal trajectory. Henderson et al. (2000) argued that when follow-up is relatively long then it is unreasonable to assume a sustained trend in the trajectory of Y and advocated inclusion of an unobserved stationary Gaussian process W.t/ in a linear predictor for Y to bring more flexibility. In principle this assumes the presence of an infinite dimensional random effect but, under either a discrete dropout or a semiparametric proportional hazards model for S, likelihood inference requires the value of W.t/ at only measurement times or event times. Hence W.t/ can be represented by a finite dimensional vector U. The dimension increases with sample size and there is no generally available software for this type of model.
In this paper we propose an approach which admits exact likelihood inference for a wide range of random-effect specifications. The key is to consider events S to occur only at a discrete set of potential times. In principle, the discretization can be made arbitrarily fine, but at the expense of increasing computational effort. Hence, the practical advantage of our approach relies on its remaining computationally feasible for a discretization that is sufficiently fine to capture the essential features of its continuous time limit. When the events correspond to dropout, their recorded times of occurrence are typically confined to a discrete set of scheduled measurement times. When the event is a survival time then our approach is a form of coarsening at random (Heitjan and Rubin, 1991). For example, in Section 6 we describe an example involving survival in a cystic fibrosis cohort, where we choose to measure survival by calendar year of death. With 10 years of follow-up and no evidence of significant local variation in hazard rates this gives adequate granularity. There is no need for the timescale to be discretized in the submodel for the repeated measurements Y under our approach, which therefore allows for irregular and subject-specific measurement timings.
The general model and some special cases are set out in Section 2. In Section 3 we derive the key likelihood-based methods on the basis of recent work on the extended skew normal family of distributions. Simulations to assess the performance of the method are described in Section 4. An examination of efficiency is presented in Section 5 and in Section 6 we describe application to the cystic fibrosis cohort study. Concluding discussion is presented in Section 7.
The programs that were used to analyse the data can be obtained from http://wileyonlinelibrary.com/journal/rss-datasets

Model
We take a shared parameter approach, with the common assumption that event times S and longitudinal data Y are independent conditionally on random effects U.

General formulation
We assume that subject i provides repeated measurements Y i = {Y ij : i = 1, 2, : : : , N; j = 1, : : : , n i } at follow-up times t ij , together with a time-to-event outcome S i , which terminates the observation of Y . Time varying covariates are allowed. We write x ij for covariates that are operative on Y ij , andx is for covariates that are operative at time s on the event process. For the remainder of this section we drop the subscript i and consider a generic subject. It will be implicit that the number and timing of measurements can vary between subjects. We assume two timescales: a discrete timescale for events, which without loss of generality we can label as S = {1, 2, : : : , m}, and a continuous timescale T = [0, τ ] for measurements. We also assume that there is a surjection s.t/ from T to S. For instance, S might be a partition of T . We consider S to represent a series of time intervals; for example if the timescale T is years then S might contain yearly or 6-monthly intervals. Alternatively, S might contain time intervals of different lengths, or it might represent intervals between scheduled measurement times. We denote by t Å .s/ the midpoint of the set of times in T that map to s ∈ S.

Random effects
We model the random effects as a p-vector U = .U 1 , : : : , U p / T , assuming a Gaussian distribution U ∼ N.0, Σ/ with a general covariance structure.

Event times
With the skew normal results in mind, we adopt a probit model for the discrete hazard function (equivalently the dropout model), which is very similar to the more widely used logistic model. For the most general model we define W.s/ = B.s/U to be ap-vector of linear combinations of the random effects U, where B.s/ is ap × p matrix which may depend on the time interval s. Then we assume that where Φ.·/ is the standard normal distribution function. In equation (1) the survival model is allowed to depend on time through covariates that are contained in a p 1 -vectorx s . In principle a non-linear function of time could be used, and often a separate intercept would be used for each time interval. In our examples we shall focus on models for which the probit probability of surviving a time interval is linearly dependent on time. The association between survival and the random effects can also vary with time, although all examples in this paper will assume that γ sk does not depend on s. Note that the model thus defined is a sequential probit model because the probability of surviving a time interval is conditional on having survived all previous time intervals. Albert and Chib (2001) have provided discussion on the application of sequential models in survival.

Repeated measurements
We consider a linear mixed effects Gaussian model for the sequence of repeated measurements, Y = {Y j : j = 1, : : : , n}, at measurement times t j . At time t j we assume Y j = x T j β + a T j U + Z j , j = 1, 2, : : : , n, .2/ where x j and a j are a p 2 -vector and a p-vector respectively, of possibly time-dependent covari-ates. The {Z j } are mutually independent measurement errors, taken as Z ∼ MVN.0, ν 2 I/ with I an identity matrix. We can write expression (2) in the vector form x 1 , x 2 , : : : , x n / T and A = .a 1 , a 2 , : : : , a n / T .

Random intercept
Our formulation includes the random-intercept model. Assume that U is a scalar random effect with N.0, σ 2 / distribution. Then set a j = 1 for all j and W s = U, with γ s = γ for all s ∈ S.

Random intercept and slope
Let .U 1 , U 2 / T be zero mean bivariate normal with variance matrix and set a j = .1, t j / T . For s ∈ S set W.s/ = U 1 + U 2 t Å .s/ and γ s = γ for all s ∈ S. Thus we can model a sustained trend in the survival hazard, albeit with a piecewise constant form. Given that we work with events in discrete time this is unavoidable. Alternatively we can adjust the model to allow the intercept and slope random effects to enter the event times model directly, setting W.s/ = .U 1 , U 2 / T and γ s = .γ 1 , γ 2 / T :

Stationary Gaussian process
For the stationary Gaussian process model we assume that U is an m-vector with variance matrix .Σ SGP / jk = σ 2 ρ |t Å .j/−t Å .k/| , so that the random effects correspond to a discretely observed stationary Gaussian process.
Here W.s/ = U s and γ s = γ for all s ∈ S. We assume that measurements Y j and Y k with s.t j / = s.t k / share the same U s . We define a j to be an m-vector with value 1 at element s.t j / and 0 elsewhere. If fluctuation at a very short timescale is thought to be materially important then a fine discretization S is needed to capture this behaviour. Note that the discretization does not need to scale with sample size. By extending W.s/ we can allow survival or dropout to depend on the prior history of random effects; for example for one time lag set W.s/ = .U s , U s−1 / T and γ s = .γ, γ lag / T .

Stationary Gaussian process plus random intercept and slope
We can combine a stationary Gaussian process with a random intercept and slope by defining U to be an .m + 2/-vector with variance matrix Σ IS Now W.s/ = .U s , U 1 , U 2 / and γ s = .γ, γ 1 , γ 2 / for all s > 1.

Inference
We now set out the general form of the likelihood for the class of models that was defined in Section 2.1. For simplicity we shall continue with the contribution of a single individual only.
We shall make use of results concerning the properties of the skew normal distribution (Azzalini, 1985(Azzalini, , 2005Arnold and Beaver, 2000;Arnold, 2009) to obtain a closed form for the likelihood. In particular, a multivariate Gaussian hidden truncation distribution that was considered by Arnold (2009) leads to the result where ω 1 , ω 2 and η are k 1 × 1, k 2 × 1 and k 2 × 1 vectors respectively, and Σ 11 , Σ 22 and Σ 12 are concordant variance and covariance matrices. Here φ .k/ .x; μ, Σ/ is the k-dimensional Gaussian density with mean vector μ and covariance matrix Σ and Φ .k/ .x; μ, Σ/ is the corresponding distribution function. By rearranging the likelihood so that it takes the form of the integrand on the right-hand side of equation (4), we can use this result to integrate out the random effects to arrive at a closed form expression.

Preliminaries
We begin by rearranging the longitudinal and random-effects components of the likelihood so that they take the form of the Gaussian density function under the integrand in equation (4). Assume that the dropout or survival time is s and introduce an indicator δ of an event being observed (δ = 1) or right censoring (δ = 0). The random-effects and longitudinal components of the likelihood are Note that f.y|u/f.u/ contains two terms under the exponential which are quadratic in u. We can complete the square in u so that u appears in a single quadratic term (see Appendix A for details). It will be convenient to collect all unknown parameters into vector θ and to write Note that φ .p/ .u; h, H −1 / depends on a subvector of the parameter vector θ through h and H. The expression for L 1 .θ; y/ is Turning to the event times, we have where w = B.s/u and Φ.·/ is the standard Gaussian cumulative distribution function. Now definẽ X v = .x 1 ,x 2 , : : : ,x v / T , and an s ×p matrix Γ such that Γ vk = γ vk . Then equation (6) can be written as where Γ s denotes the first s rows of Γ. Here Φ .s/ .·/ = Φ .s/ .·; 0, I/ is the standard s-dimensional multivariate Gaussian cumulative distribution function.

Likelihood
Combining expressions (5) and (7) and integrating out the random effects leads to the likelihood We can convert the right-hand side of expression (4) to the form of the integrals in expression (8) by defining the parameters ω 1 , ω 2 , η, Σ 11 , Σ 12 and Σ 22 appropriately (see Appendix A for details). Using expression (4) to obtain closed form expressions for the integrals then gives Numerical evaluation of equation (9) is now straightforward by using available software to calculate multivariate normal probabilities; we used the R package mnormt (Azzalini, 2013). Parameter estimation can be conducted by numerical maximization of the likelihood, and candidate models can be compared by exact likelihoods or information criteria. Having specified a fully parametric model, the maximum likelihood estimatorθ is asymptotically Gaussian, centred at the true value θ 0 , and with variance given by the inverse Fisher information. Standard regularity conditions apply, with some supplementation to ensure identifiability of parameters when data can be missing. Details are provided in Appendix B.

Simulation studies
We simulated repeated measurements with dropout data from the models specified by equations (1) and (2). Random effects were generated either as a stationary Gaussian process or as a random intercept and slope. Longitudinal measurements took place at n measurement times randomly distributed over m time intervals, so that individuals could have a varying number of visits in each time interval. A uniform distribution was chosen for the measurement times, motivated by our cystic fibrosis application. We assumed that dropout could occur during any time interval. For the stationary Gaussian process models we generated m-dimensional random effects U = .U 1 , : : : , U m / T and Gaussian repeated measurements where t j are the measurement times, and x T sβ + γU s /, s = 1, 2, : : : , m: For the random intercept and slope simulations we generated bivariate random effects U = .U 1 , U 2 / T and Gaussian repeated measurements For each simulation study 500 data sets were generated; each of 1000 individuals followed over m = 5 time intervals, with 1-10 repeated measurements per person. Survival parameters were chosen such that the death rate per time interval was 1-2%, which is similar to that observed in the cystic fibrosis application. Censoring took place at the end of the fifth time interval. Covariates that were included in both models were age, with initial values generated uniformly from the interval (15,30), and a binary covariate, which was either 0 or 1 with probability 1 2 . These covariates were similar to the observed covariate structure in the cystic fibrosis data. Table 1 summarizes the results of two studies with stationary Gaussian process random effects. For the first simulation (SGP I), the repeated measures and random effects were taken to have large standard deviations, similar to those observed for the cystic fibrosis patients. The simulation was repeated (SGP II) with smaller standard deviations. For both scenarios the means of the parameter estimates are close to the true values. For simulation SGP I the standard deviations of parameter estimates are relatively large for the longitudinal parameters owing to the high level of measurement error. For simulation SGP II the corresponding standard deviations are smaller, but the standard deviation of γ is larger. This is because the variance of the random effects is smaller, so we are effectively regressing on a covariate with a smaller range of values. Coverage probabilities for both simulation SGP I and simulation SGP II are good, with all empirical values within simulation noise of the nominal level and no consistent overestimation or underestimation of the coverage. Table 2 shows the results of two simulation studies with random intercepts and slopes. Again, for one study (IS I) data were generated with large measurement error, very high variance in the random intercept U 1 and high variance for the random slope U 2 , as observed for cystic fibrosis patients. For the second simulation study (IS II) the variance parameters were reduced. Means of parameter estimates are once more close to the true values. The standard deviations of longitudinal parameter estimates are large for simulation IS I and small for IS II, as expected, whereas the standard deviation of estimates of γ 1 is much higher for simulation IS II than for IS I. Interestingly, the standard deviation for γ 2 is approximately the same in both scenarios. Coverage probabilities are once more close to the nominal levels.
In a second simulation study data were generated from a Weibull model and analysed by using four methods. To reduce the computation time a simpler model was chosen with a random intercept only and using a single binary covariate. The Weibull parameters were based on a Weibull fit to cystic fibrosis survival, with intercept −4.2 and shape 1.2. Other aspects of the model that was used to generate data were the same as in the previous simulation study, except times were censored at t = 6 to enable us straightforwardly to carry out different discretizations of the timescale. Each simulated data set was analysed by using (a) a longitudinal model with a random intercept (so ignoring the survival data), (b) the R package joineR to fit a joint model with a shared random intercept and proportional hazards survival model, (c) the discrete time method that is described in this paper with six time intervals and (d) the discrete time method with three time intervals (note that our sequential probit model is misspecified for this scenario).   Table 3 shows the results for the longitudinal model and the joint model fitted by using joineR. Fitting the longitudinal model separately gives slightly biased estimates for the longitudinal slope parameter and binary covariate. The joint model fit using joineR, however, gives parameter estimates with less bias and with good coverage, as would be expected because the data were generated under the same model as used in fitting. Table 4 shows results by using the discrete time model with six and three time intervals. Here the model that was used to generate the survival data and the model used to fit the survival data are not equivalent, and so survival parameters estimated by the discrete time model cannot be compared with true values of the survival parameters. The coverage probabilities and mean-squared error could not therefore be calculated for these parameters. We can, however, see that the directions of the survival parameter estimates agree with those of the true model; positive parameter estimates in the discrete time model are in accordance with negative estimates in the true model because the former are linked to the probability of survival rather than the hazard of an event. Comparing longitudinal parameters, the standard errors are similar to the joineR results. Again, survival parameters are not directly comparable between the two models because the survival parameters of the six-interval model relate to the probability of surviving 1 year, and the parameters of the three-interval model to the probability of surviving 2 years. We would expect covariate effects to be similar, however, as is indeed the case. Comparing standard errors of survival parameter estimates, we find that the standard errors are slightly larger for the coarser time discretization.

Efficiency under coarsening at random
Although we allow the longitudinal measurements to be obtained in continuous or discrete time, we have required the event time data either originally to be measured on a discrete scale or to be placed on a discrete scale through coarsening at random via artificial interval censoring. We assume that our discrete time model is correct and so-because we use exact likelihood-our estimates are consistent and fully efficient given the data that we have chosen to use. However, discretization will affect the efficiency of an analysis, which is considered briefly in this section and expanded on in an on-line supplementary document. Here we concentrate only on survival analysis and omit much of the detail: this is provided together with further examples in the supplementary material. The supplementary material also includes a separate study into loss of information caused by discretization in the presence of random effects.
Let T be the continuous event time. Assume type 1 censoring at a maximum follow-up time τ . The follow-up interval .0, τ ] is partitioned into m disjoint intervals, with boundaries 0 = t 0 < t 1 < t 2 <: : :< t m = τ . Let S denote the interval within which T falls, with S = m + 1 if T is censored at τ . Define W = .T − t s−1 /=.t s − t s−1 /, which is the within-interval information on a (0,1) scale. Note that there is a one-to-one correspondence between T and .S, W/. We shall investigate the loss of efficiency that is caused by ignoring W . The sequential probit model (1) is assumed for event probabilities within each interval j, for j = 1, 2, : : : , m, with time constant covariates and covariate effects, but possibly time varying intercepts: P{S > j|S > .j − 1/,x} = Φ.β 0j +x Tβ /: We assume for simplicity that the conditional within-interval distribution of event times is the same for all intervals. Let the corresponding probability density function be h.w|x/, which will usually depend onβ (and perhaps other parameters). Information onβ from the withininterval distribution of event times provides the extra efficiency for the complete-data analysis. To illustrate, assume that where r = r.x Tβ / = exp.x Tβ / and 0 < ψ < 1. This is the within-interval distribution that arises if a Weibull distribution is discretized.
The information onβ that is brought by W depends on how strongly h.w|x/ depends onβ. Given this set-up, the information that is associated with the marginal likelihood based only on S can be derived and compared with that from the full likelihood given both S and W . Table 5 provides some numerical values for the ratio of asymptotic variances of maximum likelihood estimators. For this example we took a single binary covariate and set S.τ |x = 0/ = 0:7 and β = −1. Sinceβ is fixed, changing m changes S.t|x = 1/: The final row in Table 5 gives S.τ |x = 1/ for each m. Interceptsβ 0j were chosen by assuming equal failure probabilities within each interval atx = 0.
Some calibration based on curvature of discretized Weibull distributions is provided in the on-line supplementary material. The curvature in Table 5 at ψ > 0:7 is higher than anything seen for the Weibull discretizations that we considered. Even so, the efficiency was more than 90% in all simulation scenarios and was more than 97% in more realistic cases. We obtained similar results for otherβ, and when we fixed S.t|x = 1/ and allowedβ to vary with m.

Application: disease progression and survival in cystic fibrosis patients
We now apply our methods to data on repeated measurements of lung function in cystic fibrosis patients, some of whom died during follow-up. The data were from the UK Cystic Fibrosis registry and cover the years 1999-2010. Cystic fibrosis is the commonest serious inherited disease among Caucasian populations, and most patients die as a result of progressive respiratory disease (Davies and Alton, 2009). Previous examples of longitudinal modelling applied to cystic fibrosis data include Schluchter et al. (2002), van Diemen et al. (2011) and Taylor-Robinson et al. (2012. Here, we fit a range of joint models to data from the 1980-1984 birth cohort of the UK registry, conditional on survival to capture in the registry in 1999. The data set included 1231 patients who were alive in 1999, of whom 230 died during the course of follow-up to 2010. Our repeated measurement outcome is per cent predicted forced expiratory volume in 1 s, %FEV1, which is used as a measure of lung function and is recognized as a key outcome measure in cystic fibrosis (Rosenfeld et al., 2005;Davies and Alton, 2009;Orens, 2006). Measurements were taken approximately annually, with some variation between patients. The number of measurements per person varied from 1 to 17. The covariates used were sex and age at initial visit. The timescale was the number of years since the initial visit, with initial age fitted as a separate covariate to allow for left truncation and cohort effects. For the survival model the timescale was discretized into yearly intervals. At the initial visit 54.2% of patients were male, the mean age was 18.9 years (standard deviation SD 2.8) and the mean %FEV1 was 67.1 (SD 25.3). Table 6 shows the results from fitting four random-effects models to the data: a stationary Gaussian process SGP, a stationary Gaussian process with one time lag in the survival model, lagged SGP, a random intercept and slope model IS and a stationary Gaussian process plus random intercept and slope, SGP + IS. The positive effect of the sex covariate in the survival model indicates that males have a significantly better probability of survival than females. The estimated covariate effects are in agreement with expectations for cystic fibrosis patients, with older patients tending to have poorer lung function and survival than younger patients and males tending to have better lung function and to survive longer than females.
For the SGP model, the positive estimate of γ means that better lung function is associated with improved survival in cystic fibrosis patients. For the lagged SGP model there is no evidence 49522.25 †The random-effects models fitted were a stationary Gaussian process SGP, a stationary Gaussian process with one time lag in the survival model, lagged SGP, a random intercept and slope model IS and a stationary Gaussian process plus random intercept and slope, SGP + IS. For each model estimated parameter values are presented with standard errors in parentheses, and the Akaike information criterion AIC. that lung function during the previous time interval affects survival, after adjusting for current lung function. The estimated parameters γ 1 and γ 2 from model IS indicate that patients with higher intercepts and less negative slopes of %FEV1 are more likely to survive. Comparing the fit to the data of all four models by using the Akaike information criterion AIC suggests that the model providing the best fit is the model combining both a stationary Gaussian process and a random intercept and slope.
One way to facilitate interpretation of the parameters of the probit model is suggested in Table  7. Here we explore the effect of changing a parameter value on the probability of death in a time interval, while all other parameters are fixed to their default (i.e. mean or baseline) values. For example, the probability of death in year 1 for a woman with default covariate values is 0.003, compared with a probability of 0.001 for a man with the same characteristics.

Discussion
We have described a method that combines flexibility of model specification with tractability of likelihood. It can be applied to repeated measurement data with dropout occurring between scheduled measurement times, or to the joint analysis of longitudinal and survival time data, provided that the survival timescale is discrete, or can realistically be discretized. It avoids the need for numerical approximation of an integral over the random effects or EM methods.
The method allows the fitting of models with more complex random effects because the number of random-effects terms in the survival model is not constrained by computational time. This may come at the cost of discretizing the timescale, but simulation studies and analysis of special cases suggest that, although some information is inevitably lost, parameter estimation may not be greatly influenced by discretization. In the on-line supplementary material we show that there is no loss of information if the survival functions are linear between discrete time points. Hence a discretization that keeps approximate linearity is recommended. Our evidence shows that there can be little loss of efficiency even in the presence of quite strong non-linearity. In practice there may often be a natural discrete timescale. For example the cystic fibrosis patients had visits around once a year, and an annual discretization seems suitable because shorter-term fluctuations in the underlying continuous time hazard will be poorly identified.
Computational time is driven primarily by the need to calculate multivariate normal probabilities, which can be time consuming for high dimensional data. But the intercept and slope model fitted in around a third of the time required by the R package joineR using 200 bootstraps to calculate standard errors. Alternative approaches to maximizing the likelihood may enable computation time to be further reduced, e.g. by iterating between a Newton-Raphson step for covariate parameters and numerical maximization over variance parameters. Alternatively a prespecified number of steps of an EM algorithm could be used to obtain initial values, as in the R package JM (Rizopoulos, 2010). Even with our current procedure we could fit fairly high dimensional random-effects models in simulations and the cystic fibrosis application (i.e. model SGP, which has a random effect associated with each time interval), whereas current software (R packages JM and joineR) is limited to simple random-effects models or relies on rough approximations of multiple integrals over random effects. Inclusion of stationary Gaussian process random effects led to a marked improvement in AIC for the cystic fibrosis data (Table 6).
Without readily available software it is unlikely that any reasonably sophisticated statistical methodology will find wide use in applications. Our intention is to develop R software to implement the technique and so to supplement the joint models that can be fitted in JM and joineR. We have provided R code to calculate the likelihood as supplementary material available from http://wileyonlinelibrary.com/journal/rss-datasets

A.1. Evaluation of L 1
The longitudinal and random-effects components of the likelihood together are This can be rearranged as In turn .12/ Consider the integral in likelihood (12) at δ = 0, i.e. L 2 .θ; y, s, δ = 0/ = Φ .s/ {X sβ + Γ s B.s/u}φ .p/ .u; h, H −1 /du: .13/ To perform the integration we use the skew normal result (4), which is .14/ We can convert the right-hand side of equation (14) to the form (13) by sequentially setting The contribution of the integral L 2 .θ; y, s, δ = 1/ in expression (12) at δ = 1 can be obtained similarly, to give

Appendix B: Regularity conditions
Collect all unknown parameters into a vector θ, with true value θ 0 in the interior of a compact set Ω. Let the observed data on individual i be z i , consisting of longitudinal responses, event times, censoring indicator and covariates. Write f.z i |θ/ for the likelihood contribution of subject i, for i = 1, 2, : : : , n. From Cramér's theorem (Ferguson, 1996;Kotz and Johnson, 1985) the following conditions are sufficient for the maximum likelihood estimator θ to converge in distribution to a Gaussian variable, centred at θ 0 and with variance given by the inverse expected information.

Condition 2.
There is an open subset ω of Θ such that for all θ ∈ ω and almost all z the second partial derivatives @ 2 f.z|θ/=@θ @θ T exist and may be passed under the integral sign in f.z|θ/ dν.z/, where ν.z/ is the measure associated with Z. Condition 4. There are functions K.z/, independent of θ, such that each component of @ 2 f.z|θ/=@θ @θ T is bounded by K.z/ uniformly for θ ∈ ω.
Condition 6. The support of f.z|θ/ does not change with θ.
Conditions 1-4 are standard given our fully parametric Gaussian random-effects model, provided that we assume (as we do) that the longitudinal measurement schedule {t j }, the number of discrete intervals m and the selection of covariates are all fixed and ancillary. Further, all variance parameters are finite and strictly positive.
Conditions 5 and 6 need more attention in the presence of missing data. We need to ensure that as the sample size increases there are sufficient observations to identify the survival probability for each discrete time interval, and all parameters in the longitudinal model. Hence we make the following additional assumptions.
Assumption 4. There exist c 1 > 0 and c 2 > 0, independent of θ, such that for each s ∈ S and allβ in the appropriate subset of ω.
Assumption 5. For all s ∈ S and all U, the second-partial-derivative matrix is of full rank, where θ S ∈ ω denotes the combined parameters .β, γ/ appearing in the event time model (7).
Recalling that the parameter vector γ links the longitudinal and event processes, assumption 1 ensures that the longitudinal model is well defined. The independent censoring assumption referred to in assumption 2 is essentially a requirement that there is no prognostic information in knowing that an event time is censored. Assumption 3 ensures that all event times in S are observable and assumption 4 makes sure that in large samples there are both events and survivors for each interval. Finally assumption 5 ensures identifiability ofβ and γ.
In assumptions 4 and 5 we have assumed a structural model in which the covariates are considered to be independent and identically distributed random variables. In the functional case we need alternatives of the form Φ.X T isβ / < 1 − c 2 and 1 n n i=1 E U @ 2 Φ.X T isβ + Γ s B s U/ @θ S @θ T S is of full rank, in each case for n > n 0 say.