Longitudinal analysis of the strengths and difficulties questionnaire scores of the Millennium Cohort Study children in England using M‐quantile random‐effects regression

Summary Multilevel modelling is a popular approach for longitudinal data analysis. Statistical models conventionally target a parameter at the centre of a distribution. However, when the distribution of the data is asymmetric, modelling other location parameters, e.g. percentiles, may be more informative. We present a new approach, M‐quantile random‐effects regression, for modelling multilevel data. The proposed method is used for modelling location parameters of the distribution of the strengths and difficulties questionnaire scores of children in England who participate in the Millennium Cohort Study. Quantile mixed models are also considered. The analyses offer insights to child psychologists about the differential effects of risk factors on children's outcomes.


Introduction
Early exposure to family poverty, stressful life events and neighbourhood disadvantage constitute risks for children's emotional and behavioural adjustment (Bradley and Corwyn, 2002;Flouri et al., 2010;Goodnight et al., 2012). The cumulative risk literature indicates that, as risk factors accumulate, children's emotional (internalizing) and behavioural (externalizing) problems increase (Trentacosta et al., 2008). Internalizing behaviours are typified by inward symptoms such as being withdrawn, fearful or anxious. Externalizing behaviours are outward and may be described as aggressive, non-compliant, impulsive or fidgety. One widely used measure of children's emotional and behavioural problems in psychological research is the strengths and difficulties questionnaire (SDQ) (Goodman, 1997). The SDQ score is the sum of the main caregiver's responses to a series of items that describe children's internalizing and externalizing problems. The 25-item SDQ comprises five domains measured with five items each, namely emotional symptoms (e.g. 'many fears; easily scared'), peer problems (e.g. 'gets on better with adults than with other children'), conduct problems (e.g. 'often lies or cheats'), hyperactivity (e.g. 'restless; overactive; cannot stay still for long') and prosocial behaviour (e.g. 'shares readily with other children'). For each item, 0 is given if the response is not true, 1 if somewhat true and 2 if certainly true. The internalizing SDQ score is the sum of responses to the five emotional symptoms items and the five peer problems items. Therefore, the range of internalizing scores is 0-20. The externalizing score is the sum of responses to the five conduct problems items and the five hyperactivity items (also ranging from 0 to 20). The SDQ is a valid and reliable measure of children's emotional, social and behavioural difficulties. More information can be found at www.sdqinfo.com. Recent literature (Flouri et al., 2014a;Midouhas et al., 2014) has systematically examined the effects of neighbourhood and family risk factors on the Millennium Cohort Study (MCS) children's trajectories of SDQ scores. Owing to the longitudinal structure of the cohort data, these studies make extensive use of multilevel models, which are also referred to as random-effects or mixed models (Steele, 2008).
Conventionally, random-effects models target the expected value of the conditional distribution of the outcome given a set of covariates. When the distribution of the outcome is asymmetric, modelling other location parameters, e.g. percentiles of the conditional distribution, may offer a more complete picture compared with a model that describes only the centre of a distribution. The distribution of SDQ outcomes is typically asymmetric. This is illustrated in Fig. 1. To the best of our knowledge, the above-mentioned studies that analyse the SDQ scores fail to recognize the asymmetric shape of the SDQ distributions. Modelling the conditional mean may therefore not offer the best summary. An alternative measure of the centre of a distribution, such as the median, may be more appropriate in this case. To illustrate, it is possible that the effect of certain risk factors on the SDQ scores is not the same across the distribution of SDQ scores. For example, maternal depression consistently has a strong association with mean child adjustment (Kiernan and Huerta, 2008) but may have a more pronounced effect at the top end where children display a high, perhaps abnormal, level of adjustment problems than at the bottom end of the distribution. Quantifying the effects of risks factors in this way can offer useful insights to child psychologists. The idea of modelling location parameters has a long history in statistics. The seminal paper by Koenker and Bassett (1978) is usually regarded as the first detailed development of quantile regression, which is a generalization of median regression. Extensions of quantile regression for modelling multilevel-type data have been considered by several researchers including Koenker (2004) and Bottai (2007, 2014). Koenker (2004) proposed the use of penalized quantile regression for longitudinal data, where the penalization is aimed at the shrinkage of individual effects. Bottai (2007, 2014) proposed a linear quantile mixed model (LQMM) estimated by using maximum likelihood and the link between quantile regression and the asymmetric Laplace distribution. The distribution of the random effects is assumed to be either Gaussian or Laplace, with the latter offering robustness properties. The use of the asymmetric Laplace distribution by Bottai (2007, 2014) is mainly for convenience as it provides a parametric link between maximum likelihood estimation and minimizing the sum of absolute deviations. Inference for the model parameters is performed by using a bootstrap based on resampling the sample data. Estimation and inference are facilitated by the lqmm function in R (R Development Core Team, 2010).
There are, however, alternatives to quantile regression, such as M-quantile (MQ) regression (Breckling and Chambers, 1988;Chambers and Tzavidis, 2006), which is a quantile-like generalization of regression based on influence functions (M-regression), and expectile regression (Newey and Powell, 1987), which is a quantile-like generalization of mean regression. Currently, the available MQ regression models assume independent observations and hence do not allow for the analysis of multilevel-type data. Although approaches to M-estimation in random-effects models have been proposed by a series of researchers (Huggins, 1993;Richardson and Welsh, 1995), the focus of their work is on modelling a location parameter at the centre of the conditional distribution rather than the entire conditional distribution. This paper proposes an extension of MQ regression to M-quantile random-effects (MQRE) regression.
In particular, from a methodological point of view the present paper proposes a novel approach for modelling location parameters by using MQRE regression. The methodology proposed can be viewed as an alternative to the LQMM that was proposed by Geraci and Bottai (2014), although the two models target different population parameters. Furthermore, the LQMM allows for the specification of both random intercepts and random coefficients. In contrast, the MQRE approach that we propose in this paper can currently accommodate only random intercepts. From an applied point of view, the present paper extends recent studies on the effect of risk factors on the SDQ scores of children in England by using MCS data. The paper further presents a range of existing and newly proposed modelling tools which are available to the prospective data analyst for modelling a general set of location parameters of a distribution via quantile and MQRE regression.
Why consider MQs when one key advantage of quantile regression is the more intuitive interpretations? Although not the same, both quantile and MQ models target essentially the same part of the distribution of interest. As we shall see in this paper, one of the main advantages of M-estimation is that it easily allows for robust estimation of both fixed and random effects. Furthermore, because MQs are based on the use of influence functions, we can trade robustness for efficiency in inference by selecting the tuning constant of the influence function. Finally, the use of a range of continuous influence function, instead of only an absolute value of 1 as in quantile regression, can potentially offer computational stability.
The structure of the paper is as follows. In Section 2 we describe the data. Section 3 reviews random-effects regression and focuses specifically on robust estimation of model parameters.
In Section 4 we review MQ regression, present the proposed MQRE regression and discuss estimation and inference. In Section 5 we present the results from the application of MQRE and quantile random-effects regression (Geraci and Bottai, 2014) to the SDQ scores of the MCS children in England. In Section 6 we empirically evaluate the properties of MQRE regression by using Monte Carlo simulation studies under a range of data-generating mechanisms. Finally, in Section 7 we conclude the paper with some final remarks.

The data
The MCS (www.cls.ioe.ac.uk/mcs) is a longitudinal survey drawing its sample from all births in the UK over a year, from September 1st, 2000. The MCS was designed to overrepresent areas with high proportions of ethnic minorities in England, areas of high child poverty and the three smaller UK countries. Sweep 1 took place when the children were around 9 months old. Sweeps 2, 3 and 4 took place around ages 3, 5 and 7 years. The MCS provides a unique source of longitudinal measurements of SDQ at sweeps 2, 3 and 4. Longitudinal data on the SDQ are now available, providing a unique opportunity for studying the change in SDQ scores over time and how this is affected by risk factors and other family and child characteristics.
The data that we use in this paper are collected from children who participated in the first four sweeps of the MCS in England. Our study sample consists of 5000 MCS children in England, leading to a total sample size for the longitudinal data set equal to n = 11 972 observations. The two outcomes of interest, emotional problems (measured by the SDQ internalizing score) and behavioural problems (measured by the SDQ externalizing score), were collected at ages 3, 5 and 7 years. The data consist of 3837 measurements at the first time point, 4314 at the second and 3821 at the third. Missing measurements are due to unit or item SDQ non-response in a given time point, and previous MCS research demonstrates that less favourable family socioeconomic characteristics (e.g. lower parental qualifications) and ethnic minority backgrounds predict such non-response (Flouri et al., 2014a, b). This is an additional reason for controlling for the effect of these covariates in the models that we present in Section 5 of the paper.
The key time varying explanatory variables are as follows. Adverse life events, ALE 11, were measured as the number (out of 11 events) of potentially stressful life events experienced by the family between two consecutive sweeps. The events, which were derived from available MCS data and based on the adverse life events scale of Tiet et al. (1998), were family member died, negative change in financial situation, new step-parent, sibling left home, child got seriously sick or injured, divorce or separation, family moved, parent lost job, new natural sibling, new stepsibling and mother diagnosed with or treated for depression. Family poverty, measured by socio-economic disadvantage, SED 4, combines information on overcrowding (more than 1.5 people per room excluding the bathroom and kitchen), not owning a home, receipt of meanstested income support and income poverty (below the poverty line defined as 60% of the UK national median household income). Maternal depression, kessm, is measured by the Kessler score. An additional time varying variable is neighbourhood deprivation measured by the index of multiple-deprivation score, imdscore. Furthermore, child's age (in years, centred near the mean age-age year scal) and the quadratic effect of child's age (age2 year scal) were included in the model. The time constant variables that we considered include maternal education (no qualification (baseline), university degree or General Certificate of Secondary Education, GCSE), ethnicity (non-white (baseline) or white) and gender (female (baseline) or male). Finally, a design variable which allows for the stratification of the MCS sampling design was included in the model. The stratification variable of the MCS consists of three categories, namely the advan- taged stratum (baseline category), the ethnic stratum, Eng eth stratum, and the disadvantaged stratum, Eng dis stratum. Table 1 presents summary statistics for the two outcomes, SDQ internalizing and externalizing scores, and for some key continuous covariates. The asymmetry in the SDQ outcomes is noted by examining the mean-median relationship. The average of adverse life events, ALE 11, is 1.447 and the maximum is 7. The mean Kessler score, kessm, is 2.767 but there are cases with much higher scores of maternal depression with the maximum value equal to 24. 38% of children have mothers who hold a degree, 49% have mothers with General Certificate of Secondary Education or other qualification and 13% have mothers with no educational qualification. 51% of the children are males and 83% are of white background.
The descriptive measures offer information about the unconditional distribution of the two SDQ outcomes. It is more appropriate, however, to study the conditional distribution of SDQ scores given a set of covariates. To do so we use a two-level (level 1, measurement occasion; level 2, MCS child) random-intercepts model for the two SDQ outcomes with random effects specified at the level of the MCS child. The model further adjusts for the effect of adverse life events, maternal depression, maternal education, socio-economic disadvantage, neighbourhood deprivation, gender and ethnicity and accounts for the longitudinal structure of the data. Figs 2 and 3 present normal probability plots of level 1 and level 2 residuals. These indicate severe departures from the Gaussian assumptions of the random-intercepts model for both SDQ outcomes. Hence, estimating a robust measure of central tendency and the quantiles of the conditional distribution of SDQ scores, given the covariates, is worth pursuing.

Multilevel models and robust estimation
Suppose that we have data on an outcome variable y and a set of covariates x for n individuals clustered within d groups. This can be either a longitudinal or a multilevel data set. A popular approach for modelling hierarchically structured data is to use a random-effects model. In the simplest case we can define a random-intercepts model : : : , n j , j = 1, : : : , d, .1/ where x ij is a p-vector of X, β is a p × 1 vector of regression coefficients and z ij is a d × 1 vector of group indicators used for defining the random part of the model. In addition, γ denotes a d × 1 vector of group-specific random effects, ij is the individual random effect and we assume that γ ∼ N.0, σ 2 γ I d / and ij ∼ N.0, σ 2 /. Here and throughout the paper I k is an identity matrix of size k. One way of obtaining estimates of the fixed effects and the variance components of model (1) is to employ maximum likelihood estimation. Assuming normality for the error components where y is the n × 1 response vector, V = Σ + ZΣ γ Z T , Σ = σ 2 I n , Σ γ = σ 2 γ I d and Z is an n × d matrix of known positive constants. Estimates of β, γ, σ 2 γ and σ 2 are obtained by differentiating the above log-likelihood with respect to these parameters and then solving the estimating equations that are defined by setting these derivatives equal to 0. Predicted random effects are then obtained by using the maximum likelihood estimates of the fixed effects and the variance components. It is easy to see that in equation (2) we assume a squared loss function. In practice, however, data may contain outliers that invalidate the Gaussian assumptions. In such a case the estimated model parameters under equation (2) will be biased and inefficient (Richardson and Welsh, 1995). One approach to robust estimation of the random-effects model that protects us against departures from normality is to use an alternative loss function in the log-likelihood that grows along with the regression residuals at a slower rate than the squared loss function. This is the approach that was followed by Huggins (1993) and Richardson and Welsh (1995). In particular, robust maximum likelihood estimation for the random-effects model is performed by maximizing the modified log-likelihood function Robust estimates of β, σ 2 γ and σ 2 are obtained by solving the estimating equations defined by setting the derivatives of the modified log-likelihood with respect to the parameters equal to 0. This is the robust maximum likelihood proposal I by Richardson and Welsh (1995). To ensure robustness, ψ.r/ and r T ψ.r/ must be bounded. A bounded ρ-function leading to a redescending ψ fulfils these conditions. An alternative to the use of a redescending ψ-function is to solve the following estimating equations for σ 2 γ and σ 2 : Richardson and Welsh (1995) called this robust maximum likelihood proposal II. It can be viewed as a generalization of Huber's proposal II (Huber, 1981). However, there is no likelihood function that has expression (4) as its derivative with respect to the variance components. For details see Richardson and Welsh (1995). Robust predicted random effects can be obtained by solving for γ in the estimating equation (Fellner, 1986) An alternative robust estimation approach that can potentially lead to more efficient estimates of the variance components when there is a small number of groups or when groups contain a small number of observations is the robust restricted maximum likelihood approach (Richardson and Welsh, 1995;Staudenmayer et al., 2009).

M -quantile regression and extensions to M -quantile random-effects regression
In this paper we are interested in describing the relationship between y and x not only at the centre of the conditional distribution of y given x but also at other parts of this distribution.
In this section we start by reviewing M-quantile regression and subsequently extend this to M-quantile random-effects regression.

M-quantile regression
The classic regression model summarizes the behaviour of the mean of a random variable y at each point in a set of covariates x. This provides a rather incomplete picture, in much the same way as the mean may offer an incomplete picture of a distribution. Instead, quantile regression summarizes the behaviour at different parts (e.g. quantiles) of the distribution of y at each point in the set of the xs.
In the linear case, quantile regression leads to a family of hyperplanes indexed by a real number q ∈ .0, 1/. For a given value of q, the corresponding model shows how the qth quantile of the conditional distribution of y varies with x. For example, if q = 0:5 the quantile regression hyperplane shows how the median of the conditional distribution changes with x. Similarly, for q = 0:1 the quantile regression hyperplane separates the lower 10% of the conditional distribution from the remaining 90%.
As we mentioned in Section 1, quantile regression can be viewed as a generalization of median regression. In the same way, expectile regression (Newey and Powell, 1987) is a 'quantile-like' generalization of mean regression. MQ regression (Breckling and Chambers, 1988) integrates these concepts within a framework defined by a quantile-like generalization of regression based on influence functions (M-regression). The MQ of order q of the conditional density of y given the set of covariates x, f.y|x/, is defined as the solution MQ y .q|x; ψ/ of an estimating equation ψ q {y − MQ y .q|x; ψ/} f.y|x/ dy = 0, where ψ q denotes an asymmetric influence function, which is the derivative of an asymmetric loss function ρ q . In particular, suppose that .x T i , y i /, i = 1, : : : , n, indexes the units of a random sample consisting of n observations from the target population, x T i are row p-vectors of a known design matrix X and y i is a scalar response variable corresponding to a realization of a continuous random variable with unknown continuous cumulative distribution function F ; a linear MQ regression model for y i given x i is one where we assume that i.e. we allow a different set of p regression parameters for each value of q ∈ .0, 1/. Estimates of β ψq are obtained by minimizing .6/ Different regression models can be defined as special cases of expression (6). In particular, by varying the specifications of the asymmetric loss function ρ q we obtain the expectile, MQ and quantile regression models as special cases. When ρ q is the squared loss function we obtain the linear expectile regression model if q = 0:5 (Newey and Powell, 1987) and the standard linear regression model if q = 0:5. When ρ q is the loss function that was described by Koenker and Bassett (1978) we obtain linear quantile regression. Throughout this paper we shall take the linear MQ regression model to be defined by equation (5) when ρ q is the Huber loss function (Breckling and Chambers, 1988): where c is the tuning constant, which is bounded away from zero. Setting the first derivative of expression (6) equal to 0 leads to the estimating equations s −1 r iq /{q I.r iq > 0/ + .1 − q/ I.r iq 0/} and ψ.u/ = uI.−c u c/ + c sgn.u/ I.|u| > c/ and s > 0 is a suitable estimate of scale. For example, in the case of robust regression, s = median|r iq |=0:6745. Provided that the tuning constant c is strictly greater than 0, estimates of β ψq are obtained by using iterative weighted least squares. The estimation algorithm is implemented in R by a simple modification of the iterative weighted least squares algorithm used for fitting standard M-regression by using the function rlm (Venables and Ripley (2002), section 8.3). Note that this guarantees a unique solution (Kokic et al., 1997) when a continuous monotone influence function (e.g. Huber proposal 2 with c > 0) is used. The tuning constant c can be used to trade robustness for efficiency in the MQ regression fit, with increasing robustness and decreasing efficiency as we move towards quantile regression (c chosen to be positive and close to 0) and decreasing robustness and increasing efficiency as we move towards expectile regression (c chosen to be large and positive). Since we cannot set c to 0, it is not possible to use iterative weighted least squares for quantile regression. However, by setting c to be very large and positive we define an expectile regression model. The flexibility of MQ regression is of particular importance for the present paper as this will also allow us to define an expectile random-effects regression model.

M-quantile random-effects regression
In this section we assume that the data have group structure (multilevel or longitudinal) as in Section 3, which is of substantive interest. We extend the linear specification (5) to allow for the inclusion of random effects when modelling the MQs of the target distribution. This extension can be useful when analysing multilevel-type data in which case the random effects aim at capturing unobserved heterogeneity. In the simplest case one can include a group-specific random intercept in the linear specification for the MQs (5) as follows: where γ is a d × 1 vector of group random effects. For fitting equation (8) we propose the use of estimating equations based on asymmetric loss functions. This novel approach for modelling location parameters is what we call MQRE regression. Unlike Bottai (2007, 2014) who utilized the link between maximum likelihood estimation under the asymmetric Laplace distribution and quantile regression, we remain within the M-estimation framework. Since the estimating equations that are obtained from the modified log-likelihood function (3) are susceptible to multiple solutions, we start from the robust maximum likelihood proposal II and following Sinha and Rao (2009) we note that one can extend the idea of asymmetric weighting of residuals by defining the following modified estimating equations for estimating the regression coefficients and the variance parameters: .y − Xβ ψq / is a vector of scaled residuals with components r ijq , U q is a diagonal matrix with diagonal elements u ijq equal to the diagonal elements of the covariance matrix V q and β ψq is the p × 1 vector of MQ regression coefficients. Here V q = Σ q + ZΣ γ q Z T , Σ γ q = σ 2 γ q I d , Σ q = σ 2 q I n and σ 2 q and σ 2 γ q are the MQ-specific variance parameters. Finally, the component For solving equations (9) and (10) we adopt a Newton-Raphson algorithm and the fixed point iterative method (Anderson, 1973). In particular, the fixed effects are estimated by using a Newton-Raphson algorithm whereas the variance parameters are estimated by using a fixed point algorithm. Using the Newton-Raphson optimization method for estimating the variance parameters can cause convergence problems. It is therefore preferable to use a technique that is derivative free such as the fixed point iterative method. The steps of the estimation algorithm are described in Appendix A.
Although prediction of the random-effects vector γ in equation (8) is not the primary focus in this paper, we outline a possible solution. In particular, the simplest solution is to predict the random effects by using a modified Fellner equation (Fellner, 1986) at each value of q. The issue of predicting the random effects is also discussed in Geraci and Bottai (2014) and our proposed solution is similar to the solution that they proposed. However, the approach that we use for fitting equation (8) separately at each q makes γ depend on q since these are functions of MQ-specific parameters (see equation (12) in Appendix A). This raises the question of how one then combines these different q-specific estimates of γ since they will clearly be correlated over q. An alternative approach that avoids this issue would be to modify the q-specific fitting process that was described above to allow for a common γ across q as in equation (8) or by imposing a suitable group-specific ordering over these q-specific predicted values. Both approaches can be potentially achieved by adding suitable constraints to this fitting procedure. Exploring alternative approaches to the prediction of the random-effects vector γ in equation (8) remains an open problem that we are currently investigating.
The estimating equations under expression (2) can be obtained as a special case of equations (9) and (10) for specific choices of ρ q and q. In particular, when q = 0:5 and ρ q is the squared loss function we obtain the estimating equations under expression (2). When q = 0:5 and we use a loss function other than the squared loss, e.g. the Huber loss function, we obtain the estimating equations of the robust maximum likelihood proposal II. For q-values other than 0.5 and for different choices of ρ, solving equations (9) and (10) will provide estimates of fixed effects β ψq and variance parameters, σ 2 q and σ 2 γ q , respectively. More specifically, using a squared loss function in equations (9) and (10) at q = 0:5 results in the expectile random-effects regression whereas using the Huber loss function in equations (9) and (10) results in MQRE regression.
Inference for the parameters of the linear random-effects model when the Gaussian assumptions hold has been studied by Hartley and Rao (1967) and Miller (1977). Huber (1967) showed the consistency and asymptotic normality of 'maximum-likelihood-type' estimators (M-estimators). The work of Huber (1967) is specifically linked to robust estimation problems and his arguments were used by Welsh and Richardson (1997) to propose robust inference for the parameters of the linear random-effects model. Inference for the parameters of the MQ and expectile random-effects regression is based on a Taylor series approximation. Details are given in Appendix B. An alternative approach for inference is by using the bootstrap. A recent example of the use of the parametric bootstrap in the case of robust estimation of the parameters of a random-effects model has been given by Sinha and Rao (2009). A significant drawback to using the bootstrap in the case of MQRE regression is the computation time that is required for performing a large number of bootstrap replications. We have performed some limited empirical assessment of the parametric bootstrap procedure and the results are consistent with those obtained from the analytic approximation based on the Taylor series expansion.

Analysis of the Millennium Cohort Study data
In this section we present longitudinal modelling of the SDQ internalizing and externalizing scores for our sample of MCS children. In light of the substantive literature that was described in Section 1, we are mainly interested in the effect of family and neighbourhood risk factors on young children's emotional and behavioural problems. We modelled the effects of the following time varying and time constant variables on SDQ scores at ages 3, 5 and 7 years: adverse life events, socio-economic disadvantage, maternal depression, maternal education, ethnicity, gender, neighbourhood deprivation, child's age and the quadratic effect of child's age. Details about these variables are provided in Section 2. We further control for the effects of stratification by including the stratification variable in the models.
For longitudinal modelling of location parameters of the distributions of the internalizing and externalizing scores we used the following models: (a) the proposed MQRE model (Section 4) with random intercepts specified at the level of the MCS member, (b) the LQMM (Geraci and Bottai, 2014) with random intercepts specified at the level of the MCS member, (c) the LQMM (Geraci and Bottai, 2014) with random intercepts specified at the level of the MCS member and random slopes (coefficients) specified for age and (d) the linear random effects (LRE) model (1) for the mean (produced by using the lme function in R).
In addition to the MQRE model proposed, the reason for using the LQMM (Geraci and Bottai, 2014) in the application is to inform the prospective data analyst about the range of modelling tools that are available. The lqmm function in R, that estimates the LQMM (Geraci and Bottai, 2014), allows for the specification of both random intercepts and random coefficients. In contrast, MQRE regression allows only for random intercepts. Random intercepts imply a uniform (exchangeable) correlation structure whereas random slopes allow the correlation structure to depend on age, which may offer a more realistic structure for repeated measures data. Although possible, allowing for random slopes in quantile random-effects models is complex and can potentially result in convergence problems when fitting the model. In contrast, quantile models with a randomintercepts specification have a correlation structure that is simpler to estimate while allowing for modelling the entire conditional distribution of the outcome. The MQRE and LQMM results are not directly comparable as these models are targeting different location parameters. However, both models attempt to model location parameters that are associated with the same part of the conditional distribution of SDQ scores. Table 2 presents the results of the MQRE random-intercepts model. The parameters of the MQRE model are estimated by using the algorithm in Appendix A (see also Section 2). The intercepts are estimates of location parameters (MQs in the case of MQRE regression) of the SDQ externalizing score of a child at age 5 years (age is centred) when setting the categorical covariates to the corresponding baseline values and the continuous variables to 0. The estimated regression coefficients are consistent with what theory predicts. Males appear to have higher SDQ scores compared with females. Increasing adverse life events, socio-economic disadvantage, maternal depression, neighbourhood deprivation and lower maternal education are associated with higher SDQ scores. In addition, after controlling for neighbourhood and family characteristics, we do not find an ethnicity effect. The grey area in Fig. 4 displays 95% confidence intervals of the MQRE parameters for various MQs for some selected risk factors (maternal depression, socio-economic disadvantage and higher versus no maternal educational qualifications). The MQRE standard errors are computed by using the methodology in Section 4.2 and in Appendix B. The bold dotted curve presents the corresponding parameter estimates that were obtained under the LRE model. This model is targeting the conditional expectation of the externalizing score given the explanatory variables. The two dotted curves around the bold dotted curve present the upper and lower bounds of 95% confidence intervals. The most interesting aspect of this analysis is that it allows for the estimation of the effect of covariates at different parts of the distribution of SDQ scores. As expected, increasing values for the risk factors such as SED 4, maternal depression and lower maternal education are associated with increased SDQ scores. The effect of these covariates appears to be more pronounced when looking at the upper tail compared with the lower tail of the distribution. For example, the disparity in the externalizing scores of children with mothers who have higher educational qualifications, compared with children with mothers who have no educational qualifications, is smaller at the lower part of the distribution compared with the upper part of the distribution. This may suggest that the protective role of higher maternal education is more pronounced for children with more externalizing problems. Maternal depression also appears to have a stronger effect at the top end, compared with the lower end, of the distribution. We revisit these results in the next section and we draw some comparisons between the externalizing scores results and the results for the internalizing scores by using relevant substantive literature.

Results for externalizing scores
The results for the LQMM are shown in Table 3 for the random-intercepts model and in Table 4 for the random-slopes model. Standard errors in this case are computed by using the bootstrap estimator that was proposed by Geraci and Bottai (2014). To start we note the differences between the parameter estimates of the median and mean models. The estimated level 2 variance component from the median model is smaller than the corresponding estimate from the mean model (see σ 2 γ q in Table 3). The MQRE level 2 variance component lies between the LQMM and the LRE components (see σ 2 γ q in Table 2). These results are as expected and the differences may be explained by the diagnostics that we presented in Section 2. As expected, in most cases the standard errors of the robust estimates are also somewhat higher than the corresponding standard error estimates from the LRE model.
The estimated intercepts from the LQMM random-intercepts model (Table 3) present estimates of the corresponding quantiles of the SDQ externalizing scores distribution for a baseline MCS child. The asymmetry in this distribution is reflected by the mean-median relationship, which is consistent with the findings in Section 2. The results of the LQMM random-intercepts model are comparable with the results obtained with MQRE regression. As we discussed earlier, the LQMM and MQRE regression target different population location parameters. The LQMM results are easier to interpret. The MQRE model, in contrast, targets the same part of the conditional distribution as the LQMM but the interpretation of the results is less intuitive. From the perspective of the data analyst, however, both models can be used to look at the entire distribution of the outcome accounting for the longitudinal structure of the data. Currently, one methodological advantage of the LQMM approach is that it allows for the specification of more complex correlation structures (e.g. random slopes). Table 4 presents the results from the LQMM that, in addition to a random intercept, includes a random slope on age. However, when fitting the random-slopes model we experienced slow convergence for quantiles at the tails of the distribution whereas for the median fit the estimated variance component of the random slope is very close to 0. These results suggest that a random-intercepts specification may present a more feasible approach that allows at the same time for modelling different parts of the distribution of externalizing scores. Table 5 presents the results of the random-intercepts MQRE. As with the externalizing problems, the estimated regression coefficients for internalizing scores are consistent with child development theory. The grey area in Fig. 5 displays 95% confidence intervals of the MQRE parameters for various quantiles and for selected key risk factors. After controlling for family and area characteristics, socio-economic disadvantage is significantly associated with internalizing scores only at q = 0:5 and q = 0:75. This is in contrast with the more pronounced effect of socio-economic disadvantage, across the distribution, that we found for externalizing scores. This is in line with findings from a large number of studies showing that, in general, poverty and material deprivation are less consistently and less strongly associated with children's internalizing problems compared with externalizing problems (Costello et al., 2010;Dearing et al., 2006). This differential effect of poverty is not surprising considering that, in general, genetic influences are modest and environmental influences are large for antisocial behaviour relative to other childhood disorders (Plomin et al., 1997). Externalizing problems may therefore be more malleable than internalizing problems in response to environmental changes.

Results for internalizing scores
In contrast, maternal depression is significantly associated with increased internalizing scores. This effect is also clearly more pronounced (compared with the results for externalizing scores) at the top end of the distribution. Although maternal depression is related to both externalizing and internalizing problems, depression as a construct is more associated with internalizing problems such as emotional symptoms. The top end of the distribution may represent children with abnormal problems (or disorders) whose scores are likely to be affected by several factors contributing to maternal depression effects including genetic transmission or poor parenting (Goodman and Gotlib, 1999). Importantly, there is recent evidence that the mechanisms of risk   from parental depression to externalizing problems are different from those for internalizing problems. For example, Silberg et al. (2010) showed that, whereas internalizing problems were accounted for solely by environmental factors, both environmental and genetic factors were significant in the association between parental depression and childhood externalizing problems.
As with externalizing problems, the protective effect of higher maternal education is present also for internalizing problems. Turning now to the results from the LQMM random-intercepts model in Table 6, again we note that these are in line with the MQRE results. However, with the LQMM we experienced more problems with the convergence of the algorithm, which demonstrates that estimating more extreme quantiles can be sometimes challenging. In contrast, estimation with MQRE regression was smoother but this comes at the cost of modelling location parameters that are more difficult to interpret. Table 7 presents the results from the LQMM that includes a random slope on age. For this model we experienced slower convergence when estimating the lower quantiles. Finally, the comments on the relationship between the mean and median fits in Section 5.1 also apply to the results for the internalizing scores.

Simulation study
In this section we present results from a Monte Carlo simulation study that was used for assessing the performance of MQRE regression at q = 0:5, 0:75, 0:9. The objective of this simulation study is twofold. First, we assess the ability of MQRE regression to account for the dependence structure in hierarchical data and hence provide better efficiency compared with models that ignore this dependence structure. Second, we empirically evaluate the analytic approximations for estimating the standard errors of the model parameters. For both aims, data are generated under the two-level location-shift model y ij = 100 + 2x 1ij + γ j + " ij , i = 1, : : : , n j , j = 1, : : : , 100,  (see Section 4) for which we also use the Huber proposal 2 influence function with c = 1:345. Although both MQRE regression and MQ regression are based on outlier robust estimation methods, we expect that the MQRE regression will perform better than MQ regression when clustering is present. At q = 0:5, MQRE regression is compared with the LRE model (1). We expect that MQRE regression will perform better than LRE when the normality assumptions are violated. For location parameters other than q = 0:5 we compare the MQRE with MQ models. In this case we expect that MQRE regression will be superior when outliers and clustering are present. To compare the various methods we mainly focus on the fixed effects parameters. The results for the variance parameters are available from the authors on request. For each regression parameter, performance is assessed with the following measures: (a) average relative bias ARB, defined as whereθ .r/ is the estimated parameter at quantile q for the rth replication and θ is the corresponding 'true' value of this parameter. (b) relative efficiencies EFF, defined as .r/ : We use MQ regression with c = 1:345 as a reference because we are mainly interested in assessing the ability of the MQRE to account for the dependence structure of hierarchical data. Table 8 reports the simulation results for estimators of the fixed effects under the various approaches for q = 0:5, 0:75,0:9. Under the scenario [N, N] for q = 0:5, we observe that the estimators of the fixed effects from LRE regression are more efficient than the corresponding estimators from MQ regression. This is expected because LRE regression correctly models the two-level structure of the data. The estimators of the fixed effects of the LRE are also more efficient than the corresponding estimators from the MQRE model. Under this scenario there is no reason to employ outlier robust estimation, and doing so results in higher variability for   the MQRE regression estimators. At q = 0:75 and q = 0:9, the estimators of the fixed effects of the MQRE model are more efficient than the corresponding estimators of MQ regression. This demonstrates the ability of MQRE regression to account for the group structure of the data, which is something that is not possible when using MQ regression.
The superior performance of MQRE regression is demonstrated in scenarios [T , T ] where the data are generated under a t-distribution and [", γ] where outliers exist at both hierarchical levels. In particular, in most cases the estimators of the fixed effects from MQRE regression are more efficient than the corresponding estimators from MQ or from LRE regression. These results provide evidence that using MQRE regression protects against outlying values and accounts for the dependence structure. Finally, it appears that having departures from normality by using a Laplace distribution (scenario [N, Lap]) does not have a severe effect on the efficiency of the estimators of the fixed effects in terms of robustness. Nevertheless, we also observe in this scenario a clear advantage of MQRE compared with MQ regression.
Taking a closer look at ARB for the fixed effects, we observe that all estimation methods have almost no bias. The bias is computed by assuming that the target population parameters are the quantiles of the conditional distribution. Hence, the MQRE fits are penalized. Despite this, Table 8 reveals that for q = 0:5 ARB for the slope and intercept is always smaller than 0:1% for all estimators. The same holds also for the slope at quantile q = 0:75 and q = 0:9. In the case of the intercept, ARB is in some cases around 0:3% for q = 0:75 and around 0:6% for q = 0:9.
Having assessed the performance of MQRE regression, the second aim of this empirical study is to evaluate the analytic approximations of the standard errors of the fixed effects. Therefore, we compare the empirical and estimated standard errors under the four scenarios. For each scenario and for each estimatorθ, at q = 0:5, 0:75, 0:9, Table 9 reports averages over simulations of the Monte Carlo standard error S.θ/ = √ {R −1 Σ R r=1 .θ .r/ −θ/ 2 } and the estimated standard errors of the fixed effects β q . It can be observed that for all scenarios the estimated standard error of the estimators at q = 0:5, 0:75, 0:9 offers a good approximation to the empirical standard error. Furthermore, for all scenarios the asymptotic standard error of the MQRE regression is, as expected, larger for location parameters that are closer to the tail of the distribution than for location parameters that are closer to the centre of the distribution.

Discussion
The paper offers to the prospective data analyst tools for modelling a general set of location parameters in the presence of clustering in the data. In particular, we propose an extension of MQ regression to MQRE regression. As illustrated in the real data example, the proposed approach to modelling conditional MQs can offer a considerably deeper insight to child psychologists into the effect of risk factors on children's behavioural problems. This in turn can assist in proposing new substantive theory. The use of the methodology is facilitated by the availability of a computationally efficient algorithm using C++ in R. The current methodology allows only for random-intercepts and two-level structures. Future work will extend the proposed methodology to allow for additional hierarchical levels and more complex correlation structures that include random coefficients. = K 2q tr.V −1 q ZZ T V −1 q ZZ T / K 2q tr.V −1 q ZZ T V −1 q I n / K 2q tr.V −1 q I n V −1 q ZZ T / K 2q tr.V −1 q I n V −1 q I n / and a σ 2 γq σ 2 q = 1 2 ψ q .r q / T U 1=2 q V −1 q ZZ T V −1 q U 1=2 q ψ q .r q / 1 2 ψ q .r q / T U 1=2 q V −1 q I n V −1 q U 1=2 q ψ q .r q / : Iterative equation (11) is more stable than the Newton-Raphson method. Like any other iterative algorithm, the fixed point algorithm requires initial values for the parameters. As a result, using welldefined starting values for the variance parameters is advisable.
Step 5: at convergence, predicted random effects at the qth MQ fit are obtained by solving the following estimating equation with respect to γ q : γq ψ q .Σ −1=2 γq γ q / = 0: . 12/ As can be seen from the steps of the estimation algorithm, predicted random effects are obtained at convergence, i.e. we start by first estimating the fixed effects and the variance parameters and then, given estimates of the fixed effects and of the variance parameters, we predict the random effects. The reason for this, as also pointed out by Sinha and Rao (2009), is that estimates of the variance parameters that depend on the predicted random effects may be unstable.
Following Welsh and Richardson (1997), the covariance matrix of the estimators can be written as G −1 F.G −1 / T . The components of the matrix G and the matrix F are available from the authors on request. The covariance matrix of the estimators can be consistently estimated byĜ −1F .Ĝ −1 / T where the matriceŝ G andF are evaluated atθ q .