Age- and size-related reference ranges: A case study of spirometry through childhood and adulthood

Age-related reference ranges are useful for assessing growth in children. The LMS method is a popular technique for constructing growth charts that model the age-changing distribution of the measurement in terms of the median, coefficient of variation and skewness. Here the methodology is extended to references that depend on body size as well as age, by exploiting the flexibility of the generalised additive models for location, scale and shape (GAMLSS) technique. GAMLSS offers general linear predictors for each moment parameter and a choice of error distributions, which can handle kurtosis as well as skewness. A key question with such references is the nature of the age-size adjustment, additive or multiplicative, which is explored by comparing the identity link and log link for the median predictor. There are several measurements whose reference ranges depend on both body size and age. As an example, models are developed here for the first four moments of the lung function variables forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and FEV1/FVC in terms of height and age, in a data set of 3598 children and adults aged 4 to 80 years. The results show a strong multiplicative association between spirometry, height and age, with a large and nonlinear age effect across the age range. Variability also depends nonlinearly on age and to a lesser extent on height. FEV1 and FVC are close to normally distributed, while FEV1/FVC is appreciably skew to the left. GAMLSS is a powerful technique for the construction of such references, which should be useful in clinical medicine. Copyright © 2008 John Wiley & Sons, Ltd.

The aim here is to show how GAMLSS can be used to model age-and size-related reference ranges including departures from linearity, homoscedasticity and normality, while at the same time including appropriate adjustments for size. The method is illustrated using a data set of spirometry from childhood to old age [10]. Section 2 describes the data, Section 3 the development of the model, Section 4 the results and Section 5 is the discussion.

DATA
The data for the study are for ethnic white children and adults aged 4 to 80 years from four sources: U.S.A. (NHANES III) [17], Belgium [18], England [19] and Canada [20]. All the data were generated by qualified technologists and met the relevant national quality criteria. Table I summarizes the number of subjects by sex from each study, with the US data split into children and adults. The data were analysed separately for males and females, owing to their different patterns of size, growth and ageing.
The studies included several measures of spirometry, but just two are considered here: forced expiratory volume in 1 s (FEV 1 ) and forced vital capacity (FVC). The two measures are obtained from a rapid expiration following a full inspiration, and FEV 1 is the volume expired in 1 s while FVC is the total volume expired. The ratio FEV 1 /FVC is also clinically useful.

The LMS method and GAMLSS
The principle of the LMS method is that the distribution of the outcome variable Y is defined by three age-varying parameters , , such that the transformed outcome is a z-score with distribution close to N(0, 1). This distribution, called by GAMLSS the Box-Cox-Cole-Green (BCCG) distribution, has properties such that the distribution is symmetric, i.e. any skewness in Y is removed by suitable choice of the Box-Cox power , the location parameter is hence the median rather than the mean and the scale parameter is approximately the dimensionless CV or log standard deviation (SD). Kurtosis is assumed to be absent.

883
GAMLSS is a generalization of the LMS method where Y has a specified frequency distribution D ( , , , ), the parameters representing the first four moments of the distribution. A wide variety of distributional forms are available, of which the normal distribution (called NO by GAMLSS) is the simplest with just two parameters, location (mean ) and scale (SD ). Other distributions have location, scale and one or two shape parameters (skewness and kurtosis ), including the BCCG distribution (1) (where is equivalent to ) and the Box-Cox-power-exponential (BCPE) distribution, which is an extension of the BCCG distribution to include kurtosis.
Each parameter of the distribution D is modelled as follows (this is for , the subscripts 1 corresponding to the first moment): where g 1 (.) is the link function, 1 the linear predictor, X 1 linear terms and J 1 j=1 h j1 (x j1 ) additive terms (e.g. cubic splines) in the regression model. Corresponding equations define the other distribution parameters, i.e. , and for moments 2 . . . 4. GAMLSS is best fitted using the package gamlss [21] in R [22]. The age trends for each moment are fitted using cubic spline curves, as with the LMS method, which are more flexible than polynomials or fractional polynomials for modelling complex nonlinear relationships [23]. The complexity of the spline curve is defined in terms of equivalent (or effective) degrees of freedom (edf), and by R convention edf = 0 corresponds to a linear term, while higher edf indicates a progressively more complex curve. The R model notation for a cubic spline of given edf is cs(x-variate, edf).
Model choice involves minimizing the Schwarz Bayesian Criterion SBC = deviance+df×log n, where the fitted deviance is −2 log Likelihood, n is the sample size and df is the total model degrees of freedom. The SBC is also used to choose the edf for individual spline curves. Thus, the SBC penalises the deviance by log n units for each extra df, leading to optimal spline curves and a parsimonious final model. The SBC is also known as the Bayesian information criterion. For comparison, the well-known Akaike information criterion (AIC) penalises the deviance by 2 units per extra df, which is a special case of the generalised AIC[k] or GAIC[k] where the penalty is k units of deviance per df. For data sets where log n>3 (as here), the AIC and GAIC [3] lead to less parsimonious models. The choice of SBC over AIC or GAIC [3] is justified in the present context by the visual appearance of optimal spline curves selected using the different criteria. GAIC-based curves are consistently and obviously under-fitted, whereas SBC-based curves are smoother and visually more convincing. This is acknowledged to be a subjective judgment, but it has the benefit of parsimony, which is important in such a rich modelling environment.

Modelling
Size is viewed here as one or more anthropometry variables such as height or weight. The same broad modelling issues apply to any reference range involving both size and age: 1. Are both size and age required in the model? 2. If yes, is the size-age relationship additive or multiplicative-should the size and age effects be added together or multiplied together? 3. What is the best way to model size-linear, log or power? 4. Is the age effect linear, quadratic or more complex? 5. Should variability be measured on the outcome scale (i.e. SD) or log outcome scale (CV)? 884 T. J. COLE ET AL. 6. How does variability vary with age and size? 7. What is the distribution of the residuals, and does it vary with age and size? 8. Are there important technical effects, for example differences between the centres providing the data?
Taken together, these issues involve modelling the median (#1-4), the variability (#5-6) and the skewness and/or kurtosis (#7) of the distribution in terms of size and/or age. In addition technical differences may affect all four moments (#8). Note that since skewness is adjusted for, the location parameter is the median rather than the mean, though for the normal distribution of course the two coincide.
We now consider the moments of the model in turn, where size is represented by height.

Median
To predict the median, the questions above simplify to: (i) what are the best forms for modelling height and age (i.e. linear, log, (fractional) polynomial, cubic spline etc.)? (ii) are both the variables required? (iii) and if so, do they combine additively or multiplicatively?
The spirometry-height relationship is usually assumed linear in either height or log height, i.e. cs(Height, 0) or cs(log Height, 0), and slight nonlinearity can be modelled by increasing edf to 1 or 2. The relationship between height and log height is curved, so that cs(Height, 2) and cs(log Height, 2) are likely to fit very similarly.
Fitting age as a cubic spline allows for complex trends in childhood and adulthood, and it also models the transition between them. Polynomials or fractional polynomials do not provide sufficient flexibility. However, childhood is a much shorter period of time than adulthood (15 years as against 60 for the FEV 1 data), so the spline curve struggles to capture the detail of the childhood phase while smoothing the adult phase. To make the two periods more equal, the splines are fitted on the transformed scale Age p (or log age for p = 0), which for p<1 stretches the period of childhood and shrinks adulthood. The fitted curves are then redrawn on the original age scale. The optimal power transform is found by choosing p to minimize the deviance while keeping the age curve edf constant. If the deviance is reduced by less than log n, then p = 1.
Two separate models in height and age, additive and multiplicative, are fitted using the identity link = a +cs(Height, edf h )+cs(Age p , edf a ) and log link respectively, where edf h and edf a are the edf for height and age. Note that the log link uses log height to match conventional log-log regression, though height can be tested as an alternative. When edf h = 0 the models simplify to = a +b ×Height+cs(Age p , edf a ) and log = a +b ×log Height+cs(Age p , edf a ) = e a Height b e cs(Age p ,edf a ) (7) so that b is the height power, and is the product of height b and an age term. This multiplicative model is in the form of the classic allometric growth curve. As a further refinement b can be allowed to vary smoothly with age, where b is modelled as a cubic spline curve in (transformed) age with specified edf. This is known as a varying coefficient (vc in R) model [24]. Here b = vc(Age p , Height, edf ah ) or b = vc(Age p , log Height, edf ah ). For edf ah = 0, the terms simplify to the linear interactions Age p ×Height or Age p ×log Height, respectively.
Nearly all the FEV 1 -age-height models published to date have been special cases of (5) or (6), some of them including the age-height interaction. Age has been modelled as a low-order polynomial, often just a linear term, and the child and adult phases have been analysed separately. Thus, (3) to (6) represent a powerful family of models to apply to FEV 1 throughout the life course from childhood to later life.
Other design covariates can be added to the model, for example a factor centre distinguishing the data sources to estimate the mean size of inter-centre differences after adjusting for height and age.
The nature of the fitted model and its goodness of fit can be assessed for each covariate in turn using term. plot(), which plots the fitted relationship and optionally includes the partial residuals (z k − k )-see ([8], Appendix B). The same can be done for the other moment parameters in the model.

Variability
Depending on the choice of distribution, the variability is measured either as the SD (in absolute units e.g. litres for FEV 1 ) or the CV (in fractional units e.g. log FEV 1 ), which correspond to the residuals from linear and log regression, respectively. GAMLSS can fit either-the normal distribution models the SD while the BCCG and BCPE distributions model the CV. In general, the CV is less variable than the SD across age, as in childhood the SD tends to vary in proportion to the median whereas the CV is relatively constant [4].
Previous spirometric models have assumed homoscedasticity of residuals, i.e. is independent of height and age. GAMLSS tests this by modelling in terms of height and age using the log link, analogous to (4) Note that for simplicity the same age transformation is used as for . Again design effects such as centre can also be added to model differential variability.

Skewness and kurtosis
Another common regression assumption is normality of the residuals, i.e. the residuals are assumed either normally or lognormally distributed, depending on the form of regression. With suitable choice of distribution, GAMLSS allows the degree of skewness and kurtosis to be modelled explicitly.

T. J. COLE ET AL.
Non-normality is modelled using the BCPE distribution [8], where the skewness and kurtosis are estimated using models analogous to (3) and (4) = a +cs(Height, edf h )+cs(Age p , edf a ) Y can be converted to a standardized random variable Z as in (1), where the values of , and (i.e. in (9)) are functions of height and age as described above. Z has a power exponential distribution with kurtosis parameter , and for a distribution with normal tails = 2.
A simple alternative to the BCPE distribution is the two-parameter normal distribution (NO), with mean and SD , where Y is converted to a normally distributed z-score z = (Y − )/ .

Three families
To explore the nature of the height-age relationship (additive versus multiplicative) and the residual error structure (CV versus SD), three different families of models are considered. Family 1 is an additive model based on the BCPE distribution with the identity link for (3). Family 2 is the multiplicative equivalent of family 1 with the log link for (4). Both families adjust for heteroscedasticity and non-normality by modelling CV (8) and and (9). Family 3 fits the normal distribution and models mean (3) and SD (8).
The analysis plan is to build the models incrementally, first estimating p, then assuming constant, = 1 and = 2. Then is modelled to deal with heteroscedasticity, and then and are estimated to adjust for skewness and kurtosis (for families 1 and 2 only). Finally the effect of adjusting for centre is explored.

FEV 1 in males
As a case study we describe the development of the regression model for FEV 1 in males (n = 1621) in detail, and report the models for FVC and for females briefly. (3) and (4) are first optimised by direct search, across the three families, while setting edf h = 0 (i.e. including a linear height term). Table II summarizes the various models discussed here, one model per row, ranked in terms of decreasing SBC, where each extra parameter is penalised by log n ≈ 7.4 SBC units. The minimal SBC is 1325 units with family 2 (BCPE-log link), p = 0 (i.e. log age) and edf a = 7 (Table II row (Table II rows d and e).

Median . The age parameters p and edf a in
Families 1 (BCPE-identity link) and 3 (normal) fit much worse than family 2, with SBC 56 and 347 units greater, respectively (Table II rows b and a). For this reason the fitting focuses on family 2. Figure 1 compares the fitted spline curves for the models in rows d and e plotted on the age scale, with the partial residuals from the log age fit (row e). Figure 1 highlights not only the complexity of the child and adult trends and the child-adult transition, but also how transforming the age scale improves the fit in childhood. The y-axis, multiplied by 100, can be viewed as an   approximate percentage scale [25]. The range of values in childhood exceeds 0.3, so that median FEV 1 in boys aged 4 is 30 per cent less than in young men aged 23 after adjustment for height. Thus in childhood, age is very important over and above height. Fitting height rather than log height in (4) increases the SBC by 26 units (Table II rows c and e), as does increasing edf h . So edf h = 0 and log height is optimal. The log height regression coefficient b in (4) is 2.48 with standard error (SE) 0.034, so that adjusted for age, FEV 1 goes as height 2.5 at all ages. Figure 2 shows the partial regression plot for height with the fitted line.
Adding the log height by age interaction (edf ah = 0) increases the SBC, and increasing edf ah to 1 increases it further, so there is no evidence for an interaction. (8) to adjust for heteroscedasticity, the optimal age curve has edf a = 2 with the log height term omitted, which reduces the SBC by 37 units (Table II row g). Figure 3 plots the partial residuals for log along with the fitted log age effect, and shows that the variability is greater below age 11 and in old age. Including log height in (8) increases the SBC (Table II row f). Figure 4 shows the distribution of residuals for model g in Table II, where = 1 and = 2, and it indicates some left skewness. The two scatterplots show some low outliers, while the frequency distribution has a long left tail and the Q-Q plot shows negative skewness.

Skewness and kurtosis .
Optimizing gives the value 1.59 SE 0.16, significantly greater than 1, which reduces the SBC by 6.1 units (Table II row h). It shows FEV 1. 6 1 to be normally distributed after adjustment for height and age. Optimizing gives 1.85 SE 0.08, slight but insignificant non-normal kurtosis (i.e. = 2). Neither height nor age (9) improves the prediction. Thus, the final model has 16 df (Table II row h) and the predictors are log = a +b ×log Height+cs(log Age, 7) log = a +cs(log Age, 2) = a (10) Figure 5 consists of two contour plots showing the combined effect of age and height on predicted FEV 1 up to age 40 (the age range is truncated to emphasize the childhood phase). To restrict the prediction to plausible regions of the age-height plane, the range of heights by age is defined as ±3 SDs about the mean of the British 1990 height reference (shown as a dotted line in Figure 5) [26].
The plot shows the predicted 5th centile (the conventional lower limit of normal or LLN), and also the 50th centile.
It highlights the complex interplay between FEV 1 , age and height at different stages of childhood. The direction perpendicular to each contour shows the relative importance of height and age, so that Figure 5. Contour plots of FEV 1 versus age and height in males up to age 40. The dotted line is the median of the British 1990 height reference [26], and the plot is restricted to ±3 SDs about the median up to and beyond age 20. The two plots show the 5th and 50th centiles of predicted FEV 1 as functions of age and height. between 8 and 12 years where the contours are essentially flat, height not age determines FEV 1 . At 140 cm for example, median FEV 1 in this age range is 2L and the LLN is 1.6L, irrespective of age. At earlier and later ages the contours have negative slope in childhood, showing that both height and age are predictive. The pattern is simpler in adulthood where age and height are broadly uncorrelated and the parallel contours have positive slope. Figure 6 shows the two contour plots of Figure 5 presented as surfaces in a 3-D wireframe plot. Unlike the contour plot, this provides qualitative rather than quantitative information, but it does give an idea of the surface shapes.

Centre.
The four centres are numbered as in Table I and adjusting for centre adds 3 df to the model for each moment predictor. Adjusting (4) reduces the SBC by nearly 20 units (Table II row j) and adjusting (8) reduces it by a further 2.9 (Table II row k). So there is an evidence of between-centre heterogeneity in both median and variability.
Adjusting for centre affects the predictor, so that the optimal choice of edf a = 2 with height omitted is challenged by a model consisting of age plus height, which reduces the SBC by 7 units to 1252.5 (Table II row m). The height effect is negative, which together with the positive age effect means that young children have increased variability due to their small size whereas older adults have increased variability due to their greater age.  Figure 6. Wireframe plot of FEV 1 in males up to age 40 predicted from age and height, the 5th and 50th centiles as in Figure 5.
Thus, if centre is included, there is an alternative final model with 21 df (Table II row Table III gives the linear regression coefficients for (11). The log links for and mean that their coefficients are proportions and multiplied by 100 can be viewed as percentages. Thus for , the centre differences relative to centre 1 are −4 to +3 per cent, while for they are between −9 and +21 per cent. The log height coefficient for is close to 2.5, so that median FEV 1 goes as height 2.5 . The estimate for of 1.5 is significantly greater than 1, indicating modest left skewness (as seen in Figure 5).

FVC and females
The optimal model for FVC in males is similar to that for FEV 1 , the only substantive difference being that FVC is normally distributed, with the estimate for close to 1. There is no evidence for centre differences. FEV 1 and FVC in females have a simpler age trend for than in males, but similar variability and a normal distribution. FEV 1 also has a linear age-height interaction, with a steeper height slope in children than adults. Both FEV 1 and FVC show centre differences for . Figure 7 shows the height and age effects on and for FEV 1 and FVC in the two sexes, with the curves for FEV 1 (dashed) and FVC (solid) superimposed. They confirm the similarity of the four height coefficients, all with b ≈ 2.5, and the age trends are also similar for FEV 1 and FVC within each sex. Figure 7 shows that: (a) the height coefficient for FEV 1 is slightly shallower than for FVC; (b) median FEV 1 falls with age more steeply than median FVC; and (c) variability is slightly greater for FEV 1 than for FVC. The linear predictor for FEV 1 /FVC is effectively the difference between the linear predictors for FEV 1 and FVC, due to their log links, so that these discrepancies between them define the model for FEV 1 /FVC. In both the sexes, the log height coefficients for FEV 1 /FVC are tiny but highly significant, the age trends are broadly linear and highly significant and the variability is age-dependent. The distributions are very skew to the left with = 2.6 SE 0.24 in males and 2.4 SE 0.23 in females. Figure 8 shows the fitted age trends in FEV 1 /FVC by sex, adjusted for height, along with the partial residuals. The greater variability at the extremes of age is evident, as is the left skewness. Adjusting for centre differences reduces the SBC for both and .

DISCUSSION
The study shows how GAMLSS can be used to construct reference ranges that depend on both age and size, and in the process adjust for departures from the four multiple linear regression assumptions of linearity, additivity, homoscedasticity and normality. Using spirometry as an example, the median of the distribution is found to vary nonlinearly with age, the relationship between outcome, size and age is multiplicative rather than additive, the CV varies nonlinearly with age and the distribution is left skew (though not kurtotic). This is the first practical application of GAMLSS where the predictor is a function of more than one variable (setting aside the authors' own worked examples [8,27]). The LMS method, which is a special case of GAMLSS where the predictor depends only on age, has been used for two decades to define the distribution of anthropometry (weight, height, etc.) by modelling age trends in the first three moments [3]. The present study extends the model to include size as well as age in the linear predictor, and also models the median using the log link, which allows the multiplicative model to be tested without forcing a lognormal error distribution. A multiplicative or power law model of the form outcome ∝ size b × f (age) makes biological sense in terms of allometric scaling, and the results here confirm a far better fit for the log link than the identity link. In detail the height coefficients b in (6) are all close to 2.5, so that the ratio indices FEV 1 /height 2.5 LMS method and assumes no kurtosis. The fact that no kurtosis was detected supports the WHO conclusion that it need not be modelled [2] and that an adjustment for skewness is sufficient.
Evidence of left skewness was strongest for FEV 1 /FVC, where the Box-Cox power was around 2.5 in both sexes. As a ratio FEV 1 /FVC has an upper limit of 100 per cent, which inevitably means that the upper tail of the distribution is shorter than the lower tail and the adjustment for skewness compensates for this in a natural way. The skewness seen in FEV 1 for males was absent for females and for FVC, and it was in any case only weak. We recommend that in practice skewness not to be adjusted for with FEV 1 or FVC.
The existence of significant centre differences is important for generalizability and it emphasizes the need to standardize as far as possible the protocols used for spirometry measurement, in order to minimize centre differences. This is of course a requirement for any data collated from several sources, not just spirometry.
A real concern with GAMLSS is that its power and flexibility can lead to the development of complex models that lack biological plausibility-indeed the GAMLSS manual has a warning to this effect on page 2. Furthermore, 20 years ago the first author, introducing the LMS method, wrote Producing centile charts has always been something of a black art-the centile lines need to be drawn such that they are both smooth and close to the empirical centiles. It is not surprising that this trade-off problem is often solved by drawing the lines by eye.
The march of progress means that although the same sentiment still applies, the uncertainty now lies in the choice of model rather than the absence of one. The use of the SBC rather than the AIC or GAIC [3] for model choice results in more parsimonious final models, which provides some protection against this. Also the distributions considered here (BCPE, BCCG and normal) are just three of the 40-plus available in gamlss [27], but they are tried and tested. Also we justify our form of body size adjustment in biological terms. Taken together these constraints act as prior information to reduce the number of models to be considered. It is still possible that our model will be sub-optimal in certain situations, but we believe that the (probably only slightly) poorer fit will be offset by the robust pre-specified modelling framework.
The value of a parsimonious model adjusting for size and age is that the outcome measure can be expressed as an adjusted z-score or centile (1), which is valuable for interpretation by clinicians. We have published a clinical paper describing our lung function model [10], and an independent review has described the major impact it will have on clinical practice [28]. The downside of a combined size-age adjustment is that the reference ranges cannot, as is routinely done with growth centile charts, be presented graphically on a 2-D chart. The extra dimension requires a 3-D chart that ideally needs to be computerised. Figures 5 and 6 give an idea of what can be displayed in 2-D, but they are inadequate for detailed work. We have developed a Microsoft Excel add-in module called LMSGrowth that does the measurement to z-score conversion, either for individuals or groups [29].
In conclusion the study has demonstrated how age-related reference ranges can be extended to include an adjustment for size, by modelling the distribution in terms of its first three moments, using the log link for the median.