Using Linkage to Electronic Primary Care Records to Evaluate Recruitment and Nonresponse Bias in The Avon Longitudinal Study of Parents and Children

Supplemental Digital Content is available in the text.

To the Editor:
Participation is often incomplete in observational research (because of initial failure to recruit or loss to follow-up), resulting in loss of statistical power and possible bias. Selective participation often leads to biased prevalence estimates, but appears to be less important in relation to estimates of exposure-outcome associations, though exceptions have been reported. [1-7] Linkage to routine health data offers a means to examine the extent of these biases by providing data on participants and nonparticipants in a study. Here, within appropriate permissions, we have linked those eligible to take part in The Avon Longitudinal Study of Parents and Children (ALSPAC), a birth cohort, to the General Practice Research Database, an anonymized database of primary care records of around 5 million UK residents. The linked sample comprised 749 individuals eligible to take part in ALSPAC (4% of the ALSPAC-eligible population). Of these, 519 (69%) had enrolled (originally been recruited) in ALSPAC, of whom 348 (67%) had participated at 10 years of age and 223 (43%) at 17 years of age. Further details about the sample are given in the eAppendix (http://links.lww.com/EDE/A898).
We used the linked resource to define the cumulative incidence (by ages 11 and 19) of six outcomes: any mental illness, any respiratory illness (excluding asthma), asthma and/or allergies, ever smoked (up to 19 only), been pregnant (19 only), and been classified as "child at risk" (eAppendix; http://links.lww.com/EDE/A898). Data on exposures and potential confounders were obtained from ALSPAC questionnaires administered during pregnancy and early infancy.
For each outcome, we calculated the cumulative incidence among those eligible to take part in ALSPAC, those who had enrolled, and those who participated at 10 and 17 years of age. We compared these using ratios of cumulative incidences. We calculated odds ratios (ORs) for four exposure-outcome associations (three examined separately at 10 and 17 years of age) among all enrolled subjects, and compared them, using relative odds ratios (RORs), to ORs from individuals participating at 10 and 17 years of age. Confidence intervals were constructed around the logarithm of the ratios of cumulative incidence and RORs using a nonparametric bootstrap method.
The cumulative incidence of each of the outcomes was similar among enrolled and eligible subjects (eTable 4; http://links.lww.com/EDE/A898). However, participants were less likely than all enrolled subjects to have ever smoked, been pregnant, or been "at risk" (eTable 5; http://links.lww.com/EDE/A898). For all the exposure-outcome associations, the ORs among participants at 17 were substantially different from those among all enrolled subjects (Table), although confidence intervals were inevitably wide given the small numbers with linked data available. The RORs for participants at 10 years of age were all closer to unity. Adjusting for other factors predictive of nonparticipation made little difference to these RORs. For three of the outcomes, there was evidence for an interaction between the exposure and outcome with respect to participation at 17 years of age, resulting in the outcome being missing not at random in one exposure group but not the other (eTable 6; http://links.lww.com/EDE/A898). These interactions were not seen at 10 years of age.
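The relative odds ratio comparison described in this letter can be illustrated with a minimal sketch. It is not the authors' analysis code: it assumes unadjusted ORs from a 2x2 table (the letter reports adjusted estimates), a 0.5 continuity correction to avoid empty cells, and percentile bootstrap intervals on the log scale, since the letter does not state which bootstrap interval was used. The function names and the `(exposed, outcome, participated)` record layout are hypothetical.

```python
import math
import random


def odds_ratio(records):
    """Unadjusted OR from (exposed, outcome) pairs.

    Assumption: 0.5 is added to every cell so that sparse bootstrap
    resamples never produce a zero denominator.
    """
    a = b = c = d = 0.5
    for exposed, outcome in records:
        if exposed and outcome:
            a += 1
        elif exposed:
            b += 1
        elif outcome:
            c += 1
        else:
            d += 1
    return (a * d) / (b * c)


def relative_odds_ratio(subjects):
    """ROR = OR among participants / OR among all enrolled subjects.

    subjects: list of (exposed, outcome, participated) tuples, one per
    enrolled individual with linked outcome data.
    """
    enrolled = [(e, o) for e, o, _ in subjects]
    participants = [(e, o) for e, o, p in subjects if p]
    return odds_ratio(participants) / odds_ratio(enrolled)


def bootstrap_ror_ci(subjects, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the ROR, computed on the log scale.

    Enrolled individuals are resampled with replacement, so each
    resample keeps the dependence between the participant subset and
    the enrolled set it is drawn from.
    """
    rng = random.Random(seed)
    n = len(subjects)
    log_rors = sorted(
        math.log(relative_odds_ratio(rng.choices(subjects, k=n)))
        for _ in range(n_boot)
    )
    lower = log_rors[int((alpha / 2) * n_boot)]
    upper = log_rors[int((1 - alpha / 2) * n_boot) - 1]
    return math.exp(lower), math.exp(upper)
```

An ROR near 1 suggests that restricting to participants changes the estimated association little; because participants are a subset of the enrolled, resampling whole individuals (rather than the two groups separately) is the design choice that respects that overlap.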
In conclusion, we have shown that bias due to loss to follow-up in some exposure-outcome associations may be substantial, although this study was small and the uncertainty in our estimates of bias consequently quite large. Our study demonstrates that linkage between observational studies and electronic patient records can provide useful information on nonresponders. This utility is likely to increase as the proportion of individuals with linked data increases. Comparative information available via linkage could be combined with, or used in place of, observational data. In addition, linked data could provide important information on factors predictive of nonresponse; such information can be incorporated into statistical analyses, thus reducing bias [8] and potentially increasing the precision of estimates.

ACKNOWLEDGMENTS
We are extremely grateful to all the families that took part in this study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists, and nurses.

Table footnotes:
a Adjusted for maternal education, parity as well as sex, breastfeeding, and smoking in first trimester when these were not the exposure variables.
b All enrolled subjects with complete information on baseline covariates.
c Parity was not adjusted for in this analysis because of sparse data.
CI indicates confidence interval.