Use of Fixed Effects Models to Analyze Self-Controlled Case Series Data in Vaccine Safety Studies.

Conditional Poisson models have been used to analyze vaccine safety data from the self-controlled case series (SCCS) design. In this paper, we derived the likelihood function of fixed effects models for analyzing SCCS data and showed that the likelihoods from fixed effects models and conditional Poisson models are proportional. Thus, the maximum likelihood estimates (MLEs) of the coefficients of time-varying variables, including the vaccination effect, are equal for the fixed effects model and the conditional Poisson model. We performed a simulation study to compare empirical type I errors, means and standard errors of the vaccination effect coefficient, and empirical powers among conditional Poisson models, fixed effects models, and generalized estimating equations (GEE), which are commonly used for analyzing longitudinal data. The simulation study showed that fixed effects models and conditional Poisson models generated the same estimates and standard errors for time-varying variables, while the GEE approach produced different results for some data sets. We also analyzed SCCS data from a vaccine safety study examining the association between measles-mumps-rubella (MMR) vaccination and idiopathic thrombocytopenic purpura (ITP). In analyzing the MMR-ITP data, likelihood-based statistical tests were employed to test the impact of a time-invariant variable on the vaccination effect. In addition, a complex semi-parametric model was fitted by simply treating unique event days as indicator variables in the fixed effects model. We conclude that, theoretically, fixed effects models provide MLEs identical to those of conditional Poisson models. Because fixed effects models are likelihood based, they have the potential to address methodological issues in vaccine safety studies, such as how to identify an optimal risk window and how to analyze SCCS data with misclassification of adverse events.


Introduction
The association between particular adverse events following immunization (AEFI) and receipt of a specific vaccine has been studied using large electronically linked health care utilization databases [1][2][3][4][5][6][7][8]. As an example, the Vaccine Safety Datalink (VSD) project uses electronic data, including vaccination and diagnosis data, on 8.8 million managed care enrollees to study not only common AEFIs (eg, fever, soreness), but also rare AEFIs (eg, death, seizures, idiopathic thrombocytopenic purpura (ITP)).
Copyright: © 2012 Xu S, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Since individuals in observational settings such as the VSD are not randomly assigned to vaccination, vaccinated and unvaccinated individuals may differ greatly, possibly in ways related to the outcome of interest. This confounding bias, if not accounted for properly, may invalidate the results of cohort studies. In addition, traditional study designs such as matched cohort and case-control designs may not even be feasible for studying vaccine safety because 1) the coverage of some vaccines is nearly 100%, so there are not enough unvaccinated individuals to form a control group; and 2) safety surveillance systems (eg, the Vaccine Adverse Event Reporting System) do not collect data on individuals who did not experience or report an adverse event. For these reasons, the self-controlled case series (SCCS) method has been developed and widely used for vaccine safety studies [7][8][9][10][11]. The SCCS is a case-only method in which a subject's follow-up period is partitioned into risk and control intervals. SCCS data are typically analyzed using conditional Poisson models, which condition on the total number of adverse events that occurred in each individual. The resulting likelihood kernel does not contain the individual-specific random coefficients that explain each individual's baseline risk for the event count of interest (Farrington, 1995). By making within-person comparisons of incidence rates between vaccine-exposed and unexposed time intervals, conditional Poisson models implicitly adjust for all time-invariant individual-level risk factors and potential confounders, whether measured or not.
Several statistical software packages (eg, STATA, SAS, GENSTAT) can be used to fit Poisson models to SCCS data [10,12,13]. Although not explicitly mentioned, fixed effects models were used to analyze SCCS data in a tutorial paper by Whitaker et al. [10]. SCCS data have a longitudinal structure in which the multiple observations on an individual are the intervals defined by vaccination exposure status and time-varying covariates. The number of observations per individual therefore depends on the number of risk levels and the number of levels of the time-varying covariates. Fixed effects models have been used to analyze data with multiple observations per individual in cohort studies, whether randomized or observational [14]. They do not assume a distribution for the individual-specific random coefficients; instead, these coefficients are estimated from the data as fixed effects.
In this paper, we demonstrate that the likelihoods from the fixed effects model and the conditional Poisson model are proportional when analyzing SCCS data. We also show by simulation that fixed effects models and the conditional Poisson models typically used for SCCS data analysis are equivalent in estimating vaccination effects, and we compare these two approaches with the generalized estimating equation (GEE) approach [15,16], which is widely used for longitudinal data analysis. Finally, using an SCCS example, we demonstrate that fixed effects models have numerous advantages over conditional Poisson models.

Conditional Poisson models for self-controlled case series design
If a Poisson process is assumed for the unrestricted population in a cohort design, the likelihood function conditional on each individual's total number of adverse events is a multinomial kernel that contains only the coefficients β of the time-varying covariates, including the exposure effect $\beta_1$. Because the individual effects cancel out of this conditional likelihood, inference on the βs does not depend on any assumption about their distribution.
For an individual with no adverse event, the multinomial term equals one, so that individual's contribution to the log-likelihood is zero. Thus, individuals without adverse events do not contribute to parameter estimation.
Suppose we have a sample of n individuals with adverse events. Let $R=\exp(\beta_1)$ denote the incidence rate ratio for the vaccination effect, where $\beta_1$ is the model coefficient for the vaccination effect. Farrington (1995) proposed a conditional Poisson model that uses only cases, both vaccinated and unvaccinated. The conditional Poisson likelihood kernel is the product of the subject-specific kernels, which for the i-th subject has the form

$$L_{C(i)}=\prod_{j=1}^{J_i}\left(\frac{t_{ij}\exp(X_{ij}\beta)}{\sum_{k=1}^{J_i}t_{ik}\exp(X_{ik}\beta)}\right)^{y_{ij}}. \tag{1}$$

Here, $X_{ij}$ is the row vector of time-varying covariates, including indicator variables for the age effects and the vaccination effect; β is the column vector of corresponding coefficients, including $\beta_1$; $J_i$ is the number of intervals for subject i; $t_{ij}$ is the person-time (in days) for subject i in interval j; and $y_{ij}$ is the corresponding number of adverse events, which is binary when the adverse event is rare. The conditional Poisson regression model (1) allows for more than one risk level in the risk window [10].
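As an illustration of the conditional Poisson kernel, the following Python sketch evaluates its logarithm for one case and sums it over cases. It is a minimal sketch, not the software used in this paper; the function names and the per-case tuple layout `(X, t, y)` are hypothetical. Because the kernel depends only on each interval's share $t_{ij}\exp(X_{ij}\beta)/\sum_k t_{ik}\exp(X_{ik}\beta)$ of the subject's expected events, a subject with no events contributes exactly zero to the log-likelihood.

```python
import math
from typing import Sequence


def cond_poisson_loglik(beta: Sequence[float],
                        X: Sequence[Sequence[float]],
                        t: Sequence[float],
                        y: Sequence[int]) -> float:
    """Log of the conditional Poisson (multinomial) kernel for one subject.

    X[j]: covariate row X_ij for interval j, t[j]: its length in days,
    y[j]: number of adverse events in interval j.
    """
    # eta_j = log(t_ij * exp(X_ij beta))
    eta = [math.log(tj) + sum(b * x for b, x in zip(beta, xj))
           for xj, tj in zip(X, t)]
    # log of the denominator sum_k t_ik exp(X_ik beta), via log-sum-exp
    top = max(eta)
    log_denom = top + math.log(sum(math.exp(e - top) for e in eta))
    return sum(yj * (e - log_denom) for yj, e in zip(y, eta))


def total_cond_loglik(beta, cases):
    """Sum of the subject-level terms; each case is a tuple (X, t, y)."""
    return sum(cond_poisson_loglik(beta, X, t, y) for X, t, y in cases)


# One hypothetical case: 323 control days, 42 risk days, one event in the risk window.
case = ([[0.0], [1.0]], [323.0, 42.0], [0, 1])
print(total_cond_loglik([math.log(2.0)], [case]))   # equals log(84 / 407)
```

The log-sum-exp step only guards against overflow when covariate values or coefficients are large; it does not change the value of the kernel.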

Fixed effects models for SCCS data: parametric and semi-parametric models
SCCS data have a longitudinal structure in which the analytic units are the intervals of each subject. Each interval represents a unique combination of time-varying covariates. Although each individual has at least one adverse event, some intervals of an individual contain no adverse event. In this paper, we assume that the observations in an SCCS data set follow a Poisson distribution in which the individual effects are represented by a random coefficient. Consequently, the likelihood function from the fixed effects model is the full likelihood function for SCCS data. Let $\xi_i$ represent the individual-specific random coefficient for individual i, who experienced at least one adverse event during the follow-up period $T_i$; no distribution (such as a normal distribution) is assumed for $\xi_i$.
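We consider fitting the fixed effects model for SCCS data,

$$y_{ij}\sim\mathrm{Poisson}(u_{ij}),\qquad u_{ij}=t_{ij}\,m_i\,\lambda_{ij}, \tag{2}$$

where $m_i=\exp(\alpha+\xi_i)$ and $\lambda_{ij}=\exp(X_{ij}\beta)$, α represents the intercept coefficient for the overall case sample, $\xi_i$ is the unknown individual-specific random effect, $X_{ij}$ and β have the same meanings as defined for model (1), and $\exp(\alpha)$ is the baseline incidence rate of the adverse event per unit of unvaccinated time (eg, per day) in the case sample. Similar to the likelihood function for cohort data [14], the likelihood function of the fixed effects model for individual i when using case-only data is

$$L_{FE(i)}=\prod_{j=1}^{J_i}\frac{u_{ij}^{\,y_{ij}}\exp(-u_{ij})}{y_{ij}!}, \tag{3}$$

where $J_i$ is the number of observations on individual i. Substituting $u_{ij}=t_{ij}m_i\lambda_{ij}$,

$$L_{FE(i)}=\prod_{j=1}^{J_i}\frac{(t_{ij}m_i\lambda_{ij})^{y_{ij}}\exp(-t_{ij}m_i\lambda_{ij})}{y_{ij}!}.$$

Taking the natural logarithm of $L_{FE(i)}$, differentiating with respect to $m_i$, and setting the derivative to zero, we have

$$\hat m_i=\frac{n_i}{\sum_{j=1}^{J_i}t_{ij}\lambda_{ij}},\qquad n_i=\sum_{j=1}^{J_i}y_{ij}.$$

Substituting $\hat m_i$ back into $L_{FE(i)}$ gives

$$L_{FE(i)}(\hat m_i,\beta)=\frac{n_i^{\,n_i}e^{-n_i}}{\prod_{j=1}^{J_i}y_{ij}!}\prod_{j=1}^{J_i}\left(\frac{t_{ij}\lambda_{ij}}{\sum_{k=1}^{J_i}t_{ik}\lambda_{ik}}\right)^{y_{ij}}.$$

The first factor does not involve β, and, because $\lambda_{ij}=\exp(X_{ij}\beta)$, the second factor is the conditional Poisson kernel (1). Hence the likelihood function from the fixed effects model (3) is proportional to that from the conditional Poisson model (1), and the maximum likelihood estimates of the βs are the same for the two models.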
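The proportionality argument can also be checked numerically. The Python sketch below (hypothetical function names, a single exposure covariate, and made-up interval lengths) evaluates the fixed effects Poisson log-likelihood with $m_i$ replaced by its maximum likelihood estimate $\hat m_i=\sum_j y_{ij}/\sum_j t_{ij}\lambda_{ij}$, together with the log of the conditional kernel (1), at several values of β. Their difference should not change with β; it equals $n_i\log n_i-n_i-\sum_j\log y_{ij}!$ with $n_i=\sum_j y_{ij}$, which is why both likelihoods are maximized at the same β.

```python
import math


def cond_loglik(beta, X, t, y):
    """Log of the conditional Poisson kernel for one subject."""
    eta = [math.log(tj) + sum(b * x for b, x in zip(beta, xj)) for xj, tj in zip(X, t)]
    log_denom = math.log(sum(math.exp(e) for e in eta))
    return sum(yj * (e - log_denom) for yj, e in zip(y, eta))


def fe_profile_loglik(beta, X, t, y):
    """Full Poisson log-likelihood for one subject with m_i replaced by its MLE.

    Assumes the subject has at least one event (n_i >= 1), as every SCCS case does.
    """
    lam = [math.exp(sum(b * x for b, x in zip(beta, xj))) for xj in X]   # lambda_ij
    n_i = sum(y)
    m_hat = n_i / sum(tj * lj for tj, lj in zip(t, lam))                 # closed-form MLE of m_i
    ll = 0.0
    for tj, lj, yj in zip(t, lam, y):
        u = tj * m_hat * lj                                               # u_ij = t_ij m_i lambda_ij
        ll += yj * math.log(u) - u - math.lgamma(yj + 1)
    return ll


# Hypothetical case: pre-vaccination control, 42-day risk window, post-risk control.
case = ([[0.0], [1.0], [0.0]], [100.0, 42.0, 223.0], [1, 1, 0])
for b in (-1.0, 0.0, 0.7, 2.0):
    gap = fe_profile_loglik([b], *case) - cond_loglik([b], *case)
    print(f"beta={b:+.1f}  difference={gap:.6f}")   # same value for every beta
```

For this case $n_i=2$ and every $y_{ij}\le 1$, so the printed difference is $2\log 2-2\approx-0.613706$ for each β.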
Age is an important confounder in vaccine safety studies because of its possible association with both the probability of vaccination and the occurrence of adverse events. A model with a specified form for the age effects, such as the age groups used in the MMR-ITP example, is called a parametric model, while a model that leaves the form of the age effects unspecified is called a semi-parametric model [10,12]. Fixed effects models as in (3) can easily be modified to fit the semi-parametric model. Each unique age (day) at which an adverse event occurred is treated as a dummy variable, so each age group is essentially one day long, and the number of age-effect levels equals the number of unique adverse event days.
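A minimal Python sketch of the data preparation for the semi-parametric model, assuming daily case records of the hypothetical form `(subject, day, exposed, events)`: it finds the unique event days and keeps one row per subject for each of those days, with the event day itself serving as the age level. Dropping person-days on which no case had an event is our assumption about how to make this step concrete; with a separate age level for every event day, such days carry no information about β, in line with the usual semi-parametric SCCS likelihood, in which the baseline is estimated only at event ages.

```python
def semiparametric_rows(daily):
    """Prepare daily case records for a semi-parametric fixed effects fit.

    daily: iterable of (subject, day, exposed, events) tuples, one per person-day.
    Returns (event_days, rows); each row is
    (subject, exposed, age_level, person_days, events) with age_level = event day.
    Assumption: person-days on days with no event in any case are dropped.
    """
    daily = list(daily)
    event_days = sorted({day for _, day, _, ev in daily if ev > 0})
    keep = set(event_days)
    # Each retained person-day is its own cell: age level = event day, person-time = 1 day.
    rows = [(subj, exposed, day, 1, ev)
            for subj, day, exposed, ev in daily if day in keep]
    return event_days, rows


# Two hypothetical cases followed for 3 days; events occur on days 2 and 3.
daily = [(1, 1, 0, 0), (1, 2, 0, 1), (1, 3, 1, 0),
         (2, 1, 0, 0), (2, 2, 1, 0), (2, 3, 1, 1)]
days, rows = semiparametric_rows(daily)
print(days)   # [2, 3]
print(rows)
```

The rows would then enter a Poisson regression with indicators for subjects, exposure, and each event day, with log(person-days) as an offset.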

Simulation study and results
We performed simulations to evaluate how well a fixed effects model analyzes SCCS data. We also compared estimates from the fixed effects model and the conditional Poisson model with those obtained using generalized estimating equations (GEE) [15,16], which have been widely used to analyze longitudinal data, including normally distributed, binary, and Poisson data. GEE is not a likelihood-based approach; it estimates population-averaged parameters while accounting for the correlation among observations on the same individual.
Simulation- Dependent Poisson data were simulated according to model (2), with $\xi_i$ independent and identically distributed as either $N(0,\sigma^2)$, with σ = 1 or 1.5, or uniform on (0, 1). Each of 100 individuals was assumed to have a follow-up period of 365 days consisting of both risk and control periods. Typically, there are three periods for each subject: the control period before vaccination, the risk period after vaccination, and the control period after the risk period. Vaccination times were assumed to follow a normal distribution with a mean of 140 days and a standard deviation of 42 days. The overall intercept coefficient, α, was set to −5, −6, −7, and −8, giving baseline incidence rates from 0.00034 to 0.00674 per day ($e^{-8}$ to $e^{-5}$). Time-varying covariates included in the model were age effects and an indicator variable for risk versus control periods. The age-effect coefficients were 0.3 for days 91-270 and 0.1 for days 271-365, with days 1-90 as the reference group. The vaccination effect coefficient, $\beta_1$, was set to 0.69, 1.39, and 1.79, corresponding to incidence rate ratios of 2, 4, and 6, respectively. For each combination of parameters, 1,000 data sets were simulated and analyzed.
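The data-generating process described above can be sketched in Python as follows, for the normal case of $\xi_i$. The follow-up length, age effects, vaccination-time distribution, and parameter values come from the text; the 42-day risk-window length, the rounding and clamping of vaccination days, the convention that the risk window starts on the vaccination day, and all function names are assumptions made for illustration, since the text does not state them.

```python
import math
import random

FOLLOW_UP = 365      # days of follow-up per individual (from the text)
RISK_DAYS = 42       # hypothetical: the risk-window length is not stated in the text


def age_coef(day):
    """Age effect on the log scale: days 1-90 reference, 91-270 -> 0.3, 271-365 -> 0.1."""
    if day <= 90:
        return 0.0
    return 0.3 if day <= 270 else 0.1


def poisson_draw(rng, mean):
    """Poisson variate by Knuth's multiplication method (fine for small daily means)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_subject(rng, alpha, beta1, sigma):
    """Daily event counts for one individual under model (2) with normal xi_i."""
    xi = rng.gauss(0.0, sigma)                                   # xi_i ~ N(0, sigma^2)
    # Vaccination day ~ N(140, 42), rounded and clamped to follow-up (clamping is an assumption).
    vacc_day = min(max(round(rng.gauss(140.0, 42.0)), 1), FOLLOW_UP)
    counts = []
    for day in range(1, FOLLOW_UP + 1):
        # Risk window assumed to start on the vaccination day.
        exposed = vacc_day <= day < vacc_day + RISK_DAYS
        log_rate = alpha + xi + age_coef(day) + beta1 * exposed
        counts.append(poisson_draw(rng, math.exp(log_rate)))
    return vacc_day, counts


rng = random.Random(2012)
cohort = [simulate_subject(rng, alpha=-5.0, beta1=math.log(4.0), sigma=1.0) for _ in range(100)]
print(sum(1 for _, c in cohort if sum(c) > 0), "of 100 individuals have at least one event")
```

Whether the 100 individuals were generated as cases or drawn from a larger cohort is not stated; individuals who end up with no events can simply be dropped before analysis, since they contribute nothing to either likelihood.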
Evaluation measures- We analyzed each simulated data set with three methods: a conditional Poisson model, a fixed effects model, and GEE. For each simulation setting, we calculated the means of the vaccination effect estimates and of their standard errors for each of the three methods. We also calculated the mean absolute difference between the vaccination effect estimates from conditional Poisson models and GEE, and reported the maximum absolute difference. We took this last step because the means of the vaccination effect estimates could be the same for conditional Poisson models and GEE even when the estimates differ in individual simulated data sets. Empirical power was calculated as the percentage of data sets with a significant vaccination effect (p-value < 0.05) under the alternative hypothesis. Type I error rates were also reported for data simulated under the null hypothesis, $\beta_1=0$.
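The evaluation measures reduce to a few summaries over the fitted data sets in each setting. The Python sketch below is a hypothetical helper (its name and argument layout are our own); the toy inputs show why the maximum absolute difference is worth reporting, since two methods can have identical mean estimates while disagreeing in individual data sets.

```python
def summarize(cp_est, gee_est, cp_pvalues, level=0.05):
    """Simulation summaries for one parameter setting.

    cp_est, gee_est: estimates of beta_1 from the conditional Poisson model and GEE,
    one per simulated data set; cp_pvalues: p-values for beta_1 from the conditional
    Poisson fits (the fixed effects model gives identical values).
    """
    n = len(cp_est)
    abs_diff = [abs(a - b) for a, b in zip(cp_est, gee_est)]
    return {
        "mean_cp": sum(cp_est) / n,
        "mean_gee": sum(gee_est) / n,
        "mean_abs_diff": sum(abs_diff) / n,
        "max_abs_diff": max(abs_diff),
        # % of data sets with p < level: empirical power under H1, type I error under H0
        "rejection_pct": 100.0 * sum(p < level for p in cp_pvalues) / n,
    }


# Hypothetical estimates: identical means, yet the per-data-set differences are not zero.
print(summarize([1.0, 1.2, 0.8], [1.0, 1.0, 1.0], [0.01, 0.20, 0.04]))
```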
Simulation results- Table 1 shows that conditional Poisson models, fixed effects models, and GEE have acceptable type I error rates (ie, about 5%) for the parameters examined when the individual-specific random coefficients are simulated from a normal distribution. The GEE approach produces slightly different estimates and standard errors for the vaccination effect than conditional Poisson models and fixed effects models (Table 2). Empirical powers are also similar among the three approaches under the alternative hypothesis $\beta_1\neq 0$. Although there is little difference on average in parameter estimation between GEE and conditional Poisson models, the absolute differences between the two reveal that they can produce very different results for some simulated data sets. For example, the maximum absolute difference in the estimated vaccination effect coefficient is 0.339 in one simulated data set, while the mean absolute difference is only 0.0127 (Table 3). Similar results are observed for $\beta_1$ = 1.39 and 1.79 (data not shown). Similar results are also observed when the individual-specific random coefficients are simulated from a uniform distribution (data not shown).

An example
To demonstrate how a fixed effects model approach can be used for SCCS data, we reanalyzed the data from an SCCS study examining the association between MMR vaccination and ITP in a US pediatric population [7]. That study excluded the 42-day healthy period immediately preceding MMR vaccination and used a pre-specified 42-day risk window after MMR vaccination, a window length subsequently confirmed to be an optimal choice by a data-driven approach [17]. It found an incidence rate ratio of 7.06 for those vaccinated at ages between 366 and 690 days. In this paper, to demonstrate the use of fixed effects models in analyzing SCCS data, we defined six age groups as in Xu et al. [17]: 366-426 days, 427-487 days, 488-548 days, 549-609 days, 610-670 days, and 671-730 days (the reference group). Days outside the 42-day post-vaccination risk window were considered the control window. We also included all subjects with follow-up of 366-730 days regardless of vaccination status, and we did not exclude the healthy period before MMR vaccination.
The first step in applying a fixed effects model to SCCS data is to restructure the data into a more usable format. Specifically, the analytic data can be expanded as described in Xu et al. [13]. Briefly, the follow-up period of each individual in the SCCS sample was expanded into daily observations with an exposure status indicator (risk versus control), indicators for age effects (predefined age groups for the parametric model, or unique event days for the semi-parametric model), an indicator variable for adverse events, and any other time-invariant and time-varying covariates. Data in this expanded form make it simple to calculate the person-time (in days) and the number of adverse events by exposure status, age group, and any other time-varying covariates for each individual. The result is a data set with multiple observations per individual, which can be analyzed using a fixed effects model in most statistical software packages (eg, PROC GENMOD in SAS). Table 4 shows that conditional Poisson models and fixed effects models give the same estimates and standard errors for both the vaccination effect and the age effects when age effects are modeled parametrically. Estimates and standard errors obtained using the GEE approach differ only slightly. For the semi-parametric method, the conditional Poisson model and the fixed effects model yield the same estimate (1.97) and standard error (0.37) for the vaccination effect. GEE estimates are not available for the semi-parametric model because the individual-specific coefficients cannot be modeled as both fixed and random effects in the same model.
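The expansion and aggregation step can be sketched in Python using the six age groups and 42-day risk window of the MMR-ITP example. The input record layout (keys `id`, `start`, `end`, `vacc_age`, `event_ages`) is hypothetical, and we assume that the risk window covers days 1 to 42 after vaccination; the aggregated cells are what a fixed effects Poisson model would be fitted to.

```python
from collections import defaultdict

# Age groups in days of age from the MMR-ITP example; 671-730 days is the reference.
AGE_GROUPS = [(366, 426), (427, 487), (488, 548), (549, 609), (610, 670), (671, 730)]
RISK_DAYS = 42   # assumed to cover days 1-42 after vaccination


def age_group(age):
    """Index of the age group containing a given age in days."""
    for k, (lo, hi) in enumerate(AGE_GROUPS):
        if lo <= age <= hi:
            return k
    raise ValueError(f"age {age} is outside 366-730 days")


def expand_and_aggregate(subjects):
    """Expand each case to daily records, then sum person-days and events per cell.

    subjects: iterable of dicts with hypothetical keys 'id', 'start', 'end'
    (first/last age in days of follow-up), 'vacc_age' (age at MMR, or None if
    unvaccinated) and 'event_ages' (ages at ITP diagnosis).
    Returns {(id, exposed, age_group): [person_days, events]}.
    """
    cells = defaultdict(lambda: [0, 0])
    for s in subjects:
        v = s["vacc_age"]
        for age in range(s["start"], s["end"] + 1):               # one record per day
            exposed = int(v is not None and v < age <= v + RISK_DAYS)
            cell = cells[(s["id"], exposed, age_group(age))]
            cell[0] += 1                                           # person-time (days)
            cell[1] += s["event_ages"].count(age)                  # events on that day
    return dict(cells)


# One hypothetical case: vaccinated at 400 days of age, ITP at 420 days.
table = expand_and_aggregate([{"id": 1, "start": 366, "end": 730,
                               "vacc_age": 400, "event_ages": [420]}])
for key in sorted(table):
    print(key, table[key])
```

Each cell would then be one observation in a Poisson regression with subject indicators, the exposure indicator, and age-group indicators, with log(person-days) as an offset; in PROC GENMOD this corresponds to a subject identifier in the CLASS statement and an OFFSET= variable.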
To illustrate one of many potential uses of a fixed effects model for SCCS data, we also compared the risk of ITP after MMR vaccination between male and female patients. If conditional Poisson models are employed, separate gender subgroup analyses must be carried out to make this comparison, and a direct statistical test is not readily available. With a fixed effects model, however, we can statistically compare the risk of ITP between genders by fitting a model with two interactions: 1) between exposure and gender and 2) between age groups and gender (Table 5). The test statistic and p-value for this comparison are readily available from standard statistical software (eg, using estimate and contrast statements in PROC GENMOD in SAS). Additionally, the fixed effects model can impose the same age effects on male and female patients by replacing the interaction between age groups and gender with age groups alone (Table 5). This model produced a smaller gender difference in the risk of ITP after MMR vaccination than the model with different age coefficients for male and female patients, although the conclusion is the same (p-value = 0.76 versus 0.31). In this way, the fixed effects modeling approach offers considerable flexibility beyond what a conditional Poisson approach can provide.
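The gender comparison is a Wald test of a linear contrast of the fixed effects coefficients. The Python sketch below assumes, hypothetically, a fit whose coefficient vector is [exposure, exposure-by-male interaction], so that the interaction coefficient is the log of the male-to-female ratio of incidence rate ratios; the estimates and covariance matrix are made up and are not the Table 5 results.

```python
import math


def wald_contrast(beta, cov, c):
    """Wald test of the linear contrast c'beta = 0.

    beta: coefficient estimates, cov: their covariance matrix (list of rows),
    c: contrast weights. Returns (estimate, z, two-sided p-value).
    """
    est = sum(ci * bi for ci, bi in zip(c, beta))
    var = sum(c[i] * cov[i][j] * c[j] for i in range(len(c)) for j in range(len(c)))
    z = est / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))        # equals 2 * (1 - Phi(|z|))
    return est, z, p


# Hypothetical fit: beta = [exposure, exposure-by-male interaction].
beta = [1.9, 0.4]
cov = [[0.09, -0.09], [-0.09, 0.25]]
est, z, p = wald_contrast(beta, cov, [0.0, 1.0])
print(f"ratio of IRRs = {math.exp(est):.2f}, z = {z:.2f}, p = {p:.3f}")
```

The contrast [1, 1] instead gives the log incidence rate ratio for male patients, whose variance must include the covariance term; this is the kind of quantity an estimate statement reports.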

Discussion
Fixed effects models were first implicitly recommended for analyzing SCCS data by Whitaker et al. [10]. They included a factor for each individual to ensure that the fitted individual totals equal the observed values, but did not study the relation between conditional Poisson models and fixed effects models. In this paper, we demonstrate, theoretically and by simulation, that the parameter estimates and their standard errors are equivalent between conditional Poisson models and fixed effects models. On average, the GEE approach produces very similar estimates and standard errors for vaccination effects, although for some simulated data sets it can produce results very different from those of conditional Poisson models. We do not recommend the GEE approach for analyzing SCCS data because of possible bias in estimation.
There are numerous advantages for statistical analysts in vaccine safety research in using fixed effects models to analyze SCCS data. First, likelihood-based statistical tests can be employed. For example, a likelihood ratio test was used to investigate the effect of gender on the association between oral polio vaccine and intussusception [10]. Second, analysts may find fitting fixed effects models to SCCS data more intuitive than fitting conditional Poisson models in standard statistical software packages, because they are already familiar with procedures for fitting Poisson models (eg, PROC GENMOD in SAS). Third, a complex semi-parametric model can be fitted simply by treating unique event days as indicator variables when preparing the summary data on person-time and number of events, and then including these indicators in the fixed effects model.
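For a single added parameter, such as an exposure-by-gender interaction, the likelihood ratio test compares the maximized log-likelihoods of two nested fixed effects models fitted to the same data. The minimal Python helper below uses hypothetical log-likelihood values and the identity $P(\chi^2_1>x)=\operatorname{erfc}(\sqrt{x/2})$, so that no statistics library is needed; tests with more than one degree of freedom would need the general chi-square distribution instead.

```python
import math


def lrt_df1(loglik_full, loglik_reduced):
    """Likelihood ratio test for one extra parameter between nested models.

    Returns (statistic, p-value) using P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical maximized log-likelihoods of the models with and without the interaction.
stat, p = lrt_df1(-512.30, -513.10)
print(f"LR statistic = {stat:.2f}, p = {p:.3f}")
```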
Although we examined a wide variety of simulated data settings, we studied only normal and uniform distributions for the individual-specific random effects. Results should remain equivalent for conditional Poisson models and fixed effects models under other distributions, since their likelihoods are proportional. Inconsistent estimation may be a potential issue when fitting fixed effects models if the number of individual-specific effects approaches infinity. However, in SCCS vaccine safety studies, the number of individual-specific effects to be estimated equals the number of individuals with adverse events, which is usually small. Thus, inconsistency should not often be a problem when analyzing SCCS data. With a relatively large number of cases, fixed effects models may take a relatively long time to fit. In this case, Whitaker et al. [10] suggested using absorbing factors to fit the models efficiently without explicitly estimating the individual-specific effects.
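A related way to avoid estimating one coefficient per case is to maximize the conditional likelihood directly, which by the proportionality result gives the same estimate of $\beta_1$ as the fixed effects model. The Python sketch below does this with Newton-Raphson and step halving, assuming a single binary exposure covariate and no age terms; the function name and data layout are hypothetical, and this is not necessarily how absorbing factors are implemented in any particular software package.

```python
import math


def fit_sccs_exposure(cases, tol=1e-10, max_iter=100):
    """Estimate beta_1 for one exposure indicator from the conditional likelihood.

    cases: list of (x, t, y) per subject, with x[j] in {0, 1} the exposure indicator,
    t[j] the interval length in days and y[j] the event count. No subject-specific
    coefficients are estimated. Returns (beta_1_hat, standard error).
    """
    def score_info_ll(beta):
        score = info = ll = 0.0
        for x, t, y in cases:
            w = [tj * math.exp(beta * xj) for xj, tj in zip(x, t)]
            total = sum(w)
            n = sum(y)
            p = [wj / total for wj in w]                  # multinomial cell probabilities
            xbar = sum(pj * xj for pj, xj in zip(p, x))
            score += sum(yj * xj for yj, xj in zip(y, x)) - n * xbar
            info += n * (sum(pj * xj * xj for pj, xj in zip(p, x)) - xbar ** 2)
            ll += sum(yj * math.log(pj) for yj, pj in zip(y, p) if yj > 0)
        return score, info, ll

    beta = 0.0
    for _ in range(max_iter):
        score, info, ll = score_info_ll(beta)
        step = score / info
        # Step halving: shrink the Newton step until the log-likelihood does not decrease.
        while score_info_ll(beta + step)[2] < ll and abs(step) > tol:
            step /= 2.0
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(score_info_ll(beta)[1])


# Two hypothetical cases: pre-vaccination control, 42-day risk window, post-risk control.
cases = [([0, 1, 0], [100.0, 42.0, 223.0], [1, 1, 0]),
         ([0, 1, 0], [150.0, 42.0, 173.0], [0, 1, 1])]
b, se = fit_sccs_exposure(cases)
print(f"IRR = {math.exp(b):.2f} (SE of log IRR = {se:.2f})")
```

For a single case with one event in each of a 42-day risk interval and a 323-day control interval, the estimate reduces to $\log(323/42)$ with standard error $\sqrt{2}$, the familiar $\sqrt{1/y_{risk}+1/y_{control}}$.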
In summary, for vaccine safety studies involving SCCS data, we recommend using a fixed effects modeling framework to estimate incidence rate ratios, as this approach offers many advantages over the traditionally used conditional Poisson model. More broadly, existing longitudinal data analysis tools may provide opportunities to better address research questions that arise in SCCS vaccine safety studies, such as how to use likelihood-based statistical methods to identify optimal risk windows for a given SCCS data set [17] and how to analyze SCCS data with misclassification of adverse events when only some adverse events are chart reviewed because of resource limitations [18][19][20].
Mean of vaccination effect coefficient (standard error) and empirical power from 1000 simulations with true β