Cumulative Effect of Common Genetic Variants Predicts Incident Type 2 Diabetes: A Study of 21,183 Subjects from Three Large Prospective Cohorts

Recent genome-wide association studies (GWAS) and their meta-analyses have identified multiple genetic loci associated with type 2 diabetes (T2D). Except for variants in the TCF7L2 gene, which have a modest effect on diabetes risk, most genetic variants identified so far are only weakly associated with diabetes. Combining multiple variants may have a larger effect on disease risk and improve risk prediction. In this study, we focused on SNPs that had been robustly replicated in previous GWAS and were also genotyped in a large sample of 21,183 participants from three prospective cohorts: the Atherosclerosis Risk in Communities (ARIC) Study, the Framingham Offspring Study (FOS) and the Multi-Ethnic Study of Atherosclerosis (MESA). Among these, we confirmed the associations of 12 SNPs with baseline prevalent T2D in two of the cohorts (ARIC and MESA). A genotype risk score (GRS) based on these 12 risk variants was constructed to examine whether the GRS predicts incident diabetes. In a combined meta-analysis, subjects in the highest tertile of GRS had a 1.62-fold increased risk of incident T2D (95% CI, 1.08–2.44, P=1.5×10−14) compared with those in the lowest tertile, after adjustment for age, sex, race, smoking, body mass index (BMI), lipids (HDL and LDL) and systolic blood pressure. Moreover, the GRS significantly improved risk prediction and reclassification for T2D beyond known risk factors.


Introduction
Type 2 diabetes (T2D) has reached epidemic proportions in almost all racial/ethnic groups. Currently, over 200 million individuals worldwide have T2D, and this number is projected to reach 438 million by 2030 [1]. Although lifestyle and environmental risk factors are believed to contribute substantially to the etiology of diabetes, genetic predisposition has also been suggested to play a critical role [2,3]. Recent genome-wide association studies (GWAS) have identified multiple genetic variants for T2D [4][5][6][7]. Except for variants in the TCF7L2 gene, which have a modest effect on diabetes risk (odds ratio 1.40-1.56) [8,9], most genetic variants identified so far have only a weak association with diabetes (odds ratios ranging from 1.1 to 1.3) [10][11][12]. Combining multiple variants may yield a larger effect and could improve risk prediction over established risk factors. This study aims to confirm the association of these GWAS-identified SNPs with prevalent T2D and to evaluate the prognostic value of a combination of the validated SNPs for predicting incident T2D in a large sample of 21,183 subjects from three prospective cohorts: the Atherosclerosis Risk in Communities (ARIC) Study, the Framingham Offspring Study (FOS) and the Multi-Ethnic Study of Atherosclerosis (MESA).

Study populations
A detailed description of the study design and methods for each of the three cohorts has been previously reported and is only briefly outlined here.
ARIC: a prospective cohort study to investigate the etiology of atherosclerosis and cardiovascular risk factors [13]. From 1987 to 1989, a total of 15,792 subjects were recruited from four US communities: Forsyth County, NC; Jackson, MS; Minneapolis, MN; and Washington County, MD. Participants were examined approximately every three years. A total of 12,771 subjects with complete genotype and phenotype data were included in the current analyses.

FOS: The Framingham Heart Study (FHS), initiated in 1948, recruited 5,209 men and women residing in Framingham, MA. In 1971, the Framingham Offspring Study (FOS) was undertaken to expand the original FHS cohort by enrolling 5,124 children of the FHS participants and their spouses. The design and selection criteria of the FHS and FOS cohorts have been detailed elsewhere [14]. A total of 2,760 participants with complete genotype and phenotype data were included in the present investigation.
MESA: a prospective multi-ethnic cohort of 6,814 men and women who were free of overt cardiovascular disease (CVD) at enrollment. Beginning in July 2000, participants were recruited from six field centers: Baltimore, MD; Chicago, IL; Forsyth County, NC; Los Angeles County, CA; New York, NY; and St. Paul, MN. Participants have undergone four clinical examinations through 2007. A total of 5,652 subjects with complete genotype and phenotype data were included in the current analyses.
In all three cohorts, medical history, physical examination, laboratory tests and risk factor assessments were performed routinely at each visit [13][14][15]. Annual follow-up was conducted to collect information on vital status. All participants provided written informed consent, and the institutional review boards of the participating institutions or clinical sites approved the studies.

Definition of T2D
Diabetes was defined as a fasting blood glucose level of 126 mg/dL (7.0 mmol/L) or higher, or current treatment with insulin or any hypoglycemic (glucose-lowering) medication. Incident T2D cases were participants who were free of overt T2D at enrollment but met the diagnostic criteria at one or more clinical examinations during follow-up. Incident cases were also verified by review of medical records.
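The case definition above can be expressed as a small classification rule. The sketch below is illustrative only: it assumes each participant's exams are available as a list ordered baseline first, and the `Exam` record, its field names and the `classify` helper are hypothetical. The medical-record verification step is not modeled, and the 18 mg/dL per mmol/L factor is the usual approximate conversion for glucose.

```python
from dataclasses import dataclass
from typing import List

FPG_CUTOFF_MG_DL = 126.0   # fasting glucose threshold from the case definition
MG_DL_PER_MMOL_L = 18.0    # approximate glucose conversion factor (molar mass ~180 g/mol)


@dataclass
class Exam:
    """One clinical exam (hypothetical record layout)."""
    fasting_glucose_mg_dl: float
    on_glucose_lowering_treatment: bool  # insulin or any hypoglycemic medication


def has_diabetes(exam: Exam) -> bool:
    """Diabetes at one exam: fasting glucose >= 126 mg/dL or glucose-lowering treatment."""
    return (exam.fasting_glucose_mg_dl >= FPG_CUTOFF_MG_DL
            or exam.on_glucose_lowering_treatment)


def classify(exams: List[Exam]) -> str:
    """Classify a participant from exams ordered baseline first.

    'prevalent': meets the criteria at baseline (excluded from incident analyses)
    'incident' : free of diabetes at baseline, meets the criteria at >= 1 follow-up exam
    'none'     : never meets the criteria
    """
    if not exams:
        raise ValueError("at least a baseline exam is required")
    baseline, follow_up = exams[0], exams[1:]
    if has_diabetes(baseline):
        return "prevalent"
    if any(has_diabetes(e) for e in follow_up):
        return "incident"
    return "none"


# The 126 mg/dL threshold expressed in mmol/L
cutoff_mmol_l = round(FPG_CUTOFF_MG_DL / MG_DL_PER_MMOL_L, 1)
```

For example, a participant with fasting glucose of 98, 110 and 131 mg/dL at three successive exams and no treatment is classified as an incident case.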

SNP selection
We obtained authorized access to the GWAS datasets of the three cohorts through dbGaP. In ARIC and MESA, 841,820 and 909,622 SNPs, respectively, were genotyped using the Affymetrix Genome-Wide Human SNP Array 6.0 (Affymetrix, Inc., Santa Clara, CA). In FOS, 500,568 SNPs were genotyped using the Affymetrix 500K array set (Affymetrix, Inc., Santa Clara, CA).
We searched the literature published before December 2010 for T2D-related SNPs. SNPs were included in the current analysis if they met the following criteria: (1) robustly replicated in previous GWAS; (2) genotype data available in all three cohorts (ARIC, FOS and MESA); and (3) positively associated with T2D in all three cohorts.
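The three inclusion criteria act as a simple filter over literature-derived candidates. In this minimal sketch the `CandidateSNP` record and the SNP identifiers are hypothetical, and "positively associated" is read as a per-allele log odds ratio above zero (risk direction) in every cohort; the study may also have required a significance threshold, which is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List

COHORTS = ("ARIC", "FOS", "MESA")


@dataclass
class CandidateSNP:
    """A literature-derived candidate SNP (hypothetical record layout)."""
    snp_id: str
    replicated_in_gwas: bool
    # Per-allele log odds ratio for T2D by cohort; a missing cohort means no genotype data.
    log_or: Dict[str, float] = field(default_factory=dict)


def meets_criteria(snp: CandidateSNP) -> bool:
    if not snp.replicated_in_gwas:                    # (1) robust GWAS replication
        return False
    if not all(c in snp.log_or for c in COHORTS):     # (2) data available in all three cohorts
        return False
    return all(snp.log_or[c] > 0 for c in COHORTS)    # (3) risk direction in all three cohorts


def select_snps(candidates: List[CandidateSNP]) -> List[str]:
    """Return the identifiers of candidates passing all three criteria, in input order."""
    return [s.snp_id for s in candidates if meets_criteria(s)]
```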

Statistical analysis
We first used logistic regression to test the association of each SNP with prevalent T2D at baseline, adjusting for baseline age, sex, race, BMI, current smoking, high- and low-density lipoprotein cholesterol (HDL and LDL) and systolic blood pressure (SBP). We then constructed two weighted genotype risk scores (GRS) from the SNPs independently associated with prevalent diabetes: each score sums, over these SNPs, the number of risk alleles (0, 1 or 2) multiplied by a per-SNP weight, taken either from the effect sizes estimated in this study or from the effect sizes reported in the literature [16][17][18][19][20]. Because the two weightings gave similar results, we present results for the GRS weighted by literature-reported effect sizes. Analyses were first performed in each cohort separately, and the results were then combined by random-effects meta-analysis with inverse-variance weights. Heterogeneity across cohorts was assessed using the Q statistic [21]. Multiple testing was corrected using the Bonferroni method. We did not test the association of these SNPs in FOS because of the small number of prevalent diabetics (0.5%) in this cohort.
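The weighted score and its tertile grouping can be sketched as follows. This is a minimal illustration, not the study's code: the text does not state the scale of the literature weights, so per-allele log odds ratios are assumed, and tertile cut points are computed within whatever sample is passed in (for example, one cohort at a time), which is also an assumption. The function names are hypothetical.

```python
from bisect import bisect_left
from statistics import quantiles
from typing import List, Sequence


def weighted_grs(dosages: Sequence[Sequence[int]], weights: Sequence[float]) -> List[float]:
    """Weighted GRS per subject: sum over SNPs of (risk-allele count 0/1/2) * SNP weight.

    dosages: one row per subject, one risk-allele count per SNP
    weights: one weight per SNP (assumed here to be per-allele log odds ratios)
    """
    scores = []
    for row in dosages:
        if len(row) != len(weights):
            raise ValueError("one weight per SNP is required")
        if any(d not in (0, 1, 2) for d in row):
            raise ValueError("risk-allele counts must be 0, 1 or 2")
        scores.append(sum(d * w for d, w in zip(row, weights)))
    return scores


def tertiles(scores: Sequence[float]) -> List[int]:
    """Assign tertile 1 (lowest) to 3 (highest) using within-sample cut points."""
    cuts = quantiles(scores, n=3, method="inclusive")  # two cut points
    # bisect_left counts cut points strictly below the score: 0, 1 or 2
    return [bisect_left(cuts, s) + 1 for s in scores]
```

Usage: `tertiles(weighted_grs(G, w))` returns a tertile label per subject, which can then enter the regression models as a categorical exposure (highest vs. lowest tertile) or as an ordinal trend term.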
To determine the prognostic value of the weighted GRS (categorized into tertiles) for incident T2D, we performed multivariable Cox proportional hazards regression, adjusting for the covariates listed above. The proportional hazards assumption was tested using scaled Schoenfeld residuals [22]. For each cohort, we estimated the hazard ratio (HR) and its 95% confidence interval (CI) comparing the highest with the lowest tertile of GRS, and tested the null hypothesis of no linear trend across tertiles using a Wald test. Results of the three cohorts were combined by random-effects meta-analysis, and a modified inverse normal method [23] was used to combine p-values across cohorts.
To further evaluate the potential value of GRS in risk prediction, we also examined its association with prevalent diabetes at the end of follow-up (i.e., baseline diabetics plus incident cases accrued by the end of follow-up) using logistic regression, controlling for the covariates described above. Participants were classified into four risk categories (<5%, 5-10%, 10-20% and ≥20%) based on predicted probabilities from models with and without GRS. First, we compared the area under the ROC curve (AUC) of the two models using the non-parametric approach based on generalized U-statistics [24]. Second, we calculated the integrated discrimination improvement (IDI) and the net reclassification improvement (NRI), as suggested by Pencina et al. [25]. All continuous variables were log-transformed to improve normality. An association was considered significant if its adjusted p-value was <0.05. Analyses were performed using PLINK [26], SAS version 9.2 (SAS Institute Inc., Cary, NC, USA), R (version 2.11.1) and Matlab 7.10.0.499 (The MathWorks, Inc., Natick, MA, USA).
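The three discrimination measures described above can be sketched from two vectors of predicted probabilities (without and with GRS) and the observed outcome. This is an illustrative sketch under stated assumptions: the four risk categories come from the text, but assigning boundary values (e.g., exactly 10%) to the higher category is an assumption, and the significance tests for the AUC difference [24], IDI and NRI are not reproduced. The AUC uses the Mann-Whitney form (share of case/non-case pairs ranked correctly, ties counted as one half).

```python
from bisect import bisect_right
from typing import Iterable, Sequence

CUTS = (0.05, 0.10, 0.20)  # categories: <5%, 5-10%, 10-20%, >=20%


def risk_category(p: float) -> int:
    """0: <5%, 1: 5% to <10%, 2: 10% to <20%, 3: >=20% (boundary handling assumed)."""
    return bisect_right(CUTS, p)


def _mean(xs: Iterable[float]) -> float:
    xs = list(xs)
    return sum(xs) / len(xs)


def categorical_nri(p_old: Sequence[float], p_new: Sequence[float], y: Sequence[int]) -> float:
    """NRI = [P(up | event) - P(down | event)] + [P(down | non-event) - P(up | non-event)]."""
    moves = [risk_category(n) - risk_category(o) for o, n in zip(p_old, p_new)]
    ev = [m for m, d in zip(moves, y) if d]
    ne = [m for m, d in zip(moves, y) if not d]
    nri_events = _mean(m > 0 for m in ev) - _mean(m < 0 for m in ev)
    nri_nonevents = _mean(m < 0 for m in ne) - _mean(m > 0 for m in ne)
    return nri_events + nri_nonevents


def idi(p_old: Sequence[float], p_new: Sequence[float], y: Sequence[int]) -> float:
    """IDI = (mean risk gain among events) - (mean risk gain among non-events)."""
    ev = [(o, n) for o, n, d in zip(p_old, p_new, y) if d]
    ne = [(o, n) for o, n, d in zip(p_old, p_new, y) if not d]
    gain_events = _mean(n for _, n in ev) - _mean(o for o, _ in ev)
    gain_nonevents = _mean(n for _, n in ne) - _mean(o for o, _ in ne)
    return gain_events - gain_nonevents


def auc(p: Sequence[float], y: Sequence[int]) -> float:
    """Mann-Whitney AUC over all case/non-case pairs (O(n*m), fine for a sketch)."""
    cases = [pi for pi, d in zip(p, y) if d]
    controls = [pi for pi, d in zip(p, y) if not d]
    score = 0.0
    for c in cases:
        for k in controls:
            score += 1.0 if c > k else 0.5 if c == k else 0.0
    return score / (len(cases) * len(controls))
```

The design keeps the categorical NRI tied to the same four categories used for the reclassification table, whereas the IDI and AUC use the continuous predicted probabilities.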

Baseline characteristics
A total of 21,183 subjects, including 12,771 from ARIC, 2,760 from FOS and 5,652 from MESA, were included in the current analyses. Baseline characteristics of the study participants are shown in Table 1. On average, participants from FOS were younger, less obese and more likely to be current smokers (all p's <0.0001). The prevalence of T2D at baseline was 11.6% in ARIC and 12.7% in MESA. Baseline prevalence of T2D in FOS was very low (0.5%), possibly because of the younger age of participants at enrollment (mean age 33.7 years). Incident diabetes rates were similar in ARIC and FOS but higher than in MESA (9.8% in ARIC and FOS vs. 7.1% in MESA). The median total follow-up time was 16.1 years, 32.3 years and 6.5 years for ARIC, FOS and MESA, respectively. Mean GRS values were 13.3±4.2, 11.7±2.8 and 14.2±3.9 for ARIC, FOS and MESA, respectively. Diabetic patients on average had a higher GRS than nondiabetics (15.3 vs. 13.3, P<0.0001 in the combined sample).

Single SNP association with prevalent diabetes at baseline
We confirmed the associations of 12 SNPs with prevalent diabetes in ARIC and MESA after Bonferroni correction for multiple testing (Table 2). Of the 12 SNPs, 9 were genotyped in FOS. The remaining 3 SNPs were imputed with IMPUTE2 (version 2.1.2), using the 1000 Genomes Project (interim release, December 2010) and HapMap Phase 3 as reference panels [27].
As expected, the genotype risk score constructed from these 12 SNPs was significantly associated with an increased risk of diabetes at baseline. After adjustment for the risk factors listed above, the odds ratio (OR, highest vs. lowest tertile) for prevalent T2D was 1.78 (95% CI, 1.45-2.19; P<0.0001) in ARIC and 2.63 (95% CI, 1.97-3.51; P<0.0001) in MESA. We did not test the association of GRS with prevalent T2D in FOS because of the limited number of diabetic patients at baseline. Meta-analysis of the two cohorts also revealed a significant association of GRS with prevalent diabetes at enrollment (OR=2.15, 95% CI, 1.47-3.14; P=1.5×10−10; Table 3).
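The random-effects pooling described under Statistical analysis can be checked against the two cohort estimates above. The text specifies inverse-variance random-effects meta-analysis with a Q statistic but does not name the between-study variance estimator, so this sketch assumes the DerSimonian-Laird estimator; it also assumes each reported 95% CI is symmetric on the log scale, so that the standard error can be recovered from the CI width. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Z975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def log_estimate_from_ci(ratio: float, lower: float, upper: float) -> Tuple[float, float]:
    """Recover (log ratio, SE) from a ratio and a 95% CI symmetric on the log scale."""
    return math.log(ratio), (math.log(upper) - math.log(lower)) / (2 * Z975)


def dersimonian_laird(estimates: List[float], ses: List[float]):
    """Inverse-variance random-effects pooling with the DerSimonian-Laird tau^2.

    Returns (pooled log ratio, pooled SE, Cochran's Q, tau^2, I^2).
    """
    w = [1 / s ** 2 for s in ses]                       # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                       # between-study variance
    w_re = [1 / (s ** 2 + tau2) for s in ses]           # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, se, q, tau2, i2


# Baseline prevalent-T2D estimates (highest vs. lowest GRS tertile) from the text
aric = log_estimate_from_ci(1.78, 1.45, 2.19)
mesa = log_estimate_from_ci(2.63, 1.97, 3.51)
pooled, se, q, tau2, i2 = dersimonian_laird([aric[0], mesa[0]], [aric[1], mesa[1]])
pooled_or = math.exp(pooled)
ci = (math.exp(pooled - Z975 * se), math.exp(pooled + Z975 * se))
```

Under these assumptions the pooled OR comes out close to the reported 2.15 (1.47-3.14); small differences are expected because the cohort inputs are rounded to two decimals.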

Association of GRS with incident diabetes
The median follow-up time to the first T2D event was 8.8 years, 26.3 years and 4.7 years for ARIC, FOS and MESA, respectively. Table 3 shows the association of GRS with incident T2D. Compared with subjects in the bottom tertile of GRS, those in the top tertile had a 1.31-fold (95% CI 1.08-1.57; P=0.0008) and a 2.64-fold (95% CI 1.84-3.79; P<0.0001) increased risk of T2D in ARIC and FOS, respectively. However, GRS did not significantly predict incident diabetes in MESA (P=0.26). A meta-analysis combining all three cohorts yielded a 1.62-fold increased risk (95% CI, 1.08-2.44; P=0.02) for subjects in the highest vs. the lowest tertile of GRS. We detected significant heterogeneity across the three studies (Q=12.04, P=0.003), which the random-effects meta-analysis accommodates by incorporating between-study as well as within-study variability [28].
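For scale, the reported Q can be converted into Higgins' I², the share of total variation in the cohort-specific log hazard ratios attributable to between-study heterogeneity. This quantity is not reported in the study; it is computed here only from Q and its k − 1 = 2 degrees of freedom (k = 3 cohorts):

```latex
I^2 = \max\!\left(0,\ \frac{Q - (k-1)}{Q}\right) = \frac{12.04 - 2}{12.04} \approx 0.83
```

That is, roughly 83% of the observed variation among the three cohort estimates would reflect between-cohort differences rather than sampling error.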
Additionally, we examined whether the observed association between GRS and incident diabetes was driven by rs7901695 in TCF7L2, the strongest T2D susceptibility locus reported to date. After removing rs7901695 from the GRS, the hazard ratio (HR) for incident diabetes was only slightly attenuated in each cohort: HR=1.28 (95% CI, 1.06-1.54; P=0.025) in ARIC, HR=2.24 (95% CI, 1.58-3.16; P<0.0001) in FOS and HR=1.18 (95% CI, 0.85-1.66; P=0.32) in MESA. Thus, the association of GRS with incident diabetes is unlikely to be driven by this SNP alone.
Figure 1 shows the distribution of the number of risk alleles carried by study participants, by diabetes status at the end of follow-up. The distribution is approximately normal, and diabetics carry more risk alleles than nondiabetics. Figure 2 shows the ORs associated with carrying increasing numbers of risk alleles relative to the reference group (0-6 risk alleles). Individuals carrying 18 or more risk alleles were more than twice as likely to have T2D as those carrying 6 or fewer (OR=2.42, 95% CI: 1.82-3.42; P<0.0001).
Table 3 shows the significant association of GRS with prevalent T2D at the end of follow-up in each cohort and in the combined sample. The adjusted OR (highest vs. lowest tertile of GRS) was 1.64 (95% CI: 1.39-1.92; P<0.0001) for ARIC, 2.65 (95% CI: 1.78-3.96; P<0.0001) for FOS and 2.18 (95% CI: 1.71-2.78; P<0.0001) for MESA. Meta-analysis of the three cohorts showed a two-fold increased risk of diabetes (OR=2.08, 95% CI: 1.59-2.74; P<0.0001) after adjustment for covariates. With known risk factors included in both models, the AUC was 0.75 with GRS and 0.74 without GRS (P<0.0001 for the difference; Figure 3), indicating that adding GRS slightly, yet significantly, improves risk prediction for T2D beyond established risk factors.
Results for risk reclassification of prevalent T2D are shown in Table 4. A total of 276 participants (29%) with T2D whose risk was estimated at 10-20% using only known risk factors were reclassified into the ≥20% category when GRS was added to the model, and 932 participants (20%) without T2D were reclassified from the ≥20% category into the 10-20% category when GRS was included. Both the IDI (IDI=0.016; P<0.0001) and the NRI (NRI=0.076; P<0.0001) were highly significant, suggesting that including GRS significantly improves discrimination for T2D risk prediction over known risk factors.

Discussion
In a sample of 21,183 subjects from three large prospective cohorts, we replicated the individual associations of 12 GWAS-identified SNPs with prevalent T2D and demonstrated that the cumulative effect of these 12 SNPs significantly predicts incident T2D independent of multiple risk factors. Moreover, the genotype risk score of these 12 SNPs significantly improves risk prediction and reclassification of T2D over known cardiovascular risk factors. To the best of our knowledge, this is the first investigation of its kind in a large sample from longitudinal cohorts in U.S. communities.
Except for TCF7L2, most SNPs identified so far have displayed weak or modest effects on diabetes [4]. In line with previous findings, our study also showed a strong association of a SNP in TCF7L2 (rs7901695) with T2D. Specifically, subjects carrying the "C" risk allele of rs7901695 had a 1.42-fold increased risk of T2D (95% CI, 1.32-1.51; P=1.27×10−24) compared with those carrying the "T" allele. In addition, our study demonstrated a cumulative effect of multiple diabetes susceptibility loci on diabetes risk. Previous studies showed that adding genetic information on risk variants provided only limited improvement in the prediction of T2D beyond classical risk factors [29,30].
Our study, however, demonstrated that knowledge of common genetic variants significantly improves risk prediction and reclassification of T2D beyond clinical risk factors. This finding is corroborated by a recent study indicating that inclusion of common genetic variants appropriately reclassifies younger patients with T2D [31].
Our study has several limitations. First, we included only a subset of diabetes-associated SNPs, namely those available in our study sample. Nonetheless, a previous study showed that including more associated SNPs might not add much to risk prediction [32]. Second, the three cohorts were drawn from U.S. communities, so our results may not generalize to ethnic groups other than those studied here. Third, despite the large number of subjects, our analyses may still be underpowered to detect variants of small effect and rare variants. Fourth, an additive GRS assumes that the studied SNPs act independently and does not capture genetic interactions (e.g., gene-gene), which are known to play important roles in the etiology of complex disorders including diabetes [33,34]. Finally, despite adjustment for race in all statistical analyses, we cannot fully exclude population stratification or confounding by ancestry in our samples.
In summary, we confirmed the associations of 12 GWAS-identified T2D SNPs with prevalent T2D in two of three well-phenotyped prospective cohorts (ARIC and MESA), and further demonstrated that the combined effect of these risk variants significantly predicts incident diabetes independent of known cardiovascular risk factors. Moreover, a GRS comprising these 12 SNPs also significantly improves risk prediction and reclassification for T2D over classical risk factors. However, because the pathophysiological roles of these risk variants remain largely unknown, the clinical implications of our findings remain to be determined. Given the high heritability of diabetes, the potential roles of additional common variants and of other sources of variation, such as rare variants, copy number variation and epigenetics, should also be investigated in future research.

Figure 2. Odds ratios for type 2 diabetes according to the number of risk alleles carried.
Results from MESA and ARIC were combined using random-effects meta-analysis.
Table 3. Association between genotype risk score, T2D prevalence and incident T2D.
Table 4. Reclassification of individuals on predicted risk of T2D with and without GRS (pooled): predicted risk based on known risk factors vs. predicted risk based on known risk factors + GRS.