An atlas on causal role of circulating vitamin D concentrations in human complex traits and diseases

1. Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, 677 Huntington avenue, Boston, MA 02115, USA 2. Unit of Cardiovascular disease, Institute of Environmental Medicine, Karolinska Institutet, 13 Nobels väg, Stockholm 17177, Sweden 3. Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA 4. Analytic and Translational Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA 5. Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA


Introduction
Vitamin D and its major circulating form, 25-hydroxyvitamin D [25(OH)D], is a fat soluble, steroid pre-hormone that plays an essential role in human health. Vitamin D deficiency has long been observed to be associated with rickets and osteomalacia. 1 In addition, a variety of common diseases such as cancer, autoimmune inflammatory diseases, cardiovascular conditions and diabetes, have been reported to be linked with vitamin D in observational studies. 2 participants with a median follow-up period of 5.3 years did not find lowered incidence of invasive cancer or cardiovascular events for those who took vitamin D3 at a dose of 2000 IU per day compared the placebo group. 9 In addition, there are ongoing RCTs aiming at other diseases and conditions, such as depression (DepFuD; ClinicalTrials.gov: NCT02521012) and cognitive change (ClinicalTrials.gov: NCT03613116). To better inform the field on the chance of success for such vitamin D trials, a better understanding in the causal role of vitamin D in these diseases and traits, preferably through less costly observational study, is necessary.
The genetic basis of circulating 25(OH)D has been demonstrated by several large-scale genomewide association studies (GWASs), which have identified associations of GC, NADSYN1/DHCR7, CYP2R1, CYP24A1, SEC23A, AMDHD1 with serum levels of vitamin D. 10,11 These genetic discoveries in 25(OH)D may help disentangle the causal relationships between vitamin D and other traits through Mendelian randomization (MR), an approach that uses genetic variants as instrumental variables (IVs) for assessing the causal effect of an exposure on an outcome. 12 Under certain assumptions -that the IVs are strongly associated with the exposure, but not associated with any confounders of the exposure-outcome relationship, nor affect the outcome via other pathways than through the exposure -an unbiased estimate of the causal effect can be estimated using the observed IV-exposure and IV-outcome associations. 13 To date, a large number of MR studies examining the effect of circulating vitamin D on various diseases have been conducted. However, the causal role of vitamin D beyond its importance for bone health remains unclear and is under much debate, as nearly all outcomes assessed by MR studies of vitamin D have been negative. The only replicated results from MR studies are higher vitamin D level decreases the risk of adult/pediatric multiple sclerosis [14][15][16] and Alzheimer's disease; 17,18 whereas findings from MR studies for blood pressure, ovarian cancer, and all-cause mortality are yet to be replicated. [19][20][21][22] Overall, results from MR studies for vitamin D are not consistent with conventional observational studies, which have associated vitamin D with a variety of human diseases and traits. It is also worth noting that among almost all hitherto available MR studies for vitamin D are based on genetic variants (SNPs rs2282679, rs12785878, rs6013897 and rs10741657) identified by the SUNLIGHT meta-GWAS published in 2010. 10 The recently updated vitamin D meta-GWAS identified two new genetic loci associated with circulating 25(OH)D level and improved SNP-vitamin D association estimates with larger sample size, which provided an exceptional opportunity to re-evaluate the casual effect of 25(OH)D with an improved instrument strength and accuracy of estimation. Moreover, the rapidly accumulating publicly available GWAS summary statistics for various diseases and traits allowed us to evaluate the effect of vitamin D on a wide range of outcomes using two-sample MR.
Here, we conducted a two-sample MR analysis (where IV-exposure and IV-outcome

Data for IV-exposure
We retrieved summary data for the associations between six SNPs and circulating 25(OH)D concentration from the SUNLIGHT meta-GWAS involving 79 366 discovery samples and 42 757 replication samples of European ancestry. Genome-wide analyses were performed within each cohort following a uniform plan. Specifically, an additive genetic model using linear regression on natural-log transformed 25(OH)D were fitted to each SNP, and a fixed-effects inverse variance weighted meta-analysis across the contributing cohorts was performed, with control for population structure within each cohort, and consistent quality control thresholds across cohorts (minor allele frequency (MAF)>0.05, imputation info score>0.8, Hardy-Weinberg equilibrium (HWE)>1´10 -6 ).
A minimum of 10 000 individuals was required to contribute to each reported SNP-phenotype association. 11 Among the six SNPs, four were previously identified as being robustly associated with vitamin D (rs3755967 at GC, rs12785878 at NADSYN1/DHCR7, rs10741657 at CYP2R1, rs17216707 at CYP24A1); and the other two were newly identified and replicated (rs10745742 at AMDHD1, rs8018720 at SEC23A). We excluded strand ambiguous SNPs (A/T and G/C SNPs) from the analysis to harmonize between exposure and outcome GWASs and used five SNPs as IVs. In addition to the leading index SNPs, we further performed P-value informed pruning of SNPs across the genome based on their linkage disequilibrium (LD) patterns (r 2 <0.0001 with clumping window size of 1Mb). We selected all LD-clumped SNPs associated with 25(OH)D concentration at P<1´10 -5 . This resulted in an additional 21 independent index SNPs which may increase the power of our analysis by incorporating more instruments. Excluding strand ambiguous SNPs from this extended SNP set yielded a total of 19 SNPs as IVs. Details of the IVs are presented in Supplementary Table 1.

Data for IV-outcome
We retrieved summary data for the associations between index SNPs and 45 complex traits and diseases from publicly available GWASs conducted mainly in European ancestry populations (3 GWASs were conducted in a mixed population including both European and Asian samples;

Statistical analysis
MR uses SNPs as IVs to estimate the effects of risk factors on outcomes while controlling for potential confounding, by leveraging the random allocation of SNP alleles at conception. Three assumptions need to be satisfied to ensure a valid IV. The first is the relevance assumption, that IVs should be strongly associated with the exposure; the second assumption requires no association between IVs and confounders of the exposure-outcome relationship; and the third is the exclusion restriction assumption, indicating that genetic variants should affect the outcome only through exposure. If all MR assumptions are met, a causal effect can be estimated based on the observed IV-exposure and IV-outcome associations.
We conducted two-sample MR to test for the potential causal relationship between circulating 25(OH)D and various traits. We applied a number of MR methods to estimate causal effects including an inverse variance weighted meta-analysis (IVW) approach, 23 MR-PRESSO, 24 MR-Egger, 25 and a weighted median approach. 26 While IVW does not account for potential horizontal pleiotropy, i.e., a genetic variant affects the outcome via a separate biological pathway from the exposure under investigation (violation of exclusion restriction assumption), MR-PRESSO, MR-Egger and weighted median approaches control potential horizontal pleiotropy under different model assumptions. In addition, we performed MR-Egger regression, modified Q test, modified Q' tests, and MR-PRESSO to detect estimation bias due to horizontal pleiotropy. MR-PRESSO implements a global test to evaluate the presence of horizontal pleiotropy, and an outlier test to detect specific outliers. 24 In MR-Egger, significant difference of an intercept from zero suggests the existence of average directional horizontal pleiotropy. 25 Modified Q and Q' tests are traditionally used to identify over-dispersion and have been applied in the context of MR to detect outliers caused by horizontal pleiotropy. 27 Lastly, we estimated the power of detecting the causal effects of 25(OH)D on various traits in our study. We followed the simulation framework outlined in Verbanck et al., 24 while assuming five IVs having the same effect sizes and standard errors for exposure, and allele frequencies as the five SNPs used in our real data analysis. We showed the power to detect significant causal effects at different sample sizes for the outcome GWAS (ranging from 10 000 to 150 000; matched with real outcome GWAS sample sizes), and different true causal effect sizes (ranging from 0 to 1) at a type-I error rate of 0.05. Table 1, Table 2 (Fig 2). The corresponding odds ratios and 95%CI of binary traits are presented in Supplementary Table 3.
We also performed a sensitivity analysis using an expanded set of IVs (independent SNPs associated with 25(OH)D levels at the P-value threshold of 1´10 -5 ), as this approach was used in previous MR studies when fewer genome-wide significant SNPs were available as IVs. 28,29 As shown in Supplementary Fig 1, Supplementary Table 4 and Supplementary Table 5, when incorporating additional IVs, only insomnia complaints remained nominally significant (P<0.05) using both the IVW and the adjusted MR-PRESSO approaches without evidence of horizontal pleiotropy. Although height and HbA1c showed some degrees of horizontal pleiotropy, both were nominally significant in the adjusted MR-PRESSO approach (PMR-PRESSO=0.02 and 0.01 for height and HbA1c, respectively).
For a majority of outcomes that we studied here (26 out of 45), their GWASs included more than 110 000 individuals. Under this sample size, our current study had >80% power to detect a causal effect (!) of 0.2 at the P-value threshold of 0.05. We also presented power estimation for a range of GWAS sample sizes of the outcome (Supplementary Table 6).

Discussion
In this study, we used two-sample MR methods with 25(OH)D-associated SNPs and 45 large-scale meta-GWASs covering a wide spectrum of human complex traits and diseases to determine the causal role of 25(OH)D.
In general, our results, which were based an increased number of IVs and larger GWAS sample While null findings for the outcomes examined here would the main conclusion of our study, we found nominally significant results for 25(OH)D to affect height (P=0.026), glycated hemoglobin (HbA1c) (P=0.001), age at menopause (P=0.03) and insomnia (P=0.02). Note that one previous MR study did not identify any association between 25(OH)D and glycemic traits. 42 These results should be interpreted with caution given the large number of statistical tests we performed, and the fact that none of them survived multiple testing correction. Further investigations are warranted to elucidate these suggestive findings.
To ensure the validity of MR results, several important assumptions need to be satisfied. First of all, the relevance assumption, that IVs should be strongly associated with the exposure, is naturally guaranteed by selecting 25(OH)D-associated independent SNPs with genome-wide significance as IVs. Secondly, none of the IVs used in our analysis were cited by the NHGRI-EBI Catalog of published GWASs as associated with potential confounders, such as BMI, smoking or alcohol consumption at P=1´10 -5 level. This could also be partly reflected by negligible associations between 25(OH)D and various traits from the current and past MR studies, many of which, act as potential confounders for other disease outcomes (e.g., obesity related traits). Finally, the exclusion restriction assumption requires IVs to affect the outcome only through the exposure (no horizontal pleiotropy). We employed a number of methods to control for horizontal pleiotropy. While MR-Egger regression with only a few SNPs can be underpowered to identify pleiotropy, we also used MR-PRESSO global test and modified Q and Q' tests test for horizontal pleiotropy. We also increased the number of IVs by incorporating independent SNPs associated with 25(OH)D at P=1´10 -5 level. No causal relationships appeared when using additional instruments after multiple testing correction. We expect that some of these ambiguous causal links will become evident in the future when larger GWASs become available. However, we also note that our MR analysis is well-powered to detect moderate causal effects of 25(OH)D (>80% power with a causal effect β>0.2), as shown in our power calculation.
Our MR analysis showed limited evidence on any effects of vitamin D on the traits and diseases examined here. This is inconsistent with many observational studies which may be affected by various confounding biases. Our results may inform costly RCTs that would be paid for by publicly funded agencies and help to prioritize the most promising target diseases for vitamin D intervention.
X.J. and C-Y.C. contributed to study conception, data analysis, interpretation of the results and drafting of the manuscript. T.G. contributed to interpretation of the results and critical revision of the manuscript.

Competing interests
The authors declare no competing interests. Supplementary Figure 1 Table 1. Causal effect estimates of 25(OH)D with complex traits and diseases using genome-wide significant SNPs of 25(OH)D. We highlighted all nominally significant results, including 4 vitamin D-outcome pairs from IVW meta-analysis depicted in scatter plots in Figure 2. Note that none of these nominally significant results remained significant after multiple comparison correction.