Multi-omics co-localization with genome-wide association studies reveals a context-specific genetic mechanism at a childhood onset asthma risk locus

Background Genome-wide association studies (GWASs) have identified thousands of variants associated with asthma and other complex diseases. However, the functional effects of most of these variants are unknown. Moreover, GWASs do not provide context-specific information on cell types or environmental factors that affect specific disease risks and outcomes. To address these limitations, we used an upper airway (sinonasal) epithelial cell culture model to assess transcriptional and epigenetic responses to an asthma-promoting pathogen, rhinovirus (RV), and provide context-specific functional annotations to variants discovered in GWASs of asthma. Methods Using genome-wide genetic, gene expression and DNA methylation data in vehicle- and RV-treated airway epithelial cells (AECs) from 104 individuals, we mapped cis expression and methylation quantitative trait loci (cis-eQTLs and cis-meQTLs, respectively) in each condition. A Bayesian test for co-localization between AEC molecular QTLs and adult onset and childhood onset GWAS variants was used to assign function to variants associated with asthma. Mendelian randomization was applied to demonstrate DNA methylation effects on gene expression at asthma colocalized loci. Results Co-localization analyses of airway epithelial cell molecular QTLs with asthma GWAS variants revealed potential molecular disease mechanisms of asthma, including QTLs at the TSLP locus that were common to both exposure conditions and to both childhood and adult onset asthma, as well as QTLs at the 17q12-21 asthma locus that were specific to RV exposure and childhood onset asthma, consistent with clinical and epidemiological studies of these loci. Conclusion This study provides information on functional effects of asthma risk variants in airway epithelial cells and insight into a disease-relevant viral exposure that modulates genetic effects on transcriptional and epigenetic responses in cells and on risk for asthma in GWASs.

Following RV and vehicle treatments, DNA was extracted from cells as described above. DNA methylation profiles for cells from each treatment were measured on the Illumina Infinium MethylationEPIC BeadChip at the University of Chicago Functional Genomics Core. Methylation data were preprocessed using the minfi package [27]. Probes located on sex chromosomes and probes with detection p-values greater than 0.01 in more than 10% of samples were removed from the analysis; samples with more than 5% missing probes were also removed. A preprocessing control normalization function was applied to correct for raw probe values or background, and Subset-quantile Within Array Normalization (SWAN) [28] was used to correct for technical differences between the Infinium type I and type II probes. Additionally, we removed cross-reactive probes and probes within two nucleotides of a SNP with an MAF greater than 0.05 using the function rmSNPandCH() from the R package DMRcate [29].

PCA identified technical and biological sources of variation in the normalized DNA methylation datasets. We identified contributors to batch and technical effects, including array and cell harvest date. Sex, age, and smoking were significant variables in the PCA. Unknown sources of variation were predicted with the SVA package, where we estimated 37 SVs. SWAN and quantile-normalized M-values were then adjusted for batch and technical effects, SVs, sex, age, and smoking using the function removeBatchEffect() in R. Treatment effects were detected in the combined sample with 1,710 differentially methylated CpGs at an FDR<0.10 (Fig. S4).

This was accomplished in two steps. First, a permutation analysis was performed within cis-window sizes of 1 Mb and 10 kb for eQTL and meQTL analyses, respectively, to derive nominal p-value thresholds per molecular phenotype. Second, a forward-backward stepwise regression was applied to ultimately assign significant variants to independent signals.
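The probe- and sample-level detection filters described earlier in this section for the methylation arrays can be illustrated with a minimal Python sketch. It is not the minfi-based pipeline used in the study: the function name, the dictionary input format, the order of the two filters, and the reading of "missing probes" as probes failing the detection cutoff are all assumptions made for illustration, and the sex-chromosome and cross-reactive probe filters are omitted.

```python
def filter_probes_and_samples(detp, p_cut=0.01, max_probe_fail=0.10,
                              max_sample_fail=0.05):
    """Hypothetical helper: detp maps probe ID -> list of detection p-values,
    one per sample (same sample order for every probe)."""
    probe_ids = list(detp)
    if not probe_ids:
        return [], []
    n_samples = len(detp[probe_ids[0]])

    # Keep probes that fail detection (p > p_cut) in at most 10% of samples.
    kept_probes = [
        pid for pid in probe_ids
        if sum(p > p_cut for p in detp[pid]) / n_samples <= max_probe_fail
    ]

    # Assumption: a "missing" probe is one failing detection in that sample.
    # Keep samples with at most 5% failed probes among the retained probes.
    kept_samples = []
    for j in range(n_samples):
        failed = sum(detp[pid][j] > p_cut for pid in kept_probes)
        if not kept_probes or failed / len(kept_probes) <= max_sample_fail:
            kept_samples.append(j)
    return kept_probes, kept_samples
```

The function returns the retained probe IDs and the indices of the retained samples, so the same masks could be applied to a matrix of M-values before normalization and covariate adjustment.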

Multivariate adaptive shrinkage analysis (mash)
An Empirical Bayes method of multivariate adaptive shrinkage was applied separately to the eQTL and meQTL data sets as implemented in the R statistical package, mashr (https://github.com/stephenslab/mashr) [32], to produce improved estimates of QTL effects and corresponding significance values in each treatment condition. Mashr implements this in two general steps: 1) identification of pattern sharing, sparsity, and correlation among QTL effects, and 2) integration of these learned patterns to produce improved effect estimates and measures of significance for eQTLs or meQTLs in each treatment condition. To fit the mash model, we first estimated the correlation structure in the null tests from a random dataset in which 235,851 and 3,959,482 phenotype-SNP pairs were chosen for eQTLs and meQTLs, respectively, from the FastQTL nominal pass; because mashr is computationally intensive, the number of randomly chosen gene/CpG-SNP pairs was determined based on R's memory capabilities. The data-driven covariances were then estimated using the 'top' QTL for each gene or CpG from the FastQTL results. Posterior summaries were then computed for the 'top' eQTL and meQTL results (see [32]). The instructions found in the mashr eQTL analysis outline vignette were followed to run mash.

Enrichment analysis
The R package, GWAS analysis of regulatory or functional information enrichment with LD correction (GARFIELD) [33], was used to quantify enrichment and assess significance of GWAS SNPs among eQTLs and meQTLs. GARFIELD leverages GWAS results with molecular data to identify features relevant to a phenotype of interest, while accounting for LD and matching for genotyped variants, by applying a logistic regression method to derive statistical significance for enrichment. For this study, molecular QTLs were tested for GWAS variant enrichment, estimated as odds ratios and enrichment P-values derived at four GWAS P-value thresholds: 10^-5, 10^-6, 10^-7, and 10^-8.

To assess tissue-specificity of our results, we examined eQTLs from the adrenal gland, frontal cortex, hypothalamus, ovary, and testis from the GTEx database version 7 (http://gtexportal.org) [6], and tested for enrichment of adult onset and childhood onset asthma GWAS SNPs among the epithelial eQTLs from our study combined across treatment conditions. GTEx data were matched with respect to sample size and number of eQTLs with those of the epithelium, with the exception of testis, which was included to show the consistency of the enrichment results despite it being an outlier with regard to both sample size, which was smaller, and number of eQTLs, which was larger. FDR thresholds of 5% and 10% were applied to eQTLs from GTEx and from our study, respectively, for a balanced, unbiased assessment of enrichment. An OR > 1 and a Benjamini-Hochberg (BH) corrected p-value < 0.05 were used as the significance threshold for enrichment; BH-adjusted p-values were calculated using the p.adjust() function in R, where 'n' was determined by the number of tests in each respective enrichment analysis.
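The Benjamini-Hochberg adjustment and the enrichment criterion stated above (OR > 1 and BH-adjusted p-value < 0.05) can be sketched in a few lines of Python. This is a stand-alone illustration of the standard BH step-up procedure, not the p.adjust() call used in the study; the function names `bh_adjust` and `is_enriched` are hypothetical, and 'n' is taken here to be the number of p-values passed in.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for k, i in enumerate(order):
        rank = n - k  # rank of this p-value in ascending order (1-based)
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def is_enriched(odds_ratio, adj_p, alpha=0.05):
    """Enrichment call as described in the text: OR > 1 and BH p < alpha."""
    return odds_ratio > 1 and adj_p < alpha
```

For example, adjusting the four p-values 0.01, 0.04, 0.03, and 0.005 gives approximately 0.02, 0.04, 0.04, and 0.02, so a test with OR > 1 would be called enriched in all four cases at the 0.05 level.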

Co-localization analysis
To estimate the posterior probability of association (PPA) that a SNP contributed to the association signal in the GWAS as well as to the eQTL and/or meQTL, we applied a Bayesian statistical framework implemented in the R package multiple-trait-coloc (moloc) [13]. Summary data from adult onset and childhood onset asthma GWASs from [1], along with eQTL and meQTL summary data from cells within each treatment condition (described above), were included in the moloc analysis. Each co-localization analysis included summary data from a GWAS and epithelial cell eQTLs and meQTLs from corresponding treatment conditions. Because a genome-wide co-localization analysis was computationally untenable, genomic regions for co-localization were defined using GARFIELD. First, we analyzed the enrichment pattern of e/meSNPs from each treatment condition in adult onset and childhood onset GWASs using the default package settings. Second, we extracted variants driving the enrichment signals at a GWAS p-value threshold of 1x10^-4. Regions were defined as 2 Mb windows centered around these variants. Only regions with at least 10 SNPs in common between all three datasets or 'traits' (GWAS, eQTL, and meQTL) were assessed by moloc, and 15 'configurations' of possible variant sharing were computed across these three traits (see [13] for more details). PPAs ≥ 0.70 were considered as evidence for co-localization. Prior probabilities of 1x10^-4, 1x10^-6, and 1x10^-7 were chosen for the association of one, two, or three traits, respectively, as recommended by the authors of moloc.
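The region definition and eligibility rule described above (2 Mb windows centered on each selected variant, at least 10 SNPs shared by the GWAS, eQTL, and meQTL datasets, and a PPA cutoff of 0.70) can be sketched as follows. This is an illustrative Python outline rather than the moloc or GARFIELD code; all function names and the dictionary layout (SNP ID mapped to a chromosome and position) are assumptions.

```python
def define_region(chrom, pos, window_bp=2_000_000):
    """Return (chrom, start, end) for a window centered on a lead variant."""
    half = window_bp // 2
    return (chrom, max(0, pos - half), pos + half)


def snps_in_region(trait, region):
    """trait: dict mapping SNP ID -> (chrom, pos). Returns the IDs in region."""
    chrom, start, end = region
    return {snp for snp, (c, p) in trait.items()
            if c == chrom and start <= p <= end}


def shared_snps(region, gwas, eqtl, meqtl):
    """SNPs present in all three summary datasets within the region."""
    return (snps_in_region(gwas, region)
            & snps_in_region(eqtl, region)
            & snps_in_region(meqtl, region))


def is_testable(region, gwas, eqtl, meqtl, min_shared=10):
    """Region is passed to co-localization only with >= min_shared SNPs."""
    return len(shared_snps(region, gwas, eqtl, meqtl)) >= min_shared


def is_colocalized(ppa, threshold=0.70):
    """PPA at or above the threshold is taken as evidence of co-localization."""
    return ppa >= threshold
```

Restricting the analysis to windows that pass `is_testable` mirrors the rationale in the text: it keeps the number of co-localization tests tractable while ensuring each region has enough overlapping variants across the three traits.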

Genome-wide cis-eQTL and cis-meQTL mapping in cultured airway epithelial cells
To identify genetic variation influencing gene expression under different conditions, we performed eQTL mapping in cultured AECs exposed to RV and its corresponding vehicle control from 104 individuals (43 with doctor-diagnosed asthma; 61 without a doctor's diagnosis of asthma; Fig. S1). Analyses were performed separately for each treatment condition, testing for associations with 6,665,552 imputed SNPs (MAF>0.05) and 11,231 autosomal genes (see Methods; Additional file 1 and 2). The numbers of SNPs associated with gene expression for at least one gene (eQTLs) and genes with at least one eQTL (eGenes), in any treatment, are summarized in Fig. S5.

In parallel, we performed meQTL mapping in the same cells used for gene expression studies. We performed this analysis separately for each treatment condition, testing for associations with the same imputed SNP set that was used for eQTL mapping and interrogating 791,765 autosomal CpGs (Additional file 3 and 4). A summary of the numbers of SNPs associated with methylation levels at one or more CpG sites (meQTLs) and CpG sites with at least one meQTL (meCpGs), in any treatment, is shown in Fig. S5.

Each gene/CpG-variant pair was tested for a linear regression slope that significantly deviated from 0. Therefore, the estimated effects for the molecular QTLs reflect both the single-SNP effects of each molecular QTL and the effects of variants in linkage disequilibrium (LD) with it. Accordingly, these analyses do not differentiate causal molecular QTLs from variants in LD with them. However, these variants are still informative in prioritizing genes and CpG sites that contribute to the etiology of asthma.

meQTLs for each of 751,914 CpG sites were used to identify condition-specific DNA methylation effects, as described above for eQTLs.
A pair-wise analysis of meQTLs revealed that 89.9% of meQTLs were shared between vehicle and RV treatments, representing 48,189 meCpGs, defined here as CpGs with at least one meQTL at an lfsr<0.05 (Fig. 1C; Additional file 6), revealing a much greater proportion of shared meQTLs than that observed for eQTLs. Examples of the 5,416 treatment-specific meQTLs are shown in Fig. 1D.

In total, we identified 660 and 458 eGenes (lfsr<0.05) that were specific to vehicle or RV treatment, respectively, and 5,162 and 254 meCpGs that were specific to vehicle or RV culture treatment, respectively, with greater confidence than by pairwise comparisons using FDR

(Table 1), consistent with the strong epithelial cell involvement in asthma in general and with childhood onset asthma in particular. In contrast, there were no significant enrichments for SNPs from four of the other GWASs among the epithelial cell molecular QTLs. These results highlight the specific enrichment of asthma GWAS SNPs among airway epithelial molecular QTLs compared to SNPs from GWASs of diseases without known epithelial cell involvement.
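The per-pair association test described earlier in this section, in which each gene/CpG-variant pair is tested for a linear regression slope that deviates from 0, can be illustrated with a minimal ordinary least squares sketch. It is not the FastQTL implementation used in the study: the function name is hypothetical, no covariates are included, and the function returns only the slope and its t-statistic (a p-value would additionally require the t distribution with n-2 degrees of freedom).

```python
import math


def slope_test(dosages, phenotype):
    """OLS of phenotype on genotype dosage (0/1/2); returns (slope, t_stat)."""
    n = len(dosages)
    if n < 3 or n != len(phenotype):
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("genotype has no variation")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotype))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual variance with n - 2 degrees of freedom (slope and intercept).
    rss = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(dosages, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se if se > 0 else math.inf
    return slope, t_stat
```

Because the test is applied one variant at a time, a variant in strong LD with the causal variant will show a similar slope, which is why single-variant results of this kind cannot separate causal QTLs from their LD proxies.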

To further assess the specificity of airway epithelial molecular QTLs to asthma, we compared GWAS SNP enrichments among the eQTLs in our study to those from tissues that are not known to be involved in asthma. To this end, we tested for enrichment of asthma GWAS SNPs among eQTLs (FDR<0.05) in five different tissues from the GTEx database (adrenal, frontal cortex, hypothalamus, ovary, testis) [43], and compared them to enrichments among the eQTLs from our study. We observed a significant enrichment (OR>1 and BH-adjusted P<0.05) of childhood onset asthma GWAS SNPs among the epithelial cell eQTLs at all GWAS P-value thresholds ≤1x10^-7 (Table 2), while enrichments for adult onset asthma GWAS SNPs among the epithelial cell eQTLs were not observed at any GWAS threshold (Table S1). Except for the hypothalamus, which showed some enrichment at P < 10^-5, no other enrichments of asthma GWAS SNPs were observed among eQTLs in other tissues, further supporting the specificity of

Integrating molecular QTLs with GWAS data is a powerful way to identify functional variants that may ultimately influence disease risk [44,45] and to assign function to known disease-associated variants. Co-localization approaches directly test whether the same genetic variant is underlying associations between two or more traits (e.g., gene expression and asthma), providing clues to causal disease pathways. We hypothesized that integrating molecular QTLs from RV- and vehicle-exposed epithelial cells with results of GWASs for adult onset and childhood onset asthma would reveal genetic and epigenetic mechanisms that modulate risk for childhood and/or adult onset asthma.
To test this hypothesis, we extracted summary statistics from large GWASs of adult onset asthma and childhood onset asthma [1], and tested each for co-localization with genetic variants associated with gene expression, DNA methylation, and asthma, using moloc, a Bayesian statistical approach that allows integration and co-localization of more than two molecular traits [13]. We performed four separate co-localization tests, one for each combination of treatment condition and GWAS. Each analysis provided three possible configurations in which a variant is co-localized between the GWAS and QTLs: eQTL-GWAS pairs, meQTL-GWAS pairs, and eQTL-meQTL-GWAS triplets. An estimate of the posterior probability of association (PPA) is provided, reflecting the evidence for a co-localized SNP being causal for the associations in the GWAS and for the corresponding eQTL and/or meQTL.

Using this approach, we found evidence for a total of 19 unique multiple-trait co-localizations (Table 3). A single meQTL-GWAS pair was co-localized in both the adult onset and childhood onset asthma GWASs. An additional 18 co-localizations were detected only in the childhood onset asthma GWAS, including a single eQTL-meQTL-GWAS triplet associated with the ERBB2 gene, three eQTL-GWAS pairs associated with three genes (FLG, FLG-AS1, ORMDL3), and 15 meQTL-GWAS pairs associated with 11 CpG sites (Table 3; Table S2). No

The significance threshold (p<5x10^-8) required to control the false discovery rate in GWASs likely excludes many true associations that do not reach this stringent cutoff. We and others have suggested that these SNPs, i.e., the mid-hanging fruit [49], may be environment- or context-specific associations that are missed in GWASs that typically do not control for either [50, 51]. Notably, 10 of the 19 SNPs associated with co-localizations in the childhood onset

(pGWAS = 2.33x10^-27) asthma GWASs.
The meCpG is located in the first (untranslated) exon (5' UTR) of the TSLP gene (Fig. 2), a region characterized as a promoter in normal human epidermal keratinocyte cells (NHEK; ROADMAP). In fact, rs1837253 was the sentinel SNP at this locus in GWASs of asthma (e.g., [1, 53]) and of moderate-to-severe asthma [54]. In our study, the rs1837253-C asthma risk allele was associated with hypermethylation in primary cultured AECs at cg15557878 (Fig. 2), but was not associated with the expression of TSLP in either treatment condition (not shown).

Previous studies have shown TSLP to be a methylation-sensitive gene and that hypomethylation at its promoter is associated with atopic dermatitis (AD) and prenatal tobacco smoke exposure [55,56]. Another study showed that the rs1837253-CC genotype was associated with increased excretion of TSLP in cultured AECs after exposure to polyI:C (a dsRNA surrogate of viral stimulation) [57]. Neither finding could be addressed in our study. Moreover, we were unable to identify any SNPs in LD with rs1837253 (± 50 kb) in either European or African American (r^2 < 0.12) 1000 Genomes reference panels, implying that this SNP may indeed be the causal SNP at this locus. Our results further suggest that DNA methylation levels in AECs may underlie this effect.

Multi-trait co-localizations of molecular QTLs and asthma risk at the 17q12-21 asthma locus
To further explore the possibility that some mechanisms of asthma risk are exposure-specific, we focused on the co-localizations of eQTLs and meQTLs with asthma-associated SNPs at the

We identified six co-localizations at the extended 17q locus of molecular QTLs that were specific to childhood onset asthma GWAS SNPs. Among these co-localizations, one eQTL-GWAS pair with rs12603332 and expression of ORMDL3 was detected only in vehicle-treated cells (PPA≥0.70; Fig. 3A-B). The co-localized SNP (rs12603332) is in LD (r^2 >0.74 in the 1000 Genomes European reference panel) with other previously reported asthma-associated GWAS SNPs in this region, including some that were reported as eQTLs for ORMDL3 and GSDMB, primarily in blood immune cells. However, in contrast to studies in ex vivo upper AECs [61], none of the SNPs were eQTLs for GSDMB in our in vitro culture model. That the co-localization with rs12603332 and ORMDL3 expression was only significant in vehicle-treated cells reflects the blunting of the eQTL effects (Fig. 3B), and possibly the overall decreased expression of ORMDL3 (Fig. 3C), in RV-treated cells.

We also detected three meQTL-GWAS pairs among the six co-localizations at the 17q locus that were associated with two meCpGs (cg21230266, cg17401724) and three SNPs at the distal end of (rs4239225, rs3859191) and beyond (rs66826786) the extended locus near GSDMA, where there is some reduction of LD with SNPs in the core region (Fig. 3D-F). One of these CpGs (cg21230266) was located in an intron of GSDMA, in a region characterized by ROADMAP as an enhancer in NHEK cells. SNPs in modest to perfect LD (r^2 range = 0.46-1.00; 1000 Genomes European panel) with these co-localizations (rs4239225, rs3859191) were described in previous studies as an independent GWAS signal for asthma (rs3894194) or an eQTL for GSDMA (rs3859192) [6, 62, 63]. These three meQTL-GWAS co-localizations were detected only in the RV-treated cells, although the meQTL signal for each of the three co-localizations was also detected in the vehicle treatment; the absence of co-localization in vehicle-treated cells is likely due to decreased power to co-localize these meQTLs in that condition. Additionally, there were no statistically significant differences in DNA methylation levels observed between the vehicle and RV treatments (Fig. 3F).

The one eQTL-meQTL-GWAS triplet detected in our study was at the 17q locus (Fig. 4A, upper panel).
The co-localization included an eQTL for ERBB2, at the proximal end of the locus and more than 361 kb from the co-localized asthma risk variant in an intron of MED24 (rs66826786) and the co-localized meCpG (cg17401724) at the distal end of the locus (Fig. 3A, middle panel). MED24 is beyond the extended 17q12-21 locus as previously defined [2], in a region characterized by ROADMAP as both an enhancer and TSS in NHEKs. The eQTL for ERBB2 is observed only after exposure to RV (Fig. 3A middle and lower panels), though the meQTL associated with this triplet was present in both vehicle and RV treatment conditions (Fig. 3B upper and lower panels, respectively). The asthma risk allele, rs66826786-T, was associated with decreased DNA methylation of cg17401724 in both conditions but with decreased ERBB2 expression only in RV-treated cells. Overall, ERBB2 expression decreased in response to RV exposure in AECs (Fig. 3D). The 361 kb distance between the promoter of ERBB2 and its eSNP

other GWASs, and among AEC eQTLs compared to eQTLs from other tissues. Finally, SNPs that were molecular QTLs in our study co-localized with asthma GWAS SNPs, identifying 18 unique co-localizations that included both known asthma loci (e.g., 17q12-21 and TSLP) and loci that did not meet stringent criteria for genome-wide significance in the GWASs (Table S2).

The results of enrichment analyses further highlighted the important role of airway epithelium in asthma GWAS discoveries. The enrichment of childhood onset asthma GWAS SNPs among epithelial eQTLs is particularly noteworthy, as it not only supports the tissue specificity of our model but also identified genomic loci with molecular mechanisms that have not been described prior to our study.
These results are also consistent with previous studies suggesting that functional variants from disease-relevant tissues are more enriched among

lung tissue) or cells (e.g., immune cells) might reveal additional novel molecular mechanisms and differences between childhood onset and adult onset asthma.

Our study provides mechanistic evidence for associations between GWAS SNPs and asthma at two important loci: the TSLP and 17q12-21 loci. Co-localizations of the asthma-associated SNP rs1837253 with DNA methylation levels in the TSLP gene suggest an epigenetic mechanism of disease that contributes to both adult and childhood onset asthma, and is robust to RV versus vehicle treatment. Associations of this SNP with asthma have been highly replicated in GWASs, and TSLP is recognized as having an important role in asthma pathogenesis through its broad effects on innate and adaptive immune cells promoting Th2 inflammation [67]. Our data further show that the effect of rs1837253 genotype on risk for asthma may be mediated through DNA methylation levels at CpG sites in the untranslated first exon of the TSLP gene in AECs. Finally, the lack of LD with other SNPs in a 100 kb window suggests that rs1837253 may be the causal asthma SNP at this important locus.

Since its discovery over a decade ago, the 17q12-21 locus has been an important focus of

However, our study further shows that genes at both the proximal and distal ends of this locus, ERBB2 and GSDMA, respectively, may contribute to asthma risk in the presence of RV infection. Mendelian randomization revealed a novel epigenetic mechanism through which a SNP at the distal boundary of the locus was associated with expression of ERBB2 at the proximal boundary of the locus, only after exposure to RV.
The eQTL effect on ERBB2 expression in RV-treated cells was mediated through differential methylation of a CpG site at the distal locus, which was present in both treatment conditions. Previous studies have shown that variation at the 17q core locus confers risk to asthma only among children with wheezing illness in early life

Many of the associations in GWASs that do not reach stringent criteria for genome-wide significance (p<5x10^-8) may be true signals. Distinguishing true from false positive signals for variants among the mid-hanging fruit (e.g., p-values between 10^-5 and 10^-8) can be challenging. In our study, over 57% of the co-localizations were with a GWAS SNP that did not meet genome-wide significance (childhood onset asthma GWAS p-value range 6.1x10^-7 to 1.4x10^-5; Table S2). One possible explanation is that these variants have exposure-specific, tissue-specific, or endotype-specific effects, which are heterogeneous among subjects included in GWASs. Therefore, annotating SNPs among the mid-hanging fruit for functionality provides more confidence in these findings, a more complete picture of the genetic architecture of asthma, and a model for prioritizing these loci for further studies.

Our study has several limitations. First, the sample sizes for the eQTL and meQTL studies were smaller than the most reliable sample size recommended by moloc (nmin=300) [13]. In such cases, moloc can miss true co-localizations in QTL datasets. For example, an eQTL-GWAS pair with supporting evidence may, in reality, be an eQTL-meQTL-GWAS triplet. As a result, the eQTL-GWAS and meQTL-GWAS pairs that we identified could be eQTL-meQTL-GWAS triplets that we were not powered to detect, or we may have missed other co-localizations entirely.
For example, although only a single meQTL co-localized with a GWAS SNP at the TSLP locus, the same SNP, rs1837253, was an meQTL for three additional CpGs (Fig. S6), representing additional potential contributors to asthma disease mechanisms. Nonetheless, the 19 unique co-localizations detected in our study are likely to be real, although future studies in larger samples will increase confidence in our findings. Second, we focused our studies on one cell type (upper airway sinonasal epithelium), two exposures (vehicle and RV), and one epigenetic mark (DNA methylation). It is possible that other asthma-relevant co-localizations are specific to other tissues or cell types or to other exposures or culture conditions, and that additional epigenetic marks, such as those associated with chromatin accessibility, would be additionally informative. These extended studies will be necessary to validate the specificity and provide a more complete catalog of asthma-relevant co-localizations. Finally, characterizing chromatin conformational changes in AECs before and after exposure to RV will allow a direct assessment of the chromatin looping at the extended 17q12-21 locus that may occur in response to viral infection and potentially identify other context-specific interactions.

In summary, we identified cis-eQTLs and cis-meQTLs in an airway epithelial cell model of host cell response to RV and integrated those data with asthma GWASs to assign potential molecular mechanisms for variants associated with asthma in two large GWASs. By combining enrichment studies, co-localization analysis, and Mendelian randomization, we provide robust statistical evidence of epigenetic mechanisms in upper airway cells contributing to childhood onset asthma.
We demonstrate that a multi-omics approach using a disease-relevant cell type and disease-relevant exposure allows prioritization of disease-associated variants and provides insight into potential epigenetic mechanisms of asthma pathogenesis.

Step-wise experimental design to identify treatment-specific e/meQTLs in NECs from 104 individuals: 1. NECs collected from study participants were cultured and treated with RV and a

clustering of ancestral PCs in which the within-groups sum of squares (y-axis) is plotted against the number of potential group clusters (x-axis); using the 'elbow criterion', it is determined that two clusters best represent how many clusters the study samples can be grouped into. (C) PCA plot of study participants grouped into two clusters for genotype imputation, European (red) and African American (blue), according to the k-means clustering criterion.