A catalog of genetic loci associated with kidney function from analyses of a million individuals

Chronic kidney disease (CKD) is responsible for a public health burden with multi-systemic complications. Through trans-ancestry meta-analysis of genome-wide association studies of estimated glomerular filtration rate (eGFR) and independent replication (n = 1,046,070), we identified 264 associated loci (166 new). Of these, 147 were likely to be relevant for kidney function on the basis of associations with the alternative kidney function marker blood urea nitrogen (n = 416,178). Pathway and enrichment analyses, including mouse models with renal phenotypes, support the kidney as the main target organ. A genetic risk score for lower eGFR was associated with clinically diagnosed CKD in 452,264 independent individuals. Colocalization analyses of associations with eGFR among 783,978 European-ancestry individuals and gene expression across 46 human tissues, including tubulo-interstitial and glomerular kidney compartments, identified 17 genes differentially expressed in kidney. Fine-mapping highlighted missense driver variants in 11 genes and kidney-specific regulatory variants. These results provide a comprehensive priority list of molecular targets for translational research. Trans-ancestry meta-analysis of estimated glomerular filtration rate (eGFR) from 1,046,070 individuals identifies 264 associated loci, providing a resource of molecular targets for translational research of chronic kidney disease.

(MVP, n=280,722), 23 for a combined sample size of >1 million participants. The first aim of this study was to identify novel, globally important loci for kidney function through maximizing statistical power (Supplementary Figure 1). Results from GWAS of the complementary kidney function marker blood urea nitrogen (BUN, n=416,178) were used to prioritize eGFR-associated loci most likely to be relevant for kidney function. A genetic risk score (GRS) for low eGFR was used to test relevance for clinically diagnosed CKD among 452,264 independent individuals. The second aim was to characterize replicated eGFR-associated loci through complementary computational approaches, including various enrichment and network analyses, fine-mapping and co-localization with gene expression in 46 tissues and protein levels (Supplementary Figure 1). We focused this aim on European ancestry (EA) individuals, as fine-mapping based on summary statistics requires linkage disequilibrium (LD) reference panels whose sample size should scale with that of the GWAS. 24 The resulting list of prioritized variants and genes provides a rich resource of potential therapeutic targets to improve CKD treatment and prevention.

Results

Discovery trans-ethnic meta-analysis
We performed 121 GWAS encompassing 765,348 individuals of European (n=567,460), East Asian (n=165,726), African American (n=13,842), South Asian (n=13,359), and Hispanic (n=4,961) ancestries (median age: 54 years; 50% females; Supplementary Table 1). The median of the study-specific mean eGFR was 89 ml/min/1.73m² [...]

We discovered 308 loci containing at least one eGFR-associated SNP at genome-wide significance (Methods), of which 200 were novel and 108 contained an index SNP reported by previous eGFR GWAS (Figure 1; Supplementary Table 3) [...] SNPs showed both decreasing and increasing effects on eGFR, with larger effects observed for lower frequency SNPs (Figure 1, inset).
The 308 index SNPs explained 7.1% of the eGFR variance, nearly doubling recent GWAS-based estimates, 9 and 19.6% of eGFR genetic heritability (h²=39%, 95% credible interval: 32%, 47%), estimated in a participating general-population-based pedigree study (Supplementary Figure 3; Methods). Index SNPs' effects were largely homogeneous across studies (Figure 2A; [...]

We assessed genome-wide genetic correlations (rg) of eGFR associations with each of 748 complex traits and diseases (Methods). 28 We observed 37 significant correlations (P<6.7×10⁻⁵=0.05/748, Supplementary Figure 6; Supplementary Table 7). After serum creatinine, the largest negative correlations were observed between eGFR and serum citrate (rg=-0.27) and urate (rg=-0.23), followed by anthropometric traits including lean mass and physical fitness (e.g., rg=-0.20 with left hand grip strength). While the inverse correlation with muscle mass-related traits likely reflects higher creatinine generation leading to lower creatinine-based eGFR, the correlations with citrate and urate levels likely reflect reduced filtration function, as does the positive correlation with GFR estimated from cystatin C (rg=0.53).

A very similar pattern of genetic correlations was observed for BUN (Supplementary Table 7), but the genetic correlations with muscle mass-related traits were generally lower than those for eGFR. The largest genetic correlation of BUN was observed with CKD (rg=0.47), as compared to creatinine-based (rg=-0.29) and cystatin-based eGFR (rg=-0.26).

In summary, significant genetic correlations with eGFR reflect the two biological components that govern serum creatinine concentrations: its excretion via the kidney and its generation in muscle.
The fact that genetic correlations between BUN and muscle mass-related traits are generally lower than those observed for eGFR underscores the value of genetic associations with BUN to help prioritize eGFR-associated loci most likely to be relevant for kidney function.

Functional enrichment and pathway analyses
To identify molecular mechanisms and tissues of importance for kidney function, we assessed the enrichment of the eGFR and BUN genetic associations using tissue-specific gene expression, regulatory annotations, and gene sets and pathways (Methods). First, we used eGFR-associated SNPs (P<5×10⁻⁸) to explore enriched pathways, tissues and cell types based on gene expression data using DEPICT. 29 We identified 16 significantly enriched physiological systems, cell types and tissues highlighting several aspects of kidney function, physiology and disease. The strongest enrichment was observed for urogenital and renal physiological systems and tissues (kidney, kidney cortex, and urinary tract; false discovery rate (FDR) <0.05; Supplementary Figure 7A [...] supporting the use of BUN to prioritize loci most likely to be related to kidney function.

Second, we used stratified LD Score regression 30 on the genome-wide eGFR and BUN summary statistics to identify cell-type groups with enriched heritability based on data from diverse, cell-type specific functional genomic elements. The strongest enrichment for eGFR was observed for kidney (13.2-fold), followed by liver (7.3-fold) and adrenal/pancreas (5.7-fold enrichment; Supplementary Table 8). Kidney was also the most enriched cell-type group for BUN (11.5-fold enrichment; Supplementary Table 8).

Lastly, using a complementary approach, we assessed enrichment of eGFR-associated variants in genes resulting in kidney phenotypes in genetically manipulated mice. 31 From the Mouse Genome Informatics database, we selected all genes causing abnormal GFR (n=24), abnormal kidney physiology (n=453), or abnormal kidney morphology (n=764), and interrogated their human orthologs in the eGFR summary statistics (Methods).
We identified significant associations in 10 genes causing abnormal GFR in mice (enrichment p-value=8.9×10⁻⁴), 55 causing abnormal kidney physiology (enrichment p-value=1.1×10⁻⁴) and 96 causing abnormal kidney morphology (enrichment p-value=1.8×10⁻⁵; Figure 3; Methods). Of these, 25 genes represent novel eGFR candidate genes in humans, i.e. they were not previously reported to contain genome-wide significant eGFR-associated SNPs or map near known loci (Supplementary Table 9). The existing mouse models may pave the way for experimental confirmation of these findings.

Credible set SNPs were annotated with respect to their functional consequence and regulatory potential. Missense SNPs with >50% posterior probability (PP) of driving the association and/or mapping into a small credible set are of particular interest because they directly implicate the affected gene. Such missense SNPs were identified in 11 genes (SLC47A1, RPL3L, SLC25A45, CACNA1S, EDEM3, CPS1, KLHDC7A, PPM1J, CERS2, C9, and SLC22A2; Supplementary Table 12), of which CACNA1S, RPL3L, CERS2, and C9 were likely relevant for kidney function (Figure 4A). The majority of the 11 variants had CADD score>15, indicating potential deleteriousness. 33 Several identified genes are plausible biological candidates for driving the association signal (Table 1). For example, the missense p.(Ala465Val) SNP in SLC47A1 (PP>99%) alters the encoded multidrug and toxin extrusion protein (MATE1), a transport protein responsible for the secretion of cationic drugs, toxins and internal metabolites including creatinine across brush border membranes including kidney proximal tubules. The fact that MATE1 knockout mice have higher blood levels of both creatinine and BUN 34 argues against a sole effect on creatinine transport.

To evaluate the regulatory potential of small credible set SNPs in kidney, we annotated them to open chromatin regions identified from primary human tubular and glomerular cell cultures, 35 as well as from publicly available kidney cell types (Methods). We identified 72 SNPs mapping into one of these annotations, which may thus represent causal regulatory variants (Supplementary Table 12). A particularly interesting finding was the intronic rs77924615 in PDILT, which showed PP>99% of driving the association at the UMOD locus, and mapped into open chromatin in all evaluated resources (native kidney cells, ENCODE and Roadmap kidney cell types; Figure 4B).

Gene prioritization: co-localization with gene expression
We performed co-localization analyses for each eGFR-associated locus with gene expression in cis across 46 tissues including kidney glomerular and tubulo-interstitial compartments (Methods). A PP>80% of co-localization in at least one kidney tissue was observed for 17 transcripts mapping into 16 of the 228 replicated loci (Figure 5), pointing towards a shared underlying SNP associated with both eGFR and gene expression, and implicating the gene encoding the co-localized transcript as the locus' effector gene(s).

Novel insights emerged on several levels: first, UMOD is a well-established causal gene for CKD and can therefore be used to evaluate our workflow. In the tubulo-interstitial compartment, we observed a shared underlying variant associated with higher UMOD gene expression and lower eGFR (Figure 5), consistent with previous GWAS of urinary uromodulin concentration, in which alleles associated with lower eGFR at UMOD 15 were associated with higher urinary uromodulin concentrations. 36 The lead SNP at this locus was rs77924615, highlighted above as the candidate causal regulatory variant mapping into the intron of PDILT (upstream of UMOD). The association with differential UMOD but not PDILT gene expression supports UMOD as the causal gene and rs77924615 as a regulatory SNP.

Second, novel, biologically plausible candidates emerged. For example, our results suggest KNG1 and FGF5 as effector genes in the respective eGFR-associated loci (Figure 5, Supplementary Table 13). KNG1 encodes high-molecular-weight kininogen, which is cleaved to bradykinin. Bradykinin influences blood pressure, natriuresis and diuresis, and can be linked to kidney function via the renin-angiotensin-aldosterone system. 37 FGF5 encodes Fibroblast Growth Factor 5, and the index SNPs for eGFR or highly correlated SNPs (r²>0.9) have been identified in multiple GWAS of blood pressure, atrial fibrillation, coronary artery disease, hematocrit and multiple kidney-function related traits (Supplementary Table 13). The eGFR index SNP rs1458038 (PP>50%, CADD score=14.8; Supplementary Table 13) co-localized with the eGFR signal only in tubulo-interstitial kidney portions (Figure 5), supporting its regulatory potential on the expression levels of FGF5 in this compartment. Both KNG1 and FGF5 index SNPs were associated with BUN and CKD, and are thus likely related to kidney function.

Third, co-localization of eGFR with gene expression across multiple tissues revealed that for kidney-co-localized transcripts, some showed the same direction on transcript levels across all tissues with lower eGFR (e.g. METTL10), while others showed higher transcript levels in some tissues but lower levels in others (e.g. SH3YL1; Figure [...]

Co-localization with uromodulin protein levels in urine
The UMOD locus is of particular clinical interest for CKD research: 21 rare UMOD mutations cause autosomal-dominant tubulo-interstitial kidney disease, 40 and common variants at UMOD give rise to the strongest eGFR and CKD GWAS signals. 15 We therefore performed conditional analyses based on the EA-specific summary statistics and found two independent variants: rs77924615, mapping into upstream PDILT, and rs34882080, mapping into an intron of UMOD (Figure 6A). SNP association with the urinary uromodulin-to-creatinine ratio (UUCR) in one participating cohort (Figure 6B) matched the eGFR association pattern. Co-localization of the conditional eGFR and UUCR associations was evaluated separately for rs34882080 (Figure 6C) and rs77924615 (Figure 6D). Both regions showed high probability of a shared underlying variant driving the respective associations with eGFR and UUCR levels (PP=0.97 and 0.96, respectively), further supporting rs77924615 as a causal regulatory variant and UMOD as its effector gene.

This trans-ethnic study is 5-fold larger than previous eGFR GWAS meta-analyses and identified 264 replicated loci, 166 of which are reported here for the first time. By also analyzing BUN, an established complementary kidney function marker, we highlight eGFR-associated loci that are likely to be important for kidney function as opposed to creatinine metabolism, and provide a comprehensive annotation resource. Clinical relevance is supported by associations of a GRS for low eGFR with higher odds of clinically diagnosed CKD, CKD-related phenotypes, and hypertension. Enrichment analyses confirm the kidney as the main target organ. Co-localization of associations with eGFR and gene expression in the kidney implicates specific target genes for follow-up.
Conditional analyses, fine-mapping and functional annotation at 228 replicated eGFR-associated loci among EA participants implicate single potentially causal variants at 20 loci.
Most previous eGFR GWAS meta-analyses were limited to a single ancestry group 8 and did not prioritize causal variants or effector genes in associated loci. While underpowered to uncover novel loci, one previous trans-ethnic study employed fine-mapping, resolving one signal to a single variant, 20 rs77924615 at UMOD-PDILT, also identified in our study. At this locus, we further characterized the relationship between the causal variant, UMOD expression in the target tissue, and uromodulin protein levels. This increase in resolution, from locus to single potentially causal variant with its effector gene, protein and target tissue, represents a critical advance over 10 years of eGFR GWAS, 15 and is a prerequisite for translational research.

The complementary multi-tissue approaches including enrichment analyses based on gene expression, regulatory annotations, and gene-sets and pathways highlight the kidney as the most important target organ. However, relatively few kidney-specific experimental datasets are publicly available. For example, the kidney is not well represented in the GTEx Project and not included in its tissue-specific eQTL datasets, 38 which emphasizes the value of open access resources and in-depth characterization of uncommon tissues and cell types. We were able to specifically investigate the kidney by using a recently published eQTL dataset from glomerular and tubulo-interstitial portions of micro-dissected human kidney biopsies, 41 kidney-specific regulatory information from the ENCODE and Roadmap resources, and by obtaining regulatory information from [...] replication, as well as advanced and comprehensive downstream bioinformatics analyses. Further strengths are the use of BUN to prioritize eGFR-associated loci likely relevant for kidney function, and the provision of genome-wide BUN summary statistics as an annotation resource for other studies of eGFR.
Moreover, we evaluated a GRS for eGFR for association with clinically diagnosed CKD in a large independent study. Among the limitations, non-European populations are still underrepresented in our study, as in many other genomic efforts. 45 Statistical fine-mapping using trans-ethnic data with different LD structures can potentially narrow down association signals. However, a sufficiently large reference dataset to compute ancestry-matched LD structure for summary-statistics based fine-mapping was only available for EA, highlighting the potential of future large-scale efforts with trans-ethnic fine-mapping and the need to generate data from non-EA populations enabling such endeavors. Lastly, several SNPs had small effective sample sizes in some subpopulations, which might have affected the ability to assess between-ancestry heterogeneity and potentially underestimated true heterogeneity.

We estimated GFR from serum creatinine, as done in clinical practice and observational studies, because direct measurement of kidney function is invasive, time-consuming, and burdensome. Under the assumption that genetic associations supported by multiple markers are less likely to reflect marker metabolism, we used BUN to prioritize eGFR loci likely to be relevant to kidney function. Blood creatinine, urea and cystatin C concentrations are influenced not only by glomerular filtration, but also by their synthesis, active secretion, or reabsorption, as illustrated by loci detected in our study: for example, the GATM locus was associated with eGFR but not with BUN, consistent with the function of the encoded protein as a rate-limiting enzyme in creatine synthesis. 46 Conversely, the SLC14A2 locus was associated with BUN but not with eGFR, consistent with the function of the encoded protein as a urea transporter. 47 Even so, lack of a SNP's association with one kidney function marker based on a combination of p-value and effect direction may not necessarily mean that the locus is not relevant to kidney function. Our categorization of the eGFR loci into three classes based on effect direction and significance of the BUN associations should be interpreted with caution, with "likely" and "unlikely" reflecting uncertainty of the assignment. Factors complicating the comparison of eGFR and BUN associations at the locus level are differential statistical power, differential ancestry distribution, and potential allelic heterogeneity. Further large-scale studies with multiple kidney function markers measured in the same individuals are therefore warranted.

To identify broadly representative and generalizable association signals, we focused on SNPs that were present in the majority of the participating studies. This choice might have limited our ability to uncover novel, or to fine-map, low-frequency or population-specific variants, which represents a complementary avenue of research. Moreover, even with well-powered fine-mapping approaches, potentially causal SNPs need to be confirmed as functional variants in experimental studies. Although co-localization with gene expression can help prioritize effector genes, these associations are based on measures from a single time point and hence cannot answer whether changes in gene expression precede or follow changes in kidney function.

In summary, we identified and characterized a large number of loci associated with eGFR and prioritized potential effector genes, driver variants and target tissues. These findings will help direct functional studies and advance the understanding of kidney function biology, a prerequisite to develop novel therapies to reduce the burden of CKD.

Online Methods
Overview
We set up a collaborative meta-analysis based on a distributive data model and QC procedures. To maximize phenotype standardization across studies, an analysis plan and a command line script (https://github.com/genepi-freiburg/ckdgen-pheno) were created centrally and provided to all participating studies (mostly population-based, Supplementary [...]

In studies reporting blood urea measurements, BUN was derived as blood urea×2.8, with units expressed as mg/dl.

Genotypes were imputed based on the Haplotype Reference Consortium (HRC) v1.1 or the 1000 Genomes Project phase 3 v5 (1000Gp3v5) ALL or phase 1 v3 (1000Gp1v3) ALL panels. Imputed variants were coded as allelic dosages accompanied by the corresponding IQ scores (IMPUTE2 info score, MACH/minimac RSQ, or as applicable), and annotated on the NCBI b37 (hg19) reference build (see Supplementary Table 2 for study-specific genotyping arrays, haplotype phasing and genotype imputation methods).

Genome-wide association studies (GWAS)
Each study fitted sex- and age-adjusted linear regression models to log(eGFR) and BUN. Regression residuals were regressed on SNP dosage levels, assuming an additive genetic model. Study site, genetic principal components (PCs), relatedness, or other study-specific features were accounted for in the study-specific models as appropriate (Supplementary Table 2). Logistic regression models were fitted for CKD.

Trans-ethnic GWAS meta-analysis
Studies contributed 121 GWAS summary statistics files for eGFR (total post-QC n=765,348), 60 GWAS files for CKD (total post-QC n=625,219, including 64,164 CKD cases), and 65 GWAS files for BUN (total post-QC n=416,178). Ancestry-specific details for eGFR, CKD and BUN are given in Supplementary Table 1.

Before meta-analysis, study-specific GWAS files were filtered to retain only variants with IQ score>0.6 and minor allele count (MAC)>10, and genomic control (GC) correction was applied in case of GC factor λGC>1. Fixed effects inverse-variance weighted meta-analysis was performed using METAL, 53 which was adapted to increase the precision of effect estimates and their standard errors (SE; seven decimal places instead of four).

After meta-analysis of 43,994,957 SNPs, only SNPs present in ≥50% of the GWAS files and with total MAC≥400 were retained. Across ancestries, this yielded 8,221,591 variants for eGFR (8,834,748 in EA), 8,176,554 for BUN (8,358,347 in EA), and 9,585,923 for CKD. Post-meta-analysis GC correction was not applied (LD Score regression intercept≈1 in all analyses of eGFR, BUN, and CKD). 54 The genome-wide significance level was set at 5×10⁻⁸. Between-study heterogeneity was assessed using the I² statistic. 55 For CKD, variants with I²≥95% were removed to moderate influence of single large studies. Variants were assigned to loci by selecting the SNP with the lowest p-value genome-wide as the index SNP, defining the corresponding locus as the 1 Mb-segment centered on the index SNP, and repeating the procedure until no further genome-wide significant SNPs remained. The extended MHC region was considered as a single locus. A locus was considered novel if not containing any variant identified by previous GWAS of eGFR.
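
The locus definition described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the `Snp` record and the function name `assign_loci` are hypothetical, and the special handling of the extended MHC region as a single locus is omitted.

```python
from typing import NamedTuple


class Snp(NamedTuple):
    chrom: str   # chromosome label
    pos: int     # base-pair position
    p: float     # association p-value


def assign_loci(snps, p_threshold=5e-8, window=1_000_000):
    """Greedy locus definition: take the most significant remaining SNP as
    index SNP, claim the window centred on it, and repeat until no
    genome-wide significant SNP is left."""
    half = window // 2
    # Only genome-wide significant SNPs can seed or belong to a locus.
    remaining = sorted((s for s in snps if s.p < p_threshold), key=lambda s: s.p)
    loci = []
    while remaining:
        index = remaining[0]  # lowest p-value among the remaining SNPs
        start, end = max(0, index.pos - half), index.pos + half
        loci.append((index, start, end))
        # Drop every SNP that falls into the locus just defined.
        remaining = [
            s for s in remaining
            if not (s.chrom == index.chrom and start <= s.pos <= end)
        ]
    return loci


example = [
    Snp("1", 1_000_000, 1e-20),
    Snp("1", 1_200_000, 1e-10),  # absorbed by the first locus
    Snp("1", 3_000_000, 1e-9),   # seeds a second locus
    Snp("2", 1_000_000, 1e-3),   # not genome-wide significant
]
loci = assign_loci(example)
```

Because the list stays sorted by p-value after filtering, the first remaining element is always the next index SNP.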

Meta-regression analysis of trans-ethnic GWAS
For eGFR, we evaluated ancestry-related heterogeneity with the software Meta-Regression of Multi-Ethnic Genetic Association (MR-MEGA v0.1.2), 56 applied to the study-specific GWAS results. Meta-regression models included three axes of genetic variation. GC correction was applied to the meta-regression results. The 308 genome-wide significant index SNPs from the trans-ethnic GWAS meta-analysis were tested for ancestry-related heterogeneity of the allelic effects at a significance level of 0.05/308=1.6×10⁻⁴ (referring to the corresponding p-value as p-anc-het).

Variance explained and genetic heritability
The proportion of phenotypic variance explained by the index SNPs was estimated as $\beta^{2}\left(\frac{2p(1-p)}{\mathrm{var}}\right)$, with β being the SNP effect, p the effect allele frequency, and var the variance of the sex- and age-adjusted log(eGFR) residuals (assumed as 0.016 based on data from 11,827 EA participants of the population-based ARIC study). 9 Genetic heritability of age- and sex-adjusted log(eGFR) was estimated using the R package 'MCMCglmm' 57 on the Cooperative Health Research In South Tyrol (CHRIS) study, 58 a participating pedigree-based study with 186 up-to-5 generation pedigrees (n=4373). 59 We fitted two models, with and without the inclusion of the identified index SNPs (304/308), running 1,000,000 MCMC iterations (burn in=500,000). 59
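
As a worked illustration of the variance-explained formula, the following Python sketch evaluates it per SNP. The function names are hypothetical, and summing the per-SNP contributions over index SNPs is an assumption here (it presumes approximately independent index SNPs).

```python
def variance_explained(beta, eaf, trait_var=0.016):
    """Per-SNP proportion of variance: beta^2 * 2p(1-p) / var.

    beta      -- SNP effect on log(eGFR)
    eaf       -- effect allele frequency p
    trait_var -- variance of the adjusted log(eGFR) residuals (0.016 in the text)
    """
    return beta ** 2 * 2.0 * eaf * (1.0 - eaf) / trait_var


def total_variance_explained(snps, trait_var=0.016):
    """Sum over (beta, eaf) pairs; assumes approximately independent SNPs."""
    return sum(variance_explained(b, p, trait_var) for b, p in snps)


# Hypothetical SNP: beta = 0.01 on log(eGFR), effect allele frequency 0.5.
single = variance_explained(0.01, 0.5)
combined = total_variance_explained([(0.01, 0.5), (0.01, 0.5)])
```

For the hypothetical SNP above, the formula gives 0.0001 × 0.5 / 0.016 = 0.003125, i.e. about 0.3% of the variance.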

Comparison with and replication of results in the MVP
The eGFR-associated SNPs identified in the discovery GWAS meta-analyses were tested for replication in a GWAS from the MVP, 23 an independent trans-ethnic study with participants recruited across 63 U.S. Veteran's Administration (VA) medical facilities. Written informed consent was obtained and all documents and protocols were approved by the VA Central Institutional Review Board. After genotyping and QC, genotypes were phased and imputed on the 1000Gp3v5 reference panel. Serum creatinine was assessed up to one year prior to MVP enrollment using isotope dilution mass spectrometry. GFR was estimated using the CKD-EPI equation 50 after excluding subjects on dialysis, transplant patients, amputees, individuals on HIV medications, and those with creatinine values of <0.4 mg/dl. GWAS of eGFR on SNP dosage levels were performed by fitting linear regression models adjusted for age at creatinine measurement, age², sex, body mass index, and the first 10 genetic PCs, using SNPTEST v2.5.4-beta. 60 All GWAS were stratified by self-reported ethnicity (79.6% White non-Hispanic; 20.4% Black non-Hispanic), diabetes, and hypertension status. Results were combined across strata using fixed effects inverse-variance weighted meta-analysis in METAL. 53 This analysis encompassed a total of 280,722 individuals across all strata, of whom 216,518 were non-Hispanic Whites (EA). The MVP is described more extensively in the Supplementary Material.
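
The fixed effects inverse-variance weighted combination of stratum-specific results can be sketched in a few lines of Python. This is a generic textbook implementation for illustration, not the METAL code; the function name and input values are hypothetical.

```python
import math


def ivw_fixed_effects(betas, ses):
    """Fixed effects inverse-variance weighted meta-analysis.

    Each estimate is weighted by 1/SE^2; the pooled SE is the square
    root of the inverse of the summed weights.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return beta, se


# Two hypothetical strata with equal precision.
pooled_beta, pooled_se = ivw_fixed_effects([0.1, 0.3], [0.1, 0.1])
```

With equal standard errors the pooled effect is the simple average of the stratum effects, and the pooled SE shrinks by a factor of √2.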
Of the 308 eGFR index SNPs identified in the CKDGen trans-ethnic analysis, 305 variants or their good proxies were available in the MVP GWAS (proxies had to have r²≥0.8 with the index SNP and were selected by maximum r² followed by minimum distance in case of ties). Replication testing of the 256 EA-specific index SNPs was restricted to the MVP EA GWAS. CKDGen and MVP meta-analysis results were pooled via sample size weighted meta-analysis of z-scores using METAL. 53 In both the trans-ethnic and EA-specific analyses, replication was defined as a one-sided p-value<0.05 in the MVP and genome-wide significance of the CKDGen and MVP meta-analysis result.
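
The proxy selection rule (r²≥0.8, highest r² first, shortest distance as tie-breaker) could be expressed as follows. This is a minimal sketch; the tuple layout, function name and SNP identifiers are hypothetical.

```python
def select_proxy(candidates, min_r2=0.8):
    """Pick a proxy for an index SNP.

    candidates -- iterable of (snp_id, r2_with_index, distance_bp) tuples
    Returns the id of the eligible candidate with the highest r2, using
    the smallest absolute distance to break ties, or None if no
    candidate reaches min_r2.
    """
    eligible = [c for c in candidates if c[1] >= min_r2]
    if not eligible:
        return None
    # Sort key: maximise r2 (hence the minus sign), then minimise distance.
    best = min(eligible, key=lambda c: (-c[1], abs(c[2])))
    return best[0]


# Hypothetical candidates: two tie on r2, one fails the r2 threshold.
chosen = select_proxy([
    ("rsA", 0.95, 5000),
    ("rsB", 0.95, -1200),
    ("rsC", 0.70, 10),
])
```

In this example the two candidates with r²=0.95 tie, so the one closer to the index SNP is returned.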

Assessment of kidney function relevance using BUN
We used genetic associations with BUN to assess replicated eGFR-associated SNPs with respect to their potential kidney function relevance. Support for kidney function relevance was categorized as "likely" (1) for all eGFR index SNPs with an inverse, significant (one-sided P<0.05) association with BUN for a given reference allele, "inconclusive" (2) for eGFR index SNPs whose effect on BUN was not different from 0 (P≥0.05), and "unlikely" (3) for all eGFR index SNPs with a concordant, significant (one-sided P<0.05) association with BUN for a given reference allele.
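
One way to read this three-class rule is sketched below in Python. The exact test used in the study is not spelled out beyond the description above, so the normal approximation for the one-sided p-values and the function name are assumptions made for illustration.

```python
from statistics import NormalDist


def classify_bun_support(beta_egfr, beta_bun, se_bun, alpha=0.05):
    """Classify kidney function relevance of an eGFR index SNP.

    Effects must refer to the same allele. One-sided p-values are taken
    from a normal approximation of beta_bun / se_bun (an assumption).
    """
    z = beta_bun / se_bun
    p_positive = 1.0 - NormalDist().cdf(z)  # evidence for a BUN-raising effect
    p_negative = NormalDist().cdf(z)        # evidence for a BUN-lowering effect
    if beta_egfr > 0:
        p_inverse, p_concordant = p_negative, p_positive
    else:
        p_inverse, p_concordant = p_positive, p_negative
    if p_inverse < alpha:
        return "likely"        # opposite direction on BUN, significant
    if p_concordant < alpha:
        return "unlikely"      # same direction on BUN, significant
    return "inconclusive"      # BUN effect not distinguishable from 0


# Hypothetical SNPs, all lowering eGFR.
likely = classify_bun_support(-0.01, 0.02, 0.005)
unlikely = classify_bun_support(-0.01, -0.02, 0.005)
inconclusive = classify_bun_support(-0.01, 0.001, 0.005)
```

An allele that lowers eGFR and raises BUN is consistent with reduced filtration, which is why the inverse direction maps to "likely".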

Genetic risk score (GRS) analysis in the UK Biobank dataset
To test the combined effect of eGFR-associated SNPs on clinically diagnosed CKD-related outcomes, a GRS-based association analysis was conducted based on summary GWAS results, as described before. 61,62 The genetic association results with the diseases were obtained for 452,264 UK Biobank participants available in the GeneAtlas 63 database for glomerular diseases (ICD-10 codes N00-N08; 2289 cases); acute renal failure (N17; 4913 cases); chronic renal failure (N18; 4905 cases); urolithiasis (N20-N23; 7053 cases); hypertensive diseases (I10-I15; 84,910 cases); and ischemic heart diseases (I20-I25; 33,387 cases). Asthma (J45; 28,628 cases) was included as a negative control. The log(estimated OR) provided by the GeneAtlas PheWAS interface was used as effect size, and its SE was calculated from the corresponding effect size and p-value. When OR=1, the SE was imputed by the median value of the remaining associations of the trait. Of the 147 eGFR index SNPs from the trans-ethnic GWAS meta-analysis that were replicated and showed likely kidney function relevance, 144 were available in the UK Biobank dataset, as were 259 of all 264 replicated trans-ethnic GWAS meta-analysis SNPs. The effect beta of the GRS association corresponds to the OR of the disease depending on the relative change in eGFR, e.g. OR=1.10^beta for a 10% change in eGFR. Alternatively, exp(beta) can be interpreted as the OR of the disease per unit change of log(eGFR).
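
Two of the calculations mentioned above lend themselves to a short Python illustration: recovering an SE from an effect size and its p-value, and converting the GRS beta into an OR for a relative change in eGFR. The sketch assumes two-sided p-values from a normal (Wald) test, which the text does not state explicitly; function names and inputs are hypothetical.

```python
from statistics import NormalDist


def se_from_beta_and_p(beta, p_two_sided):
    """Back-calculate SE = |beta| / z, with z the normal quantile of p/2.

    Returns None when the SE is not identifiable (beta == 0, i.e. OR = 1,
    or p == 1); the text imputes the trait-wise median SE in that case.
    """
    if beta == 0 or p_two_sided >= 1.0:
        return None
    z = -NormalDist().inv_cdf(p_two_sided / 2.0)
    return abs(beta) / z


def or_for_relative_change(beta, relative_change=1.10):
    """OR of disease for a multiplicative change in eGFR.

    With beta on the log(eGFR) scale, OR = relative_change ** beta,
    e.g. 1.10 ** beta for a 10% higher eGFR.
    """
    return relative_change ** beta


# Hypothetical log(OR) of 0.1 with a two-sided p-value of 0.05.
se = se_from_beta_and_p(0.1, 0.05)
# Hypothetical GRS beta of -2 per unit of log(eGFR).
odds_ratio = or_for_relative_change(-2.0)
```

Using the lower-tail quantile of p/2 keeps the calculation stable for very small p-values.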

Genetic correlations with other complex traits and diseases
Genome-wide genetic correlation analysis was performed to investigate evidence of co-regulation or shared genetic bases between eGFR and BUN and other complex traits and diseases, both known and not known to correlate with eGFR and BUN. We estimated pairwise genetic correlation coefficients (rg) between the results of our trans-ethnic meta-analyses of eGFR and BUN and each of 748 pre-computed and publicly available GWAS summary statistics of complex traits and diseases available through LD Hub v1.9.0 using LD Score regression. 28 An overview of the sources of these summary statistics and their corresponding sample sizes is available at http://ldsc.broadinstitute.org. Statistical significance was assessed at the Bonferroni corrected level of 0.05/748=6.7×10⁻⁵.

Pathway and tissue enrichment analysis
We used DEPICT v1 release 194 to perform Data-Driven Expression Prioritized Integration for Complex Traits analysis, 29 including pathway/gene-set enrichment and tissue/cell type analyses as described previously. 9,10 All 14,461 gene sets were reconstituted by identifying genes that were transcriptionally co-regulated with other genes in a panel of 77,840 gene expression microarrays, 64 from mouse knock-out studies, and molecular pathways from protein-protein interaction screening. In the tissues and cell type enrichment analysis, we tested whether genes in associated regions were highly expressed in 209 MeSH annotation categories for 37,427 microarrays (Affymetrix U133 Plus 2.0 Array platform). For both eGFR and BUN, we included all variants associated with the trait at P<5×10⁻⁸ in the trans-ethnic meta-analysis. Independent variant clumping was performed using Plink 1.9 65 with 500 kb flanking regions and r²>0.01 in the 1000Gp1v3 dataset. After excluding the MHC region, DEPICT was run with 500 repetitions to estimate the FDR and 5000 permutations to compute p-values adjusted for gene length by using 500 null GWAS. All significant gene sets were merged into meta gene sets by running an affinity propagation algorithm 66 implemented in the Python 'scikit-learn' package (http://scikit-learn.org/). The resulting network was visualized using Cytoscape (http://cytoscape.org/).

Enrichment of heritability by cell type group
We used stratified LD Score regression to investigate important tissues and cell types based on the trans-ethnic eGFR and BUN meta-analysis results. Heritability enrichment in 10 cell type groups was assessed using the default options of stratified LD Score regression described previously. 30 The 10 cell type groups were collapsed from 220 cell-type specific regulatory annotations for the four histone marks H3K4me1, H3K4me3, H3K9ac, and H3K27ac. The enrichment of a cell type category was defined as the proportion of SNP heritability in that group divided by the proportion of SNPs in the same cell type group.
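The enrichment definition above can be written out as a short calculation. This is only an illustrative sketch: the function name and the input numbers are hypothetical, and in practice the heritability estimates would come from stratified LD Score regression output rather than being supplied by hand.

```python
def heritability_enrichment(h2_group, h2_total, snps_group, snps_total):
    """Enrichment = (proportion of SNP heritability in the group)
                  / (proportion of SNPs in the group).

    All four inputs are assumed to be supplied by the caller; the values
    used below are made up for illustration and are not study results.
    """
    prop_h2 = h2_group / h2_total
    prop_snps = snps_group / snps_total
    return prop_h2 / prop_snps


# Hypothetical example: a cell type group holding 5% of SNPs
# but explaining 30% of SNP heritability.
enrichment = heritability_enrichment(
    h2_group=0.03, h2_total=0.10, snps_group=50_000, snps_total=1_000_000
)
print(f"enrichment = {enrichment:.2f}")
```

A value above 1 indicates that the group carries more heritability than expected from its share of SNPs.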

Analysis of genes causing kidney phenotypes in mice
A nested candidate gene analysis was performed using GenToS 67 to identify additional genetic associations that were not genome-wide significant. Candidate genes that when manipulated cause kidney phenotypes in mice were selected using the comprehensive population. The GWAS meta-analysis summary statistics for eGFR were queried for significantly associated SNPs mapping into the selected candidate genes. Enrichment of significant genetic associations in genes within each candidate list was computed from the complementary cumulative binomial distribution. 67 GenToS was used with default parameters on each of the three candidate gene lists, using the 1000 Genomes phase 3 release 2 ALL dataset as reference.
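As an illustration of an enrichment test based on the complementary cumulative binomial distribution, the sketch below computes the upper-tail probability P(X ≥ k) for X ~ Binomial(n, p). It is not the GenToS implementation: the function name, the way the null probability p is chosen, and the example numbers are all assumptions made for illustration.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term.

    Hypothetical reading for a candidate gene list:
      n = number of genes in the list,
      k = number of those genes with a significant association,
      p = assumed probability of a significant gene under the null.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Made-up example: 12 of 100 candidate genes are significant when
# 5% would be expected by chance.
p_enrichment = binomial_upper_tail(k=12, n=100, p=0.05)
print(f"enrichment p-value = {p_enrichment:.3g}")
```

The direct sum is adequate for lists of a few hundred genes; for much larger n, a library routine for the binomial survival function would be numerically preferable.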

Independent variant identification in the EA meta-analysis
To identify additional, independent eGFR-associated variants within the EA-specific and replicated loci, approximate conditional analyses were performed based on genome-wide discovery summary statistics that incorporated LD information from an ancestry-matched reference population. These analyses were restricted to participants of EA, because an LD reference sample scaled to the size of our meta-analysis could only be constructed from publicly available data for EA individuals, 24 for which we randomly selected 15,000 UK Biobank participants (dataset ID 8974). Individuals who withdrew consent and those not meeting data cleaning requirements were excluded, keeping only those who passed the sex-consistency check, had ≥95% call rate, and did not represent outliers with respect to SNP heterozygosity. For each pair of individuals, the proportion of variants shared identical-by-descent (IBD) was computed using PLINK. 68 Only one member of each pair with IBD coefficient ≥0.1875 was retained. Individuals were restricted to those of EA by excluding outliers along the first two PCs from a PC analysis seeded with the HapMap phase 3 release 2 populations as reference. The final dataset to estimate LD included 13,558 EA individuals and 16,969,363 SNPs.

Statistical fine-mapping was based on the 228 1-Mb genome-wide significant loci identified in the EA meta-analysis, clipped at chromosome borders. Overlapping loci as well as pairs of loci whose respective index SNPs were correlated (r²>0.1 in the UKBB LD dataset described above) were merged. A single SNP was chosen to represent the MHC region, resulting in a final list of 189 regions prior to fine-mapping. Within each region, the GCTA COJO Slct algorithm 69 was used to identify independent variants employing a step-wise forward selection approach. We used the default collinearity cut-off of 0.9 (sensitivity analyses showed no major influence of alternative cut-off values; data not shown). We deemed an additional SNP as independently genome-wide significant if the SNP's p-value conditional on all previously identified SNPs in the same region was <5×10⁻⁸.
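The construction of fine-mapping regions from 1-Mb loci can be sketched as an interval merge on one chromosome. The sketch assumes that each locus is a window of ±500 kb around its index SNP, which the text does not state explicitly, and it covers only the positional overlap step; the additional merging of loci whose index SNPs are correlated at r²>0.1 requires the LD reference and is not shown. Function and parameter names are hypothetical.

```python
def merge_loci(index_positions, chrom_length, half_window=500_000):
    """Build 1-Mb windows around index SNPs on one chromosome,
    clip them at the chromosome borders, and merge overlapping windows.

    half_window=500_000 is an assumption (1-Mb locus centred on the index SNP).
    Returns a list of (start, end) tuples in base-pair coordinates.
    """
    windows = sorted(
        (max(1, pos - half_window), min(chrom_length, pos + half_window))
        for pos in index_positions
    )
    merged = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Made-up positions on a hypothetical 5.2-Mb chromosome.
regions = merge_loci([300_000, 900_000, 5_000_000], chrom_length=5_200_000)
print(regions)
```

In this example the first two windows overlap and collapse into one region, while the third is clipped at the chromosome end.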

Fine-mapping and credible sets in the EA meta-analysis
For each region containing multiple independent SNPs and for each independent SNP in such regions, approximate conditional analyses were conducted using the GCTA COJO-Cond algorithm to generate approximate conditional association statistics conditioned on the other independent SNPs in the region. Using Wakefield's formula implemented in the R package 'gtx', 70 we derived approximate Bayes factors (ABF) from conditional estimates in regions with multiple independent SNPs and from the original estimates for regions with a single independent SNP. Given that 95% of the SNP effects on log(eGFR) fell within -0.01 to 0.01, the standard deviation prior was chosen as 0.0051 based on formula 8 in the original publication. 32 Sensitivity analyses showed that results were robust when higher values were used for the standard deviation prior (data not shown). For each variant within an evaluated region, the ABF obtained from the association betas and their SEs of the marginal (single-signal regions) or conditional estimates (multi-signal regions) was used to calculate the posterior probability (PP) that the SNP drives the association signal ("causal variant"). We derived 99% credible sets, representing the SNP sets containing the variant(s) driving the association signal with 99% probability, by ranking variants by their PP and adding them to the set until reaching a cumulative PP>99% in each region.
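The ABF and credible-set steps can be sketched as follows. This is a minimal illustration, not the 'gtx' implementation: it assumes one causal variant per signal and equal prior probability for every SNP in the region, writes the ABF as evidence for association over the null (the inverse of the ratio in Wakefield's original notation), and uses made-up SNP names and effect estimates. The prior standard deviation 0.0051 is taken from the text and is consistent with 0.01/1.96.

```python
from math import exp, log

PRIOR_SD = 0.0051  # from the text; approximately 0.01 / 1.96


def log_wakefield_abf(beta, se, prior_sd=PRIOR_SD):
    """Log approximate Bayes factor (association vs. null) for one SNP.

    ABF = sqrt(V / (V + W)) * exp(z^2 * W / (2 * (V + W))),
    with V = se^2, W = prior_sd^2 and z = beta / se.
    """
    v, w = se**2, prior_sd**2
    z2 = (beta / se) ** 2
    return 0.5 * log(v / (v + w)) + z2 * w / (2 * (v + w))


def credible_set(stats, coverage=0.99):
    """stats maps SNP id -> (beta, se). Returns [(snp, PP), ...] ranked by PP,
    truncated once the cumulative PP exceeds `coverage`."""
    log_abf = {snp: log_wakefield_abf(b, s) for snp, (b, s) in stats.items()}
    top = max(log_abf.values())  # subtract the maximum to avoid overflow
    weights = {snp: exp(l - top) for snp, l in log_abf.items()}
    total = sum(weights.values())
    ranked = sorted(((w / total, snp) for snp, w in weights.items()), reverse=True)
    selected, cumulative = [], 0.0
    for pp, snp in ranked:
        selected.append((snp, pp))
        cumulative += pp
        if cumulative > coverage:
            break
    return selected


# Hypothetical region with three SNPs (effects on log(eGFR) are invented).
example = {"rsA": (0.010, 0.001), "rsB": (0.004, 0.001), "rsC": (0.001, 0.001)}
for snp, pp in credible_set(example):
    print(snp, round(pp, 4))
```

Working on the log scale keeps the calculation stable for strongly associated SNPs, where the exponent in the ABF can become very large.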

Co-localization of eGFR and gene expression in cis
As the great majority of gene expression datasets is generated based on EA samples, co-localization analysis was based on the genetic associations with eGFR in the EA sample and with gene expression (eQTL) quantified from micro-dissected human glomerular and tubulo-interstitial kidney portions from 187 individuals from the NEPTUNE study, 41 as well as from the 44 tissues included in the GTEx Project v6p release. 38 The eQTL and GWAS effect alleles were harmonized. For each locus, we identified tissue-gene pairs with reported eQTL data within ±100 kb of each GWAS index SNP. The region for each co-localization test was defined as the eQTL cis window defined in the underlying GTEx and NephQTL studies. We used the 'coloc.fast' function, with default settings, from the R package 'gtx' (https://github.com/tobyjohnson/gtx), which is an adaptation of Giambartolomei's co-localization method. 74 'gtx' was also used to estimate the direction of effect over the credible sets as the ratio of the average PP-weighted GWAS effects over the PP-weighted eQTL effects.
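One plausible reading of the direction-of-effect estimate is a ratio of PP-weighted sums over the SNPs in a credible set. The sketch below illustrates that reading only; how 'gtx' computes the quantity internally may differ, effect alleles are assumed to be harmonized already, and all names and numbers are hypothetical.

```python
def direction_of_effect(pp, beta_gwas, beta_eqtl):
    """Ratio of PP-weighted GWAS effects to PP-weighted eQTL effects.

    pp, beta_gwas and beta_eqtl are equal-length sequences over the SNPs
    of one credible set, with effects aligned to the same allele.
    A negative ratio suggests that higher expression goes with lower eGFR.
    """
    if not (len(pp) == len(beta_gwas) == len(beta_eqtl)):
        raise ValueError("inputs must have the same length")
    weighted_gwas = sum(p * b for p, b in zip(pp, beta_gwas))
    weighted_eqtl = sum(p * b for p, b in zip(pp, beta_eqtl))
    return weighted_gwas / weighted_eqtl


# Invented two-SNP credible set.
ratio = direction_of_effect(
    pp=[0.7, 0.3], beta_gwas=[-0.01, -0.02], beta_eqtl=[0.5, 0.4]
)
print(f"direction of effect ratio = {ratio:.4f}")
```

Because both sums share the same weights, normalizing them to averages would leave the ratio unchanged.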

Trans-eQTL analysis
We performed trans-eQTL annotation through LD mapping based on the 1000Gp3v5 European reference panel (r² cut-off >0.8). We limited annotation to replicated index SNPs with a fine-mapping PP≥1%. Due to expected small effect sizes, only genome-wide trans-eQTL studies of either peripheral blood mononuclear cells or whole blood with n≥1000 individuals were considered, resulting in five non-overlapping studies 75-79 (Supplementary Table 14). For one study, 79 we had access to an update with larger sample size (n=6645) obtained by combining two non-overlapping studies (LIFE-Heart 80

Co-localization with urinary uromodulin concentrations
Association between the urinary uromodulin-to-creatinine ratio and genetic variants at the UMOD-PDILT locus was evaluated in the German Chronic Kidney Disease (GCKD) study. 82 Uromodulin concentrations were measured from frozen stored urine using an established ELISA assay with excellent performance. 36 Concentrations were indexed to creatinine to account for urine dilution. Genetic associations were assessed using the same software and settings as for the eGFR association analyses (Supplementary Table 2). Co-localization analyses were performed using identical software and settings as described above for the association with gene expression.

Table 1 - Genes implicated as causal via identification of missense SNPs with high probability of driving the eGFR association signal. Genes are included if they contain a missense SNP with posterior probability of association of >50% or mapping into a small credible set (≤5 SNPs).

34.0 - Encodes a subunit of the slowly inactivating L-type voltage-dependent calcium channel in skeletal muscle. Reports of altered expression in kidney cancer (PMID: 28781648) and after indoxyl sulfate treatment (PMID: 27550174). Rare variants can cause autosomal dominant hypokalemic periodic paralysis, type 1 (#170400) or malignant hyperthermia susceptibility (#601887). Common variation at this locus has been reported as associated with eGFR in previous GWAS (PMID: 24029420, PMID: 26831199).

ENCODE kidney
Belongs to the SLC25 family of mitochondrial carrier proteins and is an orphan transporter. This variant has already been identified in a GWAS of symmetric dimethylarginine levels (PMID: 24159190) and in a whole-genome sequence (WGS) analysis of serum creatinine (PMID: 25082825). SLC25A45 may play a role in biosynthesis of arginine, which is involved in the synthesis of creatine.

12.7 - Encodes the polyspecific organic cation transporter (OCT2) that is primarily expressed in the kidney, where it mediates tubular uptake of organic compounds including creatinine from the circulation. Many publications relate SLC22A2 to kidney function. rs316019 is a known pharmacogenomics variant associated with response to metformin and other drugs such as cisplatin. Carriers of the risk allele have a higher risk of cisplatin-induced nephrotoxicity (PMID: 19625999), indicating that this transporter is essential in excreting toxins. The locus has been reported in previous GWAS of eGFR (PMID: 20383146).

Boldface indicates SNPs most likely to be relevant for kidney function based on the combined effects on eGFR and BUN; PP: posterior probability; CADD score: Combined Annotation Dependent Depletion (CADD) PHRED-like score (Methods); DHS: DNase hypersensitivity site.