Human genetic analyses of organelles highlight the nucleus, but not the mitochondrion, in age-related trait heritability

Most age-related human diseases are accompanied by a decline in cellular organelle integrity, including impaired lysosomal proteostasis and defective mitochondrial oxidative phosphorylation. An open question, however, is the degree to which inherited variation impacting each organelle contributes to age-related disease pathogenesis. Here, we evaluate if organelle-relevant loci confer greater-than-expected age-related disease risk. As mitochondrial dysfunction is a “hallmark” of aging, we begin by assessing nuclear and mitochondrial DNA loci relevant to mitochondria and surprisingly observe a lack of enrichment across 24 age-related traits. Within nine other organelles, we find no enrichment with one exception: the nucleus, where enrichment emanates from nuclear transcription factors. In agreement, we find that genes encoding several organelles tend to be “haplosufficient,” while we observe strong purifying selection against protein-truncating variants impacting the nucleus. Our work identifies common variation near transcription factors as having outsize influence on age-related trait risk, motivating future efforts to determine if and how this variation contributes to age-related organelle deterioration.


The global burden of age-related diseases such as type 2 diabetes (T2D), Parkinson's disease (PD), and cardiovascular disease (CVD) has been steadily rising due in part to a progressively aging population. These diseases are often highly heritable [1]. Genome-wide association studies (GWAS) have led to the discovery of thousands of robust associations with common genetic variants [2], implicating a complex genetic architecture as underlying much of the heritable risk. These loci hold the potential to reveal underlying mechanisms of disease and spotlight targetable pathways.

Aging has been associated with dysfunction in many cellular organelles [3]. Dysregulation of autophagic proteostasis, for which the lysosome is central, has been implicated in myriad age-related disorders including neurodegeneration, heart disease, and aging itself [4], and mouse models deficient for autophagy in the central nervous system show neurodegeneration [5,6]. Endoplasmic reticular (ER) stress has been invoked as central to metabolic syndrome and insulin resistance in type 2 diabetes [7]. Disruption in the nucleus through increased gene regulatory noise from epigenetic alterations [3] and elevated nuclear envelope "leakiness" [8] has been implicated in aging. Dysfunction in the mitochondria has even been invoked as a "hallmark" of aging [3] and has been nominated as a driver of virtually all common age-associated diseases. In particular, deficits in mitochondrial oxidative phosphorylation (OXPHOS) have been observed in aging and age-related diseases as evidenced by in vivo 31P-NMR measures [9,10], enzymatic activity [11-17] in biopsy material, accumulation of somatic mitochondrial DNA (mtDNA) mutations [18-20], and a decline in mtDNA copy number (mtCN) [21].
Given that a decline in organelle function is observed in age-related disease, a natural question is whether inherited variation in loci relevant for organelles is enriched for age-related disease risk. In the present study, we use a human genetics approach to assess common variation in loci relevant to the function of 10 cellular organelles. We begin with a deliberate focus on mitochondria given the depth of literature linking them to age-related disease. As mitochondria-localizing protein products from ~1100 nuclear DNA (nucDNA)-encoded genes [22] and 13 mtDNA-encoded genes are critical for proper OXPHOS homeostasis [23], we test both nucDNA and mtDNA loci relevant for mitochondrial function in 24 different age-related diseases and traits. We hypothesized that heritability for common, age-related traits would be overrepresented among mitochondria-relevant loci, namely variants near genes encoding the organelle's proteome or loci associated with quantitative readouts of mitochondrial function.

To our surprise, we find no evidence of enrichment for genome-wide association signal in mitochondria-relevant loci across any of our analyses. Further, of ten tested organelles, only the nucleus shows enrichment among many age-associated traits, with the signal emanating from the transcription factors. Further analysis shows that genes encoding the mitochondrial proteome tend to be tolerant to heterozygous predicted loss-of-function (pLoF) variation and thus are surprisingly "haplosufficient," whereas nuclear transcription factors are especially sensitive to gene dosage and are often "haploinsufficient". Thus, we highlight variation influencing gene-regulatory pathways, rather than organelle physiology, in the inherited risk of common age-associated diseases.

No evidence for enrichment of age-related trait heritability in mitochondria-relevant loci
To test if age-related trait heritability was enriched among mitochondria-relevant loci, we began by simply asking if ~1100 nucDNA genes encoding the mitochondrial proteome from the MitoCarta2.0 inventory [22] were found near lead SNPs for our selected traits represented in the NHGRI-EBI GWAS Catalog (https://www.ebi.ac.uk/gwas/) [39] more frequently than expectation (Methods, Supplementary note). To our surprise, no traits showed a statistically significant enrichment of mitochondrial genes (Figure S1A); in fact, six traits showed a statistically significant depletion. Even more strikingly, MitoCarta genes tended to be nominally enriched in fewer traits than the average randomly selected sample of protein-coding genes (Figure S1B, empirical p = 0.014). This lack of enrichment was observed more broadly across virtually all traits represented in the GWAS Catalog (Figure S1C). We also tested several transcriptional regulators of mitochondrial biogenesis and function: TFAM, GABPA, GABPB1, ESRRA, YY1, NRF1, PPARGC1A, and PPARGC1B. We found little evidence supporting a role for these genes in modifying risk for the age-related GWAS Catalog phenotypes, observing only a single trait (heel bone mineral density) for which a mitochondrial transcriptional regulator (TFAM) was nearest an associated genome-wide significant variant (Supplementary note).

To investigate further, we turned to U.K. Biobank (UKB). We compiled and tested three classes of "mitochondria-relevant loci" (Figure 2A) with which we interrogated the association between common mitochondrial variation and common disease. First, we curated literature-reported nucDNA quantitative trait loci (QTLs) associated with measures of mitochondrial function (Table S3): mtCN [40,41], mtRNA

Figure 1.
Selection of genetically diverse age-related diseases and traits using epidemiological data. A. Period prevalence of age-associated diseases systematically selected for this study (Methods). Epidemiological data obtained from Kuan et al. 2019. B. Genetic correlation between the selected age-related traits. All correlations were assessed between UK Biobank phenotypes with the exception of eGFR, Alzheimer's Disease, and Parkinson's Disease, for which the respective meta-analyses were used (Methods). Point estimates and standard errors reported in Table S2.

Figure 2. Assessment of the association of nucDNA and mtDNA mitochondria-relevant loci to age-related traits. A. Scheme outlining the aspects of mitochondrial function assessed in this study. nucDNA loci relevant to mitochondrial function are shown in teal, while mtDNA loci are shown in pink. B. Enrichment results for the overlap between loci associated with mtDNA copy number, mtRNA abundance/modification, and OXPHOS biomarkers and loci significantly associated with age-related disease in UKB. Inset number represents the number of tested SNPs; numbers adjacent to bars represent the absolute number of mitochondria-relevant loci overlapping the respective age-related disease. Dotted line represents Bonferroni cutoff for p = 0.1; BH FDR 0.1 threshold cannot be visualized as no tests pass the cutoff (Supplementary note). C. S-LDSC enrichment p-values on top of the baseline model in UKB. Inset labels represent gene-set size; dotted line represents BH FDR 0.1 threshold. D. Visualization of mtDNA variants and associations with age-related diseases. The outer-most track represents the genetic architecture of the circular mtDNA. The heatmap track represents the number of individuals with alternate genotype on log scale. The inner track represents mitochondrial genome-wide association p-values, with radial angle corresponding to position on the mtDNA and magnitude representing -log10 p-value.
Dotted line represents Bonferroni cutoff for all tested trait-variant pairs. E. Replication of S-LDSC enrichment results in meta-analyses. Dotted line represents BH FDR 0.1 threshold. * represent traits for which sufficiently well powered cohorts from both UKB and meta-analyses were available. The trait color legend to the right of panel D applies to panels B, C, and D, representing UKB traits.
First, we tested if published QTLs for mtCN, mtRNA abundance, and OXPHOS biomarkers (Table S3, S4) were enriched for an overlap with genome-wide significant loci for each of our age-related traits in UKB (Methods, Figure S2). We observed no evidence of enrichment among QTLs available in the literature (Figure 2B, Supplementary note; all q > 0.1).

Second, we used S-LDSC [36,50] and MAGMA (https://ctg.cncr.nl/software/magma) [51], two robust methods that can be used to assess gene-based heritability enrichment accounting for LD and several confounders, to test if there was any evidence of heritability enrichment among MitoCarta genes (Methods). We found no evidence of enrichment near nucDNA MitoCarta genes for any trait tested in UKB using S-LDSC (Figure 2C, S8A), consistent with our results from the GWAS Catalog. We replicated this lack of enrichment using MAGMA at two different window sizes (Figure S8C, S8E; all q > 0.1).

Given the lack of enrichment among the MitoCarta genes, we wanted to (1) verify that our selected methods could detect previously reported enrichments and (2) confirm that common variation in or near MitoCarta genes can lead to expression-level perturbations. We first successfully replicated previously reported enrichment among tissue-specific genes for key traits using both S-LDSC (Figure S3, S4) and MAGMA (Figure S5, S6, Supplementary note, Methods). We next confirmed that we had sufficient power using both S-LDSC and MAGMA to detect physiologically relevant enrichment effect sizes among MitoCarta genes (Figure S7, Methods, Supplementary note). We finally examined the landscape of cis-expression QTLs (eQTLs) for these genes and found that almost all MitoCarta genes have cis-eQTLs in at least one tissue and often have cis-eQTLs in more tissues than most protein-coding genes (Figure S9, Methods, Supplementary note).
Hence, our selected methods could detect physiologically relevant heritability enrichments among our selected traits at gene-set sizes comparable to that of MitoCarta, and common variants in or near MitoCarta genes exerted cis-control on gene expression.

Third, we considered mtDNA loci genotyped in UKB, obtaining calls for up to 213 common variants passing quality control across 360,662 individuals (Methods, Supplementary note). We found no significant associations on the mtDNA for any of the 21 age-related traits available in UKB using linear or logistic regression (Methods, Figure 2D, S9).

As a control and to validate our approach, we also performed mtDNA-GWAS for specific traits with previously reported associations. A recent analysis of ~147,437 individuals in BioBank Japan revealed four distinct traits with significant mtDNA associations [52]. Of these, creatinine and aspartate aminotransferase (AST) had sufficiently large sample sizes in UKB. We observed a large number of associations throughout the mtDNA for both traits (p < 1.15 * 10 !" , Figure S9E). Thus, our mtDNA association method was able to replicate robust mtDNA associations among well-powered traits.

Finally, we sought to replicate our negative results in an independent cohort. We turned to published GWAS meta-analyses [26-35] (Table S1) and successfully replicated the lack of enrichment for MitoCarta genes across all 10 traits with an available independent cohort GWAS using S-LDSC (Figure 2E, S8B) and MAGMA at two different window sizes (Figure S8D, Supplementary note; all q > 0.1). Importantly, while we were unable to pursue analyses for PD and Alzheimer's disease in UKB due to limited case counts, we tested MitoCarta genes among well-powered meta-analyses for these disorders (Supplementary note) and observed no enrichment (Figure 2E; all q > 0.1).
In summary, we tested (1) QTLs for mitochondrial physiology in UKB, (2) nucDNA loci near genes that encode the mitochondrial proteome in the GWAS Catalog, UKB, and GWAS meta-analyses, (3) mtDNA variants in UKB, and (4) known transcriptional regulators of mitochondrial biogenesis and function in the GWAS Catalog. We found no convincing evidence of heritability enrichment for common age-associated diseases among these mitochondria-relevant loci (Table S8).

Enrichment of age-related trait heritability near genes encoding nuclear transcription factors
We next asked whether heritability for age-related diseases and traits clusters among loci associated with any cellular organelle. We used the COMPARTMENTS database (https://compartments.jensenlab.org) to define gene-sets corresponding to the proteomes of nine additional organelles [53] besides mitochondria (Methods). We used S-LDSC to produce heritability estimates for these categories in the UKB age-related disease traits, finding evidence of heritability enrichment in many traits for genes comprising the nuclear proteome (Figure 3A, Methods). No other tested organelles showed evidence of heritability enrichment. Variation in or near genes comprising the nuclear proteome explained over 50% of disease heritability on average despite representing only ~35% of tested SNPs (Figure S10, Supplementary note). We successfully replicated this pattern of heritability enrichment among organelles using MAGMA in UKB at two window sizes (Figure S13A, S13B), again finding only enrichment among genes related to the nucleus.

With over 6,000 genes comprising the nuclear proteome, we considered largely disjoint subsets of the organelle's proteome to trace the source of the enrichment signal [54-56] (Figure 3B, Methods, Supplementary note). We found significant heritability enrichment within the set of 1,804 genes whose protein products are annotated to localize to the chromosome itself (q < 0.1 for 9 traits, Figure 3C, S12).

Further partitioning revealed that much of this signal is attributable to the subset classified as transcription factors [56] (1,523 genes, q < 0.1 for 10 traits, Figure 3D, S12). We replicated these results using MAGMA in UKB at two window sizes (Figure S13), and also replicated enrichments among TFs in several (but not all) corresponding meta-analyses (Figure S14) despite reduced power (Figure S7H). We generated functional subdivisions of the TFs (Methods, Supplementary note), finding that the non-zinc finger TFs showed enrichment for a highly similar set of traits to those enriched for the whole set of TFs (Figure S15D, S16B, S17B, S18B). Interestingly, the KRAB domain-containing zinc fingers (KRAB ZFs) [57], which are recently evolved (Figure S15H), were largely devoid of enrichment even compared to non-KRAB ZFs (Figure S15E, S16C, S17C, S18C). Thus, we find that variation within or near non-KRAB domain-containing transcription factor genes has an outsize influence on age-associated disease heritability (Table S8).

Figure 3. Heritability enrichment of organellar proteomes across age-related disease in UK Biobank. A. Quantile-quantile plot of heritability enrichment p-values atop the baseline model for gene-sets representing organellar proteomes, with black line representing expected null p-values following the uniform distribution and shaded ribbon representing 95% CI. B. Scheme of spatially distinct disjoint subsets of the nuclear proteome as a strategy to characterize observed enrichment of the nuclear proteome. Numbers represent gene-set size. C. S-LDSC enrichment p-values for spatial subsets of the nuclear proteome computed atop the baseline model. D. S-LDSC enrichment p-values for TFs and all other nucleus-localizing proteins. Inset numbers represent gene-set sizes, black lines represent cutoff at BH FDR < 10%. * represent traits for which sufficiently well powered cohorts from both UKB and meta-analyses were available.

Mitochondrial genes tend to be more "haplosufficient" than genes encoding other organelles
In light of observing heritability enrichment only among nuclear transcription factors, we wanted to determine if the fitness cost of pLoF variation in genes across cellular organelles mirrored our results. Mitochondria-localizing genes and TFs play a central role in numerous Mendelian diseases [49,58-60], so we initially hypothesized that genes belonging to either category would be under significant purifying selection (i.e., constraint). We obtained constraint metrics from gnomAD (https://gnomad.broadinstitute.org) [61] as the LoF observed/expected upper bound fraction (LOEUF). In agreement with our GWAS enrichment results, we observed that the mitochondrion on average is one of the least constrained organelles we tested, in stark contrast to the nucleus (Figure 4A). In fact, the nucleus was second only to the set of "haploinsufficient" genes (defined based on curated human clinical genetic data [61], Methods) in the proportion of its genes in the most constrained decile, while the mitochondrion lay on the opposite end of the spectrum (Figure 4B). Interestingly, even the Mendelian mitochondrial disease genes had a high tolerance to pLoF variation on average in comparison to TFs (Figure 4C, S19A). Even across different categories of TFs, we observed that highly constrained TF subsets tend to show GWAS enrichment (Figure S19B, S15E) relative to unconstrained subsets for our tested traits. Indeed, explicit inclusion of LOEUF as a covariate in the enrichment analysis model (Methods) reduced the significance of (but did not eliminate) the enrichment seen for the TFs (Figure S20B, S21B, S20E, S20F). Thus, while disruption in both mitochondrial genes and TFs can produce rare disease, the fitness cost of heterozygous variation in mitochondrial genes appears to be far lower than that among the TFs. This dichotomy reflects the contrasting enrichment results between the mitochondrial genes and the TFs and supports the importance of gene regulation as it relates to evolutionary conservation.

Figure 4. Differences in constraint distribution across organelles. A. Constraint as measured by LOEUF from gnomAD v2.1.1 for genes comprising organellar proteomes, book-ended by distributions for known haploinsufficient genes as well as olfactory receptors. Lower values indicate genes exacting a greater organismal fitness cost from a heterozygous LoF variant (greater constraint). B. Proportion of each gene-set found in the lowest LOEUF decile. Higher values indicate gene-sets containing more highly constrained genes. C. Constraint distributions for subsets of the nuclear-encoded mitochondrial proteome (red) and subsets of the nucleus (teal). Black points represent the mean with 95% CI. Inset numbers represent gene-set size.

Discussion
Pathology in cellular organelles has been widely documented in age-related diseases [3,7,62-65]. Using a human genetics approach, here we report the unexpected discovery that except for the nucleus, cellular organelles tend not to be enriched in genetic associations for common, age-related diseases.
We started with a focus on the mitochondria as a decline in mitochondrial abundance and activity has long been reported as one of the most consistent correlates of aging [9,14,19,20] and age-associated diseases [10-13,15-18,21]. We tested mitochondria-relevant common variants on the nucDNA and mtDNA and found no convincing evidence of heritability enrichment in any tested trait, cohort, or method. We systematically expanded our analysis to survey 10 organelles and found that only the nucleus showed enrichment, with much of this signal originating from nuclear transcription factors. Constraint analysis showed a substantial fitness cost to heterozygous loss-of-function mutation in genes encoding the nuclear proteome, whereas genes encoding the mitochondrial proteome were "haplosufficient."

For highly polygenic and well-powered traits, any large fraction of the genome may explain a statistically significant amount of disease heritability [66,67]. Indeed, individual associations between mitochondria-relevant loci and certain common diseases have been identified previously [68,69]. As associations have also been identified among loci relevant for other organelles, enrichment analyses can place these complex genetic architectures in a broader biological context and prioritize pathways for follow-up. Importantly, both MAGMA and S-LDSC are capable of detecting an enrichment even in a highly polygenic background. Both methods have been used in the past to identify biologically plausible disease-relevant tissues [36,50] and pathway enrichments [70,71] in traits across the spectrum of polygenicity, and we identify enrichments among disease-relevant tissues using both methods in several highly polygenic traits.
While previous work has shown that common disease GWAS can be enriched for expression in specific disease-relevant organs [50,72], our data suggest that this framework does not generally extend to organelles. This finding contrasts with our classical nosology of inborn errors of metabolism that tend to be mapped to "causal" organelles, e.g., lysosomal storage diseases, disorders of peroxisomal biogenesis, and mitochondrial OXPHOS disorders. The observed enrichment for transcription factors within the nucleus indicates that common variation influencing genome regulation impacts common disease risk more than variation influencing individual organelles.

Our analysis of common inherited mitochondrial variation represents, to our knowledge, the most comprehensive assessment of mitochondria-relevant nucDNA and mtDNA variation in age-related diseases. We replicated mtDNA associations with creatinine and AST observed previously in BioBank Japan [52], further supporting our approach. While individual mtDNA variants have been previously associated with certain traits [73-75], these associations appear to be conflicting in the literature, perhaps because of limited power and/or uncontrolled confounding biases such as population stratification [76,77]. Our negative results are surprising, but they are not inconsistent with a small number of isolated reports interrogating either mitochondria-relevant nucDNA [78] or mtDNA [52,79-81] loci in select diseases.

To our knowledge, we are the first to systematically document heterogeneity in average pLoF tolerance across cellular organelles. That MitoCarta genes are "haplosufficient" and pLoF tolerant (Figure 4A) is consistent with the observation that most of the ~300 inborn mitochondrial disease genes produce disease with recessive inheritance [49] and healthy parents.
The few mitochondrial disorders that show dominant inheritance are nearly always due to dominant negativity rather than haploinsufficiency. The intolerance of TFs to pLoF variation (Figure 4A) provides a stark contrast to the results from the mitochondria that is borne out in their associated Mendelian disease syndromes: TFs are known to be haploinsufficient [82] and even regulatory variants modulating their expression can produce severe Mendelian disease [83]. We observe heritability enrichment among TFs for 10 different diseases, consistent with observed elevated purifying selection against pLoF variants in these genes. Our enrichment results combined with pLoF intolerance suggest that variation among TFs may produce disease-associated variants with larger effect sizes than expected, underscoring their importance as genetic "levers" for common disease heritability.

Why are mitochondria so robust to variation in gene dosage and hence "haplosufficient"? We propose three possibilities. First, one possibility is pathway redundancy. For example, in cell culture, defective OXPHOS can be supported thanks to the action of non-mitochondrial pathways such as cytosolic glycolysis and nucleotide salvage as long as key environmental nutrients are provided [84]. Second, mitochondrial pathways tend to be highly interconnected, and it was already proposed by Wright [85] and later by Kacser and Burns [86] that haplosufficiency arises as a consequence of physiology, i.e., network organization of metabolic reactions. Kacser and Burns in fact explicitly mention that noncatalytic gene products fall outside their framework, and we believe that our finding that nucleus-localizing and cytoskeletal genes are the two most pLoF-intolerant compartments is consistent with their assessment.
Finally, it is crucial not to confuse our results with previously reported associations between somatic mtDNA mutations and age-associated disease [18-20]; the present work is focused on germline variation. We emphasize that our study does not formally address the causality of mitochondrial dysfunction in common age-related disease. Rather, we have tested if common variants in mitochondrial pathways tend to explain a disproportionate amount of age-related disease heritability. The observed lack of heritability enrichment in mitochondrial pathways does not preclude the possibility of a therapeutic benefit in targeting the mitochondrion for age-related disease. For example, mitochondrial dysfunction is documented in brain or heart infarcts following blood vessel occlusion in laboratory-based models [99,100]. Though mitochondrial variants do not influence infarct risk in this laboratory model, pharmacological blockade of the mitochondrial permeability transition pore can mitigate reperfusion injury and infarct size [101]. Future studies will be required to determine if and how the mitochondrial dysfunction associated with common age-associated diseases can be targeted for therapeutic benefit.

Our finding that the nucleus is the only organelle that shows enrichment for common age-associated trait heritability builds on prior work implicating nuclear processes in aging. Most human progeroid syndromes result from monogenic defects in nuclear components [102] (e.g., LMNA in Hutchinson-Gilford progeria syndrome, TERC in dyskeratosis congenita), and telomere length has long been observed as a marker of aging [103]. Heritability enrichment of age-related traits among gene regulators is consistent with the epigenetic dysregulation [104] and elevated transcriptional noise [3,105] observed in aging (e.g., SIRT6 modulation influences mouse longevity [106] and metabolic syndrome [63]).
An important role for gene regulation in common age-related disease is in agreement with both the observation that a very large fraction of common disease-associated loci corresponds to the non-coding genome and the enrichment of disease heritability in histone marks and transcription factor binding sites [36,107]. Given that a deterioration in several other cellular organelles has been linked to age-related traits, a future challenge lies in elucidating the connection between variation influencing transcription factors and organelle dysfunction in age-related disease.

Acknowledgements
We thank D. Genetic correlation point estimates and standard errors plotted in Figure 1B are available in Table S2. Summary statistics from mtDNA-GWAS are available in Table S6. All gene-based enrichment analysis p-values and point estimates are available in Table S8. Literature-reported loci associated with biomarkers of mitochondrial function after clumping and QC are available in

Materials and Methods
Trait selection:
Sex-standardized period prevalence of over 300 diseases was obtained from an extensive survey of the National Health Service in the UK as reported previously [24]. To select high prevalence late-onset diseases, we ranked diseases with a median onset over 50 years of age by the sum of the period prevalence of all age categories above 50. We selected the top 30 diseases using this metric and manually mapped these traits to similar or equivalent phenotypes with publicly available summary statistics from UKB and/or well-powered meta-analyses (e.g., Parkinson's Disease and Alzheimer's Disease for dementia) resulting in 24 traits with data available in UKB, meta-analyses, or both (Table S1).
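
The ranking step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the record layout (`median_onset`, `prevalence_by_age`), the function name `select_late_onset`, and the toy numbers are all hypothetical, and age categories are assumed to be keyed by their lower bound so that "above 50" means a lower bound of at least 50.

```python
# Hypothetical sketch: rank late-onset diseases by summed period prevalence.
# The record layout and the toy values below are illustrative assumptions.

def select_late_onset(diseases, age_cutoff=50, top_n=30):
    """Return the names of the top_n diseases with median onset above
    age_cutoff, ranked by total period prevalence in age bands whose
    lower bound is at or above age_cutoff."""
    scored = []
    for name, record in diseases.items():
        if record["median_onset"] <= age_cutoff:
            continue  # not a late-onset disease
        score = sum(
            prevalence
            for band_start, prevalence in record["prevalence_by_age"].items()
            if band_start >= age_cutoff
        )
        scored.append((score, name))
    # Highest summed prevalence first; ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_n]]


# Toy input: age-band lower bound -> period prevalence (made-up numbers).
toy = {
    "disease_a": {"median_onset": 67, "prevalence_by_age": {40: 0.01, 50: 0.05, 60: 0.10}},
    "disease_b": {"median_onset": 72, "prevalence_by_age": {40: 0.00, 50: 0.02, 60: 0.20}},
    "disease_c": {"median_onset": 31, "prevalence_by_age": {40: 0.30, 50: 0.30, 60: 0.30}},
}
ranked = select_late_onset(toy, top_n=2)
print(ranked)
```

In this toy input, `disease_c` is excluded by its early median onset even though its prevalence is high, which is the intended effect of filtering on onset before ranking.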

Criteria for inclusion of summary statistics:
We manually mapped selected age-related diseases and traits to corresponding phenotypes in UKB. In parallel, we searched the literature to identify well-powered EUR-predominant GWAS (referred to as meta-analyses) that (1)  showed heritability Z-score > 4 within meta-analyses but not in UKB (Table S1). P-values for genetic correlation represented deviation from the null hypothesis $r_g = 0$. Traits were ordered by their contribution to the first eigenvector of the absolute value of the correlation matrix, with point estimates and standard errors available in Table S2. Bonferroni correction was applied producing a p-value cutoff of $0.05 / \binom{24}{2} = 1.81 \times 10^{-4}$.

Assessment of mitochondria-localizing genes in the GWAS Catalog:
We mapped variants in the GWAS Catalog (obtained on September 5th, 2019, https://www.ebi.ac.uk/gwas/) meeting genome-wide significance (p < 5e-8) to genes using provided annotations, producing a set of trait-associated genes for each trait. We manually selected phenotypes represented in the GWAS Catalog matching our set of age-associated traits with over 30 annotated trait-associated genes. For each trait, we computed the proportion of trait-associated genes that were mitochondria-localizing (defined via MitoCarta2.0 [22]) and tested for enrichment or depletion relative to overall genome background using two-sided Fisher's exact tests correcting for multiple hypothesis tests with the Benjamini-Hochberg (BH) procedure at FDR q-value < 0.1.

We also computed the test statistic $T_g^{\text{enrich}}$, defined as the number of age-associated traits showing a nominal (not necessarily statistically significant) enrichment for a given gene-set $g$, for the MitoCarta genes. We then generated an empirical null distribution for $T_g^{\text{enrich}}$. We drew 1,000 random samples of protein-coding genes, where each sample contained the same number of genes as the set of mitochondria-localizing genes, and computed $T_g^{\text{enrich}}$ for each of these gene-sets (Figure S1B). The one-sided p-value, defined as $\Pr(T_g^{\text{enrich}} \le t)$ under the null, where $t$ is the observed value, was subsequently obtained.

We expanded our enrichment/depletion analysis to all 332 traits in the GWAS Catalog with over 30 trait-associated genes; for enrichment or depletion testing, we used two-sided Fisher's exact tests and corrected for multiple hypothesis testing with the BH procedure at FDR q-value < 0.1.
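
As an illustration of the tests described in this section, the following is a minimal, dependency-free Python sketch of a two-sided Fisher's exact test on a 2x2 table, Benjamini-Hochberg adjustment, and an empirical one-sided p-value computed from a set of null draws. The function names and all counts are hypothetical; the software actually used for these steps is not specified in this section, and a production analysis would more likely rely on an established statistics library.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].
    Sums the probabilities of all tables (with the same margins) that are
    no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    tail = sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))
    return min(1.0, tail)

def benjamini_hochberg(pvalues):
    """Return BH-adjusted q-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from largest to smallest p-value
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues

def empirical_p_lower(observed, null_draws):
    """Fraction of null draws at or below the observed statistic."""
    return sum(1 for t in null_draws if t <= observed) / len(null_draws)

# Hypothetical counts per trait (made up for illustration):
# (mito & trait-associated, non-mito & trait-associated,
#  mito & not associated,   non-mito & not associated)
toy_tables = {
    "trait_1": (2, 98, 1098, 18802),
    "trait_2": (6, 94, 1094, 18806),
}
pvals = [fisher_exact_two_sided(*t) for t in toy_tables.values()]
qvals = benjamini_hochberg(pvals)
significant = [name for name, q in zip(toy_tables, qvals) if q < 0.1]
```

The empirical p-value here is the plain proportion of null draws at or below the observed statistic, matching the definition in the text; some analyses add one to the numerator and denominator to avoid reporting a p-value of exactly zero.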

Enrichment analysis of literature-curated mitochondria-associated phenotypes:
We reviewed the literature for quantitative trait loci (QTLs) for mtDNA copy number (
drew variants at random 2500 times matching on LD score, in-sample MAF, and distance to transcription start site (where the distance metric was set to 0 if the variant was located within a gene boundary). LD scores per variant were generated per-chromosome with a 1 cM window using the 1000G EUR reference panel. The overlap statistic $N^{\mathrm{overlap}}$ was then computed for each category for each set of randomly selected variants, generating a category-specific empirical null distribution for the statistic (Figure S2). The one-sided p-value, defined as $\Pr(N^{\mathrm{overlap}} \ge 1)$ under the null, was subsequently obtained. To correct for multiple hypothesis testing, we applied the BH procedure with FDR < 0.1 and also applied a Bonferroni threshold of p =

Tissue-expressed gene-set enrichment analysis:
To obtain the set of genes most expressed in a given tissue versus others, we obtained t-statistics computed from GTEx v6 gene-level transcript-per-million (TPM) data corrected for age and sex as published previously 50. For each tissue, we selected the top 2485 genes (10%) with the highest t-statistics for tissue-specific expression, producing tissue-expressed gene-sets. We selected nine tissues based on expectation of enrichment for our tested traits in UKB (e.g., liver for LDL levels, esophageal mucosa for GERD). We used both S-LDSC and MAGMA to test for enrichment in the usual way (Methods), controlling for the set of tissue-expressed genes to ensure a competitive analysis (Supplementary note). Tissue-expressed gene-set analyses were performed on meta-analyses with S-LDSC and MAGMA on the same tissues using the same parameters as used in UKB.
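The construction of a tissue-expressed gene-set from per-gene t-statistics, as described above, can be sketched as follows. This is a minimal illustration: the function name, the dictionary input, the tie-breaking by gene name, and the toy values are assumptions rather than details from the study.

```python
def top_fraction_genes(t_stats, fraction=0.10):
    """Return the genes with the highest t-statistics for one tissue.

    t_stats: dict mapping gene identifier -> tissue-specificity t-statistic.
    fraction: share of genes to keep (10% in the text above).
    Ties are broken by gene name so the result is deterministic (assumption).
    """
    n_top = int(round(len(t_stats) * fraction))
    ranked = sorted(t_stats, key=lambda gene: (-t_stats[gene], gene))
    return set(ranked[:n_top])


# Toy input: 20 hypothetical genes with increasing t-statistics.
toy_t_stats = {f"GENE{i}": float(i) for i in range(20)}
toy_gene_set = top_fraction_genes(toy_t_stats)
```

Applied to each selected tissue in turn, a function like this yields one gene-set per tissue, which could then be passed to the enrichment tools.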
Power analysis:
To test for the effects of gene-set size on power, we selected ten positive control tissue-trait pairs based on (1) the presence of tissue enrichment in UKB with S-LDSC and MAGMA and (2) whether the observed enrichment was biologically plausible. The pairs tested were liver-HDL, liver-LDL, liver-TG, liver-cholesterol, pancreas-glucose, pancreas-type 2 diabetes, atrial appendage-atrial fibrillation, sigmoid colon-diverticular disease, coronary artery-myocardial infarction, and visceral adipose-HDL. We then, in brief, used an empirical sampling-based approach, generating random subsamples of a selected set of tissue-expressed gene-sets at four different gene-set sizes (1523, 1105, 800, and 350 genes), defining power as the proportion of trials showing a significant enrichment (Supplementary note). We used the same sub-sampled gene-sets for enrichment analysis using both S-LDSC and MAGMA in the usual way (Methods), controlling for the set of tissue-expressed genes to ensure a competitive analysis (Supplementary note). We used the same gene-sets among the subset of the positive control traits that showed enrichment in the corresponding meta-analysis to verify power for the meta-analyses (Supplementary note).
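The sampling-based definition of power given above can be illustrated with a short sketch. The number of trials, the significance threshold, and the `test_fn` callable (a stand-in for an S-LDSC or MAGMA run that returns an enrichment p-value) are hypothetical placeholders; the actual settings are described in the Supplementary note.

```python
import random


def estimate_power(gene_set, size, test_fn, n_trials=100, alpha=0.05, seed=0):
    """Estimate power as the proportion of random subsamples that are significant.

    gene_set: iterable of gene identifiers (a tissue-expressed gene-set).
    size:     number of genes drawn without replacement in each trial.
    test_fn:  hypothetical callable mapping a list of genes to a p-value.
    n_trials, alpha, seed: illustrative defaults, not values from the study.
    """
    rng = random.Random(seed)
    genes = sorted(gene_set)  # fixed order so the seed gives reproducible draws
    hits = 0
    for _ in range(n_trials):
        subset = rng.sample(genes, size)
        if test_fn(subset) < alpha:
            hits += 1
    return hits / n_trials


# Toy usage with stand-in tests that always or never reach significance.
toy_genes = {f"GENE{i}" for i in range(2485)}
power_by_size = {
    size: estimate_power(toy_genes, size, test_fn=lambda subset: 0.01)
    for size in (1523, 1105, 800, 350)
}
```

Passing the enrichment test in as a callable keeps the subsampling logic separate from whichever tool performs the test.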

Cross-tissue eQTL analysis:
We obtained the set of eGenes from GTEx v8 across 49 tissues (https://www.gtexportal.org), filtering to only include cis-eQTLs with q-value < 0.05. To determine how the landscape of cis-eQTLs for MitoCarta genes compared to other protein-coding genes, we regressed the number of tissues with a detected cis-eQTL for a given gene x, $N_x^{\mathrm{eQTL}}$, onto an indicator for membership in a given organellar proteome ( 7
removing lowly-expressed genes with maximal cross-tissue TPM < 1, defined as:
where $x_i$ is the expression of gene $x$ in tissue $i$ with $n$ tissues. $\tau$ ranges from 0 to 1, with lower $\tau$ indicating a broadly expressed gene and higher $\tau$ indicating more tissue-specific expression patterns. Because GTEx sampled multiple tissue subtypes (e.g., brain sub-regions) that show correlated expression profiles 110 which bias $\tau_x$, $N_x^{\mathrm{eQTL}}$, and $N_x^{\mathrm{express}}$ upward, for each broader tissue class (brain, heart, artery, esophagus, skin, cervix, colon, adipose) we selected a single representative tissue when computing these quantities (Figure S14B, Supplementary note)
(https://github.com/Nealelab/UK_Biobank_GWAS). We also used Hail to run Firth logistic regression with the same covariates for case/control traits (Table S1). As we observed that some mitochondrial DNA variants were specific to array type, we also ran linear regression including array type as a covariate; we did not perform logistic regression with array type as a covariate due to convergence issues secondary to complete separation of variants assessed on only one array type. We defined mtDNA-wide significance using a Bonferroni correction by N =
and experimental data from the Human Protein Atlas.
We used this resource to obtain the degree of evidence (a number ranging from 0 to 5) linking each gene to localization to one of 12 organelles: nucleus, cytosol, cytoskeleton, peroxisome, lysosome, endoplasmic reticulum, Golgi apparatus, plasma membrane, endosome, extracellular space, mitochondrion, and proteasome. To avoid noisy localization assignments due to weak text mining and prediction evidence, we only considered localization assignments with a score > 2 as described previously 53. We subsequently assigned compartment(s) to each gene by selecting the compartment(s) with the maximal score within each gene. We only included compartments containing over 240 genes due to limited power at smaller gene-set sizes and used MitoCarta2.0 22 to obtain a higher confidence set of genes localizing to the mitochondrion, resulting in gene-sets representing the proteomes of 10 organelles. S-LDSC and MAGMA were used to test for enrichment across the UKB age-related traits for these gene-sets in the usual way, controlling for the set of protein-coding genes. S-LDSC was also used to obtain estimates of the percentage of heritability explained by each organelle gene-set.

Enrichment analysis of spatial components of the nucleus:
To produce interpretable sub-divisions of the nucleus, we used Gene Ontology (GO) 54,55 to identify terms listed as children of the nucleus cellular component (GO:0005634). We used Ensembl version 99 112 to obtain a first pass set of genes annotated to each sub-compartment of the nucleus (or its children). After manual review of sub-compartments with > 90 genes, we selected nucleoplasm (GO:0005654), nuclear chromosome (GO:0000228), nucleolus (GO:0005730), nuclear envelope (GO:0005635), spliceosomal complex (GO:0005681), nuclear DNA-directed RNA polymerase complex (GO:0055029), and nuclear pore (GO:0005643).
We excluded terms listed as 'part' due to poor interpretability and manually excluded similar terms (e.g., nuclear lumen vs nucleoplasm). To generate a high confidence set of genes localizing to each of these selected sub-compartments, we then turned to the COMPARTMENTS resource, which assigns localization confidence scores for each protein to GO cellular component terms. We assigned members of the nuclear proteome to these selected nuclear sub-compartments using the same approach outlined for the organelle analysis (Methods). After filtering our selected sub-compartments to those containing > 240 genes, we obtained four categories: nucleoplasm, nuclear chromosome, nucleolus, and nuclear envelope. The nuclear chromosome annotation largely overlapped with a manually curated high-quality list of transcription factors 56 but was not exhaustive; as such, we merged these lists to generate the Chromosome and TF category. To improve interpretability, we removed genes from nucleoplasm that were also assigned to another nuclear sub-compartment, constructed a list of other nucleus-localizing proteins not captured in these four sub-compartments, and included only genes annotated as localizing to the nucleus (Methods). S-LDSC and MAGMA were used to test for enrichment across the UKB age-related traits for these gene-sets in the usual way while controlling for the set of protein-coding genes (Methods).

Enrichment analysis of functionally distinct TF subsets:
We used a published curated high-quality list of TFs 56 to partition the Chromosome and TF category into transcription factors and other chromosomal proteins. To determine which TFs are broadly expressed versus tissue specific, we computed $\tau$ per TF across all selected tissues after removing lowly-expressed genes with maximal cross-tissue TPM < 1 (Methods, Supplementary note).
The threshold for tissue-specific genes was set at $\tau \ge 0.76$ based on the location of the central nadir of the resultant bimodal distribution (Figure S14A). To identify terciles of TFs by age, we obtained relative gene age assignments for each gene previously generated by obtaining the modal earliest ortholog level across several databases mapped to 19 ordered phylostrata 113. DNA binding domain (DBD) annotations for the TFs were obtained from previous manual curation efforts 56. S-LDSC and MAGMA were used to test for enrichment across the UKB age-related traits for these gene-sets in the usual way while controlling for the set of protein-coding genes (Methods). We also tested TFs for enrichment in meta-analyses using S-LDSC and MAGMA with the same parameters as for UKB traits (Supplementary note).
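The classification of TFs as broadly expressed or tissue specific can be sketched as below. The exact formula for the tissue-specificity index is not reproduced in this section, so the sketch assumes the commonly used index of Yanai et al., in which each tissue's expression is scaled by the gene's maximum across tissues; this matches the stated 0-to-1 range but remains an assumption. The function names and toy TPM values are hypothetical.

```python
def tau_index(expression):
    """Tissue-specificity index for one gene (assumed Yanai-style definition).

    expression: non-negative expression values (e.g., TPM), one per tissue.
    Returns 0 for uniform expression and 1 for expression in a single tissue.
    """
    n = len(expression)
    peak = max(expression)
    if n < 2 or peak <= 0:
        raise ValueError("need at least two tissues and non-zero expression")
    return sum(1 - x / peak for x in expression) / (n - 1)


def classify_tf(expression, threshold=0.76, min_tpm=1.0):
    """Label a TF using the cutoffs stated in the text.

    Genes whose maximal cross-tissue TPM is below min_tpm are removed (None).
    """
    if max(expression) < min_tpm:
        return None
    if tau_index(expression) >= threshold:
        return "tissue-specific"
    return "broadly expressed"


# Toy TPM profiles across four hypothetical tissues.
uniform_profile = [5.0, 5.0, 5.0, 5.0]
single_tissue_profile = [10.0, 0.0, 0.0, 0.0]
labels = [classify_tf(p) for p in (uniform_profile, single_tissue_profile)]
```

If the study applied a transformation to expression values before computing the index, the numerical values would differ from this sketch while the thresholding step would stay the same.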

Analysis of constraint across organelles and sub-organellar gene-sets:
We obtained gene-level gnomAD v2.1.1 constraint tables (https://gnomad.broadinstitute.org), haploinsufficient genes, and olfactory receptors 61 (https://github.com/macarthur-lab/gene_lists). Constraint values as loss-of-function observed/expected fraction (LOEUF) were mapped to genes within organelle, sub-mitochondrial, sub-nuclear, and TF binding domain gene-sets.

Enrichment analysis across age-related disease holding constraint as a covariate:
To test for enrichment with constraint as a covariate, we used MAGMA with UKB age-related traits. We mapped variants to genes and performed the gene-level analysis as done previously for the mitochondria-localizing gene and organelle analysis. We included LOEUF and log LOEUF as covariates for the gene-set analysis in addition to the default covariates (gene length, SNP density, inverse MAC, as well as the respective log-transformed versions) via the -condition-residualize flag.