Genetic aetiology of glycaemic traits: approaches and insights

Abstract Glycaemic traits such as fasting and post-challenge glucose and insulin measures, as well as glycated haemoglobin (HbA1c), are used to diagnose and monitor diabetes. These traits are risk factors for cardiovascular disease even below the diabetic threshold, and their study can additionally yield insights into the pathophysiology of type 2 diabetes. To date, a diverse set of genetic approaches has led to the discovery of over 97 loci influencing glycaemic traits. In this review, we focus on recent advances in the genetic aetiology of glycaemic traits and the resulting biological insights. We provide a brief overview of results from common, low-frequency and rare variant-trait association studies, studies leveraging the diversity across populations, and studies harnessing genetic and genomic approaches to gain insights into the biological underpinnings of these traits.


Introduction
Since their advent in 2005 (1), genome-wide association studies (GWAS) have been very successful at identifying common variant (minor allele frequency (MAF) ≥ 5%) trait associations, with over 30,000 unique associations described to date (2). The type 2 diabetes (T2D) field has been no exception, with the number of loci robustly associated with T2D risk rising from three [PPARG, KCNJ11 and TCF7L2 (3)(4)(5)] prior to the GWAS era, to 128 (6,7). Fasting and post-challenge glycaemic measures, and glycated haemoglobin (HbA1c), have also been the subject of intense genetic research, as they are used to diagnose and monitor T2D and are important risk factors for cardiovascular disease even within the non-diabetic range. For example, studies have found that patients diagnosed using either fasting glucose (FG) or 2-h glucose (2hG) have distinct cardiometabolic risk (8), with 2hG being a better predictor of cardiovascular mortality than FG (9). Similarly, HbA1c, which reflects average glycaemia over the 2-3 month lifespan of a red blood cell, is an accepted diagnostic test for diabetes (10) and also predicts future vascular complications (11). Furthermore, insulin resistance, commonly measured using the proxy phenotypes fasting insulin (FI) and insulin resistance by homeostasis model assessment [HOMA-IR (12)], is often associated with obesity or with limited peripheral adipose tissue capacity (13), and is an important risk factor for T2D. However, more sophisticated glycaemic measures such as the insulin suppression test or euglycemic clamp (considered the 'gold standard' estimate of peripheral insulin sensitivity) or proinsulin [adjusted for FI, equivalent to the proinsulin:insulin ratio, an indicator of beta-cell stress (14)], may, in combination with other glycaemic traits (FG, 2hG, HOMA-B and HbA1c), provide insights into diabetes pathophysiology and possible disease stratification.
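The two HOMA proxies mentioned above can be derived from fasting measurements alone. The following is a minimal sketch, assuming the commonly quoted original HOMA approximations (glucose in mmol/L, insulin in µU/mL). The function names are illustrative, and this review does not state which HOMA variant the individual cited studies used.

```python
def homa_ir(fasting_glucose_mmol_l: float, fasting_insulin_uu_ml: float) -> float:
    """Insulin resistance proxy: (FG x FI) / 22.5 (assumed original HOMA formula)."""
    return fasting_glucose_mmol_l * fasting_insulin_uu_ml / 22.5


def homa_b(fasting_glucose_mmol_l: float, fasting_insulin_uu_ml: float) -> float:
    """Beta-cell function proxy in %: 20 x FI / (FG - 3.5) (assumed original HOMA formula)."""
    if fasting_glucose_mmol_l <= 3.5:
        # The approximation has a pole at 3.5 mmol/L and is not meaningful below it.
        raise ValueError("HOMA-B is undefined for fasting glucose <= 3.5 mmol/L")
    return 20.0 * fasting_insulin_uu_ml / (fasting_glucose_mmol_l - 3.5)
```

Under these formulas, higher FI at a given FG raises both indices, whereas higher FG at a given FI raises HOMA-IR and lowers HOMA-B, which is one reason the two traits can separate resistance-type from secretion-type loci.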
The application of a series of genetic approaches to these traits has to date yielded over 97 trait-associated loci (Table 1, Fig. 1). In this review, we focus on the progress made in recent years and briefly describe: a) insights from common variant (MAF ≥ 5%) associations; b) results from approaches that expand the allelic frequency range to low-frequency and rare variant associations; c) results from diverse populations; d) early biological and functional insights; and e) application of results to T2D.

Common Variant Trait Associations
Genome-wide association studies (GWAS) have transformed the landscape of glycaemic trait genetics. Prior to GWAS, FG was associated with genetic variants in GCK (Glucokinase) (15). Subsequently, early GWAS replicated the GCK association (16,17) and identified novel associations with FG at G6PC2 (16,17) and GCKR (18)(19)(20). Aggregation of data through meta-analyses, primarily in populations of European ancestry in the setting of large consortia (such as the Meta-Analyses of Glucose and Insulin-related traits Consortium, MAGIC), and the development of targeted arrays such as the Metabochip (21), have increased the number of associations between common variants and the most commonly used glycaemic measures (FG, FI, 2hG and HbA1c) to over 70 (Table 1), accounting for <6% of phenotypic variance in Europeans (22,23). Association analyses with more sophisticated glycaemic measures identified additional genome-wide significant loci, such as LARP6 and SGSM2 associated with fasting proinsulin (24), NAT2 associated with measures from euglycemic clamp and insulin suppression test techniques (25), and BCL2 and FAM19A2 associated with the modified Stumvoll Insulin Sensitivity Index (ISI), a dynamic measure of whole-body insulin sensitivity (26). These measures also enabled detailed physiological characterization of existing loci (27)(28)(29), including establishment of the role of MTNR1B in decreased early phase insulin response (30). An alternative measure of impaired glucose tolerance, 1-h glucose (1hG), may warrant further research following studies investigating its potential utility (31,32) and the identification of novel loci MYL2, C12orf51 and OAS1 associated with 1hG in Koreans (33) (Table 1).

The Contribution of Low Frequency and Rare Variants
The majority of genome-wide association signals are both common and non-coding, and recent efforts have focused on the contribution of rare (MAF < 1%) and low-frequency (1% ≤ MAF < 5%) variants, and their role as possible causal variants. Current strategies include: 1) genotyping arrays targeting the exons (also known as 'Exome Chips') or with combined common variant backbone and exonic content; 2) genome- and exome-wide sequencing; and 3) combined genotyping arrays and dense imputation using sequence-based reference panels such as 1000 Genomes (34), UK10K (35,36) and HRC (37).
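The frequency bands used throughout this review can be written as a simple classification rule. The sketch below is illustrative only: it assumes the thresholds stated above (rare, MAF < 1%; low frequency, 1% ≤ MAF < 5%; common, MAF ≥ 5%), and the function names are hypothetical.

```python
def minor_allele_frequency(allele_frequency: float) -> float:
    """Fold an allele frequency in [0, 1] onto the minor allele (range [0, 0.5])."""
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return min(allele_frequency, 1.0 - allele_frequency)


def classify_variant(maf: float) -> str:
    """Assign a variant to the frequency band used in this review."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must lie between 0 and 0.5")
    if maf < 0.01:
        return "rare"
    if maf < 0.05:
        return "low-frequency"
    return "common"
```

The same variant can fall into different bands in different populations, so the frequency passed in should come from the population under study.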
The UK10K Consortium (35) performed low-depth (7x) whole-genome sequencing in 3,781 participants from two British cohorts (ALSPAC and TwinsUK) and conducted association analyses with 31 phenotypes available in both cohorts, replicating common variant associations at G6PC2-ABCB11 with FG. Subsequent fine-mapping efforts identified missense variants as the causal variant, or within the credible set of causal variants, at GCKR (P446L) and SLC30A8 (R325W) (41).

Transferability to Other Ancestries and Fine Mapping
Driven by the availability of large sample sizes, the majority of early GWAS were performed in populations of European ancestry. Since then, efforts have expanded to diverse populations, leveraging differences in allele frequency and linkage disequilibrium (LD) structure to harness power for novel locus discovery and fine-mapping (42). While genetic effect sizes for common variants are largely consistent across ancestry groups, allele frequencies can vary (43,44), improving power for association in certain populations. Studies in African Americans have identified SC4MOL and TCERG1L associated with FI and insulin resistance (HOMA-IR) (45), and FAM133A and PELO associated with FI, where PELO was identified in a trans-ethnic meta-analysis combining African American data with publicly available European summary statistics from MAGIC (46). In East Asians, studies have identified SIX2-SIX3, C12orf51, PDK1-RAPGEF4, KANK1 and IGF1R associated with FG (33,47,48), MYL2, C12orf51 and OAS1 associated with 1-2hG (33) and HBS1L-MYB, CYBA, MYO9B and G6PC3 associated with HbA1c (49,50) (Table 1).
More focused replication and fine-mapping efforts have also been carried out in African Americans (51)(52)(53), Asian populations (54,55) and an admixed Mexican population (56). Exact (the same index variant) and local replication analyses have confirmed associations of variants in or near MTNR1B, G6PC2-ABCB11, GCK, IRS1, TCF7L2, DGKB, FADS1, GCKR, SLC30A8 and ZMAT4 with FG, and of GCKR with FI. These results suggest partial locus transferability but are limited in power by the relatively modest sample sizes (largest discovery sample sizes, N ~ 20-25 K) compared to the much larger European ancestry efforts (N ~ 108-133 K for FI and FG) that led to the discovery of the loci being assessed. Nonetheless, they highlight the utility of diverse populations for refining association signals to fewer probable causal variants. For example, inclusion of African American samples in a trans-ethnic fine-mapping approach reduced the credible set (the smallest set of SNPs that accounts for 99% of the posterior probability of containing the causal variant at the locus) at GCK and ADCY5 for FG, PPP1R3B for FI, and GCKR for FG and FI, to a single SNP (46).
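The credible set definition given above translates directly into a ranking procedure. The following is a minimal sketch, assuming per-SNP posterior probabilities of causality have already been computed at the locus (for example from Bayes factors under a single-causal-variant assumption). The function name and the SNP identifiers in the usage note are placeholders, not taken from the cited study.

```python
from typing import Dict, List


def credible_set(posteriors: Dict[str, float], coverage: float = 0.99) -> List[str]:
    """Return the smallest set of SNPs whose posterior probabilities sum to >= coverage."""
    total = sum(posteriors.values())
    if total <= 0:
        raise ValueError("posterior probabilities must sum to a positive value")
    # Rank SNPs from most to least probable so the set is as small as possible.
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    selected: List[str] = []
    cumulative = 0.0
    for snp, probability in ranked:
        selected.append(snp)
        cumulative += probability / total  # normalise in case inputs do not sum to 1
        if cumulative >= coverage:
            break
    return selected
```

For instance, `credible_set({"snp_a": 0.995, "snp_b": 0.004, "snp_c": 0.001})` returns a single-SNP set, the situation described for GCK, ADCY5, PPP1R3B and GCKR. Shorter LD blocks in African ancestry samples tend to concentrate posterior probability on fewer variants, which is how trans-ethnic data shrink these sets.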
In contrast, population isolates derive from a small number of founder individuals and have reduced genetic diversity, higher levels of LD and enrichment of some rare alleles following the initial bottleneck, thus increasing power and facilitating genetic discovery (57,58). Successful examples include the TBC1D4 locus identified in Greenland, strongly associated with 2hG and 2hI (59), and most recently, a variant (P50T) in AKT2 associated with a large effect (12% increase) on FI, with MAF 1.1% in Finns, but virtually absent (MAF 0.2%) in individuals of other ancestries (60).

Biological and Functional Insights
As mentioned earlier, most glycaemic trait genetic variant associations map within non-coding regions, where the underlying causal or effector transcript is hard to establish. Identifying it requires fine-mapping, which often necessitates other genomic evidence to establish a functional link between associated variants and underlying biology. Recent studies have shown that pancreatic islet enhancers are enriched with FG-associated loci (61,62), and that pancreatic islet eQTLs provide important clues for candidate effector transcripts at FG-associated loci (63,64). For some of these loci, the eQTL provides compelling confirmatory evidence for the biological candidate genes at these association signals [e.g. ADCY5, DGKB at the DGKB/TMEM195 locus, FADS1 and MTNR1B (63), replicating previous findings at this locus (64,65)]. At the ARAP1 locus, a recent study (63) suggests STARD10 is the likely effector transcript, which is in contrast with earlier data (66) but consistent with another more recent report (67). At the MADD locus, two potential effector transcripts were identified, MADD and ACP2 (63). Supporting evidence for MADD is provided by a beta-cell specific mouse model which showed that Madd plays a role in glucose-stimulated insulin secretion (68); however, the mouse phenotype did not provide any clues regarding the insulin processing effects also strongly associated with MADD (24). ACP2, on the other hand, encodes a lysosomal protein; the role of lysosomes in the degradation of ageing insulin granules (69) was hypothesised by the authors (63) as a possible link for the fasting glucose and proinsulin association signals. WARS, NKX6-3 (at the ANK1 locus) and RBM6 (at the AMT locus) were also implicated as plausible effector transcripts, but the mechanism through which they impact islet function is as yet unknown (63).
Loci associated with insulin resistance have been more recalcitrant to the GWAS approach and thus the number of established loci and effector transcripts is much smaller (Table 1). Recently, a blood transcriptomic genome-wide analysis (TWAS) combined with eQTL analysis, identified a trans-eQTL (rs592423) where the A-allele was associated with higher IGF2BP2 transcript levels and higher fasting insulin, suggesting this is the effector transcript at this locus (70). The TWAS also identified several genes with established roles in metabolic traits, namely IRS2 and FOXO4 involved in insulin signalling, and three genes involved in adipocyte or adipokine biology (ITLN1, PID1, ADIPOR1) (70). Another recent approach focused on identifying loci simultaneously associated with higher levels of FI adjusted for BMI, higher levels of triglycerides and lower levels of HDL, a hallmark of insulin resistance and of the condition lipodystrophy. In total, 53 associated loci were identified which when combined in a genetic risk score, were associated with increased T2D and coronary heart disease risk, but lower peripheral adipose tissue. The same loci also provided the first evidence of polygenic influence in familial lipodystrophy type 1, a severe form of insulin resistance previously thought to be monogenic in origin. Overall, these data suggested that impaired peripheral adipose tissue capacity may be an important mechanism influencing insulin resistance and is likely to be an important aetiological contributor to insulin-resistant cardiometabolic disease (13). The importance of adipose tissue differentiation in insulin resistant states was known from monogenic lipodystrophy due to mutations in PPARG (71,72) and has also more recently been demonstrated to be an important aetiological factor in T2D predisposition (73).
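A genetic risk score of the kind described above is typically an allele-count sum across loci, optionally weighted by per-allele effect size. The sketch below is illustrative only: the source does not state whether the 53-locus score was weighted, and the function name and variant identifiers are hypothetical.

```python
from typing import Dict, Optional


def genetic_risk_score(
    dosages: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Sum risk-allele dosages (0 to 2 per variant), optionally weighted by effect size."""
    for variant, dosage in dosages.items():
        if not 0.0 <= dosage <= 2.0:
            raise ValueError(f"dosage for {variant} must lie between 0 and 2")
    if weights is None:
        # Unweighted score: every risk allele counts equally.
        return float(sum(dosages.values()))
    missing = set(dosages) - set(weights)
    if missing:
        raise KeyError(f"no weight supplied for: {sorted(missing)}")
    return float(sum(weights[variant] * dosage for variant, dosage in dosages.items()))
```

Dosages must be coded relative to the trait-raising allele at every locus (here, the allele associated with higher FI adjusted for BMI, higher triglycerides and lower HDL) so that contributions add in the same direction.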
Complementing functional regulatory associations, the identification of multiple rare missense variants that are shown to affect protein function and that contribute to a gene-based association signal is a strong indicator that the effector transcript has been identified [e.g. G6PC2 (39,40), SLC30A8 (74) and PPARG (73)]. Similarly, single-point associations shown, or predicted, to have an effect on protein function [e.g. the P50T variant at AKT2 associated with FI (60) and the S690T and Q665E variants at PCSK1 associated with proinsulin and FG (24,40)], or mapping proximal to classical candidate loci, are also strong indicators that the effector transcript is likely to map to those specific genes. This approach suggested that SLC2A2 (encoding GLUT2), GCK, GCKR, FOXA2 and PDX1 are the likely effector transcripts at these loci (Table 1). SLC2A2 encodes GLUT2, the main glucose transporter in the islets of rodents but not of humans, where GLUT1 and GLUT3 predominate both in islets and beta-cells, suggesting that the role of variants at this gene is likely to be mediated through effects on other metabolic tissues (75). Recently, another study has supported this hypothesis: the C allele of rs8192675 in SLC2A2 was associated with a greater metformin-induced decrease in HbA1c levels, and was also shown to be an eQTL for GLUT2 in human liver samples. This suggested a role of hepatic GLUT2 in metformin action and glucose metabolism with significant clinical impact, and the variant was proposed as a biomarker for precision medicine (76). The importance of the liver in glucose homeostasis and FG levels was also confirmed by studies of the P446L variant in GCKR, which demonstrated that this variant affected GCKR inhibition of GCK, an effect predicted to promote hepatic glucose metabolism with a consequent decrease in FG (77).
A number of glycaemic trait-associated loci map within, or proximal to, genes associated with a range of Mendelian metabolic disorders, namely SLC2A2 (OMIM # 227810), GCK (OMIM # 125851), PPARG (OMIM # 604367), PCSK1 (OMIM # 600955), PDX1 (OMIM # 606392), GLIS3 (OMIM # 610199), IGF1 (OMIM # 608747) and HNF1A (OMIM # 600496), providing additional biological support for their candidacy as effector transcripts at these loci, and suggesting a role for rare penetrant and common variants influencing familial or polygenic traits, respectively. Combined, these data highlight genes involved in glucose regulation, insulin processing, secretion and response, and transcription factors with an established role in pancreas development, as important mechanisms influencing glycaemic traits. Early GWAS results highlighted, for the first time in humans, the role of loci involved in circadian rhythm [MTNR1B (65,78,79) and CRY2 (80)] in glucose metabolism. These results have been replicated in many additional studies, and subsequent analyses have shown that the associations at these loci are season-dependent (81) and that clock genes are regulated in pancreatic islet cells, confirming that perturbations in circadian clock components are likely important in glucose homeostasis (82). The role of the circadian clock in metabolism and possible therapeutic opportunities has recently been extensively reviewed (83), though the exact mechanism by which MTNR1B affects glucose homeostasis and diabetes risk remains the subject of some controversy (84,85).

Glycaemic Traits and T2D
Fasting glucose is used to diagnose type 2 diabetes (T2D); however, GWAS have demonstrated that the genetic architecture of these two traits does not fully overlap (22,80,86), suggesting that raising fasting glucose per se is insufficient to confer T2D risk and that pathophysiology is likely conditional on the affected pathway. The availability of detailed measures of glycaemia has thus helped demonstrate that a diverse set of mechanisms is involved in conferring risk of T2D. To date, T2D risk loci have been grouped into five distinct categories: a) those loci whose primary effect appears to be on insulin sensitivity (PPARG, KLF14, IRS1, GCKR); b) loci associated with decreased insulin secretion and with fasting hyperglycaemia (MTNR1B, GCK); c) a single locus, ARAP1, associated with impaired proinsulin processing; d) a large cluster of loci influencing insulin processing and secretion with modest or no detected effects on fasting glucose levels (TCF7L2, SLC30A8, HHEX/IDE, CDKAL1, CDKN2A/2B, PROX1, THADA, ADCY5, DGKB/TMEM195); and e) a large set of 20 loci that, despite influencing T2D risk, did not have clear associations with any of the available measures of glycaemia and which may correspond to novel mechanisms influencing diabetes through biology that is not yet understood (87). Earlier analyses of loci influencing fasting and post-challenge glucose measures also suggested similarly diverse mechanisms influencing these traits (27).
A recent large-scale trans-ethnic meta-analysis of GWAS for HbA1c has expanded the number of HbA1c-associated loci to 60, and importantly highlighted that the genetic architecture of the trait differed in African Americans compared to the other ancestries studied (European, East and South Asians). In African Americans, a single variant in the G6PD gene (G202A), responsible for glucose-6-phosphate dehydrogenase deficiency, accounted for a significant fraction of the variance in the trait (14.4%) and led to a substantial decrease in HbA1c values in hemizygous men (0.81%-units) and homozygous women (0.68%-units). This variant, if unaccounted for, could lead to up to 2% of African Americans with T2D remaining undiagnosed, highlighting the importance of studying glycaemic traits in diverse populations in order to avoid racial health disparities in the application of precision medicine (23).

Summary and Future Directions
In conclusion, large-scale genetic association analyses, combined with information on genomic features (enhancers, expression QTLs, TWAS) and high-throughput functional assays (88), have provided a growing list of loci associated with continuous glycaemic measures. The genetic architecture of these traits comprises many common variants of modest effect, mostly mapping to non-coding regions, with evidence of enrichment in active islet enhancers, and some overlap with monogenic loci involved in various disorders of metabolism. Genetic locus overlap between several glycaemic traits can be observed, most notably between FG and many of the other glycaemic traits, including T2D, though the extent of this overlap is likely to change as larger, better-powered studies become available (Fig. 1). Interestingly, FG and FI have limited overlap in associated loci, which may reflect underlying differences in the physiology affecting these traits (Fig. 1). These approaches have revealed some expected and some novel pathways involved in glucose homeostasis, with recent efforts highlighting a number of low-frequency or rare missense variants affecting protein function, which provide compelling evidence for the effector transcript at a given locus. Studies of diverse populations have demonstrated, for the most part, the transferability of glycaemic trait-associated loci across ancestries and highlighted the power of isolated populations to identify variants of larger effect sizes. More recently, large-scale trans-ethnic genetic analysis of HbA1c highlighted the need for better-powered studies in diverse ancestries to avoid health disparities in the application of genomics to the clinic.
Future efforts combining sequencing approaches, increased sample sizes (particularly in non-European ancestries), improved understanding of the non-coding regions of the genome and the integration of other 'omics' data will continue to improve understanding of the biology underlying glycaemic traits and how they impact on disease.