The Triangulation WIthin a STudy (TWIST) framework for causal inference within pharmacogenetic research

In this paper we review the methodological underpinnings of the general pharmacogenetic approach for uncovering genetically-driven treatment effect heterogeneity. This typically utilises only individuals who are treated and relies on fairly strong baseline assumptions to estimate what we term the ‘genetically moderated treatment effect’ (GMTE). When these assumptions are seriously violated, we show that a robust but less efficient estimate of the GMTE that incorporates information on the population of untreated individuals can instead be used. In cases of partial violation, we clarify when Mendelian randomization and a modified confounder adjustment method can also yield consistent estimates for the GMTE. A decision framework is then described to decide when a particular estimation strategy is most appropriate and how specific estimators can be combined to further improve efficiency. Triangulation of evidence from different data sources, each with their inherent biases and limitations, is becoming a well established principle for strengthening causal analysis. We call our framework ‘Triangulation WIthin a STudy’ (TWIST)’ in order to emphasise that an analysis in this spirit is also possible within a single data set, using causal estimates that are approximately uncorrelated, but reliant on different sets of assumptions. We illustrate these approaches by re-analysing primary-care-linked UK Biobank data relating to CYP2C19 genetic variants, Clopidogrel use and stroke risk, and data relating to APOE genetic variants, statin use and Coronary Artery Disease.

ing to explain treatment effect heterogeneity, by the science of Pharmacogenetics. A canonical example is Clopidogrel: the primary drug for ischemic stroke prevention in the UK and many other countries (see https://cks.nice.org.uk/antiplatelet-treatment). It requires CYP2C19 enzyme activation in order to be properly metabolised into the active form of the drug and thus work to its fullest extent. However, it has long been known that both common loss-of-function and gain-of-function variants within the CYP2C19 gene region can massively impact each patient's ability to metabolise the drug [2]. Consequently, when prescribed in a primary care setting its effectiveness is heterogeneous, working for some and not for others. Estimating the true effectiveness of a treatment from observational data is challenging due to 'confounding by indication' [3] (see Figure 1 top-right). For example, Clopidogrel use will, quite rightly, strongly depend on an individual's underlying risk of stroke, but socio-economic factors may also influence an individual's ability to access appropriate healthcare [4]. Use for a sustained period may additionally depend on whether they tolerate the drug without side-effects. Whilst these confounding factors could in principle be directly accounted for in a statistical analysis if complete information on a patient's clinical state were available, this is seldom the case. A well known example, albeit in a non-pharmacogenetic context, is hormone replacement therapy, which was linked to increased cancer risk in observational data but not in subsequent randomized trials [5]. Thankfully, the need for adjustment can be circumvented if the purpose of the analysis is is instead to compare the relative effectiveness of a treatment across across genetic groups (see Figure 1 bottom-right). In the case of Clopidogrel and CYP2C19, typically one might assume that CYPC219 variants G: do not predict whether an individual receives Clopidogrel (T ); are not associated with any confounders (U ) predicting Clopidogrel use and stroke (Y ); and only affect stroke risk through their interaction with Clopidogrel (denoted by the variable T * ). We will refer to these as the 'Pharmacogenetic' (PG) assumptions as a counterpart to the IV assumptions utilised by MR. A key difference exists between the role of the gene in MR and the role of the gene in pharmacogenetics: In MR, genes are assumed to directly influence the modifiable exposure. In Pharmacogenetics we can think of treatment as the exposure, and the genes are hypothesised to alter treatment once it is taken.
In recent work, Pilling et al [6] use data from GP-linked UK Biobank participants on Clopidogrel treatment to estimate the its effect in different CYP2C19 genetic subgroups and from this the number of strokes that could potentially be avoided if all individuals could experience the same benefit as the group with the most favourable genotype (through either dose modification or through switching to an alternative drug). In this paper we review the methodological underpinnings of the general PG approach, which utilises only individuals who are treated and relies on fairly strong baseline PG assumptions to estimate what we refer to as the 'genetically mediated treatment effect' (GMTE). When the PG assumptions are seriously violated, we show that a robust but less efficient estimate of the GMTE that incorporates information on the population of untreated individuals can instead be used. In cases of partial violation, we clarify when Mendelian randomization and traditional confounder adjustment can also yield consistent estimates for the GMTE. A decision framework is then described to decide when a particular estimation strategy is most appropriate and how specific estimators can be combined to further improve efficiency.
Triangulation of evidence from different data sources, each with their inherent biases and limitations, is becoming a well established principle for strengthening causal analysis [7]. We call this framework 'Triangulation WIthin a STudy' (TWIST)' in order to emphasise that an analysis in this spirit is also possible within a single data set, using causal estimators that are approximately statistically uncorrelated, but reliant on different sets of assumptions. We illustrate these approaches by re-analysing primary-care-linked UK Biobank data relating to CYP2C19 genetic variants, Clopidogrel use and stroke risk, and data relating to APOE genetic variants, statin use and Coronary Artery Disease (CAD).

Methods
Suppose that we are interested in evaluating the maximal effectiveness of a treatment T on an outcome Y using observational data. For simplicity we will assume initially that T is a binary treatment indicator so that, if prescribed, it is taken in full, that Y is a continuous or binary outcome variable and we are interested in estimating the treatment effect as a mean or risk difference contrast. A naive way of estimating this effect would be to compare outcomes across those who are treated and those who are untreated. Borrowing terminology from the clinical trials literature, we refer to this as the 'As treated' The AT estimate may not directly address our research needs for two reasons. Firstly, although we may understand many of the factors which influence whether an eligible individual is prescribed treatment by their doctor, there may be unmeasured variables, U , which influence both the decision to prescribe treatment and the outcome. Indeed, even if the treatment is truly effective in reducing the severity or risk of Y , it is highly likely that the population of treated individuals may still experience worse outcomes than those who are untreated. This would mean that the sign of the AT estimate could be positive, and thus qualitatively different than the true causal effect. This is classic confounding by indication. Secondly, a pharmacological investigation may suggest that the treatment does not in fact work for a certain proportion of the population at all, has a markedly reduced effectiveness, or increases the risk of side-effects. We would therefore like to estimate the difference in patient outcomes if all patients who take treatment experienced the 'full' effect, as experienced by those with a genotype G=1, versus the reduced (or zero) effect experienced by those with genotype G=0. This effect could be realised in practice by switching to an alternative medication whose effect is not affected by patient genotype.
To make the target of our analysis more explicit we use potential outcomes notation. Let Y i (g, t) be the potential outcome for individual i if assigned to genotype level G = g and treatment level T = t and define the potential treatment variable T i (j) as the treatment experienced by patient i under allocation to genotype level j. Our quantity of interest is the genetically mediated treatment effect (GMTE) on the outcome: This estimand reflects the extent to which a gene affects the outcome through the pathway of treatment only, and not the effect of the genotype on the outcome through any other mechanism. In order to make ideas more concrete, consider the following model for the expectation of Y given treatment, T , genetic variant G and unmeasured confounder U : Under model (3), β 1 and β 0 reflect the treatment effect experienced by those with genotype G=1 and G=0 respectively. The parameter γ GY represents the direct effect of G on Y and γ Y U represents the direct effect of U on Y . The GMTE is equal to the difference in treatment effects experienced by the two genetic groups, β 1 − β 0 . If the two genetic groups experience distinct and non-zero treatment effects because both β 1 and β 0 are non-zero, model (3) violates the Homogeneity assumption (IV4) typically invoked in Mendelian randomization (see Figure 1). When model (3) is re-written as in (4), we see that Homogeneity violation can be understood as T affecting Y in two ways: directly through a main effect (β 0 ) as well as indirectly through its interaction with G (β 1 − β 0 ). If those who take treatment with genotype level G=0 experience no treatment effect all (β 0 = 0), then the Homogeneity assumption would be satisfied and model (3) reduces to: In the general case of treatment effect heterogeneity (model (3)), when assumptions PG1-PG3 hold, so that G ⊥ ⊥ T |U , G ⊥ ⊥ U and G ⊥ ⊥ Y |T, U (this last condition would imply that γ GY in equation (5) is zero), then we can consistently estimate the GMTE using only treated individuals via the 'GMTE(1)' estimate:β whereÊ [.] refers to the sample mean. We use the additional subscript '(1)' to denote that the estimate conditions on T =1 and the bracket '(Y)' to denote the outcome, Y . Although the latter notation may seem redundant at first sight, it is needed for subsequent development. Further intuition for this estimate is provided using DAGs in Figure 2 (top-left) and Section 2.1. Note that in the DAG we replace the product T G with the variable T * to denote the mechanistic interaction. In Appendix 1 we discuss the precise conditions under which the GMTE(1) estimate is consistent for the GMTE under our assumed model. For example, it can still be consistent even when assumption PG1 is violated if T ⊥ ⊥ U . We call this the No Unmeasured Confounder (NUC) assumption from now on.

A robust GMTE estimator
The GMTE(1) estimate is simple, intuitive and can be calculated entirely in the sub-population of those who are treated. However, in this context it generally requires assumptions PG1-PG3. Violation of PG1 implies that an individual's genotype directly influences the likelihood that they receive treatment. For example, it could be that those with a G=0 genotype have an increased risk of side effects on treatment and choose to immediately come off the drug. An alternative explanation could be genetic population stratification [8]: e.g the allele frequency of the genetic variant G and the rate of treatment could simultaneously vary across individuals from different ethnic groups. An example of PG2 violation would be if the genetic variant increases the likelihood of an unmeasured risk factor for the outcome, and this risk factor also increases their likelihood of being treated. An example of PG3 violation would be if an individual's genotype directly affects the outcome through a pathway completely independent of either treatment or any confounding factor, which could be viewed as horizontal pleiotropy [9]. When any of these assumptions are violatedβ GM T E(1) (Y ) will reflect the genetically mediated effect of treatment plus the bias due to PG1-3 violation. The first and second examples above are illustrated in Figure 2 (top-left). In this case the bias is the sum of Whilst the bias contributions b P G2 and b P G3 are clear, bias contribution b P G1 is perhaps less so; it occurs because the GMTE estimate explicitly conditions on treatment T , the presence of an association between G and T makes T a 'collider' [10]. Thankfully, when assumption PG1 is satisfied but PG2-3 are violated, the additional bias terms b P G2 and b P G3 can be removed by incorporating information on the untreated population. This is achieved by calculating the equivalent GMTE(1) estimate for the untreated group, and subtracting it from that in the treated group. The 'Robust' genetically mediated treatment effect, RMTE(Y), is therefore defined aŝ This bares a strong resemblance to the 'difference-in-differences' approach used in econometrics [11]. For further clarification see Figure 2 (top-left). In Appendix 2 we discuss the precise conditions under which the RGMTE estimate is consistent for the GMTE under our assumed model. Although assumption PG1 is key for the RGMTE, we show that it can still consistently estimate the GMTE if PG1 is violated but the NUC assumption is satisfied.

5
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 7, 2021.

The Mendelian randomization estimate
Given data on both treated and untreated individuals, it is possible to obtain an estimate for the GMTE by using the genetic variant G as an instrumental variable for the mechanistic interaction variable T * = T G, as in Mendelian randomization. The MR estimate with a single instrument is known to have the form where T * is the de-facto exposure. As discussed previously, the MR estimate is usually justified by assuming that G satisfies assumptions IV1-IV3 with respect to the exposure (in our case, T * ) and that the Homogeneity assumption (IV4) is satisfied. However, unless those who have genotype level G=0 experience no treatment effect, the Homogeneity assumption will be violated in this setting. In Appendix 3, we show that the MR estimate in (8) is consistent for the GMTE if PG2-PG3 hold and either PG1 holds, or the Homogeneity assumption holds (β 0 =0). Further intuition for this approach is given in Figure 2 (top-right).

A corrected 'As Treated' estimate
As previously discussed, the As Treated estimate suffers in general from confounding by indication, and a lack of specificity to the genetic variant driving the mechanistic interaction with the treatment. However, if the Homogeneity and NUC assumptions are satisfied and either PG1 or PG3 are satisfied, then the 'Corrected' As Treated estimate (CAT): can consistently estimate the GMTE. Further details are provided in Appendix 4 and for further intuition see Figure 2 (bottom-left).

Implementation of the methods
In Table 1 we provide summary information for the GMTE(1), RGMTE, MR and CAT estimates, including the assumptions that they rely on and how they can be obtained using generic regression models, where the variable Z is included as a measured covariate. We also include a column that shows, how to test for important potential confounders of treatment and outcome using an estimate-specific approach (see Appendix 1-4 for more details). A nice property is that, for each estimate of the GMTE, one calculates exactly the same estimate, but using a potential confounder U as the outcome. If this estimate is non-zero then this indicates a meaningful bias, unless said confounder is adjusted for.
When the outcome is continuous, the approaches can be implemented using linear regression to estimate the GMTE as a mean difference. With a binary outcome, we recommend estimating risk differences directly using either a linear probability model, or using a logistic regression model to furnish estimates on the risk difference scale. In the latter case, a risk difference can be constructed by calculating the mean difference between predicted probabilities under manipulation of the appropriate variable between levels 1 and 0, as suggested by Gelman and Hill [12]. For example, in the case of the GMTE(1) estimatorβ is the estimated fitted value from a logistic regression of Y on T ,T * and Z at treatment level j. For time-to-event data, we recommend analysing the data using an Aalen additive hazard model, as suggested by Tchetgen et. al. [13]. Specifically, one would assume that model (3) holds on the additive hazard scale, so that at time t y the hazard: The proposed modelling framework for estimating mean, risk differences or additive hazard differences is intended to guarantee that we can obtain estimates for the GMTE from different estimators on the same scale, because these measures are collapsible. That is, they should remain constant when marginalised over unobserved confounders [14]. This is especially important for being able to effectively combine methods, as described in the next section.
6 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 7, 2021.

Which estimates can be combined?
When two estimates are highly correlated, we gain little knowledge when they are observed to be similar. However, when two uncorrelated or weakly correlated estimates are similar, it gives credence to the hypothesis that they are estimating the same underlying quantity, and there is the potential to combine them into a single, more precise estimate. In Appendix 5 we show that the RGMTE and MR estimates are asymptotically uncorrelated. We also show that the CAT estimate is mutually uncorrelated with the GMTE(1) and RGMTE estimates, and uncorrelated with the MR estimate when G ⊥ ⊥ T . In cases where G and T are not perfectly independent, but G is a modest predictor of T (a highly plausible scenario in most pharmcogenetic contexts), the correlation between the MR and CAT estimate will be non-zero but practically negligible. The fact that most estimate-pairs are uncorrelated makes them easy to combine via a simple inverse variance weighted average or meta-analysis. In order to decide whether two uncorrelated estimates can be combined, we propose the use of a simple heterogeneity statistic. Taking the GMTE(1) and CAT estimates as an example, we would first calculate where e = {GM T E(1), CAT } and where w e is the inverse variance of estimateβ e . If Q GM T E(1),CAT < χ 2 1,1−α (where α is pre-specified) then we judge the GMTE(1) and CAT estimates to be sufficiently similar to combine into a single estimateβ GM T E(1),CAT (Y ). If Q GM T E(1),CAT is greater than the prespecified threshold then the two estimates should be left separate.
The RMGTE and MR estimates are in general highly correlated with the GMTE(1) estimate, and should therefore not be combined. In Appendix 6 we show that the MR and RMGTE estimates can be viewed as complementary functions of the GMTE(1) and GMTE(0) estimates, and that the combined RGMTE/MR estimate is exactly equivalent to the GMTE(1) estimate when G ⊥ ⊥ T and the proportion of treated and untreated participants in the data is the same. Figure 2 (Bottom-right) shows a pictorial diagram of all single and combined estimates that can be derived using the above heterogeneity statistic criteria. This comprises four original estimates (CAT,GMTE1,RMGTE,MR), four 'paired estimates (CAT/GMTE1, CAT/RGMTE, CAT/MR, RGMTE/MR) and one 'triplet' estimate (CAT/RGMTE/MR), making nine in total. One possible analysis option would be to report all single and valid combined estimates which are sufficiently homogeneous according to a particular significance threshold. Another option would be 7 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 7, 2021. to allow the GMTE(0) estimate to initially guide the analysis towards either the GMTE(1) estimate (and its possible combination with the CAT estimate) or the RMGTE estimate (and its possible combination with either MR estimate, the CAT estimates or both).

Simulation illustration
Trial data comprising a binary genotype G, treatment indicator T , observed covariate Z and a continuous outcome Y are simulated for n=10,000 patients using the following data generating model which is consistent with the DAG in Figure 3:  Under this model, assumptions PG1-PG3 are violated if γ GT , γ GU and γ GY are non-zero respectively. Finally, the Homogeneity (Hom) and NUC assumption (U ⊥ ⊥ T ) are violated when β 0 and γ U T are nonzero respectively. Note that if the NUC assumption holds, then PG2 is in a sense automatically satisfied because U is no longer a confounder. However, in this case there may still be a path from G to Y via U . This would then form all or part of any PG3 violation. Figure 4 shows the distribution of estimates for the GMTE obtained across 500 independent data sets and six simulation scenarios, using the CAT, GMTE(1), RGMTE and MR estimators. We also show the distribution of the GMTE(0) estimate in each case, as a helpful guide to understand the extent of bias that can be estimated from the data. In all scenarios the true GMTE is fixed at -0.5. Table 2 shows the mean point estimates, standard errors and 95% confidence interval coverage corresponding to the same six scenarios. Table 3 shows mean point estimates, standard errors and 95% confidence interval coverage for the 5 combined estimators shown within the decision framework in Figure 2 (bottom-right), using a significance threshold of α=0.05.
In Scenario 1 assumption PG3 is violated but all others (PG1, PG2, Hom, NUC) are satisfied. In this case both the CAT and RMGTE estimators are unbiased, with the RGMTE having the smallest standard error. In Table 3 we see that the combined RGMTE/CAT estimate is consequently unbiased with a standard error of 0.085, which is smaller than either the RMGTE or CAT estimates.
In Scenario 2 the NUC assumption is violated but all others (PG1, PG2, Hom) are satisfied. In this case the GMTE(1), RGMTE and MR estimators are unbiased, with the GMTE(1) estimate being the most precise. In Table 3 we see that the combined RGMTE/MR estimate is consequently unbiased with 8 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

9
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

10
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 7, 2021. ; a standard error near-identical to the GMTE(1) estimate, in line with the theoretical prediction outlined in Appendix 6. In Scenario 3 assumption PG1 is violated and (PG2, PG3, Hom, NUC) are satisfied. In this case all estimators are unbiased. In Table 3 we show in this case that the most efficient unbiased estimate of all comes from combining the RGMTE, CAT and MR estimates. In Scenario 4 PG1 and NUC are violated but the remaining assumptions (PG2, PG3, Hom) are satisfied. In this case only the MR estimate is unbiased. Consequently, no combined estimator is unbiased. In Scenario 5, PG2 and Hom are satisfied but (PG1, PG3, NUC) are satisfied. In this case the GMTE(1) and RGMTE estimators are unbiased, with the GMTE(1) estimator being the most efficient. No combined estimate is unbiased. In Scenario 6 all assumptions except PG1 are violated. In this case only the RGMTE estimate is unbiased and, again, no combined estimate is unbiased.
In conclusion, our simulation study provides an empirical verification of the strengths and limitations of each approach, and when any two uncorrelated estimates can be effectively combined via a simple inverse variance weighted meta-analysis. Although the standard error of any estimate that combines the CAT and MR estimates requires G to be independent of T to be strictly valid (since this implies a zero correlation between their estimates) when this assumption is violated in Scenario 3 it only induces a modest loss of coverage (e.g. 91% for the CAT/MR estimate and 92% for the CAT/RGMTE/MR estimate). Across the simulations the RGMTE is emerges as the most robust estimate.

Applied analyses 4.1 Clopidogrel, CYPC219 & Stroke risk
Clopidogrel is a widely used anti-platelet therapy that impairs platelet aggregation with consequent reductions in risk of atherothrombotic events such as myocardial infarctions and ischemic strokes [15]. Clopidogrel is a pro-drug that requires activation by liver enzymes, primarily CYP2C19. Genetic variants in CYP2C19 impair function with subsequently reduced clopidogrel active plasma levels [16], and we have previously shown using primary care linked data on UK Biobank participants that carriers of these variants have increased risks of ischemic stroke and myocardial infarction (MI) whilst prescribed clopidogrel in the primary care setting [6]. In this work we calculated the population attributable fraction using established methods by analysing data on only those who were treated with Clopidogrel, but we revisit the analysis and apply the full decision framework proposed in this paper.
The UK Biobank study recruited 503,325 volunteers from the community who attended one of 22 assessment centres in England, Wales or Scotland between 2006 and 2010 [17]. Participants were aged 40 to 70 years at the time of assessment, and baseline assessment included extensive questionnaires on demographic, health, and lifestyle information. Blood samples were taken, allowing analysis of participant genetics. Ethical approval for the UK Biobank study was obtained from the North West Multi-Centre Research Ethics Committee. This research was conducted under UK Biobank application 14631 (PI: DM).
Linked electronic medical records from primary care are available for 230,096 (45.7%) of participants, which includes >57 million prescribing events between 1998 and 2017. Detailed description of the data extracted and limitations are available from UK Biobank. For this analysis we excluded 5,353 participants missing any genetics data, then 14,856 of non-European genetic ancestry, then 555 missing any CYP2C19 loss of function genotype data, leaving 209,333 participants with sufficient primary care and genetic data. N=198,868 never received a Clopidogrel prescription. N=938 only ever received one prescription, so did not have sufficient exposure time for study. Of the 9,527 participants remaining, in 2,044 the prescribing frequency was less than once every 2 months, and these were also excluded. This left 7,483 participants with at least two Clopidogrel prescriptions for analysis. Baseline information on the included participants is shown in Table 4.

11
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 7, 2021.

Estimating the GMTE
To estimate the GMTE in this case we modelled the time to stroke using an Aalen additive hazards model, as described in Section 2.4. All models were adjusted for age at recruitment or first Clopidogrel prescription, sex, and the first 10 genetic principal components of ancestry. Figure 5 and Table 5 show the results for this analysis, which reflect the genetically mediated effect of Clopidogrel treatment on the hazard of stroke per year, expressed as a percentage. The GMTE(1) estimate suggests that being a CYP2C19 LoF carrier (G=1) increases the risk of stroke by 0.28% (p=0.048) compared to those without the LoF variant (G=0). To put this figure in context, if we could reduce the LoF carrier's risk by this amount then, when multiplied by the 5264 LoF carrier patient years in the data, it would lead to an expected 13.2% reduction in the total number of strokes (or a reduction of 15 strokes from 110 to 95). To test for potential bias in the GMTE(1) estimate, we calculate the GMTE(0) estimate in the untreated population. Thankfully, it is close to zero (Hazard diff = -0.0039%, p=0.61), although slightly negative. Taken at face value, this suggest LoF carriers have a slightly reduced risk of stroke through pathways other than Clopidogrel use. Next we calculate the Corrected As Treated (CAT) estimate. As discussed, the validity of this method rests strongly on being able to identify all confounders of Clopidogrel use and stroke. With the data available, it was only possible to adjust for age, sex and genetic principal components and perhaps unsurprisingly, the CAT estimate is an order of magnitude larger (Hazard diff = 2.2%, p≤ 2 × 10 −16 ). Consequently, the Q CAT,GM T E(1) statistic detects large heterogeneity and suggests that the CAT and GMTE(1) estimates should not be combined.

12
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 7, 2021.

Statins, AP OE & CAD
We now apply our framework to estimate the extent to which genetic variation at the AP OE locus modulates the risk of coronary artery disease (CAD) due to statin treatment using UK Biobank data. Our full data comprises 155,409 unrelated participants of European descent, with primary care data available (updated to March 2017) and up-to-date hospital admission data as of December 2020. Of this sample, we excluded: n=11 participants with missing APOE genotypes; n=6,456 non-regular statin users with less than four prescriptions per year or residuals from the linear regression for total statin prescriptions on years of statin treatment greater than 3 or less than -3; n=1,273 non-statin users diagnosed with CAD at baseline (or prior to baseline); n=4,566 participants starting statin after a doctor's diagnosis of coronary artery disease (CAD) based on the hospital admission records.

Results
Using the e3e3 group as a reference, we fitted Aalen additive hazard models within each mutually exclusive genetic group additionally adjusting for sex, age on statin or age at recruitment, and the top 10 genetic principal components. For brevity, we focus on the results of the e2e2 versus e3e3 and e4e4 versus e3e3 analyses, which are shown in Table 7 and account for approximately 72% of the patient data. Estimates reflect the hazard or risk difference of a CAD event per year, expressed as a percentage.
Only results of combined estimates that pass a heterogeneity test at the 5% level are shown. Equivalent 13 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 7, 2021.

e4e4 versus e3e3
Inspecting the e4e4 genetic subgroup first, the GMTE(1) estimate suggests that the risk of CAD could be reduced by 0.031% per year if e3e3 patients experienced the same treatment effect as e4e4 patients (p=0.043). This estimate is valid if the e4e4 genotype only affects the risk of CAD through modulating the effectiveness of statins (i.e. assumptions PG1-PG3 hold). In order to probe this we calculate the equivalent GMTE(0) estimate in non-statin users. The e 4 e 4 group is now seen to have a 0.011% larger risk of CAD than e3e3 (p=0.07), which suggests that PG1-PG3 violation is possible. Since the RMGTE(0) and RGMTE(1) estimates are of opposite signs, the RGMTE estimate, which is robust to PG2-PG3 violation, infers the risk difference between e4e4 and e3e3's is larger at -0.037% per year (p=0.046). The MR estimate of the GMTE is also positive (0.0069%), but very close to zero (p=0.69). This is, however, sufficiently similar to the RGMTE estimate for it to be combined with the MR estimate, and the combined value suggests a hazard difference of -0.014% per year (p=0.28). The CAT estimate for the hazard difference in these data is a 3.9% increase per year. Its magnitude is so large compared to the other estimates that we can infer adjustment for age, sex and genetic PCs is again not sufficient to remove confounding by indication. Consequently, no other estimate is sufficiently similar in order to combine with the CAT estimate, as shown in Figure 6 (left). In the final column of

e2e3 versus e3e3
Turning our attention to the e2e3 subgroup in Table 7 and Figure 6 (right), we again see a large, noncredible CAT hazard difference estimate for the GMTE of a 1.2% per year between e2e3 and e3e3 groups. The GMTE(1), GMTE(0) and RGMTE estimates for the GMTE are all close to zero and non-significant at the 5% level. In contrast, the MR estimate for the GMTE suggests that e2e3's have a 0.046% reduced risk of CAD per year (p=3×10 −5 ). Using this estimate, the expected number of CAD events that could be avoided if all 26,938 e3e3 patients could receive the same benefit as the e2e3 patients is 128. This is valid if we believe that assumptions PG2-PG3 hold, but either PG1 or the Homogeneity assumption are violated. The GMTE1 and RGMTE estimates imply more modest reductions of 28 and 15 CAD events respectively. In this example, no two single uncorrelated estimates are sufficiently similar in order to combine.

Discussion
In this paper we propose a general framework for estimating the genetically mediated treatment effect that combines several distinct but complementary causal inference techniques. We propose a rudimentary decision framework for choosing when to combine approaches based on heterogeneity statistics. In practice, expert knowledge and prior evidence should also be leveraged to decide whether the particular assumptions of the causal estimation strategy are likely to be met, in order to put more or less weight on their findings. Our inverse variance meta-analysis procedure for combining estimates is very simple, and simulations showed that exhibited good statistical properties even when small correlations between constituent estimates were present. As future work, we plan to develop a more sophisticated procedure to explicitly account for this correlation. We also plan to develop a hierarchical testing procedure that can be applied across the TWIST framework to control family-wise or false-discovery error rates.
In our analysis of the statin data, we estimated the GMTE in several mutually exclusive genetic groups, which resulted in an inevitable loss of precision. Efficiency could potentially be regained by collapsing genetic subsets together if they give similar estimates, or by making a linearity assumption about the magnitude of effect across genotypic groups (e.g. between e3e3, e3e4 and e4e4). This would not be 15 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 7, 2021. defensible if the genetic groups were ordered with respect to the magnitude of their causal estimate, but would be defensible if genetic groups could be ordered by their effect on increasing drug metabolism. In the case of 3 genetic groups, G i and T * i could take a value in {0,1,2}. This would enable the data to be pooled in order to target a combined estimand for all m in {1, 2}. If such a model were correct, it opens up the possibility of making the analysis even more robust to violations of the PG assumptions, because an additional causal parameter could be jointly estimated alongside the GMTE to reflect, for example the direct effect of G on Y . This is an important avenue for further research.
Although the CAT estimate can, in principle consistently estimate the GMTE estimand, it relies heavily on the NUC assumption. In both applied analyses we were not able to sufficiently control for confounding by indication to deliver an estimate close to any other GMTE estimate, due to a lack of relevant covariate data. In future work we plan to revisit both analyses after collating a much larger set of relevant information. More-sophisticated approaches such as Propensity Scores, matching methods and inverse probability weighting may then offer some utility [19]. So too may methods for multi-variable Mendelian randomization, where instead of directly adjusting for confounders of treatment and outcome, we instead adjust for their genetically predicted value. This latter approach could be more robust to collider bias [10].
In supplementary material we provide R code for fitting the TWIST framework for continuous, binary and time-to-event data as well as code used in the simulation study. Work is ongoing to produce a single R package to apply TWIST and visualise its results.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 7, 2021. ; https://doi.org/10.1101/2021.05.04.21256612 doi: medRxiv preprint Using similar arguments it is easy to show that τ 11 Cov(β CAT (Y ),β RGM T E (Y )) = 0 τ 11 π 11 Cov(β CAT (Y ),β M R (Y )) = 0 if G ⊥ ⊥ T , Cov(β CAT (Y ),β GM T E(1) (Y )) = 0 In cases where G ⊥ ⊥ T does not hold, but G predicts a relatively small amount of variation in T , the correlation between the CAT and MR estimate will be non-zero but negligible. This is a reasonable assumption in almost all pharmacogenetic contexts. This means that there are effectively only two pairs of correlated estimates: (GMTE(1), RGMTE), and (GMTE(1), MR).

20
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 7, 2021.

21
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted May 7, 2021. ; https://doi.org/10.1101/2021.05.04.21256612 doi: medRxiv preprint