Estrogen exposure, metabolism, and enzyme variants in a model for breast cancer risk prediction.

Estrogen is a well-known risk factor for breast cancer. Current models of breast cancer risk prediction are based on cumulative estrogen exposure but do not directly reflect mammary estrogen metabolism or address genetic variability between women in exposure to carcinogenic estrogen metabolites. We propose a mathematical model that forecasts breast cancer risk for a woman based on three factors: (1) estimated estrogen exposure, (2) kinetic analysis of the oxidative estrogen metabolism pathway in the breast, and (3) enzyme genotypes responsible for inherited differences in the production of carcinogenic metabolites. The model incorporates the main components of mammary estrogen metabolism, i.e. the conversion of 17β-estradiol (E2) by the phase I and II enzymes cytochrome P450 (CYP) 1A1 and 1B1, catechol-O-methyltransferase (COMT), and glutathione S-transferase P1 (GSTP1) into reactive metabolites, including catechol estrogens and estrogen quinones such as E2-3,4-Q, which can damage DNA. Each of the four genes is genotyped and the SNP data are used to derive the haplotype configuration for each subject. The model then uses the kinetic and genotypic data to calculate the amount of the E2-3,4-Q carcinogen as the ultimate risk factor for each woman. The proposed model extends existing models by combining the traditional “phenotypic” measures of estrogen exposure with genotypic data associated with the metabolic fate of E2 as determined by critical phase I and II enzymes. Instead of providing a general risk estimate, our model would predict the risk for each individual woman based on her age and reproductive experiences as well as her genotypic profile.


Introduction
Estrogens have long been recognized as the primary risk factor for the development of breast cancer [1,2]. Epidemiologic studies have indicated that breast cancer risk is higher in women with early menarche and late menopause, who have longer exposure to estrogens [3]. A pooled analysis of nine prospective studies found that circulating estrogen levels were directly related to risk of breast cancer in postmenopausal women [4]. Based on these data, current models of breast cancer risk prediction are mainly based on cumulative estrogen exposure and include such factors as age, age at menarche, and age at first live birth [5,6] (www.cancer.gov/bcrisktool). While all these studies implicate estrogens as a risk factor for the development of breast cancer, they leave open two important questions that need to be answered to advance from an empirical, global risk assessment to a truly etiological, individualized assessment: (1) How do estrogens cause breast cancer? and (2) Since all women are exposed to estrogens, how do we better delineate risk? To close these gaps in our knowledge, we need to explain the mechanisms of estrogen carcinogenesis and of inter-individual risk variation. Our approach is to examine the dynamics of a pathway for estrogen metabolism and to use its prediction of the level of DNA-damaging compounds as a predictor of breast cancer risk.
Carcinogenesis is usually viewed as a stepwise process beginning with genotoxic effects (initiation) followed by enhanced cell proliferation (promotion). The main estrogen, 17β-estradiol (E2), is a substrate for the phase I enzymes cytochrome P450 (CYP) 1A1 and 1B1 and a ligand for the estrogen receptor. In its dual role of substrate and ligand, E2 has been implicated in the development of breast cancer by simultaneously causing DNA damage via its oxidation products, the 2-OH and 4-OH catechol estrogens, and by stimulating cell proliferation and gene expression via the estrogen receptor. Thus, E2 and its oxidative metabolites are unique carcinogens that affect both tumor initiation and promotion [7-9]. As shown in Figure 1, E2 is oxidized to catechol estrogens by CYP1A1 and CYP1B1. These enzymes further oxidize the catechol estrogens to semiquinones and quinones. The highly reactive estrogen quinones form Michael addition products with deoxyribonucleosides [10-12]. Thus, estrogen quinones share a common feature of many chemical carcinogens, i.e. the ability to covalently modify DNA [13-16]. Furthermore, estrogen semiquinones and quinones undergo redox-cycling, which results in the production of reactive oxygen species that can cause oxidative DNA damage [17-19]. Support for the carcinogenic activity of estrogens and their oxidative products, the catechol estrogens, comes from experiments in animal models. Treatment with either E2 or the 2-OH or 4-OH catechol estrogens caused kidney cancer in male Syrian hamsters and endometrial cancer in female CD1 mice, the latter compounds being the most carcinogenic agents [20-22]. However, there is no animal model for estrogen-induced breast cancer, and even in the hamster and mouse models the precise mechanism of DNA damage is uncertain.
Thus, there is a need to understand estrogen metabolism in the human breast in order to elucidate the role of endogenous and exogenous estrogens in mammary carcinogenesis. Advancing this understanding requires not only characterization of the various estrogen metabolites but, equally important, a precise definition of the responsible enzymes. Several investigators have proposed a qualitative model of mammary estrogen metabolism regulated by oxidizing phase I and conjugating phase II enzymes [23,24]. The oxidative estrogen metabolism pathway starts with E2 and E1, which are oxidized to the 2-OH and 4-OH catechol estrogens by the phase I enzymes CYP1A1 and CYP1B1 [25,26]. As described above, the P450-mediated estrogen metabolism is expected to lead to the formation of both estrogen and oxidative DNA adducts, all of which have been shown to possess mutagenic potential [27,28]. It is postulated that the genotoxicity of the oxidative estrogen metabolism pathway is mitigated by alternate reactions of the metabolites with phase II enzymes. Specifically, catechol-O-methyltransferase (COMT) catalyzes the methylation of catechol estrogens to methoxy estrogens, which lowers the amount of catechol estrogens available for conversion to estrogen quinones [29,30]. In turn, the estrogen quinones undergo conjugation with glutathione (GSH) via the catalytic action of glutathione S-transferase P1 (GSTP1) [31,32]. The formation of GSH-estrogen conjugates would reduce the level of estrogen quinones and thereby lower the potential for DNA damage.
The current models of mammary estrogen metabolism have limitations. First, only single enzymes, e.g. CYP1B1 and COMT, have been analyzed to date with simple substrate-product kinetics, which gives an incomplete picture of the metabolic pathway. Second, while the model incorporates the functional roles of the phase I and II enzymes, it does so only qualitatively, and it remains uncertain how the enzymes interact quantitatively. Third, each of the phase I and II enzymes contains genetic polymorphisms [26,29,33,34]. Studies from several laboratories have examined the functional implications of the polymorphisms on estrogen metabolism, again focusing on single enzymes [26,29,30,35,36]. Thus, the multitude of potential kinetic reactions resulting from the complex genetic variations of the phase I and II enzymes is completely outside the scope of the current model of estrogen metabolism. In contrast to the relatively small number of functional studies of estrogen metabolism, multiple epidemiological studies have investigated breast cancer risk in relation to genetic variation in the critical enzymes involved in estrogen metabolism, with inconsistent findings [37,38]. A drawback of any purely genetic assessment is the lack of information about functional interactions inherent in complex metabolic pathways such as the estrogen metabolism pathway. Thus, such studies cannot assess the underlying metabolic interactions in the pathway [39,40]. A pathway-based functional and quantitative approach is necessary to overcome the current limitation in genotype assessment.
We have developed an experimental in vitro model of mammary estrogen metabolism, in which we combined purified, recombinant phase I enzymes CYP1A1 and CYP1B1 with the phase II enzymes COMT and GSTP1 to determine how E2 is metabolized [41]. We employed both gas and liquid chromatography with mass spectrometry (GC/MS and LC/MS) to measure the parent hormone E2 as well as eight metabolites, i.e. the catechol estrogens, methoxyestrogens, and estrogen-GSH conjugates. Using these experimental data, an in silico model of the metabolic pathway has been developed [42].

Methods
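A mathematical model for the estrogen metabolism pathway shown in Figure 1 can be constructed using some basic assumptions about the kinetics of the reactions in this figure. We assume that each reaction in the pathway ($A \to B$, a generic step in the pathway) is an enzyme-catalyzed reaction of the form:

$$A + E \underset{k_2}{\overset{k_1}{\rightleftharpoons}} C \overset{k_3}{\longrightarrow} B + E$$

where $E$ denotes the enzyme, $C$ is the enzyme-substrate complex, and $k_i$, $i = 1,2,3$, are the rate constants of the reaction (taken here as $k_1$ for binding, $k_2$ for dissociation, and $k_3$ for catalysis). For these types of reaction we approximate the kinetics using the quasi-steady state assumption:

$$\frac{dB}{dt} = \frac{k_{cat}\, E^{*} A}{K_m + A} \qquad (1)$$

where $E^{*}$ is the initial enzyme concentration, $k_{cat} = k_3$, and $K_m = (k_2 + k_3)/k_1$. We can "prove" this approximation by looking at the differential equations for the reaction:

$$\frac{dA}{dt} = -k_1 A E + k_2 C, \qquad \frac{dE}{dt} = -k_1 A E + (k_2 + k_3) C,$$

$$\frac{dC}{dt} = k_1 A E - (k_2 + k_3) C, \qquad \frac{dB}{dt} = k_3 C.$$

This system of differential equations yields two conservation laws:

$$E + C = E^{*}, \qquad A + B + C = A^{*},$$

where $A^{*}$ is the initial substrate concentration. With the conservation laws, we can reduce the four differential equations to two differential equations:

$$\frac{dA}{dt} = -k_1 A (E^{*} - C) + k_2 C, \qquad \frac{dC}{dt} = k_1 A (E^{*} - C) - (k_2 + k_3) C.$$

We now assume that the reaction has progressed to the state that $dC/dt \approx 0$, so that

$$C \approx \frac{E^{*} A}{K_m + A}.$$

Hence, the formation of $B(t)$ is approximately given by

$$\frac{dB}{dt} = k_3 C \approx \frac{k_{cat}\, E^{*} A}{K_m + A}.$$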
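As a rough numerical illustration of the quasi-steady state approximation, the sketch below integrates the mass-action system for a single generic step and compares the product formed with the Michaelis-Menten approximation. The rate constants, concentrations, step size, and function names are hypothetical choices made only for this illustration (with the enzyme much scarcer than the substrate); they are not values from the study.

```python
# Numerical check of the quasi-steady state approximation for A + E <-> C -> B + E.
# All rate constants and concentrations are hypothetical, chosen so that E* << A(0).

K1, K2, K3 = 10.0, 1.0, 1.0      # binding, dissociation, catalytic rate constants
E_TOTAL, A0 = 0.01, 1.0          # initial enzyme (E*) and substrate concentrations
K_M = (K2 + K3) / K1             # Michaelis constant
K_CAT = K3                       # turnover number


def rk4_step(f, y, h):
    """Advance the autonomous system y' = f(y) by one Runge-Kutta step of size h."""
    s1 = f(y)
    s2 = f([v + 0.5 * h * s for v, s in zip(y, s1)])
    s3 = f([v + 0.5 * h * s for v, s in zip(y, s2)])
    s4 = f([v + h * s for v, s in zip(y, s3)])
    return [v + h / 6.0 * (a + 2 * b + 2 * c + d)
            for v, a, b, c, d in zip(y, s1, s2, s3, s4)]


def full_system(y):
    """Two-equation mass-action system for (A, C), using E = E* - C."""
    a, c = y
    binding = K1 * a * (E_TOTAL - c)
    return [-binding + K2 * c, binding - (K2 + K3) * c]


def qssa_system(y):
    """Quasi-steady state model: dA/dt = -k_cat E* A / (K_m + A)."""
    a = y[0]
    return [-K_CAT * E_TOTAL * a / (K_M + a)]


def product_at(t_end, h=0.01):
    """Return B(t_end) from the full system and from the approximation."""
    full, approx = [A0, 0.0], [A0]
    for _ in range(int(round(t_end / h))):
        full = rk4_step(full_system, full, h)
        approx = rk4_step(qssa_system, approx, h)
    b_full = A0 - full[0] - full[1]   # substrate is either free, bound, or product
    b_qssa = A0 - approx[0]           # the approximation neglects the bound complex
    return b_full, b_qssa


if __name__ == "__main__":
    b_full, b_qssa = product_at(100.0)
    print(f"B(100): full = {b_full:.4f}, quasi-steady state = {b_qssa:.4f}")
```

With these placeholder values the two estimates of B stay close, which is the behavior expected when the enzyme concentration is small relative to the substrate; the agreement would be expected to degrade as the enzyme level approaches the substrate level.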
More information about the quasi-steady state approximation can be found in Parl et al. [43]. Using this approach for the individual reactions in Figure 1, we can write down a system of nonlinear ordinary differential equations for the concentrations of the compounds in the pathway, in which each reaction step $j$ acting on a substrate $S$ contributes a rate term of the form $k_{cat}^{j} E_{enzyme} S / (K_m^{j} + S)$ [the full system of equations is not preserved in this copy]. Here $k_{cat}^{j}$ and $K_m^{j}$ are constants and $E_{enzyme}$ are the enzyme levels in the reactions. There are parts of the pathway for which kinetic data are not available. In particular, rate constants are not known for some of the reactions [reaction schemes not legible in this copy]. Our first simplification is to collapse these reactions into single reactions [...]. Figure 2 shows comparisons between the model (solid curves) and the data [41]. Having all of the parameters of the system, one can view the model as giving functional relations between $E_2(t)$ and the estrogen quinone concentrations $E_2Q_{23}(t)$ and $E_2Q_{34}(t)$ (Figure 3). The mathematical model for the estrogen metabolism pathway provides a relationship between an input, $E_2$, and two area-under-the-curve (AUC) outputs, $AUC_{23}$ and $AUC_{34}$. It can also be viewed as connecting the AUC outcomes to the kinetic parameters, $k_{cat}$ and $K_m$, embedded in the model. The model permits one to analyze the behavior of the AUC variables as functions of the kinetic parameters, either for a single step in the pathway or for a combination of steps. This analysis allows one to see how variations in the kinetic parameters, which result from polymorphisms of the enzymes, affect the AUC outcomes.
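The dependence of an AUC outcome on the kinetic parameters can be illustrated with a deliberately simplified pathway. The sketch below is not the published model: it uses a hypothetical three-compound chain (E2 to catechol to quinone, with a competing methylation branch and a conjugation sink), and every parameter value, step name, and function name is a placeholder assumed for illustration.

```python
# Toy pathway: E2 -> catechol -> quinone, with a competing methylation branch
# (COMT) and a conjugation sink (GSTP1). Every number below is a hypothetical
# placeholder, not a measured kinetic constant.

BASE_PARAMS = {
    # step name: (k_cat, K_m, enzyme level)
    "hydroxylation": (1.0, 1.0, 1.0),  # E2 -> catechol (CYP)
    "oxidation": (0.5, 1.0, 1.0),      # catechol -> quinone (CYP, collapsed step)
    "methylation": (1.0, 1.0, 1.0),    # catechol -> methoxy estrogen (COMT)
    "conjugation": (1.0, 1.0, 1.0),    # quinone -> GSH conjugate (GSTP1)
}


def mm_rate(k_cat, k_m, enzyme, substrate):
    """Quasi-steady state (Michaelis-Menten) rate of one enzymatic step."""
    return k_cat * enzyme * substrate / (k_m + substrate)


def rhs(y, p):
    """Time derivatives of (E2, catechol, quinone) for parameter set p."""
    e2, catechol, quinone = y
    v_hyd = mm_rate(*p["hydroxylation"], e2)
    v_ox = mm_rate(*p["oxidation"], catechol)
    v_met = mm_rate(*p["methylation"], catechol)
    v_con = mm_rate(*p["conjugation"], quinone)
    return [-v_hyd, v_hyd - v_ox - v_met, v_ox - v_con]


def quinone_auc(params, e2_0=1.0, t_end=50.0, h=0.01):
    """Integrate with fixed-step RK4 and return the trapezoidal AUC of the quinone."""
    y, auc = [e2_0, 0.0, 0.0], 0.0
    for _ in range(int(round(t_end / h))):
        s1 = rhs(y, params)
        s2 = rhs([v + 0.5 * h * s for v, s in zip(y, s1)], params)
        s3 = rhs([v + 0.5 * h * s for v, s in zip(y, s2)], params)
        s4 = rhs([v + h * s for v, s in zip(y, s3)], params)
        y_next = [v + h / 6.0 * (a + 2 * b + 2 * c + d)
                  for v, a, b, c, d in zip(y, s1, s2, s3, s4)]
        auc += 0.5 * h * (y[2] + y_next[2])
        y = y_next
    return auc


def scale_k_cat(params, step, factor):
    """Return a copy of params with the k_cat of one step multiplied by factor."""
    k_cat, k_m, enzyme = params[step]
    return {**params, step: (k_cat * factor, k_m, enzyme)}


if __name__ == "__main__":
    print("baseline AUC:", quinone_auc(BASE_PARAMS))
    print("faster COMT :", quinone_auc(scale_k_cat(BASE_PARAMS, "methylation", 2.0)))
    print("more E2     :", quinone_auc(BASE_PARAMS, e2_0=1.25))
```

In this toy setting, a faster methylation step diverts catechol away from quinone formation and lowers the quinone AUC, while a higher starting E2 level raises it. A variant enzyme would enter the same way, as a changed `k_cat` or `K_m` for its step.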

Results
Figure 2. Comparison of mathematical model with experimental data. The red curves are plots of the solutions to the nonlinear system of differential equations and the blue dots are experimental data [41]. As shown, the model allowed simulations of all reactions in the pathway, which agreed well with the experimentally determined results [42].
Each of the phase I and II enzymes involved in estrogen metabolism possesses genetic variants that (a) are associated with altered enzyme function and (b) occur in a sizable portion of the population [38,44]. We and others have determined the enzymatic rate constants (k_cat and K_m) of the common CYP1A1, CYP1B1, and COMT variants and compared their activity to the respective wild-type enzymes [26,29,32,45,46]. These studies were limited to individual enzyme reactions and did not take the entire estrogen metabolism pathway into account. To obtain a more realistic and inclusive view of estrogen metabolism in the female population, we utilized the model to simulate how variations in the kinetic parameters resulting from polymorphisms of the enzymes impact the metabolite concentrations. We examined 4 CYP1A1, 16 CYP1B1, and 2 COMT alleles. Thus, our simulations are based on the examination of 4 × 16 × 2 = 128 genetic combinations to demonstrate the utility of the model. Although each of the metabolites can be modeled, we concentrated our analysis on the catechols and quinones because of their documented carcinogenic activity [15,22]. Since women may differ in their combination of enzyme variants, they will have different rate constants, resulting in differences in 4-OHE2 and E2-3,4-Q production. As shown in Figure 3, modeling of the 128 haplotype combinations produced a spectrum of catechol and quinone concentrations over time, as expressed by a range of AUC values. The simulations identified the haplotype combinations producing the highest and lowest AUCs.
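The count of 128 combinations follows from taking every allele of each gene with every allele of the others. A minimal sketch of that enumeration is given below. The site positions and most residues are taken from the haplotype names in the text; the Leu alternative at CYP1B1 position 432 does not appear in the haplotypes listed here and is assumed from the general literature, and the dictionary layout and function name are illustrative only.

```python
from itertools import product

# Variable amino acid sites per gene. Residues follow the haplotype names in
# the text, except 432Leu, which is an assumption not shown in this document.
SITES = {
    "CYP1A1": {461: ("Thr", "Asn"), 462: ("Ile", "Val")},
    "CYP1B1": {48: ("Arg", "Gly"), 119: ("Ala", "Ser"),
               432: ("Leu", "Val"), 453: ("Asn", "Ser")},
    "COMT": {108: ("Val", "Met")},
}


def alleles(gene):
    """List every allele of a gene as a name such as '461Thr-462Ile'."""
    sites = SITES[gene]
    positions = sorted(sites)
    return ["-".join(f"{pos}{res}" for pos, res in zip(positions, combo))
            for combo in product(*(sites[pos] for pos in positions))]


# One entry per (CYP1A1 allele, CYP1B1 allele, COMT allele) combination.
combinations = list(product(*(alleles(gene) for gene in SITES)))

if __name__ == "__main__":
    for gene in SITES:
        print(gene, len(alleles(gene)))
    print("combinations:", len(combinations))
```

Each tuple in `combinations` would then select one set of kinetic constants for a simulation run.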
Among the 128 combinations, the maximum AUCs for 4-OHE2 and E2-3,4-Q were produced by the haplotype CYP1A1 461Asn-462Ile / CYP1B1 48Arg-119Ser-432Val-453Asn / COMT 108Met, and were 2.6- and 4.6-fold higher, respectively, than the minimum AUCs produced by the haplotype CYP1A1 461Thr-462Val / CYP1B1 48Gly-119Ala-432Val-453Ser / COMT 108Val. While 2.6- to 4.6-fold differences may not appear large, it is important to consider that they affect lifetime exposure, which is consistent with the hormonal risk model presented by Pike [2]. If a subject's haplotypes can be resolved for all genes (i.e. she has at most one heterozygous SNP for each gene), then the in silico model can be used directly to derive the E2-3,4-Q production, as depicted in Figure 4. When a subject's haplotype configurations are uncertain for some genes because of the presence of two or more heterozygous SNPs (e.g. CYP1B1), we first calculate the distribution of all haplotype configurations using PHASE [47] (stephenslab.uchicago.edu/software.html). Then we derive the E2-3,4-Q production value for each haplotype configuration and calculate the weighted average of all E2-3,4-Q production values, using the probabilities of the haplotype configurations as weights. It can be shown that this weighted average is the expected E2-3,4-Q production given the genotypes. In this way, we incorporate information from all genotyped SNPs, and each haplotype configuration is apportioned appropriately. Application of the model to a breast cancer case-control population (438 pre- and postmenopausal women with 221 invasive breast cancer cases and 217 controls) defined the estrogen quinone E2-3,4-Q as a potential breast cancer risk factor. This exploratory analysis identified a subset of women at increased breast cancer risk based on their enzyme haplotype and consequent E2-3,4-Q production [42]. Based on the E2-3,4-Q AUC values, cases predominated in the top tier of the population.
For example, among the 10 women with the highest E2-3,4-Q values in the entire study population, there were nine cases and one control (p-value = 0.01). These results suggest for the first time the possibility that breast cancer risk prediction may be enhanced by incorporation of inherited differences in estrogen metabolism. Obviously, the model requires testing, and as a first step we have examined the contribution of the estrogen concentration to E2-3,4-Q production and the associated breast cancer risk. Numerous epidemiological studies have implicated estrogens in the development of breast cancer [3]. For example, a pooled analysis of nine prospective studies of serum estrogen levels and breast cancer in 2428 postmenopausal women revealed a strong association of serum E2 concentrations with breast cancer risk [4]. The relative risk of breast cancer for women whose free E2 levels were in the top quintile was 2.58 compared with 1.00 for those women whose levels were in the bottom quintile. Since the nine studies employed different methods to measure E2, there were considerable differences in the median E2 values reported. In spite of this variability, the median serum E2 concentrations in seven of the nine studies were higher in the case patients than in the control subjects. To incorporate the different levels into our simulations, we introduced a ratio that is defined as

$$E_2\ \text{Ratio} = \frac{E_2^{case}(0)}{E_2^{control}(0)}.$$

In Table 1, we summarize the median E2 values for the nine studies as well as the corresponding case/control E2 ratios, which ranged from 0.91 to 1.34. These ratios appear rather narrow and are of unknown biological significance. We used the model and our study population to determine whether such seemingly small differences in serum E2 concentrations between cases and controls could influence mammary estrogen metabolism sufficiently to cause significant differences in the production of the carcinogenic E2-3,4-Q.
Since serum E2 was not measured in our study population, we used the initial level $E_2(0)$ for the cases and controls from the nine prospective studies to calculate the E2-3,4-Q AUC for the 294 postmenopausal women in our group. There were 144 women with breast cancer and 150 control subjects with average ages of 65.6 and 64.9 years and average body mass indices of 25.7 and 26.0 kg/m², respectively. In our simulations we varied the E2 ratio between cases and controls from 0.91 to 1.34 and calculated the corresponding E2-3,4-Q AUC values (Table 1). The results of these simulations demonstrate that relatively small changes in the concentration of the parent hormone E2 result in markedly increased production of the carcinogenic estrogen quinone metabolite, E2-3,4-Q, which, in turn, is reflected in a higher fraction of women with breast cancer in the top tier of our study population. Thus, testing of our model with estrogen concentrations reported in the literature confirms the striking influence of serum E2 concentrations on breast cancer risk. Importantly, the model offers a risk assessment of individual women by combining the hormone level with the genotype.
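The haplotype-weighted averaging described earlier in this section, used when a subject's haplotype configuration cannot be resolved, amounts to an expected value over the candidate configurations. The sketch below shows that calculation under stated assumptions: the configuration labels, probabilities, and production values are invented for illustration, and the function name is hypothetical. In practice the probabilities would come from a phasing program such as PHASE and the production values from the pathway model.

```python
def expected_production(config_probs, production):
    """Probability-weighted average of E2-3,4-Q production over configurations.

    config_probs maps each haplotype configuration to its probability;
    production maps each configuration to its modeled E2-3,4-Q value.
    """
    total = sum(config_probs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("configuration probabilities must sum to 1")
    return sum(prob * production[config] for config, prob in config_probs.items())


if __name__ == "__main__":
    # Hypothetical subject with two possible CYP1B1 haplotype configurations.
    probs = {"configuration A": 0.75, "configuration B": 0.25}
    values = {"configuration A": 2.0, "configuration B": 4.0}  # arbitrary units
    print(expected_production(probs, values))  # 0.75 * 2.0 + 0.25 * 4.0 = 2.5
```

A subject whose haplotypes are fully resolved is the special case of a single configuration with probability 1, for which the function simply returns that configuration's value.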

Discussion
Figure 4. Utilization of in silico model to derive E2-3,4-Q production. Each of the four genes is genotyped for all subjects and the SNP genotype data are used to derive the haplotype configuration for each subject. The model then calculates the E2-3,4-Q production for each haplotype configuration as well as the weighted average of all E2-3,4-Q production values, using the probabilities of haplotype configurations as weights.
A strength of the in silico model is that it can incorporate each woman's actual lifetime endogenous and exogenous estrogen exposures, in addition to her genotype, when predicting cumulative E2-3,4-Q exposure. This is schematically shown in Figure 5, which displays the interaction of estrogens, enzyme genotypes, and resulting E2-3,4-Q production as a three-dimensional graph. The graph is built on the two-dimensional Figure 3, in which we used a fixed E2 level to model the E2-3,4-Q AUC for wild-type and variant enzyme genotypes and displayed only the lowest, highest, and wild-type E2-3,4-Q AUCs. In the three-dimensional graph, we plot the available genotypes, from lowest to highest, separated into quintiles based on their respective E2-3,4-Q production. A new component in the three-dimensional graph is the variation in E2 concentration. As illustrated in the overall pathway in Figure 1 and in experimental studies, the input concentration of the parent hormone E2 determines the output concentration of the oxidative metabolites, such as 4-OHE2 and E2-3,4-Q [26,29,32,45,48]. Thus, in the graph we display estrogen exposure in quintiles. Estrogen exposure can be represented by actual E2 values, measured in pmol/L, in combination with semiquantitative estimates of each woman's overall exposure to estrogen. The latter are derived by taking into account her total years of ovulation as a function of current age, age at menarche, age at menopause, number of full-term pregnancies and lactation experience for each, and the dosage and duration of the use of exogenous estrogens. With regard to exogenous estrogens, all estrogens, including the equine estrogens used in hormone replacement therapy, are metabolized via the same CYP-mediated oxidative pathway to generate catechols and quinones, which, in turn, cause DNA damage. For example, cell culture experiments showed that 4-OH-equilenin, via its quinone, induced DNA damage in breast cancer cell lines and cellular transformation in vitro [49,50]. Thus, as far as the model is concerned, exogenous and endogenous estrogens can be combined, although their precise contribution to estrogen exposure and the production of carcinogenic metabolites is presently unknown.
In designing Figure 5, we assumed that the difference in estrogen exposure between individual women is no more than twofold, with the quintiles 1.0, 1.25, 1.5, 1.75, 2.0. We chose this twofold difference based on the range of median serum E2 values seen in postmenopausal women [4] and the variation in mammary tissue E2 concentrations [5]. This range is conservative since up to fivefold differences have been reported [52]. Regardless of the scale used for the estrogen exposure axis, the production of carcinogenic E2-3,4-Q would be expected to be greater in women with more endogenous (more ovulatory cycles) or exogenous (hormone replacement therapy, oral contraceptives) estrogen exposure. It is evident from Figure 5 that our in silico model extends existing models by combining the traditional "phenotypic" measures of estrogen exposure with genotypic data. It is also evident from the three-dimensional graph that the combined phenotypic and genotypic data appear to have not just an additive, but also a multiplicative effect on E2-3,4-Q production.
Current models of breast cancer risk prediction are mainly based on cumulative estrogen exposure but do not reflect mammary estrogen metabolism [16] (www.cancer.gov/bcrisktool). Moreover, they do not address genetic variability between women in exposure to estrogen metabolites. Our model addresses the unique genetic traits of each woman and combines the genetic information with the metabolomic information in order to predict individual-level mammary estrogen metabolism. Testing for some genetic traits, such as BRCA and CYP2D6, is currently available in the patient care setting. The availability of rapid genetic testing for BRCA1 and BRCA2 mutations has made it possible to follow unaffected carriers in greater numbers and to search for inherited mutations in women with a severe family history of breast cancer. The potential effect of CYP2D6 genetic variants on clinical response to tamoxifen treatment in breast cancer patients has gained much interest [53]. The Food and Drug Administration recommended an update in the tamoxifen package insert in 2006 to reflect the increased risk of breast cancer recurrence in postmenopausal estrogen receptor-positive patients who are CYP2D6 poor metabolizers. Thus, the CYP2D6 genotype has the potential to become a useful predictive marker for tamoxifen response. Certain characteristics are beneficial for a marker to become successful clinically [54]. Testing for such a marker should be cost-effective as well as easy to apply in daily practice, both of which are increasingly realized for DNA analysis. Thus, analysis of multiple genes encoding the enzymes in the estrogen metabolism pathway can readily be achieved.
Estrogen is a universal breast cancer risk factor; by helping to define high-risk subgroups, the proposed model should advance the overall goal of reducing breast cancer mortality through improved screening and the early detection and treatment of disease. Rates of obesity, an important source of estrogen after menopause, are on the rise in most of the world, underscoring the importance of establishing the impact of estrogen metabolites on breast cancer risk before and after menopause. Women who carry a germline mutation of BRCA1 frequently develop breast cancer at an early age. However, in any given kindred the age of onset can vary substantially, and an important unresolved question is the extent to which other risk factors modify the cancer risk in carriers. Estrogen exposure appears to play an important role since prophylactic oophorectomy is associated with a significant reduction in the risk of breast cancer [55]. A practical clinical application of the model in the premenopausal age group would be the differentiation of BRCA1 carriers into low- and high-risk groups based on their genetic profile of estrogen metabolism. Another clinical application of the model would be in the postmenopausal age group, with the distinction of low- and high-risk women. The former could benefit from hormone replacement therapy whereas the latter should avoid such treatment.

Table 1. Correlation of postmenopausal serum estradiol concentration by study and case-control status with breast cancer risk. Controls (m in number) and cases (n in number) are individuals with the top AUC values (m + n) in the simulation model. The case-control data are from reference 42 and the E2 Ratio is taken from a reanalysis of nine pooled prospective studies [4]. The ratio of all centers is the average ratio of the nine centers.
Column headings: Study, country | Estradiol pmol/L | m | n | n/(m + n) [table body not preserved in this copy]

Figure 5. Three-dimensional graph displaying estrogen metabolomic-genomic model of breast cancer risk. The risk is represented by the amount of carcinogenic estrogen quinone, E2-3,4-Q AUC, which is produced by the metabolism of estrogen catalyzed by the enzymes CYP1A1, CYP1B1, and COMT. In theory, all enzyme genotype combinations could be plotted. However, for clarity, we have plotted representative combinations from lowest to highest, separated into quintiles based on their respective E2-3,4-Q production: (1) CYP1A1 461Thr-462Val / CYP1B1 48Gly-119Ala-432Val-453Ser / COMT 108Val, (2) CYP1A1 461Thr-462Val / CYP1B1 48Arg-119Ala-432Val-453Asn / COMT 108Met, (3) CYP1A1 461Thr-462Ile / CYP1B1 48Arg-119Ala-432Val-453Ser / COMT 108Val, (4) CYP1A1 461Thr-462Ile / CYP1B1 48Arg-119Ala-432Val-453Ser / COMT 108Met, (5) CYP1A1 461Asn-462Ile / CYP1B1 48Arg-119Ser-432Val-453Asn / COMT 108Met. Following the characterization of GSTP1 variants, we will include GSTP1 genotype data in the model. Cumulative estrogen exposure is displayed in quintiles. Actual E2 values, measured in pmol/L, could be plotted, in combination with semiquantitative estimates of each woman's overall exposure to estrogen. The latter are derived by taking into account her total years of ovulation as a function of current age, age at menarche, age at menopause, number of full-term pregnancies and lactation experience for each, and the dosage and duration of the use of exogenous estrogens. (The authors acknowledge the work of Eric Parl in the design and preparation of this figure.)

[Figure 5 axis labels: Quintiles of estrogen exposure; Quintiles of genotypes]
In summary, our in silico model integrates pathway-specific genetic testing with diverse types of data and for the first time offers the opportunity to combine exposure, metabolic, and genetic data in assessing estrogens in relation to breast cancer risk. In order to achieve such comprehensive risk assessment, the model will require extensive validation.