Prediction of outcome of early ER+ breast cancer is improved using a biomarker panel, which includes Ki-67 and p53

Background: The aim of this study is to determine whether immunohistochemical (IHC) assessment of Ki67 and p53 improves prognostication of oestrogen receptor-positive (ER+) breast cancer after breast-conserving therapy (BCT). In all, 498 patients with invasive breast cancer from a randomised trial of BCT with or without tumour bed radiation boost were assessed using IHC. Methods: The ER+ tumours were classified as ‘luminal A’ (LA): ER+ and/or PR+, Ki-67 low, p53−, HER2− or ‘luminal B’ (LB): ER+ and/or PR+ and/or Ki-67 high and/or p53+ and/or HER2+. Kaplan–Meier and Cox proportional hazards methodology were used to ascertain relationships to ipsilateral breast tumour recurrence (IBTR), locoregional recurrence (LRR), distant metastasis-free survival (DMFS) and breast cancer-specific survival (BCSS). Results: In all, 73 patients previously LA were re-classified as LB: a greater than four-fold increase (4.6–19.3%) compared with ER, PR, HER2 alone. In multivariate analysis, the LB signature independently predicted LRR (hazard ratio (HR) 3.612, 95% CI 1.555–8.340, P=0.003), DMFS (HR 3.023, 95% CI 1.501–6.087, P=0.002) and BCSS (HR 3.617, 95% CI 1.629–8.031, P=0.002) but not IBTR. Conclusion: The prognostic evaluation of ER+ breast cancer is improved using a marker panel, which includes Ki-67 and p53. This may help better define a group of poor prognosis ER+ patients with a greater probability of failure with endocrine therapy.

Oestrogen receptor-positive (ER+) breast cancer comprises approximately 75% of all breast cancers and treatments targeting oestrogen synthesis (aromatase inhibitors) or the ER (tamoxifen) are the most effective adjuvant therapies. Gene expression profiling (GEP) studies over the past decade have established molecular subtypes of ER+ luminal disease, which are characterised by differences in outcome and underlying biology, largely now referred to as luminal A (LA) or luminal B (LB), the latter being characterised by increased proliferation and higher grade as well as lower levels of ER-related genes (Perou et al, 2000; Sørlie et al, 2001). Despite the successes of endocrine therapy in reducing annual recurrences and death by 41% and 34%, respectively, resistance occurs in about 30% of patients treated with tamoxifen (Early Breast Cancer Trialists' Collaborative Group (EBCTCG), 2005). Therefore, predicting the likely prognosis in an individual patient before treatment would allow early selection of optimal therapies, the importance of which is highlighted in the most recent St Gallen guidelines for the treatment of early breast cancer (Goldhirsch et al, 2009).
The abundant data derived from GEP studies have clearly identified the significance of genomic grade and proliferation signatures in prognosis and response to endocrine therapy (reviewed in detail in Sutherland, 2009 and Sotiriou and Pusztai, 2009). However, given the current costs of such molecular testing, translating these findings into an economical, reproducible and readily applicable panel for immunohistochemistry (IHC) in a routine pathology setting is a priority. Most previous IHC definitions of LA and LB tumours include ER+ and/or PR+, with HER2 positivity defining LB, creating a population size of approximately 5–10% (Cheang et al, 2008; Nguyen et al, 2008; Millar et al, 2009b; Blows et al, 2010). However, GEP studies have documented the LB population to be larger than this, averaging approximately 16% (ranging from 10 to 21%, reviewed in detail in Sorlie et al, 2003 and Hu et al, 2006), suggesting that this poorer prognosis subtype may be underrepresented using this definition. This discrepancy is most likely explained by the fact that only approximately 30% of LB cancers are in fact HER2 positive. Although proliferation is the key discriminator of luminal tumours, the optimal subclassification of luminal tumours by GEP has yet to be defined (Weigelt et al, 2010b). Several studies have, however, shown that intrinsic subtype as defined by IHC 'mirrors' the subtypes identified by GEP and that the IHC subtypes so defined have distinct clinical outcomes (Neilsen et al, 2004; Abd El-Rehim et al, 2005; Cheang et al, 2008, 2009; Blows et al, 2010). Such IHC definitions are now in common clinical usage. Some recent studies have addressed the issue of a more refined definition of good and poor prognosis ER+ cancer, and used a modified IHC definition to include assessment of the proliferation marker Ki-67 (Cuzick et al, 2009; Hugh et al, 2009), which results in a larger proportion of LB tumours with independent prognostic power.
This latter study defined a Ki67 cutpoint (14%) derived from GEP analyses. This set of biomarkers more closely resembles the Oncotype Dx assay of known predictive and prognostic power in ER+, lymph node-negative cancer, which is largely driven by proliferation, HER2- and ER-related genes (Paik et al, 2004). However, a recent head-to-head comparison using material from the ATAC trial has shown that a four-biomarker IHC panel of ER, PR, HER2 and Ki-67 (IHC 4) provides prognostic information that is at least equivalent to Oncotype Dx (Cuzick et al, 2009). This important study identifies the robustness of prognostic data, which can be provided by routine IHC. Some observers support the view that GEP currently offers no more than routine IHC when combined with important morphological features (not assessable by GEP), such as lymphatic vascular invasion and lymph node status (Weigelt and Reis-Filho, 2010). In addition, these routine analyses can be performed at a fraction of the cost of commercially available GEP tests. This also supports the concept that measurement of a few well-chosen protein products can identify clinically significant patient groups (Ring et al, 2006). Histological grade is a key component of routine pathology reporting and of prognostic importance, but may, in some circumstances, be affected by subjectivity, along with problems with inadequate or delayed fixation, which can result in undergrading. Incorporation of biomarkers as surrogates for molecular grade into routine reporting may help more reliably define good and poor prognosis patients, most significantly for grade 2 invasive carcinomas, which comprise 37–49% of all breast cancers.
To further validate an IHC panel of markers for routine application in a clinical setting, we assessed a new biomarker panel to differentiate good prognosis (LA) and poor prognosis (LB) tumours in a cohort of predominantly ER+ early breast cancer patients enrolled in a randomised clinical trial of conservative surgery, post-operative whole breast radiotherapy and then randomised to an additional cavity boost or not. We previously described the clinical usefulness of a five biomarker panel (Millar et al, 2009b; ER, PR, HER2, CK 5/6 and EGFR) and have further defined luminal tumours by including Ki-67 and p53 status, the latter described in higher grade tumours, overexpressed more frequently within LB (Sorlie, 2004; Jacquemier et al, 2008; Hugh et al, 2009; Carey, 2010; Weigelt et al, 2010b) and as a predictor of endocrine resistance in some studies (Yamashita et al, 2006). These markers have easily available and well-characterised antibodies in current use, which can be immediately applied to clinical practice.
This study aimed to define the predictive value of a more refined luminal IHC biomarker signature in those patients who were ER+, with disease relapse and death from breast cancer as end-points.
(Millar et al, 2009a; López-Knowles et al, 2010). In summary, 40% of tumours were >20 mm, 45% were grade 3, 43% were lymph node positive, 68% were ER positive, 57% were PR positive and 18% were HER2 fluorescent in situ hybridisation (FISH) positive (HER2:CEP17 ratio >2.2). Median age was 54 years, and patients were treated with endocrine therapy (49%), chemotherapy (38%) or both (24%). Cases were prospectively followed up for a median of 64 months, and the outcome events measured were as follows: recurrence (local or distant; 25%), metastasis (23%) and breast cancer-specific death (18%). This cohort was used to identify differences in expression of several cell cycle and apoptotic markers, including Ki67 and p53 (CM McNeil et al, manuscript in preparation), between LA and LB cancers using the following definitions: LA: ER+ and/or PR+ and HER2− and LB: ER+ and/or PR+ and HER2+. Using the median expression levels for Ki67 and p53 as the cutpoints (5% and 10%, respectively), we were able to demonstrate a significant difference in level of expression between LA and LB for these antigens (P = 0.0158 and P = 0.0061, respectively). Subsequently, we modified our definition of LA and LB to include Ki67 and p53 status as follows: 'LA': ER+ and/or PR+ and HER2−, Ki67 low and p53 negative and 'LB': ER+ and/or PR+ and/or HER2+ and/or Ki67 high and/or p53+. Kaplan–Meier analysis for breast cancer-specific death showed a significant difference in outcome between these two groups of ER+ patients (P = 0.0002) using this updated classifier (CM McNeil et al, manuscript in preparation).

Study validation cohort
In this biomarker study, tissue was available from 498 patients (from a total of 688) with invasive breast cancer who were enrolled into a randomised clinical trial, which compared the benefit of the addition of a local cavity boost of radiotherapy to breast-conserving therapy (BCT; Clinical Trials Registry NCT00138814). The study was conducted at St George, Wollongong and Liverpool Hospitals, Sydney, New South Wales, Australia between 1996 and 2003 when the trial was closed to accrual. Follow-up for this analysis continued until September 2008. Clinicopathological details are summarised in Supplementary Table 1, and have been previously published in detail (Millar et al, 2009b). This study was approved by the Human Research Ethics Committee of the St George Hospital, Sydney, Australia (ref. no.: 96/84). The flow of patients through the trial is summarised in a CONSORT flow diagram (Figure 1). Patients were randomised using random blocking sequences set up before commencement of the study. Following patient consent, a person independent of the study both generated the sequence and assigned participants to interventions as below. This was an unblinded study.
All patients with invasive carcinoma received local excision and axillary sentinel node biopsy or axillary clearance. Adjuvant chemotherapy (AC or CMF) was given to 23.7% of patients and 44.9% received adjuvant endocrine therapy with tamoxifen. No patients received adjuvant trastuzumab. Of the patients subsequently classified as modified 'LA', 49.5% received endocrine therapy and 13.4% received chemotherapy; of those classified as modified 'LB', 55.7% received endocrine therapy and 25% received chemotherapy. Patients were randomised to whole breast radiotherapy of 50 Gy in 25 fractions or whole breast radiotherapy of 45 Gy in 25 fractions plus a tumour bed boost of 16 Gy in eight fractions. Supraclavicular fields were not added unless there were four or more nodes positive. In all, 17 patients had positive margins, 65 had clearance of <1 mm and a further 86 had <2 mm clearance, the remainder being well clear. HER2 status was unknown at the time of treatment.

Study definitions
Patients were assessed at 6 weeks after radiation therapy, 6 monthly for 2 years, then annually thereafter with annual breast imaging. Follow-up time for this biomarker cohort was calculated from the date of the first surgical procedure to the date of the first event, as outlined below, or to the last known confirmed date of breast cancer disease-free status. Median follow-up time was 84 months (range 1–134 months). The primary end point was time to ipsilateral breast tumour recurrence (IBTR). This included any ipsilateral in-breast recurrence (invasive or non-invasive). The secondary end points were locoregional recurrence (LRR: IBTR, axilla, chest wall, internal mammary or supraclavicular fossa lymph nodes) and time to distant metastases and death.
All staining was performed using a Dako autostainer following antigen retrieval for all antibodies except for Ki-67, which was performed on a Leica (Wetzlar, Germany)/Bond Max system using ER2 (high pH antigen retrieval). All staining was centrally assessed by one breast pathologist (EKAM). ER and PR were assessed as positive if a modified 'H score' (i.e., percentage × intensity) was >10. CK5/6 and EGFR were considered positive if staining of any intensity was present (i.e., >0). Tumours were considered HER2 positive only if they were HER2 amplified on FISH, defined as a HER2:chromosome 17 ratio >2.2. p53 and Ki-67 were considered positive if there was >10% positive average nuclear staining of any intensity.
Although the total number of patients assessed for eligibility and excluded for all centres is not known, these data are available for the main recruiting centre at St George Hospital, which contributed the majority of patients in the trial, n = 546 (number assessed, n = 2046; excluded, n = 1500: not meeting criteria, n = 943; declined to participate, n = 235; other reasons, n = 322; patients randomised in the trials, n = 536).
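The positivity thresholds given above for ER, PR, HER2, Ki-67 and p53 can be written down as a small scoring routine. The following is only a minimal sketch: the function and field names are hypothetical, and the intensity scale used for the modified H score is not stated in the text, so it is passed through unchanged as an assumption.

```python
from dataclasses import dataclass


@dataclass
class MarkerCalls:
    """Binary marker calls for one tumour (illustrative container, not from the paper)."""
    er_pos: bool
    pr_pos: bool
    her2_pos: bool
    ki67_high: bool
    p53_pos: bool


def modified_h_score(percent_positive: float, intensity: float) -> float:
    # Modified H score as described in the text: percentage x intensity.
    # The intensity scale is an assumption; the text does not specify it.
    return percent_positive * intensity


def call_markers(er_pct, er_int, pr_pct, pr_int,
                 her2_cep17_ratio, ki67_pct, p53_pct) -> MarkerCalls:
    """Apply the validation-cohort thresholds quoted in the text."""
    return MarkerCalls(
        er_pos=modified_h_score(er_pct, er_int) > 10,   # H score > 10
        pr_pos=modified_h_score(pr_pct, pr_int) > 10,   # H score > 10
        her2_pos=her2_cep17_ratio > 2.2,                # FISH ratio > 2.2
        ki67_high=ki67_pct > 10,                        # > 10% nuclear staining
        p53_pos=p53_pct > 10,                           # > 10% nuclear staining
    )
```

All comparisons are strict, so a tumour with exactly 10% Ki-67 staining is scored as low under this reading of the thresholds.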

Statistical analyses
Kaplan–Meier analyses for IBTR, LRR, distant disease-free survival and breast cancer-specific death were estimated for each subtype and compared using the log-rank test. We used Cox proportional hazards univariate analysis to analyse the association between prognostic variables and molecular subtype with IBTR, LRR, metastases and breast cancer-specific death. Multivariate analysis (MVA) was used to construct models identifying those variables which were independently prognostic. Subsequently, step-wise removal of variables was used until resolution. Analyses were performed using Statview 5.0 (Abacus systems, Berkeley, CA, USA) and STATA 10.0 (StataCorp LP, College Station, TX, USA). ANOVA was used to assess differences in expression of target antigens as continuous variables between intrinsic subtypes.
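For readers unfamiliar with the Kaplan–Meier product-limit estimate mentioned above, the calculation can be sketched in a few lines. This is a generic textbook implementation, not the Statview or STATA routine used in the study; the function name and the toy follow-up data in the usage note are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient (e.g. months)
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    event_counts = Counter(t for t, e in zip(times, events) if e)
    exit_counts = Counter(times)      # everyone leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Patients censored at time t are still counted as at risk at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exit_counts[t]
    return curve
```

With invented follow-up times `[6, 12, 12, 20, 30]` and event flags `[1, 0, 1, 0, 1]`, the curve steps down at 6, 12 and 30 months. Comparing two such curves formally would require the log-rank test, which is not shown here.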

Assessment of Ki67 and p53 expression between LA and B tumours
Having identified differences in Ki67 and p53 in ER+ tumours in our training cohort, we then assessed the difference between LA and LB tumours in expression level of these two antigens in our validation cohort (n = 498). Within LB tumours in this cohort, we observed significantly higher levels of Ki-67 and p53 expression (P = 0.0008 and 0.0048, respectively). The median average value for both Ki67 and p53 within the validation cohort was 10%. Subsequently, we modified our working definition further for good prognosis modified 'LA' as ER+ and/or PR+ and HER2−, Ki67 low and p53−; and poor prognosis modified 'LB' as ER+ and/or PR+ and/or HER2+ and/or Ki67 high and/or p53+.
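The modified definition above can be read as a simple decision rule. The sketch below assumes one particular reading of the 'and/or' wording: a tumour must be hormone receptor positive (ER+ and/or PR+) to be luminal, and any one of HER2+, Ki67 high or p53+ then moves it from 'LA' to 'LB'. The function name is hypothetical, and tumours that are not hormone receptor positive are simply returned as `None` rather than assigned to the other intrinsic subtypes.

```python
def classify_luminal(er_pos: bool, pr_pos: bool, her2_pos: bool,
                     ki67_high: bool, p53_pos: bool):
    """Return 'LA', 'LB', or None for tumours that are not ER+ and/or PR+.

    Assumed reading of the modified definition:
      LA: (ER+ and/or PR+) and HER2-, Ki67 low and p53-
      LB: (ER+ and/or PR+) and any of HER2+, Ki67 high, p53+
    """
    if not (er_pos or pr_pos):
        return None  # non-luminal; handled by the other panel markers, not shown
    if her2_pos or ki67_high or p53_pos:
        return "LB"
    return "LA"
```

Under this rule a tumour that is ER+, HER2− and p53− but Ki67 high is called 'LB', whereas the earlier ER, PR and HER2 definition would have called it 'LA'.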
As previously described, no benefit of a tumour bed boost was observed in this group of patients (Millar et al, 2009b). At a median follow-up period of 84 months, the 5-year survival rates for modified LA and modified LB, respectively, using the updated classifier were IBTR 99.3, 96.6%; LRR 99.7, 93.4%; DMFS 97, 87%; and BCSS 99.7, 92.5%. Comparative analyses of the clinicopathological features, crude event rates and univariate analyses of LA and LB groups between the differing definitions are presented in Tables 1 and 2. Univariate Cox proportional hazards were calculated for each measure of outcome for Ki67 and p53 and the modified LA and LB subtypes, which are presented with crude event rates in Table 3. Further crude event rates for modified LA and LB for lymph node negative, lymph node positive and lymphatic vascular invasion are presented in Supplementary Table 2. As expected, the updated classification resulted in increased numbers of events for all outcomes for LB and a reduction for LA. This is mirrored in LB by increases in LVI and LN+ status, with recurrence rates and death rates two to three times that of LA. Univariate analyses showed that modified LA is a significant predictor for all measures of outcome including IBTR (hazard ratio (HR) 0.314, 95% CI 0.136–0.726, P = 0.007) where it previously was close to but not statistically significant (P = 0.051). Modified LB predicted DMFS and BCSS (P = 0.005 and 0.003, respectively) and approached significance for IBTR and LRR (P = 0.07 and 0.052, respectively) where previously it was not significant for any outcome measure. Ki67 predicted outcome for all measures

Kaplan–Meier analysis of intrinsic subtype
Kaplan–Meier analysis (log-rank test) comparing modified LA and LB alone was significant for all measures of outcome: IBTR P = 0.02, LRR P = 0.002, DMFS and BCSS both P < 0.0001 (Figure 2 inserts). This classifier also showed improvement in the degree of statistical significance between all molecular subtypes compared with the previously reported five biomarker panel, which was observed for LRR P = 0.0004 (previously 0.012), DMFS P < 0.0001 (previously 0.0035) and BCSS P = 0.0001 (previously 0.048) but not for IBTR (P = 0.074, previously 0.346, Figure 2). Although LA had an excellent prognosis, LB had adverse survival, similar to basal, HER2-enriched and unclassified subtypes.

MVA for IBTR, LRR, DMFS and BCSS
We then constructed multivariable models of clinicopathological features and intrinsic subtype to assess predictive value and compare HRs between intrinsic subtypes, using modified LA as a reference group.

DISCUSSION
Oestrogen receptor-positive early breast cancer is the commonest form of the disease and tailoring treatment to individual patients is a priority. It is important to identify ER+ patients with a good prognosis who will receive most benefit from endocrine therapy and receive little or no benefit from chemotherapy, and, therefore, avoid any toxicity. In addition, it is also beneficial to identify patients who will have little or no benefit from endocrine therapy. GEP studies have consistently identified at least two groups of ER+ tumours; the less favourable LB group being characterised by higher histological grade and higher expression of proliferation and GRB7, and lower levels of ER-related genes. Although there is some consistency in the recognition of these differing subgroups between GEP studies, there is some doubt as to the stability of the classifiers used by different single sample predictors (Weigelt et al, 2010b) and most assays are not yet ready for routine clinical use (De Ronde et al, 2010). As a result, a simple and relatively cheap test using IHC surrogates would be easier to transfer into clinical practice. Various combinations of markers have been assessed to develop a robust IHC panel for routine pathology reporting, most recently adding Ki67 to ER, PR, HER2 to better assess proliferative luminal tumours (Hugh et al, 2009). Assessing ER+ tumours with surrogates for molecular grade may strengthen patient selection as histological grade can be compromised in some specimens because of suboptimal fixation. Using an independent discovery cohort of 292 patients, we identified a significant difference in expression in Ki-67 and p53 within ER+ cancers, which was associated with differences in clinical outcomes (breast cancer-specific death; CM McNeil et al, manuscript in preparation).
These findings were subsequently validated in a detailed analysis of 498 early breast cancer patients, in which we compared good and poor prognosis 'LA' and 'LB' IHC signatures, which included Ki-67 and p53 in addition to ER, PR and HER2. This updated definition provided superior predictive power and better discrimination between the two groups of luminal tumours for all measures of outcome. In all, 73 previously LA tumours were reclassified as LB, increasing the size of the 'LB' group by more than four-fold from 4.6 to 19.7% of the cohort, better reflecting GEP estimates of the size of the LB population. Using this definition, 'LB' was an independent predictor of poor prognosis in MVA for LRR, DMFS and BCSS but not for IBTR for the whole cohort. As well as demonstrating its superior predictive power over the most frequently used classifier of ER, PR, HER2 alone, we also performed additional analyses to make a comparison with ER+ breast cancer classified by hormone receptor (HR) status alone (data not shown). Some studies have shown a significant difference in outcome between double-positive (i.e., ER+ PR+) and single-receptor positive HR status (i.e., ER+ PR− or ER− PR+; Rakha et al, 2007). This latter group may correspond to the LB subtype (Rakha et al, 2009). Our further analyses of these subgroups demonstrated that HR status alone was inferior to our updated five biomarker classifier: in univariate analysis good prognosis double-positive status (ER+ PR+) was only statistically predictive for distant metastases and death (not IBTR or LRR) and single-positive status (i.e., poor prognosis 'LB') was not predictive for any measure of outcome.
Our updated classification of ER+ disease also improves the statistical significance in survival between all intrinsic subtypes, where the adverse survival and HR of our poor prognosis 'LB' group is three times that of 'LA' and closer to that of HER2-enriched and basal subtypes. One limitation of this study is that recurrence rates may be overestimated for LB, as the prognosis of HER2-positive LB tumours (24% of all LB tumours) would currently be modified by the benefits of Herceptin treatment (which was not used in this study), and underestimated for LA, as only 44.9% of patients received adjuvant tamoxifen. An additional limitation of this study is the difference in cut points used for Ki67 positivity where the training cohort median was 5% and the validation cohort median was 10%. Although we have identified good and poor prognostic groups with our signature, the relatively wide confidence intervals, which reflect the small numbers of events, strongly suggest the importance of further independent validation. Further analyses in a larger data set with a greater number of events may provide narrower confidence intervals, which along with assessment of the hazard ratio will determine the likely clinical significance derived from this panel of markers. These findings suggest a potential role for this biomarker panel in better defining groups of ER+ cancer of low and high molecular grade, allowing better selection of patients for endocrine therapy alone or with AC. Although Ki67 alone identifies approximately 60% of LB tumours, p53 adds a further 20% of cases, 12% are positive for both markers, and 8% are negative for both but HER2 positive. This study builds upon previous work using a cut point for optimal determination of 'high' Ki-67 proliferation rate at 14% through correlation with the PAM50 classifier using RT-PCR. They identified an LB population, which was 42% of the cohort (includes their LB and luminal HER2 cases).
Although the cut point of 14% correlates with GEP estimates it may, in practical terms, be difficult to discern by IHC. Ki67 has long been analysed in breast cancer cohorts with varied results in terms of its predictive value. A recent review has recommended its inclusion as a routine biomarker in breast cancer, but its application as a stand-alone biomarker has been debated (Stuart-Harris et al, 2008). Therefore, its inclusion in a panel to help define molecular grade and better subtype 'LA' and 'LB' cancers is independently prognostic and valuable. However, its role as a predictive marker appears less certain. A pre- and post-biopsy analysis of endocrine treated breast cancer has demonstrated that only the post-treatment tumour Ki67 (at 2 weeks) was predictive of response to endocrine therapy, whereas baseline Ki67 was not (Dowsett et al, 2007). High Ki67 status in BIG 1-98 suggested a potential benefit in selecting letrozole over tamoxifen in post-menopausal patients (Viale et al, 2008). Most recently a significant study identified that the prognostic information provided by 'IHC4' (ER, PR, HER2 and Ki-67) was at least equivalent to Oncotype Dx (Cuzick et al, 2009) and highlights the relevance of these readily available routine pathology markers in the clinical management of breast cancer.
p53 overexpression in breast cancer assessed by IHC is, rather over-simplistically, assumed to act as a surrogate for TP53 mutations and is associated with higher tumour grade and responsiveness to radiotherapy, chemotherapy and endocrine therapy (Thompson and Lane, 2010). Although the p53 pathway is undoubtedly highly complex, its assessment by IHC does appear to provide meaningful information. p53 mutations are more frequent in the LB group compared with LA (Weigelt et al, 2010a), being described in 71% of LB tumours but only 16% of LA (Sorlie, 2004). p53 currently features as one of five antibodies in the Mammostrat (Clarient, Inc., Aliso Viejo, CA, USA) IHC test shown to be of predictive value in ER+, tamoxifen-treated early breast cancer (Ring et al, 2006; Bartlett et al, 2010). Mammostrat uses a five IHC panel (p53, HTF9C, CEACAM5, NDRG1, SLC7A5) with an algorithm that is independent of ER and PR status to identify low-, medium- and high-risk groups. The initial published study (Ring et al, 2006) demonstrated HRs of 1.8 and 2.3 (training and validation cohorts, respectively) for high risk compared with the low and medium risks for disease recurrence. Elevated expression of p53 was observed by IHC in our cohorts and appeared to be a useful classifier and was included in the updated definition of poor prognosis 'LB' cancer.
Although the number of events was small, additional exploratory multivariate analyses for patients treated with tamoxifen alone (n = 169, 10 events) showed that the poor prognosis 'LB' definition retained independent prognostic significance in the final resolved model for breast cancer-specific death (HR 5.361, 95% CI 1.418–20.25, P = 0.013). This finding suggests that 'LB' has five times the risk of death compared with 'LA' in patients treated with endocrine therapy. The predictive value of this classification would however require further testing within the setting of a randomised trial of endocrine therapy.
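The width of the confidence interval quoted above can be related to the reported P value with a short calculation. The sketch below assumes a Wald-type interval that is symmetric on the log hazard ratio scale with a normal critical value of 1.96; the text does not state which test the statistical software used, and the function names are hypothetical.

```python
import math


def log_hr_standard_error(lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Standard error of log(HR) implied by a 95% CI, assuming a Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)


def ci_geometric_midpoint(lower: float, upper: float) -> float:
    """Point estimate implied by a CI that is symmetric on the log scale."""
    return math.sqrt(lower * upper)


def wald_p_value(hr: float, se_log_hr: float) -> float:
    """Two-sided P value for H0: HR = 1 under a normal approximation."""
    z = abs(math.log(hr)) / se_log_hr
    return math.erfc(z / math.sqrt(2))


# Values quoted in the text for the tamoxifen-only subgroup.
se = log_hr_standard_error(1.418, 20.25)
p = wald_p_value(5.361, se)
```

Under these assumptions the geometric midpoint of the interval falls very close to the reported HR of 5.361 and the implied P value is close to the reported 0.013, which is what would be expected if the interval and P value came from the same Wald statistic.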
Our updated definition of ER+ cancer translates into an IBTR-free survival at 5 years of 99.3% for LA and 96.6% for LB, and LRR-free survival of 99.7 and 93.4%, respectively. A similar recent study using ER, PR and Ki67 in the definition for LA and LB found local recurrence-free rates at 10 years of 92% for LA and 90% for LB. Importantly, our findings further support the observations of this group, who found that LB was associated with increased risk of LRR. These results highlight the role of proliferation and grade, mirrored by the Oncotype Dx assay (Mamounas et al, 2005), as a predictor of locoregional recurrence, and may help further refine patient selection regarding therapy for optimal locoregional control. A subsequent study analysed patterns of metastases and found both LA and LB had a predilection for bone as a metastatic site and found that LB had a distant relapse rate similar to basal tumours at 15 years.
In summary, this study suggests that good and poor prognosis ER+ breast cancers can be reliably and easily discriminated using Ki67 and p53 in addition to ER, PR and HER2 in routine pathology IHC. This definition greatly enhances the detection of poor prognosis ER+ 'LB' breast cancers, with an outcome closer to that of basal and HER2-enriched tumours. This approach may help more reliably define groups of ER+ patients with an excellent prognosis and identify those at risk of early relapse who may benefit from more frequent follow-up and early intervention with alternative therapies and/or chemotherapy. Further, larger studies in randomised clinical trials of endocrine therapy are required to assess the clinical utility of this classification and its value as a predictor of therapeutic responsiveness.