Stage at diagnosis and early mortality from cancer in England

Background: Stage at diagnosis is a key predictor of overall cancer outcome. For the first time, stage completeness is high enough to support robust analysis for the whole of England. Methods: We analysed data from the National Cancer Registration Service's (NCRS) Cancer Analysis System on persons diagnosed with breast, colorectal, lung, prostate or ovarian cancer in England in 2012. One-year relative survival (with follow-up to the end of 2013) was calculated, together with adjusted excess rate ratios for mortality within 1 year of diagnosis. Results: One-year relative survival decreased with increasing stage at diagnosis. For breast, prostate and colorectal cancers survival showed a major reduction for stage 4 cancers, whereas for lung and ovarian cancers there were substantial decreases in relative survival with each increase in stage. Excess rate ratios for mortality within 1 year of diagnosis showed that stage and age were the most important cofactors, but they also identified statistically significant effects of sex, income deprivation and geographic area of residence. Conclusions: Further reductions in mortality may be most effectively achieved by diagnosing all cancers before they progress to stage 4, but for lung and ovarian cancers there is also a need for a shift to earlier stages, together with efforts to improve stage-specific survival at all stages.

Improving cancer survival is a key challenge identified in 'Improving Outcomes: A Strategy for Cancer' (Department of Health, 2011). Cancer survival estimates in England currently fall below those in many European countries for most cancer types (Richards, 2007; Verdecchia et al, 2007; Coleman et al, 2011; De Angelis et al, 2014). It has been estimated that if cancer survival in England were comparable with the European average, 5000 or more deaths within 5 years of diagnosis could be avoided annually (Abdel-Rahman et al, 2009; Richards, 2009a). However, if analyses are restricted to those who survive at least 1 year from diagnosis, the difference in conditional 5-year survival between England and other European countries is, in general, smaller (Thomson and Forman, 2009; Holmberg et al, 2010). This suggests that differences in 1-year survival are an important driver of differences in longer-term survival.
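The conditional-survival argument can be written out explicitly. As a sketch, let S(t) denote cumulative relative survival at t years, and assume (an assumption about how the cited studies define it) that conditional 5-year survival means survival to 5 years among those alive at 1 year. The comparison then factorises as:

```latex
S(5) = S(1)\,S(5 \mid 1)
\quad\Longrightarrow\quad
\frac{S_{\text{England}}(5)}{S_{\text{Europe}}(5)}
  = \frac{S_{\text{England}}(1)}{S_{\text{Europe}}(1)}
    \times
    \frac{S_{\text{England}}(5 \mid 1)}{S_{\text{Europe}}(5 \mid 1)}
```

If the second ratio on the right is close to 1, most of the gap in 5-year survival is carried by the gap in 1-year survival.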
Stage at diagnosis is highly predictive of cancer mortality, and a possible explanation for the difference in cancer survival between England and Europe is that a higher proportion of patients in England are diagnosed at a later stage (Sant et al, 2003; Foot and Harrison, 2011; Walters et al, 2013a, b). The completeness of stage at diagnosis for cancers registered in England by the National Cancer Registration Service (NCRS) has improved greatly in recent years. Stage completeness now exceeds 80% for several major cancers diagnosed in 2012 (including breast, colorectal, lung, ovarian and prostate cancers), allowing more robust analyses than were previously possible. The remaining stage data may be missing for various reasons: some morphological tumour types have no formally agreed staging system; it was clinically inappropriate to stage the patient; diagnosis and/or treatment took place outside the National Health Service; the patient died before staging was complete; or staging information was not transferred to the NCRS.
In England, the National Awareness and Early Diagnosis Initiative was established in 2008 (Richards, 2009b) as a joint initiative between the Government and Cancer Research UK. Much of its work has focussed on ways of raising awareness of the early symptoms of cancer among patients and primary-care physicians. The ability to measure stage at diagnosis at population level is vital to assess the impact of such initiatives, to study the scale and nature of variation in stage at diagnosis within England and to enable international comparisons.
The purpose of this study is to characterise the stage at presentation for the major cancers with the highest recorded stage completeness, and to examine the relationship between stage at diagnosis, early mortality and major demographic variables.

MATERIALS AND METHODS
Details of 156 131 malignant breast, colorectal, lung, prostate and ovarian tumours (ICD-10 C50, C18-20, C34, C61 and C56) diagnosed in 2012 in residents of England were extracted from the NCRS registration data set. Of these, 2663 were excluded because they were death certificate only registrations. The other exclusions were: 281 male breast cancers; 168 patients aged under 15 years or over 99 years at diagnosis; 187 recorded as stage 0 (for breast cancer this denotes Paget's disease of the nipple, which is included under ICD-10 C50); 9 with a misordered date of diagnosis and date of death; and 2 with a missing deprivation quintile. Information on deaths was provided by the Office for National Statistics as part of a routine data feed to the NCRS, and follow-up is complete to the end of 2013. Cancers were staged according to the TNM version 7 classification, based on clinical, imaging and pathological information (Sobin et al, 2009). The income deprivation quintile was derived by linking each tumour to the Index of Multiple Deprivation 2010 (Communities and Local Government, 2011), using the postcode at diagnosis to identify the Lower Layer Super Output Area of residence. Quintiles containing equal shares of the general population were derived from the income domain score. Geographic area of residence at the time of diagnosis was defined by the strategic clinical networks (SCNs) established in England in 2013. These SCNs have populations ranging from 2.1 to 9.0 million.
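The cohort arithmetic can be checked directly. This minimal Python sketch assumes, as the text implies, that the exclusions were applied in turn so that no tumour is counted under more than one criterion; the dictionary keys are descriptive labels, not NCRS field names.

```python
# Tumours extracted from the registration data set (count taken from the text).
EXTRACTED = 156_131

# Exclusions reported in the text; assumed mutually exclusive (applied in sequence).
EXCLUSIONS = {
    "death certificate only registration": 2663,
    "male breast cancer": 281,
    "aged under 15 or over 99 at diagnosis": 168,
    "recorded as stage 0": 187,
    "misordered diagnosis and death dates": 9,
    "missing deprivation quintile": 2,
}


def analysed_cohort(extracted: int, exclusions: dict) -> int:
    """Number of tumours remaining after removing every excluded tumour once."""
    return extracted - sum(exclusions.values())


if __name__ == "__main__":
    print(f"excluded: {sum(EXCLUSIONS.values())}")
    print(f"analysed: {analysed_cohort(EXTRACTED, EXCLUSIONS)}")
```

The result, 152 821, matches the cohort size reported in the Results.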
Relative survival is the ratio of the observed survival in the patient cohort to the expected survival of a cohort from the general population matched by age, sex, socioeconomic deprivation and geographic region (Government Office Region). It was calculated using the strs command (Dickman et al, 2004), with break points at 1, 3, 6 and 12 months, using the Ederer II method. The life tables used (Cancer Research UK Cancer Survival Group, 2006) provided background mortality up to 2009. Age-standardised relative survival was calculated using the method of Corazziari et al (2004). Observed mortality, expected mortality and person-years at risk were calculated using the strs command in Stata 12.1 (StataCorp, 2011) for the same intervals as for survival, and were summed to give an overall excess mortality for the year following diagnosis. This outcome measure was chosen both for simplicity of expression and because mortality in the first year after diagnosis is of wide interest; exploration of variation in excess mortality within the first year is left for future work. Excess mortality rate ratios were modelled using the glm command, following the 'grouped' approach of Dickman et al (2004), with sex, age band, income deprivation quintile, SCN and stage as independent variables. The baseline SCN for the rate ratios was one of the two SCNs at the median of the distribution of 1-year relative survival, and stage 4 was the baseline for stage at diagnosis. Interactions were explored by fitting further models, each including an interaction between one pair of variables; the significance of each interaction term was assessed with a likelihood ratio test comparing the models with and without the interaction.
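The interval calculation can be illustrated with a short Python sketch. It is not the strs implementation. For illustration only, it assumes that each patient's background mortality is a single constant annual hazard (in practice this comes from life tables stratified by age, sex, deprivation, region and calendar year). It applies the actuarial adjustment for censoring within an interval, and it takes the Ederer II expected survival for each interval as the mean expected interval survival of those still at risk at its start.

```python
import math
from dataclasses import dataclass

# Interval boundaries in months, matching the break points given in the text.
BREAKS_MONTHS = (0, 1, 3, 6, 12)


@dataclass
class Patient:
    months: float           # follow-up time from diagnosis, in months
    died: bool              # True if follow-up ended with death
    expected_hazard: float  # illustrative assumption: constant annual background hazard


def ederer2_relative_survival(patients, breaks=BREAKS_MONTHS):
    """Return (end_month, observed, expected, relative) cumulative survival at each break."""
    cum_obs = cum_exp = 1.0
    table = []
    for start, end in zip(breaks[:-1], breaks[1:]):
        at_risk = [p for p in patients if p.months > start]
        if not at_risk:
            break
        width_years = (end - start) / 12.0
        deaths = sum(1 for p in at_risk if p.died and p.months <= end)
        censored = sum(1 for p in at_risk if not p.died and p.months < end)
        effective_n = len(at_risk) - censored / 2.0  # actuarial adjustment for withdrawals
        interval_obs = 1.0 - deaths / effective_n
        # Ederer II: average the expected interval survival over those at risk at the interval start.
        interval_exp = sum(
            math.exp(-p.expected_hazard * width_years) for p in at_risk
        ) / len(at_risk)
        cum_obs *= interval_obs
        cum_exp *= interval_exp
        table.append((end, cum_obs, cum_exp, cum_obs / cum_exp))
    return table


if __name__ == "__main__":
    # Hypothetical cohort: 2 deaths at 2 months, 8 patients alive at 12 months.
    cohort = [Patient(2.0, True, 0.05)] * 2 + [Patient(12.0, False, 0.05)] * 8
    for end, obs, exp_, rel in ederer2_relative_survival(cohort):
        print(f"{end:>2} months: observed {obs:.3f}, expected {exp_:.3f}, relative {rel:.3f}")
```

Cumulative relative survival is the ratio of the cumulative observed and expected products, which equals the product of the interval-specific relative survival ratios.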

RESULTS
Description of cohort. After exclusions, a total of 152 821 newly diagnosed malignant cancers of interest were diagnosed in England in 2012. Table 1 shows the number of tumours included and their distribution by age, sex, income deprivation, SCN and stage at diagnosis for breast, colorectal, lung, ovarian and prostate cancers. The median age ranges from 63.0 years (breast and ovarian cancer) to 70.8 years (colorectal) and 71.9 years (lung), whereas the difference between the proportions of cases in the most and least deprived quintiles ranges from −11% (lung cancer) to +12% (prostate cancer). There is substantial variation in stage distribution by cancer type: more than two-thirds of breast cancers present at stage 1 or 2, and more than two-thirds of lung cancers present at stage 3 or 4. The other three cancers are intermediate between these two distributions. Stage completeness ranges from 82% (prostate cancer) to 89% (colorectal cancer). Table 2 shows the variation in stage at diagnosis by sex, age and income deprivation (variation by SCN is given in the Supplementary Online Material and Supplementary Online Table 5). A higher proportion of men than women with colorectal cancer present at stage 1 (16 vs 14%, P<0.001), whereas for lung cancer a slightly lower proportion of men than women present at stage 1 (12 vs 15%, P<0.001). A higher proportion of men than women present with stage 4 lung cancer (50 vs 48%, P<0.001).
For all cancer types, the proportion of missing stage data increases with age, particularly in those aged 80+ years. The proportion of ovarian cancers diagnosed at stage 1 falls from 54.6% in those aged 15-49 years to 19.9% in those aged 70-79 years, whereas there is no statistically significant change for lung cancer. Prostate cancer is intermediate, with a change from 45.7 to 33.0%. For colorectal and breast cancer the change with age is not linear, and the highest proportion of stage 1 diagnoses occurs in 60-69-year-olds.
The difference in stage distribution between the most and least deprived is generally <2.0% for colorectal and lung cancer, and is not statistically significant for ovarian cancer. For breast cancer, presentation at stage 1 and at unknown stage is more common in the least deprived (P<0.001), whereas presentation at stages 2, 3 and 4 is more common in the most deprived (P<0.001). For prostate cancer, presentation at stage 2 (P<0.001) and at unknown stage (P<0.05) is more common in the least deprived, whereas presentation at stages 3 (P<0.05) and 4 (P<0.001) is more common in the most deprived.
For the non-sex-specific cancers, 1-year relative survival is 4.0% higher in men for colorectal cancer and 5.6% higher in women for lung cancer; the corresponding age-standardised differences are 2.2% and 6.1%. Relative survival varies strongly with age but, depending on cancer type, either shows only a small decline up to a certain age followed by a steeper decline (breast, colorectal and prostate cancers) or declines with every increase in age category (lung and ovarian cancers). Relative survival decreases with increasing income deprivation; the difference between the least and most deprived is 1.7% (breast), 6.5% (colorectal), 2.6% (lung), 3.0% (ovarian) and 0.8% (prostate), and the corresponding age-standardised differences are 2.1%, 6.7%, 4.5%, 8.6% and 0.6%, respectively.
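Age standardisation replaces each group's own age mix with a fixed set of standard weights, which is why the crude and age-standardised differences above diverge. The Python sketch below shows only the weighting step; the age bands, weights and survival values are hypothetical placeholders, not the standard weights of Corazziari et al (2004), which would be substituted in practice. The `case_mix_weighted` helper is only a rough stand-in for crude relative survival, which also depends on expected survival.

```python
import math


def age_standardised(survival_by_age: dict, weights: dict) -> float:
    """Weighted mean of age-specific relative survival using fixed standard weights."""
    if set(survival_by_age) != set(weights):
        raise ValueError("age bands in survival_by_age and weights must match")
    total = sum(weights.values())
    if not math.isclose(total, 1.0, rel_tol=1e-9):
        raise ValueError(f"standard weights must sum to 1, got {total}")
    return sum(weights[band] * survival_by_age[band] for band in weights)


def case_mix_weighted(survival_by_age: dict, cases_by_age: dict) -> float:
    """Weight by a group's own case counts: a rough stand-in for unstandardised survival."""
    n = sum(cases_by_age.values())
    return sum(cases_by_age[band] / n * survival_by_age[band] for band in cases_by_age)


if __name__ == "__main__":
    # Hypothetical bands, weights and survival values, for illustration only.
    standard = {"15-64": 0.4, "65-79": 0.4, "80-99": 0.2}
    survival = {"15-64": 0.80, "65-79": 0.70, "80-99": 0.45}
    younger_mix = {"15-64": 600, "65-79": 300, "80-99": 100}
    older_mix = {"15-64": 200, "65-79": 400, "80-99": 400}
    # Identical age-specific survival but different age mix: the case-mix-weighted
    # figures differ, whereas the standardised figure is the same for both groups.
    print(case_mix_weighted(survival, younger_mix), case_mix_weighted(survival, older_mix))
    print(age_standardised(survival, standard))
```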
Across SCNs, 1-year relative survival has a standard deviation of 0.5% (breast), 1.3% (colorectal), 1.6% (lung), 2.6% (ovarian) and 0.8% (prostate), assuming a normal distribution across SCNs. Relative survival decreases with increasing stage; again, some cancer types (breast, colorectal and prostate) show a small reduction across the lower-stage categories followed by larger reductions at higher stages, whereas others (lung and ovarian) show substantial reductions with each increase in stage at diagnosis.
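A between-network summary of this kind can be sketched as follows. This is an illustration only. The SCN labels and values are hypothetical, and the sample standard deviation is an assumption because the text does not state which estimator was used. The raw spread also includes sampling noise from each network's finite number of cases, so it somewhat overstates the true between-network variation.

```python
import statistics


def between_network_spread(survival_by_scn: dict) -> tuple:
    """Mean, sample SD and an approximate 95% range of network-level survival under normality."""
    values = list(survival_by_scn.values())
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (assumed estimator)
    return mean, sd, (mean - 1.96 * sd, mean + 1.96 * sd)


if __name__ == "__main__":
    hypothetical = {"SCN-A": 0.71, "SCN-B": 0.74, "SCN-C": 0.69, "SCN-D": 0.73}
    mean, sd, (low, high) = between_network_spread(hypothetical)
    print(f"mean {mean:.3f}, SD {sd:.3f}, ~95% of networks between {low:.3f} and {high:.3f}")
```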
Variation in excess mortality rate ratio. Table 4 shows the excess mortality rate ratio for each independent variable. For early-stage breast cancer and stage 1-3 prostate cancer, the rate ratio is close to zero (relative to the baseline of stage 4). Of the independent variables, stage and age have the greatest influence. Women have 14% higher excess mortality than men for colorectal cancer and 13% lower excess mortality for lung cancer (P<0.001 for both). Rate ratios increase with age and, for every cancer type, are statistically significant for older age groups compared with the youngest group. Except for prostate cancer, higher income deprivation is associated with higher excess mortality. There is some statistically significant geographic variation in the rate ratios, with 5 of the 50 combinations of cancer type and SCN statistically significant at the 95% level.
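The grouped excess-mortality model behind these rate ratios can be sketched in plain Python. Following the formulation of Dickman et al (2004) as we understand it, the observed deaths d in each row (one combination of covariate categories and follow-up interval) are treated as Poisson with mean d* + y·exp(xβ), where d* is the number of deaths expected from the life tables and y the person-years at risk. The exponentiated coefficient for a covariate is then its excess mortality rate ratio. This sketch fits the model by Fisher scoring rather than through Stata's glm command, and the data are invented purely to check the arithmetic.

```python
import math


def solve(a, b):
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_excess_poisson(rows, max_iter=100, tol=1e-10):
    """Fit d ~ Poisson(e + y * exp(x . beta)) by Fisher scoring.

    rows: tuples (x, d, e, y): x is a covariate list starting with 1.0 (intercept),
    d observed deaths, e expected deaths, y person-years at risk.
    Returns (beta, log_likelihood).
    """
    k = len(rows[0][0])
    total_excess = sum(d - e for _, d, e, _ in rows)
    total_pyrs = sum(y for *_, y in rows)
    beta = [math.log(max(total_excess, 1e-8) / total_pyrs)] + [0.0] * (k - 1)
    for _ in range(max_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, d, e, y in rows:
            h = y * math.exp(sum(xi * bi for xi, bi in zip(x, beta)))  # modelled excess deaths
            mu = e + h                                                  # total expected deaths
            for i in range(k):
                score[i] += (d / mu - 1.0) * h * x[i]
                for j in range(k):
                    info[i][j] += h * h / mu * x[i] * x[j]  # expected (Fisher) information
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for x, d, e, y in rows:
        mu = e + y * math.exp(sum(xi * bi for xi, bi in zip(x, beta)))
        loglik += d * math.log(mu) - mu - math.lgamma(d + 1)
    return beta, loglik


if __name__ == "__main__":
    # Hypothetical rows: x = [intercept, indicator for an earlier stage vs the stage 4 baseline].
    data = [
        ([1.0, 0.0], 120, 20.0, 1000.0),  # baseline: 100 excess deaths over 1000 person-years
        ([1.0, 1.0], 35, 10.0, 500.0),    # comparison: 25 excess deaths over 500 person-years
    ]
    beta, ll = fit_excess_poisson(data)
    print(f"baseline excess rate {math.exp(beta[0]):.4f}, rate ratio {math.exp(beta[1]):.4f}")
```

With two rows and two parameters the model is saturated, so the fitted excess rates equal the raw ones (0.1 per person-year at baseline, 0.05 in the comparison group), giving a rate ratio of 0.5. Rows in which observed deaths fall below expected deaths can push estimates towards minus infinity; established software handles such cases more carefully than this sketch.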
Interactions and robustness. Of the 38 possible pairwise interaction terms across the five cancer types, 19 were significant at the 95% level in a likelihood ratio test comparing the models with and without the interaction term. Of these, five were between stage and SCN and reflected geographic variation in excess mortality for cancers of unknown stage. Four were associated with small subcohorts and showed no clear pattern in excess mortality. Four were significant overall but had no individually significant combination of the two variables. One reflected high excess lung cancer mortality in the N58 network being concentrated in the most deprived quintile. One interaction between sex and stage reflected higher colorectal cancer excess mortality in women specifically at stage 3, and one interaction between sex and age reflected worse colorectal cancer outcomes in older women than in men. Finally, there were four significant interactions between age and stage (data shown in Supplementary Online Table 6), reflecting higher colorectal and ovarian cancer mortality in older persons with stage 3 cancer and worse outcomes for stage 1 and stage 2 lung cancer in persons aged 80-99 and 90-99 years. The age-stage interaction was also significant for prostate cancer, but none of its terms was individually significant (so this interaction is also counted among the four described above).
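The likelihood ratio test used for these interaction terms can be sketched in self-contained Python. The statistic is twice the gain in log-likelihood from adding the interaction, referred to a chi-square distribution whose degrees of freedom equal the number of added parameters. The chi-square tail is computed here from a series for the regularised incomplete gamma function rather than from a statistics library. The level counts and log-likelihoods in the example are hypothetical; treating stage as five levels (1 to 4 plus unknown) is an assumption.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability via the series for the regularised lower incomplete gamma.

    Adequate for the moderate statistics met in interaction tests; very large x would
    call for a continued-fraction form instead.
    """
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-17 * total and n < 100_000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def interaction_df(levels_a: int, levels_b: int) -> int:
    """Extra parameters added by a full interaction between two categorical variables."""
    return (levels_a - 1) * (levels_b - 1)


def likelihood_ratio_test(loglik_without: float, loglik_with: float, df: int):
    """Return (statistic, p-value) comparing nested models without and with the interaction."""
    statistic = 2.0 * (loglik_with - loglik_without)
    return statistic, chi2_sf(statistic, df)


if __name__ == "__main__":
    # Hypothetical sex x stage interaction: 2 sexes x 5 stage levels -> 4 extra parameters.
    df = interaction_df(2, 5)
    stat, p = likelihood_ratio_test(-1520.4, -1514.1, df)
    print(f"LR statistic {stat:.1f} on {df} df, P = {p:.4f}")
```

Because 38 tests were run, about two would be expected to reach significance at the 5% level by chance alone (38 × 0.05 = 1.9).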

DISCUSSION
The results presented here demonstrate the value of the substantial improvement in the completeness of staging data collected by the NCRS in England. Early-stage presentation is more likely in younger persons for ovarian and prostate cancers, and in those of screening age for colorectal and breast cancers. Early-stage presentation is (marginally) less likely in the more income-deprived. The analysis clarifies the expected patterns of survival and shows that age and stage have the strongest association with both 1-year relative survival and the adjusted excess mortality rate ratio for early mortality, whereas the associations with sex, income deprivation and geographic area of residence are smaller.
For sex, rate ratios close to unity imply that some of the difference in relative survival by sex is driven by age and stage case-mix, consistent with earlier work (Riaz et al, 2013). Excess mortality rate ratios of up to 1.4 are seen between the least and most deprived, and in colorectal cancer the associated difference in relative survival is 6.5%. This is broadly in agreement with previously calculated mortality rate ratios of ~1.1 per increment in income deprivation quintile (McPhail et al, 2013), which compound to about 1.1⁴ ≈ 1.46 across the four steps from the least to the most deprived quintile. The rate ratio could, however, also be influenced by variables outside the model, including comorbidity and emergency presentation, both of which are more frequent in the more income-deprived, and differential uptake of potentially curative treatment (Peake, 2014).
Examination of interactions between the independent variables shows worse outcomes for stage 3 colorectal cancer in women. The relationship between age and stage in colorectal and ovarian cancers (with worse outcomes for stage 3 in older persons) and in lung cancer (with worse outcomes for early stages in older persons) may indicate opportunities to re-evaluate clinical pathways. Geographic variation in the mortality rate for cancers of unknown stage is also observed, although this most likely reflects varying stage completeness.
Several SCNs show excess mortality rate ratios that are above unity and statistically significant at the 95% level. This may reflect variation not captured by the model, for example in comorbidity, route of presentation or treatment, although, given the multiple testing performed, some may be chance findings. There is considerable scope for further work describing this variation at much smaller geographical levels, and even at health-care provider level, which would be better placed to explain the reasons for it.
Major strengths of this study are the high stage completeness, between 80% and 90%, and that the data cover the whole population of England. There are four principal limitations. First, route to diagnosis, previously shown to affect short-term survival (McPhail et al, 2013), was unavailable for this study, so some of the excess mortality attributed to older age and higher stage might result from differences in route of presentation. However, McPhail et al (2013) also found age and stage to be the most predictive independent variables. Second, the study is limited to a single year of data, 2012, which complicates comparison with earlier studies. In addition, during the processing of the 2012 data, the registration functions of the eight former regional cancer registries merged to form the National Cancer Registration Service. Standardisation of practice can be expected to bring stage completeness to a consistent level nationally in the future, but some geographic bias in completeness remains. Third, the model does not include data on comorbidity. However, the influence of comorbidity on short-term mortality is smaller than that of age and stage (McPhail et al, 2013), so this is unlikely to change the main conclusions of the study.
Last, the outcome measured is the excess mortality in the year after diagnosis; although this is calculated by summing the excess mortality across four intervals within the year, it does not attempt to characterise any non-proportionality of hazards within this period.
The Office for National Statistics (ONS) publishes yearly overall cancer survival figures for a number of cancers, currently complete for tumours diagnosed up to 2011 and predicted for tumours diagnosed up to 2013 (Solomon et al, 2014). Direct comparison with these is complicated by differences in methodology, but for colon and breast cancers (and also for oesophageal and stomach cancers, shown in the Supplementary Online Material) the agreement is good: ~2% or less between years, and generally 1% or less for a direct comparison with 2011. There is a notable difference for lung cancer, with overall relative survival reported here being higher (33.4% in men and 39.4% in women, 2012) than that reported by the ONS (31.6% in men and 34.7% in women, 2011). However, ONS predictions for 2013 are higher again than the figures reported here (36.1% in men and 42.2% in women, 2013, predicted). The difference may be an artefact of a change in the NCRS's practice for recording diagnosis date, owing to better access to data from radiological systems and from the National Lung Cancer Audit (NLCA). However, there were major improvements in treatment rates for lung cancer between 2005 and 2012, particularly in surgical resection rates, as demonstrated by the NLCA (Health and Social Care Information Centre, 2013). Khakwani et al (2013), using NLCA data, showed a significant fall in the hazard ratio of death in early-stage lung cancer between 2005 and 2010, and more recent preliminary NLCA data also show an improvement in overall median and 1-year survival for lung cancer patients between 2010 and 2013 (MD Peake, 2014, personal communication), supporting the increase observed here.
Survival by stage has previously been published for the UK for cancers diagnosed in 2004-07 (Maringe et al, 2012, 2013; Walters et al, 2013a, b). Again, direct comparison is complicated by differences in methodology and in the definition of the tumour cohorts. However, breast and colon cancers appear to show the largest improvement in stage-specific survival for later-stage cancers, whereas lung cancer shows greater improvements for earlier-stage cancers. Ovarian cancer shows little change in stage-specific survival, with the exception of unknown stage, for which survival has improved in all cancers compared. This increase in the survival of cases of unknown stage is consistent with a reduction in the proportion of unknown-stage cases that are of advanced stage.

Implications.
The results presented here support the work underpinning campaigns promoting early diagnosis, with survival shown to be better for cancers diagnosed at an earlier stage. For all cancer types examined, diagnosis before stage 4 substantially increases 1-year survival. For lung and ovarian cancer, any shift towards earlier stages at diagnosis brings substantial benefit. For these two cancer types there is also scope to increase early-stage-specific survival, both by developing more effective treatments and by ensuring the universal application of current best practice to all suitable patients; in other words, by reducing the large variations in standards of care that are known to exist (Health and Social Care Information Centre, 2013).
In conclusion, the improved completeness of stage at diagnosis will allow more accurate comparisons between England and other countries. It will also allow the frequency of early diagnosis to be investigated more comprehensively, regional and local variations to be examined, and campaigns aimed at promoting the earlier diagnosis of cancer to be better assessed.