Clinical Neuropathology Practice Guide 3-2013: levels of evidence and clinical utility of prognostic and predictive candidate brain tumor biomarkers

A large number of potential tissue biomarkers have been proposed for brain tumors. However, hardly any have been adopted for routine clinical use so far. For most candidate biomarkers, substantial controversy exists with regard to their usefulness in clinical practice. The multidisciplinary neurooncology task force of the Vienna Comprehensive Cancer Center Central Nervous System Unit (CCC-CNS) addressed this issue and elaborated a four-tiered levels-of-evidence system for assessing analytical performance (reliability of test result) and clinical performance (prognostic or predictive) based on consensually defined criteria. The task force also agreed by consensus that only biomarker candidates that meet defined quality standards for both analytical and clinical performance should be considered ready for clinical use. Applying this levels-of-evidence system to MGMT, IDH1, 1p19q, Ki67, MYCC, MYCN and β-catenin, only immunohistochemical IDH1 mutation testing in patients with diffuse gliomas is supported by sufficient evidence to be unequivocally qualified for clinical use. For the other candidate biomarkers, lack of published evidence of sufficiently high analytical test performance and, in some cases, also of clinical performance limits evidence-based confirmation of their clinical utility. For most of the markers, no common standard of laboratory testing exists. We conclude that, at present, there is a strong need for studies that specifically address the analytical performance of candidate brain tumor biomarkers. In addition, standardization of laboratory testing is needed. We aim to regularly challenge and update the present classification in order to systematically clarify the current translational status of candidate brain tumor biomarkers and to identify specific research needs for accelerating the translational pace.


Introduction
In clinical medicine, biomarkers are defined as objectively measurable/determinable patient-related factors that provide clinically meaningful disease-related information with regard to diagnosis, prognosis, therapy decisions and patient follow-up [1,2,3]. In neuropathological oncology, diagnostic, prognostic and predictive biomarkers assessed in patient biopsy specimens and/or body fluids are of relevance [4,5]. A large number of prognostic and predictive candidate tissue biomarkers have been proposed for brain tumors, but almost none have translated into routine clinical use so far [4,6]. For most biomarkers, there is substantial controversy regarding their clinical usefulness [5]. In this article, we present a levels-of-evidence system for assessing the current translational status of candidate biomarkers. This levels-of-evidence system is based on criteria that have been elaborated and consensually defined by the multidisciplinary neurooncology task force of the Vienna Comprehensive Cancer Center Central Nervous System Unit (CCC-CNS). We apply this system to currently debated prognostic and predictive neuro-oncological candidate biomarkers in order to assess their clinical utility.

Methods and definitions
Our multidisciplinary neurooncology task force within the CCC-CNS has defined the criteria for a four-tiered levels-of-evidence system and an adjunct scoring system for assessing the clinical utility of prognostic and predictive candidate brain tumor biomarkers in a continued process of discussion and consensual agreement (Table 1). The levels-of-evidence system is related to the two crucial dimensions, analytical and clinical performance, which are considered the essential elements for clinical biomarker translation [4]. Having established the levels-of-evidence system, we used it for assessing the evidence levels and clinical utility for the following candidate brain tumor biomarkers: O6-methylguanine methyl-transferase gene (MGMT) promoter methylation status, isocitrate dehydrogenase 1 gene (IDH1) mutation status, chromosome arms 1p19q co-deletion status, Ki67 tumor cell proliferation index, MYCN status, MYCC status and β-catenin expression. We selected these biomarker candidates because they are considered to be close to routine clinical use, but their translational status is still subject to controversy and discussion.
For each candidate biomarker, we separately assessed the analytical performance and the prognostic and predictive clinical performance. Our ratings are based on review of published data and rely on consensual agreement within our multidisciplinary task force.
We defined analytical performance as the reliability of the results yielded by a particular assessment or test. To this end, we evaluated published data with regard to repeatability of test results (intra-laboratory agreement, intra-observer agreement) and reproducibility of testing (inter-laboratory agreement, inter-observer agreement).
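As an illustration of how inter-observer agreement of this kind can be quantified, the following minimal sketch computes Cohen's kappa for two observers who rate the same specimens. This is an assumption made for illustration only: the text above does not state which agreement statistic the reviewed studies used, and the function name and example ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two observers (Cohen's kappa).

    Illustrative only: the ratings are hypothetical category labels
    (e.g. "pos"/"neg") assigned by two observers to the same specimens.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both observers must rate the same non-empty set of cases")

    n = len(ratings_a)
    # Observed proportion of cases on which both observers agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected by chance, from each observer's marginal frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both observers used a single identical category for every case.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of four specimens by two observers.
observer_1 = ["pos", "pos", "neg", "neg"]
observer_2 = ["pos", "neg", "neg", "neg"]
kappa = cohen_kappa(observer_1, observer_2)
```

The same calculation applies to intra-observer repeatability if the two lists hold repeated readings by one observer instead of readings by two observers.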
As clinical performance, we defined the prognostic and predictive value of a given candidate biomarker. Prognostic markers were defined by their association with patient outcome, and predictive markers by their association with response to a given therapy. As the definition of prognostic and predictive markers in (neuro)oncology has been subject to continued debate and controversy, we provide, for illustration purposes, a generic example of a biomarker in medicine with prognostic, predictive and diagnostic properties, depending on the issue of interest (see Textbox). We perceive this basic biomarker concept also as valid for the field of neurooncology.
Only those factors reaching an A or B level for both analytical and clinical performance were considered to have adequate justification for recommendation in routine clinical use as prognostic or predictive biomarkers (Table 1).
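The decision rule stated above can be sketched in a few lines. This is a minimal illustration, not the task force's scoring system: the tier labels C and D are assumed here (the text specifies only a four-tiered system in which A and B qualify), the criteria behind each level are those of Table 1 and are not modeled, and all class and function names are hypothetical.

```python
from dataclasses import dataclass

# Assumption: the four tiers are labeled A (highest) to D (lowest).
# Only "A or B qualifies" is taken from the text; C and D are assumed labels.
LEVELS = ("A", "B", "C", "D")
ACCEPTABLE = frozenset({"A", "B"})


@dataclass(frozen=True)
class BiomarkerRating:
    """Evidence levels assigned to one candidate biomarker (hypothetical type)."""

    name: str
    analytical: str  # evidence level for analytical performance
    clinical: str    # evidence level for prognostic or predictive performance

    def __post_init__(self):
        for level in (self.analytical, self.clinical):
            if level not in LEVELS:
                raise ValueError(f"unknown evidence level: {level!r}")


def ready_for_clinical_use(rating: BiomarkerRating) -> bool:
    """Apply the rule: A or B is required in BOTH dimensions."""
    return rating.analytical in ACCEPTABLE and rating.clinical in ACCEPTABLE


# Hypothetical ratings, not taken from the article.
strong_marker = BiomarkerRating("marker_x", analytical="A", clinical="B")
weak_assay = BiomarkerRating("marker_y", analytical="C", clinical="A")
```

Under this rule a marker with strong clinical evidence but an insufficiently validated assay, such as the hypothetical `weak_assay`, does not qualify.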

IDH testing - analytical performance
A recently published ring trial (round robin test) and several studies including large patient cohorts support the high analytical performance of immunohistochemical detection of the IDH1-R132H mutation [7,8,9,10,11,12]. In contrast, DNA-based IDH sequencing procedures showed inconsistent results among different laboratories [12].

IDH testing - prognostic and predictive clinical performance
The high prognostic clinical performance of IDH mutations in diffuse gliomas was confirmed in several large cohorts [7,13,14,15]. However, all of these studies had a retrospective design. Data from adequately designed prospective trials have not yet been published. With regard to a potential predictive value of IDH1 mutations, there is currently only little evidence based on few and inconsistent results from small studies [7,16,17,18].

Interpretation of the clinical utility of IDH testing
Assessment of IDH1-R132H mutation status by immunohistochemistry has sufficient evidence for clinical use as a prognostic marker in diffuse gliomas. IDH testing by DNA-based methods, which is in principle suitable to detect also other and rarer forms of IDH mutations, has promising clinical utility but is not yet ready for routine clinical use. Standard test protocols with the potential of sufficiently high analytical performance need to be elaborated and validated.
Textbox. Illustrative generic example of a biomarker in medicine which can be used as prognostic, predictive or diagnostic biomarker, depending on the issue of interest:

Prognostic biomarker:
Core criterion of a prognostic biomarker: provides information with regard to some outcome over time (e.g., phenotype, survival, etc.).
Proband: child, gender is unknown
Question: as this child grows up, will it adopt a male or female phenotype?
Biomarker: sex chromosomal status from karyogram
Result of biomarker analysis: XY sex chromosomal status
Outcome: when this child becomes an adult, it will adopt a male phenotype, because it is biologically male
This example illustrates that a prognostic marker allows foreseeing the result of a natural development over time.

Predictive biomarker:
Core criterion of a predictive biomarker: provides information whether a particular intervention or therapy is likely to be effective in the tested person or not.
Proband: young adult person of unknown gender
Question: will the administration of oral contraceptives be effective?
Biomarker: sex chromosomal status from karyogram
Result of biomarker analysis: XX sex chromosomal status
Outcome: administration of oral contraceptives will be effective, because the person is biologically female
This example illustrates that a predictive biomarker allows foreseeing the effect of a particular intervention depending on the status of the biomarker.

Diagnostic biomarker:
Core criterion of a diagnostic biomarker: identifies/confirms a particular entity.
Proband: person, gender is unknown
Question: is this person biologically female or male?
Biomarker: sex chromosomal status from karyogram
Result of biomarker analysis: XY sex chromosomal status
Interpretation: this person is biologically male
This example illustrates that a diagnostic biomarker identifies/confirms a particular entity.
We consider this biomarker concept as generic and the core criteria as valid also for the field of neurooncology.

1p19q co-deletion

1p19q testing - analytical performance
Published data indicate promising analytical performance of the following methods for 1p19q co-deletion testing in oligodendroglial tumors: fluorescence in situ hybridization (FISH), PCR-based loss of heterozygosity (LOH) analysis and multiplex ligation-dependent probe amplification (MLPA). FISH seems to be the most robust method and also allows visual control of test results. However, a published ring trial formally documenting the high inter-laboratory reproducibility is missing [19].

1p19q testing - clinical performance
There is a high level of evidence for the prognostic value of 1p19q co-deletion in oligodendroglial tumors coming from several studies, including two large prospective trials [14,20]. These studies also indicate a promising predictive value of the 1p19q co-deletion with regard to response to PCV (procarbazine, lomustine, vincristine)-based chemotherapy in anaplastic oligodendroglial tumors.

1p19q testing - clinical utility
There is strong evidence that 1p19q co-deletion has a prognostic and predictive value in anaplastic oligodendroglial tumors. Ring trials remain to be performed in order to objectively validate the high reproducibility of 1p19q testing and thus make it recommendable for routine clinical use.

Ki67 proliferation index analysis - analytical performance
Several methods for Ki67 proliferation index analysis exist, such as direct counting, semi-quantitative estimation, computer-based image analysis and direct microscope-assisted counting with a graticule. These assessments are associated with relatively high reproducibility among trained observers [21,22,23,24,25]. However, Ki67 immunostaining procedures lack standardization of antigen retrieval and staining. Modalities of tissue fixation may also have an impact on Ki67 staining results. This lack of standardization may limit reproducibility among different laboratories [26].

Ki67 tumor cell proliferation index analysis - clinical performance
A robust association of high Ki67 tumor cell proliferation index with unfavorable survival times in patients with ependymoma has been shown in several independent retrospective series [25,27,28,29]. For nonfunctioning pituitary adenomas, promising data exist to support an inverse correlation of Ki67 tumor cell proliferation index with time to progression [30,31,32]. With regard to the prognostic value of Ki67 tumor cell proliferation index, only small studies or conflicting results exist for oligodendroglial tumors, diffuse astrocytomas, meningiomas and medulloblastomas [24,33,34,35,36,37,38,39,40,41,42,43,44]. Several studies indicate that the Ki67 tumor cell proliferation index has no prognostic value in glioblastomas [36,38,45]. For none of the mentioned entities do sufficient data exist to support a predictive value of the Ki67 tumor cell proliferation index.

Ki67 tumor cell proliferation index analysis - clinical utility
The Ki67 tumor cell proliferation index is associated with a high prognostic clinical performance in ependymoma. However, inter-laboratory variability of tissue processing and immunohistochemical staining protocols limits its clinical utility. There is a need for better standardization of tissue processing and Ki67 immunostaining procedures.

MGMT methylation testing - analytical performance
Various DNA-based methods for testing of MGMT promoter methylation status have been developed, but for none of them has intra- and interlaboratory reproducibility been sufficiently analyzed [46,47,48,49,50].

MGMT methylation testing - clinical performance
The high prognostic value of MGMT methylation in glioblastoma has been confirmed by many studies including prospective trial data [51,52]. A predictive value of MGMT promoter methylation status for response to temozolomide-based chemotherapy in elderly glioblastoma patients is supported by two independent prospective randomized clinical trials [53,54].

MGMT methylation testing - clinical utility
In glioblastoma, a high prognostic clinical performance of MGMT promoter methylation status has been robustly confirmed. In the subgroup of elderly patients there is also evidence for a high predictive value. However, there is insufficient evidence for a high analytical test performance. In particular, intra- and interlaboratory reproducibility remains to be confirmed by adequate scientific data. This lack of evidence impedes recommendation of MGMT testing for routine clinical use.

MYCC amplification testing - analytical performance
Promising data in terms of analytical performance exist for FISH-based analysis in medulloblastoma [55,56,57,58]. Trials analyzing the inter-observer and inter-laboratory reproducibility have not been performed so far, but studies to test the analytical performance are currently under way.

MYCC amplification testing - clinical performance
The investigation of the clinical performance of MYCC amplification is limited by sample size issues, as medulloblastomas are relatively rare and only ~5% of the cases harbor MYCC amplification. However, several retrospective studies consistently show an association of poor patient outcome with MYCC amplification status [55,58,59,60].
In forthcoming SIOP PNET clinical trials, patients with tumors harboring a MYCC amplification will be excluded from the average risk medulloblastoma group and included into the high-risk patient group.

MYCC amplification testing - clinical utility
Data indicate a significant prognostic value of MYCC gene amplification status in medulloblastoma. Currently, there is still a lack of data confirming a high analytical performance of MYCC amplification testing.

MYCN amplification testing - analytical performance
As in MYCC gene amplification testing, currently available data suggest that FISH has the highest analytical performance for identification of MYCN amplifications [55,56,57,58]. No ring trial systematically investigating the inter-observer and inter-laboratory reproducibility has been reported so far. Studies to test the analytical performance are currently being performed.

MYCN amplification testing - clinical performance
Similar to MYCC, only ~5% of medulloblastomas harbor MYCN amplification. A prognostic value of MYCN amplification has been reported in several studies [55,58,59,60]. Recent data suggest that MYCN amplified medulloblastomas comprise two different molecular subgroups with different clinical characteristics and prognosis [61]. Thus, data available to date do not support a predictive value of MYCN gene amplification status in the whole medulloblastoma cohort, and refinement of the definition of MYCN amplified tumors by using additional clinical and molecular parameters is needed. In the forthcoming SIOP PNET5/6 clinical trial for average and low-risk medulloblastomas, patients with tumors harboring a MYCC amplification will be excluded and treated with a high-risk protocol, which is currently being developed.

MYCN amplification testing - clinical utility
Currently, there is insufficient evidence for both analytical and clinical performance of MYCN gene amplification status testing in medulloblastoma. Therefore, it does not fulfill the criteria for being implemented in the routine clinical setting. There is a need to systematically address both analytical and clinical performance of MYCN amplification testing in adequately designed studies.

β-catenin status

β-catenin testing - analytical performance
The β-catenin status in medulloblastomas can be tested by immunohistochemistry alone or in combination with gene sequencing [62,63,64,65]. So far, no ring trials evaluating the analytical performance of β-catenin testing have been conducted, and no consensus guidelines for β-catenin immunohistochemistry have been established. Studies to test the analytical performance of β-catenin immunohistochemistry are currently being performed.

β-catenin testing - clinical performance
β-catenin protein expression and mutations within the β-catenin encoding gene (CTNNB1) have been associated with improved patient outcome in several retrospective medulloblastoma studies including large patient cohorts [59,64,66,67], but these correlations have so far not been confirmed in prospective studies. The predictive value of the β-catenin status in medulloblastoma will be tested in the forthcoming SIOP PNET5 clinical trial.
In supratentorial primitive neuroectodermal tumors (CNS PNET), several independent studies have shown that β-catenin mutation is neither a prognostic nor predictive marker [68].

β-catenin testing - clinical utility
There is evidence for a prognostic value of β-catenin testing in medulloblastomas.
However, there is a need for ring trials and elaboration of consensus guidelines for standardization of laboratory protocols for β-catenin testing. The prognostic value of the β-catenin status in medulloblastoma needs to be confirmed in adequately designed studies.

Discussion
Among the biomarker candidates that were evaluated according to our levels-of-evidence system, only immunohistochemical testing for IDH1-R132H mutations unequivocally meets the criteria that indicate sufficient evidence for routine clinical use.
The most common and important reason why the other candidate biomarkers failed to meet the criteria for routine clinical use was a lack of evidence of sufficiently high analytical test performance. This result indicates that there is a strong need for studies that specifically address the issue of test reproducibility, e.g., by means of repeatability testing, ring trials and interlaboratory comparison, as has been done recently in the case of IDH mutation testing [12,69]. Indeed, this ring trial can be considered the final building block that definitively translates IDH testing of diffuse gliomas into routine clinical use [69].
Another necessity specific to analytical performance is the standardization of testing procedures, which constitutes an important prerequisite for interlaboratory comparability of test results. In the case of the widely used Ki67 immunostaining, for example, a consensus-based standardized immunohistochemical staining procedure does not exist so far, which limits interlaboratory comparability of Ki67 tumor cell proliferation index threshold values. In the future, Euro-CNS may serve as a common platform for the definition of test standards, as has recently taken place in the case of the 1p/19q FISH protocol [19].
The prognostic clinical performance for most of the candidate biomarkers included in this study has been clarified by means of adequately designed and sufficiently powered clinical trials. However, with regard to certain biomarkers relevant for pediatric brain tumors (e.g., MYCC and MYCN amplification, β-catenin testing), the prognostic value has not yet been sufficiently clarified because of the rarity of these tumors. This limitation underscores the need for sharing of tissue specimens within multicenter research collaborations, in order to increase both the total case numbers and, as a further consequence, the statistical power of patient outcome studies.
For virtually none of the investigated candidate biomarkers does a sufficiently high evidence level for predictive clinical performance exist that would allow rating them as ready for clinical use. The only candidate biomarkers for which published literature indicates a high predictive clinical performance are MGMT testing in glioblastomas of the elderly and 1p19q status in anaplastic oligodendroglial tumors [14,20,51,52]. However, translation of MGMT testing into common clinical use has been impeded by controversial evidence for a sufficiently high analytical test performance. In order to translate MGMT testing into common clinical use, a robust and reproducible method needs to be established and validated. Validation of currently used MGMT test methods seems to be very challenging, mainly for methodological reasons [48,50].

Conclusion
Our four-tiered levels-of-evidence system allows us to clarify the current translational status and clinical utility of candidate brain tumor biomarkers. Among the currently most debated candidates, only IDH mutation testing in diffuse gliomas is supported by sufficiently high evidence unequivocally qualifying it as a prognostic clinical biomarker that is ready for routine clinical use. For the other biomarker candidates, insufficient evidence for high analytical test performance and, in some instances, also low clinical performance (mainly due to low case numbers) seem to be the major obstacles that impede the fast translation of candidate biomarkers into common clinical use. Future studies should be aware of and specifically address these limitations.
We intend to regularly update the current translational status of candidate brain tumor biomarkers. Such periodic reassessments may also be helpful to identify and exclude inappropriate biomarker candidates, and should be beneficial to identify specific research needs that may help to accelerate the translational pace.