Gene Expression Profiling of Adult Acute Myeloid Leukemia Identifies Novel Biologic Clusters for Risk Classification and Outcome Prediction

To determine if gene expression profiling could improve risk classification and outcome prediction in older AML patients, expression profiles were obtained in pre-treatment leukemic samples from 170 patients whose median age was 65 years. Unsupervised clustering methods were used to classify patients into six cluster groups (designated A-F) that varied significantly in rates of resistant disease (RD, P <.0001), complete remission (CR, P =.023), and disease free survival (DFS, P =.023). Cluster A (n=24), dominated by NPM1 mutations (78%), normal karyotypes (75%), and genes associated with signaling and apoptosis, had the best DFS (27%) and overall survival (OS, 25% at 5 years). Patients in clusters B (n=22) and C (n=31) had the worst OS (5% and 6%, respectively), cluster B was distinguished by the highest rate of RD (77%) and multidrug resistant gene expression ( ABCG2, MDR1) . Cluster D was characterized by a “proliferative” gene signature with the highest proportion of detectable cytogenetic abnormalities (76%; including 83% of all favorable and 34% of unfavorable karyotypes). Cluster F (n=33) was dominated by monocytic leukemias (97% of cases), also showing increased NPM1 mutations (61%). These gene expression signatures provide insights into novel groups of AML not predicted by traditional studies that impact prognosis and potential therapy.


Introduction
In the majority of patients, particularly those over 55 years of age, acute myeloid leukemia (AML) is a highly resistant disease and overall outcomes remain extremely poor. [1][2][3][4][5] While improved survival has been achieved in younger AML patients or in selected cytogenetic subsets, older patients are either unable to receive intensive chemotherapy or such therapy results in remission rates of only 25-55% and overall survival rates of 10% or less. 1,[6][7][8][9][10] In addition to age and white blood cell count, the presence of recurring cytogenetic abnormalities provides the most important prognostic information in AML. Unfortunately, cytogenetic abnormalities associated with favorable outcomes account for only 5-12% [t(8;21)], 5-8% [inv (16)] and 10-12% [t (15;17)] of all AML cases, and are disproportionately seen in younger patients. 11,12 In contrast, approximately 50-70% of all AMLs have normal or risk-indeterminate karyotypes. 11,13,14 Gene mutations confer additional prognostic information that may be useful in refining cytogenetic risk classification. [15][16][17][18][19] The most frequently acquired mutation in AML is a mutation at exon-12 of the nucleophosmin (NPM1) gene. This multifunctional, nucleocytoplasmic shuttling protein primarily resides in the nucleolus, playing a role in maintenance of genomic integrity, ARF-p53 pathway regulation, and centrosome duplication. 20,21 Mutated NPM1 relocates to the cytoplasm and disrupts normal NPM1 function. Approximately 25-35% of AML patients have NPM1 mutations, [22][23][24] with a higher percentage (47-60%) seen among those with a normal karyotype. 22,[25][26] The impact on survival is variable, but likely favorable, with secondary influences, such as concurrent FLT3 mutations having potentially significant roles. 23,24,26,27 The FLT3 mutations occur as internal tandem duplications (ITD), observed in 15-35% of AML, or point mutations of the intracellular tyrosine-kinase domain (TKD), seen in an additional 5-10% of patients. 19 The prognostic impact of FLT3 mutations trends towards decreased survivals or increased relapse rates primarily for patients with FLT3 ITDs. [28][29][30] In contrast to traditional cytogenetic analysis or the detection of mutations in individual genes, global gene expression profiling provides a powerful method to probe the marked biologic heterogeneity For personal use only. on December 2, 2017. by guest www.bloodjournal.org From of AML. Comprehensive expression profiles have the power to provide new insights into mechanisms of leukemogenesis and to enhance risk classification and therapeutic targeting in AML. A number of laboratories using supervised learning algorithms have identified unique gene expression signatures associated with karyotypic abnormalities, normal karyotypes, and NPM1 mutation status. [31][32][33][34][35][36][37][38][39] In contrast, we wished to determine whether gene expression profiling using an entirely unsupervised approach could reveal intrinsic biologic groups of AML among a set of well characterized older AML patients, with a high frequency of normal and unfavorable cytogenetic abnormalities. We further wished to determine whether the gene expression signatures we derived were useful in risk classification and therapeutic targeting in this poor risk disease.

Patients
This study utilized pre-treatment samples from patients with previously untreated de novo or secondary AML by FAB criteria, who were registered to SWOG clinical trials for patients over the age of 55 years (studies S9031, S9333), patients aged 15-55 years (S9034, S9500) and patients with secondary AML (S9126). Trial details have been previously reported. 2,9,[40][41][42] All trials except S9031 excluded patients with acute promyelocytic leukemia (FAB-M3); S9031 evaluation was limited to non-M3 AML patients who received induction chemotherapy with Ara-C and an anthracycline. Case selection was restricted to patients with cryopreserved blood or bone marrow containing >80% leukemic blasts, stored in the SWOG Myeloid Leukemia Repository (University of New Mexico) after appropriate informed consent.   11 For studies S9031, S9126, S9333 and S9500, response to induction chemotherapy was assessed according to SWOG criteria. 43 Study S9034, an intergroup trial coordinated by the Eastern Cooperative Oncology Group (E3489), used slightly different response criteria.

Gene expression profiling
RNA was prepared from thawed cryopreserved samples with the Qiagen RNeasy mini kit (Qiagen, Valencia, CA). All specimens had >80% blasts as confirmed by microscopic review of Wright stained cytospin preparations of the thawed cell suspensions. Total RNA concentration was quantified with the RiboGreen assay (Molecular Probes, Eugene, OR); RNA integrity and DNA contamination were evaluated as described at http//hsc.unm.edu/crtc/willmanresearch. 44 The isolated RNA was reverse transcribed into cDNA and re-transcribed into cRNA after double amplification using a modification reported by Ivanova et al to enhance detection of low abundance genes. 44,45 Biotinylated cRNA was fragmented and hybridized to HG_U95Av2 oligonucleotide microarrays (Affymetrix). 44 After analysis with Affymetrix Microarray Suite (MAS 5.0); the data was scaled to minimize experimental variation. 44 Technical criteria for case inclusion of the 185 initial specimens evaluated included: adequate total RNA >2.5 ug, good quality cRNA, good quality scanned images, and good experimental quality. Experimental quality was assessed by GAPDH > 1800, > 10% expressed genes, and GAPDH 3'/5' amplification ratios of < 4. High quality expression data were obtained on 170 of the 198 specimens, 133 from marrow and 37 from peripheral blood. Of the original 12,625 probe sets in the Affymetrix HG_U95Av2 probe sets, 9463 genes were "present" in at least 1 case; these genes were further analyzed after transformation to Savage rank scores (VxInsight). 44

NPM1 and FLT3 mutational status
Samples were evaluated for NPM1 mutations utilizing cDNA amplified to generate a 249 bp fragment spanning portions of exons 11 and 12 (see Supplement for details). 44 The PCR products were subjected to dissociation analysis (65ºC to 80ºC) with appropriate controls. Samples with characteristic melting profiles underwent agarose gel electrophoresis and hybridization with NPM1 variant A probe or a pool of

Statistical analysis
VxInsight TM , developed at Sandia National Laboratories for extremely large datasets, was the primary unsupervised data mining tool utilized in this study (http://www.cs.sandia.gov/projects/VxInsight.html). [47][48][49][50] Using a force-directed placement algorithm, clusters were formed one hundred times using different starting conditions for the random number generator. The most representative single ordination (the most central member of the whole set) was then determined by measurement of the total overlap of local neighborhoods around the individual genes. Analysis of variance (ANOVA) was used to identify rank ordered gene lists characterizing each cluster; bootstrap resampling was applied to estimate the stability of these lists. 50 Receiver operator characteristic (ROC) curves and genetic algorithm K-nearest neighbor method (GA/KNN) were additionally employed to identify top characterizing genes for the VxInsight derived clusters, as further explained in the Supplement. 44 The full rank ordered gene lists derived from ANOVA with bootstrapping, ROC, and GA/KNN are provided in the Supplement. 44 Principal component analysis (PCA) and hierarchical clustering were performed using MATLAB (MathWorks, Inc, Natick, MA). 51,52 Concordance between VxInsight and hierarchical clusters was measured by the adjusted Rand index, with Monte Carlo estimation of statistical significance (N=10000 replications). 53 Comparisons between clusters were based on the Kruskal-Wallis test for continuous variables (age, lab values), and on the χ 2 approximation of the Fisher exact test and Pearson's χ 2 test for independence for dichotomous and categorical variables [CR, resistant disease (RD), FAB classification, cytogenetic characteristics, FLT3 mutations]. Overall survival (OS) was measured from registration on treatment study until death from any cause, with observation censored for patients last known alive.
Disease free survival (DFS) was measured from the date the complete response was established until the relapse of leukemia or death from any cause, with observation censored for patients last known to be alive For personal use only. on December 2, 2017. by guest www.bloodjournal.org From without report of relapse. Distributions of OS and DFS were estimated by the method of Kaplan and Meier 54 and compared between clusters using the log rank test. 55 Multivariate analyses of cluster differences and prognostic factors were based on logistic regression models for CR and RD, and on proportional hazards regression models for OS and DFS. 56 In logistic regression models, differences in proportions between clusters are represented as odds ratios relative to a defined cluster. This permits the cluster differences to be compared on a consistent scale regardless of other terms in the model. The hazard ratio plays an analogous role for proportional hazards regression models. All P-values were twotailed and, in view of the exploratory nature of these analyses, were calculated without adjustment for multiple testing.

AML cohort
Gene expression profiles were obtained from a retrospective cohort of 170 patients with previously untreated AML. Clinical, morphologic, cytogenetic and mutation status of the cohort, outlined in Table   1, showed no gender predominance and a majority of patients (80%) over the age of 55 years with a median age of 65 years (range 20-84). Thirty-two cases (19%) were judged by clinical history to have secondary AML, while 104 (61%) had clinically de novo AML (clinical onset was not recorded in the two trials for patients of age 15-55, and in none of the other trials was secondary AML further classified as MDS-versus treatment-related). All FAB subtypes were included except AML-M3, with a preponderance of acute myeloblastic leukemia with maturation (FAB-M2, 35%). Adequate cytogenetic analyses were obtained on 141 (83%) of the patients, and 139 of these could be assigned to cytogenetic risk categories. The majority of cytogenetically evaluable cases fell into the intermediate cytogenetic risk group (59%) due to the high percentage of patients with normal karyotypes (46%).
For personal use only. on December 2, 2017. by guest www.bloodjournal.org From

Unsupervised clustering algorithms
VxInsight analysis partitioned the AML patients into six distinct and stable groups based on strong similarities in gene expression among the 9,463 genes, visualized in Figure 1.      were last known to be alive at 13 months to 10.9 years after starting treatment (median 6.1 years).
Overall survival did not vary significantly among clusters (P=.40), but generally paralleled the DFS results, with cluster A having the best OS and clusters B and C generally the worst (Table 2, Figure 3).

Figure 3
Response to induction chemotherapy varied significantly among the six clusters (Table 2). Sixty (35%) of the 170 patients were resistant to their protocol induction chemotherapy, with a significantly different RD rate seen between clusters (P < .0001). This was largely due to an exceptionally high RD rate in cluster B (77%) compared to all other clusters combined (43/148, 29%), although heterogeneity among the remaining five clusters was also significant (P=.021). Roughly complementary results were observed for CR. Seventy-three patients (43%) achieved CR, and the CR rate varied significantly among clusters (P=.023), being lowest in cluster B (14%). Forty-seven of the remitting patients have relapsed, and 11 others have died without report of relapse.
VxInsight cluster membership was not significantly correlated with patient age or de novo versus secondary onset of disease ( Figure 4, Table 3). Despite the absence of a significant association with age, it was noteworthy that only 4% of patients in the two clusters with worse outcomes (B, C) were under age 56, compared to 27% (32/117) of patients in the remaining clusters. Clinical and laboratory parameters that showed significant correlation with VxInsight cluster membership were: pretreatment white blood cell counts, blast percentages, platelet counts; FAB classification; normal or t(8;21) karyotypes, and NPM1 mutation status (Table 1,  NPM1 mutations were present in 30% (50/165) of cases with significant differences observed between VxInsight clusters (Table 1, 3, Figure 5). The highest prevalences were seen in cluster A (78%), which also had the highest percentage of females and normal karyotypes, and in cluster F (61%) with the predominance of monocytic leukemias. FLT3-ITD mutations were identified in 27% of cases (Table 1) with no significant differences among VxInsight groups (Table 3). A significant number of patients with FLT3-ITDs also had NPM1 mutations (Table 3). FLT3-TKDs were found in 12% of the AMLs investigated; cluster A had the highest percentage of point mutations (FLT3-TKDs).

Figure 5
Further analyses were performed to investigate whether comparisons of outcomes between the clusters might be biased by confounding effects of the other factors considered. In multivariate logistic regression analysis, increasing age (P=.024), secondary AML onset (P=.010) and unfavorable cytogenetic risk category (P=.030) had independent detrimental prognostic effects on RD. AML onset and/or cytogenetic risk group were unknown for 57 of the 170 patients. Therefore to allow for the possibility that results might be biased by the exclusion of these patients, the association between RD rate and cluster was estimated with and without adjustment for age, AML onset and cytogenetic category, for the 113 patients with complete data. The results, shown in Figure 6, confirm that heterogeneity of RD rates among the six clusters remained statistically significant (P<.0001) after adjusting for possible confounding. The variation of CR rates among the six clusters remained marginally significant after adjusting for age, AML

Genes distinguishing VxInsight clusters
Using ANOVA with bootstrapping, gene lists were derived that define the VxInsight clusters. The 50 most significant discriminating genes for each cluster are provided in the Supplement, 44 with a summary of these lists, including the most significantly up-regulated and down-regulated genes, given in Table 4.  For personal use only. on December 2, 2017. by guest www.bloodjournal.org From are overexpressed by cluster A. The top ranking gene, latent transforming growth factor (TGF) beta binding protein (LTBP1) activates latent TGF-beta, a modulator of apoptosis that is independent of caspase 3 mediated mechanisms. 57-59 FOXC1 is a TGF-beta1 responsive gene that possibly functions as a tumor suppressor gene. 60 Notably absent in cluster A is expression of the MHC II alleles.
Cluster B with the poorest clinical outcomes, shows increased expression of the multi-drug resistance gene ABCG2 (ranked 18). The multi-drug resistance membrane transporter (MDR1) is concurrently overexpressed (Figure 7). Additional genes of interest are PBX1 and serine/threonine protein kinase 17A (STK17A-ranked 23). STK17A plays a role in the regulation of apoptosis; PBX1 is a cofactor in genetic mechanisms that prevent myeloid differentiation but appears to lack inherent transformation ability in isolation. 61 Alternative gene lists utilizing different statistical and normalization methods are provided in Supplement. 44 These show extensive overlap with the ANOVA derived gene lists.

Discussion
We used a novel unsupervised clustering algorithm (VxInsight) to analyze gene expression profiles from older AML patients with a high proportion of intermediate and poor risk outcome factors. This type of analysis, without knowledge of prior class definitions, allows for identification of fundamental subsets of patients sharing similar expression signatures. Unanticipated similarities between cytogenetically diverse patient groups, as discovered in this study and reported by others, 35 would have been harder to detect with a more restrictive supervised approach. The result is an interesting separation of the AML cases into six distinct clusters with outcome differences.
In contrast to previous studies using unsupervised computational methods alone, 32,34,35 we found significant outcome differences between the clusters defined by gene expression for RD after induction therapy (P<.0001), CR rate after induction therapy (P=.023), and DFS (P=.023). The heterogeneity of RD and CR rates among clusters was not explained by confounding effects of age, AML onset and unfavorable cytogenetics, indicating that the clustering conveyed prognostic information independent of the other factors. For some patients, data was absent regarding prognostic factors, in particular de novo versus secondary onset of AML and cytogenetics. However in the multivariate regression analyses of treatment outcomes, it was evident that excluding the patients with incomplete data did not markedly influence the magnitude or statistical significance of differences between clusters. This was most clearly evident for RD, for which both the statistical significance of cluster differences and the ORs representing the magnitudes of those differences were essentially unchanged by the adjustment for covariates.
Evaluation of DFS was limited by the small number of remitting patients with complete data. For CR, adjusting for the covariates decreased the statistical significance from P=.023 to .051, which is not a profound change, especially given the necessity of excluding patients with incomplete data from the multivariate analysis.
Members of cluster A had the best DFS and OS: 27% and 25%, respectively, at 5 years. The striking finding for this group was the high percentage of NPM1 mutations (78%). This group has many of the characteristics emerging for cases of AML with NPM1 mutations including the disproportionate number of women (67%), increase in normal karyotypes (75%), older age (but not significantly different than other cluster groups), and higher WBC counts. 22 Gene expression profiles associated with NPM1 mutations are dominated by a stem-cell molecular signature 39 . Activation of HOX genes and TALE partner genes (i.e. MEIS) are found in NPM1 gene signatures 39 . The reportedly favorable impact of NPM1 mutations on survival has included higher CR rates 23,26 , and a trend to longer OS and EFS 26 . However, other studies have observed either no significant effect, 22,25 or an impact only when NPM1 mutated cases are also FLT3-ITD negative. 24,26,27 While AML with NPM1 mutations are associated with increased FLT3 mutations, 22-24 this relationship was not observed for cluster A members. Cluster A had a disproportionate number of FLT3 mutations involving TKDs rather than ITDs, but the overall FLT3 mutation incidence was similar to the other VxInsight groups. FLT3 TKDs have been linked to increased release of IL-12 by leukemic blasts; IL-12A was overexpressed by members of cluster A. 63 IL-12 has anti-angiogenic and anti-tumor effects and unless offset by an increased level of pro-angiogenic regulators, may have a role in improving outcomes. 63,64 Cluster A members had overexpression of Wilms tumor (WT1) gene; this gene is overexpressed at variable levels in 75-100% of AMLs at diagnosis. 18,65 A lower level of expression of WT1 has been seen among the more differentiated AMLs in most but not all series. 18,66 Because of the increased WT1 expression, patients in cluster A may be more likely to benefit from WT1-specific immunotherapy than other AML patients, either in the form of a T-cell approach or a vaccine. 14,66 One potential problem is that a number of the MHC class II genes were down-regulated in cluster A. Because tumor cells are poorly immunogenic when deficient in MHC class II molecule expression, the leukemic cells may escape host immunity. Cluster A also had over-expression of genes that promote apoptosis (LTBP1, CASP3), with LTBP1 being a particular important gene for predicting cluster A membership (ranked 1) and for predicting NPM1 mutations. 24,39 For personal use only. on December 2, 2017. by guest www.bloodjournal.org From Patients in clusters B and C had the worst DFS, with estimated probabilities of 0% and 6%, respectively at five years. They also had the poorest OS, although OS did not vary significantly among clusters. Cluster B, in particular, is an interesting group of 22 patients: 77% were unresponsive to induction chemotherapy, and its three remitting patients all relapsed within 16 months. This group of patients might be considered prone to disease resistance, since they had the highest median age (68 years) and highest incidence of secondary disease (32%); yet these factors did not vary significantly in the six clusters. Despite 42% of cases having a normal karyotype, only one individual (5%) had an NPM1 mutation. One gene overexpressed by these patients was ABCG2. ABCG2, also termed breast cancer resistance protein (BCRP) and mitoxantrone resistance protein (MRX) is a member of the ATP-binding cassette (ABC) superfamily of membrane transporters, that function as drug efflux pumps to remove chemotherapeutic agents from cells. 67 ABCG2 is expressed by approximately one-third of adult AMLs when measured using semi-quantitative RT-PCR or flow cytometric analysis. [68][69][70][71] Of relevance to our study is the report by Steinbach and colleagues, who found significantly higher median ABCG2 gene expression levels in 24 pediatric AML patients who failed to achieve remission after initial induction chemotherapy compared to the 21 patients who achieved remission. 72 Similar and discrepant results have been reported by others using varying and sometimes discordant analytical methods and study designs. 71,73,74 The significant ABCG2 over-expression among members of our high induction failure cluster, B, supports a role for ABCG2 in chemoresistance, possibly in combination with MDR1. 75,76 Permeability glycoprotein (MDR1, P-gp or ABCB1) was concurrently overexpressed among patients in cluster B but MDR1 alone did not significantly differentiate this group from the other AML patients. Drug-sensitive cells transfected with ABCG2 become resistant to mitoxantrone, doxorubicin, daunorubicin and topotecan, 69 while ABCG2 expressing cells from AML patients are resistant to daunorubicin in vitro. 77 Therefore, treatment methods circumventing ABCG2 mediated multidrug resistance should be considered for evaluation in future patients with gene profiles similar to cluster B members. These include the use of For personal use only. on December 2, 2017. by guest www.bloodjournal.org From ABCG2 inhibitors and antineoplastic agents that show poor ABCG2-mediated efflux (i.e. idarubicin or newer agents in development). 78 The other poor outcome cluster, C, had the highest rate of complete response to induction therapy (58%) with only 16% of the 31 patients showing initially resistant disease. However these CRs were comparatively short lived: in the analysis of DFS, cluster C had the largest hazard ratio of the six clusters.
Many high-ranking genes defining this cluster were down-regulated (54% of the top 50 genes). Among the significantly over-expressed genes were genes involved in immunoregulation including; interferon regulatory factor-4 (IRF4), a gene regulated by NF-kappaβ member c-rel; 79  Cluster E represented a small group of patients (n=18), many of whom were registered to SWOG protocol S9500 for the treatment of younger adults (<56 years). Cluster F (n=33) was defined by AML with monocytic differentiation (97% of members), the highest pre-treatment white blood cell counts, and a high percentage of NPM1 mutations (61%). NPM1 mutations have been shown to be increased among monocytic leukemias and the gene expression profiles contained many genes pertinent to monocyte function. 25 signature" masks other genes of potential biologic significance in these groups. 35 We are currently evaluating whether the outcome of monocytic leukemias with NPM1 mutations and a "monocytic gene signature" differs significantly from AMLs with a "stem-cell molecular signature" seen in cluster A and reported by Alcalay et al. 39 This gene expression profiling study highlights the divergent mechanisms and pathways of leukemic transformation that are not appreciated by current methods of AML diagnosis, classification and risk assignment. No bias was induced during cluster selection in this analysis and therefore these subsets represent true reflections of the intrinsic biology in this cohort of patients. For example, the significance of NPM1 mutations in AML was unknown at the initiation of this study, yet the gene expression profiles clustered groups of patients together with this unique genetic abnormality. Additional studies will be For personal use only. on December 2, 2017. by guest www.bloodjournal.org From important to determine if the improved survivals in cluster A with increased NPM1 mutations relate to the gene expression signatures displayed by these cases, regardless of the FLT3 mutational status. We are now evaluating the relative significance of these genes in predictive models of outcome using supervised learning methods in this same cohort (manuscript in preparation). The gene signatures identified in this study will hopefully provide clues to new therapeutic interventions for older AML patients who have historically done poorly with current treatment regimens. Confirmatory studies and prospective validation of our results are required to continue to understand the significance of our clusters of patients, such as cluster B with increased RD. These analyses are important to enhance risk classification and the identification of individual genes and pathways that can be exploited for improved therapeutic interventions.
For personal use only. on December 2, 2017. by guest www.bloodjournal.org From 1 Analysis of variance was used to identify rank ordered gene lists with bootstrap resampling to estimate the stability of these lists. * P-value represents the estimated fraction of time that a gene was ranked at or above its observed position after tabulation of rankings from 1,000 bootstrap resamplings (see Supplement). 44 For personal use only. on December 2, 2017. by guest www.bloodjournal.org From  (B) OS showed a trend that paralleled the DFS findings among the six clusters (log rank P = .40).
Tickmarks indicate censored observations for patients last known to be alive without report of relapse.  Each horizontal row represents an individual AML patient and each column is the clinical variable for that individual. Age is presented as a continuum with the lightest color (white) representing the youngest patients and the darkest color (dark red) representing the oldest patients. Discode relates to AML onset; it is color coded and categorized similar to the remaining clinical variables, as described in the legends below the associated columns. Distribution of FAB classification varied significantly among clusters (P < .0001).