Acquired mutations and transcriptional remodeling in long-term estrogen-deprived locoregional breast cancer recurrences

Background Endocrine therapy resistance is a hallmark of advanced estrogen receptor (ER)-positive breast cancer. In this study, we aimed to determine acquired genomic changes in endocrine-resistant disease. Methods We performed DNA/RNA hybrid-capture sequencing on 12 locoregional recurrences after long-term estrogen deprivation and identified acquired genomic changes versus each tumor’s matched primary. Results Despite being up to 7 years removed from the primary lesion, most recurrences harbored similar intrinsic transcriptional and copy number profiles. Only two genes, AKAP9 and KMT2C, were found to have single nucleotide variant (SNV) enrichments in more than one recurrence. Enriched mutations in single cases included SNVs within transcriptional regulators such as ARID1A, TP53, FOXO1, BRD1, NCOA1, and NCOR2 with one local recurrence gaining three PIK3CA mutations. In contrast to DNA-level changes, we discovered recurrent outlier mRNA expression alterations were common—including outlier gains in TP63 (n = 5 cases [42%]), NTRK3 (n = 5 [42%]), NTRK2 (n = 4 [33%]), PAX3 (n = 4 [33%]), FGFR4 (n = 3 [25%]), and TERT (n = 3 [25%]). Recurrent losses involved ESR1 (n = 5 [42%]), RELN (n = 5 [42%]), SFRP4 (n = 4 [33%]), and FOSB (n = 4 [33%]). ESR1-depleted recurrences harbored shared transcriptional remodeling events including upregulation of PROM1 and other basal cancer markers. Conclusions Taken together, this study defines acquired genomic changes in long-term, estrogen-deprived disease; highlights the importance of longitudinal RNA profiling; and identifies a common ESR1-depleted endocrine-resistant breast cancer subtype with basal-like transcriptional reprogramming.

Hormone receptor positive breast cancer has served as a prototype for targeted therapy due to the well-established efficacy of estrogen deprivation. Largely because of these approaches, breast cancers are somewhat unique in that recurrences can occur years, sometimes decades following the primary diagnosis [1][2][3][4] . Given that the majority of patients receive long-term maintenance regimens of either a selective estrogen receptor modulator (SERM) or aromatase inhibitor (AI), recurrent breast cancers are often classified as estrogen-independent given their ability to thrive in an estrogen-deprived environment. Identifying the biological mediators that allow breast cancer cells to bypass their dependence on estrogen is a crucial step in understanding advanced breast cancer biology and defining novel therapeutic targets.
Defining these molecular processes in patient samples, however, has been challenging because of the logistics in obtaining well-characterized, longitudinally collected biospecimens.
Nevertheless, shared features of more advanced breast cancers have emerged, such as relapsed tumors losing expression of ER and over 20% of metastatic ER-positive breast cancers acquiring mutations in ESR1 that confer ligand-independent signaling [5][6][7] . Other largely accepted mechanisms of estrogen-independence are bypass activations of mitogenic pathways such as MAPK and PI3K through initiating FGFR, EGFR and IGF signaling and exploitation of the Rb-CDK-E2F axis [8][9][10][11][12] . Less well validated, more recently discovered mechanisms include ESR1 fusions and amplifications 13,14 .
Recent studies analyzing multiple, longitudinally collected, pre and post-treatment samples have shown clonal evolution and selection in the context of targeted therapies [15][16][17][18] .
Similar work analyzing hormone-receptor positive breast cancers have mainly been restricted to short-term pre/post neoadjuvant therapy analyses [19][20][21][22] . One of the most comprehensive genomic studies of this type was a multi-platform effort that characterized the clonal architecture of tumors after four months of AI therapy 23 . Although drastic clonal remodeling was observed at the DNA-level, few recurrent resistance mechanisms were appreciated. A more recent, large-scale study showed activating ERBB2 mutations, MAPK activation and NF1 loss as mechanisms possibly driving endocrine resistance-with some of these alterations being confirmed in subsequent studies [24][25][26][27] . The majority of this work has notably been performed on metastatic tissues-whether or not some of these changes occur locally as a result of estrogenindependence before distant spread is unknown.
Thus, to better define both DNA and transcriptional changes that occur in long-term estrogen-independent tumors, we undertook a targeted analysis of DNA/RNA alterations in ~1,400 cancer genes in 12 paired primary and locoregional recurrences from patients with ERpositive breast cancers that were documented as being treated with estrogen-depleting therapy.
The median time to recurrence was 3.7 years, with the longest time to recurrence being over 7 years.

Expression and copy number changes in local recurrences.
Dual hybrid-capture DNA/RNA sequencing was performed for 1,400 cancer genes on 12 paired primary and local recurrences from patients with ER-positive breast cancers that underwent continuous endocrine therapy (Table 1). RNA-seq data (Supplementary Table 1

Outlier expression gains and losses.
To further explore major expression changes that may be driving recurrence and therapy resistance, an outlier expression analysis was performed using gene-level fold-change values of each patient-matched case (Data Supplement S8).
These included gains and losses in shared pathway members, notably NTRKs and SFRPs respectively, targetable upregulation of growth factor pathway mediators such as FGFR4 and EGF and outlier gains in the CDK regulator CCNE1. 3 of 12 cases also shared outlier expression gains in TERT, with case ERLR_14 harboring a particularly extreme enrichment from near undetectable levels in the primary tumor ( Figure 3B). Case ERLR_03's recurrence, which was most dissimilar to its patient-matched pair transcriptionally, showed extreme loss and gain of ESR1 and ERBB2 respectively. CNA analysis confirmed recurrence-specific ERBB2 amplification and is consistent with previous studies of endocrine therapy-treated breast cancers selecting for HER2-signaling in more advanced tumors. The most recurrent outlier loss involved ESR1.

ESR1 depleted recurrences.
Five cases showed outlier expression losses of ESR1 ( Figure   4A). Despite estrogen receptor being the driver of ER-positive breast cancer and a major regulator of transcription; counterintuitively, 4 of 5 of the recurrences which lost ESR1 expression generally retained the expression profile of their patient-matched primary ( Figure   1A). Importantly, many of these cases also harbored very similar CNA profiles ( Figure 1B (Table 2). NDRG1, a particularly compelling candidate since it also showed upregulation in LTED breast cancer models, was further interrogated using METABRIC data. Like PROM1 and KLK7, NDRG1 is most highly expressed in basal breast cancers; yet, when expressed in ERpositive primary tumors, NDGR1 confers significantly worse disease-specific survival outcomes (Supplementary Figure 8).

DISCUSSION
In this study, a targeted RNA/DNA analysis of approximately 1,400 cancer genes in ERpositive primary breast cancers and matched long-term, endocrine therapy treated local recurrences was performed. We found general conservation of transcriptional and copy number Nearly all recurrences are more similar transcriptionally to their matched primaries than to other, long-term estrogen deprived tumors-reinforcing the notion that advanced cancers generally retain their core transcriptional programming, even after nearly a decade of dormancy [26][27][28][29] . Furthermore, amplifications and deletions of recurrences are markedly similar to primaries, supporting recent evidence from breast cancer single-cell sequencing that structural variation is likely an early event and many CNAs, even in metachronous therapy-resistant tumors, may be shared by the majority of subclones 32 . An important exception to this conservation was ERLR_03_R1, a recurrence with a completely unique transcriptional and copy number profile than its matched primary. Evidence has emerged of so-called 'collision tumors', whereby two synchronous, distinct cancers can merge anatomically and only under the selective pressures of therapy or through deep sequencing , their individuality can be unmasked 23,33 . Indeed, this "recurrence" switched to ER-negative/HER2-positive from ERpositive/HER2-negative clinically, and thus could represent a different cancer than the primaryalthough the level of shared SNVs suggests some degree of clonal relatedness. Collectively, these results begin to unravel the complex adaptations that breast cancer populations undergo when under the selection of long-term estrogen depleting therapies longterm. We identify acquired DNA-level mechanisms of resistance, such as mutations in ARID1A, other transcriptional regulators and multiple mutation selection within PIK3CA-but more importantly, uncover the most recurrent genomic adaptations taking place appear to be at the transcriptional level. These include targetable outlier gains and modifications in NTRKs as well as a distinct population of ESR1 depleted recurrences that enrich themselves with genes generally expressed in basal breast cancers-such as PROM1 and NDRG1. Preclinical, mechanistic investigations into these temporally altered genes are warranted given they may uncover novel and targetable mechanisms of endocrine therapy resistance in advanced breast cancers.

Patient Samples, tissue processing and nucleic acid extraction. Institutional Review Board approval from both participating institutions (University of Pittsburgh IRB# PRO15050502, The
Charité IRB Office) was obtained prior to initiating the study. Inclusion criteria for this study were supporting reads containing the altered allele, and with AF greater than 0.05 in either primary or recurrence were considered.

Copy number alterations.
To estimate copy number ratios, CNVkit was implemented on processed bam files using default settings and the -drop-low-coverage option 68 . A pool of bam files from adjacent normal tissue, sequenced in the same manner, was used as a reference.
Probe and segment level copy number estimates were finalized with CNVkit's call function, which utilizes circular binary segmentation 69 . To adjust for tumor purity and normal contamination, the -m clonal option was used with tumor purities from pathologic evaluations.

8
Copy number ratios were then plotted with the heatmap function and copy number values were assessed and plotted with ggplot2. Gene-level copy number estimates represent the mean copy number call across all probe targets. CNVkit copy number ratios showed a near normal distribution and ERBB2 copy number values demonstrated a strong correlation (pearson R = 0.924, p-value < 0.001) with expression (Appendix A.4: Figure 42). Pearson correlations as distance measurements and the "average" agglomeration method.

Differential gene expression, clustering and outlier gains and losses.
Differential expression between primary and recurrent tumors was analyzed with limma. Raw counts were input into the voom function and quantile normalized prior to fitting the linear model and performing the empirical Bayes method for differential expression 70,71 . The linear model was fitted with a design that accounts for the paired nature of the cohort (model = ~Patient+Tissue [primary or recurrence]). Outlier expression gains and losses were determined for each patient by discretely categorizing genes into one of 5 categories. If log2FC values (i.e. recurrence log2normCPM -primary log2normCPM) for a given gene were less than Q1 -(1.5 X IQR) or Q1 -(3 X IQR), using case-specific log2FC values for all genes as the distribution, that gene was deemed an "Outlier Loss" or "Extreme Loss" respectively. If log2FC values calculated were greater than Q3 + (1.5 X IQR) or Q3 + (3 X IQR), it was deemed an "Outlier Gain" or "Extreme Gain" respectively. All other genes with intermediate fold changes were classified as "Stable." To determine subtype expression of KLK7, PROM1 and NDRG1, normalized microarray expression data along with PAM50 calls was obtained from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) through Synapse (https://www.synapse.org/,

9
Synapse ID: syn1688369), following IRB approval for data access from the University of Pittsburgh 72 . Overlap with genes in long-term estrogen deprived, ER-positive breast cancer lines (HCC1428, MCF7, T47D, ZR75.1) was performed by running a separate differential expression analysis (LTED vs. parental lines) on microarray data with limma 71,73 . Dysregulated gene overlap was designated if the nominal p-value and FDR-adjusted p-value were both < 0.05 in the local recurrence and LTED differential expression analysis, respectively. Binary dichotomization of METABRIC samples using NDRG1 expression (>50 th percentile, <50 th percentile) and log-rank testing were used to assess significant differences in disease-specific survival (DSS) and then Kaplan-Meier curves were plotted with survminer 74,75 .