Single cell analysis of colorectal cancer identifies mitogen-activated protein kinase as a key driver of tumor cell plasticity

In colorectal cancer, oncogenic mutations transform a hierarchically organized and homeostatic epithelium into invasive cancer tissue lacking visible organization. We sought to identify differences in cellular composition between normal colon and colorectal cancer, and to define signals controlling cancer cell development. We used single cell RNA and protein profiling to analyze tumors and matched normal tissues of twelve colorectal cancer patients. RNA metabolic labelling followed by single cell RNA sequencing in patient-derived normal colon and colorectal cancer organoids was employed to define colorectal cancer cell developmental trajectories. We find that colorectal cancer tissues exhibited consistent changes in cellular composition in the epithelial, immune and stromal compartments across patients compared to normal colon. Tumor epithelial cells displayed patient-specific gene expression often correlating with somatic copy number alterations, but mainly organized into patient-overarching clusters. These clusters were defined by cell type-specific transcriptional programs related to stem, transient-amplifying and immature goblet cells, and showed differential expression for signatures of oncogenic traits such as replication stress. Patient-derived colorectal cancer organoids exhibited developmental trajectories forming along a gradient of mitogen-activated protein kinase activity that were Wnt-independent. Likewise, colorectal cancer cell types of patient samples were organized by mitogen-activated protein kinase activity. Our single-cell analyses provide a subtyping system for colorectal cancer cells based on transcriptional readout of morphogenetic signals and oncogenic traits. We provide evidence that mitogen-activated protein kinase signaling, a key pathway for targeted therapy, is a main driver of colorectal cancer cell plasticity.


Introduction
Cells in the human body develop along trajectories controlled by intrinsic and extrinsic signals to ensure tissue homeostasis. Cancer cells are compromised in their ability to maintain homeostasis, as oncogenic mutations activate signaling pathways cell-intrinsically and render cancer cells less reliant on paracrine signals 1 . Furthermore, cancer cells induce remodeling of neighboring tissues, for instance by secreting growth factors not found in the normal cellular environment 2 . Cancer cells are also immunogenic or associated with inflammation, and therefore attract immune cells 3 . These processes intersect and result in the emergence of a qualitatively and quantitatively unbalanced cellular ecosystem in cancer tissue. Cell interactions and cancer cell-intrinsic signal processing are critical for tumor progression and therapy response 4,5 .
Colorectal cancer (CRC) commonly initiates via mutations activating Wnt/β-catenin signaling that maintains stem cells in the normal colon epithelium, while subsequent mutations deregulate further signaling pathways such as the RAS-RAF-MEK-ERK (also known as mitogen-activated protein kinase; MAPK) signaling cascade 6 . Less frequently, CRC initiates via BRAF mutations, or from chronic inflammation increasing the mutation rate in the tissue 7,8 . Genetic CRC drivers have direct and indirect effects on cancer cell development and the cellular composition of CRC and its microenvironment.
There is substantial evidence for the existence of tumor cell subpopulations in CRC, as so-called cancer stem cells can be distinguished by surface proteins like CD133, EPHB2 or LGR5 [9][10][11][12] and cells at the invasive front expressing genes such as the matrix metalloproteinase MMP7 contribute disproportionally to metastasis 13,14 . Single cell technologies allow to determine cellular constituents of CRC, and previous studies have mainly focused on the immune and stroma compartments 15-17 . However, it has not been investigated systematically how CRC cells organize into distinct states along developmental trajectories, and whether CRC cell types are distinguished by oncogenic traits relevant for therapy.
Here, we use droplet-based single-cell RNA sequencing to profile cell types and their differentiation states in normal colon and tumor tissues of twelve CRC patients, and in patient-matched CRC organoids. We identify tumor cell clusters across patients that differ in gene expression signatures related to oncogenic signaling and DNA replication. We show that CRC cell differentiation states are informed by MAPK activity. 4

Collection and processing of clinical specimens
Fresh normal colon and colorectal cancer tissues were acquired during the intraoperative pathologist´s examination at Charité University Hospital Berlin. Tissues (approx. 0.1-0.4g) were minced using scalpels, processed using the Miltenyi Human Tumor Dissociation Kit (Miltenyi, #130-095-929) and a Miltenyi gentleMACS Tissue Dissociator (Miltenyi, #130-096-427), using program 37C_h_TDK_1 for 30-45min. For three tumors, we also used digestion with the cold active protease from Bacillus licheniformis (Sigma P5380) at approx. 6°C for 45min. with frequent agitation, following a published protocol 18 (see Supplementary Fig 1). Cell suspensions were filtered using 100µm filters, pelleted by centrifugation, and treated with 1ml ACK erythrocyte lysis buffer, washed and resuspended in ice-cold PBS, and filtered using 20µm filters. Debris was removed using the Debris Removal Solution (Miltenyi #130-109-398). Cell suspensions were analyzed for cell viability >75% using LIVE/DEAD Fixable Dead Cell Stain Kit (488nm; Thermo Fisher) and a BD Accuri cytometer.
Immunostainings of FFPE tissue sections were performed on the BenchMark XT immunostainer (Ventana Medical Systems),using CC1 mild buffer or Ultra CC1 buffer (Ventana Medical Systems) for 30 min at 100°C for antigen retrieval. The following primary antibodies were used: rabbit anti-TFF3 (1:100, ThermoFisher Scientific, MA5-14215). Images were taken using an Axio Vert.A1 fluorescence microscope (Zeiss) equipped with an Axiocam 506 color camera (Zeiss) or a CQ1 Yokogawa Benchtop Analysis System. Hematoxylin-and-eosin and immunohistochemical images were taken using a Pannoramic SCAN 150 slide scanner (3DHISTECH).

Organoid culture and metabolic labelling
Cells from patient-derived tumor tissues were washed in Advanced DMEM/F12 medium (Gibco), embedded in Matrigel, and cultured in 24-well plates, as published 20 . Wnt3 and R-Spondin3 were prepared as conditioned media 20 . For single-cell sequencing, organoids were dissociated completely using TrypLE and DNAseI, and filtered via a 20µm filter. For single cell SLAM-seq, NCO, P009T and P013T replicate cultures were cultured in media with and without Wnt/R-Spondin. Organoids were metabolically labelled in culture using 200µM 4-thio-uridine for 3 hours 21 , harvested, disaggregated to single cells by TrypLE, and fixed in fixation buffer (80% methanol/20% DPBS) at ≥ -20 °. Samples were warmed to room temperature and incubated with 10mM iodoacetamide. Alkylation was carried out overnight, in the dark, with gentle rotation, followed by two washes with cold fixation buffer. Single 6 cell suspensions were rehydrated and incubated 10 minutes at room temperature in 100mM DTT.
Samples were resuspended in fixation buffer and conserved at -80 °C.

Single-cell RNA-seq data analysis
UMIs were quantified using cellranger 3.0.2 with reference transcriptome GRCh38. Spliced, unspliced and ambiguous UMIs were quantified with velocyto 22 (mode: run10x, default parameters). Quality control filters were set to include cells with 500 to 5000 genes detected, 1000 to 50000 UMIs counted, fraction of mitochondrial reads ranging between 0 and 0.8, fraction of spliced reads ranging between 0.3 and 0.9, fraction of unspliced reads ranging between 0.1 and 0.7 and fraction of ambiguous reads ranging between 0 and 0.2. After filtering, UMI counts were variance-stabilized using scTransform 23 with 3000 variable features, while regressing out fraction of mitochondrial reads and differences in S-Phase and G2M-Phase scored with Seurat v3 24 . The top ten principal components were used to construct shared nearest neighbor (SNN) graph and UMAP embedding as implemented in Seurat v3.
Next, main cell types (epithelium, stromal, and immune cells) were identified by scoring cell type markers across Louvain clusters for each sample (resolution = 1). Cell type markers used to score epithelium, stromal, and immune cells were adapted from Similie et al. 25 and are listed in Supplementary table 1. Sample-wise quality control assessments and subsettings into main cell types are documented at sys-bio.net/sccrc/. Normalized subsets were merged for each main cell type of normal and tumor samples without further batch correction. SNN graph, Louvain clusters and UMAP embeddings were recomputed for each subset based on top ten principal components. Louvain cluster-specific marker genes of merged normal and tumor samples were used to identify sub cell types among epithelial, stromal and immune subsets. Here, marker genes were determined with Seurat (wilcox text) at a minimum log fold change threshold of 0.25. Gene expression sets were taken from the hallmark signature collection of the Broad institute 26 , unless otherwise referenced in the main text, and were scored as implemented in the progeny R package and Seurat v3, respectively.
To compute ligand-receptor connectivity between cell type clusters, UMI counts were summed for all ligands of the same pathway in each stromal or immune cell type of normal or tumor samples. Summed ligand counts were scaled to range between zero and one for each pathway. The fraction of normal and tumor proliferative epithelial cells expressing a given receptor was calculated and fractions were averaged across receptors for each pathway and cell type. Averaged fractions of cells expressing receptors were likewise scaled to range between zero and one for each pathway. Connectivity between stroma or immune ligand expression and epithelial receptor expression was calculated as the product of scaled ligand counts and scaled receptor expression fractions and, accordingly, also ranged between zero and one. Ligands and receptors used for each pathway are listed in Supplementary table   2. 7 Single cell SLAM sequencing data was pre-processed using Seurat, and counted using the resulting alignments (as BAM files), using a custom pipeline utilizing Snakemake 27 , SeqAn 28 , R. For each read, the numbers of T nucleotides and T-to-C conversions was counted, leaving out positions with common SNPs (using the dbSNP build 151 as available as track from UCSC genome browser). For each molecule as identified by cell barcode and UMI, positions with discordant nucleotides were excluded.
Subsequently, molecules were counted as nascent RNA if they contained a T-to-C conversion, and old RNA otherwise.
Total-seq data and the associated 5' sc-RNA-seq data was integrated with the 3' sc-RNA-seq data by first integrating the two data sets from the same tumor using IntegrateData of Seurat, and a common shared nearest neighbor (SNN) graph was constructed using FindNeighbors. Subsequently, only immune cells of the 3' data each cell of the Total-seq were considered, and the cells of the Total-Seq run were assigned to the cluster with which it had most neighbors in the SNN graph. The Total-Seq signals were normalized and scaled using Seurat's default parameters, and average expression of Total-Seq markers in each cluster was used to display cluster-specific expression.
scRNAseq and scSLAM-seq data for organoids was analyzed using scanpy 29 and scvelo 30 . For diffusion map analysis and RNA velocity, cells were first filtered by the number of genes (between 2000 and 5000) and the percent mitochondrial reads (between 0.075 and 0.2) and normalized, using scvelo standard settings. Cell cycle was scored according to the scanpy standards, and S_score, G2M_score, percent mitochondrial reads and UMI counts per cell were regressed out. The diffusion map was calculated on the top 10 principal components and using a neighborhood graph with 50 neighbors and calculated on all genes. Moments were calculated on 30 principal components and 30 neighbors, and velocity was calculated using the stochastic model.

Calling of SCNAs from single cells and whole exome data
InferCNV v1.3.3 was used with default parameters to estimate copy number changes per single cell and genomic location from scRNA seq data. To detect copy number aberrant clones in tumor samples, inferCNV dendrograms were cut at k = 2 for each patient. For the resulting two clones per patient and for each normal sample, a clone-wise SCNA score was computed by calculating the average standard deviation in inferCNV expression of all cells in a given clone or normal sample and divided by the average standard deviation in inferCNV expression of all normal samples taken together. Clones with a SCNA score greater than the highest observed score for normal samples were considered copy number aberrant clones.
To validate SCNA calls from single cells we performed allele-specific SCNA calling for each patient from bulk whole exome data. Germline variants were discovered de-novo and read counts were 8 accumulated for each allele at heterozygous germline variants using the bcftools (v1.9) 31 multi-allelic caller. Discovered variants and read counts were passed to Sequenza (v3.0.0) 32 for segmentation and calling of allele-specific SCNAs.

Ethics permission
All patients were aware of the planned research and agreed to the use of tissue. Research was approved by vote EA4/164/19 of the ethic´s commission of Charité Universitätsmedizin Berlin.

Single Cell RNA sequencing identifies key features of CRC
To capture the cellular diversity in CRC and track changes from normalcy to disease, we performed single-cell transcriptome analysis of twelve previously untreated CRC patients (Fig. 1A). We utilized tumor tissue samples that included the invasive tumor front and matched normal tissues ( Supplementary Fig. 2). Tumors encompass stages pTis (Tumor in situ) to pT4, that is, from cancer confined within the lamina propria to invasive through the visceral peritoneum, with or without metastasis, and with various locations along the cephalocaudal axis of the colon. Genetic analysis revealed mutational patterns characteristic for canonical CRC progression in most tumors; however, tumors from patients P007, P014 and P026 contained the BRAF V600E mutation often associated with the serrated progression pathway and tumor P008 was colitis-associated and harbored a TP53 mutation. Eleven patients were diagnosed with microsatellite-stable (MSS) CRC, while patient P026 was microsatellite-instable (MSI).
We enzymatically dissociated the fresh normal and cancer tissues to single cells, produced transcriptome libraries using a commercial droplet-based system, and sequenced the libraries to obtain transcriptomes covering 500 to 5000 genes per cell. For each library, single cell profiles were split into epithelial, immune or stromal subsets, using known marker genes 25 . Subsets were merged into three separate supersets with >30000 epithelial or immune cells each or 3003 stromal cells (see Supplementary Fig. 3 for quality controls and initial clustering). We observed that quality parameters, such as fraction of mitochondrial reads, were more variable in epithelial cells, in line with previous studies 25,33 .
When visualized in a standard UMAP embedding employing ten UMAP components 34 , transcriptomes clustered by cell lineage (Fig. 1B). Cells from normal versus tumor samples distributed similarly across many clusters, but we also identified clusters which were strongly enriched for either normal or tumor tissues (Fig. 1C). These features of the UMAP embedding indicate that our single cell data are largely free from sample-specific bias, but instead reflect intrinsic differences between normal and tumor cell transcriptomes.

Cell type census in CRC versus normal colon
We used cell type-specific signatures and marker genes to annotate cell clusters 25 (Fig. 2A, B). In the normal epithelium, we identified clusters of undifferentiated cells by transcripts of stem cell markers such as OLFM4. Neighboring clusters were annotated as enterocyte progenitors or mature enterocytes by expression of absorptive lineage markers such as KRT20 and FABP1 (Supplementary Fig. 4A,   Supplementary table 3). BEST4and OTOP2-expressing enterocytes formed a discrete cluster ( Fig. 2A), as observed previously 35 . Further separate epithelial clusters were identified as immature and mature secretory goblet cells expressing MUC2 and TFF3, and as tuft cells expressing TRPM5. In tumor tissue, we additionally identified four tumor-specific clusters, TC1-TC4, expressing high levels of stem cell markers such as OLFM4, CD44 and EPHB2 ( Fig. 2A Myeloid cell clusters predominately contained CD68-expressing macrophages ( Figure 2C). Myeloid cell cluster 1 showed a more inflammatory profile as the cells were characterized by high expression of the antibacterial proteins S100A8 and S100A9 as well as IL1B. In contrast, cells belonging to the myeloid cell cluster 2 displayed high expression of the M2 marker CD163 as well as FCGR3A transcription and CD16 protein reactivity (Fig. 2C, D).
Overall, we observed an increase in the numbers of T cells and macrophages in the tumors as compared to normal tissue. In contrast, we observed a decrease in total B and plasma cell numbers in the tumor microenvironment compared to the adjacent normal tissue counterparts (Fig. 2B), which is in line with a recent study 17 . Three immune cell clusters were larger in tumors compared to normal tissue across all patients analyzed: these were the plasma cell clusters L2 and K2, which were expanded in the tumor in contrast to the overall trend for less plasma cells in tumor tissue, as well as the CD8+ T4 T cell cluster. Plasma cell clusters L2 and K2 contain more IgG producing cells, which confirms the previously observed loss of IgA + but gain of IgG + plasma cells 17 Fig. 6, 7).

Identification of tumor cells by somatic copy number aberrations
MSS CRC is defined by somatic copy-number aberrations (SCNAs). Thus, we next distinguished epithelial tumor cells from normal cells by inferring SCNAs from the single cell transcriptome data. We identified clusters of SCN-aberrant epithelial cells in ten out of the twelve tumors (Fig. 3A). P014 and P026 contained no cells with overt SCNAs. This was expected for tumor P026, which is MSI and thus defined by single nucleotide polymorphisms rather than SCNAs, but unexpected for P014, which was diagnosed as BRAF-mutant however MSS. Epithelial cells with SCNAs predominantly populated the TC1-4 clusters, along with substantial fractions of cells defined as stem cell/TA-like or immature goblet cells (Fig. 3B). In contrast, a majority of absorptive enterocytes and mature goblet cells were identified as copy-number normal, and therefore likely stem from non-cancerous tissue at the tumor margins ( Supplementary Fig. 2). We used exome sequencing of tumors P007, P008 and P009 to validate SCNA calling from transcriptomes, showing that the procedure is robust for our single cell data ( Supplementary Fig. 8A).
While most SCN-aberrant cells were rather undifferentiated, a fraction also showed high expression levels of the absorptive and secretory differentiation marker genes FABP1 and TFF3 (Fig. 3E). We used immunofluorescence to assess spatial distributions of proteins marking undifferentiated and differentiated epithelial cells (Fig. 3F). In normal colon sections, we detected the stem cell marker Patient-specific gene expression patterns often correlate with SCNAs As individual CRCs are driven by specific combinations of oncogenic mutations and SCNAs, we were surprised by the lack of patient-specific epithelial cell clusters. We thus represented smaller differences between epithelial transcriptomes by using a higher number of top 50 components for UMAP embedding, instead of the initial top 10 components. At this resolution, the TC clusters and smaller parts of the immature goblet cell clusters clearly separated by patient (Fig. 4A). We re-clustered the 13 TC1-4 cells of the tumor samples into 13 subclusters (Fig. 4B). Subcluster 8 was removed from further analysis, as expression of immune-specific genes suggested misannotation of a few immune cell or epithelial-immune cell-doublet transcriptomes. The remaining subclusters were often patient-specific, such as subcluster 0 that was specific for patient P009 and subcluster 10 that was specific for patient P008 (Fig. 4C,D). Indeed, tumor P008 was selectively immunoreactive for cytokeratin 17 (KRT17), as predicted by transcriptome data (Fig. 4E). In contrast, subclusters 1, 2, 9 were represented in several patients. Accordingly, proteins encoded by genes in these subclusters such as MUC2, EREG, LYZ and MMP7 were expressed in tumor cells of patient subgroups (Fig. 4D, E). We found that patient-specific gene expression clusters are often associated with SCNAs ( Supplementary Fig. 8B). For example, subclusters 0 and 5 were enriched in genes located on chromosome 17q (p=1,41E-14 and p=3.9E-15, respectively, using Bonferroni-corrected hypergeometric distribution tests), and indeed SCNA profiles for patient P009 suggested amplification of this chromosome arm. Similar association exist for subcluster 4, chromosome 12q, patient P015 (p=1,77E-5), and subcluster 6, chromosome 20q, patient P013 (p=1,65E-33), respectively. However, not all putative amplifications manifested as transcriptome subclusters, and vice versa. Furthermore, proteins encoded by some subcluster genes showed regionalized expression, such as MMP7 that was immunoreactive predominantly for cells at the invasive front, suggesting that genetic/genomic aberrations and inductive signals can co-operate in directing CRC gene expression. In summary, our analysis indicates that patient-specific expression patterns in CRC are controlled partly by SCNA and are heterogeneous on single cell level. However, our cohort of 12 patients is too small to define patient-overarching subgroups based on single cell transcriptome patterns.

Epithelial tumor cell clusters differ by oncogenic traits and signals
We sought to define whether the tumor cells clustered as TC1-4, stem/TA-cell-like or goblet cell-like and defined by SCNAs were associated with oncogenic signaling or cancer-associated functional signatures (Fig. 5A, B, and Supplementary Fig. 9). We found a strong association of TC1 cells with the expression of hallmark signatures related to DNA repair and the G2/M replication checkpoint 26 . This indicates that TC1 cluster cancer cells experience high levels of replication stress, a therapy-relevant trait of many cancers, including CRC. Indeed, TC1 cluster cells were exclusively assigned to the S or G2/M cell cycle phases by gene expression (Fig. 5C), in line with cells under replication stress, as also seen by XRCC2 expression (Supplementary Fig. 4A). The DNA damage-associated protein PARP stained many nuclei of the TC1-high CRC tissue P009 but not of TC1-low P008 (Fig.5D). TC2 transcriptomes were associated with high PI3K pathway activity, related to control of metabolism and apoptosis.
Wnt/β-catenin target gene activity was high across all TC clusters. Wnt/β-catenin target gene expression was not associated with specific tumor cell clusters, but stem/TA-cells had strongest expression of the LGR5-ISC signature that is Wnt activity-associated 12 ; patient P012 tumor tissue showed the strongest overall Wnt/β-catenin activity. TC4 showed the strongest expression of YAP targets defined in intestinal tissue 40 . YAP transcriptional activity is linked to regenerative responses and tumor progression 41 . TC1 and TC4 were enriched for the expression of direct ERK targets that are activated by MAPK 42 , across all tumors and in particular in P007. In summary, assessment of cell signaling and cell type signatures provides additional information on signaling pathway activities of tumor epithelial cell clusters, and specific features of individual tumors. The analyses suggest that assignment to the TC1-4 clusters reflects, at least partially, differential signaling activities and oncogene-induced functional traits of cancer cells.

RNA metabolic labeling defines MAPK-driven tumor cell hierarchies
To establish how cell type and signaling signatures are organized in tumor cell hierarchies, we established organoid lines of two tumor samples, P009, and P013 (Fig. 6A). We confirmed the identity of the organoids with the matched tumor tissue on a mutational level (Supplementary Table 6). To test how normal and tumor epithelial cell hierarchies differentially depend on paracrine signals, we cultured the P009T and P013T cancer organoids, as well as normal colon epithelial organoids (termed NCO), in medium containing Wnt, R-Spondin, and EGF (WRE medium). This medium maintains normal intestinal stem cells depending on paracrine Wnt signals. Alternatively, we cultured the lines for three days in medium lacking Wnt and R-Spondin (E medium). We first determined activities of cell typespecific signatures in the organoid transcriptomes (Fig. 6B). A transcriptional signature for intestinal stem cells expressing LGR5, termed LGR5-ISC 12 , marked a fraction of cells of NCO organoids cultured in WRE medium, but expression was lost in E medium. Stem cell signature gene expression was much higher, and independent of Wnt/R-Spondin, in the CRC organoids. Signature genes for differentiated absorptive enterocytes or secretory goblet cells showed opposite patterns to stem cell-related gene expression in NCO but were expressed only at low levels in few P009 and P013 cells. Taken together, these expression patterns are in line with Wnt-dependent stem cell maintenance in normal tissue organoids, and Wnt-independent stem cell maintenance and block of terminal differentiation in cancer organoids with APC mutations. The data however do not show whether graded developmental trajectories exist in CRC.
We, therefore, metabolically labelled RNAs of the organoids by 4-thio-uridine, before dissociation and single cell sequencing (SLAM-scSeq; see Fig. 6C for experimental workflow) 21 . This allowed us to distinguish nascent labelled from older non-labelled mRNA and to order cells along inferred latent time based on dynamic RNA expression (also known as RNA velocity). When cultured in WRE medium, developmental trajectories defined by RNA velocity initiated in areas of maximal LGR5-ISC signature scores and terminated in a region of apoptotic cells characterized by high proportions of mitochondrial reads (Fig. 6D). When cultured without Wnt/R-Spondin, NCO normal colon organoids lost uniform direction of RNA velocity, in line with the loss of stem cell-related gene expression observed above. In contrast, the P009T and P013T cancer organoids maintained strong transcriptional trajectories in E medium.
We ordered the organoid transcriptomes along latent time and assessed strengths of oncogenic signals ( Fig 6E). In line with the key role of Wnt in stem cell maintenance, normal colon organoids showed a gradient of Wnt/β-catenin target gene activity along latent time when cultured in WRE medium. In contrast, P009T or P013T organoid transcriptomes showed no Wnt/β-catenin-related expression gradient, but a clear gradient of MAPK target gene activity along latent time. We found that TFF3, marking secretory differentiation, was graded in both CRC organoid lines along latent time, but FABP1, marking absorptive differentiation in the normal colon, was not (Fig. 6F). This suggests that TFF3, but not FABP1, is a valid marker for endpoints of CRC differentiation trajectories, in line with our prior finding of immature goblet cell-like CRC cells. The proliferation marker MKI67 was confined to the beginning of the latent time trajectory of normal organoids in Wnt medium but showed extended gradients in CRC organoids. Unexpectedly, both CRC organoid lines displayed gradual loss of MKI67 expression along latent time only in the absence of Wnt, and likewise, MMP7 was expressed in a Wntdependent manner in both P009T and P013T organoids. In summary, our metabolic RNA labelling approach identifies MAPK signaling as a driving force of CRC developmental trajectories and suggests a role for Wnt as a potential paracrine signal influencing gene expression in APC-deficient CRC cells.

MAPK target gene expression defines CRC differentiation states
As MAPK signaling marked developmental trajectories in CRC organoids, we wanted to assess whether cell types and states are organized along a MAPK gradient in primary CRC. As a benchmark indicating developmental dynamics in the colon epithelium, we employed the LGR5-ISC stem cell signature 12 . We assigned SCN-aberrant tumor epithelial cell transcriptomes to 40 bins along a gradient of diminishing LGR5-ISC gene signature activity or along decreasing MAPK activity, and assessed expression of the stem cell markers LGR5, and EPHB2 12 along the gradients. As expected, both LGR5 and EPHB2 were graded with LGR5-ISC activity; but MAPK activity also ordered tumor cells in agreement with both cell hierarchy markers and performed even better in sorting cells along a gradient of EPHB2 expression. 16 We conclude that cell hierarchies in CRC can be defined by the LGR5-ISC signature, but also by MAPK target gene expression.

We next investigated cell type distribution among SCN-aberrant tumor cells in the bins defined by an
LGR5-ISC or the MAPK gradient. LGR5-ISC activity placed a higher proportion of stem/TA-like tumor cells at the beginning of the gradient, whereas tumor cells assigned as goblet cells aggregated in the lower end of the gradient, and TC1-4 cells were broadly distributed across the complete LGR5-ISC activity range (Fig. 7B)

Discussion
Intratumor heterogeneity and cell type composition are a key determinants for treatment response of cancer. In this study, we used single cell omics to investigate the cellular composition and developmental trajectories of tumor cells in colorectal cancer ecosystems. We find that tumor cells, distinguished by somatic copy number aberrations, organize into clusters that either show tumorspecific expression patterns (termed here TC1-4) or resemble stem, TA or immature goblet cells. TC1-4 tumor cell clusters were distinguished by gene expression patterns related to replicative stress and oncogenic signaling pathway activities. We provide evidence for Wnt-independent intrinsic MAPK gradients as organizers of tumor cell plasticity, and Wnt-dependent expression of markers of the invasive front. We identify stromal and immune cell types enriched in the tumor microenvironment, including CAFs that were found specifically in the tumor samples. Our data suggest that MAPK, Wnt and other paracrine signals are defining factors shaping the CRC ecosystem.
Recent single cell studies defined features of the CRC immune microenvironment 16,17 ; however, no consensus exists on how to subtype CRC on the single cell level. Here, we subgroup the epithelial tumor component of CRC into tumor-specific TC1-4, stem/TA-like, and goblet-like cells (Fig. 7D). Our analyses suggest that CRC cells with lower levels of MAPK activity resemble, and cluster with, normal stem, TA or immature goblet cells. Stem cell-like CRC cells are characterized by high expression of LGR5-ISC signature genes. Intestinal stem cells and the LGR5-ISC signature are known to be Wnt/β-catenindriven 43 ; however, β-catenin target genes were similarly active throughout the CRC cell clusters, suggesting additional levels of regulation. TC1 and TC4 cells have high levels of MAPK/ERK-dependent transcription, and TC1 cells are additionally associated with DNA damage. Indeed, high MAPK activity is known to cause DNA damage and replicative stress 44 . TC4 cells display high YAP transcriptional activity in addition to high MAPK levels. YAP, an effector of the Hippo pathway, is a key driver of CRC and other cancers 41 . It is of note that the CMS subtyping system developed for bulk tissue CRC transcriptomes 45 could not distinguish the CRC cell types that we identified here on the single cell level, as most epithelial cancer cells were assigned to CMS1 or CMS2 with the exception of goblet-like cells that can adopt CMS3 (Supplementary Fig. 10).
Our study identified multiple cancer traits in tumor cell clusters with relevance to therapy. For instance, TC1 cells were defined by high levels of replication stress, and tumors with high TC1 cell content were strongly positive for PARP, an important therapeutic target in many cancers 46 . MAPK activity is a key pathway for targeted therapy, as many CRC patients profit from anti-EGFR or anti-EGFR/anti-BRAF therapy blocking MAPK [47][48][49] . We show here that MAPK activity and CRC tumor cell differentiation states are linked. This suggests that CRC cell types identified here may also have intrinsically different thresholds to targeted therapies blocking MAPK. CRC cell types defined by oncogenic signal activities and differentiation states may therefore provide opportunities to study mechanisms of resistance defined by CRC cell plasticity. Recent studies already point to roles of paracrine signals and the microenvironment in anti-EGFR therapy resistance of CRC 50,51 . Our CRC cell classification system could therefore aid in building a comprehensive subtyping system for CRC on the single cell level with therapeutic relevance.
Stem cells, often defined as a pool of cells sustained at the apex of cell hierarchies or developmental trajectories, are traditionally seen as unique drivers of tissue homeostasis and regeneration of normal tissue, but also of metastasis and therapy-resistance in cancer 52 . However, recent studies have shown metastasis and stemness can be uncoupled in colorectal cancer metastasis 53 , and that oncogenic mutations and paracrine signals can reverse developmental trajectories so that more differentiated cells can regain stem cell characteristics [54][55][56] . Here, we show that a gradient of ERK target gene expression is associated with developmental latent time in CRC organoids, extending our previous finding of graded ERK activity in CRC organoids 57 . The results imply a capacity for intrinsic regulation of CRC cell lifetimes via MAPK. However, further functional studies are required to mechanistically dissect how MAPK and other signaling pathways interact to control CRC developmental trajectories. We suggest that patient-specific sets of genomic aberrations, oncogenic mutations and paracrine signals will all play roles in modulating cell development. Indeed, we see that target genes of further pathways, such as Wnt/β-catenin, Hippo/YAP or TGF-β are expressed in heterogeneous or graded fashion in CRC epithelium. These pathways could represent novel vulnerabilities of cancer cells that could be exploited by directing developmental trajectories towards endpoints of cellular lifespans.
The extension of single-cell analyses of CRC and other cancers to multi-omics, taking also in account genetic and epigenetic heterogeneity 58,59 , as well as assessment of cell types and signaling network states while preserving spatial information, promises to identify cell plasticity and genetic diversity of cancer at a cellular resolution. Such approaches have the potential to improve the molecular understanding of cancer and therapy prediction for patients 60          high expression of the LGR5-ISC signature. High MAPK is associated with TC1 and TC4 cells that are further characterized by high replication stress and YAP activity, respectively.