A census of cell types and paracrine interactions in colorectal cancer

1 Institute of Pathology, Charité Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany 2 IRI Life Sciences, Humboldt University of Berlin, Philippstrasse 13, 10115 Berlin, Germany 3 German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany 4 Department of Surgery, Charité Universitätsmedizin Berlin, Hindenburgdamm 30, 12203 Berlin 5 Core Unit Bioinformatics (CUBI), Charité Universitätsmedizin Berlin, MDC Berlin, and Berlin Institute of Health, Charitéplatz 1, 10117 Berlin 6 Institute of Medical Immunology, Charité Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin 7 Berlin Institute of Health, Anna-Louisa-Karsch-Straße 2, 10178 Berlin, Germany


Introduction
All cells in the human body exist in contact with other cells in finely tuned microenvironments.
Paracrine communication between cells ensures tissue homeostasis. Cancer cells are compromised in their ability to maintain homeostasis, as the oncogenic mutations activate signaling pathways cellintrinsically and render cancer cells less reliant on paracrine signals 1 . Furthermore, cancer cells induce remodeling of neighboring tissues, for instance by secreting growth factors not found in the normal environment 2 . Thirdly, cancer cells are often immunogenic or associated with inflammation, and therefore attract immune cells 3 . These processes intersect and result in the emergence of a qualitatively and quantitatively unbalanced cellular ecosystem in cancer tissue. Interactions between immune, stromal and cancer cells are critical for tumor progression and therapy response 4,5 .
Colorectal cancer (CRC) most often initiates via mutations activating Wnt/β-catenin signaling that maintains stem cells in the normal colon epithelium, while subsequent mutations deregulate further signaling pathways such as the RAS-RAF-MEK-ERK signaling cascade 6,7 . Less frequently, CRC can arise by initiating mutations in BRAF, or from chronic inflammation increasing the mutation rate in the tissue. Regardless of the order of oncogenic mutations, genetic CRC drivers have direct and indirect effects on the cellular composition of CRC and its microenvironment. There is substantial evidence for the existence of tumor cell subpopulations in CRC. For instance, cancer stem cells with high clonogenic potential can be sorted from CRC based on surface markers like PROM1 (also known as CD133) or LGR5 [8][9][10] . Furthermore, CRC cells at the invasive front express matrix metalloproteinases such as MMP7 and display epithelial-to-mesenchymal transition, in contrast to cells residing in more central locations in the tumor [11][12][13] . However, it has not been investigated systematically how cell types and cell plasticity differ between the normal colon and CRC.
Here, we use droplet-based single-cell RNA sequencing to profile cell types and their differentiation states in normal colon and tumor tissues of eight CRC patients, and in matching CRC organoids. We identify consistent changes occurring in epithelial differentiation programs between normal and tumor epithelium in the colon, resulting in the emergence of tumor-specific epithelial cell clusters expressing genes relevant for cancer traits such as stemness, invasion, and epithelial-to-mesenchymal transition.
We also catalog cell types expanded in the tumor microenvironment, for instance, stromal cancerassociated fibroblasts and several types of immune cells. We provide evidence for cancer-cell-specific re-wiring of morphogenetic signaling informed by oncogenic mutations and differences in paracrine signaling networks.
. CC-BY-NC-ND 4.0 International license author/funder. It is made available under a The copyright holder for this preprint (which was not peer-reviewed) is the . https://doi.org/10.1101/2020.01. 10.901579 doi: bioRxiv preprint

Single-cell RNA sequencing of CRC
To capture the cellular diversity in CRC and track changes from normalcy to disease, we performed single-cell transcriptome analysis of eight previously untreated CRC patients (Fig. 1A). We utilized tissue samples that included the invasive tumor front and matched normal tissues ( Supplementary Fig.   1). Tumors under investigation encompass stages pTis (Tumor in situ) to pT4, that is, from cancer confined within the lamina propria to invasive through the visceral peritoneum, with or without metastasis, and with various locations along the cephalocaudal axis of the colon. Panel sequencing of genomic tumor DNA uncovered mutations in APC, KRAS and/or TP53 in tumors P008, P009, P013, P016, and P017; these mutations are characteristic for the canonical CRC progression pathway initiated by loss of APC. P007 harbored BRAF V600E and TP53 mutations; this mutational pattern is in line with a tumor initiated by BRAF activation. P008 carried a TP53 mutation and was colitis-associated. Tumor P014 contained putative driver mutations in APC, BRAF, HRAS, and PIK3CA, albeit at a lower frequency, suggesting the possibility of distinct subclones contributing to this tumor.
We enzymatically dissociated the fresh normal and cancer tissues to single cells, produced single-cell transcriptome libraries from each tissue using a commercial droplet-based system, and sequenced the libraries to obtain transcriptomes covering 500 to 5,000 genes per cell. Singe-cell profiles were partitioned into epithelial, immune or stromal transcriptome subsets for each library, using known marker genes 14 , and then merged. We observed varying fractions of cell types per library, but stromal cells were generally less abundant (in total across all libraries: >25,000 epithelial cells; >25,000 immune cells and 2,691 stromal cells). Fluctuations in epithelial, immune or stromal cell abundance between libraries could reflect differences in tissue cell content or technical variances related to ischemic time during operation or the dissociation process.
We observed that single-cell transcriptomes derived from all patients intermingled within the epithelial, immune, and stromal compartments (Fig. 1B). When distinguishing normal versus tumor samples, distributions of single-cell profiles largely overlapped, although several regions within each plot were preferentially inhabited by transcriptomes derived from either normal or tumor tissues (Fig.   1C). This indicates that our data are generally free from patient-or sample-specific batch effects confounding the cell type distributions, but that general differences occur between normal and tumor transcriptome distributions.
Cell type census in CRC versus normal colon We next clustered the single-cell profiles of the epithelial, immune and stromal compartments, and used cell-type-specific signatures and marker genes to annotate cell types 14 ( Fig. 2A; Supplementary   . CC-BY-NC-ND 4.0 International license author/funder. It is made available under a The copyright holder for this preprint (which was not peer-reviewed) is the . https://doi.org/10.1101/2020.01.10.901579 doi: bioRxiv preprint Fig. 2 for genes over-or underrepresented between tumor and normal samples). In the normal epithelium, we identified a zone populated by profiles of undifferentiated cells with high activity of stem cell markers such as OLFM4 and transiently-amplifying proliferative markers. This region bordered on transcriptome clusters annotated as enterocyte progenitors, and, ultimately, mature absorptive enterocytes with high expression of markers such as KRT20 and FABP1. BEST4and OTOP2expressing enterocytes formed a discrete cluster ( Fig. 2A, identified by the lightest shade of green) 15 .
Further separate epithelial clusters were populated by profiles annotated as immature and mature secretory goblet cells defined by expression of MUC2, TFF1, and TFF3, and TRMP5-expressing tuft cells.
In the tumor samples, the zone of undifferentiated epithelial cells was expanded by five largely tumorspecific clusters (TC1-5, Fig. 2A We used immunofluorescence to verify the spatial distributions of epithelial cell types, using marker genes identified in the single-cell data (Fig. 2C, D). We detected the stem cell marker OLFM4 exclusively at the base of normal crypts. However, in tumor sections, OLFM4, as well as the proliferation marker MKI67, stained cells scattered throughout the epithelium, as validated by co-staining with the epithelial marker EPCAM. The goblet cell and enterocyte differentiation markers TFF3 and FABP1 stained preferentially cells in the lower and upper crypt of the normal colon, respectively. In contrast, TFF3-and FABP1-positive tumor cell populations were not clearly organized in domains. Clusters of TFF3-and FABP1-positive cells that were largely negative for MKI67 suggest the presence of differentiated cells in CRC, in agreement with the single-cell sequencing data.
The five tumor-specific epithelial cell clusters TC1 to TC5 were represented in different proportions in all eight CRCs under investigation (Fig. 2E). TC1 and TC2 were assigned as highly stem cell-like using prior classifiers 14 , while TC3-4 showed the strongest similarity to transiently-amplifying cell types and TC5 shared similarity with both, transient-amplifying and stem cells. Transcriptome-based cell cycle analysis revealed that TC1 is highly proliferative (Fig. 2F). Furthermore, TC clusters shared a couple of defining genes that have previously been linked to oncogenic processes in CRC (Supplementary Fig. 4; Supplementary Table 1). The CRC stem cell marker CD44 16 was expressed prominently in TC1 and TC4.
MMP7, encoding a matrix metalloproteinase responsible for CRC invasion 11 , was expressed highest in TC4, but also in TC1 and TC3. VIM 17 and S100A4 18 , key markers of the epithelial-to-mesenchymal transition, were among the genes defining TC2. PLA2G2A, encoding a phospholipase that controls inflammation and homeostasis in the intestinal stem cell niche 19 , and the intestinal stem cell marker OLFM4 was overrepresented in TC5.  Table 2). This interleukin has been implicated in CRC progression 20 , and a similar type of T cell was recently found expanded in single-cell analyses of colitis patients 14 .
Among stromal transcriptomes, we annotated an interconnected supercluster of fibroblasts.
Strikingly, one fibroblast cluster was confined to the tumor samples and was therefore designated as The copyright holder for this preprint (which was not peer-reviewed) is the . https://doi.org/10.1101/2020.01.10.901579 doi: bioRxiv preprint 7 of ligands, the prevalence of the ligand-expressing cell (that is, cluster size for that cell type), and fractions of receptor-expressing cells. We focused on ligand-encoding genes active in immune or stromal cells and genes encoding matching receptors in proliferative epithelial cells, that is, in stem/TA cells and the five tumor-specific clusters TC1-5 ( Fig. 3A, Supplementary Fig. 7A, and Supplementary Table 4). Possible ligand-receptor connections appeared relatively sparse in the normal tissue.
However, we found many more potential paracrine signaling connections in the tumor. This was mainly due to three features of the tumor ecosystem: Firstly, CRC contained novel epithelial cell types TC1-5 expressing multiple receptors, including high levels of the receptor tyrosine kinase MET 24 Table 5). In particular, CAFs dominated signaling in the microenvironment of tumors P008 and P016, and, to a lesser extent, P007 and P017, and express genes encoding ligands of cancerrelevant signaling pathways. Among these are GREM1, WNT2B, WNT5A, HGF, multiple EGFR ligands including AREG and many others that have been linked to CRC initiation and progression [25][26][27][28] . Finally, some cell types in the cancer microenvironment expressed additional ligands compared to their normal tissue counterparts, for instance, crypt-base-like fibroblasts in the tumor express FGF7 and IGF ( Supplementary Fig. 7B).
We took a detailed look at Wnt signaling, as this pathway drives stem cell maintenance and tumor initiation in the gut. A signature of Wnt/β-catenin target genes was most active in the TC1-5 and stem cell compartment of tumor epithelium (Fig. 3B), while activity was lower among the differentiated CRC cells, similar to the normal colon epithelium. Indeed, it has been shown that Wnt/β-catenin is dynamic in CRC and can be activated by Wnt ligands 29,30 , although the pathway is frequently activated by loss of APC. We detect the highest connectivity for Wnts and the R-Spondin family of Wnt amplifiers between CAFs, expressing WNT2 and WNT5A, and stem cells and the tumor-specific cell clusters ( Fig.   3A).

We next investigated EGFR-RAS-RAF-ERK signaling that plays a central role in CRC development and is
a key target of therapy. ERK-regulated genes 31 were most active in TC4 (Fig. 3B). We identified CAFs, endothelial cells, but also immune cells as potential sources of EGFR family ligands in the CRC microenvironment ( Fig 3A). Monocytes and macrophages that we found consistently enriched in tumor versus normal tissues, express the ligand-encoding genes AREG, EREG, and HBEGF (Fig 3C and   Supplementary Table 3). These ligands could play roles in the activation of the EGFR-RAS-RAF-ERK cascade in CRC, particularly in tumors lacking mutations in RAS, RAF or other activating components of the pathway. Macrophages and CAFs also express HGF, encoding the MET receptor tyrosine kinase ligand driving cancer progression at the invasive tumor front 13 . Indeed, we found macrophages to be . CC-BY-NC-ND 4.0 International license author/funder. It is made available under a The copyright holder for this preprint (which was not peer-reviewed) is the . https://doi.org/10.1101/2020.01.10.901579 doi: bioRxiv preprint 8 enriched in tumors specifically at the invasive front (Fig. 3C, D), suggesting that these could play roles in the induction of epithelial-mesenchymal transition via HGF-MET.
We additionally detected TGF-β target gene activity, along with expression of a gene signature of epithelial-to-mesenchymal transition in the tumor cell cluster TC4, and to a lesser degree in TC2 (Fig.   3B, E). Genes encoding TGF-β ligands were expressed in multiple stromal cell types, including CAFs, and cognate receptors were present in the TC clusters (Fig. 3E, Supplementary Fig. 7A, B). While the connectivity of the TC2 cluster was generally low, maybe also due to lower sequencing depth per cell, the cluster was defined by expression of VIM and S100A4 (see above, Supplementary Fig. 4 and Supplementary Table 1), supporting the association of TC2 cells with the epithelial-to-mesenchymal transition. Interestingly, the size of cluster TC2 was highly correlated with the size of the CAF population in the cancer microenvironment in the eight tumor profiles (Fig. 3F), suggesting a potential role for CAFs in the support of the tumor cells clustering in TC2.

Cell type composition informs Consensus Molecular Subtypes
Consensus molecular subtypes (CMS) represent a transcriptome-based classification system for CRC with clinical utility 32 . We applied the CMS classifier to our single-cell data and found strong inter-and intratumor heterogeneity. Epithelial cells were exclusively assigned to CMS1-3 (Fig. 4A). The continuous cluster comprising intestinal stem cells, TA cells, and enterocytes of the normal tissue, as well as the TC1-5 tumor cell subtypes consists mainly of intermingled CMS1 and CMS2 cells. In our limited set of CRCs, we find that epithelial cells of the two cancers probably arising via serrated precursors (P007) or inflammation-induced progression (P008) are scoring predominantly CMS1, while epithelial cells of the other cancers score mainly as CMS2 ( Supplementary Fig. 7). CMS3 appears to be confined to cells differentiating into the secretory lineage.
CMS subtypes were also unevenly distributed in the tumor microenvironment. Almost all immune cells were assigned CMS2, and only a minority scored as the "immune" subtype, CMS1 (Fig. 4B). Stromal cells were mostly CMS2, but a minority population of normal fibroblasts that we assigned previously as potential crypt base fibroblasts were CMS4 (Fig. 4C). Most interestingly, CAFs in the tumor tissue provided a strong CMS4 component that in our tumors was most prominent in P008 and P016 ( Supplementary Fig. 7). As CMS4 was assigned to fibroblasts exclusively, we suggest that this cell type also drives the "mesenchymal" CMS4 assignment from bulk CRC tissue. It is of note that CMS4 cancers have a worse clinical prognosis which may, therefore, also be linked to the presence of stromal tumorspecific fibroblasts. In summary, we found that different cell types of the tumor ecosystem are preferentially assigned to different CMS, and thus, that bulk tissue CMS assignments report combinations of cell state and cell type prevalence in the tumor. The copyright holder for this preprint (which was not peer-reviewed) is the . https://doi.org/10.1101/2020.01.10.901579 doi: bioRxiv preprint 9 Cell type composition of patient-derived CRC organoids and matched tumor samples We established organoid lines of the tumor samples P009 and P013, using standard culture conditions with media containing EGF, FGF, and p38-and TGF-β inhibitors ( Fig. 5A) 33,34 . P009 tumor tissue initially grew out unevenly, however, formed a uniformly spheroidal organoid culture within three passages.
P013 tumor tissue grew out swiftly and uniformly, forming complexly folded organoids. We used panel sequencing to confirm the identity of the organoids with the matched tumor tissue on a mutational level (Supplementary Table 6).
We used cell suspensions of the organoid lines for single-cell RNA sequencing to generate profiles before the first passage (designated as p0), essentially sequencing the primary tissue after one to two weeks of expansion in culture. We also sequenced transcriptomes of the established lines P009 and P013 after two and three passages, respectively. Single-cell profiles of organoids were of a higher quality than those from epithelial cells of the tumors, as judged by the uniformly low fraction of mitochondrial reads, despite having used similar conditions for disaggregation at 37°C. To exclude confounding effects of technical differences in the different single-cell profiles, we anchored the organoid profiles in the space of the matched epithelial tumor transcriptomes 35 .
In the resulting integrated data set, organoid cell transcriptomes of the two patients and the different passage numbers intermingled (Fig. 5B). Next, we re-clustered the transcriptomes and assigned cell identities per cluster by matching cell types with the previous annotation (Fig. 5C, D; for the previous annotation, see Fig. 2A, B). We found that profiles corresponding to TA cells, differentiated cells, and The copyright holder for this preprint (which was not peer-reviewed) is the . https://doi.org/10.1101/2020.01.10.901579 doi: bioRxiv preprint positive for the absorptive and secretory differentiation markers FABP1 and TFF3 (Supplementary Fig.   9). Markers specific for the TC cell clusters, that is, MMP7, S100A4, and VIM were found overlapping and adjacent to the stem cell zone defined by LGR5, PROM1 and CD44.
The organoid line P009 showed an enlarged zone inhabited by transcriptomes positive for the TC cell markers and was, therefore, suitable to infer a more granular picture of cell plasticity by taking into account RNA velocity, that is, direction of cell differentiation defined by ratios of immature unspliced versus mature spliced mRNAs (Fig. 5E). We could define a single trajectory origin in the vicinity of cells expressing the normal intestinal stem cell marker LGR5 and CD44, which is a CRC stem cell marker  The copyright holder for this preprint (which was not peer-reviewed) is the . https://doi.org/10.1101/2020.01.10.901579 doi: bioRxiv preprint

Discussion
Here, we use droplet-based single-cell sequencing of transcriptomes to characterize patient-derived matched CRC and normal colon tissue, as well as CRC organoids. We find that CRC, regardless of specific genetic mutations or clinical parameters, contains a large proportion of tumor-specific undifferentiated epithelial cells in addition to cells that resemble differentiated cell types of the normal colon epithelium. We assign the tumor-specific epithelial cells to five clusters, TC1-5, that form a continuum of cell plasticity. The TC cell types are distinguished by proliferative activity, oncogenic signaling, and gene expression patterns related to stemness, tissue invasion, and epithelial-tomesenchymal transition. We conclude that these cancer traits are unevenly distributed between tumor cell subpopulations. Furthermore, we identify stromal and immune cell types, including CAFs, macrophages, monocytes, and subsets of CD8+ T cells as cell types enriched in the tumor microenvironment that are sources of multiple growth factors initiating or amplifying Wnt-, TGF-β-, EGFR-and HGF-MET-signaling. This data suggest that paracrine signaling is a defining factor shaping the CRC ecosystem.
Our single-cell analysis illuminates a couple of clinically relevant features of CRC. Classification of CRC by bulk cancer transcriptome analysis can be achieved by the consensus molecular subtype system 32 .
CMS4, also termed as the "mesenchymal" subtype, is notable for its worse relapse-free and overall survival. We show here on a single-cell level that CMS4 transcriptomes stem specifically from fibroblasts, in particular, CAFs, while epithelial tumor cells can only assume CMS1 -CMS3. We conclude, therefore, that CMS4 is assigned to CRCs with a high content of CAFs that could possibly also contain a large fraction of the correlated TC2 tumor cell subtype. As we find CAFs to produce multiple pro-oncogenic growth factors including HGF and TGF-β ligands, it appears as a plausible strategy to target paracrine signaling as a future therapeutic option for the CMS4 CRC subtype. Furthermore, we could assign epithelial differentiation states to CMS subtypes. Our findings could be used to incorporate further informative and cell-type-specific genes into the CMS classifier, in order to assign CRCs with greater specificity, including those CRCs that currently cannot be assigned to a CMS subtype.
Anti-EGFR antibodies serve as first-line targeted therapy for patients with metastatic disease and no mutations in the EGFR-RAS-RAF-ERK signaling axis. Treatment success in this cohort has been linked to the production of the EGFR ligands AREG and EREG, and to immune infiltration in separate publications 28,36 . Connecting these phenomena, we have identified here AREG and EREG to be expressed by immune cells in the CRC ecosystems that we investigated here, in addition to epithelial cells. AREG and EREG were most strongly active in monocytes, and AREG was additionally expressed in other immune cell types. We hypothesize that AREG-/EREG-expressing immune cells contribute to . CC-BY-NC-ND 4.0 International license author/funder. It is made available under a The copyright holder for this preprint (which was not peer-reviewed) is the . https://doi.org/10.1101/2020.01.10.901579 doi: bioRxiv preprint the paracrine signaling loop activating ERK in KRAS-, NRAS-and BRAF-wildtype CRC cells, and possibly influence anti-EGFR antibody therapy outcome.
Our results imply that CRC cells display considerable cell plasticity and have multilineage differentiation capacity, in agreement with pioneering single-cell sequencing studies 37,38 . Stem cells have traditionally been seen as unique cells driving tissue homeostasis and regeneration of normal tissue, but also therapy-resistance in cancer. However, recent studies have shown that combinations of oncogenic mutations and paracrine signals can steer the cell-intrinsic signaling network so that differentiation trajectories are reversed and more differentiated cells can regain stem cell characteristics [39][40][41][42][43] . In CRC, stem cells can be maintained and induced by factors such as HGF 13 and IL17A 20 . We found Macrophages and CAFs to express HGF and EGFR ligands, extending previous observations 44,45 . We also observed an expansion of IL17A-expressing CD8+ T cells. As these cells also showed transcription of KLRB1, these cells might be mucosal-associated invariant T cells (also known as MAITs) 46 The copyright holder for this preprint (which was not peer-reviewed) is the . https://doi.org/10.1101/2020.01.10.901579 doi: bioRxiv preprint 13 growth conditions with high concentrations of EGF, but lacking a complex cellular microenvironment (see Fig. 5D). It is of note that improved experimental procedures are now available incorporating fibroblasts into organoid cultures to better mimic paracrine interactions present in vivo 48 . Single-cell analysis of such models could inform the dissection of cellular interdependences in CRC, and would also provide a more realistic scenario for preclinical drug tests.
Our workflow for dissociating clinical samples resulted in the acquisition of many types of major epithelial, immune and stromal cell transcriptomes. In addition, we capture subtle transcriptome differences within T cells, plasma cells, fibroblasts, and the tumor-specific TC cells that were assigned to multiple similar clusters. Analysis of single-cell transcriptomes is subject to multiple confounding factors, including numbers of genes detected, fractions of spliced versus unspliced mRNAs, and the fraction of mitochondrial reads. Indeed, particularly the epithelial cell transcriptomes that we captured for the present study varied on these quality parameters within, but also between the clusters.
Reassuringly, the distinction of phenotypic features between clusters was largely independent of the confounders. However, we note that the discrimination of five tumor-specific epithelial clusters is purely heuristic and, presently, we consider tumor-specific epithelial cells to have continuous phenotypic plasticity rather than inhabiting fixed states that can be clustered with confidence. Improvements to methods for tissue dissociation have recently been published 49 and should be incorporated into future clinical workflows to diminish tissue processing artifacts. However, with clinical samples, cell-type-specific degradation of transcriptome quality during the ischemic time window between the restriction of blood flow and completion of the operation probably cannot be avoided completely.
The extension of single-cell analyses of CRC to multi-omics, taking also in account genetic and epigenetic heterogeneity 50,51 , promises to identify cell plasticity and genetic diversity in cancer at a cellular resolution. We believe that such approaches will aid the future identification and eradication of CRC cell populations responsible for therapy resistance.

Organoid Culture
Cell filtrates from patient-derived tumor tissues that were retained in the 20µm filters after dissociation were washed in Advanced DMEM/F12 medium (Gibco), embedded in Matrigel, and cultured in 24-well plates, according to published procedures. Rho-kinase inhibitor Y27632 (10µM, Sigma) was used for the first passage to avoid anoikis. Cells originally embedded in Matrigel (Corning) were termed passage 0 (p0), and outgrowing organoids were passaged by removal from Matrigel, washing in PBS, and partial digestion using TrypLE cell dissociation solution (Gibco) at 37°C, washing in medium and re-embedding in Matrigel. For single-cell sequencing, organoids were dissociated completely using TrypLE and DNAseI, and filtering via a 20µm filter.

Single-cell RNA-seq data analysis
For each sample, UMIs were quantified using cellranger 3.0.2 with reference transcriptome GRCh38.
Spliced, unspliced and ambiguous UMIs were quantified with velocyto 52 (mode: run10x, default parameters). Quality control filters were set to only include cells with 500 to 5000 genes detected, 1000 to 50000 UMIs counted, fraction of mitochondrial reads ranging between 0 and 0.8, fraction of spliced reads ranging between 0.3 and 0.9, fraction of unspliced reads ranging between 0.1 and 0.7 and fraction of ambiguous reads ranging between 0 and 0.2. After filtering, UMI counts were variancestabilized using scTransform 53  The copyright holder for this preprint (which was not peer-reviewed) is the . https://doi.org/10.1101/2020.01.10.901579 doi: bioRxiv preprint collection of the Broad institute 54 , unless otherwise referenced in the main text, and were scored with Seurat v3. Epithelial subsets of tumor and matched organoid samples were integrated using Seurat v3 with tumor samples as reference 35 . Consensus molecular subtypes were scored using random forrest approach included in R package CMSclassifier.
Diffusion map analysis and RNA velocity were performed using scanpy 55 and scvelo 56 . Cells were first filtered by the number of genes (between 2000 and 5000) and the percent mitochondrial reads (between 0.075 and 0.2) and normalized, using scvelo standard settings. Cell cycle was scored according to the scanpy tutorial, and S_score, G2M_score, percent mitochondrial reads and UMI counts per cell were regressed out. The diffusion map was calculated on the top 10 principal components, and using a neighborhood graph with 50 neighbors and calculated on all genes. Moments were calculated on 30 principal components and 30 neighbors, and velocity was calculated using the stochastic model.
To compute ligand-receptor connectivity between cell type clusters, UMI counts were summed for all ligands of the same pathway in each stromal or immune cell type of normal or tumor samples. Summed ligand counts were scaled to range between zero and one for each pathway. The fraction of normal and tumor proliferative epithelial cells expressing a given receptor was calculated and fractions were averaged across receptors for each pathway and cell type. Averaged fractions of cells expressing receptors were likewise scaled to range between zero and one for each pathway. Connectivity between stroma or immune ligand expression and epithelial receptor expression was calculated as the product of scaled ligand counts and scaled receptor expression fractions and, accordingly, also ranged between zero and one.

Tumor cell calling
For tumor-specific single-nucleotide variant (SNV) calling on single-cell data, we employed exome sequence data of patients P007, P008 and P009. We used Mutect2 57 to detect SNVs, retaining only events classified as somatic. We additionally filtered these results by removing variants on noncanonical chromosomes or within repeat regions (UCSC genome browser RepeatMasker track). We then used cellSNP 58 to quantify the total number dij of (UMI-collapsed) reads covering variant i in cell j (reference and variant allele), and the number aij of reads supporting the alternative allele, using all variants detected with at least one read in any cell, and all cells containing at least 3 variant-covering reads. Adapting the EM-type approach proposed in McCarthy et al. 59 to our situation, we evaluate the likelihood pT,j that cell j is a tumor cell using a binomial model: . CC-BY-NC-ND 4.0 International license author/funder. It is made available under a The copyright holder for this preprint (which was not peer-reviewed) is the . https://doi.org/10.1101/2020.01.10.901579 doi: bioRxiv preprint Here, θT is the "success probability" for the somatic variants, measuring how likely it is to get a read supporting the variant allele. Similarly, we compute pN,j as the likelihood that cell j is normal, with a fixed parameter θN=0.01 allowing for sequencing errors and uncertainties in the variant calls. We calculate pT,j and pN,j in the E-step and estimate the parameter θT in the M-step as weighted sum over the counts dij and aij: E-and M-steps are iterated until convergence of the likelihood Finally, the criterion pT,j > pN,j is used to define likely tumor cells.

Ethics permission
All patients were aware of the planned research and agreed to the use of tissue. Research was approved by vote EA4/164/19 of the ethic´s commission of Charité Universitätsmedizin Berlin.

Competing Interests
The authors declare no competing interest.