Title : Chromatin regulatory dynamics of early development and regional specification in a directed differentiation model of the human small intestine

The appropriate development of the small intestine (SI) is critical for efficient nutrient absorption and barrier function after birth. Most of the molecular features and regional patterning of the SI are programmed very early in prenatal development. However, the chromatin regulatory dynamics that underpin early SI development in humans is largely unknown. To fill this knowledge void, we apply a cutting-edge genomic technology to a state-of-the-art model of human SI development. Specifically, we leverage chromatin runon sequencing (ChRO-seq) to define the landscape of active transcriptional regulatory elements across early stages of directed differentiation of human pluripotent stem cells (hPSCs) into SI organoids. Through comprehensive bioinformatic analysis of the data we provide the first-ever view of the changing chromatin regulatory landscape and define stage-specific key enhancer hotspots during human SI development. We also identify candidate transcription factors and their cistromes that are associated with the acquisition of SI identity and the initiation of regional patterning. This work offers a rich resource for studying transcriptional regulation of early human SI development.


INTRODUCTION
The embryonic development of the small intestine (SI) is critical for a fetus to thrive and grow. The genetic programming of SI cell identity and regional specification during early development is fundamental to ensure proper SI morphogenesis and maturation with specialized functions, including nutrient digestion and absorption, energy balance, and pathogen defense. Several seminal studies have identified important molecular regulators associated with gut development 1,2 , and more recent studies have leveraged advanced genomic technologies (e.g., single cell RNA-seq and ATAC-seq) to provide insights at a more granular level into gut development [3][4][5][6][7] ; however, this work relies almost exclusively on animal models. Studies of human SI development have been few, due in large part to limited access to primary human fetal tissues. Early time points of gut development in the human are essentially unexplored. In this study, we sought to fill this important knowledge gap by profiling for the first time the chromatin regulatory dynamics and transcriptional programs that are associated with early development of the human SI.
Robust temporal and spatial regulation of gene expression is fundamental to all biological processes including development 8,9 . Transcriptional programs are precisely controlled by promoters and distal cisregulatory regions known as enhancers. Enhancers harbor binding sites for transcription factors (TFs), activate long-range gene transcription, and are often cell-type specific 10, 11 . RNA polymerases are known to be recruited to active enhancers, generating divergent short transcripts (also known as enhancer RNA or eRNAs) 12,13 . Recently, an approach called chromatin run-on sequencing (ChRO-seq) 14 was developed for genome-wide identification of promoters, active enhancers, and actively transcribed gene bodies in a single assay. ChRO-seq represents the newest generation of nascent RNA sequencing technologies, and overcomes several limitations of previous versions including global run-on sequencing (GRO-seq) 15 and precision run-on sequencing (PRO-seq) 16 . ChRO-seq was very recently successfully applied to archived solid tumor tissues to define the distinct regulatory landscapes across different subtypes of glioblastoma multiforme 14 .
Advances in the directed differentiation of human pluripotent stem cells (hPSCs), including human embryonic stem cells (hESCs), provide a powerful strategy for studying the early development of human SI 17,18 . Human SI organoids generated from hPSCs recapitulate many aspects of in vivo SI development 19,20 , including embryonic patterning into different regions of the GI tract 21 , and ultimately exhibit molecular, structural, and functional features similar to those of the human fetal SI 22 . While hPSC differentiation into different regions of the SI has been described, the molecular mechanisms governing this pattering is unclear. To reveal temporal dynamics of the chromatin state during early human SI development and differentiation, we performed ChRO-seq in the early stages of the hPSC-derived SI model, including hESC, definitive endoderm (DE), duodenal spheroid (primitive SI with proximal regional identity), and ileal spheroid (primitive SI with distal regional identity). This study provides the first-ever view of the changing chromatin regulatory landscape and defines stage-specific key enhancer hotspots during human SI development. We also identify candidate key transcription factors and their cistromes that are associated with the acquisition of SI identity and the initiation of regional patterning. Moreover, we offer an unprecedented resource for the research community to develop and test targeted hypotheses about key regulatory hotspots and molecular drivers of early events of human SI development.

Generation of SI spheroids by directed differentiation of hPSCs
The directed differentiation of hPSCs (here we used H9 hESCs) into SI organoids was carried out as previously described 19,21 (Figure 1A). For genomic analysis, we included four early stages of the hPSCderived SI model: hESC, DE, duodenum (Duo) spheroid, and ileal (Ile) spheroid 21 . These region-patterned spheroids, which represent the primitive fetal SI, are comprised of mainly stem/progenitor cells 23 that can give rise to different mature lineages after prolonged organoid culture, or following transplantation into a murine host 24 . We performed ChRO-seq, together with deep RNA-seq, on these four stages in order to define the chromatin and gene expression dynamics in the key events of human SI development: DE formation (DE vs. hESC), SI identity acquisition (Duo spheroid vs. DE), and regional specification (Ile vs. Duo spheroid) ( Figure 1A). ChRO-seq enables the quantification of genome-wide nascent transcriptional activity at gene bodies, promoters, and enhancers, whereas RNA-seq enables only determination of steady state levels of gene expression ( Figure 1A).
ChRO-seq reveals temporal dynamics of nascent transcription at gene loci during early development of human SI To assess nascent transcription of genes, we first analyzed ChRO-seq signals within the body of annotated genes. Hierarchical clustering analysis and principal component analysis (PCA) of the gene transcriptional profiles is sufficient to cleanly stratify different stages of the hPSC-derived SI model ( Figure 1B and C). The ChRO-seq and RNA-seq signal for genes encoding transcription factors (TFs) that are well-established protein markers of specific stages indicate the appropriate specificity ( Figure 1D, Supplementary Figure  1A). Moreover, global comparison of the ChRO-seq and RNA-seq profiles across all stages ( Figure 1E) and within each stage (Supplementary Figure 1B) reveal a robust correlation, indicating that nascent transcription profiles globally reflect steady state expression profiles in this model system (albeit not perfectly, as expected due to other layers of gene regulation such as those at the post-transcriptional level).
Next we sought to identify from the ChRO-seq data differentially transcribed genes during DE formation, SI identity acquisition, and regional specification to the Ile region (DESeq2: padj < 0.2, p < 0.05, log2 fold change > 1, average TPM > 25) ( Figure 1F-L). During the transition from hESC to DE, 1886 genes (1328 up; 558 down) are altered ( Figure 1F) and the Gene Ontology (GO) term analysis reveals that the uptranscribed genes in DE are enriched in pathways that drive endoderm formation ( Figure 1G Identification of stage-specific markers at the levels of both transcription and steady state expression during early development of human SI First we demonstrated from the ChRO-seq data that markers of the esophagus, lung, stomach, liver, pancreas, and colon lineages are transcribed at low levels, as expected, in the Duo and Ile spheroids (Supplementary Figure 3A and 3B) compared to CDX2 (Supplementary Figure 1A), which is a known marker of SI. Next we determined that previously reported markers of human fetal duodenum and ileum 21 (Supplementary Figure 3C) are not sufficient to mark SI spheroids but rather only mature SI organoids after culture for 28 days (Supplementary Figure 3D). This likely reflects that fact that spheroids, which are newly differentiated, represent the early gut lineage and have not been given time in vitro to mature into SI organoids, which are typically cultured for ~1 month prior to analysis. Given the lack of known markers distinguishing Duo from Ile spheroids, we next sought to fill this knowledge void.
To identify genes that label a specific stage during early human SI development at the levels of both transcription and steady-state expression, we performed integrative analysis of the ChRO-seq and RNAseq data (see Methods). The results of hierarchical clustering analysis of those genes actively transcribed (TPM > 50) in at least one stage are shown in Figure 2A. We identified genes that are significantly elevated according to both transcription and expression in each stage relative to all other stages: hESC (n=239), DE (n=149), SI spheroid irrespective of regional identity (n=88), Duo spheroid (n=9), and Ile spheroid (n=40) ( Figure 2B; Supplementary Data 2). The stage-specific genes we identified include well-established markers such as POU5F1 and SOX2 for hESCs ( Figure 2C), NODAL, SOX17, GATA6, EOMES and LEFTY2 for DE ( Figure 2D), and CDX2 for SI spheroid irrespective of regional identity ( Figure 2E). We defined many novel stage-specific markers including HES4 and HOXB family members for SI spheroids irrespective of regional identity, FOXJ1 for Duo spheroids, and FJX1 for Ile spheroids. We also identified several long, non-coding RNA (lncRNA) and anti-sense transcript markers: LINC00678 for hESCs ( Figure  2C), LINC00543 for SI spheroids irrespective of regional identity ( Figure 2E), and EVX1-AS for Ile spheroids ( Figure 2G).
Active transcriptional regulatory element landscapes reveal chromatin re-wiring during early development of human SI Active transcriptional regulatory elements (TREs), including promoters and enhancers, are identified in ChRO-seq data by the hallmark feature of short bidirectional transcription ( Figure 3A). To identify active TREs across the entire genome in each stage, we employed dREG 25 , which was developed specifically for this purpose. Using this method, we identified a total of 125,863 active TREs across all four stages included in this study. The length distribution of these active TREs is consistent with what has been reported previously 26 ( Figure 3B). We found that, as expected, the vast majority of the active TREs are located in intergenic regions, introns, and annotated transcription start sites (TSS) ( Figure 3C). PCA showed that active TRE profiles are sufficient to cleanly stratify samples based on developmental stage and SI regional identity (Supplementary Figure 4A). We next categorized the identified TREs into 49,855 proximal and 76,008 distal TREs, which from here on in we refer to as promoters and enhancers, respectively ( Figure  3A). Then we carried out unsupervised hierarchical clustering analyses using profiles of promoters plus enhancers, promoters only, and enhancers only ( Figure 3D-F). We observed that enhancer profiles stratify different stages more accurately than promoters or all TREs ( Figure 3D-F, Supplementary Figure 4B-D), consistent with the notation that enhancer signature is the most cell-type specific 8,10 .
Identification of genes associated with stage-specific TREs during the process of SI identity acquisition and regional patterning Next we defined stage-specific TREs, determined the density of stage-specific TREs for every transcribed gene in each stage, and identified genes associated with the most stage-specific TREs ( Figure 4A; see Methods). We identified a total of 4166 Duo-specific and 540 Ile-specific active TREs ( Figure 4B and F). We confirmed that, as expected, genes associated with a greater number of Duo-specific TREs also exhibit greater increases in transcription (ChRO-seq signal) in Duo spheroids relative to DE ( Figure 4C). Similarly, genes associated with a greater number of Ile-specific TREs also exhibit greater increases in transcription (ChRO-seq signal) in Ile spheroids relative to Duo ( Figure 4G). Of the 3317 genes highly transcribed in Duo, 195 are associated with at least one Duo-specific TRE, are significantly up-transcribed (ChRO-seq) relative to DE, and significantly increased in steady-state expression (RNA-seq) relative to DE ( Figure 4D). As expected, CDX2 is one of the top genes ranked by the density of Duo-specific TREs (Figure 4 E). Notably, most of the other top ranked genes are HOXB family members ( Figure 4E). Of the 3418 genes highly transcribed in Ile, only 48 are associated with at least one Ile-specific TRE, are significantly up-transcribed (ChRO-seq) relative to Duo, and significantly increased in steady-state expression (RNA-seq) relative to Duo ( Figure 4H). The genes associated with the greatest number of Ilespecific TREs include members of the HOXA, C and D families, factors involved in canonical or noncanonical WNT signaling (JUND, FJX1 and CSRNP1), as well as HOTTIP ( Figure 4I). Several of these genes were identified as Ile spheroid-specific markers ( Figure 2G The same analyses as described above using the active enhancers only reveals the same temporal dynamics of the HOX cluster genes (Supplementary Fig. 6). Indeed, we detected high transcriptional activity (ChRO-seq) in Duo spheroids and high density of nearby Duo-specific TREs for HOXB genes ( Figure 5A and B). The increase in transcription of each HOXB family member in Duo relative to DE correlates with the density of Duo-specific TREs ( Figure 5C). Also, we observed dramatic increases in transcriptional activity (ChRO-seq) in Ile spheroids and high density of Ile-specific TREs for HOXA, C, and D family members ( Figure 5D and E). The increase in transcription of each HOXA/C/D family member in Ile relative to Duo tracks closely with the density of nearby Ile-specific TREs ( Figure 5F). Together, these observations indicate an initial activation of HOXB during SI identity acquisition, likely driven by the identified nearby Duo-specific TREs, followed by the activation of the other HOX clusters during ileal specification, likely driven by the identified nearby Ile-specific TREs.
Identification of stage-specific enhancer hotspots associated with SI identify acquisition and ileal regional patterning It has been shown in previous studies that dense clusters of highly active enhancers (which we refer to as 'hotspots') occur nearby to genes that are especially critical for defining cell identity and status [27][28][29] . We sought to define for the first time the changing landscape of enhancer hotspots in a model of human SI development. To accomplish this, we adapted a previously described methodology 27,28 , which requires several different histone modification ChIP-seq datasets, to work with ChRO-seq data instead (see Methods; Supplementary Data 3).
To identify enhancer hotspots and the candidate genes which they may regulate in SI identity acquisition ( Figure 6A), we first analyzed Duo-specific distal TREs to define 'stitched enhancers' specific to Duo spheroids relative to DE. Among the 2389 stitched enhancers, we identified 145 that exhibit strong enough transcriptional activity to be designated as Duo-specific 'enhancer hotspots' (Figure 6B). The distribution of the distance between every gene and its nearest Duo-specific enhancer hotspot is shown in Figure 6C. We assigned every stitched enhancer in Duo spheroids (non-hotspots and hotspots) to the nearest gene and found that the overall increase in transcription is significantly greater for the set of genes associated with Duo-specific hotspots compared to those associated with stitched enhancers that are not hotspots ( Figure  6D). This finding is consistent with the notion that hotspots exert particularly strong effects on transcription. Among the 110 genes linked to Duo-specific enhancer hotspots, 30 exhibit highly significant increases in both transcription (ChRO-seq) and steady state expression (RNA-seq) during the transition from DE to Duo spheroid ( Figure 6E and F). Many of these 30 (e.g., CDX2, FOXA1, HOXB family members, and LINC00543) were defined earlier as SI or Duo-specific markers ( Figure 2).
We next performed a similar analysis to identify enhancer hotspots and nearby genes associated with ileal regional specification. Among 310 stitched enhancers formed by Ile-specific enhancers (relative to Duo spheroids), we defined 29 Ile-specific enhancer hotspots ( Figure 6G). The distribution of the distance between every gene and its nearest Ile-specific enhancer hotspot is shown in Figure 6H. Similar to the observation made for SI fate acquisition, the genes nearest to Ile-specific enhancer hotspots exhibit a significantly greater increase in transcription compared to those linked to non-hotspot stitched enhancers ( Figure 6I). Among the 31 genes linked to Ile-specific enhancer hotspots, 12 exhibit highly significant increases in both transcription (ChRO-seq) and steady state expression (RNA-seq) in Ile relative to Duo spheroids ( Figure 6J). These genes include HAND2, HOXC/D family members, FJX1, DLX5, and PITX1 ( Figure 6K), many of which were also identified earlier as Ile-specific markers ( Figure 2). Similar analyses were performed for the hESC and DE stages and the results are summarized in Supplementary  Figure 7.

Identification of candidate TFs and cistromes associated with SI identity acquisition and ileal regional patterning
To discover putative key transcription factor (TF) drivers and their cistromes (the genome-wide set of TF targets) along the early developmental axis of the hPSC-derived SI model, we developed and executed the following bioinformatic pipeline ( Figure 7A, see Methods): (1) Motif enrichment analysis was performed by HOMER in stage-specific TREs (Supplementary Data 4), (2) The enriched binding motifs were filtered further based on the RNA-seq expression levels of the cognate TFs, and (3) The cistromes with different candidate TFs of interest were defined and compared across different TFs.
We first sought to identify candidate TF-motif modules that are associated with SI identity acquisition. Among the 4166 Duo-specific TREs, we detected significant enrichment for the binding motifs of CDX4, CDX2, and many forkhead box TFs including FOXA1 and FOXM1 ( Figure 7B; Supplementary Figure  9G). The motif enrichment analyses using only Duo-specific enhancers generated similar results (Supplementary Figure 8). We chose the top ranked motifs associated with CDX4, CDX2, FOXA1 and EP300 for cistrome analysis ( Figure 7C; Supplementary Figure 10), which led to several notable observations. First, FOXA1 and CDX2 are associated with Duo-specific TREs that harbor FOXA1 and CDX2 binding sites, strongly suggesting that the transcription of these two TFs may be activated and maintained in part through an auto-regulatory mechanism. Second, we found that although the genes associated with Duo-specific TREs that contain CDX2 or CDX4 binding sites are largely overlapping as expected, there are some genes uniquely associated with only one or the other, indicating that there may be some distinct functional roles in SI fate acquisition. Third, many of the genes we had identified as markers of SI spheroids (and even specifically Duo-spheroids), including HOXB genes, FOXJ1, WFDC2, and LINC00543, are indeed associated with one or more of the TFs that are associated with SI fate acquisition.
We next performed a similar analysis for ileal regional specification. Among the 540 Ile-specific TREs, we detected significant enrichment for several key TF families ( Figure 7B; Supplementary Figure 9G and 10). We observed an enrichment for motifs of HOXD11 and PITX1, which themselves are associated with Ile-specific enhancer hotspots (Figure 6), suggesting that these two factors are associated with Ilespecific chromatin re-wiring both upstream and downstream of their activity. Surprisingly, we also detected significant enrichment for motifs of CDX2 and CDX4 in Ile-specific TREs; suggesting that these two factors have stage-specific binding sites and therefore may have different cistromes for controlling SI fate acquisition and ileal regional specification. Finally, and perhaps most notably, we identified very strong enrichment in Ile-specific TREs of several AP-1 related factors (JUN, JUNB, JUND and FOSL2), which underscores a point made earlier in the GO analysis that the AP-1 transcriptional network may be critical for ileal regional specification ( Figure 1L). The motif enrichment analyses using only Ile-specific enhancers generated similar results (Supplementary Figure 8). We next chose the motifs of CDX2, AP-1 factors, HOXD11 and PITX1 for further cistrome analysis ( Figure 7E), which yielded several findings. First, many of the genes we had identified as markers of Ile spheroids, including HAND2, DXL5, FJX1, and HOXA/C/D family members are indeed associated with one or more of the TFs that we have identified as important in the Ile-specific cistrome. Second, the members of the HOXA, C, and D families are associated with overlapping but distinct TF regulators. For example, while HOXD9-11 are strongly associated with all four major TFs including AP-1, the HOXA genes are associated with all of the TFs except AP-1. Similar analyses were performed for the hESC and DE stages and the results are summarized in Supplementary  Figure 9 and 10.
Overall, based on the chromatin regulatory dynamics defined by the analyses in this study, we have provided a working model of transcriptional programming that may drive SI fate acquisition and ileal regional patterning (Figure 8).

DISCUSSION
By leveraging the recently developed ChRO-seq technology, we generated for the first time ever comprehensive chromatin regulatory landscapes across the beginning stages of directed differentiation from hPSC to SI organoids. Specifically, we defined: (1) marker genes that label specific stages along the developmental axis of the human SI, (2) the map of active regulatory elements (promoters and enhancers) as well as enhancer hotspots associated with SI identity acquisition and ileal regional patterning, and (3) candidate key TF drivers and their cistromes relevant to these critical developmental events.
The markers that we identified as labels of SI fate acquisition and Duo or Ile regional identity provide the first glimpse of genes that are involved in early SI development and patterning. Several markers (candidate regulators) of SI spheroids drew our attention. HES4 is identified for the first time in the context of gut development; one reason for this is that it is absent in the genome of the mouse 30 , the model organism most used previously to study gut development. FOXA1, which is involved in endoderm formation in mice, was identified as a marker of SI spheroids and among the genes strongly associated with active enhancers that emerge only during SI identity acquisition. In fact, a recent study linked dysregulation of intestinal FOXA1 with necrotizing enterocolitis in infants 31 , which further supports its functional relevance in human SI development and thereby warrants future investigation. Our finding of WFDC2 as a marker of SI spheroids is consistent with the enrichment of Wfdc2 in SI spheroids derived from the murine fetal progenitor population compared to SI organoids derived from murine adult stem cells 32 . Notably, some of the cells in this model are committed to the mesoderm lineage (Supplementary Figure 1A), consistent with previous observations 18,19 . Our marker analysis highlights genes that are traditionally thought to be mesenchymal (e.g., WNT5A, DLX5, HAND2) and have been shown to be critical for early gut development in mice or to be involved in regional patterning in other tissue types. Future experiments are required to confirm the cell type in which these candidate regulators are expressed and have functional roles in human SI development. Future experiments are also required to assess sufficiency and necessity of the stage-specific enhancers and enhancer hotspots identified in this study for controlling transcriptional networks relevant to human SI development and patterning.
Our ChRO-seq analysis highlights the spatiotemporal dynamics of HOX clusters during early development of human SI, unveiling the complexity of HOX biology in this context. We found that the HOXB cluster is associated with the acquisition of the SI fate whereas the other HOX clusters (HOXA, C and D) are associated with the initiation of ileal regional patterning. In fact, our study in the human SI model, together with a recent single cell RNA-seq study in E8.75 mouse gut 3 , does not suggest the typical spatial collinearity of HOX coding observed in various organ types 33,34 . Importantly, we were able to identify known and novel TF drivers of HOX genes in the context of human SI development (Figure 8). Our observation that CDX2 likely targets and regulates the HOXB cluster genes has been validated by a very recent CDX2 ChIP-seq study of SI-patterned endoderm monolayer derived from hPSCs 5 , demonstrating that ChRO-seq is a very sensitive and robust tool to identify key TFs and their target genes in any particular biological context. Also, our finding supports the finding that ectopic co-expression of CDX2 and HOXB genes in gastric intestinal metaplasia and Barrett esophagus [35][36][37] leads to the cells acquiring an intestine-like identity. Interestingly, we also found that CDX2 is predicted to regulate HOXA, C and D clusters only during ileal regional specification, which is a new discovery that merits further investigation.
We identified several candidate TF drivers, including PITX1 and AP-1 family members, associated with ileal specification. A recent scRNA-seq study of primary human fetal gut tissues showed that the regulatory network of JUN, JUNB and FOS are more enriched in an early stage of SI development compared to a late stage 4 ; however, their roles in early SI development remained unknown. Our findings suggest that AP-1 members likely contribute to early SI development by establishing ileal identity. Notably, multiple AP-1 members are expressed in the H4 cell line established from human fetal ileum 38 and JunD is found to be enriched in ileal compared to duodenal enterocytes of adult mice 39 , both of which indicate the relevance of AP-1 TFs to ileal properties. Interestingly, AP-1 factors are known to be involved in non-canonical WNT signaling; FJX1 (a marker of Ile identity in our study) functions through the Wnt/planar cell polarity (PCP) pathway to determine anterior-posterior axis in Drosophila 40,41 and is expressed in the epithelium of developing gut in mice 42 . Whether these genes/TF networks enriched in Ile spheroids function through non-canonical WNT pathways and how they interplay with canonical WNT signaling during ileal regional patterning warrant detailed future investigation.
The use of the state-of-art in vitro model of human fetal SI is a critical feature of this study given the lack of other existing platforms to study early organogenesis of human SI and the growing appreciation for cross-species differences in developmental processes in vertebrates at the molecular level [43][44][45][46] . Furthermore, the genome-wide characterization of the chromatin regulatory landscape by ChRO-seq in this model system generates valuable translational knowledge for a better understanding of initial transcriptional programming of human SI development. This analysis pipeline can also be applied to other hPSC-derived organoid systems to gain insights into dynamic transcriptional programming and chromatin status during early stages of human organogenesis. Most importantly, the identification of the candidate drivers in the present study may ultimately improve methods of generating therapeutic replacement SI as well as molecular therapies for children and possibly adult patients.

ACKNOWLEDGMENTS
We would like to thank members of the Sethupathy laboratory, most notably Dr. Michael Shanahan, Dr. Ajeet Singh, and Dr. Matt Kanke, for helpful comments and feedback on the study and manuscript. We

CONFLICTS OF INTEREST
The authors disclose no conflicts.

METHODS
Directed differentiation of hESCs. Differentiation of H9 hESCs and organoids was performed as previously published, with minor modifications 20,55 . Briefly, the endoderm was generated by treatment of activin A (100 ng/ml) for 3 consecutive days in Roswell Park Memorial Institute 1640 (RPMI-1640) media supplemented with 0% (v/v), 0.2% (v/v) and 2.0% (v/v) Hyclone defined fetal bovine serum (dFBS). The endoderm cultures then received daily treatments of FGF4 (500 ng/mL) and CHIR99021 (2 μM) for next 10 days. The intestinal spheroids representing fetal duodenum and ileum were collected on day 5 and day 10, respectively 21 . Subsets of spheroids collected at day 5 and day 10 were then cultured in Matrigel with the previously defined intestine growth media 21 for 28 days in order to mature into organoids. The resulting organoids were then prepared for cell sorting to purify EPCAM+ cell population (epithelial fraction) and the sorted cells were processed for RNA-seq library preparation. The cells generated at the stages of hESC, endoderm, duodenal spheroids and ileal spheroids were subject for ChRO-seq and RNA-seq library preparation.

Fluorescence activated cell sorting (FACS).
Methods for organoid dissociation into single cells followed by selection of the epithelial component with FACS, were based on previously described procedures 56 . All solutions, including overnight pretreatment of the organoid cultures, contained 10μM Y27632 (Tocris). Matrigel was digested for 30 minutes with cold 4mM EDTA-DPBS and organoids were washed 4X with cold DPBS. Structures were enzymatically dissociated into single cells using the Tumor Dissociation Kit (human) (Miltenyi Biotec) with a gentleMACS™ Octo Dissociator (with heaters; Miltenyi Biotec) for 50 minutes at 37°C. The cell suspension was then washed with 0.5% BSA-2mM EDTA-DPBS over a succession of cell strainers, 100μm, 40μm (Corning) and 20μm (CellTrics) and centrifuged for 5 minutes at 500xg. Cells were labeled with EpCAM phycoerythrin (PE)-conjugated antibody (BioLegend) and an EpCAM isotype-PE control (BioLegend), and were sorted in 0.1% BSA 2mM EDTA-DPBS on a MoFlo Astrios 1 (Beckman Coulter; Brea, California) instrument at the University of Michigan BRCF Flow Cytometry Core facility. Events were first selected with light-scatter and doublet discrimination gating, followed by exclusion of dead cells using 1μM DAPI dilactate (Molecular Probes). EpCAM-PE(+)/DAPI(-) cells were sorted into cold Advanced DMEM/F12 (Invitrogen). Collected cells were reanalyzed for a purity-check and showed greater than 89% viable and 98% EpCAM-PE(+) events. Cells were pelleted at 500xg for 5 minutes and flash-frozen for subsequent RNA isolation.
Chromatin isolation. The chromatin isolation for ChRO-seq library preparation was performed as previously described 14,57

ChRO-seq library and sequencing.
After chromatin isolation, the ChRO-seq library preparation closely followed the protocol described previously 58  The run-on reaction was incubated at 37 °C for 5 min. The reaction was stopped by adding Trizol LS (Life Technologies, 10296-010) and pelleted with GlycoBlue (Ambion, AM9515) to visualize the RNA pellet. The RNA pellet was resuspended in diethylpyrocarbonate (DEPC)-treated water and heat denatured at 65 °C for 40 s. In the present study, base hydrolysis of RNA was performed by incubating RNA with 0.2N NaOH on ice for 4 min. Nascent RNA was purified by binding streptavidin beads (NEB, S1421S) before and in between the following procedures: (1) 3′ adapter ligation with T4 RNA Ligase 1 (NEB, M0204L), (2) 5′ de-capping with RNA 5′ pyrophosphohydrolase (RppH, NEB, M0356S), (3) 5′ end phosphorylation using T4 polynucleotide kinase (NEB, M0201L), (4) 5′ adapter ligation with T4 RNA Ligase 1 (NEB, M0204L). The resulting RNA fragments were used for a reverse transcription reaction using Superscript III Reverse Transcriptase (Life Technologies, 18080-044) to generate cDNA. cDNA was then amplified using Q5 High-Fidelity DNA Polymerase (NEB, M0491L) to generate the ChRO-seq libraries. Libraries were sequenced (single-end 75x) using the NextSeq500 high-throughput sequencing system (Illumina) at the Cornell University Biotechnology Resource Center. Supplementary Data 1 provides the mapping statistics of the ChRO-seq experiments.
Total RNA isolation, mRNA-seq library and sequencing. Total RNA was isolated using the Total Purification kit (Norgen Biotek, Thorold, ON, Canada). High Capacity RNA to cDNA kit (Life Technologies, Grand Island, NY) was used for reverse transcription of RNA. Libraries were generated using the NEBNext Ultra II Directional Library Prep Kit (New England Biolabs, Ipswich, MA) and subjected to sequencing (single-end 92x) on the NextSeq500 platform (Illumina) at the Cornell University Biotechnology Resource Center. At least 80M reads per sample were acquired.

Mapping sequencing reads.
In the ChRO-seq studies, the publicly available pipeline 25 was used to align ChRO-seq reads. Since the libraries were prepared using adapters that contained a molecule-specific unique identifier (first 6 bp sequenced), the PCR duplicates were first removed using PRINSEQ lite. Adapters were trimmed from the 3′ end of remaining reads using cutadapt with a 10% error rate. Reads were mapped with the Burrows-Wheeler Aligner (BWA) to the human reference genome hg38 plus a single copy of the Pol I ribosomal RNA transcription unit (GenBank U13369.1). The location of active RNA polymerase was represented by a single base that denotes the 3′ end of the nascent RNA, which corresponds to the position on the 5′ end of each sequenced read. Mapped reads were converted to bigwig format using BedTools and the bedGraphToBigWig program in the Kent Source software package. ChRO-seq signal for visualization purpose was normalized by total bigwig signal (wigsum) of 10,000,000. The transcription (ChRO-seq) levels of genes were normalized by the length of gene bodies to transcripts per million (TPM).
In the RNA-seq studies, reads were mapped to human genome hg38 using STAR (v2.5.3a) 59 and transcript quantification was performed using Salmon (v0.6.0) 60 with GENCODE release 25 transcript annotations. The expression (RNA-seq) levels of genes were normalized using DESeq2 61 . All the samples except an Ile HIO sample had > 80% mapping rates. Although the Ile HIO sample had an unfavorable mapping rate, it was able to show elevated levels of Ile-associated regional markers compared to the Duo HIO sample (Supplementary Figure 2). Differential expression and pathway analyses. The differential analysis of gene bodies and TRE regions in ChRO-seq data was performed using DESeq2 package 61 . The differential analysis of RNA-seq was performed using DESeq2 package 61 . For all the analyses except stage-specific marker analysis (see below), the normalized levels of transcription or expression, foldchange and the statistic filtering were based on the