Essential amnion signals for primate primitive streak formation resolved by scRNA map

The signaling network governing the formation of the primitive streak is well understood in mice, but largely unexplored in primates. Advances in single-cell technology and in vitro embryo culture have made it possible to characterize the major cell populations involved. However, a detailed map of this process and insights into its regulatory networks are lacking. Herein, we generated a serial single-cell atlas of over 30,000 cells spanning peri-implantation to early primitive streak stages in non-human primates (NHP), describing the emergence of the primitive streak, extraembryonic mesenchyme and amnion. We discovered that ISL1, a gene with a well-established role in cardiogenesis, controls a gene regulatory network in primate amniotic cells. Strikingly, CRISPR/Cas9-targeting of ISL1 resulted in NHP embryos failing to form a primitive streak. BMP4 signaling was identified as a key pathway in this process. This was confirmed in human embryonic stem cell lines, suggesting a conserved function in humans. Notably, no viable ISL1 hypomorphic NHP mutant embryos could be recovered after embryo transfer, confirming the essential requirement for ISL1. These findings highlight the importance of the amnion as a signaling center during primate embryogenesis and demonstrate the potential of in vitro primate model systems to investigate the genetics of early human development.


Introduction
One of the key steps during mammalian early development is the formation of the three primary germ layers. This occurs in a complex process termed gastrulation, in which cells from the columnar epiblast undergo epithelial-to-mesenchymal transition and move anteriorly to form the primitive streak and the first mesodermal cells [1][2][3]. Since this step is crucial for further development of the embryo, disturbances in this process lead to embryonic lethality. Improper gastrulation is believed to occur frequently in human embryos and to account for a significant proportion of early miscarriages in the human population.
The tight regulatory network governing this process has been well studied during murine embryonic development [1,4], but remains largely unexplored in humans. Although mouse and primate embryos look similar before implantation, their transcriptomes already differ in key aspects [5,6]. After implantation, the differences become more evident.
Mouse embryos form a cup-like structure, while primate embryos acquire a disk-like shape and have a prominent amnion, a structure that is absent in mouse embryos before gastrulation [7,8]. Notably, some evidence from in vitro models of early human development suggests that amniotic cells could be involved in the signaling network that governs primitive streak formation [9]. Unravelling the pathways that form this regulatory network in primates is key for understanding primitive streak formation in humans and for identifying potential causes of congenital malformations and early pregnancy loss. Recently, two publications on cynomolgus embryogenesis [10,11] and one publication on human embryogenesis [12] have created a framework of post-implantation development in primates and characterized the major cell populations involved in gastrulation. However, their interplay and the transcriptional networks guiding this essential step remain unknown.
Here, we created a high-resolution map of peri-gastrulation development in NHP embryos, with a focus on the signaling events leading to the formation of the primitive streak. We identified an ISL1-dependent gene regulatory network that is specifically active in the amnion. Disturbance of this network in NHP embryos by CRISPR/Cas9-mediated editing of ISL1 led to significant downregulation of BMP4 signaling from the amnion and subsequent failure to form the primitive streak. We confirmed these findings in a microfluidic-based model of amnion-epiblast interactions using ISL1-null human embryonic stem cells, suggesting that these findings also apply to humans. Taken together, this study shows for the first time that signals from the amnion are indispensable for primitive streak formation in primate embryos.

High-resolution transcriptomic map
We created a high-resolution transcriptomic map by single-cell RNA (scRNA) sequencing of 11 in vitro cultured cynomolgus macaque embryos at three different time points (Day 10, Day 12 and Day 14) (Fig. 1A). A total of 7,194 cells passing quality control (Fig. S1A-B) were embedded in low-dimensional space separately for each day (Fig. 1B and Fig. S2A-B). In line with previous results [10,11], the cells grouped into four main cell types, namely trophoblast, endoderm, epiblast with its derivatives, and extraembryonic mesenchyme (Fig. 1B-C). Integration of our dataset with a published scRNA sequencing dataset of in vivo cynomolgus embryos [13] (Fig. 1D-E and Fig. S2C) revealed a striking difference in the transcriptomic profile between early (Day 10 + E08/E09) and late (Day 12/Day 14 + E13/E14) epiblast, reflecting the transition from a naïve to a primed state, which has previously been suggested to occur during this time window [14]. Indeed, a published gene signature of naïve cells derived from scRNA sequencing analysis of in vitro cultured human embryonic stem cells (hESCs) [15] was highly enriched in the early epiblast (Fig. 1F), while genes belonging to the primed signature were enriched in the late epiblast (Fig. 1F). Ordering cells from the early and late epiblast along pseudotime revealed a set of differentially regulated genes that formed two distinct clusters based on their expression dynamics (Fig. 1G). Genes previously associated with a naïve state [14][15][16], including DNMT3L, KHDC1L, NLRP7, OOEP and DPP4, were significantly downregulated over pseudotime (Fig. 1H). Notably, several of these genes belong to the subcortical maternal complex, which is crucial for proper early development [17]. In contrast, genes associated with a primed state [16,18,19], including CD24, CRABP2, SFRP1, USP44 and VCAN, showed strong upregulation (Fig. 1H).
This expression pattern was observed in cells from our dataset as well as in cells from the in vivo dataset, suggesting that the naïve to primed transition happening in vivo can faithfully be recapitulated in in vitro cultured embryos.

Gene regulatory network of early epiblast
Taking advantage of our high-resolution scRNA map, we next investigated the sequence in which epiblast-derived cell populations appear. Strikingly, we identified cells expressing the amnion marker genes ISL1 and TFAP2A [10], albeit in very low numbers, as early as Day 10 (Fig. 2A and Fig. S2D). Embryos at this stage also contained a large cell population expressing genes typical of a mesodermal signature, such as MIXL1 and MESP1 (Fig. S2E), which were previously annotated as early gastrulating cells [10]. However, in addition to their mesodermal signature, these cells showed high expression of the transcription factor ETS1 and the cell adhesion protein PODXL (Fig. S2F), which mark extraembryonic mesoderm in mice [20], while lacking expression of the receptor tyrosine kinase EPHA4 and the transcription factor ZIC3 (Fig. S2G), which are expressed by murine embryonic mesoderm [20]. Thus, we provide strong evidence that the epiblast-derived mesodermal-like cells present in Day 10 embryos are extraembryonic mesoderm, which appears to precede primitive streak formation.
Extraembryonic mesenchyme, a cell population unique to primates that contributes to a number of extraembryonic tissues [7,13,21,22], first appeared in Day 12 embryos (Fig. 2A). The close proximity of extraembryonic mesoderm and mesenchyme in the UMAP plot (Fig. 2A), as well as the expression pattern of PODXL and ETS1 (Fig. S2H), suggest that extraembryonic mesoderm contributes to extraembryonic mesenchyme, as proposed previously [23].
However, the large increase in cell number in the extraembryonic mesenchyme over a few days suggests that this cell population might receive additional contributions from other parts of the embryo, such as the trophoblast. The formation of the primitive streak, marked by expression of the transcription factors MESP1 and GSC (anterior) [24,25], as well as HOXD9 and CDX2 (posterior) [26] (Fig. 2B and Fig. S2I-J), was first observed at Day 14 (Fig. 2A). Late amniotic cells had a distinct expression profile with high expression of the transcription factor ISL1, the Wnt ligand WNT6 and the GABA receptor subunit GABRP (Fig. 2B, Fig. S2K). Transcripts of these markers were specifically enriched in amniotic cells across the whole dataset, including the trophoblast, whereas other markers such as HEY1 were also expressed in subpopulations of the trophoblast (Fig. S2L). Cells classified as early amniotic cells expressed amniotic markers in addition to marker genes of the posterior primitive streak, such as CDX1 [26] (Fig. 2B, Fig. S2M).
Gene regulatory network (GRN) analysis at Day 14 using SCENIC (Single-cell regulatory network inference and clustering) [27] revealed sets of highly specific GRNs (Fig. 2C-D). The histone demethylase KDM5B, creating bivalent histone marks during development [28], controlled a GRN active in epiblast and all its derivatives, while the pluripotency factor SOX2 [19] controlled a network specifically active in the epiblast (Fig. 2D). One of the most active GRNs in amniotic cells was an ISL1-dependent network (Fig. 2D), suggesting that ISL1 is not only a specific marker of the amniotic cell population in primates, but also likely to play a functional role in amniotic cells. This finding is in sharp contrast to the mouse, where Isl1 is first expressed in cardiac progenitor cells of the lateral plate mesoderm, but absent from the early embryo before E7.0 [29,30].

ISL1 hypomorphs fail to form primitive streak
The functional role of ISL1 in primate amniotic cells was investigated by generating ISL1 Genotyping of the mutant embryos revealed a mosaic pattern with indels of different sizes in the targeted region of the ISL1 locus and no alterations in selected off-target sites from the in silico prediction [31] (Fig. S3B-C). Interestingly, the frequency of mutations resulting in an internally shortened but functional ISL1 was significantly higher in the transcriptome than in the genomic DNA (Fig. S3C). This could be due to nonsense-mediated decay of mutated ISL1 transcripts, or could suggest that cells with a complete loss of ISL1 function are unable to adopt an amniotic fate. Overall, this resulted in a 50% reduction of functional ISL1 transcripts in hypomorphic mutant embryos.
To gain mechanistic insights, in vitro cultured ISL1 hypomorphic mutant embryos were subjected to scRNA-sequencing at the same time points as for the wildtype (Day 10, Day 12,  [33] that is essential for murine mesoderm formation [34] as well as for inducing primitive streak-like cells from hESCs [32] (Fig. 3F). Taken together, this suggests that BMP4 is secreted from the amnion in an ISL1-dependent manner to induce primitive streak formation in the early NHP embryo.

Loss of ISL1 in amnion impairs BMP4 signaling
To validate this conclusion and to investigate whether these findings also apply to humans, ISL1-null hESCs (ISL1-null) harboring the most abundant long deletion in the ISL1 locus found in the mutated NHP embryos (Fig. S3C and S5A) were generated and analyzed in vitro [35] (Fig. 4A). Amniotic ectoderm-like cells (AMLCs) derived from the ISL1-null hESCs showed a 50% reduction in ISL1 mRNA (Fig. 4B) and an absence of ISL1 protein (Fig. 4C), which was abundantly expressed in AMLCs derived from wildtype hESCs (Fig. 4C). In concordance with the findings from the in vitro cultured NHP embryos, AMLCs derived from the ISL1-null hESCs showed higher mRNA and protein levels of the amnion markers TFAP2A and GATA3 (Fig. 4D-E). In addition, we noticed a slight reduction in WNT6 and a significant reduction in BMP4 expression, suggesting a functional deficit in ISL1-null derived AMLCs (Fig. 4F). Indeed, AMLCs derived from the ISL1-null hESCs failed to induce mesoderm-like cells (MeLCs) from hESCs, as shown by a lack of Brachyury (BRA) signal (Fig. 4G-H). Notably, BRA expression was similar between ISL1-null and wildtype hESCs using a directed differentiation protocol towards anterior and posterior primitive streak cells [32] (Fig. S5B). This highlights that the failure of mesoderm induction in the ISL1-null cells is a non-cell-autonomous defect caused by altered signaling from AMLCs.
The capacity of ISL1-null and wildtype hESCs to self-organize into an embryonic-like sac was assessed in a microfluidic system that has been shown to faithfully recapitulate the peri-implantation development of the epiblast lineages [9]. With a reduced BMP4 dose compared with the protocol used in the original publication, wildtype cells still formed embryonic-like sacs properly, broke symmetry and formed MeLCs in the epiblast-like region, as shown by strong BRA signals (Fig. 4I and Fig. S6). ISL1-null cells were capable of self-organizing into embryonic-like sacs similar to the wildtype but failed to develop further. The epiblast-like cells remained columnar, with high levels of the pluripotency factor NANOG and absence of BRA [19] (Fig. 4I), indicating a failure to form MeLCs similar to that observed in the mutant NHP embryos during in vitro culture.

ISL1 mutant NHP embryos are not viable
Finally, we investigated whether the alterations observed during embryonic development of the ISL1 hypomorphic mutant NHP embryos lead to early embryonic lethality in vivo. The pregnancy rate per NHP surrogate mother after transfer of ISL1 mutant embryos was 0%, compared with 58.3% for wildtype embryos (Table 1). Transfer of embryos targeted by injection of only a single gRNA, which led to a slightly lower mutation rate, resulted in a pregnancy rate of 28.6% (Table 1). Strikingly, genotyping of all 4 fetuses from this experiment showed an unmodified ISL1 locus on both alleles, highlighting the importance of ISL1 for proper early embryonic development.

Discussion
In this study, we generated a high-resolution developmental roadmap of post-implantation NHP embryos and identified the amnion as a key signaling structure essential for initiating gastrulation in primates. NHP embryos hypomorphic for the amnion marker ISL1 failed to form primitive streak due to a lack of BMP4-signaling and did not give rise to viable offspring, demonstrating a novel, primate-specific role of ISL1 in early embryogenesis (Fig. 5).
It is known that the initiation of primitive streak formation is largely dependent on BMP4-8 signaling, which is provided by the extraembryonic ectoderm in mice [1,36]. The findings from our study suggest that this role is taken over by the amnion during primate embryogenesis.
Notably, the role of ISL1 acting upstream of BMP4-signaling seems to be a conserved pathway.
The complete loss of Isl1 in mice leads to embryonic lethality at approximately E10.5 with severe cardiac defects accompanied by a strong reduction in Bmp4 [29], suggesting that Bmp4 is acting downstream of Isl1 in cardiac development. Similar observations were made in mice during genital development [37] and embryonic limb formation [38].
Isl1 has a well-established role in mammalian cardiac development and is expressed in multipotent cardiovascular progenitor cells in mice [39][40][41] and humans [42,43]. Despite its established role in heart development in mice, loss-of-function variants in the ISL1 locus are drastically underrepresented in large human cohorts of congenital heart malformations such as the Pediatric Cardiac Genomics Consortium (PCGC) cohort [28,44]. Specifically, among the 23,000 alleles reported in the PCGC cohort, 112 ISL1 variants have been identified, none of which were damaging de novo mutations. We hypothesize that the exceptionally low frequency of damaging ISL1 variants in this cohort reflects the essential role of ISL1 in initiating primitive streak formation. Importantly, we did not observe a single pregnancy with an ISL1 hypomorphic mutant primate embryo after transferring almost 50 blastocysts into over 15 surrogate NHP mothers. This highlights the importance of two intact ISL1 alleles for proper embryonic development and suggests that mutations in ISL1 could be a cause of early pregnancy loss in humans.
This study shows that the amnion is a crucial signaling center during early primate embryogenesis and demonstrates that in vitro cultured primate embryos are a powerful tool to model key steps of early human development. Further advances in in vitro culture systems might enable us to support the embryo longer and to study early organogenesis, including the emergence of cardiac progenitor cells in lateral plate mesoderm. This would enrich our knowledge of human embryogenesis, help to identify causes of pregnancy loss and congenital malformations and, eventually, open the avenue for new therapies.

Cynomolgus macaque
Healthy cynomolgus monkeys (Macaca fascicularis), ranging from 5 to 12 years old, were used in this study. All animals were housed either at the facility of Yunnan Key

Culture of NHP embryos
NHP embryos were collected as described previously [45]. In brief, healthy female monkeys aged 5-8 years with regular menstrual cycles were selected as oocyte donors. Zygotes were cultured in hamster embryo culture medium-9 (HECM-9) containing 10% fetal calf serum (Hyclone Laboratories) at 37°C in 5% CO2 until the blastocyst stage. Blastocysts were then used for embryo transfer or post-implantation in vitro culture.
To culture blastocysts beyond the implantation stage, we applied an optimized protocol based on the human embryo culture protocol from Zernicka-Goetz's group [46]. Frozen NHP blastocysts were thawed immediately before culture using Thawing Media (Kitazato) and cultured in blastocyst culture medium for at least 4 hours to recover. Blastocysts were then treated with Acidic Tyrode's solution (Sigma) to remove the zona pellucida and transferred to an ibiTreat 8-well μ-plate (Ibidi) containing 300 µL of pre-equilibrated in vitro culture medium 1 (IVC1). On the second day, 150 µL of IVC1 was carefully removed and 200 µL of pre-equilibrated IVC2 was added. Blastocyst growth was monitored, and the medium was changed every two days until the end of the experiment.

Culture of NHP embryonic stem cells
The in-house derivation of NHP embryonic stem cells (ESCs) was performed according to an established human embryonic stem cell protocol [47]. Briefly, mitomycin C-treated mouse embryonic fibroblast (MEF) feeder cells were seeded at a density of 4 × 10^4 cells per cm². The medium was changed to ESC medium before use. After removal of the zona pellucida, ICSI-generated blastocysts were seeded on top of the prepared feeder cells and cultured undisturbed for three days in a CO2 incubator at 37°C. ESC medium was then refreshed daily until primary ESC colonies formed. Colonies were cut into small pieces and passaged onto fresh MEF feeder cells in ESC medium. Subsequently, ESC colonies were passaged with Collagenase type IV (STEMCELL Technologies) and seeded on fresh MEF feeder cells.

Generation of ISL1-null hESCs was performed on HES-3 cells by applying CRISPR/Cas9 with the same guide RNAs used in NHP blastocysts. Two ISL1 knockout cell lines were generated, named ISL1_ko_c15 and ISL1_ko_c51, and genotyped (Fig. S3C). All cell lines were authenticated as karyotypically normal by Cell Guidance Systems (United Kingdom) (Fig. S5A). Cells were tested regularly for mycoplasma contamination (EZ-PCR Mycoplasma Detection Kit, Biological Industries) and were negative. hESCs were maintained in a standard feeder-free culture system using mTeSR1 medium (STEMCELL Technologies) on 1% Matrigel (Corning) or Essential 8 medium (Thermo Fisher Scientific) on 1% Vitronectin (Thermo Fisher Scientific). Cells were passaged every 4-5 days and visually examined at each passage to ensure the absence of spontaneous differentiation. Work with human embryonic stem cells was carried out according to Swedish legislation following the recommendations of the Swedish National Council on Medical Ethics.

NHP embryo transfer and pregnancy diagnosis
High-quality embryos were transferred into the oviducts of matched recipient monkeys as described previously [48]. A total of 27 female monkeys with appropriate β-estradiol and progesterone levels were used as surrogate recipients. Each recipient received 2-4 blastocysts. Pregnancy was initially diagnosed by ultrasonography 2-3 weeks after embryo transfer. Clinical pregnancy and the number of fetuses were confirmed by fetal cardiac activity and the presence of gestational sacs. Pregnancies were terminated by caesarean section, and tissue from umbilical cord, ear and tail was collected for genotyping.

Generation of ISL1 hypomorphic mutant NHP embryos
NHP zygotes were injected with a mix of Cas9 protein and guide RNAs. Intracytoplasmic injections were performed with a Nikon microinjection system under standard conditions. The embryos were cultured in HECM-9 supplemented with 10% fetal calf serum (Hyclone Laboratories) at 37°C in 5% CO2. High-quality genetically modified embryos from the morula to blastocyst stage were used for further studies.

Reads mapping, gene expression counting and correction
Sequencing data were aligned and quantified using the Cell Ranger pipeline v3.1.0 (10x Genomics) against the Ensembl genome build Macaca_fascicularis_5.0 release 96 [49].
Ambient RNA contamination was estimated from the level of choriogonadotropin expression in epiblast (POU5F1-positive) cells, in which these trophoblast hormone genes should not be expressed, and removed from the count matrix using SoupX [50]. A gene was retained for analysis if it was expressed in at least 3 cells. Cells in each sample were filtered based on the fraction of mitochondrial gene expression (below 7.5%) and the number of expressed genes. Details on the estimated contamination in each sample, the filtering criteria and the number of cells retained for analysis are provided in Fig. S1A.
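The correction and filtering steps above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not SoupX itself: it estimates one global contamination fraction by Poisson maximum likelihood from marker genes that should be silent in the chosen cells (here, choriogonadotropin genes in POU5F1-positive epiblast cells), then performs a naive expected-count subtraction, whereas SoupX's own correction redistributes counts more carefully. The function names, the soup-profile input and the `min_genes`/`max_genes` values are hypothetical placeholders; the per-sample thresholds actually used are those listed in Fig. S1A.

```python
import numpy as np


def estimate_contamination(counts, soup_profile, cell_mask, marker_idx):
    """Global ambient-RNA fraction rho, estimated by Poisson maximum likelihood.

    Model: counts[c, g] ~ Poisson(rho * n_umi[c] * soup_profile[g]) for marker
    genes g that the selected cells should not express, so the MLE is
    (observed marker UMIs) / (expected marker UMIs at rho = 1).
    """
    sub = counts[cell_mask]
    n_umi = sub.sum(axis=1)
    observed = sub[:, marker_idx].sum()
    expected = n_umi.sum() * soup_profile[marker_idx].sum()
    return observed / expected


def subtract_ambient(counts, soup_profile, rho):
    """Naive correction: remove rho * n_umi * soup from every cell, floored at 0."""
    n_umi = counts.sum(axis=1, keepdims=True)
    expected = rho * n_umi * soup_profile[None, :]
    return np.clip(counts - expected, 0, None)


def qc_masks(counts, is_mito, min_cells_per_gene=3, max_mito_frac=0.075,
             min_genes=200, max_genes=6000):
    """Boolean masks of retained cells and genes (min/max_genes are placeholders)."""
    detected = counts > 0
    keep_genes = detected.sum(axis=0) >= min_cells_per_gene
    n_genes = detected.sum(axis=1)
    total = counts.sum(axis=1)
    mito_frac = counts[:, is_mito].sum(axis=1) / np.maximum(total, 1)
    keep_cells = ((mito_frac < max_mito_frac)
                  & (n_genes >= min_genes) & (n_genes <= max_genes))
    return keep_cells, keep_genes
```

Because the text reports contamination per sample, `estimate_contamination` would be applied to each sample separately.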

Reads mapping and gene expression counting of in vivo dataset
The raw, archived single-cell RNA sequencing data from in vivo cynomolgus embryos [13] were downloaded from the GEO database (GSE74767) and processed using TrimGalore v0.6.1. Reads passing quality control were aligned against the Ensembl genome build Macaca_fascicularis_5.0 release 96 using STAR v2.5.3 and counted using featureCounts v1.5.2.
Cells expressing at least 1000 genes were kept for integration with our dataset. A nearest-neighbor graph was constructed on the UMAP embedding using the FindNeighbors() function, followed by cluster identification using the FindClusters() function, both part of the Seurat package. In some samples, this clustering approach split large, homogeneous cell groups into small sub-clusters with no distinct biological meaning. In these cases, the clusters were manually recombined, and both the unsupervised and the manually adjusted clusterings are reported in the manuscript.
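The graph step can be illustrated with a shared-nearest-neighbor (SNN) construction of the kind FindNeighbors() builds: each cell's k nearest neighbors (the cell itself included) are found, and every pair of cells is weighted by the Jaccard overlap of their neighborhoods, with weak edges pruned. This is a NumPy sketch, not Seurat's code; k = 20 and a pruning cutoff of 1/15 reflect our understanding of Seurat's defaults and are assumptions here. FindClusters() would then run modularity-based community detection on such a graph, which is not shown.

```python
import numpy as np


def knn_indices(embedding, k):
    """Indices of each cell's k nearest neighbors, with the cell itself first."""
    d = np.linalg.norm(embedding[:, None, :] - embedding[None, :, :], axis=-1)
    np.fill_diagonal(d, -1.0)  # force each cell to rank as its own first neighbor
    return np.argsort(d, axis=1, kind="stable")[:, :k]


def snn_graph(embedding, k=20, prune=1 / 15):
    """Jaccard-weighted shared-nearest-neighbor graph; edges below `prune` removed."""
    n = embedding.shape[0]
    k = min(k, n)
    nn = knn_indices(embedding, k)
    member = np.zeros((n, n))
    member[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    shared = member @ member.T            # size of N(i) intersect N(j)
    jaccard = shared / (2 * k - shared)   # divided by size of N(i) union N(j)
    jaccard[jaccard < prune] = 0.0
    return jaccard
```

Two well-separated groups of cells end up with zero edge weight between them, which is what lets the subsequent community detection split them into clusters.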

Data integration, dimensionality reduction and clustering
To integrate the single-cell RNA sequencing data from in vivo cynomolgus embryos [13] with our dataset, we combined the three time points (Day 10, Day 12 and Day 14) of our dataset for each batch separately and did the same for the in vivo data from embryonic day 8 (E08), E09, E13 and E14, resulting in three separate datasets. Normalization, scaling and PCA were performed separately on each of these datasets, after which they were combined using the reciprocal PCA approach described above, based on 30 dimensions and 2000 anchor features.
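The core idea of reciprocal PCA anchoring can be sketched as follows: each dataset is centered separately, projected into the other dataset's principal components, and pairs of cells that are mutual nearest neighbors across the two projections become anchors. This NumPy sketch is illustrative only; Seurat's FindIntegrationAnchors() additionally scales the data, restricts it to the selected anchor features (2000 or 5000 here), and filters and scores anchors before IntegrateData() computes correction vectors. The function names are hypothetical.

```python
import numpy as np


def pca_loadings(x, n_comp):
    """Top principal axes (n_comp x genes) of an already centered cells x genes matrix."""
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt[:n_comp]


def knn(query, ref, k):
    """For each query row, indices of its k nearest rows in ref."""
    d = np.linalg.norm(query[:, None, :] - ref[None, :, :], axis=-1)
    return np.argsort(d, axis=1)[:, :k]


def rpca_anchors(a, b, n_comp=30, k=5):
    """Mutual nearest-neighbor pairs (i in a, j in b) found in each other's PCA space."""
    ac = a - a.mean(axis=0)  # each batch is centered on its own mean
    bc = b - b.mean(axis=0)
    va, vb = pca_loadings(ac, n_comp), pca_loadings(bc, n_comp)
    # neighbors of b cells among a cells, in a's PCA space
    nn_b_to_a = knn(bc @ va.T, ac @ va.T, k)
    # neighbors of a cells among b cells, in b's PCA space
    nn_a_to_b = knn(ac @ vb.T, bc @ vb.T, k)
    anchors = []
    for i in range(a.shape[0]):
        for j in nn_a_to_b[i]:
            if i in nn_b_to_a[j]:
                anchors.append((i, int(j)))
    return anchors
```

A pure batch offset (the same cells shifted by a constant) is removed by the per-dataset centering, so every cell anchors to its own counterpart.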
For the analysis of the wildtype and ISL1 hypomorphic mutant embryos the different batches were integrated separately for each day by the same reciprocal PCA approach outlined above based on 30 dimensions and 5000 anchor features using the wildtype datasets as reference.
Dimensionality reduction and clustering was performed as described above.

Differential gene expression analysis
Mainly because of differences in cell numbers, sequencing depth varied considerably between samples in our dataset (Fig. S1). It has recently been shown that the effect of differences in read depth on differential gene expression analysis can be minimized by regularized negative binomial regression as implemented in the R package sctransform [53]. Thus, all differential gene expression analyses were performed using a t-test on Pearson residuals obtained by applying SCTransform to the raw, filtered counts of the integrated Seurat object, as described previously [53]. Gene expression data depicted throughout the manuscript in feature plots or violin plots are SCTransform-normalized data. Expression data depicted in heatmaps are scaled, log-transformed expression values normalized to the total counts of each cell, calculated by running the NormalizeData() function followed by the ScaleData() function from the Seurat package.
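To illustrate what a t-test on Pearson residuals involves, the sketch below uses a simplified, offset-only negative binomial model (analytic Pearson residuals with a fixed overdispersion theta) rather than sctransform's regularized per-gene regression, so its residuals will differ from SCTransform output. The expected count of each gene scales with the cell's total UMIs, which is what dampens the effect of read depth. The theta value and the clipping at plus or minus sqrt(number of cells) are assumptions of this sketch; the t-statistic is Welch's form, which is the default of R's t.test().

```python
import numpy as np


def pearson_residuals(counts, theta=100.0):
    """Analytic NB Pearson residuals under an offset-only model.

    mu[c, g] = n_c * p_g, where n_c is the cell's total UMIs and p_g the gene's
    share of all UMIs; residual = (x - mu) / sqrt(mu + mu^2 / theta), clipped
    to +/- sqrt(n_cells).
    """
    counts = np.asarray(counts, dtype=float)
    n_c = counts.sum(axis=1, keepdims=True)
    p_g = counts.sum(axis=0, keepdims=True) / counts.sum()
    mu = n_c * p_g
    with np.errstate(divide="ignore", invalid="ignore"):
        r = (counts - mu) / np.sqrt(mu + mu ** 2 / theta)
    r = np.nan_to_num(r)  # entries with mu == 0 become 0
    clip = np.sqrt(counts.shape[0])
    return np.clip(r, -clip, clip)


def welch_t(x, y):
    """Welch t-statistic per gene (column) comparing two groups of cells (rows)."""
    mx, my = x.mean(axis=0), y.mean(axis=0)
    vx, vy = x.var(axis=0, ddof=1), y.var(axis=0, ddof=1)
    return (mx - my) / np.sqrt(vx / len(x) + vy / len(y))
```

For example, with residuals `r` of a cells x genes matrix and a boolean cluster label `in_cluster`, `welch_t(r[in_cluster], r[~in_cluster])` gives one statistic per gene; p-values would come from the t distribution.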

Visualization of gene signatures
Scoring and visualization of gene signatures were performed using the Single Cell Signature Explorer v3.1 [54]. Gene signatures were created by identifying the Macaca fascicularis orthologues of genes previously described to mark the naïve and primed states of pluripotency in human embryonic stem cells [15], using the corresponding orthologue list provided by Ensembl through BioMart [55].
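A minimal sketch of the two steps: mapping the human signature genes to their macaque orthologues, then scoring each cell. We understand Single Cell Signature Explorer to score a cell as the share of its UMIs that fall in the signature genes; the sketch follows that reading but is not the tool's code. The orthologue dictionary stands in for a BioMart export, and its contents are hypothetical.

```python
import numpy as np


def map_orthologues(human_genes, orthologues):
    """Translate human symbols via a human -> macaque dict; unmapped genes are dropped."""
    return [orthologues[g] for g in human_genes if g in orthologues]


def signature_score(counts, gene_names, signature):
    """Per-cell fraction of UMIs assigned to the signature genes (counts: cells x genes)."""
    wanted = set(signature)
    idx = [i for i, g in enumerate(gene_names) if g in wanted]
    total = counts.sum(axis=1)
    return counts[:, idx].sum(axis=1) / np.maximum(total, 1)
```

Scoring the naïve and primed signatures separately gives two values per cell that can be compared between early and late epiblast.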

Pseudotime analysis
Pseudotime analysis was performed using Monocle3 v0.2 [56][57][58]. The principal graph was learned on the UMAP embedding extracted from the integrated Seurat object. Genes whose expression varied along the trajectory were identified from the raw, filtered count matrix extracted from the integrated Seurat object using the Moran's I test implemented in the graph_test() function of the Monocle3 package. Genes were ranked by their Moran's I, and the top 100 genes were selected for display in the heatmap.
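Moran's I measures whether a gene's expression is similar among neighboring cells on a graph: with z_i = x_i - mean(x) and cell-cell weights w_ij summing to W, I = (N / W) * sum_ij w_ij z_i z_j / sum_i z_i^2. Values near +1 indicate expression that changes smoothly along the trajectory, values near 0 no spatial structure. The NumPy sketch below computes I over a symmetric k-nearest-neighbor graph and ranks genes by it; graph_test() also derives p-values and builds its neighbor graph in its own way, so this is illustrative only, and the helper names are hypothetical.

```python
import numpy as np


def knn_weights(embedding, k):
    """Symmetric 0/1 adjacency linking each cell to its k nearest other cells."""
    d = np.linalg.norm(embedding[:, None, :] - embedding[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude self-edges
    nn = np.argsort(d, axis=1, kind="stable")[:, :k]
    w = np.zeros_like(d)
    w[np.repeat(np.arange(len(d)), k), nn.ravel()] = 1.0
    return np.maximum(w, w.T)


def morans_i(x, w):
    """Moran's I of values x over weight matrix w."""
    z = x - x.mean()
    denom = z @ z
    if denom == 0:
        return 0.0  # constant gene: no autocorrelation to measure
    return (len(x) / w.sum()) * (z @ w @ z) / denom


def rank_genes(expr, w, gene_names, top=100):
    """Genes (columns of expr) sorted by decreasing Moran's I."""
    scores = np.array([morans_i(expr[:, j], w) for j in range(expr.shape[1])])
    order = np.argsort(-scores)[:top]
    return [(gene_names[j], float(scores[j])) for j in order]
```

A gene that rises steadily along the trajectory scores high, whereas a gene that alternates between neighboring cells scores negative and drops to the bottom of the ranking.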

Gene regulatory network analysis
Gene regulatory network (GRN) analysis was performed using the R-package SCENIC (Single The raw, filtered count matrix extracted from the integrated Seurat object was pre-filtered, and genes with at least 39 counts in total (equal to 3 UMI counts in 1% of the cells) that were detected in at least 13 cells (equal to 1% of the cells) were used as input for the CLI. The human motif collection v9 and the cisTarget databases for hg38, downloaded from https://resources.aertslab.org/cistarget/, were used in the pipeline. Thresholds for binarization were derived from the AUC values using Hartigan's Dip Test (HDT). After binarization, regulons active in at least 1% of the cells were included in the downstream analysis.
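The pre-filtering thresholds scale with the number of cells, following (as we recall) the pattern used in the SCENIC tutorials: 3 UMI counts times 1% of cells for total counts, and 1% of cells for detection. The sketch below reproduces that arithmetic; for an illustrative matrix of 1,300 cells it yields exactly the 39-count and 13-cell cutoffs quoted above. The second helper applies the 1% activity cutoff to binarized regulon activity. Function names are hypothetical.

```python
import numpy as np


def scenic_gene_filter(counts, frac=0.01, umi_per_cell=3):
    """Keep genes with >= umi_per_cell * (frac * n_cells) total counts that are
    detected in >= frac * n_cells cells (counts: cells x genes)."""
    min_cells = round(frac * counts.shape[0])
    min_total = umi_per_cell * min_cells
    keep = (counts.sum(axis=0) >= min_total) & ((counts > 0).sum(axis=0) >= min_cells)
    return keep, min_total, min_cells


def active_regulons(binary_activity, frac=0.01):
    """Regulons 'on' in at least `frac` of cells (binary_activity: cells x regulons)."""
    return binary_activity.mean(axis=0) >= frac
```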

Genotyping
Genomic DNA was extracted by the phenol-chloroform method. DNA fragments covering both guide RNA target sites were PCR amplified and ligated into a TOPO TA cloning vector (Thermo Fisher Scientific). At least 50 bacterial clones per sample were picked for Sanger sequencing and used to estimate the genomic mutation rate. The transcriptomic mutation rate of ISL1 hypomorphic mutants was also calculated: cDNA libraries of each scRNA-sequencing sample were used to amplify the ISL1 mRNA fragment covering both guide RNA target sites, PCR products were ligated into a TOPO TA cloning vector (Thermo Fisher Scientific), and at least 50 clones per cDNA library were picked for Sanger sequencing.
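Mutation rates from the clone sequencing can be tallied as below. As an assumption of this sketch, each clone is summarized by the net length change of its indel relative to wildtype; indels whose length is a multiple of three are counted as in-frame (potentially yielding an internally shortened protein) and all others as frameshifts. This ignores substitutions and in-frame changes that create a stop codon, so it approximates rather than reproduces the classification behind Fig. S3C. Running it separately on genomic and cDNA clones gives the genomic and transcriptomic mutation rates.

```python
from collections import Counter


def classify_clone(indel_len):
    """Net indel length vs wildtype: 0 = wildtype, multiple of 3 = in-frame."""
    if indel_len == 0:
        return "wildtype"
    return "in-frame" if indel_len % 3 == 0 else "frameshift"


def mutation_summary(indel_lengths):
    """Fractions of sequenced clones in each class (one entry per clone)."""
    tally = Counter(classify_clone(x) for x in indel_lengths)
    n = len(indel_lengths)
    return {
        "n_clones": n,
        "mutation_rate": (n - tally["wildtype"]) / n,
        "in_frame_fraction": tally["in-frame"] / n,
        "frameshift_fraction": tally["frameshift"] / n,
    }
```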

Off-target assay
Cas-OFFinder [31] was used to search for potential off-target sites with up to two mismatches and two bulges. Among all off-target candidates of both gRNAs, those located in exons were selected for testing. The DNA fragments spanning these sites were PCR amplified and their sequences were verified by Sanger sequencing.
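For intuition, a mismatch-only version of this search fits in a few lines: slide the protospacer along both strands and report sites with at most two mismatches that are followed by an NGG PAM (assuming SpCas9). Unlike Cas-OFFinder, this sketch does not model DNA or RNA bulges and is far too slow for a whole genome; it is illustrative only, and the function names are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_offtargets(genome, protospacer, max_mismatches=2):
    """Sites matching protospacer with <= max_mismatches, followed by an NGG PAM.

    Returns (strand, forward-strand start of site+PAM, mismatches) tuples.
    """
    n = len(protospacer)
    hits = []
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for i in range(len(seq) - n - 3 + 1):
            pam = seq[i + n:i + n + 3]
            if pam[1:] != "GG":
                continue
            site = seq[i:i + n]
            mm = sum(a != b for a, b in zip(site, protospacer))
            if mm <= max_mismatches:
                start = i if strand == "+" else len(seq) - (i + n + 3)
                hits.append((strand, start, mm))
    return hits
```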

RNA extraction and quantitative real-time PCR
Total RNA was extracted using Direct-zol RNA Miniprep kits (Zymo), and cDNA was synthesized using GoScript Reverse Transcriptase (Promega). Quantitative real-time PCR was performed using PowerUp SYBR Green Master Mix (Thermo Fisher Scientific) on an ABI 7500 Fast system.
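The section does not state how relative expression was calculated. One common choice, shown here as an assumption rather than the authors' documented method, is the 2^-ΔΔCt method, which normalizes the target gene to a reference gene and then to a control sample (for example, ISL1 in ISL1-null versus wildtype AMLCs) and assumes near-100% amplification efficiency. The Ct values in the usage line are made up.

```python
from statistics import mean


def fold_change_ddct(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """2^-ddCt fold change of a target gene in a sample relative to a control.

    Each argument is a list of technical-replicate Ct values.
    """
    d_ct_sample = mean(ct_target) - mean(ct_ref)
    d_ct_control = mean(ct_target_ctrl) - mean(ct_ref_ctrl)
    return 2.0 ** -(d_ct_sample - d_ct_control)


# Hypothetical Ct values: the target needs one extra cycle in the sample
print(fold_change_ddct([25.0, 25.0], [18.0, 18.0], [24.0, 24.0], [18.0, 18.0]))  # prints 0.5
```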

Transwell assay
The transwell assay was performed based on previous work by Zheng and colleagues [9]. In brief, it was performed on Transwell 12-well plates with permeable polyester membrane inserts (0.4 μm, Corning). The membrane inserts were coated with 1% Geltrex diluted in DMEM/F12 for 1 hour before use. hESCs were collected, re-suspended in culture medium containing Y-27632 (10 μM, Tocris) and seeded onto the membrane insert at a density of 3 × 10^4 cells per cm². Eighteen hours after seeding, the culture medium was changed to E6 medium supplemented with bFGF (20 ng/mL, R&D Systems) and BMP4 (50 ng/mL, PeproTech), and cells were cultured for 48 hours. Undifferentiated hESCs were then collected, re-suspended in E6 supplemented with bFGF (20 ng/mL) and seeded at a density of 9 × 10^4 cells per well on freshly coated 12-well plates. The membrane inserts were washed with E6 + bFGF and transferred on top of the re-seeded hESCs.
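Converting the seeding densities into cell numbers is simple arithmetic. The insert growth area used below (about 1.12 cm² for a 12-well Transwell insert) and the concentration of the cell suspension are assumptions to be adjusted to the actual consumables; the 9 × 10^4 cells for the receiving well are a fixed number per well in the protocol.

```python
def transwell_seeding(n_inserts, density_per_cm2=3e4, insert_area_cm2=1.12,
                      suspension_cells_per_ml=1e6):
    """Cells and suspension volume needed for the membrane inserts.

    insert_area_cm2 and suspension_cells_per_ml are assumptions, not protocol values.
    """
    cells_per_insert = density_per_cm2 * insert_area_cm2
    volume_ul_per_insert = cells_per_insert / suspension_cells_per_ml * 1000.0
    return {
        "cells_per_insert": cells_per_insert,
        "volume_ul_per_insert": volume_ul_per_insert,
        "total_cells": cells_per_insert * n_inserts,
    }


def receiver_cells(n_wells, cells_per_well=9e4):
    """Undifferentiated hESCs needed for the wells placed under the inserts."""
    return cells_per_well * n_wells
```

For six inserts this gives about 33,600 cells (33.6 µL of a 10^6 cells/mL suspension) per insert, plus 540,000 cells for the six receiving wells.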
Cells were collected after 48 hours for analysis. Two wildtype hESC-lines (HES-3 and H9) and two ISL1-null lines were used in this assay. Both of the wildtype cell lines showed comparable results, as did the two ISL1-null lines.

Primitive streak induction from hESCs
Differentiation of hESCs into primitive streak-like cells was performed in chemically defined medium (CDM) as previously described [32]. In brief, posterior primitive streak was induced by the addition of bFGF (20 ng/mL, R&D Systems), the phosphoinositide 3-kinase (PI3K) inhibitor LY294002 (10 µM, Tocris) and BMP4 (10 ng/mL, PeproTech). Anterior primitive streak was induced with the same factors plus Activin A (50 ng/mL, PeproTech). Cells were harvested after 40 h. RNA extraction, reverse transcription and quantitative real-time PCR were performed as described above, with 200 ng RNA as input for the RT reaction. All experiments were performed in at least three biological replicates.
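The two conditions differ only by Activin A, which makes them easy to express as data and to convert into pipetting volumes with C1V1 = C2V2. The stock concentrations below are hypothetical and should be replaced by the actual stocks; each stock is given in the same unit as its final concentration (ng/mL for proteins, µM for LY294002). The small volume contributed by the additions themselves is ignored.

```python
# Final concentrations from the protocol (ng/mL, except LY294002 in µM).
FINAL = {
    "posterior": {"bFGF": 20.0, "LY294002": 10.0, "BMP4": 10.0},
    "anterior": {"bFGF": 20.0, "LY294002": 10.0, "BMP4": 10.0, "Activin A": 50.0},
}

# Hypothetical stock concentrations, same units as FINAL.
STOCK = {"bFGF": 20000.0, "LY294002": 10000.0, "BMP4": 10000.0, "Activin A": 100000.0}


def additions_ul(condition, medium_ml):
    """Volume (µL) of each stock to add to medium_ml of CDM, from C1*V1 = C2*V2."""
    return {
        factor: final * medium_ml * 1000.0 / STOCK[factor]
        for factor, final in FINAL[condition].items()
    }
```

With these stocks, 10 mL of anterior medium needs 10 µL each of bFGF, LY294002 and BMP4 and 5 µL of Activin A.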

Microfluidic assay
This assay was performed as previously described [35]. Briefly, the microfluidic device was fabricated by bonding a PDMS structure layer to a coverslip. Geltrex was diluted to 70% in E6 medium and loaded into the central gel channel, which is separated from the side channels by trapezoid-shaped supporting posts. Upon gelation, the Geltrex matrix forms concave pockets between the supporting posts for cell seeding. hESCs suspended in mTeSR1 medium were introduced into the cell loading channel and allowed to settle and cluster in the gel pockets. After hESC cluster formation, mTeSR1 medium was replaced by basal medium (E6 with 20 ng/mL bFGF), and 20 ng/mL BMP4 was supplied only to the cell seeding channel. After 18 hours of BMP4 stimulation, the BMP4 medium was replaced by basal medium. The microfluidic devices were fixed 48 hours after the hESC clusters were first exposed to BMP4.

Immunohistochemistry
Immunohistochemistry of cells from the transwell assay was performed following standard procedures. Briefly, cells were fixed in 2% paraformaldehyde for 30 minutes at room temperature and washed with PBS. Cells were blocked in blocking buffer (serum diluted in PBS with 0.1% Triton X-100) for one hour and then incubated with primary antibodies diluted in blocking buffer overnight at 4°C. Cells were washed with PBS supplemented with 0.1% Tween-20 (PBS-T) and incubated with secondary antibodies diluted in blocking buffer for 2 hours at room temperature. Secondary antibodies were then washed off with PBS-T, and the samples were mounted for imaging. Staining of embryonic-like sac structures was performed as previously described [35]. Confocal micrographs were acquired with a Zeiss LSM 700 confocal microscope or an Olympus spinning-disc confocal microscope (DSUIX18) equipped with an EMCCD camera (iXon X3, Andor). Bright-field images of embryonic-like sacs were acquired with a Zeiss Observer.Z1 microscope equipped with a monochrome CCD camera (AxioCam, Carl Zeiss MicroImaging). Images were analyzed with Imaris (Bitplane).

Statistical analysis
Values are shown as mean ± SEM. Continuous data were analyzed using Student's t-test. P-values or adjusted p-values (where appropriate) below 0.05 were considered statistically significant. Details on the samples (e.g., number of biological replicates) are indicated in the figure legends. Graphs were generated using Prism or R.
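The summary statistic and test named above are straightforward to compute; this sketch uses only the Python standard library and returns the Student's t statistic with its degrees of freedom (pooled variance, equal-variance assumption). Converting t to a p-value requires the t distribution (for example in Prism, R or SciPy), which is not reimplemented here; the function names are hypothetical.

```python
import math
import statistics


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    m = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return m, sem


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance, and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2
```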

Data and code availability
The raw data, unfiltered count matrix and processed count matrix are deposited in the Gene Expression Omnibus (GEO) database with the accession number GSE148683 and will be publicly released upon publication. All code is available from the authors upon request.