High-Resolution Single-Cell Models of Ensemble Chromatin Structures during Drosophila Embryogenesis from Population Hi-C

We describe a computational method for reconstructing large 3D ensembles of single-cell chromatin conformations from population Hi-C. Our method identifies specific interactions constituting a small fraction (5–6% in Drosophila) of measured Hi-C frequencies. Yet, surprisingly, they are sufficient to drive chromatin folding, giving rise to many of the observed Hi-C topological features. Our models reveal dramatic changes in chromatin compaction across three developmental stages of Drosophila, with >50% of single cells likely maintaining TAD-like structures, possibly even in early embryos when no population TADs are discernible. Our models reveal details on how varying single-cell domain boundaries become fixed throughout development, with strongly preferred positioning at binding sites of insulator complexes during and after the midblastula transition. Overall, our method can be used to re-interpret population Hi-C by generating ensembles of single-cell chromatin conformations at high resolution, facilitating better understanding of genome organization.


Introduction
Overall, results presented in this study demonstrate that our method can be used to re-interpret population-based Hi-C measurement. Through transforma- [...] each cell type (see Methods).

We estimate the random contact probabilities of pairs of loci by counting the frequency of 3D conformations in which the spatial distance of the corresponding pair of beads is within a Euclidean threshold of 80 nm [27,28]. We evaluate the [...] there are $2.28 \times 10^6$ out of $42.39 \times 10^6$ (5.4%), $2.03 \times 10^6$ out of $40.07 \times 10^6$ (5.1%), and $2.20 \times 10^6$ out of $34.98 \times 10^6$ (6.3%) Hi-C interactions that are found to be specific for embryos at cycles 9-13, stages 5-8, and S2R+, respectively.

Another known long-range loop interaction in late embryos is between the genes Scyl and chrb [33,34]. This is also identified as a specific interaction (Supplementary Figure S2b).

[...] Lower-left triangles represent all Hi-C interactions; upper-right triangles represent the identified specific interactions. Cell types from top to bottom are embryos at cycles 9-13, embryos at stages 5-8, and S2R+, respectively. (b) A virtual 4C plot of the distribution of specific interactions in a 1.86 Mb region of embryos at cycles 9-13. The red bar represents the anchor, which contains the gene Bsg25A; the green bar represents a specific interaction that targets the gene slam. (c) Pie charts of percentages of different types of specific interactions in the three cell types. A: active, I: inactive, P: polycomb-repressed. I-I interactions increase from 46.5% to 60.9% and then to 66.1%, P-P increases from 1.8% to 2.8% and then to 4.9%, while A-I decreases from 29.8% to 19.7%, then to 12.8%, and A-P decreases from 3.8% to 2.3%, then to 1.4%. (d) Percentages of four interaction types of specific interactions (left) and all nonzero Hi-C interactions (right). These four types are A-I, A-P, I-I and P-P, respectively.
Clear biological pattern emerging from identified specific interactions

The predicted specific interactions also offer clearer biological patterns of genomic contacts, which would otherwise be obscure. This is illustrated by the nature of [...]

To assess the roles of specific interactions, we ask whether they can drive chromatin to fold into conformations as measured in Hi-C studies. To ensure our conclusion is general, we examine 10 genomic regions of varying lengths (200 kb-2 Mb, Supplementary Table S1).

We construct 3D ensembles of single-chain chromatin conformations at high [...] Figure S1).

For each region, we construct a proper ensemble of 50,000 single-chain conformations. To estimate contact probability, we make the minimalistic assumption that DNA fragments in close proximity are available for Hi-C ligation. The contact probabilities are then taken as the proportions of 3D conformations in which the spatial distance between the specific pair of loci of interest is within a Euclidean distance threshold (80 nm, following refs. [27,28]). We then aggregate single-chain
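The proportion-based estimate described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `contact_probabilities`, the array layout (conformations × beads × 3 coordinates, in nm), and the toy coordinates are all assumptions made for the example. Only the 80 nm threshold comes from the text.

```python
import numpy as np

def contact_probabilities(ensemble, threshold_nm=80.0):
    """Fraction of conformations in which each bead pair is in contact.

    ensemble: array of shape (n_conformations, n_beads, 3), coordinates in nm
              (hypothetical layout chosen for this sketch).
    threshold_nm: distance below which two beads count as in contact.
    """
    ensemble = np.asarray(ensemble, dtype=float)
    n_conf, n_beads, _ = ensemble.shape
    counts = np.zeros((n_beads, n_beads))
    for coords in ensemble:
        # Pairwise Euclidean distances within one conformation.
        diff = coords[:, None, :] - coords[None, :, :]
        dist = np.linalg.norm(diff, axis=-1)
        counts += dist < threshold_nm
    return counts / n_conf

# Toy ensemble: two conformations of a three-bead chain on a line.
toy = np.array([
    [[0.0, 0.0, 0.0], [50.0, 0.0, 0.0], [200.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [100.0, 0.0, 0.0], [150.0, 0.0, 0.0]],
])
print(contact_probabilities(toy))
```

Looping over conformations keeps memory use at one distance matrix at a time, which matters for ensembles of tens of thousands of chains.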

We then construct an ensemble of 3D chromatin conformations for the whole chromosome X of S2R+ cells at 5 kb resolution, using 4,485 beads and only 6.1% [...] Hi-C contact maps are very similar (r = 0.94 and r = 0.64).

Overall, these results show that using only 5-6% of measured Hi-C contacts [...]

Although ensembles generated using all Hi-C constraints and using only spe- [...] Figure S4d).

Overall, these results show our models enable detection of important 3D struc- [...] with studies of super-resolution imaging [19,20,21]. Surprisingly, we also detect a strong presence of TAD-like structures in many single cells of early embryos at [...]

We assume regions that are in close proximity are available for Hi-C ligation.

Contact probabilities are then calculated as the proportion of single-cell conformations satisfying the distance requirement, namely, that the distance between the pair of loci of interest is within a threshold, which is the longest distance for ligation.

Constructing physical null model of chromatin chains

We generate random chromatin polymer chains using a novel Monte Carlo approach (see Supplementary Methods) and construct an ensemble of $2 \times 10^5$ random polymer chains within a defined space for each type of cell (Figure 1a). They are used as our null model to estimate contact probabilities $p_{\mathrm{null}}$ of random collisions, which lead to non-specific Hi-C interactions. $p_{\mathrm{null}}$ is defined as [...] where $I^{(k)}(i, j)$ is an indicator function that equals 1 if the distance between $i$ and $j$ in the $k$-th chain is $< d_c$, with $d_c = 80$ nm [27,28], $N$ is the total number of polymer chains, and $w^{(k)}$ is the importance weight of the $k$-th chain for bias correction due to deviations of the sampling distribution from the target uniform distribution in our null model.

[...], with $P(H)$ being a constant. We generate chromatin polymers sequentially: [...] where $n$ is the length of each chain, $H_t$ the selected Hi-C propensities $p_{\mathrm{obs}}(x_1, x_t)$, [...] $\cdots, X_t^{(N)}$ at the step $t$, each chain consists of $(t - 1)$ beads that are previously placed and a newly generated bead $x$
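One common way to combine an indicator, per-chain importance weights, and $N$ chains into a contact probability is the self-normalized importance-sampling average, $\sum_k w^{(k)} I^{(k)}(i, j) / \sum_k w^{(k)}$. The Python sketch below assumes that form. It is an illustration consistent with the quantities defined above rather than a confirmed transcription of the authors' estimator, and the function name `weighted_contact_probability` and its argument layout are hypothetical.

```python
import numpy as np

def weighted_contact_probability(distances, weights, d_c=80.0):
    """Self-normalized weighted fraction of chains with loci i and j in contact.

    distances: length-N array, distance (nm) between loci i and j in each chain.
    weights:   length-N array of importance weights w^(k), one per chain.
    d_c:       contact distance threshold in nm (80 nm in the text).

    Assumed form: sum_k w_k * I_k / sum_k w_k (not confirmed by the source).
    """
    distances = np.asarray(distances, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if distances.shape != weights.shape:
        raise ValueError("distances and weights must have the same shape")
    indicator = distances < d_c  # I^(k)(i, j): True where the pair is in contact
    return float(np.sum(weights * indicator) / np.sum(weights))

# Toy example with four chains; the weights here are made up for illustration.
print(weighted_contact_probability([50.0, 100.0, 70.0, 200.0], [1.0, 1.0, 2.0, 4.0]))
```

With equal weights this reduces to the plain proportion of chains in which the pair lies within the threshold.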