High-resolution mapping of transcription factor binding sites on native chromatin

A method to map transcription factor binding across the genome, at high resolution and without cross-linking. Sequence-specific DNA-binding proteins including transcription factors (TFs) are key determinants of gene regulation and chromatin architecture. TF profiling is commonly carried out by formaldehyde cross-linking and sonication followed by chromatin immunoprecipitation (X-ChIP). We describe a method to profile TF binding at high resolution without cross-linking. We begin with micrococcal nuclease–digested non-cross-linked chromatin and then perform affinity purification of TFs and paired-end sequencing. The resulting occupied regions of genomes from affinity-purified naturally isolated chromatin (ORGANIC) profiles of Saccharomyces cerevisiae Abf1 and Reb1 provide high-resolution maps that are accurate, as defined by the presence of known TF consensus motifs in identified binding sites, that are not biased toward accessible chromatin and that do not require input normalization. We profiled Drosophila melanogaster GAGA factor and Pipsqueak to test ORGANIC performance on larger genomes. Our results suggest that ORGANIC profiling is a widely applicable high-resolution method for sensitive and specific profiling of direct protein-DNA interactions.

Sequence-specific DNA-binding proteins including TFs reside atop the eukaryotic regulatory hierarchy and functionally interpret signals encoded in the genome to control transcription, modulate chromatin structure and ultimately shape cellular identity. As a result, comprehensive mapping of genomic loci engaged by regulatory factors is of great interest. ChIP is the most widely used method for profiling genomic targets of DNA-binding proteins. In most ChIP protocols, protein-DNA interactions are fixed by formaldehyde treatment before sonication of chromatin and immunoprecipitation of the resulting fragments (X-ChIP) 1 . After cross-link reversal, immunoprecipitated DNA can be analyzed by microarray hybridization (X-ChIP-chip) or high-throughput sequencing (X-ChIP-seq) 2,3 .
Although X-ChIP methods have played a central role in interrogating protein binding genome wide, they have limitations stemming from cross-linking and sonication 4,5 , and recent work has uncovered systematic biases in these methodologies [6][7][8][9][10] . Formaldehyde cross-linking can cause epitope masking and complicate subsequent immunoprecipitation. Although formaldehyde cross-linking identifies direct protein-DNA interactions 11 , the preferential formation of protein-protein cross-links by formaldehyde 12 may lead to the identification of DNA binding events that represent indirect or transient protein-DNA interactions, particularly in highly transcribed regions 5,10,13 . The limited resolution of X-ChIP-seq was recently addressed by the development of ChIP-exo, which uses exonuclease digestion of cross-linked, sonicated chromatin to achieve single-base resolution 14 .
ChIP of native chromatin (N-ChIP) is not associated with epitope masking or protein-protein cross-linking and can be used with small amounts of input chromatin 5,15 . N-ChIP has been applied to histones 5 and nonhistone proteins, including RNA polymerase II, TFs and chromatin remodelers [16][17][18][19] . We therefore sought to determine whether N-ChIP could produce high-resolution maps of sequence-specific protein binding sites. We previously demonstrated that micrococcal nuclease (MNase) digestion of native chromatin followed by paired-end sequencing (MNase-seq) can footprint both nucleosomal and subnucleosomal particles protecting as little as ~25 bp with single-nucleotide resolution 20 . We recently used this method in conjunction with N-ChIP to yield ORGANIC profiles of chromatin remodeler binding 19 . However, application of this approach to profiling sequence-specific DNA-binding proteins had not been investigated.
Here we applied ORGANIC profiling to generate genome-wide high-resolution maps of TF binding, which can be combined with fragment length-versus-midpoint plots (V-plots) 20 to produce single-base-resolution profiles. We applied the approach to the structurally distinct S. cerevisiae TFs ARS binding factor 1 (Abf1) and rDNA enhancer binding protein 1 (Reb1) as well as D. melanogaster GAGA-binding factor (GAF) and Pipsqueak (Psq). We identified more Abf1 and Reb1 binding sites than have been previously published, and we show high accuracy in the detection of consensus motifs within TF binding sites.
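The V-plot idea can be illustrated with a short sketch. Each paired-end fragment contributes one point, placed at the offset of its midpoint from a site center (x) and at its length (y). This is a minimal Python illustration, not the published V-plot code. It assumes fragments are half-open (start, end) coordinates on a single chromosome and that site centers are already known. The function name and the ±100-bp window are hypothetical.

```python
from collections import Counter


def v_plot_counts(fragments, site_centers, half_window=100):
    """Tally (midpoint offset, fragment length) pairs around site centers.

    fragments: iterable of (start, end) pairs, end exclusive (assumed convention).
    site_centers: coordinates on the same chromosome as the fragments.
    half_window: hypothetical maximum |offset| to keep, in bp.
    """
    centers = list(site_centers)
    counts = Counter()
    for start, end in fragments:
        length = end - start
        midpoint = (start + end) // 2  # integer midpoint (rounded down)
        for center in centers:
            offset = midpoint - center
            if abs(offset) <= half_window:
                counts[(offset, length)] += 1
    return counts


# Usage: two fragments near a site at position 125, one far away.
example = v_plot_counts([(100, 150), (90, 170), (1000, 1050)], [125])
```

In a plot of these counts, short fragments protected by a bound factor would be expected to cluster near offset 0. The nested loop is O(fragments × sites). For genome-scale data, sorting the sites and using a binary search would be the natural refinement.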

Results
ORGANIC profiles of Reb1 and Abf1 binding sites
We performed MNase digestion of non-cross-linked intact nuclei from S. cerevisiae strains expressing Reb1-Flag and Abf1-Flag; solubilized chromatin by needle extraction; and immunoprecipitated tagged TFs with buffer containing 80, 150 or 600 mM NaCl (Fig. 1a and Supplementary Fig. 1). Under these conditions, direct protein-DNA interactions are generally stable (see Discussion). We then prepared TF-bound and input DNAs for paired-end sequencing using a modified library preparation protocol 20 (Fig. 1a, Supplementary Fig. 1 and Online Methods). Consistent with the small footprints of TFs, small fragments were enriched in Reb1 ChIP relative to input (Supplementary Fig. 2a), and we therefore profiled the <100-bp size class ('len50', designating its median fragment length of 50 bp).
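Selecting a size class such as len50 amounts to filtering paired-end fragments by their inferred length. The sketch below does that and reports the median length of the kept fragments. The class boundaries follow the text (len50: <100 bp; len25: 1-50 bp). The 1-bp lower bound for len50 and the function names are assumptions made for illustration.

```python
from statistics import median

# Size classes as described in the text; the 1-bp lower bound for len50 is an assumption.
SIZE_CLASSES = {"len25": (1, 50), "len50": (1, 99)}


def select_size_class(fragment_lengths, size_class):
    """Return the fragment lengths (bp) that fall inside the named size class."""
    low, high = SIZE_CLASSES[size_class]
    return [n for n in fragment_lengths if low <= n <= high]


def summarize_size_class(fragment_lengths, size_class):
    """Return (number of fragments kept, median length of kept fragments)."""
    kept = select_size_class(fragment_lengths, size_class)
    return len(kept), (median(kept) if kept else None)


# Usage with made-up fragment lengths; prints (4, 52.5).
print(summarize_size_class([20, 45, 60, 99, 100, 150], "len50"))
```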
The Reb1 immunoprecipitated fractions showed sharp peaks over a negligible background relative to the corresponding input chromatin (Fig. 1b). Similar peaks were identified when fragments were not filtered by size (Supplementary Fig. 2). Interestingly, the len50-size class inputs showed strong peaks corresponding to Reb1 binding sites seen in the immunoprecipitated samples, though at a lower level of occupancy. In the input, we also observed highly occupied peaks in intergenic regions that did not correspond to Reb1 binding sites (Fig. 1b). With increasing salt concentration, there was a dramatic reduction in both the number and dynamic range of ORGANIC peaks (Fig. 1b), a result consistent with disruption of relatively weak electrostatic TF-DNA interactions at low-affinity sites. Some but not all ORGANIC peaks corresponded to Reb1 binding sites previously identified by X-ChIP-chip 21 and ChIP-exo 14 (Fig. 1b). Similar results were obtained for Abf1 when compared to X-ChIP-chip data (Supplementary Fig. 3). For both Abf1 and Reb1, we observed a high degree of overlap between sites detected at different extents of MNase digestion (Supplementary Figs. 2-4).
Using a peak-calling algorithm with a conservative threshold, we identified 1,992 ORGANIC peaks in the Reb1 len50-size class 80 mM (low-salt) experiment (Fig. 2a and Supplementary Table 1). The low-salt ORGANIC Reb1 sites included 204 (83.3%) of the previously published X-ChIP-chip sites 21 and 935 (52.6%) of the ChIP-exo sites when both primary and secondary ChIP-exo sites were considered (Fig. 2c and Supplementary Table 1). When primary and secondary sites were considered separately, we found that most of the ChIP-exo sites overlapping ORGANIC sites were primary sites (Supplementary Fig. 5). Low-salt Abf1 ORGANIC peaks included 162 of 278 (58.3%) X-ChIP-chip peaks, whereas 600 mM (high-salt) Abf1 ORGANIC peaks identified more total sites (1,258), including 214 of the 278 sites also identified by X-ChIP-chip (Fig. 2b,d and Supplementary Table 1). The ORGANIC Reb1 and Abf1 motifs obtained by de novo motif discovery with the MEME algorithm 22 matched those reported in previous studies 14,21 (Fig. 2a,b and Supplementary Table 2).
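The overlap figures above are counts of previously published sites that have a nearby ORGANIC peak. The matching criterion is not stated in this section. The sketch below therefore assumes a site counts as recovered if any peak center lies within a hypothetical `max_distance` of it, and it looks up the nearest peak by binary search. It handles one chromosome at a time. The function name is illustrative.

```python
from bisect import bisect_left


def count_recovered(reference_sites, peak_centers, max_distance=50):
    """Count reference sites that have a peak center within max_distance bp.

    Both inputs are coordinates on a single chromosome. max_distance is a
    hypothetical matching threshold, not the criterion used in the study.
    Returns (number recovered, fraction recovered).
    """
    peaks = sorted(peak_centers)
    recovered = 0
    for site in reference_sites:
        i = bisect_left(peaks, site)
        # The nearest peak is at index i - 1 or i in the sorted list.
        nearby = peaks[max(i - 1, 0):i + 1]
        if any(abs(p - site) <= max_distance for p in nearby):
            recovered += 1
    fraction = recovered / len(reference_sites) if reference_sites else 0.0
    return recovered, fraction


# The low-salt Abf1 percentage quoted above follows directly from the counts.
print(round(100 * 162 / 278, 1))
```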
We characterized the reproducibility of our method by pairwise comparisons of peak positions and occupancies between independent biological replicates and between peak sets obtained at different salt concentrations; the data sets were well correlated (R² = 0.80-0.95; Supplementary Fig. 4). Occupancies at Reb1 sites called by ChIP-exo (primary and secondary sites) and by ORGANIC profiling were poorly correlated (R² < 0.05; Supplementary Fig. 5b). We conclude that ORGANIC profiling reproducibly captures a large fraction of previously published Abf1 and Reb1 binding sites while identifying 2- to 8-fold more sites than other methods.
ORGANIC profiles are sensitive and specific
Given the strong sequence specificities of Abf1 (ref. 23) and Reb1 (ref. 24), we evaluated the accuracy of ORGANIC profiling by using the presence of a MEME-derived motif within a peak region as the 'gold standard' for classifying a peak as a true positive. Strikingly, 99.3% of low-salt Reb1 sites contained the TTACCCG motif (Fig. 2a). The percentage of peaks containing the motif decreased to 61.5% at high salt (Fig. 2a). By comparison, 59.6% of all (primary and secondary) ChIP-exo sites were associated with a Reb1 motif; among the primary sites alone, ~92% contained motif matches 14 . We estimate a false negative rate of ~0.5% for ORGANIC profiling at Reb1 motifs (Online Methods). In contrast to the Reb1 results, a smaller fraction of the 1,066 low-salt Abf1 ORGANIC sites contained the motif (63.3%) than of the sites from the high-salt extraction (93.7%; Fig. 2b).
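The "percentage of peaks containing the motif" can be sketched as a strand-aware search over peak sequences. This simplified version looks for the exact consensus (for example TTACCCG) on either strand. A MEME-derived motif match would also tolerate degenerate positions, so this exact-string test is a stand-in rather than the procedure used in the study. The function names and toy sequences are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (non-ACGT characters kept)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def contains_motif(seq, motif):
    """True if the exact motif occurs on either strand of seq."""
    s = seq.upper()
    m = motif.upper()
    return m in s or reverse_complement(m) in s


def fraction_with_motif(peak_sequences, motif):
    """Fraction of peak sequences containing the motif on either strand."""
    if not peak_sequences:
        return 0.0
    hits = sum(contains_motif(s, motif) for s in peak_sequences)
    return hits / len(peak_sequences)


# Usage with toy peak sequences and the Reb1 consensus reported above; prints 0.75.
peaks = ["AATTACCCGA", "CGGGTAAT", "GGGGGGG", "ttacccg"]
print(fraction_with_motif(peaks, "TTACCCG"))
```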
To evaluate the specificity of ORGANIC profiling, we determined how well peak sequences matched consensus motifs by scoring peaks with MEME-derived position-specific scoring matrices (PSSMs). Using the Reb1 ORGANIC PSSM (Supplementary Table 2), we found a distribution of high motif scores (defined as true positives; Online Methods) with no strongly negative scores at low salt (Fig. 3a). When the salt concentration was increased to 150 mM and 600 mM, we observed a graded reduction in the fraction of Reb1 sites with high scores and the appearance of strongly negative scores, yielding a bimodal distribution. In comparison, ChIP-exo Reb1 sites included a high number of negative calls and showed a motif score distribution similar to that of the high-salt ORGANIC Reb1 data set (Fig. 3a). A similar trend was obtained using the ChIP-exo-derived PSSM (Supplementary Fig. 6).
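Scoring a peak with a PSSM means sliding the matrix along both strands and keeping the best summed log-odds score. The sketch shows that scan. Its matrix is a hypothetical toy: +100 for the consensus base and -100 otherwise, built from TTACCCG. A real MEME matrix is derived from observed base frequencies and a background model. The study's threshold for calling a score a true positive is not reproduced here.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def toy_log_odds_matrix(consensus, match=100, mismatch=-100):
    """Hypothetical PSSM: +match for the consensus base, mismatch otherwise."""
    return [{b: (match if b == c else mismatch) for b in "ACGT"} for c in consensus]


def best_pssm_score(seq, pssm):
    """Highest summed log-odds score over all windows on both strands.

    Windows containing N or other non-ACGT characters are skipped.
    Returns float('-inf') if no scorable window exists.
    """
    width = len(pssm)
    best = float("-inf")
    forward = seq.upper()
    for strand in (forward, reverse_complement(forward)):
        for i in range(len(strand) - width + 1):
            window = strand[i:i + width]
            if any(b not in "ACGT" for b in window):
                continue
            score = sum(column[b] for column, b in zip(pssm, window))
            best = max(best, score)
    return best


reb1_like = toy_log_odds_matrix("TTACCCG")
print(best_pssm_score("GGTTACCCGAA", reb1_like))  # full consensus match: 700
```

With this toy matrix, a sequence lacking the motif still receives a score, and strongly negative scores indicate poor matches on both strands. This mirrors the negative tail described for high-salt and ChIP-exo sites.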
Abf1 motif scores were also narrowly distributed (Fig. 3b), and, as expected from the increase in Abf1 motif-containing peaks at high salt, we observed a reduction in false positive calls and an increase in true positive calls at 600 mM salt (Fig. 3b). Given the structural differences between the Reb1 and Abf1 DNA-binding domains 25,26 , optimal extraction and ChIP conditions likely differ between the two proteins, which could explain the differential specificity of ORGANIC profiles across salt concentrations and DNA-binding proteins. The increase in apparent specificity at higher salt concentrations suggests that experimental parameters can be tailored to the DNA-binding protein of interest: >90% of Abf1 peaks contained the Abf1 motif despite a reduction in dynamic range (Supplementary Fig. 3).
ORGANIC sites display DNase I footprints and are conserved
To further assess whether ORGANIC sites are bound in vivo, we used published S. cerevisiae DNase I-seq data 27 to determine whether sites detected by ORGANIC profiling are associated with DNase I footprints indicative of in vivo occupancy [27][28][29] . For both Reb1 and Abf1, average DNase I-seq profiles at ORGANIC sites showed characteristic footprints (Fig. 4a,b). In contrast, average DNase I-seq tag counts at all (primary and secondary) ChIP-exo sites did not show a footprint (Fig. 4c). The exception was the primary ChIP-exo Reb1 sites, which overlap substantially with ORGANIC sites (Supplementary Fig. 5). We found that DNase I footprint depth, which is correlated with in vivo occupancy 27,29 , corresponds well with Reb1 and Abf1 ORGANIC site occupancies (Supplementary Figs. 7 and 8). These results suggest that ORGANIC sites are occupied in vivo and that relative occupancies determined by ORGANIC profiling are quantitatively correlated with in vivo binding.
To test the possibility of TF redistribution during our protocol, we mixed equal numbers of isolated Drosophila S2 cell nuclei and Reb1-Flag budding yeast nuclei and performed ORGANIC profiling at 150 mM salt. We expected that, if redistribution occurred, the ~300-fold excess of Drosophila sequences with Reb1 binding sites would be enriched in the ChIP fraction. However, we detected only a background level of Drosophila DNA in the ChIP fraction relative to the input (Fig. 4d). We observed good correlation between Reb1 sites detected in replicates of the mixing experiment. Consistent with stable binding under the conditions used in ORGANIC profiling, occupancies detected in experiments with mixed and unmixed nuclei were highly correlated (R² = 0.995; Supplementary Fig. 9a,b). The motif score distribution of Drosophila Reb1 peaks was dominated by negative scores (Supplementary Fig. 9c). These analyses suggest that Reb1 does not shift detectably from yeast to Drosophila chromatin during chromatin preparation and ChIP.
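Two quantities support the mixing experiment: the share of Drosophila reads in the ChIP fraction relative to their share in the input, and R² between occupancies of matched sites in mixed and unmixed experiments. The sketch computes both. It assumes per-genome mapped-read totals are already available and that sites have been matched across experiments. It also assumes R² is the squared Pearson correlation, which may differ from the exact regression used. The function names and example counts are hypothetical.

```python
from math import fsum


def species_enrichment(chip_counts, input_counts, species):
    """Ratio of a species' share of ChIP reads to its share of input reads.

    A ratio well above 1 would indicate enrichment of that genome in the ChIP;
    counts are mapped-read totals per genome (hypothetical input format).
    """
    chip_share = chip_counts[species] / sum(chip_counts.values())
    input_share = input_counts[species] / sum(input_counts.values())
    return chip_share / input_share


def r_squared(x, y):
    """Squared Pearson correlation between paired occupancy values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x, mean_y = fsum(x) / n, fsum(y) / n
    sxy = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = fsum((a - mean_x) ** 2 for a in x)
    syy = fsum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("occupancies must vary to compute a correlation")
    return sxy * sxy / (sxx * syy)


# Illustrative counts only: fly DNA is 10% of ChIP reads but 50% of input reads.
print(species_enrichment({"fly": 10, "yeast": 90}, {"fly": 50, "yeast": 50}, "fly"))
```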
We also assessed the validity of identified binding sites by analyzing the correlation between motif score and occupancy of Reb1 and Abf1. At low salt, we observed poor correlation between occupancy and motif score (R² < 0.1; Fig. 4e). A similar relationship between occupancy and motif score was observed at high salt and with ChIP-exo Reb1 sites (Supplementary Fig. 10). This analysis also suggests that TFs do not redistribute to thermodynamically favored binding sites during ORGANIC profiling.
We expected that the new Abf1 and Reb1 sites that we identified would show evolutionary conservation above background levels because conservation of TF binding sites implies purifying selection 31 . We plotted phastCons scores 32 , which represent the probability that a given base is in a conserved region, in windows centered at Reb1 or Abf1 sites (Fig. 4f,g). Interestingly, we observed increased conservation of new sites at motif positions relative to background (Fig. 4f,g and Supplementary Fig. 5). In general, new sites had conservation scores higher than or comparable to those of all sites detected by X-ChIP-chip or ChIP-exo (see Supplementary Fig. 5 for conservation analysis of primary and secondary sites considered separately).
Consistent with a role for Abf1 and Reb1 in positioning flanking nucleosomes at a subset of promoters 24,33 , we detected well-positioned nucleosomes flanking Reb1 and Abf1 sites (Fig. 5a,b). Because virtually all of the ORGANIC Reb1 sites have a Reb1 binding motif, we asked whether differences in motif strength or TF occupancy could explain differential phasing of nucleosomes. We ranked ORGANIC sites by nucleosome occupancy and considered the top and bottom 200 sites, which corresponded to sites with relative nucleosome occlusion and depletion, respectively (Fig. 5a,b and Supplementary Fig. 11). We detected no substantial difference in MEME-derived motifs between the two groups (Fig. 5a,b), a result suggesting that the degree of nucleosome phasing is not associated with motif strength.
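The top/bottom comparison ranks sites by flanking nucleosome occupancy and contrasts the two extremes. The study compared MEME-derived motifs between the groups. The sketch below substitutes a simpler summary, the mean PSSM score per group, as an illustrative stand-in. The dictionary keys `nucleosome_occupancy` and `motif_score` are hypothetical.

```python
from statistics import mean


def split_by_nucleosome_occupancy(sites, n=200):
    """Return (top n, bottom n) sites ranked by flanking nucleosome occupancy.

    sites: list of dicts with hypothetical keys 'nucleosome_occupancy' and
    'motif_score'. Raises ValueError if the two groups would overlap.
    """
    if len(sites) < 2 * n:
        raise ValueError("need at least 2 * n sites for non-overlapping groups")
    ranked = sorted(sites, key=lambda s: s["nucleosome_occupancy"], reverse=True)
    return ranked[:n], ranked[-n:]


def mean_motif_score(group):
    """Mean motif score of a group of sites (a simple stand-in summary)."""
    return mean(s["motif_score"] for s in group)


# Usage with toy sites; n is reduced to 2 for the example.
toy_sites = [{"nucleosome_occupancy": i, "motif_score": 1000 + 10 * i} for i in range(10)]
occluded, depleted = split_by_nucleosome_occupancy(toy_sites, n=2)
print(mean_motif_score(occluded), mean_motif_score(depleted))
```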
ORGANIC profiles do not prefer accessible chromatin
Sequencing of formaldehyde-cross-linked, sonicated chromatin (Sono-Seq) is known to preferentially recover regions of accessible chromatin 9 . We performed Sono-Seq and generated profiles of average normalized Sono-Seq counts in 2-kb windows centered at all reported Reb1 binding sites determined by different methods.
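An average profile in 2-kb windows can be computed by stacking per-base-pair signal around each site and averaging position by position. The same averaging applies to the phastCons and DNase I-seq profiles described earlier. This sketch treats each chromosome separately. It skips sites whose window runs off the chromosome end and ignores motif orientation. Both are simplifications rather than details taken from the study.

```python
def average_profile(signal, site_centers, half_width=1000):
    """Mean per-position signal in windows of 2 * half_width + 1 bp centered on sites.

    signal: per-base-pair values (e.g. normalized counts) for one chromosome,
    indexed by 0-based position. Sites whose window runs off either end of the
    chromosome are skipped, and strand orientation is ignored (simplifications).
    """
    width = 2 * half_width + 1
    totals = [0.0] * width
    used = 0
    for center in site_centers:
        start, end = center - half_width, center + half_width + 1
        if start < 0 or end > len(signal):
            continue
        for j, value in enumerate(signal[start:end]):
            totals[j] += value
        used += 1
    return [t / used for t in totals] if used else []


# Toy usage with a 1-bp half-width; the site at position 9 is skipped.
print(average_profile(list(range(10)), [2, 5, 9], half_width=1))
```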
Strikingly, ChIP-exo Reb1 sites showed enrichment for Sono-Seq reads, whereas there was virtually no enrichment at ORGANIC or X-ChIP-chip Reb1 sites (Fig. 5c and Supplementary Fig. 5f). Similarly, we detected no enrichment of Sono-Seq reads at ORGANIC or X-ChIP-chip Abf1 sites (Fig. 5d). The Sono-Seq enrichment at ChIP-exo sites is consistent with the increased DNase I cleavage observed at these sites and with the comparatively lower cleavage at ORGANIC sites (Fig. 4a-c). We obtained similar results using previously published Sono-Seq data (Supplementary Fig. 12) and using sensitivity to MNase digestion as an independent measure of chromatin accessibility (Supplementary Results). Analysis of the genomic locations of binding sites revealed that we could detect binding events in generally inaccessible regions (Supplementary Results).
The Reb1 X-ChIP-chip study was performed using a two-color spotted microarray on which ChIP DNA was hybridized together with DNA from an unenriched sample (input) 34 . This input normalization procedure likely corrected for any bias toward accessible chromatin. In contrast, input normalization is not performed with ChIP-exo, which likely explains the observed strong preference for binding sites in accessible chromatin. The ORGANIC accessibility profile resembles that of input-normalized X-ChIP-chip even though no input normalization was applied. The absence of cross-linking and sonication steps could account for the insensitivity of ORGANIC profiling to the degree of chromatin accessibility.

ORGANIC profiles of Drosophila transcription factors
We tested ORGANIC profiling on larger eukaryotic genomes by mapping GAF and Psq in Drosophila S2 cell active chromatin extracted with 80 mM salt 35 (Supplementary Fig. 1). To determine whether GAF is lost from the nucleus under native conditions, we used western blotting to monitor losses incurred during processing of nuclei in the ORGANIC and modENCODE (model organism Encyclopedia of DNA Elements) X-ChIP protocols. In both cases, ~15-20% of total GAF was lost (Supplementary Fig. 13; similar analyses could not be performed for Flag-tagged S. cerevisiae TFs for technical reasons). We observed enrichment for peaks in the len25 (1-50 bp) and len50-size class ChIP fractions relative to input for both GAF and Psq (Supplementary Fig. 14 and Supplementary Table 1). Enriched peaks were associated with DNase I-hypersensitive sites and were evolutionarily conserved, results suggesting that they represent bona fide in vivo sites (Supplementary Fig. 15). Consistent with previous work demonstrating that GAF and Psq heterodimerize and act in concert at many loci 36 , we observed similar genome-wide profiles for the two TFs. Using the same peak-calling method and de novo motif analysis approach used to characterize yeast TF binding sites, we called 3,300 GAF and 957 Psq sites. We recovered the expected GAF repeat-containing motifs in 76.5% of GAF and 40% of Psq ORGANIC peaks (Fig. 6a).
In contrast, X-ChIP-chip identified 4,567 GAF sites 37 , of which only ~5% had characteristic motifs. Therefore, as defined by the presence of known binding motifs, ORGANIC profiling is more specific than X-ChIP-chip by more than an order of magnitude for factor binding in the Drosophila genome.

Fig. 3 (caption excerpt): (a,b) Histograms of motif scores determined using MEME-derived position-specific log-odds scoring matrices are shown for Reb1 (a) and Abf1 (b) binding sites. MEME-ChIP-derived motifs corresponding to each 1,000-unit log-odds motif score cohort are included above each histogram. Bins that contained either too few sites or sequences that did not produce a MEME-ChIP-derived motif are designated "N/A." Equivalent histograms for ChIP-exo Reb1 sites, including both primary and secondary sites, are shown in the bottom plot in a.
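The order-of-magnitude claim for GAF follows from the two motif-containing fractions reported in the Drosophila results (76.5% of ORGANIC GAF peaks versus ~5% of X-ChIP-chip GAF sites). The sketch below is a minimal check of that arithmetic. It uses only those two percentages, and the helper name is illustrative.

```python
def fold_difference(fraction_a, fraction_b):
    """Ratio of two motif-containing fractions (a / b)."""
    return fraction_a / fraction_b


organic_gaf = 0.765  # ORGANIC GAF peaks with a GAF motif (reported above)
xchip_gaf = 0.05     # X-ChIP-chip GAF sites with a motif (reported as ~5%)
print(f"{fold_difference(organic_gaf, xchip_gaf):.1f}-fold")  # 15.3-fold
```

Applied to the site counts, this corresponds to roughly 2,500 motif-containing ORGANIC GAF peaks versus roughly 230 motif-containing X-ChIP-chip sites, even though X-ChIP-chip called more sites overall.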
In addition to sites with the characteristic GAF motif, GAF is known from X-ChIP-chip data to bind TF 'hotspots' 37,38 , which are thought to reflect dynamic, low-affinity binding of multiple transcription factors 38 . Remarkably, TF hotspots were absent from GAF ORGANIC peak calls (Fig. 6b and Supplementary Fig. 14). When we searched for the GAF consensus motif among TF hotspot regions, only 17.5% displayed a stringent motif, which could account for the absence of ORGANIC signals at these sites. We suggest that the dynamic binding of GAF at TF hotspots resulted in trapping of transiently bound GAF by formaldehyde cross-linking to these sites when X-ChIP was used. In contrast, ORGANIC profiling detects only sites that are stably bound under native extraction conditions.

Discussion
We have shown that ORGANIC profiling identifies direct TF-chromatin interactions at high resolution and with high specificity and sensitivity. De novo motif discovery revealed that the large majority of ORGANIC binding site calls have the expected consensus motif and correspond to DNase I footprints, suggestive of in vivo binding. Our study also demonstrated the flexibility of ORGANIC profiling in mapping genomic binding sites of proteins with structurally distinct DNA-binding domains from different species and showed that the specificity of ORGANIC maps can be modulated by varying salt concentration. Although it is not possible to rigorously infer function from genome-wide mapping studies, accurate high-resolution maps of TF binding occupancies serve as valuable resources for developing testable functional hypotheses. Native-chromatin profiling has been widely used for epigenome mapping in the context of methods such as DNase I-seq 27 and, more recently, the assay for transposase-accessible chromatin using sequencing 39 . However, a potential concern with native-chromatin profiling is that small-footprint DNA-binding proteins that are highly dynamic in vivo could redistribute during chromatin preparation and ChIP, which is a frequently cited rationale for cross-linking protein-DNA interactions with formaldehyde 5 . We argue that rearrangement of bound factors is unlikely to occur during ORGANIC profiling for a number of reasons.
First, conditions under which ORGANIC profiling is performed (Supplementary Fig. 1) differ substantially from those in the in vivo state and disfavor rearrangement. MNase digestion is performed at a salt concentration more than tenfold lower than that present in vivo. Given that salt competes for electrostatic protein-DNA interactions, low-salt conditions can functionally fix protein-DNA interactions in a noncovalent manner 40 . There is some evidence for a role for chromatin remodeling machinery in facilitating the high degree of in vivo TF dynamics 41 , but because there is no readily available ATP in the ORGANIC profiling protocol, it is unlikely that these active processes could contribute to TF rearrangement. The cold, dilute conditions under which ORGANIC profiling is performed also render it unlikely that an unbound factor will find its recognition sequence. For GAF we determined that, under the conditions used for ORGANIC profiling, TF loss at various points of the protocol is comparable to that observed with X-ChIP, consistent with stable TF binding under ORGANIC profiling conditions 42,43 .
Second, using MNase to fragment chromatin ensures that any accessible, unbound binding sites will be digested and therefore unavailable for engagement by free factors. Indeed, we showed for Reb1 and Abf1 that there is little change in the binding sites detected over a fourfold digestion range. Third, sites identified by ORGANIC profiling reflect in vivo occupancy as determined by DNase I footprinting, which is also performed under native, low-salt conditions 27 . Fourth, ORGANIC profiles show a wide range of occupancies for a given motif score, which is inconsistent with redistribution of factors to the most thermodynamically favorable motifs (as inferred from motif strength). Fifth, ORGANIC sites are evolutionarily conserved, suggesting a possible functional role for some of these sites. Finally, there is no enrichment for Drosophila sequences in yeast TF ORGANIC profiling when Drosophila S2 nuclei and yeast nuclei are combined before MNase digestion, chromatin extraction and ChIP. The linear correlation between occupancies in yeast-only and mixed yeast-fly ORGANIC profiles suggests that ORGANIC profiling preserves both qualitative and quantitative characteristics of in vivo binding sites. Taken together, these data indicate that ORGANIC profiling captures direct TF binding events that reflect in vivo occupancy.
Interestingly, the chromatin accessibility bias inherent to some cross-linking and sonication-based methods is corrected by normalizing to input, but such input normalization is not generally performed with a sequencing readout. Moreover, previously published input-normalized X-ChIP-chip maps were unable to detect some sites in inaccessible chromatin. In contrast, we detected no accessibility bias using ORGANIC profiling. This lack of bias obviates the need for input normalization. Input normalization is impractical for large genomes because the input library, unlike the ChIP library, must be sequenced deeply enough to provide whole-genome coverage.
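The cost of input normalization can be made concrete with the Lander-Waterman approximation. If N fragments of length L land uniformly at random on a genome of size G, the expected fraction of bases covered is 1 − exp(−NL/G). The sketch below uses that idealization, which real MNase inputs will not satisfy exactly. It shows that the fragments needed for a fixed genome-wide coverage scale linearly with genome size. The genome sizes are approximate and illustrative, and the function names are hypothetical.

```python
from math import ceil, exp, log


def expected_fraction_covered(n_fragments, fragment_length, genome_size):
    """Lander-Waterman expectation: 1 - exp(-c), where c = N * L / G."""
    depth = n_fragments * fragment_length / genome_size
    return 1 - exp(-depth)


def fragments_needed(target_fraction, fragment_length, genome_size):
    """Fragments needed for the expected covered fraction to reach the target."""
    depth = -log(1 - target_fraction)
    return ceil(depth * genome_size / fragment_length)


# Illustrative genome sizes (approximate) and 50-bp fragments.
for name, size in [("yeast, ~12 Mb", 12_000_000), ("fly, ~140 Mb", 140_000_000)]:
    print(name, fragments_needed(0.99, 50, size))
```

A ChIP library, by contrast, only needs enough depth to sample the bound sites, which is why a low-background method without input normalization stays inexpensive as genomes grow.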
The high signal-to-noise ratio of ORGANIC profiling means that it is relatively inexpensive to perform even for large genomes. Other advantages include the simple library preparation protocol, which requires only a few nanograms of DNA. Paired-end sequencing also yields precise fragment lengths, which can be used both to produce base-pair-resolution maps of protein binding and to deduce regional features around binding sites by V-plotting 20 . The successful application of ORGANIC profiling to DNA-binding proteins of different types, including nucleosomes, RNA polymerases, nucleosome remodelers and sequence-specific
Online methods
Yeast strains, cell growth, MNase digestion and chromatin immunoprecipitation. The S. cerevisiae ORGANIC protocol is summarized in Supplementary Figure 1a. W1588-4C S. cerevisiae strains (isogenic to W303-1A, except that a weak RAD5 mutation is repaired 44 ) carrying Flag-tagged Abf1 and Reb1 under the control of their respective endogenous promoters were generated as previously described 45 . S. cerevisiae cultures were grown in 500 mL YPD medium at 30 °C to an OD600 of 0.7. Nuclei were isolated as previously described 46 . All centrifugation steps during the nuclei isolation, MNase digestion and chromatin preparation procedures were performed using an RC-5 centrifuge (DuPont). Briefly, cells were pelleted at 4,000 r.p.m. at 4 °C for 10 min using a GS-3 rotor (Sorvall; rotor diameter, 33 cm) and washed with deionized water. Cells were spheroplasted in resuspension buffer (1.2 M sorbitol, 100 mM potassium phosphate, pH 7.5, 0.5 M CaCl2, 0.5 mM β-mercaptoethanol) with 1 mg/ml Zymolyase. Spheroplasted cells were centrifuged at 3,000 r.p.m. at 4 °C for 10 min using a Fiberlite rotor (Thermo; rotor diameter, 20.8 cm). They were then washed twice with SPC (1 M sorbitol, 200 mM PIPES, pH 6.3, 0.1 mM CaCl2) containing 1 mM phenylmethanesulfonyl fluoride and 10 µg/ml each of leupeptin, pepstatin A and chymostatin (PMSF+LPC).
The pellet was resuspended in SPC without PMSF+LPC, mixed with Ficoll buffer (9% Ficoll 400, 20 mM PIPES, pH 6.3, 0.5 mM CaCl2) and centrifuged at 8,000 r.p.m. at 4 °C for 10 min using a Fiberlite rotor. The resulting nuclear pellet was washed twice with SPC with PMSF+LPC and centrifuged at 6,000 r.p.m. at 4 °C for 10 min using a Fiberlite rotor. Nuclei were then resuspended in 5 mL SPC with PMSF+LPC, snap frozen in liquid nitrogen and stored at −80 °C.
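Because the yeast centrifugation steps are given in r.p.m. together with rotor diameters, they can be converted to relative centrifugal force. The conversion uses the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r in cm. The sketch takes r as half the quoted rotor diameter, which is an assumption. The rotor's rated maximum radius, if known, would give a more accurate value. The function name is illustrative.

```python
def rcf_from_rpm(rpm, rotor_diameter_cm):
    """Relative centrifugal force (x g) from speed and rotor diameter.

    Uses RCF = 1.118e-5 * r * rpm**2 with r in cm, taking r as half the quoted
    rotor diameter (an assumption; use the rotor's rated maximum radius if known).
    """
    radius_cm = rotor_diameter_cm / 2
    return 1.118e-5 * radius_cm * rpm ** 2


steps = [
    ("cell pellet, GS-3", 4_000, 33.0),
    ("spheroplasts, Fiberlite", 3_000, 20.8),
    ("Ficoll step, Fiberlite", 8_000, 20.8),
    ("nuclei wash, Fiberlite", 6_000, 20.8),
]
for label, rpm, diameter in steps:
    print(f"{label}: {rcf_from_rpm(rpm, diameter):,.0f} x g")
```

Reporting g-force alongside r.p.m. would make the spins transferable to centrifuges with different rotors.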
ORGANIC profiling of TFs was performed as previously described for chromatin remodeling enzymes 19 , except that 80, 150 or 600 mM salt was used for chromatin extraction (Supplementary Fig. 1a). Specifically, nuclei were thawed on ice, PMSF+LPC was added, and nuclei were prewarmed by incubation in a 37 °C water bath for 5 min. CaCl2 was added to 2 mM, followed by 2 U MNase (Sigma-Aldrich). Nuclei were mixed by inversion and incubated in a 37 °C water bath for 2.5 min or 10 min. MNase digestion was stopped by chelating divalent cations with 10 mM EDTA and incubating on ice for 5 min. Nuclei were disrupted and chromatin was solubilized by passing the slurry resulting from MNase digestion through a 20-gauge needle followed by a 26-gauge needle, four times each. The resulting chromatin was centrifuged at 10,000 r.p.m. using a Fiberlite rotor at 4 °C for 10 min. The supernatant was retained as fraction S1. The pellet was resuspended in 5 mL extraction buffer (70, 140 or 590 mM NaCl, 0.75 mM EDTA, 10 mM phosphate buffer, pH 7.4, 0.1% Triton X-100) and incubated on a nutator (TCS Scientific) for 4 h at 4 °C. Salt-extracted chromatin was then clarified by centrifugation at 13,000 r.p.m. at 4 °C for 15 min using a Fiberlite rotor. The clarified supernatant was retained as fraction S2, and the insoluble pellet was discarded. The salt and Triton X-100 concentrations of fraction S1 were adjusted to match the composition of fraction S2, and the two fractions were combined. For each ChIP, 100 µL of anti-Flag M2 magnetic beads (Sigma-Aldrich, cat. no. A2220) blocked with 0.5% BSA in PBS was added to the combined fractions. Immunoprecipitation was allowed to proceed overnight at 4 °C with agitation on a nutator.
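Matching fraction S1 to the composition of fraction S2 is a mixing calculation. The volume v of a concentrated stock that brings a volume V at concentration c to a target t satisfies (V·c + v·s)/(V + v) = t, so v = V(t − c)/(s − t). The S1 starting concentrations and the stock concentrations are not given in the text. The values below, including 0 mM NaCl in S1 and a 5 M NaCl stock, are placeholders, and the function name is hypothetical.

```python
def stock_volume_to_add(volume, current, target, stock):
    """Volume of concentrated stock to add so that the mixture reaches target.

    Solves (volume * current + v * stock) / (volume + v) = target for v.
    Units are arbitrary but must match (e.g. mL and mM, or mL and % v/v).
    """
    if not (current <= target < stock):
        raise ValueError("require current <= target < stock")
    return volume * (target - current) / (stock - target)


# Hypothetical example: bring 5 mL of S1 with no added NaCl to 590 mM using 5 M NaCl.
v_nacl = stock_volume_to_add(5.0, 0.0, 590.0, 5000.0)
print(f"add {v_nacl:.3f} mL of 5 M NaCl")
```

Adding one stock slightly dilutes the other component, so in practice the Triton X-100 addition would be computed on the updated volume (or both additions solved together).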
Magnetic beads were washed three times and then resuspended in IP wash buffer (10 mM phosphate buffer, pH 7.4, 0.75 mM EDTA, 70 mM NaCl) with PMSF+LPC. Samples were treated with 0.5 µg RNase A at 37 °C for 10 min and then with 20 µg proteinase K and 0.5% SDS at 70 °C for 10 min. DNA was purified by phenol-chloroform extraction and ethanol precipitation using 20 µg glycogen as a carrier, typically yielding ~60 ng of DNA.
Drosophila cell lines, cell growth, MNase digestion and immunoprecipitation. The D. melanogaster ORGANIC protocol is summarized in Supplementary Figure 1b. Centrifugation steps during the nuclei isolation stage of the protocol were performed in a Centra CL-2 centrifuge (Thermo), and all subsequent centrifugation steps were performed in a standard benchtop microcentrifuge. Drosophila S2-DRSC cells were grown in Schneider's medium with 10% FBS. For each ORGANIC sample, 6 × 10^8 log-phase cells were collected by scraping, pelleted by centrifugation at 2,000g, washed once with PBS and resuspended in TM2+ buffer (10 mM Tris, pH 7.4, 2 mM MgCl2, 0.5 mM PMSF) to a concentration of 1 × 10^8 cells per mL. To release nuclei, 0.6% NP-40 was added, and cells were kept on ice for 4 min with occasional mild vortexing. Nuclei were pelleted at 1,000g, washed with TM2+ and resuspended in TM2+IC (TM2+ with 1× Roche EDTA-free protease inhibitor cocktail) to a concentration of 2.5 × 10^8 nuclei per mL. MNase digestion was performed as follows: nuclei were preheated at 37 °C for 3 min and then incubated for 6 min with 200 U of USB MNase (Affymetrix) in the presence of 1 mM CaCl2. Digestion was stopped with 2 mM EGTA, and nuclei were transferred to ice, pelleted by centrifugation at 1,000g, washed in TM2+IC with 2 mM EGTA and resuspended in 80T+IC buffer (70 mM NaCl, 10 mM Tris, pH 7.4, 2 mM MgCl2, 2 mM EGTA, 0.1% Triton X-100, 0.5 mM PMSF, 1× EDTA-free Roche Complete Protease Inhibitor Cocktail) to a final concentration of 5 × 10^8 nuclei per mL. Nuclei were cavitated ten times through a 26.5-gauge needle, and the chromatin solution was separated from the insoluble pellet by centrifugation (1,000g). Input chromatin was incubated with the appropriate amount of primary antibody overnight at 4 °C. The GAF rabbit polyclonal antibody (ref. 47; a gift from G. Cavalli, Institut Génétique Humaine, CNRS) and the Psq AS2 rabbit polyclonal antibody (ref. 48; a gift from C. Berg, University of Washington) were both used at 1:100 dilution. The chromatin solution was added to Dynabeads Protein G magnetic beads prewashed in 80T+ buffer and incubated for 2 h at 4 °C. Beads were washed and then resuspended in 80T+ buffer with 10 mM EDTA, treated with 0.5 µg RNase A at 37 °C for 20 min and then with 20 µg proteinase K and 0.5% SDS at 70 °C for another 10 min, and the DNA was extracted by phenol-chloroform and ethanol precipitation using 20 µg glycogen as a carrier, typically yielding ~20 ng of DNA.
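As a worked example of the concentrations above, the sketch below computes the resuspension volume implied at each step for one 6 × 10^8-cell sample. It assumes, purely for illustration, one nucleus per cell and no loss of nuclei between steps; in practice recovery is lower, and volumes would be based on actual counts. The names resuspension_volumes, STEPS and CELLS_PER_SAMPLE are hypothetical.

```python
# Hypothetical helper: buffer volumes implied by the target concentrations in
# the S2 ORGANIC protocol. ASSUMES one nucleus per cell and no losses.
CELLS_PER_SAMPLE = 6e8  # log-phase S2 cells per ORGANIC sample

# (step, buffer, target concentration in cells or nuclei per mL)
STEPS = [
    ("cell resuspension", "TM2+", 1e8),
    ("nuclei after NP-40 lysis", "TM2+IC", 2.5e8),
    ("nuclei after MNase digestion", "80T+IC", 5e8),
]


def resuspension_volumes(n_cells, steps):
    """Return (step, buffer, volume in mL) for each target concentration."""
    return [(step, buffer, n_cells / conc) for step, buffer, conc in steps]


for step, buffer, ml in resuspension_volumes(CELLS_PER_SAMPLE, STEPS):
    print(f"{step}: {ml:.1f} mL {buffer}")
```

Under these assumptions, this prints 6.0, 2.4 and 1.2 mL for the three steps.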
Mixed budding yeast-Drosophila nuclei experiment. In the mixed S. cerevisiae-Drosophila S2 cell Reb1 experiment, nuclei were isolated as described above. Approximately equal numbers of yeast and fly nuclei (as determined by cell counts before nuclei isolation) were mixed and digested with MNase for 10 min at 37 °C. Reb1 was then immunoprecipitated at 150 mM salt following the yeast protocol described above.

Library preparation and sequencing. Preparation of paired-end sequencing libraries (schematized in Supplementary Fig. 1c) and sequencing were carried out as previously described, except that for Tru-Seq adaptors, the ratio of sample to Ampure beads in the two cleanup steps was increased from 5:9 (ref. 20) to 1:1 to compensate for the longer adaptor lengths. Briefly, the standard Illumina library preparation protocol was followed with three exceptions: Qiagen purification and size-selection steps were omitted; reactions were stopped by phenol-chloroform extraction followed by S-300 spin column cleanup; and Ampure XP bead cleanups were used to remove excess adaptors and primers after the ligation and PCR steps. Note that after ligation of Illumina Tru-Seq adaptors, 12-18 cycles of PCR amplification were performed before a final Ampure cleanup step.
Cluster generation, followed by 25 rounds of paired-end sequencing (2 × 25 bp), was performed on the Illumina HiSeq 2000 platform (Fred Hutchinson Cancer Research Center Genomics Shared Resource). Processing of sequencing reads, base calling and alignment to the yeast and Drosophila genomes were performed as previously described 20 . Briefly, after processing by Illumina Eland software, reads with 0-2 mismatches were aligned to the yeast genome using Novoalign with default parameters. Note that, given the sequencing parameters, the minimum read length is 25 bp. Reads aligning to multiple regions of the genome were assigned to one mappable location at random. Counts per base pair were normalized as previously described 49 , by multiplying the fraction of mapped reads spanning each base-pair position by the genome size.
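The per-base normalization can be sketched as follows. This is a minimal illustration, assuming fragments are given as 0-based, half-open (start, end) intervals on a single coordinate axis; real data would be processed chromosome by chromosome, and the function name normalized_coverage is hypothetical. A difference array yields the number of fragments spanning each base in linear time.

```python
import numpy as np


def normalized_coverage(fragments, genome_size):
    """Per-base normalized counts for paired-end fragments.

    fragments: iterable of (start, end) half-open, 0-based intervals on one
    coordinate axis of length genome_size (a simplifying assumption).
    Each value is (fraction of mapped fragments spanning the base) * genome_size.
    """
    diff = np.zeros(genome_size + 1, dtype=np.int64)
    n_fragments = 0
    for start, end in fragments:
        diff[start] += 1  # coverage begins at the fragment start
        diff[end] -= 1    # and ends at its (exclusive) end
        n_fragments += 1
    counts = np.cumsum(diff[:-1])  # raw number of fragments spanning each base
    return counts / n_fragments * genome_size


# Toy example: two overlapping fragments on a 4-bp "genome".
print(normalized_coverage([(0, 2), (1, 3)], genome_size=4))
```

Dividing by the number of mapped fragments makes profiles from libraries of different depth directly comparable, which is the property the normalization is meant to provide.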
Note that Ampure cleanup steps exclude fragments below 90-100 bp, and the Illumina Tru-Seq adaptors add 66 or 67 bp to each fragment, so inserts as small as ~25 bp can be retained in this modified library preparation protocol 20 . Although TF footprints in the range of 10-20 bp are unlikely to be mapped back to the genome with high confidence, we do not believe that excluding fragments of this size introduces biases, because of the way MNase digestion proceeds. MNase is an endo-/exonuclease that 'nibbles' on DNA and is therefore not highly processive. Under the digestion conditions used in ORGANIC profiling, it produces a range of fragments representing stochastic cleavages on either side of TF binding sites. We have previously shown that mapping these fragments on midpoint vs. fragment-length plots (V-plots) allows the visualization and determination of the size of TF footprints at single-base-pair resolution 20 .
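A V-plot places each fragment by the position of its midpoint relative to a reference site (x axis) and by its length (y axis). The sketch below builds such a matrix, summing over a list of sites. The half-open coordinate convention, the default window and maximum length, and ignoring motif orientation are illustrative assumptions, not the parameters of ref. 20; the function name v_plot is hypothetical.

```python
import numpy as np


def v_plot(fragments, sites, half_width=100, max_len=250):
    """Count fragments by midpoint offset from each site and fragment length.

    fragments: (start, end) half-open intervals; sites: reference positions
    (e.g., motif centers) on the same coordinate axis.
    Returns a (max_len + 1) x (2 * half_width + 1) matrix: rows are fragment
    lengths, columns are midpoint offsets from -half_width to +half_width.
    """
    mat = np.zeros((max_len + 1, 2 * half_width + 1), dtype=np.int64)
    for site in sites:
        for start, end in fragments:
            length = end - start
            offset = (start + end) // 2 - site  # integer midpoint offset
            if length <= max_len and abs(offset) <= half_width:
                mat[length, offset + half_width] += 1
    return mat
```

Fragments whose ends are set by MNase cleavage just outside a protected footprint fall along two diagonals that meet at the footprint size, which is what gives the plot its V shape.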
In contrast, X-ChIP methods typically rely on library preparation procedures that exclude inserts below a size cutoff much larger than ~25 bp and/or on sonication to an average fragment length of 250-500 bp. For example, ChIP-exo excludes inserts below ~100 bp and sequences fragments that are approximately nucleosomal in size (~150 bp) 14 . These fragments are then mapped back to produce 'peak pairs' separated by a specified range of distances, from which binding sites are determined. Given the processivity of lambda exonuclease, it is highly likely that many of the fragments produced by ChIP-exo fall below the ~100-bp size cutoff. This may partly explain the very high input requirement for ChIP-exo 50 : many of the fragments resulting from exonuclease digestion would be the size of the minimal fragment protected by the DNA-bound protein and therefore could not be mapped back to the genome with high confidence.
Yeast data sets used for comparisons. Reb1 and Abf1 X-ChIP-chip data 21 and Reb1 ChIP-exo sites 14 were previously published. Yeast Sono-Seq data used for validation of Sono-Seq performed in this study were also previously published 9 .
Sono-Seq. Sono-Seq was performed as previously described 9 , and purified DNA was subject to paired-end sequencing as described above.
Peak calling. Peaks were called using a genome-wide threshold determined by taking the average of normalized counts across all chromosomes and adding an empirically determined multiple of the s.d. of normalized counts. Positions above the threshold were grouped into a discrete peak if they were within a specified distance of each other. The position of a given peak was defined as the position of the base with the highest normalized count among all positions constituting the peak. A window (56 bp for yeast and 100 bp for fly DNA-binding proteins) around each peak position was defined for motif discovery. Reb1 ChIP-exo peaks reported by Rhee and Pugh 14 were analyzed similarly. FASTA sequences were generated from the chromosome coordinates produced by peak calling and windowing using BEDTools 51 . De novo motif discovery on FASTA sequences corresponding to windowed peaks was performed using MEME 22 or MEME-ChIP 52 . Peaks were recentered on best matches to motifs (derived from MEME for S. cerevisiae data sets or from published motifs 53 ).
Motif scoring. Called peaks were scored for the presence of a motif using MEME-derived position-specific log-odds scoring matrices (PSSMs) or previously published position weight matrices (PWMs) 54,55 . Sequences corresponding to called peaks were decomposed into all possible constituent k-mers (for motif length k) on both strands, and each k-mer was scored by summing the corresponding elements of the PSSM or PWM at each position of the k-mer. The score of the highest-scoring k-mer was assigned to each peak, and peaks were then recentered at the center of that k-mer. Motif scores can be interpreted both probabilistically and thermodynamically. Because they are log-odds scores, they represent the degree to which a given sequence is better explained by the motif model than by the background nucleotide frequencies.
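The thresholding and grouping rule used for peak calling can be sketched as below. The multiple of the s.d. (k_sd) and the grouping distance (merge_dist) are left as parameters because the text describes them only as empirically determined. The input is assumed to be a single array of normalized counts (in practice, groups should not span chromosome boundaries), the s.d. is the population s.d. (NumPy's default), and the names call_peaks and peak_window are hypothetical.

```python
import numpy as np


def call_peaks(norm_counts, k_sd, merge_dist):
    """Return summit positions of peaks above mean + k_sd * s.d.

    Above-threshold positions no more than merge_dist apart are grouped into one
    peak; each peak is reported at its highest-scoring base.
    """
    counts = np.asarray(norm_counts, dtype=float)
    threshold = counts.mean() + k_sd * counts.std()  # population s.d.
    above = np.flatnonzero(counts > threshold)
    summits = []
    group_start = 0
    for i in range(1, len(above) + 1):
        # close the current group at the end or when the next gap is too large
        if i == len(above) or above[i] - above[i - 1] > merge_dist:
            group = above[group_start:i]
            summits.append(int(group[np.argmax(counts[group])]))
            group_start = i
    return summits


def peak_window(summit, width):
    """Half-open (start, end) window of the given width centered on a summit,
    with the start clipped at position 0."""
    start = summit - width // 2
    return max(start, 0), start + width
```

With width set to 56 (yeast) or 100 (fly), the windows can be written out as BED intervals for sequence extraction.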
A positive motif score suggests that the motif model specified by the position-specific log-odds matrix fits the observed sequence better than the background nucleotide frequencies do, whereas a negative motif score suggests that the background nucleotide frequencies model the observed sequence better than the motif model. In this probabilistic context, we define 'true positive' peaks as having positive motif scores and 'false positive' peaks as having negative motif scores. Thermodynamically, positive motif scores correlate with binding energy 30 , such that higher positive scores correspond to higher binding energies. Indeed, low-scoring Reb1 sites with one or two nucleotide deviations from the consensus motif likely do not represent direct binding sites, given that even a single nucleotide change in the motif abrogates Reb1 binding in vitro 56,57 .
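The k-mer scan used for motif scoring can be sketched as follows, assuming the PSSM is a k × 4 array of log-odds values in A, C, G, T column order. Each k-mer on both strands is scored by summing the matching matrix entries, and the best-scoring k-mer gives both the peak's motif score and the position used for recentering. Skipping k-mers that contain ambiguous bases and treating a score of exactly zero as a false positive are assumptions not specified in the text; the function names are hypothetical.

```python
import numpy as np

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def best_kmer(seq, pssm):
    """Return (score, center, strand) of the best-scoring k-mer in seq.

    pssm: k x 4 array of log-odds scores, columns in A, C, G, T order.
    center: 0-based forward-strand position of the k-mer center, used to
    recenter the peak.
    """
    pssm = np.asarray(pssm, dtype=float)
    k = pssm.shape[0]
    seq = seq.upper()
    best = (-np.inf, None, None)
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if any(b not in BASES for b in kmer):
                continue  # skip k-mers containing N or other ambiguity codes
            score = sum(pssm[j, BASES.index(b)] for j, b in enumerate(kmer))
            if score > best[0]:
                # map the k-mer start back to forward-strand coordinates
                start = i if strand == "+" else len(s) - i - k
                best = (float(score), start + k // 2, strand)
    return best


def classify(score):
    """Positive log-odds score: 'true positive'; otherwise 'false positive'."""
    return "true positive" if score > 0 else "false positive"
```

For example, with a toy two-position matrix favoring "AC", the sequence "GGGTGG" scores best on the reverse strand, because "GT" is the reverse complement of "AC".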
Calculation of false negative rate at Reb1 sites. We estimated the level of false negatives by enumerating all high-scoring Reb1 motifs in the genome and determining how many were occupied by Reb1. We found 2,622 sites with motifs that score above 1,000; we chose this threshold by assuming that nearly every motif scoring above 1,000 is a true positive (Fig. 3a). We observed that ~76% (1,992/2,622) of high-scoring Reb1 motifs were occupied by Reb1, suggesting that ~24% of high-scoring Reb1 motifs were missed using our conservative cutoff. When we reduced the peak-calling threshold to include all detectably occupied sites, we obtained 2,638 sites, of which 2,626 (99.5%) had the Reb1 motif (0.5% false negative rate).
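The occupancy estimate is a simple ratio; the sketch below reproduces the ~76% occupied and ~24% missed figures from the counts given above. The helper name occupancy_summary is hypothetical.

```python
def occupancy_summary(n_motifs, n_occupied):
    """Fraction of high-scoring motifs that are occupied, and the fraction missed."""
    occupied = n_occupied / n_motifs
    return occupied, 1.0 - occupied


occupied, missed = occupancy_summary(2622, 1992)
print(f"occupied: {occupied:.1%}, missed: {missed:.1%}")
```

This prints "occupied: 76.0%, missed: 24.0%".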
DNase I-seq footprinting analysis. S. cerevisiae DNase I-seq data were published previously 27 . Average DNase I-seq profiles were created by averaging tag counts (excluding missing data, i.e., positions without any reported tags) in 100-bp windows centered on each peak in a set of interest. Drosophila S2 cell DNase I-seq data were also previously reported 58 . D. melanogaster DNase I-seq data were analyzed as described above for budding yeast, except that wider (1-kb) windows were centered on the peaks of interest. Yeast and fly ORGANIC peak coordinates were converted to older genome builds using the UCSC Genome Browser liftOver utility 59 . It was not possible to determine the presence of individual footprints using S2 DNase I-seq data for two reasons. First, very high sequencing depth is required to reliably identify footprints 29 . Second, sequencing reads that are not uniquely mappable are discarded in the DNase I-seq analysis pipeline 27,29 , and because many of the regions bound by GAF and Psq lie within GA repeats that are not uniquely alignable, there is insufficient read density at these sites to determine whether footprints are present.
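The aggregate profiles can be sketched as a per-offset mean across sites that ignores missing positions, assuming missing data are encoded as NaN in a per-base signal array for one chromosome. The window width is a parameter (100 bp for yeast DNase I-seq, 1 kb for fly DNase I-seq, 2 kb for the MNase-seq analysis); skipping sites whose window runs off the array is an assumption of this sketch, and the function name average_profile is hypothetical.

```python
import numpy as np


def average_profile(signal, centers, window):
    """Mean signal at each offset across sites, ignoring missing (NaN) positions.

    signal: 1-D float array for one chromosome, NaN where no tags were reported.
    centers: site positions; window: total window width in bp.
    """
    half = window // 2
    rows = [
        signal[c - half:c - half + window]
        for c in centers
        if c - half >= 0 and c - half + window <= len(signal)  # skip edge sites
    ]
    if not rows:
        raise ValueError("no site has a complete window")
    return np.nanmean(np.vstack(rows), axis=0)
```

np.nanmean averages only over sites with data at each offset; an offset with no data at any site yields NaN (with a RuntimeWarning).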
MNase-seq/nucleosome positioning analysis. Previously published nucleosome-size class MNase-seq data 20 were used to generate profiles of average nucleosome occupancy around ORGANIC peak sets (Supplementary Fig. 16). Briefly, the average normalized counts from the nucleosome size class were plotted for each peak set in 2-kb windows centered on ORGANIC sites, similarly to the analysis of DNase I-seq data above.

Conservation analysis. Previously published base-by-base phastCons conservation scores for budding yeast and D. melanogaster were obtained from the UCSC Genome Browser 32,59 . These scores represent the probability that a given base is in a conserved region, as defined by a phylogenetic hidden Markov model analysis of a 7-Saccharomyces-species multiple alignment 32 or a 15-insect-species multiple alignment. Note that conservation analysis at GAF and Psq sites is slightly complicated by the fact that these factors tend to bind GA repeats that are not easily aligned, particularly in intergenic regions; this may explain the relative depletion of conservation scores at binding sites, although the binding sites themselves are found in regions of local increases in conservation (Supplementary Fig. 15). The dependence of phastCons scores on the quality of the multiple alignment is well known 32 .
Yeast genomic context analysis. Annotations of genomic regions corresponding to genes, intergenic regions, telomeres and autonomously replicating sequences (ARSs) were obtained from the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/download-data/) 60 . Intergenic regions were defined as regions that did not correspond to annotated genes. Each Reb1 or Abf1 binding site detected by ORGANIC profiling, ChIP-exo or X-ChIP-chip was then assigned to genomic categories according to the coordinates specified in the SGD annotations. Note that some sites fell into multiple annotated areas (for example, some regions were annotated with both "Gene" and "ARS" labels, whereas others were both "Telomeres" and intergenic, as defined above). Sites with multiple annotations were therefore counted in each annotation class in Supplementary Figure 17.
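The multi-class counting rule can be sketched as below, assuming half-open (chrom, start, end, label) annotation intervals and single-base site positions; the label strings and the helper name count_site_annotations are hypothetical. A site overlapping several labels is counted once in each, and any site not overlapping a gene is also counted as intergenic, following the definition above.

```python
from collections import Counter


def count_site_annotations(sites, annotations):
    """Count binding sites per annotation class.

    sites: iterable of (chrom, pos), 0-based single-base positions.
    annotations: list of (chrom, start, end, label), half-open intervals.
    A site overlapping several labels is counted in each class; a site that
    overlaps no "Gene" interval is also counted as "Intergenic".
    """
    counts = Counter()
    for chrom, pos in sites:
        labels = {
            label
            for a_chrom, start, end, label in annotations
            if a_chrom == chrom and start <= pos < end
        }
        if "Gene" not in labels:
            labels.add("Intergenic")
        counts.update(labels)  # +1 for every class this site falls into
    return counts
```

The linear scan over annotations is for clarity only; at genome scale, an interval tree or BEDTools intersection would be the usual choice.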
Estimation of the number of Reb1 and Abf1 binding sites in the budding yeast genome. The number of Reb1 and Abf1 sites was estimated on the basis of the probability of finding a given k-mer in the yeast genome, with k = 7 for Reb1 and k = 8 for Abf1 (corresponding to the number of bases with high information content in the motifs bound by these transcription factors). As a first-order approximation of the number of binding sites expected by chance in genes, we estimated the amount of coding sequence in the budding yeast genome from the estimate that ~72% of the genome is genic 61 , which gives ~9 Mb of coding sequence. The probability of finding a given k-mer at a given position is 1/4^k, so the expected number of chance occurrences in coding sequence is 9 Mb × 1/4^k, giving a rough estimate of ~137 chance occurrences of the Abf1 motif and ~549 occurrences of the Reb1 motif in coding regions.
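The arithmetic can be checked directly. This sketch follows the formula above, which assumes uniform, independent base composition and counts occurrences on one strand; the function name is hypothetical.

```python
def expected_chance_occurrences(seq_len_bp, k):
    """Expected occurrences of one specific k-mer in seq_len_bp of sequence,
    assuming uniform, independent base composition (one strand)."""
    return seq_len_bp / 4 ** k


CODING_BP = 9e6  # ~9 Mb of coding sequence, as estimated above
abf1 = expected_chance_occurrences(CODING_BP, k=8)
reb1 = expected_chance_occurrences(CODING_BP, k=7)
print(f"Abf1: ~{abf1:.0f}, Reb1: ~{reb1:.0f}")
```

This prints "Abf1: ~137, Reb1: ~549".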
Estimation of the number of Reb1 binding sites in the D. melanogaster genome. We estimated the number of Reb1 sites in the D. melanogaster genome by calculating the effective genome size of tetraploid S2 cells (~170 Mb haploid × 4 = 680 Mb). We then calculated the expected occurrence of a five-base consensus sequence (corresponding to the number of high-information-content bases in the Reb1 motif) in S2 cells by dividing the effective genome size by 4^5, arriving at ~6 × 10^5 Reb1 sites. We then divided this by the approximate number of Reb1 sites found in the yeast genome by PSSM scoring (~2,000) to estimate the fold excess of Reb1 sites present in the Drosophila genome. We calculated a similar fold excess by scoring the D. melanogaster genome using the Reb1 ORGANIC 80 mM PSSM (Supplementary Table 2).
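The same arithmetic for S2 cells is sketched below, using only the values stated in the text (variable names are illustrative). The exact quotient is about 6.6 × 10^5, consistent with the rounded ~6 × 10^5 above, and implies roughly a 330-fold excess over the ~2,000 yeast Reb1 sites.

```python
HAPLOID_BP = 170e6        # ~170 Mb haploid D. melanogaster genome
PLOIDY = 4                # S2 cells treated as tetraploid, as in the text
CORE_BASES = 5            # high-information-content bases in the Reb1 motif
YEAST_REB1_SITES = 2000   # approximate PSSM-scored Reb1 sites in yeast

effective_bp = HAPLOID_BP * PLOIDY              # 680 Mb
expected_sites = effective_bp / 4 ** CORE_BASES
fold_excess = expected_sites / YEAST_REB1_SITES
print(f"expected sites: {expected_sites:.3g}, fold excess: {fold_excess:.0f}")
```

This prints "expected sites: 6.64e+05, fold excess: 332".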
Determination of GAF levels in ORGANIC and X-ChIP protocol fractions. ORGANIC profiling of GAF was performed as described above, and aliquots were taken from the following steps for western blotting analysis: whole-cell extract (after resuspending in PBS), supernatant from washing nuclei, nuclei after the wash step (nuclear extract), nuclei after MNase digestion at 37 °C for 10 min and chelation of calcium with 10 mM EDTA to stop the digestion, supernatants after washing these MNase-digested nuclei, the 80 mM input and the insoluble pellet after removal of input material. The pellet was easily resuspended in TM2 containing 2 M salt.