Small Angle X-ray Scattering Analysis of Clostridium thermocellum Cellulosome N-terminal Complexes Reveals a Highly Dynamic Structure*

Background: The cellulosome N terminus contains the sole substrate-binding module of the cellulosome scaffoldin. Results: Two N-terminal cellulosomal fragments are devoid of intermodular interactions, are highly dynamic, and inhabit compact and elongated conformations equally. Conclusion: The characteristics of the cellulosome N terminus may facilitate its role in substrate binding. Significance: Information on cellulosome structure and dynamics aids in engineering designer cellulosomes. Clostridium thermocellum produces the prototypical cellulosome, a large multienzyme complex that efficiently hydrolyzes plant cell wall polysaccharides into fermentable sugars. This ability has garnered great interest in its potential application in biofuel production. The core non-catalytic scaffoldin subunit, CipA, bears nine type I cohesin modules that interact with the type I dockerin modules of secreted hydrolytic enzymes and promotes catalytic synergy. Because the large size and flexibility of the cellulosome preclude structural determination by traditional means, the structural basis of this synergy remains unclear. Small angle x-ray scattering has been successfully applied to the study of flexible proteins. Here, we used small angle x-ray scattering to determine the solution structure and to analyze the conformational flexibility of two overlapping N-terminal cellulosomal scaffoldin fragments comprising two type I cohesin modules and the cellulose-specific carbohydrate-binding module from CipA in complex with Cel8A cellulases. The pair distribution functions, ab initio envelopes, and rigid body models generated for these two complexes reveal extended structures. These two N-terminal cellulosomal fragments are highly dynamic and display no preference for extended or compact conformations. Overall, our work reveals structural and dynamic features of the N terminus of the CipA scaffoldin that may aid in cellulosome substrate recognition and binding.

Cellulose is the major constituent of plant cell walls and the most abundant organic molecule on Earth (1). With increasing energy consumption, the depletion of fossil fuels, and growing environmental concerns, development of alternative fuel sources is paramount. Conversion of plant biomass to ethanol represents a renewable and environmentally friendly alternative to fossil fuels. The critical bottleneck that prevents bioethanol from becoming a competitive energy source is the low efficiency of plant cell wall polysaccharide hydrolysis (2-4). Success in engineering "designer" hydrolases that can effectively break down cellulosic biomass has been limited; however, selected microbial species have been discovered that can rapidly and efficiently degrade plant cell wall polysaccharides through the synergistic activity of a variety of cellulases, hemicellulases, and other hydrolases all assembled in large protein complexes called cellulosomes (5-9).
The best studied cellulosome is from the thermophilic anaerobe Clostridium thermocellum and is composed of three main components: a multimodular non-catalytic scaffoldin subunit called CipA, catalytic subunits of varying activities that bear a type I dockerin (DocI), and one of the three cell surface-associated proteins that contain one, two, or seven type II cohesin (CohII) modules, i.e. SdbA, Orf2p, and OlpB, respectively (see Fig. 1) (5, 10-13). The CipA scaffoldin subunit comprises a cellulose-specific carbohydrate-binding module (CBM) that targets the multienzyme complex to its substrate, an X module of unknown function, a DocII module responsible for anchoring the scaffoldin to the cell-surface subunit via interaction with its cognate CohII module, and nine CohI modules, all of which are connected by linker regions of varying lengths. The latter cohesins mediate the assembly of the catalytic subunits into the complex via their respective DocI modules (14-19).
Although many structures of individual modules have now been solved using x-ray crystallography and NMR spectroscopy (19-25), very little is known about the arrangement of these modules in three-dimensional space that ultimately leads to the characteristic synergy among the cellulosomal enzyme components. Small angle x-ray scattering (SAXS) has been used to investigate how linker length and composition affect the modular orientation in the context of engineered chimeric minicellulosome-like complexes (26). This work has provided valuable insight into the behavior of engineered scaffoldins; however, it is unclear how well these complexes represent the behavior of natural scaffoldins. Early EM studies revealed the dynamic nature of the cellulosome ultrastructure (27, 28). In the absence of substrate, the cellulosome forms bulbous protuberances on the surface of the cell. However, in the presence of a crystalline cellulose substrate, the cellulosome forms fibrous structures that adhere to the surface of the cellulose. More recently, single-particle cryo-EM was performed on a fragment of CipA comprising CohI modules 3-5 in tandem (CohI3-CohI4-CohI5), each with bound DocI-bearing Cel8A catalytic components. The data revealed a compact structure, with the enzymes projecting outward in opposite directions (29).
In this study, we investigated the structure and flexibility of two overlapping N-terminal fragments of the cellulosome from C. thermocellum using SAXS. These fragments included the same Cel8A enzyme used in the previous cryo-EM work (29) together with two overlapping CipA scaffoldin fragments: CohI1-CohI2 and CohI2-CBM-CohI3 (Fig. 1). We present the best fit structures determined by ab initio and rigid body methods. We show that a minimal ensemble of conformers fits the data better than a single structure, which indicates that both complexes are flexible. Furthermore, the minimal ensemble analysis depicts equal populations of compact and elongated conformers for both complexes. The structural analysis presented here reveals a high degree of structural dynamics in the N terminus of the cellulosome, which may facilitate substrate recognition and binding by allowing greater access to the CBM of CipA.
Plasmids were transformed into Tuner™ cells (Invitrogen) and grown in LB broth supplemented with 50 µg/ml kanamycin. Cultures were induced with 0.2 mM isopropyl β-D-thiogalactopyranoside at A600 = 0.6 and grown for an additional 16 h at 19°C. Cells were harvested by centrifugation for 20 min at 4200 rpm using a Sorvall H-6000A/HBB-6 rotor and then resuspended in Buffer A (20 mM Tris-HCl (pH 8.0), 300 mM NaCl, 10 mM imidazole, and 1 mM CaCl2). Lysozyme (1 mg/ml) was added to the suspension and incubated at 4°C with stirring for 30 min, followed by sonication. The lysate was cleared by centrifugation at 16,000 rpm using a JA-20 rotor. Ni2+ affinity resin (2 ml) pre-equilibrated with Buffer A was added to the supernatant and incubated at 4°C with rocking for 1 h. The mixture was subsequently applied to a column containing the affinity resin, washed with Buffer A, and eluted with Buffer A containing 400 mM imidazole. The eluted protein was further purified over a Sephadex 200 size exclusion column (Amersham Biosciences) that was pre-equilibrated with 50 mM HEPES (pH 7.5), 50 mM NaCl, 1 mM CaCl2, and 1 mM DTT. Fractions containing protein were analyzed by SDS-PAGE. Purified complexes were pooled and concentrated.
Dynamic Light Scattering Analysis-Prior to dynamic light scattering (DLS) measurements, samples were centrifuged at maximal speed using a tabletop microcentrifuge for 30 min at 4°C. DLS experiments were performed with a DynaPro instrument (Protein Solutions), and the data were analyzed using the accompanying Dynamics version 5.25.44 software package.
FIGURE 1. Schematic representation of the cellulosome from C. thermocellum. The cellulosome is composed of three types of proteins: 1) a cell surface-anchoring protein, 2) the CipA scaffoldin protein, and 3) enzyme subunits. Cell-surface proteins are composed of CohII modules (white C) and surface-like homology (SLH) domains, which interact with the cell surface. CipA is composed of a DocII module (white D), an X module (X), a family 3a cellulose-specific binding module (the CBM), and nine CohI modules (black C1-9) enumerated from the N terminus, which is on the left side in this schematic. Enzyme subunits contain a DocI module (black D) and a catalytic module (black E). The (CohI1-CohI2)·2Cel8A and (CohI2-CBM-CohI3)·2Cel8A complexes and where they map in the full-length cellulosome are shown in the solid and dashed boxes, respectively.
SAXS Data Collection-SAXS data were collected at the F2 station at the Cornell High Energy Synchrotron Source (Ithaca, NY) using an ADSC Quantum-210 CCD detector. A sample (6.0 mg/ml) of bovine serum albumin was measured as a reference and for calibration. Samples (25 µl) were oscillated back and forth in the capillary throughout the data collection to minimize sample damage during each exposure. Scattering patterns were measured with a total exposure time of 3 min at 21°C. The wavelength was 1.25480 Å with a sample-to-detector distance of 800 mm. The scattering vectors (q) covered 0.01-0.4 Å⁻¹. The concentration of the protein samples ranged from 5 to 16 mg/ml for (CohI1-CohI2)·2Cel8A and from 1 to 3 mg/ml for (CohI2-CBM-CohI3)·2Cel8A. Samples were measured twice with background scattering measurements of the final column flow-through from the protein purification taken both before and after each protein sample to monitor any beam fluctuations or damage to the protein sample. Background scattering was subtracted from the protein scattering patterns after proper normalization and correction for detector response.
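The background subtraction described above can be illustrated with a minimal sketch. The normalization factors (for example, a transmitted beam intensity), the function names, and the drift check are assumptions made for illustration only; they are not taken from the beamline software or from the procedure actually used in this work.

```python
import numpy as np

def subtract_buffer(sample, buffer_before, buffer_after,
                    sample_norm=1.0, buffer_norm=1.0):
    """Subtract the mean of two buffer curves from a sample curve.

    sample_norm and buffer_norm are hypothetical normalization factors
    (e.g. transmitted beam intensity); all curves share the same q grid.
    """
    sample = np.asarray(sample, dtype=float) / sample_norm
    before = np.asarray(buffer_before, dtype=float) / buffer_norm
    after = np.asarray(buffer_after, dtype=float) / buffer_norm
    return sample - 0.5 * (before + after)

def buffer_drift(buffer_before, buffer_after):
    """Largest relative change between the two buffer measurements."""
    before = np.asarray(buffer_before, dtype=float)
    after = np.asarray(buffer_after, dtype=float)
    return float(np.max(np.abs(after - before) / np.abs(before)))
```

A large value returned by `buffer_drift` would flag beam fluctuation between the buffer measurements taken before and after a sample.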
SAXS Data Analysis-The radii of gyration (Rg) were derived from the Guinier approximation: I(q) = I(0) exp(−q²Rg²/3), where I(q) is the scattering intensity and I(0) is the forward scattering intensity (30). Rg and I(0) were determined from the slope and intercept, respectively, of the linear fit of ln(I(q)) versus q² in the q range q·Rg < 1.3. All scattering curves were indicative of monomeric states of the molecules in solution. The distance distribution function (P(r)) was calculated by Fourier inversion of the scattering intensity I(q) using GNOM (34). The P(r) function was also used to calculate the Rg, taking into account the whole data collected. The molecular masses of both complexes were determined as described previously (31).
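The Guinier analysis can be sketched as follows. The linear form ln I(q) = ln I(0) − (Rg²/3)q² gives Rg from the slope and I(0) from the intercept. The iterative choice of the fitting window (refit until the set of points with q·Rg < 1.3 stops changing) and the function name are assumptions for illustration; the original analysis may have selected the window differently.

```python
import numpy as np

def guinier_fit(q, intensity, qrg_max=1.3, max_iter=100):
    """Estimate Rg and I(0) from a linear fit of ln I(q) versus q^2.

    The window is refined until all fitted points satisfy q*Rg < qrg_max.
    """
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    positive = intensity > 0          # the logarithm needs I(q) > 0
    mask = positive.copy()
    for _ in range(max_iter):
        if mask.sum() < 3:
            raise ValueError("too few points in the Guinier region")
        slope, intercept = np.polyfit(q[mask] ** 2,
                                      np.log(intensity[mask]), 1)
        if slope >= 0:
            raise ValueError("non-negative Guinier slope")
        rg = np.sqrt(-3.0 * slope)    # slope = -Rg^2 / 3
        new_mask = positive & (q * rg < qrg_max)
        if np.array_equal(new_mask, mask):
            break
        mask = new_mask
    return rg, float(np.exp(intercept))
```

For real data, curvature of the ln I(q) versus q² plot at the lowest angles would indicate aggregation or interparticle effects and should be inspected before trusting the fit.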
SAXS-based Modeling Procedures-The low-resolution shapes of the protein constructs were determined ab initio from the scattering curve using the programs DAMMIN and GASBOR (32, 33). Rigid body modeling and minimal ensemble generation were performed with BILBOMD (34). Initial models of the (CohI1-CohI2)·2Cel8A and (CohI2-CBM-CohI3)·2Cel8A complexes for BILBOMD analysis were generated with PyMOL using the structures with Protein Data Bank codes 1ANU, 1NBC, 1OHZ, and 1CEM and Phyre-generated homology models for CohI1, CohI3, and the Cel8A DocI module (15, 19, 24, 35-37). Intermodular linkers were built in PyMOL (36) and defined as flexible in BILBOMD analysis. Individual modules and CohI-DocI interactions were defined as rigid and therefore maintained throughout the analysis. Figures for the ab initio, rigid body, and BILBOMD conformer ensemble models were generated using PyMOL (36). SAXS data are summarized in Table 1.

RESULTS
Protein Sample Preparation-Two N-terminal fragments of the CipA scaffoldin subunit (CohI1-CohI2 and CohI2-CBM-CohI3) were coexpressed and co-purified with the Cel8A cellulase bearing S458A/S459A mutations in its DocI sequence. These mutations allowed the incorporation of Cel8A onto the CipA CohI-containing fragments in a single DocI orientation, thereby ensuring the homogeneity of complex protein samples for SAXS analysis (14). Typical yields for the (CohI1-CohI2)·2Cel8A and (CohI2-CBM-CohI3)·2Cel8A complexes were 20 and 7 mg/liter of Escherichia coli culture, respectively. The purity was judged to be ~95% based on SDS-PAGE.
Analysis of Raw SAXS Data-SAXS data were recorded for both complexes at a range of different concentrations: 5-16 and 1-3 mg/ml for (CohI1-CohI2)·2Cel8A and (CohI2-CBM-CohI3)·2Cel8A, respectively (Fig. 2A). All SAXS-determined parameters are summarized in Table 1. The Rg determined from Guinier plots ranged from 60.3 to 62.6 for the (CohI1-CohI2)·2Cel8A complex and from 63.3 to 63.8 for the (CohI2-CBM-CohI3)·2Cel8A complex, which indicates that Rg is independent of concentration for the concentration ranges measured for both complexes. Similarly, Rg calculated using the program GNOM also displayed a narrow range: 61.1-62.6 for the (CohI1-CohI2)·2Cel8A complex and 63.3-63.7 for the (CohI2-CBM-CohI3)·2Cel8A complex, which also supports the concentration-independent behavior of Rg for the two complexes studied. Moreover, the Guinier-derived Rg is in good agreement with that determined using GNOM for both complexes. The molecular masses of both complexes were calculated using the method developed by Fischer et al. (31). The SAXS-determined molecular masses ranged from 140 to 154 kDa for the (CohI1-CohI2)·2Cel8A complex and from 147 to 155 kDa for the (CohI2-CBM-CohI3)·2Cel8A complex, consistent with the monomeric molecular mass of each complex. Furthermore, the pair distribution functions (P(r)), also calculated using GNOM, are skewed to the right for both complexes, which is indicative of elongated molecules (Fig. 2C).
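The real-space estimate of Rg that is compared with the Guinier value above follows from the standard relation Rg² = ∫r²P(r)dr / (2∫P(r)dr). The sketch below evaluates it by trapezoidal integration on a tabulated P(r); the function names are illustrative, and GNOM's own implementation is not reproduced here.

```python
import numpy as np

def _trapezoid(y, x):
    """Trapezoidal integral of y over x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def rg_from_pr(r, pr):
    """Radius of gyration from a pair distribution function P(r)."""
    r = np.asarray(r, dtype=float)
    pr = np.asarray(pr, dtype=float)
    second_moment = _trapezoid(r ** 2 * pr, r)
    norm = _trapezoid(pr, r)
    return float(np.sqrt(second_moment / (2.0 * norm)))
```

As a check on the relation, the analytical P(r) of a uniform sphere of radius R, proportional to r²(1 − 3r/(4R) + r³/(16R³)) for 0 ≤ r ≤ 2R, returns Rg = √(3/5)·R.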
Ab Initio Modeling of Cellulosomal Fragments-The overall shapes of both the (CohI1-CohI2)·2Cel8A and (CohI2-CBM-CohI3)·2Cel8A complexes were calculated using DAMMIN and GASBOR (32, 33). Six envelopes were generated for each complex using both algorithms (supplemental Figs. S1-S4). The best fit envelope in each case, selected as the one with the lowest χ² value against the experimental curve, displays an elongated conformation consistent with the respective P(r) function (Figs. 3 and 4). Each of the envelopes contains globular regions that correspond to the GH8 catalytic subunits, the CohI-DocI pairs, and, in the case of the (CohI2-CBM-CohI3)·2Cel8A complex, the family 3a CBM. Narrower regions represent the intermodular linker regions. In addition, the GASBOR best fit structure for the (CohI2-CBM-CohI3)·2Cel8A complex contains small extensions protruding from one of the terminal domains. These extensions are not seen in the DAMMIN-derived structures for this complex and are not present in all of the GASBOR-derived structures. The crystal structures of the GH8 catalytic module, the family 3a CBM, and the CohI·DocI complexes were manually placed within the best fit envelopes to

Rigid Body Modeling of Cellulosomal Fragments-Although x-ray crystal structures are not available for intact (CohI1-CohI2)·2Cel8A and (CohI2-CBM-CohI3)·2Cel8A complexes, crystal structures have been determined for CohI2, CBM, and the Cel8A catalytic module (14, 15, 19, 35). Moreover, the high degree of sequence similarity (>85%) shared between the CohI-DocI-interacting modules in these complexes and those that have previously been solved allowed us to model these with confidence using structural homology modeling techniques. Given the wealth of structural detail available, we decided to combine this with our solution scattering data to provide a better picture of the modular spatial orientation in solution, similar to the method previously used for engineered minicellulosome chimeric complexes (38, 39). BILBOMD is a program designed to investigate flexibility in proteins by building a pool of conformers sampling the conformational space available to the molecule of interest and then selecting those conformations that best fit the experimental solution scattering data (34). Intermodular linkers were built in PyMOL (36) and defined as flexible in subsequent analysis with BILBOMD. Individual modules and CohI-DocI interactions were defined as rigid and therefore maintained throughout the analysis. Large Rg ranges, 43-83 for the (CohI1-CohI2)·2Cel8A complex and 51-91 for the (CohI2-CBM-CohI3)·2Cel8A complex, were initially selected to thoroughly explore all conformations that were available. However, in both cases, BILBOMD returned conformers from a narrower Rg range, 49-79 for the (CohI1-CohI2)·2Cel8A complex and 47-80 for the (CohI2-CBM-CohI3)·2Cel8A complex, which suggests a physical restraint on the size of both complexes in solution. The best fit conformers for the (CohI1-CohI2)·2Cel8A and (CohI2-CBM-CohI3)·2Cel8A complexes had Rg values of 55 and 56 and fit the experimental data with χ² values of 2.3 and 3.2, respectively (Fig. 5).
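The idea of selecting conformers from a pool by their fit to the scattering data can be illustrated with a simplified sketch. It assumes that a theoretical scattering profile has already been computed for every conformer on the experimental q grid, defines χ² as the mean squared error-weighted residual, and searches single conformers and pairs exhaustively with least-squares weights. This is a stand-in for illustration only and does not reproduce the search algorithm or the exact χ² definition used by BILBOMD; all function names are hypothetical.

```python
import itertools
import numpy as np

def chi2(i_exp, sigma, i_model):
    """Mean squared residual weighted by the experimental errors."""
    return float(np.mean(((i_exp - i_model) / sigma) ** 2))

def best_single(i_exp, sigma, profiles):
    """Best single conformer, each profile scaled by least squares."""
    best_index, best_chi2 = None, np.inf
    for k, p in enumerate(profiles):
        scale = np.sum(i_exp * p / sigma ** 2) / np.sum(p ** 2 / sigma ** 2)
        c = chi2(i_exp, sigma, scale * p)
        if c < best_chi2:
            best_index, best_chi2 = k, c
    return best_index, best_chi2

def best_pair(i_exp, sigma, profiles):
    """Best two-conformer mixture; returns (pair, fractions, chi2)."""
    best = (None, None, np.inf)
    for a, b in itertools.combinations(range(len(profiles)), 2):
        basis = np.stack([profiles[a], profiles[b]], axis=1) / sigma[:, None]
        w, *_ = np.linalg.lstsq(basis, i_exp / sigma, rcond=None)
        if np.any(w < 0):
            continue  # a negative population is unphysical; skip this pair
        c = chi2(i_exp, sigma, w[0] * profiles[a] + w[1] * profiles[b])
        if c < best[2]:
            best = ((a, b), w / w.sum(), c)
    return best
```

In this scheme, a pair χ² that is clearly lower than the best single-conformer χ² is the kind of evidence that points to flexibility, and the normalized weights give the relative populations of the selected conformers.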

DISCUSSION
The cellulosome is the paradigm of bacterial cellulose degradation. However, despite the wealth of structural information available for well structured cellulosome modules (19-25, 35), the mechanism of its synergy remains enigmatic. In this study, we employed SAXS to build on the limited knowledge of the modular architecture and dynamics of the cellulosome from C. thermocellum. We uncovered structural features of the N terminus of the cellulosome through the study of the (CohI1-CohI2)·2Cel8A and (CohI2-CBM-CohI3)·2Cel8A complexes. We found that both complexes are highly dynamic, lack interscaffoldin interactions, and inhabit various compact and extended conformations without preference. The N-terminal portion of the CipA scaffoldin subunit bears the sole substrate recognition module. Therefore, this behavior may be attributed to its unique role within the complex. A more dynamic and open structure would allow greater access to substrate.
The first SAXS study of the cellulosome explored the solution structure of the Clostridium cellulolyticum enzyme Cel48F both alone and bound to a CohI module (38). These experiments were performed with both the native DocI and CohI modules from C. cellulolyticum and the Cel48F enzyme appended to a C. thermocellum DocI module both alone and bound to a C. thermocellum CohI module. In each case, the authors observed more compact and less flexible structures when the enzyme was in complex with a CohI module. Interestingly, we observed Cel8A catalytic modules at variable distances from their cognate DocI modules when bound to CohI modules in both the ab initio envelopes and rigid body models. It should be taken into account that the linker segment of C. cellulolyticum Cel48F is relatively short (~6 residues) versus that of C. thermocellum Cel8A (~24 residues). In any event, the excellent agreement between the two independent models likely reflects the dynamic nature of the linker in order for the catalytic module to maximize its ability to reach its substrate. Furthermore, our ensemble analysis confirmed that our observations in the static ab initio and rigid body models are the result of flexibility in the enzyme-borne linkers.
SAXS has also been used to investigate the dynamics of inter-CohI linkers in the context of chimeric scaffoldins composed of a CohI module from C. thermocellum and another from C. cellulolyticum joined by linkers of various lengths and compositions (26). Few or no intermodular contacts and dynamic intercohesin linkers in the enzyme-free state were reported. Our work has expanded on this study by evaluating the dynamics not only in two additional types of cellulosomal linkers, CohI-CBM and DocI-GH8, but also in intact natural minicellulosome complexes. Species-specific CohI-DocI interactions are well documented in natural cellulosome systems (e.g. C. cellulolyticum CohI does not interact with C. thermocellum DocI), and recent work suggests that weak intermodular interactions may also play a role in higher order cellulosome structure (40, 41). However, due to the chimeric nature of the scaffoldins analyzed in previous SAXS studies (38, 39), any species-specific interactions would be lost. The results from our work indicate that the N terminus of the C. thermocellum cellulosome is devoid of intermodular interactions, thus ensuring maximal flexibility and, as a result, greater accessibility of the family 3a CBM to its substrate.
Structural studies of fragments of the native C. thermocellum cellulosome suggest that the cellulosome, although flexible, may prefer certain modular arrangements. Cryo-EM analysis of (CohI3-CohI4-CohI5)·3Cel8A revealed a compact structure, with enzymes projected outward in opposite directions (29). Although the intermodular linkers accommodated dynamic elongated conformations as well, these were a minor component of the conformer pool. By contrast, we have demonstrated that both the (CohI1-CohI2)·2Cel8A and (CohI2-CBM-CohI3)·2Cel8A complexes exhibit no preference for compact or elongated conformations. However, due to differences in sample preparation and structural methods, our findings cannot be directly compared with the cryo-EM analysis of the (CohI3-CohI4-CohI5)·3Cel8A complex. As a result, more work will be needed to address the possibility of dynamic differences along the CipA scaffoldin.
Interestingly, the high degree of dynamics observed in the complexes studied here indicates that the DocI orientation would have little to no effect on enzyme position in the N terminus of the cellulosome. Therefore, the DocI dual binding mode likely contributes little to the function of the N terminus of the cellulosome. However, regions where more compact conformations predominate may benefit more from precise enzyme orientations.
The work described here has identified key structural and dynamic attributes of the cellulosome N terminus. These characteristics potentially contribute to the function of this distinct region of the cellulosome. Although more work is required to delineate the significance of these conformational preferences in cellulosome function, these findings provide the foundation for optimizing the next generation of designer cellulosomes.