Crystal Structure of an Insect Antifreeze Protein and Its Implications for Ice Binding

Background: Antifreeze proteins bind to ice crystals and inhibit their growth.
Results: The crystal structure of a potent beetle antifreeze protein was determined by direct methods.
Conclusion: Ordered crystallographic waters on the protein surface match several planes of hexagonal ice.
Significance: The structure is the largest determined ab initio without heavy atoms, and its ordered waters suggest a molecular basis for ice binding.

Antifreeze proteins (AFPs) help some organisms resist freezing by binding to ice crystals and inhibiting their growth. The molecular basis for how these proteins recognize and bind ice is not well understood. The longhorn beetle Rhagium inquisitor can supercool to below −25 °C, in part by synthesizing the most potent antifreeze protein studied thus far (RiAFP). We report the crystal structure of the 13-kDa RiAFP, determined at 1.21 Å resolution using direct methods. The structure, which contains 1,914 nonhydrogen protein atoms in the asymmetric unit, is the largest determined ab initio without heavy atoms. It reveals a compressed β-solenoid fold in which the top and bottom sheets are held together by a silk-like interdigitation of short side chains. RiAFP is perhaps the most regular structure yet observed. It is a second independently evolved AFP type in beetles. The two beetle AFPs have in common an extremely flat ice-binding surface comprising regular outward-projecting parallel arrays of threonine residues. The more active, wider RiAFP has four (rather than two) of these arrays between which the crystal structure shows the presence of ice-like waters. Molecular dynamics simulations independently reproduce the locations of these ordered crystallographic waters and predict additional waters that together provide an extensive view of the AFP interaction with ice. By matching several planes of hexagonal ice, these waters may help freeze the AFP to the ice surface, thus providing the molecular basis of ice binding.

Antifreeze proteins (AFPs) enable the survival of many organisms that inhabit subzero environments, from vertebrates to bacteria (1). It is generally accepted that AFPs adsorb to the surface of nascent ice crystals. Upon adsorption to ice, AFPs block water molecules from accessing the ice surface at the bound location. The ice front thus becomes convex toward the solution between the surface-bound AFPs, which is energetically unfavorable for ice growth by the Gibbs-Thomson-Herring effect (2-4). This results in thermal hysteresis, or the noncolligative depression of the freezing point of the solution without altering its melting temperature. Although the adsorption-inhibition mechanism for ice growth inhibition has been generally accepted for >30 years, the mechanism by which AFPs recognize and bind ice has gone through several radical revisions without a consensus emerging. Recent simulations reveal that in contrast to other inorganic surfaces to which proteins may bind, an ice surface is disordered at the molecular level, with a 1-2-nm transition region separating ice from the surrounding bulk water solution (5).
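The curvature argument above can be made semiquantitative with the Gibbs-Thomson relation for a spherical ice cap of radius r, ΔT = 2γT_m/(ρLr). The following is a minimal sketch under stated assumptions: the ice-water interfacial energy (0.03 J/m²) is an assumed round value, the density and latent heat are textbook values for ice, and the function name is hypothetical. None of these numbers are taken from this work.

```python
# Gibbs-Thomson estimate of the freezing point depression at a curved ice front.
# Assumed constants (not from the paper): GAMMA is an approximate ice-water
# interfacial energy; RHO_ICE and LATENT are textbook values for ice.

T_M = 273.15      # K, equilibrium melting point of ice
GAMMA = 0.03      # J/m^2, assumed ice-water interfacial energy
RHO_ICE = 917.0   # kg/m^3, density of ice
LATENT = 3.34e5   # J/kg, latent heat of fusion


def freezing_point_depression(radius_m: float) -> float:
    """Return delta T (K) for a spherical ice cap with the given radius (m)."""
    if radius_m <= 0:
        raise ValueError("radius must be positive")
    return 2.0 * GAMMA * T_M / (RHO_ICE * LATENT * radius_m)


if __name__ == "__main__":
    # Smaller radii (more closely spaced bound AFPs) give larger depressions.
    for r_nm in (5, 10, 20, 50):
        dt = freezing_point_depression(r_nm * 1e-9)
        print(f"r = {r_nm:>2} nm -> delta T ~ {dt:.2f} K")
```

Because ΔT scales as 1/r, halving the radius of curvature of the ice front between bound AFPs doubles the estimated depression of the freezing point.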
The accepted structure of ordinary hexagonal ice consists of sheets of tessellating hexagonal rings composed of water molecules (6) with the angle between hydrogen bonds in the crystal lattice approximating the tetrahedral angle of 109.5°. By slicing this crystalline lattice at different angles, various two-dimensional ice surfaces are exposed that represent different planes of ice. It has been proposed recently that the ice-binding residues of antifreeze proteins might organize waters into an ice-like configuration that matches and merges with the quasiliquid layer at the ice/water interface (7,8). The ordered waters match specific ice planes, facilitating interaction between the AFP and ice and thereby influencing ice crystal morphology. Molecular dynamics and a solution NMR study reveal that these organized waters are highly mobile, exchanging with bulk water on a subnanosecond time scale (9).
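As a small worked check of the tetrahedral angle quoted above, the angle between any two bonds pointing from the center of a regular tetrahedron to its vertices is arccos(−1/3). The sketch below computes it from alternating cube corners; the helper name and the choice of vertices are illustrative assumptions.

```python
import math

# Four alternating corners of a cube point to the vertices of a regular
# tetrahedron centered on the origin (an illustrative construction).
VERTICES = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]


def angle_between(u, v):
    """Return the angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(dot / (norm_u * norm_v)))


if __name__ == "__main__":
    print(f"{angle_between(VERTICES[0], VERTICES[1]):.2f} degrees")
```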
Crystal structures have been determined for six different types of AFPs from freeze-avoiding organisms: three from fish (10-13) and three from insects (14-16). Despite their common function, AFPs display remarkable diversity in their tertiary architectures. The richness in structural folds found in AFPs represents an exceptional example of radically different proteins serving the same function through convergent evolution. Although the most active (hyperactive) AFPs thus far are found in insects, they have proven difficult to produce in sufficient quantities for crystallographic determination (17). The difficulty of recombinant expression has also limited the use of the hyperactive AFPs as cryoprotective agents for applications outside of basic science (18).
The longhorn beetle, Rhagium inquisitor, survives harsh winter temperatures in Siberia and can supercool to below −25 °C (19). The ability of the beetle to withstand these temperatures is due in part to the presence of an AFP in the hemolymph. The only AFP identified in R. inquisitor, RiAFP, is a 13-kDa protein with one of the highest antifreeze activities measured for any AFP (20). Its amino acid sequence compared with those of other insect hyperactive AFPs from Choristoneura fumiferana (spruce budworm, CfAFP), Dendroides canadensis (fire-colored beetle, DcAFP), Hypogastrura harveyi (snow flea, sfAFP), and Tenebrio molitor (yellow mealworm, TmAFP) (14-16, 21, 22) suggests that RiAFP is a distinct type of AFP with a novel fold. Because the mechanism of ice recognition by antifreeze proteins is not well understood, a structural study of RiAFP could provide new insights into the molecular basis for antifreeze activity as well as the adaptations that proteins have acquired to fold and function at temperatures as low as −25 °C. Structural comparisons of RiAFP with other insect AFPs may also provide important molecular insights into their convergent evolution and the basis for antifreeze hyperactivity.
Here, we report the crystal structure of RiAFP determined at 1.21 Å resolution by ab initio direct methods. Direct methods provide an alternative for macromolecular crystal structure determination at atomic resolution (beyond 1.2 Å) in the absence of experimental phases or an adequate molecular replacement search model. However, proteins with >1,000 nonhydrogen atoms have been recalcitrant to structure determination by direct methods, especially without anomalous scatterers in the crystal (23). The RiAFP structure presented here is the largest determined ab initio without heavy atoms.
Our structure has a square prism shape and reveals a novel β-solenoid architecture with an ice-binding surface (IBS) containing a remarkably regular and extensive array of threonine residues. The IBS is exceptionally flat and holds crystallographic waters in a pattern that matches the primary prism plane of hexagonal ice. Based on simulations, these waters occupy natural positions and help recruit other waters, some of which match the basal plane. A fluorescence-based ice etching experiment reveals that GFP-tagged RiAFP can bind the entire surface of a single crystal hemisphere, suggesting affinity to basal, primary prism and additional planes of ice.

EXPERIMENTAL PROCEDURES
Structure Determination of RiAFP-Cloning, expression, purification, and crystallization of RiAFP were reported previously (24). Diffraction data for RiAFP were collected using a 1.1 Å synchrotron radiation source at Brookhaven National Laboratory (beamline X25) (24) and processed to 1.147 Å resolution with XDS (25). Data collected on a 1.54 Å copper anode x-ray source were not of sufficient quality for structure determination ab initio or by more conventional methods. Molecular replacement using other insect antifreeze protein crystal structures as search models did not yield any obvious solutions. The high resolution diffraction (1.15 Å resolution, albeit with only 8.3% completeness in the highest resolution shell, see Table 1) prompted us to consider ab initio methods. Initial phases were obtained by applying the charge flipping program SUPERFLIP, which is mostly used for small molecule crystal structure determination (26,27). The resulting electron density map was input into PEAKMAX in the CCP4 (28) suite for peak picking. The all-water "dummy atom" model produced by PEAKMAX was then used as the starting model for ACORN (29). The observed data were artificially extended to 1.0 Å with anisotropy corrections in ACORN. The phases derived from ACORN were input into Arp/wArp (30) for automatic model building. The β-strands built from Arp/wArp runs (Fig. 1A) were used as a molecular replacement search model in PHENIX (31). Iterative cycles of model building in COOT (32) and refinement in REFMAC (33) were performed. Notably, even with most of the structure built (20 of the total 26 β-strands, Fig. 1B), electron density for the surrounding loops was weak or absent, prompting us to reexamine the data and reprocess it in space group P3₁ and P1. Molecular replacement in P3₁ followed by automated model building with Arp/wArp yielded a nearly complete structure. The structure was ultimately refined in space group P3₁21 with R_work = 12.76% and R_free = 15.10%. The six C-terminal histidine residues from the affinity purification tag in chain A were clearly visible in the electron density map. However, residues 90-91 and the polyhistidine tag in chain B were disordered but with some residual electron density in the area. There are two RiAFP molecules in the asymmetric unit with a solvent content of ~42.8%. Complete crystallographic and refinement statistics are listed in Table 1.
Table 1 footnotes: ... Friedel-mates. e, $R_{\mathrm{work}} = \sum_{hkl} \bigl\lvert\, \lvert F_{\mathrm{obs}}(hkl)\rvert - \lvert F_{\mathrm{calc}}(hkl)\rvert \,\bigr\rvert / \sum_{hkl} \lvert F_{\mathrm{obs}}(hkl)\rvert$, standard crystallographic R-factor. f, $R_{\mathrm{free}}$ is the cross-validation R-factor for ~5% of the total unique reflections that were randomly selected. g, r.m.s.d., root mean square deviation to ideal values.
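The R-factors quoted for the refinement follow the standard definition, R = Σ||F_obs| − |F_calc|| / Σ|F_obs|, with R_free evaluated on a randomly chosen subset of about 5% of reflections that is held out of refinement. The sketch below illustrates only that arithmetic; the function names, the seed, and the toy amplitudes are hypothetical and are not part of the REFMAC workflow used here.

```python
import random


def r_factor(f_obs, f_calc):
    """Standard crystallographic R-factor over paired structure factor amplitudes."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("f_obs and f_calc must be non-empty and of equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def split_free_set(n_reflections, fraction=0.05, seed=0):
    """Randomly assign a fraction of reflection indices to the free (test) set."""
    rng = random.Random(seed)
    n_free = max(1, round(fraction * n_reflections))
    free = set(rng.sample(range(n_reflections), n_free))
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)


if __name__ == "__main__":
    # Toy amplitudes, for illustration only.
    f_obs = [100.0, 50.0, 25.0]
    f_calc = [90.0, 55.0, 25.0]
    print(f"R = {r_factor(f_obs, f_calc):.4f}")
```

In practice R_work would be computed with `r_factor` over the working indices and R_free over the held-out indices, so that R_free is not biased by the refinement itself.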
Thermal Hysteresis Experiments and Fluorescence-based Ice Plane Affinity Analysis-Fluorescence-based ice plane affinity analysis and thermal hysteresis measurements were performed as described in (7).
Multiangle Light Scattering-Purified RiAFP (~10 mg/ml) and GFP-RiAFP (5 mg/ml) were injected onto a Superdex 75 (10/300) column (GE Healthcare) equilibrated in 20 mM Tris-HCl, pH 7.5, 0.1 M NaCl, coupled to a DAWN EOS spectrometer and OPTILAB DSP interferometric refractometer (Wyatt Technology) at 25 °C. Peaks were detected as they eluted off the column with a UV detector at 280 nm, a light scattering detector at 690 nm, and a refractive index detector. The molar mass of the protein sample was determined from the Debye plot of light scattering intensity versus scattering angle. Data processing was performed with ASTRA software (Wyatt Technology Corp.).
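The molar mass estimate from a Debye plot can be sketched as a zero-angle extrapolation. The example assumes the Debye formalism, in which R(θ)/(Kc) is plotted against sin²(θ/2) and the intercept approximates the molar mass when the second virial coefficient term is negligible. The function name and the synthetic data are hypothetical, and this is not the ASTRA implementation.

```python
import math


def molar_mass_from_debye(angles_deg, rayleigh_over_kc):
    """Fit R(theta)/(K*c) against sin^2(theta/2) by linear least squares.

    Returns (intercept, slope); the intercept at zero angle approximates
    the molar mass when concentration effects are neglected.
    """
    xs = [math.sin(math.radians(a) / 2.0) ** 2 for a in angles_deg]
    ys = list(rayleigh_over_kc)
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired measurements")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    # Synthetic data for a hypothetical 13,000 g/mol monomer with weak angular dependence.
    angles = [30.0, 60.0, 90.0, 120.0, 150.0]
    values = [13000.0 - 500.0 * math.sin(math.radians(a) / 2.0) ** 2 for a in angles]
    mass, _ = molar_mass_from_debye(angles, values)
    print(f"Estimated molar mass: {mass:.0f} g/mol")
```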

Molecular Dynamics Simulations to Locate Surface Waters-A simulation to locate water molecule positions on the IBS of RiAFP was performed as described previously (34) with slight differences. The simulation consisted of an energy minimization step and two position-restrained molecular dynamics steps. The solvent in the box contained 6,220 water molecules and 5 Cl⁻ ions to offset the charge of the protein. The first position-restrained step was a 100-ps simulation in which the system volume remained constant. The second position-restrained step involved a 20-ns simulation in which the system pressure remained constant, saving data at 0.02-ns intervals. The trajectory of the constant-pressure position-restrained simulation was used to calculate the water density around the protein after doing a least squares fit of the protein backbone using the first frame as a reference. Water density was calculated using the VolMap plugin (version 1.1) of VMD (version 1.9.1) with a resolution of 0.5 Å, atom size 1 Å, weights as mass, computed as the average for all saved frames. Waters were manually built into peaks in the VolMap density using PyMOL.
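The idea behind the frame-averaged water density can be sketched with a simple occupancy histogram on a 0.5 Å grid. This is a deliberate simplification and not the VolMap algorithm, which spreads each atom over neighboring grid points and applies mass weights. The function names and the toy trajectory are hypothetical; a 20-ns run saved every 0.02 ns would supply roughly 1,000 frames instead of the three used here.

```python
import math
from collections import Counter


def occupancy_grid(frames, spacing=0.5):
    """Average per-frame count of water oxygens in each cubic voxel.

    frames: iterable of frames, each a list of (x, y, z) positions in Å,
            assumed to be already superimposed on a common reference.
    """
    counts = Counter()
    n_frames = 0
    for frame in frames:
        n_frames += 1
        for x, y, z in frame:
            voxel = (math.floor(x / spacing),
                     math.floor(y / spacing),
                     math.floor(z / spacing))
            counts[voxel] += 1
    if n_frames == 0:
        raise ValueError("no frames supplied")
    return {voxel: c / n_frames for voxel, c in counts.items()}


def density_peaks(grid, threshold):
    """Return voxels whose average occupancy is at least the threshold."""
    return sorted(v for v, occ in grid.items() if occ >= threshold)


# Hypothetical toy trajectory: one ordered water near (1.1, 2.2, 3.3) Å in
# every frame, plus one mobile water that visits a different voxel each time.
TOY_FRAMES = [
    [(1.10, 2.20, 3.30), (5.0, 5.0, 5.0)],
    [(1.15, 2.25, 3.35), (8.0, 1.0, 2.0)],
    [(1.05, 2.10, 3.40), (0.2, 7.7, 4.1)],
]

if __name__ == "__main__":
    grid = occupancy_grid(TOY_FRAMES)
    print(density_peaks(grid, threshold=0.9))
```

Only the voxel visited in every frame survives the threshold, which mirrors how ordered waters appear as density peaks while mobile waters average out.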

RESULTS AND DISCUSSION
Structure Determination of RiAFP-The RiAFP structure, which contains 1,914 nonhydrogen protein atoms and no atom heavier than sulfur in the asymmetric unit (26 kDa), is the largest structure determined to date by direct methods without anomalous scattering data. It also represents the first structure determined by direct methods at a resolution limit lower than 1.0 Å that has >1,000 nonhydrogen atoms. The procedure for ab initio structure determination presented here significantly extends the capability of direct methods and provides an alternative to traditional heavy atom-based methods for proteins of up to 26 kDa when crystallographic data are available to 1.2 Å or beyond. Data and refinement statistics are listed in Table 1.
Overall Structure of RiAFP-The structure of RiAFP reveals a new β-solenoid architecture in which the polypeptide chain follows a mixed-handed superhelical trajectory to form a β-sandwich of two parallel 6- and 7-stranded β-sheets (Fig. 2, A and B). The central section forms a left-handed superhelix of remarkable regularity where the β-sheets lie on top of each other with the upper and lower strands parallel but in the opposite orientation. The only deviations from this handedness and helix regularity occur at either end to form β-helix capping structures (Fig. 2B). At the N terminus there is a reversal of the handedness of the first two strands (β1 and β2), which results in β1 and β2 being antiparallel with their neighboring strands. At the C terminus, β-strands 12 and 13 loop back to insert between strands 10 and 11. These capping structures help to prevent end-to-end associations that would spoil the solubility of RiAFP and lead to oligomerization and aggregation. The three residues in β-strand 11 at the C terminus (Gln110, Gln112, and Ile114) that are too bulky to be accommodated into the core may also contribute to the capping structure to prevent amyloid-like polymerization.
Several AFPs have independently evolved to have a β-solenoid fold, including CfAFP from spruce budworm (21), TmAFP from yellow mealworm beetles (14), and MpAFP from an Antarctic bacterium (7). Also, the AFP from an inchworm (a different moth species from the spruce budworm) has been modeled as a β-solenoid (17). Whereas β-solenoids often form multimers (35), the AFPs are distinctly monomeric and are essentially devoid of twist. The lack of twist exemplified in RiAFP (Fig. 2, B and C) is essential for forming a flat IBS. A novel feature of the RiAFP solenoid is its compressed nature. In the core of RiAFP, the side chains within apposed β-strands from the two β-sheets are staggered, allowing the side chains to interdigitate and pack tightly against one another (Fig. 3). Similar interdigitation was observed in silk fibers (36) and was predicted in an inchworm AFP model (17). Most of the side chains in the core are from alanine and serine, with no residue larger than threonine. With this configuration, the average distance between the sheets is only 6 Å, compared with 10 Å across for TmAFP (14) and 14 Å across for CfAFP (21,22), creating a more compact fold that may contribute to the high stability and antifreeze activity of RiAFP. In addition to the hydrophobic interactions within the core, hydrogen bonds between Thr65-Ser55, Thr85-Ser75, and Thr132-Ser124, along with a single disulfide bond between Cys4 and Cys21, link the two sheets together to further stabilize the β-sandwich fold. The β-turns in the structure contain mostly glycine or proline residues.
Figure legend: The thin core of RiAFP spans less than 6 Å and is tightly packed with interdigitating Ala, Ser, and Thr residues. A, end-on and 180° view of the core residues in RiAFP contributing to the capping motifs. B, close up of the interdigitating alanine, serine, and threonine residues within the tightly packed, thin, stable core.
Surface Complementarity to Ice-The adsorption-inhibition mechanism of AFPs was proposed by Raymond and DeVries 35 years ago (2). The driving force for AFP binding in the adsorption process was at one time thought to be the entropic gain from the liberation of the water molecules coordinated on the IBS, together with van der Waals interactions and hydrogen bonding (42). Adsorption of the AFP ice-binding surface to ice could be facilitated by the flatness of the IBS of the AFP. Indeed, like other insect AFPs, RiAFP has exceptional flatness along its ice-binding site. The flatness can be quantified by a "flatness function" (43) or by calculating the difference between the Cα-Cα and Cβ-Cβ distances of the first and last threonine residues within each row of TXT(XTXT) motifs. Smaller differences reflect reduced curvature of the surface. In the RiAFP structure, this difference is limited to 0.01 Å, whereas in T. molitor AFP the difference reaches 0.28 Å. By aligning ice-binding surface atoms of RiAFP to hexagonal ice, we identified four possible interactions of RiAFP with the primary prism plane of ice (Fig. 5). The shape complementarity (44) between protein and ice interfaces is comparable for all four binding modes, with Sc values of 0.75, 0.68, 0.70, and 0.78, respectively. An Sc value of 1.0 indicates perfect shape complementarity between two partners. For comparison, antigen-antibody complexes usually have their Sc values in the range of 0.64-0.68 (44). Although it is possible that RiAFP directly binds to ice along its entire IBS, a closer examination of the structure suggests a more indirect binding mechanism.
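The simpler of the two flatness measures described above, the difference between the Cα-Cα and Cβ-Cβ distances of the first and last threonine in a row, can be sketched as follows. The coordinates are hypothetical values chosen for illustration and are not taken from the RiAFP or TmAFP structures; the function name is likewise an assumption.

```python
import math


def flatness_difference(ca_first, ca_last, cb_first, cb_last):
    """Absolute difference (Å) between the C-alpha and C-beta end-to-end distances.

    On a flat row the side chains project in parallel, so the two distances
    agree; on a curved row the C-beta atoms splay apart or converge.
    """
    ca_distance = math.dist(ca_first, ca_last)
    cb_distance = math.dist(cb_first, cb_last)
    return abs(ca_distance - cb_distance)


# Hypothetical coordinates (Å) for the first and last threonine of one row.
FLAT_ROW = dict(ca_first=(0.0, 0.0, 0.0), ca_last=(13.3, 0.0, 0.0),
                cb_first=(0.0, 0.0, 1.5), cb_last=(13.3, 0.0, 1.5))
CURVED_ROW = dict(ca_first=(0.0, 0.0, 0.0), ca_last=(13.3, 0.0, 0.0),
                  cb_first=(-0.14, 0.0, 1.5), cb_last=(13.44, 0.0, 1.5))

if __name__ == "__main__":
    print(f"flat row:   {flatness_difference(**FLAT_ROW):.2f} Å")
    print(f"curved row: {flatness_difference(**CURVED_ROW):.2f} Å")
```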
Ordered RiAFP-bound Waters and the Molecular Basis for Ice Binding-In the RiAFP structure, the threonine hydroxyls on the IBS bind three ranks of six water molecules with equivalent spacing between the four ranks of threonine side chains (Fig. 2A). These hydrogen-bonded water molecules have lost both translational and rotational freedom and resemble those in an ice lattice. They have lower average temperature factors (B factors) than the protein side chains (6.1 Å² versus 9.7 Å², respectively), indicating that the water molecules are bound tightly. The two molecules in the asymmetric unit are juxtaposed with their ice-binding surfaces in contact through the bound water (Fig. 2E). However, it is not clear from the crystal structure alone whether the positions of bound waters on the IBS of RiAFP are influenced by crystal contacts between the proteins. To evaluate this concern, we performed molecular dynamics simulations using the Gromacs MD software package (45) with a single molecule of RiAFP in a box of waters represented by the TIP5P model. After 20 ns, the average positions of waters on the IBS of RiAFP matched very closely the 18 regularly spaced waters seen in the crystal structure (Fig. 6A). After restraining the protein during the simulation, a second layer of waters became apparent. Indeed, many of the waters on the IBS of the second RiAFP molecule can be found in this layer (Fig. 6B). Overall, the waters observed in the simulation appear to be organized in an ice-like formation, with close matches to the primary prism and basal planes of ice. This supports an emerging idea that an AFP ice-binding surface is responsible for ordering an ice-like array of anchored "clathrate" water molecules to promote adsorption to ice, so that the IBS merges with and freezes to the ice surface (7,8).
An advantage of this indirect binding mechanism is that the organized waters are still fluid enough to make flexible matches to the ice-like quasi-liquid layer around the ice before becoming rigidified as the junction layer freezes. At this point, the protein is in direct contact with ice or an ice-like lattice.
The average distance between the threonine hydroxyls within the TXTXTXT motifs (or between the bound waters between the threonine rows) is 6.66 Å (6.19-6.93 Å in chain A). The average distance between hydroxyls of these threonine residues in adjacent TXTXTXT motifs is 4.73 Å (4.44-5.38 Å). The two-dimensional array of residues formed by these threonine hydroxyls is a close match to the primary prism plane of the hexagonal ice lattice (7.35 Å along the c-axis and 4.52 Å along the a-axis), but a poorer match to the basal plane (7.83 Å and 4.52 Å). However, the discrepancy in the longer distance is compounded by the wider IBS in RiAFP than in TmAFP or CfAFP, and it is unlikely that more than two ranks of threonine residues can fit well to either of these planes. What then is the advantage of the wider IBS? We suggest that having several ranks of matching hydroxyls/waters will give the RiAFP more than one opportunity to recruit a quorum of ice-like waters that can match the ice crystal lattice (Fig. 7). A corollary to this argument is that if the waters on the ice-binding site were to make a perfect match to one particular ice plane, then the AFP would only bind this plane.
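The lattice comparison above reduces to simple arithmetic on the quoted spacings. The sketch below expresses each mismatch as a percentage of the ice repeat; the dictionary layout and function name are illustrative choices, and all distances are the values quoted in the text.

```python
# Average threonine hydroxyl spacings on the RiAFP IBS (Å), as quoted in the text.
PROTEIN_LONG = 6.66   # within a TXTXTXT motif
PROTEIN_SHORT = 4.73  # between adjacent motifs

# Ice repeat distances (Å), as quoted in the text: (long, short).
# The basal-plane long repeat, 7.83 Å, is close to sqrt(3) * 4.52 Å.
ICE_PLANES = {
    "primary prism": (7.35, 4.52),
    "basal": (7.83, 4.52),
}


def mismatch_percent(protein_spacing, ice_spacing):
    """Signed mismatch of the protein spacing relative to the ice repeat, in percent."""
    return 100.0 * (protein_spacing - ice_spacing) / ice_spacing


if __name__ == "__main__":
    for plane, (ice_long, ice_short) in ICE_PLANES.items():
        long_mm = mismatch_percent(PROTEIN_LONG, ice_long)
        short_mm = mismatch_percent(PROTEIN_SHORT, ice_short)
        print(f"{plane:>13}: long {long_mm:+.1f}%, short {short_mm:+.1f}%")
```

The short spacing is about 5% larger than the 4.52 Å repeat on both planes, whereas the long spacing falls short by roughly 9% on the primary prism plane and 15% on the basal plane. An accumulating shortfall of this kind is consistent with the suggestion that only a few adjacent ranks can stay in register with either plane.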
All the very active (hyperactive) AFPs bind to more than one ice plane, necessarily including the basal plane (42,46). Aside from the superior thermal hysteresis activity of RiAFP, an early indication of protein hyperactivity and basal plane ice binding is provided by the dendritic pattern of ice growth seen after the nonequilibrium freezing point is exceeded (Fig. 8, A and B). Ice growth occurs from six prism surfaces and not out of the basal plane. However, when hyperactive AFPs are tested, most of them, including TmAFP, sfAFP (47), and MpAFP (7), bind to multiple planes as shown by adsorption to all surfaces of a single crystal hemisphere. GFP-tagged RiAFP also uniformly binds the entire surface of a single crystal hemisphere during fluorescence-based ice plane affinity analysis (Fig. 8, C and D). In contrast, an antifreeze protein from grass, LpAFP, clearly has a more limited ability to bind ice planes and targeted only the basal plane and primary prism plane (Fig. 8, E and F). It is possible that the ordered waters observed in the crystal structure of RiAFP extend for several additional layers in nature, providing options to bind to multiple other planes of ice, thereby resulting in total ice crystal hemisphere coverage.
Like other hyperactive insect AFPs (14,16,21,22), RiAFP crystallized with the ice-binding surfaces of two molecules facing each other and with extensive hydrogen bonding between the ordered water molecules on the two IBSs (Fig. 2E). However, multiangle light scattering and size-exclusion chromatography profiles show that RiAFP is a monomer in solution and that the ice-binding site is therefore fully accessible under physiological conditions (Fig. 9, B-D). Because the IBS mimics the ice lattice that it binds, it is not surprising that AFPs tend to associate through their ice-binding surfaces. However, the shape complementarity (44) between the two subunits in the RiAFP crystallographic dimer (Sc = 0.62) is not as high as between RiAFP and ice (Sc = 0.68 to 0.78), which explains why a RiAFP dimer is not stable in solution.
Figure legend (fragment): Dashed lines (black) represent potential hydrogen bonds (within 2.4-3.5 Å). It is unlikely that more than two ranks of anchored clathrate waters can simultaneously fit well to either the primary prism or basal plane of ice.
FIGURE 8. Ice-binding characteristics of RiAFP. A, an ice crystal formed in a solution containing RiAFP has a rounded, oval shape with the c-axis perpendicular to (coming out of) the plane of the page. B, at temperatures below the thermal hysteresis gap, the ice crystal "bursts" laterally along the a-axes in a 6-point dendritic pattern. C and D, top-down and side views of a single ice crystal hemisphere grown in the presence of GFP-RiAFP are shown. The direction of the c-axis is indicated by a white arrow. Green color represents the binding to ice and overgrowth of the GFP-tagged RiAFP, which in this experiment occurs over the entire hemisphere with no indication of preferential ice plane binding. E, an ice hemisphere grown in GFP-LpIBP with its c-axis perpendicular to the page shows binding to the basal plane (center) and six equivalent primary prism planes. F, an ice hemisphere grown in GFP-LpIBP reveals binding to three of the six equivalent primary prism planes, with the other three planes being hidden by the hemisphere. The images in E and F are reprinted with permission from Middleton et al. (38).
Biological Implications-The longhorn beetle has a supercooling point of −28 °C (19). Recombinant RiAFP purified from Escherichia coli has a thermal hysteresis activity of up to 6 °C at 0.5 mg/ml (Fig. 9A), which closely matches the activity of RiAFP purified from insect hemolymph (20) and is the highest reported activity of any antifreeze polypeptide. The activity measured in vitro is likely to be a significant underestimation of the natural activity because the cooling rate and the initial ice crystal size are greater than those occurring in nature. It has been demonstrated that these two parameters are inversely proportional to the thermal hysteresis activity measured by experiment (48). Importantly, whereas other hyperactive antifreeze proteins have proven difficult to produce in large quantities, RiAFP is expressed in E. coli at concentrations of up to 50 mg/liter of cell culture (24). Hyperactive AFPs have not yet been used for applications outside basic science due to the difficulty and high cost of producing them. The hyperactivity and efficient recombinant production of RiAFP make it a suitable candidate for further study for applications requiring freeze resistance or the control of ice growth and morphology. The crystal structure reported here also provides a springboard from which to design increasingly active antifreeze proteins.
Notably, Rhagium mordax, a longhorn beetle from northern Europe and a close relative of R. inquisitor, produces several isoforms of a hyperactive AFP (RmAFP1-8), each with significant sequence identity (75-82%) to RiAFP (49). A molecular dynamics model of RmAFP1 reveals a β-solenoid prediction that is similar to the architecture of RiAFP. However, the absence of capping motifs in the RmAFP model compared with our crystal structure reveals that our understanding of the complexity of protein folding, although advanced, remains incomplete.
Compared with AFPs from the Tenebrionoid beetles T. molitor and D. canadensis, which share a 12-residue consensus sequence TCTXSXXCXXAX and similar tertiary structures, our crystal structure of RiAFP revealed a tight, two-layered β-sandwich with a more extensive IBS, suggesting that not all AFPs from beetles share a common architecture. Indeed, the primary sequences of RiAFP and RmAFP are unlike those from any other protein, suggesting a different genetic origin and prompting RiAFP to be considered the second distinct type of AFP found in beetles. The surprising structural similarity of RiAFP to the unrelated inchworm antifreeze protein (iwAFP) provides another striking example of convergent evolution and of the paradigm of a regular and repetitive ice-binding site as Nature's preferred solution to reduce ice damage.
Acknowledgments-We thank N. Grindley and C. Joyce for the use of laboratory space.