Conformational Ensembles of an Intrinsically Disordered Protein pKID with and without a KIX Domain in Explicit Solvent Investigated by All-Atom Multicanonical Molecular Dynamics

The phosphorylated kinase-inducible activation domain (pKID) adopts a helix–loop–helix structure upon binding to its partner KIX, although it is unstructured in the unbound state. The N-terminal and C-terminal regions of pKID, which adopt helices in the complex, are called, respectively, αA and αB. We performed all-atom multicanonical molecular dynamics simulations of pKID with and without KIX in explicit solvents to generate conformational ensembles. Although the unbound pKID was disordered overall, αA and αB exhibited a nascent helix propensity; the propensity of αA was stronger than that of αB, which agrees with experimental results. In the bound state, the free-energy landscape of αB involved two low free-energy fractions: native-like and non-native fractions. This result suggests that αB folds according to the induced-fit mechanism. The αB-helix direction was well aligned as in the NMR complex structure, although the αA helix exhibited high flexibility. These results also agree quantitatively with experimental observations. We have detected that the αB helix can bind to another site of KIX, to which another protein MLL also binds with the adopting helix. Consequently, MLL can facilitate pKID binding to the pKID-binding site by blocking the MLL-binding site. This also supports experimentally obtained results.

other contacts (non-native interactions) likely slow the folding rate. However, the non-native interactions might speed up the folding rate when they help the unfolded polypeptide to collapse, as occurs with competition with chain entropy [16]. If this scheme is correct, then non-native interactions might also positively support or facilitate the IDP-partner association. However, the protein models so far used are too simple to support a realistic discussion of the native and non-native factors. To elucidate these factors, an all-atom protein model is expected to be useful.
The all-atom model involves all the interaction factors. However, it is usually difficult to construct a statistically significant conformational ensemble of protein because the conformational space (potential energy surface) is constructed with a huge number of degrees of freedom and the conformation is frequently trapped in energy local minima during a simulation. Consequently, sampling the ensemble requires unrealistically long computation time by conventional simulations. We have overcome this difficulty using an enhanced sampling method: multicanonical molecular dynamics (McMD) [17,18]. The advantage of McMD is that the energy barriers among the energy local minima are overcome, as explained later. Recently, we developed a more efficient sampling method, trivial trajectory parallelization of multicanonical molecular dynamics (TTP-McMD) [19,20]. Using TTP-McMD, we have conducted an all-atom folding simulation of a 57 amino-acid residue protein [21] and the coupled folding and binding of NRSF and mSin3 [22], in explicit solvent. The NRSF is IDP and folds into the helix when binding to the partner mSin3. The simulation reproduced a conformational ensemble at room temperature, which contained the native-like complex structure in the largest population cluster (i.e., the most thermodynamically stable cluster/the lowest free-energy cluster) as well as non-native complex structures in minor clusters. Therefore, the TTP-McMD is a useful computational technique to examine IDP systems.
The cAMP-response element-binding protein (CREB) induces transcription via an interaction with its co-activator CREB binding protein (CBP). The transcription factor CREB contains a kinase-inducible domain (KID) to bind the kinase-induced domain interacting domain (KIX) of CBP [23,24]. The binding affinity of KID with KIX depends on phosphorylation [25,26]: the affinity increases as Ser133 of KID is phosphorylated. Both KID and the phosphorylated KID (pKID) are IDPs [27], and the tertiary structure of the pKID-KIX complex was determined using NMR at 315 K (PDB ID: 1kdx [27]). The deposited 17 NMR models show that pKID adopts a helix-loop-helix structure on the KIX surface. Therefore the binding of pKID to KIX is cooperated with folding [28]. Sugase et al. studied the kinetics of pKID binding with KIX by NMR [29]. This system is suitable for all-atom computations because the pKID sequence deposited in PDB is short (28 residues).
We note some experimental features of this pKID-KIX system. First, the binding affinity of the C-terminal helix of pKID (called α B ; residues 134-145 in the original PDB file) is one order stronger than that of the N-terminal helix (α A ; residues 120-131), and formation of helix in α B is necessary for the affinity maintaining, although the helix formation of α A is not [25]. The NMR study [27] has shown that the orientation of α A relative to the KIX framework is disordered, although that of α B is well determined with contacting tightly with KIX. Contrarily, in the unbound state, α A has a higher helix propensity than α B [30]. These features should be confirmed through simulations.
As described in this paper, we performed TTP-McMD simulations of pKID in the presence and absence of KIX. We denote the residues 120-131 as "α A residues" and the residues 134-145 as "α B residues" whether these residues form helices in simulation snapshots or not. Similarly, when the α A and α B residues are expressed as elements, they are denoted respectively as an "α A region" and "α B region." We show that the obtained conformational ensemble from the simulation agrees with the experimental features described above, and that the α A and α B regions have different mechanisms of coupled folding and binding.

Setting Simulation Systems (pKID and pKID-KIX Systems)
We designated the simulation system of pKID in the absence of KIX as the "pKID system" and that in the presence of KIX as the "pKID-KIX system." In the NMR experiment on the pKID-KIX complex [27], the pKID sequence was longer than that deposited in PDB (residues 119-146) because unstructured regions are not deposited. We used the deposited pKID region for the simulation, which is the minimum sequence of pKID binding with KIX (residues 586-666).
We prepared the pKID system as follows. The coordinates of pKID were taken from the first model out of the 17 NMR ones. The pKID was immersed in a solvent sphere (called sphere 1; radius = 30 Å), which consists of water molecules with the density of 1 g/mL equilibrated at 300 K in advance. The mass center of pKID was set to the center of sphere 1, and water molecules overlapping with pKID were removed. To neutralize the net charge of the system and make consistency with the ionic strength of the NMR experiment, nine water molecules were selected randomly and replaced with five Cl − and four Na + ions. Finally, the pKID system consisted of 3473 water molecules, pKID, and nine ions. Although pKID was taken from the NMR model at this stage, the structure was randomized completely using a high-temperature simulation, as described later.
Next, we prepared the pKID-KIX system as follows. The coordinates of the two proteins were taken from the first NMR model again. They were immersed in a solvent sphere (called sphere 2; radius = 37 Å). The sphere 2 radius was sufficiently large to contain the entire KIX structure, as described later. The center of sphere 2 was set to the mass-center position of pKID of the NMR model. It is noteworthy that the center of sphere 2 is fixed in space (i.e., the center of sphere 2 is not fixed to pKID) when pKID is moving in the simulation: The translation and rotation of pKID are not restrained. Water molecules overlapping the proteins were removed. Eighteen randomly selected water molecules were replaced with nine Cl − and nine Na + ions. Then, the pKID-KIX system came to consist of 6166 water molecules, pKID, KIX, and 18 ions. This complex was dissociated by a hightemperature simulation, as shown later.
We used the AMBER-based hybrid force field for the proteins. This force field is the combination of AMBER force field parm94 ( E 94 ) [31] and parm96 ( E 96 ) [32] with a mixing rate ω as 96 94 . We set 75 . 0 = ω because our previous works indicated that 75 . 0 = ω reproduces the optimal secondary structure preference for some peptides [33,34]. The TIP3P model [35] was used for the water molecules. After energy minimization, we performed a high-temperature (700 K) canonical MD simulation for each of the pKID and pKID-KIX systems to generate the initial simulation conformations for the following TTP-McMD. Figure 1 shows that this temperature was sufficiently high to demolish the native conformation of pKID and to dissociate the native complex. Although pKID was able to move freely in the solvent spheres, the structure of KIX was weakly

Trivial Trajectory Parallelization of Multicanonical Molecular Dynamics (TTP-McMD)
Before introducing TTP-McMD, we describe the conventional McMD, i.e., the single-run McMD. The single-run McMD is a canonical simulation at a temperature T (a constant-temperature method [40] was used as thermostat in this study) using the multicanonical energy mc E instead of the original potential energy E of the system, as In that equation, mc Z is the partition function for McMD, which can be regarded simply as a factor to normalize the distribution function mc P . To derive Equation (2), we used a formulation of statistical mechanics: , the partition function of the system). This flat energy distribution guarantees that the conformation can overcome the energy barriers and visit low energy conformational regions during the simulation. We refer to the entire conformational ensemble obtained from McMD as a "multicanonical ensemble." Then we can reconstruct a canonical energy distribution c P at any target temperature tag T from mc P as mc c tag mc mc To derive Equation (3), we used Equation (2) in the following form: The canonical conformational ensemble at tag T is constructed by assigning the probability ) , ( tag mc T E P to all conformations in the multicanonical ensemble.
McMD uses the probability density function ) , ( c T E P in Equation (1), which is generally unknown a priori (before simulation). Then, we must construct it self-consistently through iterative simulation runs, during which ) , ( converges to a precise function. TTP-McMD takes an advantage over the single-run McMD [19,20]. TTP-McMD is technically equivalent with performing independent multiple runs of single-run McMD starting from various initial conformations. The multiple trajectories generated are integrated simply into an ensemble. It is noteworthy that low-energy conformations (low-energy basins) distribute widely in the conformational space, which are spaced by high-energy barriers. Then, a single-run McMD takes a long flight while overcoming the barriers to reach the low-energy basins.
On the other hand, the multicanonical algorithm tries to ensure the flat distribution along the energy axis (Equation (2)) so that the conformation is expected to exist evenly in both the low-energy and high-energy regions. This evenness might cause a difficulty of single-run McMD, i.e., no convergence. The initial conformations for TTP-McMD were those shown in Figure 1(a) and (b), which were generated from the high-temperature canonical MD simulations, as described above. In this study, we performed 256 multiple runs for each system. To obtain the accurate ) , ( c T E P in Equation (1) . We used 256 computing cores (intel Xeon X5365 3.0 GHz), each core executed one run of TTP-McMD. In the equilibration stage (i.e., the stage to estimate E mc before the final sampling stage), the simulations were done for 21 ns and 23 ns in each of 256 trajectories (total of 5.4 μs and 5.9 μs) of the pKID and pKID-KIX system, respectively. The final runs were done for 8.51 ns in each of 256 trajectories (total time of 2.18 μs) and stored the snapshots every 5 ps for subsequent conformational analyses. The computation times for the equilibration stage were 33 and 76 days for the pKID and pKID-KIX system, respectively. Those for the final sampling stage were 13 and 28 days for the pKID and pKID-KIX system, respectively. As described above, we reset the simulation temperature from 700 K to 200 K. The multicanonical energies at the two temperatures are, respectively, Mathematically, the multicanonical ensembles from ) K 700 ( are expected to be equivalent: Consequently, the temperature reset is theoretically non-sense. However, the two simulations produce practically different ensembles. We use a polynomial function to approximate ) , (

Results a
The conf Section 3.1, unbound sta andscape of he KIX fra coupled fold bound states

The Con
The TTP −35000.0 k he multican "315K-ensem he NMR ex structure co helix more t helix conten he unbound Furthermore 315K-ensem ntrinsically

Free En
The      This result suggests that the high flexibility of pKID might help the pKID-KIX association because pKID binds to KIX without adopting a well-ordered structure. The final native-complex formation is completed after forming the various non-native complexes. Later in this report, we describe our examination of why the non-native low-free-energy fraction was larger than the native-like low-free-energy fraction in the free-energy landscape and why the α A helix is partially disordered, even in the native-like low-free-energy fraction (Figure 4(b)).
We found pKID bound to another site of KIX in 323 snapshots of the 315K-ensemble (Figure 4(d)). This site is a binding site for the activation domain of the mixed lineage leukemia (MLL) transcription factor [42]. The MLL-KIX complex structure (Figure 4(e)) shows that a segment of MLL adopts helix and binds to KIX, and the other parts are unstructured. In all of these snapshots, the α B region adopted helix to bind to the MLL binding site with the α A region unstructured. The orientation of the α B helix cylinder was approximately the same as that of the MLL segment in the MLL-KIX complex structure [43]. Consequently, the α B region corresponds to the MLL segment. In fact, both the MLL-binding site and the genuine α B -binding site on the KIX surface consist of hydrophobic amino acids. Furthermore, the hydrophobic residues in the α B region and the MLL segment have similarity; they contains φ-x-x-φ-φ motif (φ = hydrophobic residue and x = any residue), which is conserved in many KIX binding proteins (see Figure 9 of reference [43]). It is particularly interesting that in the presence of MLL, pKID binds to KIX with the two-fold higher affinity than pKID in the absence of MLL [44]. Our simulation results suggest that MLL might facilitate the pKID binding to the genuine binding site via blocking the MLL binding site.
The orientations of the α A and α B regions with respect to the KIX framework were investigated, respectively, using inner products, A Å, which involves the native-like (green circle in Figure 4(a)) and non-native (red circle) low-free-energy fractions. The histogram for B α I ( Figure 5(b)) has the largest peak at 1, which indicates that the α B region attaches to KIX with the same orientation as that in the final bound form in both low-free-energy fractions. We discussed the factors stabilizing this orientation later in Section 3.4. Recall that the α B region less adopts helix than the α A region in the isolated state ( Figure 2) and that the α B region seldom adopts native-complex form in the isolated state (Figure 3(b)).

Correla
The foldi between pK of side chain contacting. W assigned to contacts: 8 b esidues in simulation s We coun Nnnc, for ea he Nnc-Nn n complexe ncreasing n with a few ( egion acco encounter co unbound sta mechanism. he pKID-K non-native c es 2012, 2 ly, it is likel contrast, th with the exp the native l report that e 5. Orienta nd B α I are g ation betwee ing of α A a KID and KIX ns of two re We calculat a residue p between α A pKID and snapshots as nted the qua ach conform nnc plane. es where the native conta (ca. 5) nonrds to the omplex to r ate (Figure 3 In fact, the KIX associat contacts wit ly that the α he α A region perimental complex fo the α B regio ational prob given in the en Intermole and α B regi X. An interm esidues: If t ted the con pair if the pa A and KIX r KIX residu s "non-nativ antities of th mation. Figu Coupled fo e native con cts, the non -native cont induced-fit reach the n 3(b)), the in e formation tion, as disc thout waitin α B region bi n has a larg observation orm [27]. Fu on has a stro bability distr text (Equat  Nnc than th were able to ons of α A , ment [27], w diversity by α A and KI finity assign r population n selection.    were able t is direction 3), which philic residu restrained th Figure 8(b e iso-densit gh the high to al is ue he b) ty h-density site was slightly deviated from that in the NMR structure, the anchor effect of pS133 is clearly shown. Consequently, phosphorylation plays an important role for specifying the α B -helix orientation ( Figure 5(b)). We presume that this phosphorylation-induced restraint on the α B helix increases the binding affinity more than non-phosphorylated KID [25,26].

Conclusions
We performed the TTP-McMD simulations for the pKID folding and binding with KIX in explicit solvents. Although the overall property of pKID in the unbound state was disordered, pKID has the nascent helix propensity in the α A and α B regions in the computed conformational ensemble. The propensity for α A was stronger than that for α B , which agrees with an experiment described in the literature [30].
In the presence of KIX, the free-energy landscape at 315 K involved two low-free-energy fractions: The native-like low-free-energy fraction and non-native low-free-energy fraction. Because the α B region can bind to KIX with various non-native contacts (various encounter complexes), the α B region might provide fast association with KIX [8]. This landscape proposes an induced-fit mechanism for coupled folding and binding of the α B region: various encounter complexes are possible in the early stage, and the complex passes through the free-energy barrier to reach the native-like low-free-energy fraction. The well-oriented binding of the α B region was controlled by the phosphorylated SER133 located in the middle of the α A and α B regions. This control supports the higher binding affinity of pKID than KID, as observed experimentally [25,26]. In contrast to the α B region, the α A region exhibited high flexibility, which agrees qualitatively with that found in the NMR structure [27]. An earlier experiment [25] has demonstrated that the helix formation of α A is not important for binding to KIX. Consequently, the simulation supports the high binding affinity of α B and the low binding affinity of α A .
It is particularly interesting that the α B region bound to another shallow hydrophobic concave of KIX than the genuine pKID-binding site, where α B adopted helix. This hydrophobic concavity is the MLL binding site of KID, to which a segment of MLL binds with adopting helix [42]. It has been pointed out in an earlier report that the hydrophobic residue pattern of the α B and MLL segments have a similar hydrophobic amino-acid residue pattern [43]. In the presence of MLL, pKID binds KIX with the two-fold higher affinity than pKID alone [44]. We presume that MLL might facilitate the pKID binding to the genuine binding site by blocking the MLL binding site.
The current study demonstrated the importance of hydrophobic interactions between pKID and KIX. Because the multicanonical simulation is an efficient sampling method, the multicanonical trajectory can overcome high potential energy barriers in the conformational space. At the cost of this high performance, the trajectory does not provide information of time series. A conventional MD simulation may provide another important aspect, such as electrostatic interaction, on the complex formation.
Finally, it is noteworthy that the non-native low-free-energy fraction (red circle in Figure 4(a)) is larger than the native-like low-free-energy fraction (green circle) in the free-energy landscape and that the α A helix is partially disordered, even in the native-like low-free-energy fraction (Figure 4(b)). These points disagree with the NMR experimentally obtained results [27]. Presumably, this disagreement results from the truncation of pKID, which are unstructured in the NMR experiment. It is important to remember that the computed pKID segment is the only part deposited to PDB. The unstructured region might stabilize the α A helix more, which might result in an increase of the native-like low-free-energy fraction.