Target Flexibility: An Emerging Consideration in Drug Discovery and Design

Department of General and Inorganic Chemistry, UniVersity of Parma, Via G.P. Usberti 17/A 43100, Parma, Italy, National Institute for Biosystems and Biostructures, Rome, Italy, Department of Medicinal Chemistry and Institute for Structural Biology & Drug DiscoVery, Virginia Commonwealth UniVersity, Richmond, Virginia 23298-0540, Department of Pharmaceutics, UniVersity of Parma, Via GP Usberti 27/A, 43100 Parma, Italy, High Performance Systems, CINECA Supercomputing Centre, Casalecchio di Reno, Bologna, Italy, Dulbecco Telethon Institute, Department of Chemistry, UniVersity of Modena and Reggio Emilia, Via Campi 183, 41100 Modena, Italy, Department of Mathematics and Natural Sciences, Pharmaceutical Institute, Christian-Albrechts-UniVersity, Gutenbergstrasse 76, 24118 Kiel, Germany, Departments of Biochemistry & Molecular Biology, Computer Science & Engineering, and Physics & Astronomy, Michigan State UniVersity, East Lansing, Michigan 48824-1319, Department of Molecular Biology, MB-5, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037-1000, Molecular Modeling and Bioinformatics Unit, Institute of Biomedical Research, Scientific Park of Barcelona, Department of Biochemistry and Molecular Biology, UniVersity of Barcelona, Josep Samitier 1-5, Barcelona 08028, Spain, Department of Experimental Medicine, UniVersity of Parma, Via Volturno, 39, 43100, Parma, Italy, Department of Chemical, Food, Pharmaceutical and Pharmacological Sciences, UniVersity of Piemonte Orientale “Amedeo AVogadro”, Via BoVio 6, 28100 NoVara, Italy, Institute of Pharmacy and Food Chemistry, UniVersity of Würzburg, Am Hubland, D-97074 Würzburg, Germany


Introduction
Structure-based drug discovery has played an important role in medicinal chemistry, 1 beginning nearly when the first X-ray crystal structure of the myoglobin and hemoglobin proteins at near-atomic resolution were described by Perutz, Kendrew and colleagues. [2][3][4][5] Even though only static structures were (and still generally are) used for most structure-based drug design (SBDD a ) and indeed most molecular modeling, the importance of flexibility was recognized immediately: hemoglobin has two rather different structures, "tense" and "relaxed", depending on its oxygenation, although in recent years a family of relaxed hemoglobin structures with different tertiary structure conformations have been reported. 6 In fact, all proteins are inherently flexible systems. This flexibility is frequently essential for function (e.g., as in hemoglobin). Proteins have an intrinsic ability to undergo functionally relevant conformational transitions under native state conditions 7,8 on a wide range of scales, both in time and space. 9 In adenylate kinase large conformational changes due to movements of the nucleotide "lids" (ratelimiting for overall catalytic turnover 10,11 ) are "linked" with relatively small-amplitude atomic fluctuations on the picosecond time scale such that changes in the local backbone conformation are required for lid closure. 12 Nuclear receptors are modular proteins where a significant degree of conformational flexibility is essential to biological function. Most of the pharmacology of nuclear receptor ligands has been discussed on the basis of their ability to stabilize (or displace) a short R-helix segment (known as H12 or AF-2) localized at the carboxy terminus of the receptor in (or from) its conformation in the protein "active" form. [13][14][15] Available X-ray crystal structures show a surprisingly wide range of structural diversity in ligands binding to, and inhibiting, nuclear receptor proteins such as the farnesoid X-receptor (FXR). 16,17 Protein dynamics is also a key component of intramolecular and intermolecular communication/ signaling mechanisms and an essential requirement for the function of G-protein-coupled receptors (GPCRs), which are the largest known superfamily of membrane proteins. GPCRs regulate cell activity by transmitting extracellular signals to the inside of cells and respond to these signals by catalyzing nucleotide exchange in intracellular G-proteins. 18 Emerging evidence suggests that these receptors exist as homodimers, heterodimers, and oligomers 19 and act in multicomponent units comprising a variety of signaling and scaffolding molecules. 20 Thus, regulated protein-protein interactions are key features of GPCR function, and understanding these interactions in the dynamic cellular environment is a major research goal that may lead to new therapeutic approaches for this important target family.
In terms of medicinal chemistry and drug discovery, even the angiotensin-converting enzyme inhibitor captopril, credited as the first drug discovered using a protein binding site, binds to a protein (carboxypeptidase A) that was known to be highly flexible. 21 While most examples of successful modeling and computational drug design have been accomplished without true consideration of protein flexibility, this may be largely due to the fortunate result of relatively minor induced-fit adaptations of proteins upon ligand binding. Using flexibility as a criterion, we can classify three types of proteins: (i) "rigid" proteins, where ligand-induced changes are limited to relatively small side chain rearrangements, (ii) flexible proteins, where relatively large movements around "hinge points" or at active site loops, with concomitant side chain motion, occur upon ligand binding, and (iii) intrinsically unstable proteins, whose conformation is not defined until ligand binding. Currently, for technical reasons, the Protein Data Bank (PDB) 22 is artificially enriched in the first family, but genomic and proteomic projects have shown that the last two protein classes represent a very significant proportion of the proteome 23 and probably include many important therapeutic targets. It is also noteworthy that the steadily increasing availability of experimentally determined (Xray/NMR/cryoEM) protein structures has not appeared to have resulted in a similar increase in the success rate of structurebased design approaches, although the significant time lag between availability of a structure and its exploitation may be masking the true success rate.
These facts clearly suggest that too much emphasis, or perhaps even hope, has been placed on rigid structures, regardless of their experimental or theoretical origin, and especially when dealing with proteins expected or known to undergo large conformational changes during their biological function. There are probably two main reasons that this shortsighted approach has evolved and continued: first, the impact of the static appearance of X-ray crystal structures (and their often beautiful images!) on the perception of protein structure, i.e., that the crystal structure is always the "correct" structure, and second, the conceptual (and technical) difficulties of dealing with moving targets. There is one extremely obvious cause of this rigid "bias"; to reduce disorder and also to preserve precious crystals, crystallography data are now usually obtained at extremely low, nonbiological temperatures. Similarly, although proteins are somewhat solvated in their crystal lattice for crystallographic analysis, that is likely to be a flawed approximation of the true biological environment for many, if not most, proteins. In biological systems proteins express their functions in aqueous or semifluid environments and proteins in solution exist as an ensemble of energetically accessible conformations such that their three-dimensional structure is best described when all states are represented. For drug discovery, this is a radical paradigm shift from when captopril was invented, even if the active site flexibility of carboxypeptidase A was acknowledged at the time! Thus, because protein conformational changes are initiated or stabilized by ligand binding and are essential to the protein's own function, the ability to measure or simulate dynamic changes taking place in proteins upon ligand binding is becoming a central issue in the design of bioactive compounds. The essence of the problem for drug discovery is that for a flexible target it is not known in advance which conformation the target will adopt in response to the binding of a particular ligand or how to design such a ligand for an unknown conformation.
In the fall of 2007 the authors of this Perspective participated in a course organized at the University of Parma (Italy) entitled "From Structural Genomics to Drug Discovery: Modeling the Flexibility" (www.course07.unipr.it). While we approach this problem from a wide variety of directions, both experimental and computational, there was a very strong agreement that the "flexibility era" for drug discovery was rapidly approaching and that there were a number of points of consensus among the authors of this contribution on how flexibility could be and should be exploited for drug discovery.
While techniques such as fluorescence spectroscopy, 24 spin label electron paramagnetic resonance (EPR), 25 and small angle X-ray scattering 26 have significant applications in elucidating some aspects of protein flexibility, X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy are the key technologies for characterizing molecular structure and will be the focus of this Perspective. As will be described below, these two tools have been evolving with respect to describing molecular flexibility. Molecular dynamics (MD) and molecular docking have developed as separate subdisciplines of computeraided molecular design, but at least for the purposes of drug discovery for flexible targets, these techniques and tools are complementary (see Figure 1). In this Perspective we will first describe some of the innovations in both experimental measurement and computational modeling of flexibility that we believe are going to impact strongly the future success of computeraided molecular design. Thus, this Perspective will not exhaustively cover all methodological approaches that can be used to address the complex issues of protein dynamics but instead reflects the themes of the course. We are excited about these prospects and offer some guidelines and forecasts for working under this new paradigm.

Characterizing Flexibility in Biomacromolecules
High-resolution experimental techniques such as X-ray crystallography and NMR usually only provide snapshots of one or just some of the conformations that are accessible to proteins. However, synchrotron X-ray sources have recently opened up the possibility of time-resolved measurements on single crystals. With the superior resolution of these sources it is also possible to evaluate the electronic oscillations around single atoms and obtain the probability density for the atom. While NMR has a number of limitations, it does have the seemingly large advantage of being performed under conditions and in solutions that mimic the biological environment. Generally the results from NMR are ensembles of low energy conformations that satisfy the coupling energy constraints displayed in the multidimensional spectra. Importantly, as the field strength of NMR spectrometers increases, not only is the size range of proteins amenable for study increased but the resolutions of the spectra are enhanced, leading to the identification of more conformers. Also, new and more sophisticated NMR pulse sequences have allowed the extraction of more detailed 3D structural data.
Molecular dynamics simulation is at present the best means to obtain a more complete set of protein conformers, especially for those at relatively high energy and not detectable with the current set of experimental tools. While the starting conformations for MD are, by necessity, crystallographic, NMR, or computationally-built models, the resulting trajectories of conformations or atomic motion can be thought of as "movies" showing the somewhat random motion of a protein at a given temperature. Sampling of these conformations can provide a set of unbiased structures for analysis with docking, virtual screening, or other computational approaches. However, the stochastic nature of molecular dynamics, along with the rather limited time scales of simulations (typically only a few tens of nanoseconds or less), and the presence of high energetic and entropic barriers in flexible molecules can render many of the conformational models uninteresting or redundant because only a small subset of the available models is obtained.
On the other hand, docking and virtual screening have progressed from placing rigid ligands into rigid sites in its earliest incarnations 27 to placing flexible ligands in rigid sites [28][29][30][31] to the current state-of-the-art of placing flexible ligands in semiflexible sites. [32][33][34][35][36][37][38] In all of these cases the initial site model usually has its origin in experimental structure data. The challenge for drug discovery, as in docking or virtual screening, is to model the receptor plasticity that enables binding partners to conformationally adapt to one another. First, one needs to understand what can move and how; second, this knowledge needs to be transformed into a useful and reliable docking algorithm. The predisposition of proteins to undergo functionally relevant conformational transitions provides a route to this requirement, as it implies the pre-existence of conformations even in the absence of triggering events like ligand binding. The ligand "selects" the proper conformation from the ensemble of rapidly interconverting species. [7][8][9] In this approach, computational investigations of conformational fluctuations of unbound receptors with MD should reveal conformational states adopted by the analogous bound receptor. Alternatively, the problem of modeling conformational changes can be simplified by focusing on what has been learned recently about protein side chain motions that occur upon natural ligand or inhibitor binding. [39][40][41][42] Thus, the active site can be rationally adapted to the incoming ligand, i.e., by "induced fit", which may also take advantage of low-energy conformational changes. The opposite can also be true: consider that the "soaking" method of inserting ligands into preformed protein crystal lattices to form crystals of the complex, which works for some but certainly not all cases, suggests that static lattices may, in fact, select ligands.

X-ray Crystallography
X-ray crystallography is generally considered the gold standard technique in producing experimental structural models of biological macromolecules. In quantitative terms (i.e., the number of structures deposited at the Protein Data Bank) X-ray crystallography has been the most productive structure elucidation tool for biomacromolecules. Indeed, a highly detailed picture emerges from X-ray diffraction analysis of a crystal that contains an active biomacromolecule. The standard use of macromolecular crystallography results in a static-, time-, and space-averaged structure that unfortunately only poorly represents the ensemble of conformations of the protein in action. Part of this may be attributed to crystallization being a "purification" in that only molecules with conformations compatible with the growing crystal lattice will be "frozen". The accuracy of a static crystal structure is defined by the resolution to which X-ray data have been measured. Numerous Figure 1. X-ray crystallography, NMR spectroscopy, and computational molecular dynamics. (a) Crystal illustrating the uniformity of the protein molecular structures within their unit cells necessary for X-ray diffraction. Ordered water molecules (not shown) partially hydrate the structure. (b) Protein molecule in solution for NMR experiment. Ordered water and other (not shown) ions solvate the structure, but many other solvent molecules are not structurally ordered. (c) Protein immersed in virtual (water) solvent for computational molecular dynamics with periodic boundary conditions. (d) Diffraction pattern from an X-ray data collection. (e) Typical 2D NOESY spectrum (http://www.sanger.ac.uk/Users/sgj/thesis/html/node86.html) from protein NMR. (f) Molecular dynamics potential energy as a function of simulation time for coarse-grained motions. (g) Electron density maps for histidine and lysine residues. The density around more labile (high B-factor) atoms is either diffuse or nonexistent (e.g., the NZ atom of lysine). (h) Typical backbone traces for structures meeting constraints determined by NMR NOESY experiments. The high degree of flexibility at either end of the structure is evident. (i) Protein structure illustrating (see arrows) typical large scale motions that may be observed with computational molecular dynamics. experimental factors contribute to the observed resolution, some related to the size and quality of the crystal itself, others to the brightness of the X-ray source employed and to experimental conditions. While X-ray diffraction is probing electron density, neutron diffraction (available from nuclear reactor or spallation sources) probes the atomic nuclei and can thus experimentally locate hydrogen positions because of inherently high resolution; the downside is that rather large crystals are required. Typically, a resolution of less than 2 Å is generally considered good crystallographic data for a protein structure, while a resolution of around 1 Å is termed "atomic". Another set of quality metrics for crystallography, the R-factor or free R-factor, are measures of how well the refined structure fits the observed data or, in other words, the percentage difference between the measured electron density and that of the refined model. 43 Finally, each atom in a reported crystal structure will have a B-factor (or temperature factor) that in its most basic sense represents the atom's individual uncertainty in position, whether due to thermal motion, occupancy, experimental and modeling artifacts, or other effects. 44 In recent years however, mainly because of the availability of powerful synchrotron X-ray sources, several new approaches have been developed that go beyond the classical static crystallographic analysis. Two major contributions can be attributed to these advances: (i) time-resolved measurements either on a series of crystals or using Laue diffraction for fast data collection can detect transient chemical states in the crystal; (ii) it is possible to more precisely evaluate the oscillations of an atom around its position, thereby providing information on the dynamic properties of a protein in a crystal lattice. The latter approach is based on the modeling of an atomic displacement parameter, i.e., the thermal parameter or B-factor described above, which can be thought of as a probability density function for the location of each atom in the protein. At low resolution, modeling of this parameter is restricted to a spherical or "isotropic" shape, but at atomic resolution, an ellipsoid or "anisotropic" model can be applied. 45,46 This anisotropic model provides both the magnitudes and directions of movement of each atom and its inclusion in a model determined at high resolution allows a dynamic description of the protein structure. Thus, at atomic resolution, analysis of the anisotropic thermal parameters allows extraction of the direction of motion, possibly leading to better detection of both subtle changes and larger scale motions (see ref 47 and references therein). Figure 2 illustrates the high resolution (1.05 Å) X-ray crystal structure for Mycobacterium tuberculosis FprA, 48 which was solved with anisotropic thermal parameters, and revealed an unexpected chemical conversion of NADP + to NADPO.
The dynamics of a biological system or process can be reconstructed in the form of a "movie" prepared by assembling a series of static structures corresponding to various states along a reaction pathway. Such an approach has been employed to visualize transitions from one type of folding motif to another, conformational changes associated with different types of ligandinduced structural adaptations, alternate motions between welldefined distinct conformations, changes in quaternary structure, subtle side chain movements in the interior or on the surface of a protein, and other structural rearrangements. 49 By exploitation of synchrotron radiation and a multiwavelength data collection technique known as the Laue method, "kinetic crystallography" can be performed to obtain the necessary set of static structures on relevant reaction pathways. When biological turnover is initiated in the crystal with light or other radiation, the formed transient structural species can be seen together with the associated structural rearrangements (see refs 50 and 51 and references therein). Although movements occurring on a time scale that is faster than the time required to determine the structure by X-ray crystallography cannot be detected, timeresolved crystallography, when experimentally feasible, can provide an important contribution in describing protein flexibility. There is an ever-increasing synergy between kinetic crystallography and in crystallo UV/visible absorption and fluorescence, fluorescence lifetime, and Raman spectroscopies, in combination with various evolving physical trapping (e.g., temperature) and chemical trapping (e.g., adjusting solvent, pH, etc. to manipulate concentrations of chemical intermediates) strategies. These and other combinations of methods will be increasingly employed in the future to provide a dynamic view of biological processes carried out by proteins and to give insight on how to influence these processes with chemical agents. Indeed, a stunning example was published very recently on superoxide reductase, an iron containing enzyme that neutralizes the highly cytotoxic superoxide radical produced by oxygen metabolism. 52 By combining kinetic crystallography with Raman spectroscopy, the authors have been able to film the enzyme in action, 53 thus providing an invaluable contribution for the understanding of its catalytic mechanism at the molecular level.

Nuclear Magnetic Resonance Spectroscopy
Nuclear magnetic resonance spectroscopy is an alternative method used to determine the three-dimensional structure of proteins. In contrast to crystallography, NMR experiments are performed in solution and thus can allow direct observation of the physical flexibility and the dynamics of their interactions with other molecules. Also, in contrast to X-ray crystallography, NMR studies of biomacromolecules provide an ensemble of low-energy conformations of the molecule that satisfy specific geometric criteria determined by the experimental protocol. While each conformation alone can be thought of as a static snapshot of the molecule, together they provide a dynamic representation of the protein. To obtain structural data for biological-scale molecules, multidimensional multinuclear NMR is used where the off-axis cross peaks encode information regarding the interaction between nuclei in the molecule, which in turn relates to the distances between atoms. It is key to extract The FAD cofactor and a covalently modified NADP + (labeled NADPO) identified in the 1.05 Å resolution crystal structure (PDB code 1LQT) are highlighted. 48 Shown in blue is the 2Fo -Fc map contoured at the level of 2σ above its mean. Figure was prepared using PyMOL. 168 as much information as possible about these interatomic distances; thus, each peak must be assigned to its particular nucleus in the biomolecule. Spectral analysis is primarily focused on the association of the position of the individual NMR lines in the spectrum (chemical shift) to a specific nucleus ( 1 H, 15 N, or 13 C) of the protein. Although the chemical shift for an atom is primarily determined by the atomic connectivity of the amino acid residue, it can also be affected by the interactions with the solvent and by the involvement of that residue in the protein's secondary and/or tertiary structure. Various strategies of applying pulse sequences, magnetization transfer, and isotopic labeling ( 15 N and 13 C) are employed to resolve the peak assignments. As a result of dramatic advances of the technique, in terms of both hardware and software, the range of protein size amenable for NMR structure solution has been significantly extended to 80-100 kDa. 54,55 High resolution structure solutions of proteins embedded in membrane systems have also become possible. 56 Recently, a new method of determining the structure of complexes in solution based on the changes in chemical shift that occur when a ligand binds to a receptor has been described. 57 The strategy used to derive 3D structure from NMR data is to start with a randomly folded structure derived from the primary sequence. Then the structure is optimized using either a molecular dynamics/simulated annealing protocol or distance geometry against the NMR-derived distance and torsion angle data (restraints) combined with empirical data (e.g., known bond lengths and angles) to reach a minimum potential energy. In general, there will be a family of structures satisfying the NMR constraints. Interestingly, the rmsd (root-mean-squared deviation from the minimum energy structure) turns out to be different for different regions in the structure. For example, flexible regions without secondary structure, e.g., loops, show a relatively larger deviation because these regions have fewer constraints.
Protein functionality can usually be associated with backbone and side chain dynamics. Figure 3 illustrates backbone dynamics data derived from 15 N-1 H NMR experiments on mutant Sm14-M20(C62V), 58,59 a fatty acid binding protein found in Schistosoma mansoni. S. mansoni is a significant parasite of humans and one of the major agents of schistosomiasis. These data show a direct correlation between the decrease of the protein flexibility, mainly in the loop regions, and ligand binding. There are larger differences in the NMR-derived generalized order parameter S 2 between the apo and holo forms of the protein in these regions and near the residues involved in binding (see Figure 3). 60 It is worth emphasizing that protein movements span a broad range of time scales and NMR is the only currently available technique that can monitor and discriminate these molecular processes. Nuclear spin relaxation rate measurements report on fast (<ns) and slow (µs to ms) internal motions and can allow unraveling of the intermingled effects of static disorder, coherent intramolecular motions, and chemical exchange processes but can also determine molecular rotational diffusion (5-50 ns). The determination of the rates of magnetization transfer among protons with different chemical shifts and of proton/deuterium exchange, instead, reports on very slow movements of protein domains (ms to days) and provides insight into conformational exchange processes. 61 On the basis of the results obtained from these NMR experiments, it is possible to characterize both the thermodynamic and kinetic features of interactions with other molecules (either macromolecules or low molecular weight ligands). Eisenmesser et al. 62 demonstrated that dynamics can be monitored during enzyme catalysis at multiple sites by means of newly reported NMR relaxation dispersion experiments that probe molecular motions in the µs to ms time scale with higher sensitivity than other relaxation experiments. 63 By use of this technique, the motions in free cyclophilin A were compared with those during turnover; this comparison showed that the motions are collective, propagating from the active site to remote sites.
Some proteins, such as partially folded polypeptide chains, are difficult to crystallize, and even if crystals can be obtained, the chain segments that are disordered in solution either will be ordered by intermolecular contacts in the crystal lattice or will remain disordered in the crystal. In these cases, NMR is capable of providing structural information and an indication of the rate of the processes that mediate the transitions between the discrete structured states present in the conformational space spanned by the ensemble of NMR conformers. 64 While there are intrinsic experimental difficulties and limitations with NMR structure determination, its ability to reveal flexibility and the fact that it is obtained in solution at conditions more similar to the conditions in vivo, where biomolecules are active, are compelling. Damm and Carlson 65 showed, in a comparison between 28 NMR structures and 90 crystal structures of HIV-1 protease using a method termed multiple protein structure (MPS), that NMR-MPS is better able to represent protein flexibility because the ensemble of NMR-derived conformers possesses a greater structural variation.

Computational Molecular Dynamics
The experimental data derived from X-ray crystallography and nuclear magnetic resonance provide a framework for understanding the structure and flexibility of proteins and other biomacromolecules but cannot illuminate all of the details of the motions that these molecules undergo. As stated above, only relatively long-lived and more populated (i.e., lower energy) states will be observed and recorded by these methods. It is not unlikely that some ligands may bind to and stabilize higher energy states that are, for example, transitional between lower energy conformations. In addition, several experimental limits, i.e., protein molecular weight, protein solubility, time required for the analyses, crystallization difficulties, etc., affect either or both of these experimental techniques and prevent them from being applicable to all biomacromolecules of interest. In particular, the membrane-bound proteins, e.g., G-protein-coupled receptors, have proven very difficult to crystallize for examination by X-ray crystallography and are often too insoluble for NMR analysis (vide infra). Computational approaches, and in particular, molecular dynamics (MD), can generate large numbers of protein conformations to be used in docking analyses and virtual screening experiments. 66,67 The impact of flexibility on docking will be discussed in the next section, but it is clear that just considering ligand or protein side chain flexibility may be inadequate for some (or many) cases and that coupling molecular dynamics with docking in some manner is better able to represent the accessible conformational space. [68][69][70][71] Molecular dynamics approaches are often characterized by scale, depending on the nature and size of the system to be analyzed, with reference to the ubiquitous compromise between speed and accuracy (or level of detail). Both coarse-grained and atom-level simulations will be discussed below, followed by a description of a relevant and current problem in drug discovery requiring multiscale MD analysis. Figure 4 illustrates for the estrogen receptor R (ERR) how more than one of these scales can be relevant within the same modeling system. We must acknowledge a key technological advance that underlies all aspects of this research, but particularly MD. It is interesting to note that the first MD simulation of a biomolecule is usually attributed to McCammon, Gelin, and Karplus who generated a 9 ps trajectory of the bovine pancreatic trypsine inhibitor protein (886 atoms) in 1977. 72 Recently, a 50 ns dynamics simulation of an entire virus containing one million atoms was reported. 73 In simple terms this is about a 10 7 -fold performance increase in about 30 years due to advances in parallel computer architectures coupled with algorithms and software able to exploit them efficiently. This trend is expected to continue for the foreseeable future.
Coarse-Grained Approaches. Coarse-grained models, where some of the fine atomistic details are usually smoothed over or averaged out, were developed for investigating longer timescale dynamics. Information on general protein flexibility can be obtained by using simple normal-mode analysis with simple Hooke's-law-like potentials as shown in eq 1, where it is assumed that biologically important deformations (including those resulting from ligand binding) follow one or several of the natural deformation modes of the protein: where k d is a distance-dependent (or distant-independent) force constant, d ij is the distance between residues i and j, and d ij 0 is the distance between residues i and j in the experimental structure. Potentials similar to those in eq 1 can be easily implemented into Brownian dynamics algorithms, which present several advantages with respect to normal-mode analysis. For example, (i) the effects of time can be incorporated; (ii) several protein molecules can be considered simultaneously, opening the possibility of studying flexibility related to protein-protein interactions; (iii) ligand-protein interactions can be generated during the trajectory; and (iv) solvent effects can be introduced by adding suitable residue-based potentials. An alternative to Brownian dynamics is discrete dynamics, where harmonic potentials are replaced by square wells delimited by finite or infinite energy walls. The advantage of discrete dynamics is that no integrations of Newton's equations of motions are required, since the residues move in a space of constant velocity until they hit an energy wall, at which time an elastic collision is assumed.
While the coarse-grained dynamic techniques can provide surprisingly accurate information on general protein dynamics, including deformation leading to the formation or reshaping of binding pockets, the finer details are lost, which can lead to erroneous results when the resulting models are used as targets in structure based drug discovery.
Atomic-Level Molecular Dynamics. Atomic-level-of-detail molecular dynamics simulations allow the simultaneous representation of both small atomic fluctuations and large protein movements. The main limitation of this technique is its computational cost, which has limited its general application. A critical issue in this approach is the extent to which the MD simulation is able to sample the conformational states accessible to the protein (or protein/ligand complex). Indeed, despite vast improvements in computer power (and the parallelization of many MD codes), most of the published MD simulations are still limited to just a few ns, where the probability of sampling multiple biologically significant minima is rather low. The true extent of the achieved sampling should be evaluated in this case. Several computationally inexpensive techniques can be employed once the MD trajectories have been collected. For example, essential dynamics allows filtering of noise from essential motions, and cosine analysis of the eigenvectors may allow estimation of the separation of "real" motions from random diffusion. Even simpler is comparing the experimentally obtained crystallographic B-factors (vide supra) with the rmsd fluctuation per residue as extracted from the MD trajectory. A qualitative correlation between B-factors and rmsd fluctuation is a sign that the MD simulation has only wandered around the crystallographic minimum and not effectively sampled conformational space. In other words, the MD is producing little more than "wiggles" and not generating new energetically accessible conformations that can be used as targets for docking or virtual screening.
The very clear necessity of documenting protein flexibility and the reliability of MD simulations has recently resulted in the MODEL (molecular dynamics extended library, http:// mmb.pcb.ub.es/MODEL) project. MODEL is a massive plan to provide the community with a database of protein flexibility covering all unique proteins in the PDB. At present it contains information (∼10 ns simulation time) on the dynamics behavior in water for around 1500 proteins, covering most of cluster-90 in the PDB; i.e., it is representative of all structurally known proteins. The MODEL relational database contains more than 12 terabytes of data and provides atomic-level-of-detail samplings, usable to improve ligand docking, to analyze the existence of hinge points, to study ligand diffusion to binding sites, and to predict potential deformation movements that can alter the protein structure. MODEL is now being extended to cover protein-protein complexes and to analyze ligand-induced changes in structure and dynamics for a subset of proteins for which both bound and unbound structures are known. This project has only been possible with the newest generation of supercomputers and enhancements in MD algorithms that can now produce state-of-the-art trajectories approaching biologically relevant time scales (µs to 1 ms). In fact, systematic analysis of the dynamics of the entire proteome is now possible. For applications in rational drug design, the accessibility of an ensemble of structures for a given protein instead of only a single conformer multiplies the possibility of success in docking and virtual screening procedures.
Ensemble docking (vide infra) employs a small number of carefully chosen receptor conformations, obtained from X-ray, NMR, or MD simulations. The goal is to increase the probability that the ligand will dock, because there are multiple, hopefully somewhat diverse, targets. While experimental conformations from multiple X-ray or NMR derived structures would be most desirable, molecular dynamics simulations are more accessible for most proteins and can generate protein "receptor ensembles" adequate for docking ligands for lead discovery or refinement. 67 Ensemble docking will be a topic for the next section, but there are a few MD considerations that should be addressed here. First, the extent of diversity in conformations required for successful docking may be modest as long as a number of distinct conformational states are generated. The results thus obtained, however, must not be overinterpreted, especially if ligands with significantly different profiles are studied, and significant conformational changes are expected to take place. In this case, the extent of the achieved sampling should definitely be evaluated. A second potential element of bias may be due to the conditions under which the MD simulations are carried out. For example, an ensemble docking carried out on snapshots of an MD simulation of a protein complexed with a given chemotype will more favorably score ligands structurally related to that chemotype compared to those of other chemotypes. This same bias also applies to experimentally determined conformations, again suggesting caution that the results not be overinterpreted.
Multiscale MD Simulations: G-Protein-Coupled Receptors. Multiscale simulations, i.e., involving both coarse-grained and atomic level-of-detail molecular dynamics, are virtually the only way for structurally investigating the biological properties, function, and molecular interactions of signaling proteins like GPCRs for which experimental structural data are not available.
Here, GPCRs will be used to illustrate the vital structural and functional information that can be obtained when applying MD simulations. GPCRs are allosteric proteins that transform extracellular signals into promotion of nucleotide exchange in intracellular G proteins. As such, they are composed of regions of high stability (low flexibility) and regions of low stability (high flexibility) that communicate with each other by transmitting signals between the extracellular region and the distal intracellular region. GPCRs exist as complex statistical conformation ensembles. 66,[74][75][76] Their functional properties are related to the distribution of states within the native ensemble, which is differently affected by ligands, interacting proteins, the lipid membrane environment, and/or amino acid mutations. 66,74,77 Thus computational modeling of GPCR function requires effective integration of supramolecular modeling and multiscale simulations.
Difficulties in understanding GPCR mechanisms of function are perhaps primarily due to the lack of high resolution structural information on these proteins. The data currently available are for rhodopsin, the cornerstone of family A GPCRs in its dark (inactive) state, 78 and the human 2 -adrenergic receptor-T4 lysozyme fusion protein (at 2.4 Å resolution) bound to the partial inverse agonist carazolol. 79,80 These structural models, especially rhodopsin, are suitable templates for comparative modeling of the many homologous receptors. 66 Additionally, the structural model of a photoactivated deprotonated intermediate of bovine rhodopsin, reminiscent of metarhodopsin II (MII) (PDB code 2I37, 4.15 Å resolution), 81 has been recently released. Interestingly, this structure revealed unexpected structural similarities between the dark and photoactivated structures, in contrast to earlier predictions of large activation-associated conformational rearrangements. 81 Much effort has been applied in the past decade or so in elaborating a computational strategy to infer the mechanisms of intra-and intermolecular communication in GPCRs of the rhodopsin family. 66,82 The computational approach consists of comparative or ab initio modeling, ligand-protein and protein-protein docking followed by comparative MD simulations and analyses. Extensive MD analyses are instrumental in inferring the most significant structural features that make the difference between free and bound states of the proteins. Reducing the system's degrees of freedom by employing implicit membrane models and intrahelix distance restraints facilitates detection of the essential motions or structural changes that may correlate with receptor and G protein functionality. Potential mechanisms of ligand-or mutation-induced receptor activation and of receptor-induced guanosine diphosphate (GDP) release from the G protein can be obtained by comparing MDgenerated average structures that are representative of different receptor and G protein states.
MD simulations of the communication between the sites of activating/inactivating mutations or of ligand binding and the putative G-protein-coupling domains in members of the glycoprotein receptor subfamily, 66,82-84 as well as receptors for seroto-nin, 85 melanine-concentrating hormone, 86 and thromboxane A2 (TXA2), 87 suggest that activating ligands (agonists) or mutations communicate with a distal cytosolic receptor domain near the highly conserved "E/DRY" motif by inducing a perturbation in the interaction pattern of the E/DRY arginine. This increases the solvent accessibility of some amino acids compared to that in inactive forms. 66,82,83,87 This communication is two-way in the sense that perturbation in the cytosolic domains can also be associated with structural changes in the extracellular receptor portions at the agonist binding site. This has been demonstrated for the κ opioid receptor. 88 Likewise, activating mutations in different sites of the receptor helix bundle have the same effect. 66,[82][83][84] In spite of the tremendous structural diversity of the different GPCR agonists, a few critical interactions appear to be needed for activating ligands in establishing proper communication with the G protein coupling domains. 66,82,85,87 The role of ligand binding, independent of its activating or inhibitory effect, does not appear to be limited to conformation selection but instead promotes new conformational states unlikely to be explored by the unbound receptor forms. 66,82,85,87 Developing an understanding of this complex and dynamic interplay will ultimately lead to improved design criteria for drugs targeting a wide range of disease states involving GPCRs.
In a recent study of the TXA2 receptor, the increase in solvent accessibility around the E/DRY receptor motif in response to agonist binding appeared to favor the docking of the C-terminus of the Gq R subunit of the G protein between the cytosolic ends of selected receptor helices. 87 The establishment of interactions between agonist-bound receptor and G protein is, in turn, instrumental in favoring the formation of a GDP exit route between selected portions of the R-helical and Ras (GTPase)like domains ( Figure 5). 87 This is the first example in which comparative MD analyses highlighted the potential players in the communication between the binding site of the receptor agonist and that of GDP, which are almost 70 Å apart. 87 Practical Considerations in Applying MD. Molecular dynamics seems to represent the most affordable and accessible method to produce many protein conformations at reasonable cost. Indeed, thanks to the availability of software and adequate hardware, MD has become a very popular tool for SBDD and other modeling tasks. It may even seem that MD is being overutilized when it is applied to problems where simple energy minimizations would suffice for structure optimization. Importantly, the setup of MD simulations is certainly far from trivial, as is the interpretation of results. 89 There are a number of issues related to widespread usage of molecular dynamics that should be briefly described here as a caution to the casual MD user. As mentioned above, MD remains an expert system; medicinal chemists and other nonspecialists should not view the computers or MD software as the legendary "black box".
First, MD simulations are often referred to as "computer experiments" and should be regarded as such. Second, the lack of standards in file formats for structures, force fields, and trajectories is a major stumbling block to universal acceptance of the technology. Third, MD simulations of proteins or other biomolecules require many conformational degrees of freedom, which is further complicated when the environment of the molecules, e.g., solvent or membrane, is also modeled. Fourth, and this is of particular interest for application to docking to relevant target structures, there are technical difficulties in accurately sampling the conformational space of large macromolecules where the energetic and entropic barriers are high with respect to the thermal energy at physiological temperatures. Fifth, it is not obvious to the nonexpert which force field and/ or MD program should be used for a particular system. Commonly used force field families include AMBER, 90 CHARMM, 91 GROMOS, 92 and OPLS. 93 While studies have shown that in most cases all these force fields give qualitatively the same results, there can be subtle differences, and they are all approximations to the real potential energy surface. Common MD programs for simulating biomolecules include CHARMM, 91 AMBER, 90 GROMACS, 94 and NAMD. 95 The factors that determine the choice of the program include the force field compatibility and also the operation of the code on the particular computer hardware. Two aspects of an MD program's suitability for a hardware configuration are considered: the "performance" and the "parallel scalability". The first is measured in terms of picoseconds of simulation time per day on a single processor, while the second is a function of the number of computing nodes or processors allocated, normally expressed as "speedup", which is the ratio of the performance for N processors with respect to that for one processor. Most MD programs show good parallel scalability only up to a certain number of processors, after which  80 TXA2 and the G protein R-, -, and γ-subunits are, respectively, colored in green, gray, violet, and magenta. The GDP molecule is colored by atom type. Red dots indicate the solvent accessible surface of GDP, which is exposed to solvent upon receptor binding. 80 Only the intracellular half of the receptor is shown. communication costs between nodes begin to degrade performance. GROMACS often is cited with the best performance for small numbers of nodes, while NAMD currently appears to have the best scalability and is more suitable for systems with many processors. Lastly, it is not just total CPU time that should be considered in MD but also the "global elapsed time" that is "real" time used before final results are obtained.

Molecular Docking and Virtual Screening
There are many aspects of molecular docking and its variations like virtual screening that deserve significant attention with respect to the evolution of SBDD, but most are beyond the scope of this Perspective and have been discussed elsewhere. [96][97][98][99] In particular, scoring function development, e.g., consensus methods, 100-102 QM/MM-based free energy perturbation methods that include high quality quantum-derived parameters for novel small molecules, as well as desolvation terms, [103][104][105][106][107] or empirical free energy functions, 108-111 has received recent attention. Here, while acknowledging the critical future role for better scoring functions, we are focusing on the interface between docking and flexibility.
It is well-accepted that proteins and in particular their active sites do not exist in single frozen conformations, perfectly sculpted to the shape of the incoming ligand. As described above, the beauty of X-ray crystal structures belies the uncertainties in both the technique and the molecular structure itself. However, despite this knowledge, docking experiments often begin with the false assumption that the protein can be represented by a single structure. This assumption can remain successful when there is no significant induced fit structural rearrangements upon ligand binding 30 (or when the active site has been preformed to recognize a particular class of ligands). Nonetheless, much effort over the past several years has been expended in developing new algorithms and docking programs that allow flexibility in fitting and scoring flexible ligands or ligand candidates in flexible binding sites. The first and considerably less time-consuming approaches included protein flexibility in docking experiments by simulating the possible movements of active site side chains; e.g., version 4 of the popular program AutoDock 29-31 introduced the ability to include explicit protein side chain flexibility. 34,112 GOLD uses a similar approach by allowing a limited number of protein side chains to sample alternative rotameric conformations 37 and also rotating terminal hydrogen atoms to optimize hydrogen-bond interactions. FlexE selects from alternative side chain conformations observed in other crystallographic structures to model flexibility. 32 SLIDE uses mean-field optimization to rotate protein or ligand side groups sufficiently to remove intermolecular van der Waals overlaps while docking. 40 Other approaches involve constrained geometric simulations, 113 invoking elasticity network theory, 114,115 or ensemble docking to structure families arising from dynamics, rotamer libraries, NMR, or X-ray crystallography, 33,35,[116][117][118][119][120] Monte Carlo methods, 121 protein structure prediction techniques, 122,123 or virtual alanine scanning and refinement. 124 DOCK3.5.54 38 evaluates complementarity of "components" or independently moving regions of the receptor. These components in combination give rise to a comprehensive ensemble of receptor conformations, yet the algorithm scales linearly with respect to the receptor's degrees of freedom. Other approaches use MD to effectively optimize the docked solutions obtained from rigid-receptor docking experiments in a postprocessing step. 125 Finally, an algorithm employing "flexibility trees" has been recently described. 126 This approach greatly reduces the computational overhead for modeling receptor flexibility during docking.

Side Chain Flexibility.
Kuhn and co-workers demonstrated the importance of including local flexibility with a study that docked the bound conformation of a series of known ligands to the ligand-free (unbiased) protein conformations. Only 9 out of 42 known thrombin ligands and 9 out of 15 glutathione S-transferase (GST) ligands could be docked without steric clashes when side chain flexibility was neglected. 40 However, 90% of the same ligands could be docked within 1.3 Å rmsd of the correct atom positions, on average, when small-scale side chain flexibility was modeled using the docking and screening tool SLIDE. 127 Beyond the necessity of modeling flexibility to capture known inhibitors or substrates in protein complexes, accurate sampling of low-energy protein conformers allows these structures to be used as alternative targets for inhibitor design and screening.
Fortunately, the challenge of flexibility modeling is simplified enormously by the fact that most protein side chains undergo only small motions during ligand binding. In a further study of 63 protein-ligand crystallographic complexes, 83% of all active-site side chain bonds that rotated in 32 thrombin-ligand complexes were observed to rotate 15°or less relative to the ligand-free structure. The same was true for 91% of the GST side chain rotations in 13 complexes and 75% of side chain rotations in 18 other, nonhomologous complexes. 40 This is consistent with the finding that ligand binding often induces strain or nonrotamericity in side chains. 128 The dominance of small side chain rotations is very good news because relatively simple energy minimization or steric optimization procedures are often sufficient to model them. 129,130 Why do active-site side chains typically move so little upon ligand binding? Studies performed on 30 low-homology protein complexes and on the corresponding ligand-free structures indicated that preservation of direct intraprotein hydrogen bonds is the main reason (see Figure 6a). 131 About 75% of all intraprotein hydrogen bonds in these sites are preserved upon ligand binding, and the percentage of main-chain hydrogen bonds preserved is even higher (typically 85-100%), as indicated by the values for Ala, Ile, Gly, and other residues lacking side chain hydrogen-bonding groups. However, the picture reverses when water-mediated hydrogen bonds in ligandbinding sites are considered: from the same 30 pairs of ligandbound and free structures (Figure 6b) 50-80% of watermediated intraprotein hydrogen bonds are observed to break upon ligand binding. 131 Thus, ligand-binding sites can be considered to be partitioned into preorganized regions consisting of directly hydrogen-bonded groups within the protein, and other regions that are readily reorganized because of the plasticity of their water-mediated hydrogen bonds. This allows a simplifying divide-and-conquer strategy in which most of the flexibility sampling effort in ligand binding sites can be focused on the hydrogen bonding groups that are not yet satisfied by intramolecular hydrogen bonds, exposure to solvent, or interaction with the ligand.
Elastic Network Models and Constrained Geometric Simulations. While molecular dynamics and related modeling approaches are suitable for investigating subtle movements and conformational substates, larger conformational changes are not usually observed in the simulations unless particularly long time scales are studied. Proteins are often able and required to undergo significant movements to carry out their catalytic activity or to interact with ligands and/or other macromolecules such as been demonstrated by ion channels, allosteric proteins, or heat shock proteins. [132][133][134] In order to reproduce these significant conformational adjustments starting from unbound receptor structures, Gohlke and co-workers developed a twostep method based on recent developments in rigidity and elastic network theory. 135 In the first step, static properties of the macromolecule are determined by decomposing the molecule into rigid clusters using the graph-theoretical approach FIRST. 136 In the second step, the dynamic properties of the biomolecule are revealed by the rotations-translations of blocks approach 137 using an elastic network model representation of the coarse-grained protein. 138 On a data set of 10 proteins that show conformational changes upon ligand binding, the predicted directions and magnitudes of motions were shown to agree well with experimental observations, demonstrating that the motions presumed to be "ligand-induced" are already well-defined in the unbound receptor structure.
Although the constantly increasing availability of computational resources has mitigated this issue somewhat, MD is still extremely computationally expensive. The technology is also an expert system that is neither simple nor obvious in its application and sometimes prone to convergence problems. Computational time can be significantly reduced by performing constrained geometric simulations that use an efficient algorithm able to reproduce the motion of flexible and rigid parts by ghost template rearrangements. 139 The associated program, FRO-DA, 139 also uses natural coarse-graining for the treatment of rigid regions identified by FIRST. 136 When applied to the protein-protein interface of interleukin-2, FRODA highlighted transient pocket formation in agreement with experiments 140 and more elaborate MD studies (see Figure 7). In fact, when simulations were started from the unbound state, interface configurations were sampled by both methods that came as close as 1.0 Å rmsd to the bound conformation. These results strongly support the "conformation selection" model, 7 and the configurations may well be used in subsequent flexible docking approaches.
A number of docking programs, like AutoDock, use grid maps to represent the three-dimensional patterns of atomic affinity, electrostatic potential, desolvation free energy, etc. around the target molecule. These programs perform an interpolation for each atom in the ligand to estimate its interaction energy with the target, generally assuming the target is rigid. Some slight conformational flexibility in the target can be added by allowing some overlap in the radii of the grid points. However, a more sophisticated and novel paradigm for fully flexible protein-ligand docking can be proposed, based on an elastic representation of potential grids in the binding pocket region of a receptor (Kazemi, Krüger, and Gohlke, unpublished results). Protein conformations can be sampled during docking without the need to recalculate potential grids. Instead, grid points are moved along with the binding pocket region according to the laws of elasticity. This approach was tested on a comprehensive data set of protein targets representing different classes of conformational changes during ligand binding. Notably, compared to docking to the apo conformation of the proteins, ligand binding mode predictions were greatly improved when grids deformed to a bound protein conformation were used instead. In addition, not only can side chain and backbone movements be accounted for, but in principle any pairwise scoring functions can be used.
Docking with an Ensemble of Protein Conformations. While there are a small number of cases where experimental data from X-ray crystallography or NMR comprise an ensemble of protein structures (the "receptor ensemble") with enough range to fully explore conformational space, computational ensembles from molecular dynamics are most often used for generating protein ensembles adequate for lead docking. The caveats and limitations on MD described above must be considered. In particular, the scale of movement in the ensemble must be monitored to ensure that it is consistent with the structural diversity of ligands to be docked. This method of docking Figure 6. (a) Direct intraprotein hydrogen bonds tend to be preserved upon ligand binding. Ligand-free and ligand-bound crystal structures for 32 proteins with low pairwise sequence identity (<30%) were analyzed. The percentage of direct, intraprotein hydrogen bonds preserved upon ligand binding (red) is compared with the percentage broken (green) for each residue type, including main-chain and side chain hydrogen bonds. Typically 75% or more of direct hydrogen bonds are preserved. (b) Intraprotein water-mediated hydrogen bonds tend to break upon ligand binding. Analysis was performed for intraprotein hydrogen bonds mediated by one water molecule. The trend is opposite to that found for direct hydrogen bonds; 50-80% of water-mediated hydrogen bonds are broken upon ligand binding. Details are as follows: All residues containing an atom e4 Å from the ligand (in the ligandbound structure or e4 Å from the ligand superimposed into the ligandfree structure) or within 4 Å of a water molecule bridging between the protein and ligand were kept for analysis. Intraprotein hydrogen bonds were initially identified as having a donor-acceptor distance of e3.6 Å, hydrogen-acceptor distance of e2.6 Å, and donor-H-acceptor angle of 90-180°. This set was screened by a hydrogen bond energy function 169 evaluating detailed atom chemistry-dependent features to ensure that very weak hydrogen bonds were excluded. A total of 60 of the 64 structures had resolution of 2.2 Å or better, and the remaining 4 had resolution of e2.6 Å. the set of ligands into all derived discrete conformations of the receptor is termed "ensemble docking". 117 In the most simplistic variant, ensemble docking calculations are carried out sequentially for one protein conformer after the other, which multiplies the required calculation time by the number of considered conformers. Alternatively, the protein conformers may be combined to an average representation, which is relatively straightforward with grid-based docking methods; i.e., combine two or more sets of grid maps from an ensemble of different conformations of the target. Two ways of combining these grid maps, an energy-weighted average and a geometric-weighted average of the interaction energy between the ligand and the receptor, were described by Knegtel et al. 141 These and three other methods of combining maps, simple mean, grid point minima, and simple Boltzmann-weighted average of the interaction energies, were evaluated, and the Boltzmannweighted average was shown to be best able to model both variations in conformation of the very plastic HIV-1 protease and the presence or absence of structural water molecules. 35 One considerable advantage of combining many different conformations of a target protein into a single grid is computational speed as compared to individual docking to multiple targets. There are, however, pitfalls to this approach: (i) there are limits with respect to the tolerated structural differences among the conformers being averaged; (ii) the average/ composite structure as represented by the grid may not be a physically "real" target, and the danger exists that "artificial" ligand poses are generated that are reasonable only for the averaged representation. As a further variant, there is an "in situ cross-docking" approach where multiple protein structures can be addressed simultaneously in a single (grid-based) docking run. 116,142,143 Although conceptually simple, in situ crossdocking can lead to significant speedup over conventional serial cross-docking approaches, and the simultaneous optimization across multiple protein structures allows for a more direct selection of the optimal binding site. The method can be applied simultaneously to proteins with a wide range of structures, but there are limitations in the number of structures that can be examined in each calculation.
Given this evidence, does it actually make sense to address issues of flexibility and induced fit by analyzing a set of pregenerated protein conformations? One can think of induced fit as a process of preferential selection of conformations and corresponding shifts of equilibria. 144,145 In this view, states with an appropriately formed binding pocket, which in the absence of a ligand may be very weakly populated, are preferentially selected by the corresponding ligand because it stabilizes those states, resulting in a net gain in free energy compared to other protein conformations in the ensemble. The conformational equilibrium is thus shifted toward the binding-competent conformations, and those conformational states become predominant and experimentally observable. Consequently, using pregenerated protein conformations to deal with protein flexibility appears reasonable, at least in principle. With access to the entire ensemble of low-energy protein conformers and to a reliable free energy function for calculating the affinity based on the protein-ligand interactions and the conformational contributions from the protein, predictions of the preferred geometry of the complex should then reduce to a "simple" optimization task. Unfortunately, neither of the two conditions are met in reality, as there is almost never access to a complete (or even sufficiently representative) ensemble of protein conformers, and current free energy functions used in docking are not sufficiently reliable to provide accurate scores for alternative binding modes for different protein conformers (especially if the free energy differences are small and/or dominated by entropic contributions). Accordingly, the success of induced fit docking with a structural ensemble of conformers is often limited to cases where the available protein conformers from complexes with particular ligands are indeed (and likely by chance) representative for the complexed state. This most often occurs when the ligands being investigated are similar to the ligands in the structural models comprising the ensemble. Also, the energetics must be such that the scoring function is competent to discriminate between alternative binding modes.
A pioneering development in modeling protein flexibility for docking calculations using MD to generate the conformation ensemble is the relaxed complex method from the work of Figure 7. Adaptive nature of the protein-protein interface of the cytokine interleukin-2. (a) Overlay of the unbound (red) (PDB code 1M47) and bound (green) (PDB code 1M48) protein structure of IL-2 together with a small molecule (cyan) that buries into a groove not seen in the free structure of IL-2. Residue Phe42, whose resultant movement after small molecule binding opens the groove, is depicted in stick representation. (b) Overlay of the protein-protein interface region (opaque sticks) of IL-2 in unbound (red) and bound (green) form. In addition, a snapshot from a FRODA simulation started from the unbound state is shown (yellow), which demonstrates that the movement of Phe42 can even be observed in the absence of the ligand, leading to a transient pocket opening. Interestingly, regions for which no movement was observed by experiment (around Glu60 and Ala106) also remain immobile during the simulation (Pfleger, Metz, Kopitz, and Gohlke, unpublished results).
McCammon et al. [68][69][70] In this approach, molecular dynamics simulations of the targets are performed prior to docking and different conformations are selected; the simplest is to choose conformations at regular time intervals, but it is also possible to select the most structurally diverse conformations from all conformations generated. It is then a matter of docking the ligand of interest to each of these different "snapshots" from the molecular dynamics trajectory. At the end, a histogram of binding energies and one or more different binding modes are obtained. This approach, using AutoDock as the docking engine, is credited with the discovery of a novel binding trench in HIV integrase, 71 which laid the groundwork for the development of the first clinically approved integrase inhibitor raltegravir. 146 However, both the success and limits of ensemble docking can readily be illustrated with an example from the wellcharacterized aldose reductase system, an enzyme showing pronounced conformational adaptations upon ligand binding. 147 On the basis of dozens of high-resolution crystal structures and extensive molecular dynamics simulations, detailed knowledge of the binding site conformations is available. [148][149][150] The binding pocket is characterized by a very stable region surrounding the catalytic site and a highly mobile area close to the "specificity pocket". Although very localized, this mobility is mediated by side chain rotations and a stretch of flexible backbone. Essentially, three different protein conformers (and binding modes) are observed, with minor variants for two of the three major conformers. One of the conformers has to date only been observed in complex with the inhibitor tolrestat (1), thus representing a unique binding mode (Figure 8). Two recently synthesized tolrestat analogues (2, 3) with high affinity 151,152 were investigated to ascertain whether they adopted a similar binding mode. While docking of tolrestat to all three aldose reductase conformers correctly reproduced its binding mode in the open specificity pocket, docking of 2 and 3 predicted preferred binding to a conformer with a closed specificity pocket, which was confirmed by subsequent crystallographic analysis. Interestingly, the actual binding mode of 3 is very similar to the docking prediction, but compound 2 shows a hitherto unobserved binding mode with an entirely new conformation of the aldose reductase binding site not involving the specificity pocket ( Figure 9)! This new conformation features the unexpected opening of a salt bridge involving a lysine side chain that is not obviously compensated by a new protein-ligand contact. While this broken salt bridge likely explains the failure to properly dock 2 in aldose reductase even with explicit side chain flexibility, more importantly, this brings into focus the larger reality that, even with supposedly well-characterized systems like aldose reductase, new binding modes are possible and one can never be certain that a precalculated ensemble has exhaustively explored flexibility.
A slightly different approach was illustrated in a recent study carried out by Orozco and co-workers 153 on p38 MAP kinase, a serine/threonine kinase involved in major signal transduction pathways and a key factor in the modulation of the level of tumor necrosis factor-R. This study was designed to understand the binding of a new pyridinyl-heterocycle family of inhibitors and is a very relevant example of combining results from X-ray crystallography, homology modeling, quantum chemistry, classical docking, and molecular dynamics. Figure 10a illustrates the four potential binding modes of one member of this series, 4, a lead compound with excellent inhibitory activity (161 nM) 153,154 that is presumed to bind at the ATP binding site of p38 MAP kinase. While a crystal structure for the MAP kinase complex with 4 is not available, even one at relatively high resolution would not necessarily resolve these four cases. 153 First, a high-level ab initio study of the N1-H/N2-H tautomerism in the pyrazolopyridine group suggested that the N1-H model (see Figure 10a) is energetically favored over the N2-H model. Docking analysis suggested two binding modes ( Figure  10a, top and bottom row) with fairly minor differences in protein-ligand interactions. The two modes and two tautomers were thus subjected to 2 ns MD simulations where monitoring of the rmsd of 4 from the binding site, the ligand-protein interaction energy, and the key hydrogen bond between the pyridine nitrogen and the amino group of Met109 clearly favors the first binding mode (and N1-H) as illustrated in Figure 10b. This approach was also able to quantify the interactions of each protein residue with X (see Figure 10c).

Summary and Perspectives
The history of developments in computer-aided drug discovery (CADD) was recently reviewed by John Van Drie 155 in the context of the Bezdek curve showing the progression of the technology from "naive euphoria" to "peak of hype" through the "depths of cynicism" to "true user benefits" approaching the "asymptote of reality". The current environment in CADD is an apparently healthy mixture of multiple paradigms, each  having gone through the hype and cynicism phases and rebounded with their somewhat diminished cadre of true believers to being generally beneficial. The technologies with long-term benefit are generally those that have adapted by adding value from complementary techniques, much as combinatorial synthesis has found a niche by becoming targetspecific. In a similar way, structure-based drug design has matured by taking advantage of emerging technologies over the years. It is not certain whether the first protein crystallographers, who were generally physicists, were thinking about SBDD as they were painstakingly collecting and processing their data. Certainly, however, it was only a few short years before the enormous potential of understanding biology through structure emerged. 1,156,157 Currently, through the structural biology consortia and their high-throughput crystallography, there is an explosion of data available as targets, but the functions of some of these new proteins remain poorly understood. This is another challenge not in the scope of this Perspective. An emerging development that we do expect to have a profound impact on SBDD is the consideration of target flexibility as part of the design process. We have reviewed above many of the experimental and computational approaches that are currently in use or under development. We offer here some perspectives on what still needs to be accomplished and the resulting benefits of incorporating flexibility as a key component in drug discovery methodology.
First and foremost, there is the need for more and better quality experimental data on flexibility of protein systems. Indeed, this is a major challenge for both X-ray crystallography and NMR spectroscopy. Two barriers exist: (a) The first barrier is time resolution. NMR is able to resolve rapid small scale motions and with isotope exchange very slow vibrations, and X-ray crystallography, while it clearly cannot deal with large motions within a single crystal, can detect with synchrotron radiation some high frequency vibrations. Nonetheless these are lengthy, complex, and sometimes expensive experiments. (b) The second barrier is structure resolution. The introduction of very high field magnets (up to 22 T) and the development of cryoprobes coupled with innovations in data processing have improved considerably the NMR resolution power; however, molecular size is still the major limitation (50-60 kDa is currently considered the limit for obtaining high resolution structures). Operating at very low temperature, thus dampening the vibrational motion, has been a major breakthrough for X-ray crystallography. However a consideration often lost in this regard is that free energies of association and binding are measured at room temperature or higher and the biological processes that are being simulated occur at biological temperature, while highresolution crystallography is routinely performed at around liquid nitrogen temperature. This disconnect has not been fully resolved, but it is interesting to note that at Very high resolutions alternative conformations for labile residues can be observed in X-ray structures, suggesting that rapidly freezing crystals do preserve some flexibility information.
In the absence of extensive experimental data, the technology that will make the largest impact on understanding and exploiting flexibility is molecular dynamics simulations. While MD is almost ubiquitously available, there are a number of issues regarding its widespread usage that suggest some degree of caution (vide supra). However, over the next few years, MD will very likely become even more accessible, and hopefully many of these issues will become transparent even to the less experienced users. But looking past implementation issues, there is probably inadequate evidence that MD simulations are truly representative of molecular motions, i.e., are following real paths, especially for large and complex molecules, even if the sampled conformations do appear to represent realistic local minima. Conventional MD single trajectory simulation is usually not able to reproduce large conformational changes because of time scale limitations; the application of multiple-trajectory or replica exchange methods 158 may lead to a better exploration of conformational space. Combining coarse-grained sampling to identify large motions with fine-scaled sampling methods to more accurately probe local transitions and energetics is another solution. However, there is absolutely no guarantee that submitting an unliganded (apo) structure to long MD simulations (of any type or combination of types) will generate conformations suitable for docking a set of ligands. While both protein (and ligands) undergo significant conformational adjustments upon binding, those motions are only a very small fraction of the motions simulated by MD. Ligand-induced MD, where the simulation is performed in the presence of a ligand, may represent a possible solution, but the protein conformations thus generated will be greatly affected by the structure of that particular molecule. Identifying and focusing on the molecular motions truly critical to ligand binding and subsequent docking may end up in the same place as docking algorithms incorporating local flexibility but from the opposite direction. To expand this notion, we believe that the best way to identify correctly docked ligand and receptor residue conformations is in proper estimation of the free energy of binding and not necessarily on suitable overlap with reference crystal structures. 110 Docking a ligand against an ensemble of protein models and normalizing the resulting set of binding energy predictions with Boltzmann statistics, including improved evaluation of solvent effects, are likely to be an upcoming evolutionary development in free energy scoring functions. One approach to this problem that dramatically illustrates the difficulties of scoring is the computational titration method that simultaneously optimizes protonation, solvent conformation, and hydrogen bonding for ligand-protein complexes while calculating a Boltzmannweighted free energy score for the ensemble. 159,160 At present this is applied only to static structures where the resulting ensemble is isocrystallographic. 160 The multiple protein structure method of Carlson 65 and developments in ensemble docking also may reveal new technology in scoring functions. 116,117,[161][162][163] To conclude, perhaps the major challenge for most practitioners of drug discovery is that modeling flexibility requires a change in mind-set. The comforting idea that there is one ligand perfectly adapted for one static protein "structure" is outdated. Even the approach of searching through an ensemble of conformations to find one matching and accommodating the ligand of interest is an inexact approximation of biological reality even though it is pragmatic and successful. It is also the current state of the art. However, the future may provide us with tools able to visualize and score flexible biological molecules even as they move and change conformation. The ability to accurately predict the free energy of binding for proposed protein-ligand complexes remains an unresolved problem. However, the recent work of Gilson and co-workers in characterizing the binding in host-guest complexes 164,165 and the extension to ligand-protein complexes 166 is providing invaluable information for designing new methods of estimating the free energy of binding. Scoring functions for docking and virtual screening have recently been reviewed. 96 Certainly, the additional dimension of protein flexibility, whether it arises from localized site adaptations or from an ensemble of test conformations, further complicates the development of reliable scoring functions. Enthusiasm for expanding the set of conformations in creating more exhaustive ensembles must be tempered with reality in that the conformations must be energetically accessible and meaningful, which of course reiterates the need for more experimental data on flexibility in biomacromolecules. The right solution, as always, resides in a real understanding of the biological system of interest, which allows us to use the proper tools. Thus, the benefits of including flexibility in docking studies that were illustrated in the sample cases above were large for systems where extensive experimental data was available. Much of this work has been performed with paradigms, algorithms, and software that are still at the cutting edge and definitely not mainstream. However, as with most developments in CADD, once the technology is proven and the market assessed, turnkey systems will start becoming available. Clearly, virtual screening campaigns that incorporate more target flexibility will identify more putative ligands worthy of closer examination than those with a single static target but will also, unfortunately, probably generate more false positives. 60,167 New virtual screening approaches will certainly emerge, probably using an efficient hybridization of static, averaged, and ensemble targets. Because recent innovations in experimental and computational biology and medicinal chemistry have now coalesced around flexibility and dynamics of structure, the authors have collaborated in producing this consensus Perspective. We are, despite the remaining hurdles, very enthusiastic about the future of drug discovery in the upcoming "flexibility era" and look forward to the innovations that will arise as this paradigm takes root in the broader drug discovery community. troscopy, quantum mechanical calculations, and organometallic synthesis. After a postdoctoral period at Northwestern University with Tobin Marks, he joined the Department of Medicinal Chemistry at Virginia Commonwealth University where he has focused on computational chemistry and molecular modeling. Dr. Kellogg is coauthor of the HINT program that uses the empirical data from solvent partitioning as the basis of a free energy force field. He is currently Associate He is the author of more than 80 papers, and his research interests cover the application of computational methods to the design and synthesis of biologically active compounds, with special interest devoted to CNS-acting ligands.
Andrew Emerson received his Ph.D. in computational chemistry in 1991 from the University of Southampton. From 1991 to 1999 he studied the phase behavior of liquid crystals by computer simulation at Southampton and at the University of Bologna. During this period he was awarded the title ″Young Scientist of the Year″ by the British Liquid Crystal Society (1995). In 2000 he took up a post at the CINECA supercomputer centre (Bologna) where apart from performing research, he also provides support and gives courses to researchers using the centre's facilities. His current interests include the application of high performance computing to the molecular dynamics of biomolecular systems. He has published papers in a variety of fields including theoretical and computational chemistry, systems biology and grid computing.