SARS-CoV-2 Spike Protein Mutations and Escape from Antibodies: A Computational Model of Epitope Loss in Variants of Concern

The SARS-CoV-2 spike (S) protein is exposed on the viral surface and is the first point of contact between the virus and the host. For these reasons it represents the prime target for Covid-19 vaccines. In recent months, variants of this protein have started to emerge. Their ability to reduce or evade recognition by S-targeting antibodies poses a threat to immunological treatments and raises concerns for their consequences on vaccine efficacy. To develop a model able to predict the potential impact of S-protein mutations on antibody binding sites, we performed unbiased multi-microsecond molecular dynamics of several glycosylated S-protein variants and applied a straightforward structure-dynamics-energy based strategy to predict potential changes in immunogenic regions on each variant. We recover known epitopes on the reference D614G sequence. By comparing our results, obtained on isolated S-proteins in solution, to recently published data on antibody binding and reactivity in new S variants, we directly show that modifications in the S-protein consistently translate into the loss of potentially immunoreactive regions. Our findings can thus be qualitatively reconnected to the experimentally characterized decreased ability of some of the Abs elicited against the dominant S-sequence to recognize variants. While based on the study of SARS-CoV-2 spike variants, our computational epitope-prediction strategy is portable and could be applied to study immunoreactivity in mutants of proteins of interest whose structures have been characterized, helping the development/selection of vaccines and antibodies able to control emerging variants.


■ INTRODUCTION
Protein sequences evolve as a result of selective pressure to optimize function, create improved phenotypes, and introduce new advantageous traits. In pathogens like bacteria and viruses, sequences evolve via modifications such as point mutations, recombination and deletions/insertions to induce higher infectivity, more efficient replication, and ultimately escape from the host immune systems. 1−7 The SARS-CoV-2 virus, the etiological agent of Covid-19, is no exception to these general rules. The spread of the virus to more than 200 million people worldwide, combined with the pressure determined by the reactions of immunocompetent populations, led to the emergence of "variants of concern". In this context, attention has been focused on the SARS-CoV-2 spike protein (S protein), the large, heavily glycosylated class I trimeric fusion protein which mediates host cell recognition, binding and entry. Because it represents the first point of contact with the host, and given its crucial role in viral pathogenesis, 5,6,8−10 the S protein has been the basis for the design of currently used vaccines effective at reducing viral spread, hospitalization and mortality rates. 11−16 While for almost one year the only notable mutation in S has been the D614G (Asp 614 → Gly), which increases affinity for the cell receptor ACE2 and has immediately become dominant, novel S protein variants reported of late may pose new potential challenges for efficacy of vaccination, antibody-based therapies and viral diffusion control. Three notable examples of such evolved S proteins, which correspond to major circulating variants, are B.1.1.7 (the so-called UK or α variant), 501Y.V2/ B.1.351 (the South African or β variant), and B.1.1.28 (P.1, the Brazilian or γ variant). All such sequences contain various mutations due to nonsynonymous nucleotide changes in the receptor-binding domain (RBD), including E484K, N501Y, and/or K417N. 10 In B.1.1.7 and B.1.351, deletions are also present in the N-terminal domains (NTD) (Figure 1).
Several studies showed how some of these circulating variants may have reduced sensitivity to neutralizing antibodies targeting the RBD or to the NTD. 10,17−19 In this context, polyclonal antibodies contained in convalescent plasma (CP) from individuals infected with the D614G-containing SARS-CoV-2, showed reduced potency in neutralizing 501Y.V2/B.1.351 virus isolates. 20,21 Furthermore, antibodies elicited after vaccine treatment showed reduced neutralization of pseudoviruses bearing the mutations of the P.1 and 501Y.V2/B.1.351 variants. 22 The same was observed for pseudoviruses with variations in S mimicking those of the B.1.1.7 lineage. 22,23 Yet, fortunately, it was shown that vaccine-generated antibody titers were sufficient to neutralize B.1.1.7 in sera from 40 BNT162b2vaccinated individuals. 24 In this context, it is encouraging to note that new studies are reporting high levels of efficacy against severe forms of Covid-19 also in countries where these variants have become dominant. 25−28 A crucial question for understanding the impact of S-protein evolution on the development of monoclonal antibody (mAb)based and vaccine-based therapies, is whether we can develop a simple model to rationalize, and eventually predict, the effect of variations on the structural properties of S that ultimately underpin antibody recognition. Fundamentally, comparison across S-proteins mutants can help us understand the molecular basis of the protein's evolvability, furthering our grasp of the The full-length, fully glycosylated trimeric structure corresponding to pdb code 6VSB. Protomer A (RBD "up"): secondary structure in green; protomers B and C (RBD "down"): grey and sand, respectively. Glycans' C, N, and O atoms rendered as teal sticks. (B) Positions and nature of mutations highlighted on protomer A of different variants. Mutant residues' heavy atoms are rendered as spheres; a different color is assigned to each variant, as indicated in the legend. Mutations common to more than one variant are rendered and/or labeled in black, with colored asterisks denoting variants carrying the mutation. The insertion in the PT188-EM variant (cyan) is denoted by "In(248−249)". Protomers B and C are also shown with their respective mutations, but rendered with increased transparency for clarity; glycans are omitted; (C) synopsis of mutations on the different variants simulated in this work, including the 11-residue insertion in the PT188-EM variant. relationships between sequence, structure and (immuno)recognition. From the practical point of view, this knowledge could in principle be harnessed to design and engineer improved S-based antigens or multicomponent domain/peptide combinations, focusing for instance on those antibody binding regions, known as epitopes, that are predicted to be conserved in multiple variants.
Here, we apply a straightforward structure-dynamics-energy strategy to predict potentially immunogenic regions in representative 3D conformations of several variants of the fulllength glycosylated trimeric S protein ( Figure 1). The selected S proteins represent some of the major variants of concern circulating at the time of setting up simulation. In this respect, the African variant we simulate, which is named 501Y.V2.noΔ, corresponds to the S lineage originally discovered in South Africa in late November 2020 by Tegally et al. 29 This S variant features the additional mutations L18F (in common with P.1) and R246I but does not feature the Δ241−243 deletion, whose existence was still debated when the authors released their study in January 2021. 29 This variant has subsequently been referred to in several papers as B.1.351. 20,21 The list of studied proteins is further enriched by a laboratory-evolved escape S-variant, obtained by Rappuoli and coworkers by co- Figure 2. Mapping epitopes on each variant. Epitope mapping on S protomer A for the 2 NTD-targeting antibodies (two top left panels; cf. numbering on Y-axis) and the 15 RBD-targeting antibodies (bottom three rows) considered in this study. In each panel, using a distinct color for each antibody (right palette), the experimentally (Cryo-EM or X-ray) detected residues that belong to an epitope (labeled "Experimental" on each panel's X-axis) are compared to epitopes predicted in silico on each of the seven variants considered. Predicted immunogenic residues are colored according to the Ab they would be targeted by. Non-immunogenic residues are shown in gray; gaps/insertions in white. The figure shows how the extent of the epitopes for the different antibodies varies in the distinct mutants.
incubating the SARS-CoV-2 virus with a highly neutralizing plasma from a Covid-19 convalescent patient. Interestingly, after several passages this strategy generated a variant completely resistant to plasma neutralization. This "artificial" variant is labeled here as the PT188-EM variant. 21 Conformations are extracted from independent atomistic molecular dynamics (MD) simulations totaling 4 μs for each mutant. Our approach to the detection of epitopes on S, that is, its antibody-binding protein regions, is based on the concept that such sites should continuously evolve to escape immune recognition by the host without impairing the native protein structure required for viral function and survival. We previously showedand experimentally confirmedthat these regions coincide with substructures that are not involved in major stabilizing intramolecular interactions with core protein residues that are important for its folding into a functional 3D structure. 30 In other words, antigen−antibody (Ab)-interacting regions show minimal energetic coupling with the rest of the protein, which in turn should favor accumulation of escape mutations while preserving the antigen's 3D structure. Furthermore, minimal intramolecular coupling provides epitopes with greater conformational freedom to adapt to and be recognized by a binding partner. Actual binding to an external partner such as an Ab is expected to occur if favorable intermolecular interactions determine a lower free energy for the bound state than for the unbound state. 30−33 These concepts are analyzable by the MLCE (matrix of low coupling energy) approach 30,34 (see also Materials and Methods). Starting from the characterization of the energy of pairwise interactions between all aminoacids and monosaccharides, and filtering the resulting interaction map with structural information extracted from the same protein's inter-residue contact map, MLCE identifies groups of spatially contiguous residues with poor energetic coupling to the rest of the protein as potential immunogenic regions. At the same time, groups of residues with high energetic coupling are identified as stabilization centers.
Upon comparing our results to recently reported characterization of Ab binding and reactivity, the analysis we report consistently shows that mutations, deletions, and/or insertions in S variants determine a reorganization of internal interactions leading to the loss of potentially immunoreactive regions on the surface. Encouragingly, these findings can be qualitatively reconnected to the decreased ability of some of the Abs elicited against the dominant S-sequence to recognize variants.

■ RESULTS
To characterize the effects of mutations, deletions, and insertions on the definition of potential Ab-binding substructures in S variants, we apply a combination of the energy decomposition (ED) and MLCE methods 30,34,35 to representative structures extracted from long timescale MD simulations of the S protein variants reported in Figure 1.
Briefly, we first run 4 independent 1 μs long all-atom MD simulations of each variant of the full-length fully glycosylated S protein in solution ( Figure 1) (each built from PDB ID: 6VSB 11 ). Next, for each variant, we concatenate individual trajectories into one a single 4 μs metatrajectory. Cluster analysis on each variant's metatrajectory is then conducted to identify the 3 most representative conformations. These are then used to compute nonbonded pairwise potential energy terms (van der Waals, electrostatic interactions, solvent effects) obtaining, for a given variant with N aminoacid and monosaccharide residues, a symmetric N × N inter-residue interaction matrix. The three matrices extracted from a variant's trajectory are then weighted and averaged to yield an average nonbonded interaction matrix, M ij . Upon eigenvalue decomposition of M ij , eigenvectors associated with the most negative eigenvalues can help build a simplified version of M ij that only highlights series of residues with high-and low-intensity couplings. The former represent residues acting as folding hotspots and responsible stabilizing the protein's 3D structure; the latter represent residue pairs with weak energetic coupling to the rest of the protein, whose mutation is expected not to impact S′ structure and thus function. In this framework, once information contained in the simplified energy map is combined with information contained in the protein's residue−residue contact map, it permits to "filter out" clusters of residues whose energetic coupling to the rest of the structure is weak and that are spatially contiguous. Such localized networks of low-intensity couplings, located in proximity of the protein surface represent potential interaction Ab-interaction regions, or epitopes. This approach has been previously experimentally validated in a number of applications. 31,32,36−43 The reference S structure we use here is the dominant D614G variant. We analyze the results of epitope predictions we obtain on isolated S variants by comparing them against selected spikeantibody complexes. To this end, we collected publicly available X-ray or Cryo-EM structural data of complexes between S and various Abs, reported in Tables 1 and S1. Epitopes in experimental structures are defined as the sets of S protein residues within 5 Å of any Ab residue (see Supporting Information Table S2). The experimental epitopes thus derived are used as the reference against which to compare epitopes predicted in silico. Figure 2 schematically reports the sequences of the RBD and NTD for each variant studied (sequences on the Y-axis, variant on the X-axis). The different colors point out the residues of a certain variant that are predicted to be part of an epitope for a certain characterized antibody.
Importantly, for the reference variant predicted epitopes largely overlap with experimentally identified regions. In particular, epitopes are correctly predicted for Abs targeting both the RBD and the NTD of the protein (Table S2, Figure 2).
In the remaining variants of concern, a diverse landscape of epitopes emerges. A number of residues/regions that are predicted immunogenic in the reference S-protein disappear in the variants. Overall, this is observed for all the Abs considered.
In this framework, after running an epitope prediction on each variant we monitor epitope conservation across variants through a conservation ratio: the number of residues in each predicted epitope for a given variant is divided by the number of residues in the corresponding experimental epitope in the reference S structure, which is defined based on the 5 Å threshold from its respective Ab, as discussed above. We define epitope loss when the conservation ratio is lower than 0.5; otherwise the epitope is considered to be conserved. In Table 2 and Figure 3 we report such conservation ratios for each D614G S epitope on each simulated variant, and confront them with available experimental data (at the time of writing) on the variant's reactivity towards the Ab that would be expected to bind to that particular epitope. Each cell in the table is color-coded according to the experimentally measured activity of the corresponding Ab on one of the given variants. If the Ab remains active, the cell is yellow. If the Ab has lost activity against that variant, the cell is Journal of Chemical Information and Modeling pubs.acs.org/jcim Article blue. If experimental data is unavailable for a particular Ab on a particular variant, the cell is white.
Analysis of Table 2 clearly shows that the vast majority of blue cells, indicative of a loss of Ab reactivity, contain ratios lower than 0.5. This is an important validation of our prediction: whenever a variant's predicted epitope residuesthat is according to MLCE, contiguous residues uncoupled from the S protein coreshrink in number compared to D614G S, it is very likely that experimental data will also confirm that variant evades Abs binding to the shrunk or lost epitopes. On the other hand, the overwhelming majority of cases for which Abs retain activity against a variant (yellow cells) are also confirmed by our prediction to retain their respective epitopes (conservation ratio > 0.5) with respect to D614G S. Disagreement between our predictions and experiment only occurs in a minority of cases: corresponding cells are marked by thicker borders.
Analysis of B.1.1.7 (U.K.) and 501Y.V2.noΔ South Africa; (late November 2020) immediately shows that a large portion of predicted epitopes in the RBD are conserved compared to the reference D614G. Interestingly, however, we also observe a dramatic drop in the number of NTD residues predicted as epitopes for the 501Y.V2.noΔ. Epitope loss in the NTD, which was deemed to host a super-antigenic hotspot 44 can help explain the ability for immune evasiveness observed for these two variants. In B.1.1.7, the NTD epitope is largely conserved consistent with the conservation of activity of Abs targeting this region against the variant ( Table 2, Figures 2 and 3).
Importantly, conservation of a dominant part of the epitopes in the RBD still endows the two variants with reactivity against Abs directed to this domain, which may help explain the observed effectiveness of some convalescent plasma treatments and vaccines. 13,45 Calculations on the Brazilian variant correctly indicate loss of immunoreactivity of several Abs as well as conserved reactivity of Abs 4A8 and S2M11. This variant is the only one for which our predictions of epitopes binding Abs of the DH family generally disagree with experimental data.
Finally, it is important to note that the "artificial" PT188-EM variant, evolved in the lab under the pressure of convalescent serum to evade Ab-effects, appears to have lost a very large number of protein epitopes (see Table 2, Figures 2 and 3). In particular, the insertion at residues 248 modifies the conformational properties of the region otherwise recognized by Ab 4A8. As a consequence, the epitope to this antibody disappears from the predictions on the PT188-EM variant. 21 Interestingly, in this case, the carbohydrate motifs coating the protein appear to host most of the uncoupled regions (117 carbohydrate moieties in the PT188-EM variant vs 90 in the reference S-protein), pointing to a role of the glycan shield in protecting the protein from immune recognition, besides playing a key part in modulating interactions for ACE2 recognition and cellentry. 46−52 (see Figure 4). Importantly, mutants N439K, is correctly predicted as an escape variant from all Abs for which experimental data proved lower efficacy.

■ DISCUSSION
In this work, we analyzed full-length models of 7 trimeric glycosylated SARS-CoV-2 S protein variants, derived from the prefusion conformation of the Cryo-EM structure 6VSB, 11 in which the receptor binding domain of chain A (RBD-A) is in an "up" conformation, exposed to interaction with host cell receptors and potential targeting by Abs. The data from our energetic analyses can be aptly integrated in the characterization of the properties of S and other SARS-CoV-2 proteins from long scale simulations, such as those recently presented by Zimmerman et al., 53 Casalino et al., 50 Spinello et al., 54,55 Oliveira et al., 56 Shoemark et al., 57 Wang et al., 58 and Fallon. 59 Our MLCE analysis of the full-length trimers correctly identifies a number of epitopes in the RBD that have been previously experimentally characterized. RBD is in fact targeted by the largest fraction of neutralizing antibodies. MLCE also identifies regions in the NTD, which are known to be targeted by different Abs, some of which potently neutralize SARS-CoV-2 19 and highlights putative immunoreactive substructures at the end Each cell reports an epitope conservation ratio for each S variant−Ab combination, relating in silico predictions to experimental epitopes from experimental Cryo-EM and/or crystal structures. Conservation ratios lower than 0.5 indicate epitope loss; otherwise an epitope is considered to be conserved. Each cell in the table is color-coded according to the experimentally measured activity of the corresponding Ab on the respective variant. If the Ab remains active, the cell is yellow. If the Ab has lost activity against that variant, the cell is blue. If experimental data is unavailable for a particular Ab on a particular variant, the cell is white. Disagreement between predictions and experiment (i.e., blue and conservation ratio >0.5 or yellow and conservation ratio <0.5) is indicated by thick borders and dotted-line diagonal.
of the S2 domain, where sugar-engaging Abs have recently been characterized (see Table S2, Figure 4).
As a caveat, it is worth pointing out here the fact that antibodies can bind to conformations of the spike protein that are different from the ones sampled here. Indeed, work by Casalino et al., 50 Zimmerman et al., 53 and by Fallon et al. 59 show that the protein can undergo dramatic structural changes. In our simulations, despite running 4 ms of all-atom MD simulation for each system, we could not observe such changes, if not in their initial stages. To be consistent in our comparative among the different species, we in fact decided to use the same protocol on every system and benchmarked the data obtained against available experiments. This may indeed partly limit the exploration of the conformational space available to this flexible protein, in turn somewhat limiting the prediction of immunogenic regions. We hypothesize that this is the reason behind the limited success we have with the Brazilian variant. The mutations, insertions, deletions in this sequence can expectedly favor the exploration of structures that are different from the ones we are considering here. We notice, however, that a significant number of experimental immunoreactivity data are correctly captured by our approach even on the Brazilian variant, supporting the validity of MLCE in this context.
Energy-based epitope prediction through the MLCE approach reveals a common theme across variants: the number and surface exposure of potentially immunoreactive regions decrease in S protein mutants compared to the reference D641G. In particular, the number of residues defining the epitope located in the long RBD loop (residues 417−503, recognized by many protective Abs) is much lower in mutants 501Y.V2.noΔ, B1.1.28, and N439K (see Figures 2 and 3, Table  S2). Interestingly, in the case of B.1.1.7, which shows limited evasion, the loop is largely active in terms of immunoreactivity. In contrast, in the evading variant PT188-EM the entire loop disappears from the list of potential Ab-targets.
Potentially important contributions to the perturbation of epitopes' physico-chemical properties may be related to charge variations. Two striking examples are the loss or reduction of epitopes determined by the N439K and E484K mutations. Both cases involve residues that are part of epitopes of a large number of antibodies and after these mutations the antibodies completely or partially loss their efficacies. In the case of the mutation N439K, it has been reported 60 that this variant maintains fitness while evading antibodies immunity. In fact, N439K RBD forms a new interaction with the human ACE2 receptor (hACE2) and has enhanced affinity for hACE2. The salt bridge at the RBD-hACE2 interface (RBD N439K/hACE2 E329) plausibly adds a strong interaction at the binding interface during viral cell entry. On the other hand, the N to K mutation determines stronger intra-spike protein interactions which Journal of Chemical Information and Modeling pubs.acs.org/jcim Article dramatically decrease the decoupling of this region from the core, making it substantially less prone to interaction with Abs. The E484K mutation is of particular concern due to its location within nAb epitopes, and it has been shown to reduce or eliminate binding to many potent RBD-directed nAbs. 61 Experimental characterization of Abs targeting the NTD revealed a site recognized by most Abs, located between the N3 and N5 loops of the domain. This epitope was correctly predicted in our previous work. 43 Specifically, Lys147 and Arg246, known to be important in stabilizing interactions with the complementarity-determining regions of different Abs are correctly predicted as epitope elements.
On the other hand, sequence mutations in SARS-CoV-2 variants lead to the N3 and N5 NTD loops disappearing from the ensemble of Ab-binding substructures. This is observed computationally and is corroborated by recent experimental data by Veesler and coworkers. 19,44 Interestingly, these epitopes largely coincide with the regions where alanine substitutions reduced affinity for antibodies 4A8, CM17, and CM25 (see ref 62). The impact of epitope loss in these regions is also confirmed by the observation that an engineered N3−N5 double mutant and native β variant 29 both evade neutralization by mAbs CM25 and 4A8.
Interestingly, our approach correctly captures the epitopes for Abs, such as C121 and C144, that are known to engage different RBDs. 63 Antibody C121, for instance, can bind to an RBD in the down conformation and to an adjacent RBD in the up conformation. 63 In the structural paper, the epitope is reported to entail only residues in protomer A with the RBD in the up conformation. Contacts with the nearby RBD in the down conformation are made by Ab residues that are outside the complementary determining region. In this respect, our approach can correctly predict potential immunoreactive sequences even for Abs that would end up binding across different domains. MLCE in fact only aims to predict substructures on the antigen that can potentially be complexed by one or more Fabs. Focusing only on the antigen, MLCE would not be able to predict whether different epitopes are targeted by the same or distinct Abs at the same time.
Finally, our strategy correctly predicts the loss of most epitopes in the lab-evolved escape variant described by Andreano et al. 21 (see Figure 4, Table S2).
We propose a model for the study of Ab-reactivity of SARS-CoV-2 S protein variants that integrates sequence and structural information and incorporates dynamics and energetics into the analysis of the variation/loss of epitopes. Mutations in S variants determine the loss of epitopes and as a consequence can confer escape from antibodies. Upon sequence variation, the protein shifts to states characterized by different intramolecular interactions compared to the initial D614G structure; this transition decreases the number of energetically uncoupled substructures available for engaging interactors such as Abs. Unique to this model is the observation that mutations, insertions, and deletions exhibiting different immunoreactivity experimentally are consistently captured by the energy based decomposition of structures extracted from unbiased classical MD simulations of the glycosylated S protein isolated in solution, without any input of prior information on Ab-binding propensities. Although qualitative in nature and focused on the study of S variants of concern, our approach is general and immediately portable to other targets to provide physicochemical information on the determinants of Abs recognition.
Since one of the fundamental goals of structural vaccinology is the identification and design of structures with optimized properties for immunoreactivity, development and validation of computational methods that help identify conserved versus nonconserved epitope regions in different variants independently of whether structures of related protein−antibody complexes are available may hold great potential. In the case we have presented here, one may consider designing chimeras or multicomponent systems (peptide-or domain-based) presenting all (or most of) the conserved sequences that are predicted to be potentially Abreactive.
Furthermore, our results suggest that approaches like the one we presented here may be used prospectively as an aid in the analysis and characterization of emerging variants.
Though targeted experiments and design of mutants with tailored reactivities based on MLCE analysis are required to further validate these ideas and precisely define their progression to real-world applicability, our findings provide a new basis to understand how mutations could directly result in escape from immunorecognition.

■ MATERIALS AND METHODS
Preparation of Spike Protein Variants. Fully glycosylated S protein variants simulated in this work were variously derived from simulations described by Grant et al. 47 based on the Cryo-EM structure of the WT S protein at PDB entry 6VSB, 11 wherein one RBD is in the "up" conformation and the other two are "down". All mutations, including the "reference" D614G, are introduced using the "mutations wizard" in the PyMOL molecular modeling package (Schrodinger LLC): rotamers of non-glycine side chains are chosen from the first suggested option for S protomer A, and then, where possible, we have sought to adopt the same rotamers for protomers B and C. Histidine tautomers and disulfide bridges are retained as in our reference simulations. In B. Consistent with our reference simulations, 43,47 all three protomers are modeled without gaps, from Ala27 in the NTD to Asp1146 just downstream of heptapeptide repeat 1 (HR1); −NH 3 + and −COO − caps are added, respectively, at N-and Ctermini of each protomer.
In the case of the B.1.1.7 variant, gaps left by deletions in all three protomers are replaced with artificially long C−N bonds; systems are then allowed to relax with a 400-step preminimization cycle in vacuo (200 steepest-descent + 200 conjugate gradient), using the AMBER platform's sander utility (version 18), 64 in which harmonic positional restraints (k = 5.0 kcal mol −1 Å −2 ) are applied to all atoms except those in the five residues on either side of the gap. Distortions and clashes introduced with the glycosylated Ser13-Pro26 fragment are resolved using a similar approach.
The artificial PT188-EM was modeled following the methods described in ref 21.
MD Simulation Details. After preparation, glycosylated S protein structures are solvated in a cuboidal box of TIP3P water molecules using AMBER's tleap tool; where necessary, Na + or Cl − ions are added accordingly to neutralize the charge. Nglycosylated asparagines and oligosaccharides are treated using the GLYCAM_06j forcefield, 65 whereas ions are modeled with parameters by Joung and Cheatham. 66 To all other (protein) atoms, we apply the ff14SB forcefield. 67 Starting structures and topologies for all simulated variants are electronically provided as Supporting Information.
On each glycosylated S protein variant, we conduct 4 independently replicated atomistic MD simulations, using the AMBER package (version 18): each replica consists of two 300step rounds of minimization, 2.069 ns preproduction, and 1 μs production. The sander MD engine 64 is used into the earlier stages of preproduction; thereafter, we switch to the GPUaccelerated pmemd.cuda. 64 Details on MD Production. The 1 μs production stage is carried out in the NpT ensemble (T = 300 K; p = 1 atm) using a 2 fs time step; a cutoff of 8.0 Å is applied for the calculation of Lennard-Jones and Coulomb interactions alike. Coulomb interactions beyond this limit are computed using the particle mesh Ewald method. 68 All bonds containing hydrogen are restrained using the SHAKE algorithm. 69 Constant pressure is enforced via Berendsen's barostat 70 with a 1 ps relaxation time, whereas temperature is stabilized by Langevin's thermostat 71 with a 5 ps −1 collision frequency.
Details on MD Preproduction. Prior to the production stage, every independent MD replica for every S variant goes through a series of preproduction steps, namely: minimization, solvent equilibration, system heating, and equilibration. The first two are conducted using the sander utility, after which the GPUaccelerated pmemd.cuda is invoked instead.
Minimization takes place in two 300-step rounds, the first 10 of which use the steepest-descent algorithm and the last 290 conjugate gradient. In the first round, we only minimize backbone Hα and H1 hydrogens on aminoacids and monosaccharides, respectively, restraining all other atoms harmonically (k = 5.0 kcal mol −1 Å −2 ). Thereafter, all atoms are released, including solvent and ions.
Solvent equilibration occurs over 9 ps with a time step of 1 fs; the ensemble is NVT, with temperatures in this case enforced by the Berendsen thermostat. 70 Positions of non-solvent atoms are harmonically restrained (k = 10 kcal mol −1 Å −2 ). Solvent molecules are assigned initial random velocities to match a temperature of 25 K. Fast heating to 400 K (coupling: 0.2 ps) is performed over the first 3 ps; the solvent is then retained at 400 K for another 3 ps; and cooled back down to 25 K over the last 3 ps, more slowly (coupling: 2.0). The cutoff for determining Lennard-Jones and Coulomb interactions remains at 8.0 Å for this and all subsequent stages, as does the particle mesh Ewald method 68 to determine Coulomb interactions beyond this cutoff. SHAKE constraints 69 are not applied at this stage, but are always present thereafter.
For system heating, the time step is increased to 2 fs and, whilst continuing in the NVT ensemble, temperatures are now enforced by the Langevin thermostat 71 (which remains in place for all subsequent stages). With an initial collision frequency of 0.75 ps −1 , the system is heated from 25 to 300 K over 20 ps: all atoms are free to move except aminoacids' Cα atoms, which are positionally restrained with k = 5 kcal mol −1 Å −2 .
For equilibration, the ensemble is switched to NpT (p = 1 atm; Berendsen barostat coupling: 1 ps), and the system is simulated for a further 2040 ps. The thermostat's collision frequency is kept lower than in the production stage (1 ps −1 ). Restraints on Cα atoms are lifted gradually: k = 3.75 kcal mol −1 Å −2 for the first 20 ps; 1.75 kcal mol −1 Å −2 for the following 20 ps; none thereafter.
Clustering of MD Simulations. Following MD, each variant's 4 replicas are concatenated into a single 4 μs "metatrajectory", desolvated, stripped of any ions, and aligned on backbone heavy atoms of all aminoacid residues, in all three protomers, that belong to neither the NTD nor the RBD according to domain definitions by Huang et al. 72 Clustering calculations are then conducted using the hierarchical agglomerative algorithm, 73 considering every 20th metatrajectory frame (i.e., every 50 ps), based on the root-mean-square deviation of backbone heavy atoms of aminoacid residues Journal of Chemical Information and Modeling pubs.acs.org/jcim Article composing the NTD and the RBD in all three protomers. Values of ε are chosen so that they provide the best compromise between maximizing cluster homogeneity, based on silhouette score, and ensuring at least 60−80% of the metatrajectory is covered by the three most populated clusters: this usually means ε = 9−12.
All of the steps discussed in the previous paragraph are conducted using AMBER's postprocessing utility cpptraj.
MLCE Method. Potential epitopes on each S variant are predicted using the MLCE method (of which we also provide a more detailed account in our previous work). 43 The procedure is automatically carried out by our own in-house code (https:// github.com/colombolab/MLCE) which we have now rewritten to rely on the computationally more efficient MMPBSA.py utility 74 instead of mm_pbsa.pl.
To begin with, three representative S protein structures for each variant (i.e., the centroids of its three most populated clusters; see Clustering of MD Simulations) are minimized for 200 steps (10 steepest descent; 190 conjugate gradient), with the sander engine, in implicit solvent (per the modified generalized born model by Onufriev et al.). 75 The cutoff used for the computation of Lennard-Jones and Coulomb interactions is 12.0 Å, under nonperiodic conditions; the mobile counterion concentration is always set at 0.1 M for all variants, and the solvent-accessible surface area is calculated by employing linear combinations of pairwise overlaps.
After minimization, MMPBSA.py 74 is initialized and uses the MM/GBSA method 76 to construct a nonbonded pairwise interaction matrix M s for each of the variant's representative structures's: otherwise put, each term M s,ij in this matrix contains the van der Waals and Coulomb interactions (including 1−4 interaction terms) for a representative structure's ith and jth residues (aminoacids and monosaccharides alike). Settings for this stage are identical to minimization, apart from a 0 M implicit ion concentration. M s matrices for a variant's representative structures s = 1, 2, and 3 are then averaged, and scaled by the fraction of that variant's trajectory represented by their parent cluster: this gives an average weighted nonbonded interaction matrix M.
Using the validated approach explained in detail by Genoni and coworkers, 77 our code performs eigen decomposition of M: in other words, each averaged interaction matrix element M ij  that is for each individual pair formed by the ith and jth residuesis first diagonalized and re-expressed as (1) where N is the total number of aminoacid and glycan residues, λ α is the αth eigenvalue, and vi α and v j α are the ith and jth components of its associated vector. From this eigen decomposition, the code is programmed to (re)select a minimum number of essential eigenvectors N e that are sufficient to "cover" interactions of the maximum number of residues in the protein. The full details of the selection are described in ref 77. This step results in an "essential folding matrix" M fold constructed from M's N e essential eigen vectors, and whose elements are each expressed as follows that is, no longer as a sum over all vectors (residues) N, but only over the chosen N e essential eigenvectors. M fold thus only contains information on whether interaction of a variant's ith and jth's residues is actually more or less stabilizing for the folding of one of that variant's domains 77 (e.g., NTD, RBD, etc.). This is in contrast to M, whose elements M ij simply indicate whether a variant's ith and jth's residues attract, repel, or don't interact.
To obtain the final MLCE matrix, the essential folding matrix M fold is subsequently Hadamard-multiplied by a pairwise residue−residue contact matrix C C's elements C ij are either 0 or 1, depending on whether Cβ atoms in the ith and jth residues (C1 in the case of monosaccharides; H in the case of glycines) fall below or above an arbitrary 6.0 Å threshold, respectively. In this way, each MLCE element MLCE ij will be nonzero if and only if residues i and j are spatially contiguous and exhibit an energetic interaction that is stabilizing for the folding of a particular domain, in which case MLCE ij takes the actual value of the pair's degree of stabilization.
Individual MLCE ij elements are ultimately ranked from the most stabilizing (i.e., pairs that are most energetically relevant for the folding of their particular domain) to least stabilizing (i.e., pairs showing the weakest energetic coupling within their domains). Our final epitope predictions are made by isolating the top 10% weakest-interacting spatially contiguous residue pairs yielded by this ranking.

* sı Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.1c00857. Table S1 explicitly showing all the references used in which Abs are characterized; Table S2 with the residues that are part of the experimentally determined epitopes and of the predicted epitopes on the various spike variants. The numbering of the residues refers to that reported in UNIPROT entry P0DTC2. https://www. uniprot.org/uniprot/P0DTC2 (PDF) ■ REFERENCES