Understanding pathogenic single-nucleotide polymorphisms in multidomain proteins – studies of isolated domains are not enough

Studying the effects of pathogenic mutations is more complex in multidomain proteins than in single domains: mutations occurring at domain boundaries may have a large effect on a neighbouring domain that will not be detected in a single-domain system. To demonstrate this, we present a study that uses well-characterized model protein domains from spectrin to investigate the effect of disease- and non-disease-causing single point mutations occurring at the boundaries of human spectrin repeats. Our results show that mutations in the single domains have no clear correlation with stability and disease; however, when studied in a tandem model system, the disease-causing mutations are shown to disrupt stabilizing interactions that exist between domains. This results in a much larger decrease in stability than would otherwise have been predicted, and demonstrates the importance of studying such mutations in the correct protein context.


Introduction
A key area of interest in the post-genomic era is to relate changes in gene sequence to phenotypic variation. As more than 70% of eukaryotic proteins are composed of multiple domains, when studying the effects of pathogenic mutations in multidomain proteins, we must determine the effect that a mutation in one domain may have on neighbouring domains [1]. Diseases caused by missense mutations, often referred to as non-synonymous single nucleotide polymorphisms (nsSNPs), are well documented [2][3][4]. Although some mutations directly affect an active site or binding to a ligand, most mutations affect protein function by reducing the stability of the protein [5][6][7][8][9]. Several computational databases exist that attempt to predict the effect of nsSNPs on protein function and stability [10][11][12][13]. As it is not possible to experimentally characterize the effect of all mutations on all affected proteins, we have previously shown that well-characterized model proteins may be employed to determine the effects of disease-causing mutations, a technique that is especially useful when the variant proteins are difficult to express in the laboratory [14]. In that study (Randles et al. [14]), which employed immunoglobulin-like (Ig-like) domains as models, it was found that any mutation that caused a loss of stability > 2 kcal·mol⁻¹ resulted in disease. Moreover, the severity of disease correlated with the extent of destabilization. Such a 'cut-off' has been observed in other studies [9,15,16]. In multidomain proteins in which the domains behave independently of each other (e.g. Ig domains in the I-band of titin), a mutation in one domain is highly unlikely to affect the stability of a neighbouring domain [17]. However, where adjacent domains interact in multidomain proteins, the stability of one domain may be increased by interaction with its neighbours. Thus, a mutation in one domain may result in the destabilization of neighbouring domains.
To demonstrate this effect, we use the well-studied protein domains R15, R16 and R17 from chicken brain α-spectrin as model systems to study the effects of pathogenic mutations in human spectrin domains. These domains are a common component of proteins involved in cytoskeletal and membrane-associated structures, including spectrin, α-actinin and dystrophin [18,19]. Each spectrin repeat, or domain, is a stable, independently folding three-helix bundle comprising 106 amino acids. When arranged in tandem, a continuous α-helix links the C-terminus of one domain to the N-terminus of the following domain (see Fig. 1) [18][19][20]. Although the interdomain interface is small (barely 800 Å²), there are significant interactions between adjacent domains [1,21,22].
Erythrocyte and brain spectrin most commonly exist as a tetramer: two antiparallel spectrin molecules, one α and one β, associate laterally to form heterodimers that further associate to form tetramers [23][24][25]. Many disease-associated point mutations in erythrocyte spectrin have been mapped to these tetramerization sites, and may result in perturbation of the red blood cell structure, leading to haemolytic anaemias [26][27][28]. However, over a dozen disease-causing mutations that are located distal to the tetramerization site have also been linked to haemolytic anaemias. Interestingly, many of these mutations occur at the spectrin repeat interface and many are mutations to proline [29]. It has been suggested that some of these mutations may affect the cooperativity between spectrin domains [29]. Using our model protein systems R15, R16 and R17, we take this analysis significantly further, specifically quantifying any changes in this 'cooperativity' upon mutation. We compare disease-related SNPs with others that are not associated with disease. Using a combination of thermodynamic and kinetic measurements, our results show that there is no clear pattern regarding the effect of each mutation on stability in the single-domain model proteins: the disease-causing mutations are only marginally more destabilizing than the non-disease-causing mutations. However, when the mutations are placed in the tandem spectrin models R1516 and R1617, a much clearer pattern emerges: our results suggest that the disease-causing mutations disrupt the stabilizing interactions between adjacent domains, which results in a much larger decrease in stability than in the single-domain models. Our results also clearly show that in the tandem protein model, a mutation in one domain may have more of an effect on the stability of its neighbour than on itself: this behaviour is unlikely to be predicted by modelling programs.
These findings highlight the importance of understanding the biophysical implications of a mutation in the context of neighbouring domains.

Selecting mutations to study
Spectrin domains are 106-residue repeats, and the domain boundaries were as defined previously [21]. We used UniProt (www.uniprot.org/uniprot) and the Human Genome Mutation Database (www.hgmd.cf.ac.uk) to compile a list of 12 disease-related and 20 non-disease-related nsSNPs in the spectrin domains of the human proteins α- and β-spectrin (UniProt designations SPTA1_HUMAN and SPTB1_HUMAN), dystrophin (DMD_HUMAN) and α-actinin (ACTN3_HUMAN). These 32 SNPs were found in 24 spectrin domains spread between the four proteins. We ignored all mutations found at the tetramerization site in α- and β-spectrin. The sequences of these domains were compared with those of the previously well-characterized chicken brain α-spectrin R15, R16 and R17, using alignments compiled using ClustalW (www.ebi.ac.uk/tools/msa/clustalw2). These were subsequently verified by comparison with the Pfam alignment (www.pfam.sanger.ac.uk) (Fig. S1). As observed previously, most of the disease-related mutations are found at the domain boundaries, while the non-disease-related sequence changes are found throughout the protein (Fig. 1). We identified sites in our model spectrin domains at which we could create a point mutation that was analogous to the amino acid change found in the host domain (Fig. 1 and Table 1). In some cases, we created a mutation that was an exact match to the disease-related mutation (e.g. L104P as a model for L260P, in all three model proteins); in others, we matched residue type (e.g. I51P in R15 to model L207P). This resulted in seven disease-related mutations and four non-disease-associated mutations. Some of these mutations were created in all three model protein systems and others in only one. In total, we characterized 24 single-domain mutant model proteins (Table 1 and Fig. S2).

There is no clear difference in the effects of disease and non-disease mutations on the stability of isolated domains
The effect of mutation on the stability of the single-domain model proteins was determined using the equilibrium denaturation method (Table 1 and Fig. S3). Interestingly, equivalent mutations did not always have the same effect on the various model proteins, in contrast to previous observations on Ig-like domains [14]. This may reflect the greater structural plasticity of the helix-bundle proteins when compared to the β-sandwich Greek key Ig-like domains. In Ig-like domains, there is significant conservation of core residues [30,31], whereas only two of the 106 residues in spectrin domains are conserved to any significant extent: Trp 17 and Leu 104. Most importantly, and perhaps surprisingly, many of the mutations that result in disease were not strongly destabilizing, and there was certainly no clear distinction between the pathogenic and non-pathogenic datasets. Only one mutation, L207P, which involves substitution of a buried hydrophobic residue by a proline residue in the middle of helix B, caused a significant loss of stability in our model proteins, resulting in unstable R15 and R16 domains. Interestingly, even this change is inconsistent: when there is a larger hydrophobic residue (Phe) at this position (in R17), insertion of the helix-breaking Pro residue is tolerated, possibly reflecting the plasticity of these domains, i.e. if there is a large enough cavity, the protein may accommodate Pro even in the centre of the helix.

Determining the effects of mutation in multidomain proteins
Most of the pathogenic mutations are clustered in the linker helix between domains. It has previously been demonstrated that spectrin domains are stabilized by their neighbours [21,22]. These stabilizing interactions are dependent on the contiguous helix between the domains (Fig. 1). Thus, to mimic both pathogenic and non-pathogenic SNPs, we created a number of mutations in the model two-domain proteins R1516 and R1617, which have the same structure and the same linking helix (Table 2 and Fig. S4). We have previously shown that it is not possible to determine the stability of a two-domain protein by simple equilibrium measurements: it is necessary to determine the folding and unfolding rate constants of each domain (k_f and k_u, respectively), both alone and in the two-domain system, to determine the effect of a mutation on the stability of a two-domain protein [32][33][34]. Thus we performed a series of kinetic experiments. We determined k_f and k_u, extrapolated to 0 M denaturant, for each domain in the two-domain protein constructs.
The method of analysis is explained in detail in Doc. S1 and Figs S5-S10, which include some sample kinetic chevron plots. The results are given in Table 2.
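As a rough illustration of what extrapolating k_f and k_u to 0 M denaturant involves, the sketch below uses the standard two-state chevron description, in which ln k_f decreases and ln k_u increases linearly with denaturant concentration. It is not the analysis of Doc. S1: the function names, parameter names and all numerical values are hypothetical and are not taken from Table 2.

```python
import math


def ln_k_obs(urea_molar, kf_water, ku_water, m_kf, m_ku):
    """Natural log of the observed rate constant for a two-state chevron.

    Assumes ln(kf) falls and ln(ku) rises linearly with urea concentration;
    m_kf and m_ku (in M^-1) are the slopes of the two arms.
    """
    kf = kf_water * math.exp(-m_kf * urea_molar)
    ku = ku_water * math.exp(m_ku * urea_molar)
    return math.log(kf + ku)


def extrapolate_to_water(conc_1, ln_k_1, conc_2, ln_k_2):
    """Linearly extrapolate one chevron arm to 0 M denaturant.

    Takes two (concentration, ln k) points lying on the same arm and
    returns the rate constant at 0 M.
    """
    slope = (ln_k_2 - ln_k_1) / (conc_2 - conc_1)
    return math.exp(ln_k_1 - slope * conc_1)


# Hypothetical parameters chosen only to give a chevron-shaped curve.
params = dict(kf_water=100.0, ku_water=0.01, m_kf=2.0, m_ku=0.5)
for urea in (0.0, 2.0, 4.0, 6.0, 8.0):
    print(f"{urea:.1f} M urea: ln k_obs = {ln_k_obs(urea, **params):.2f}")
```

In practice both arms would be fitted to many data points rather than extrapolated from two, but the two-point version shows the principle.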
Note that the domain with the mutation is marked with an asterisk; thus, for instance, R1617*I18V has an I→V substitution at position 18 in R17.
For the non-pathogenic mutations, we found that the stabilizing interactions between the domains were retained, such that the total loss of stability in the two-domain protein was essentially the same as the loss of stability in the single domain. As an example, the mutation I18V in R17 (Table 2, highlighted in red) destabilizes single-domain R17 by ~1.5 kcal·mol⁻¹ (Table 2, column 12) and the R17 domain in R1617 by the same amount (1.6 kcal·mol⁻¹) (Table 2, column 13). In both cases, the destabilization arises from slowing the rate of folding and increasing the rate of unfolding. R17 is stabilized by wild-type (WT) R16 in R1617 mainly by speeding the folding by ~30-fold. In R1617*I18V, the R17 domain still folds very rapidly [k_f of ~560 s⁻¹ (column 8)] compared with the mutant single-domain protein [k_f of ~7.6 s⁻¹ (column 6)], and the stability of the R16 domain in R1617*I18V is the same as that in WT R1617 (column 11). Thus the stabilizing interactions are retained and the loss of stability of the two-domain protein is the same as the loss of stability of the single-domain protein (columns 12 and 14). We found essentially the same results for all the non-pathogenic mutations. Thus the overall loss of stability resulting from these non-pathogenic mutations, even in the multidomain context, was < 2.0 kcal·mol⁻¹ in all cases. This is consistent with the threshold that has been observed previously for other proteins [9,[14][15][16].
However, we obtained very different results for the pathogenic mutations: for most pathogenic mutations, the loss of stability comprises the loss of stability of the parent domain plus the loss of all the stabilizing interactions between the domains. As an example, the pathogenic-like mutation N105P in R15 and R1516 destabilizes single-domain R15 by ~1.1 kcal·mol⁻¹: although the mutant folds at approximately the same speed as WT R15, it unfolds more rapidly (the rate constant for unfolding, k_u, is approximately five times larger than WT) (Fig. 2A). However, the same mutation in R1516 has a much greater effect on the R15 domain. The kinetic data for this mutation are shown in Fig. 2 and Table 2 (highlighted in blue). Figure 2B shows that, although the mutant R15 domain still folds as fast as WT R15 in R1516, it now unfolds much more rapidly than WT, i.e. k_u is increased 200-fold, from 0.071 s⁻¹ (WT) to 15 s⁻¹ (mutant). This means that the mutation N105P destabilizes R15 in R1516 by ~3.4 kcal·mol⁻¹. Moreover, R16, which was originally stabilized by R15 (folding more rapidly and unfolding more slowly), loses this stability (Fig. 2C). R16 in R15*16 N105P now behaves as if it were a single domain, with a loss of stability of ~2 kcal·mol⁻¹. Thus, as seen in Table 2, column 14, the mutation N105P causes a total destabilization of 5.6 kcal·mol⁻¹, rather than the loss of 1.2 kcal·mol⁻¹ found for the mutation in isolated R15.
Table 2 notes. Column groups: in R15 alone, in R1516, in R16 alone, in R1516. The mutations I18V in R17 and N105P in R15 are discussed in detail in the text and are highlighted in red and blue, respectively. a As R16 folds first and unfolds last, the R16 kinetic parameters are always determined in the presence of an unfolded R17 neighbour. b As R16 folds first and unfolds last, the R17 kinetic parameters are always determined in the presence of a folded R16 neighbour. c The stability changes are calculated using the kinetic data presented.
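The contribution of the faster unfolding to the destabilization of R15 in R1516 can be checked with a short calculation. This is a sketch under stated assumptions rather than the calculation behind Table 2: it takes the folding rate constant as unchanged on mutation, considers only the change in k_u, and uses RT ≈ 0.592 kcal·mol⁻¹ at 25 °C.

```latex
\Delta\Delta G_{k_u}
  = RT \ln\frac{k_u^{\mathrm{mut}}}{k_u^{\mathrm{WT}}}
  = 0.592~\mathrm{kcal\,mol^{-1}} \times \ln\frac{15~\mathrm{s^{-1}}}{0.071~\mathrm{s^{-1}}}
  \approx 0.592 \times 5.35~\mathrm{kcal\,mol^{-1}}
  \approx 3.2~\mathrm{kcal\,mol^{-1}}
```

The unfolding term alone therefore accounts for most of the ~3.4 kcal·mol⁻¹ quoted in the text; any small change in the folding rate constant, which this sketch ignores, would contribute the remainder.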

Discussion
Studying tandem spectrin domains: use of model protein systems
Early equilibrium studies showed that spectrin domains were stabilized by their neighbours [21,22]. This 'cooperativity' was ascribed to the linking helix region. However, this effect could not be effectively quantified until kinetic experiments were introduced [32,34]. Such kinetic studies may be very difficult to undertake. The domains must be investigated both in isolation and in tandem, and the kinetics may be extraordinarily difficult to disentangle [34]. The results of these studies were in some respects quite surprising. Using spectrin repeats R15, R16 and R17 and tandem domains R1516, R1617 and R151617, with other extended constructs, we were able to show that R15 and R16 are both stabilized by a simple extension at the C-terminus (but not at the N-terminus) [35]. In other words, R15 is stabilized even by unfolded R16, and R16 is stabilized even by unfolded R17, in both cases by ~1-2 kcal·mol⁻¹. However, there is also a mutual stabilization between neighbouring domains when both domains are folded (by 2-3 kcal·mol⁻¹ in both cases). The folding pathways for R1516 and R1617 are essentially the same (Fig. 3) [32,34]. First the N-terminal domain folds, then the C-terminal domain. Thus R15 folds before R16 in R1516, and R16 folds before R17 in R1617. The order of unfolding is the reverse: first the C-terminal domain unfolds (R16 in R1516 and R17 in R1617), and then the N-terminal domain unfolds. The consequence of this is that, in kinetic experiments, we may investigate the folding behaviour of the N-terminal domain in the presence of an unfolded C-terminal domain (but not in the presence of a folded one), and we may investigate the folding behaviour of the C-terminal domain in the presence of a folded N-terminal domain (but not in the presence of an unfolded one).
However, this is enough to enable us to determine the stability of the entire system, because (in these two-state proteins) the free energy of unfolding (ΔG) may be calculated from the folding and unfolding rate constants [ΔG = −RT ln(k_u/k_f)]. The stability of the entire two-domain spectrin construct is thus:
ΔG_total = ΔG_N + ΔG_C + ΔG_extension + ΔG_interface
where ΔG_N and ΔG_C are the free energies of unfolding of the N- and C-terminal domains, respectively, ΔG_extension is the gain in free energy of the N-terminal domain from simple extension (by the unfolded C-terminal domain), and ΔG_interface is the stabilization of one domain by its folded neighbour.
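The relationship between rate constants and stability can be sketched in a few lines. The code below assumes two-state behaviour and treats the stability of the tandem as the sum of the two kinetically accessible steps (N-terminal domain with an unfolded neighbour, then C-terminal domain with a folded neighbour); the function names are hypothetical, and the rate constants in the example are illustrative except for the k_u values of 0.071 and 15 s⁻¹ quoted in the text.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def unfolding_free_energy(kf, ku, temperature_k=298.15):
    """Free energy of unfolding (kcal/mol) of a two-state domain: -RT ln(ku/kf)."""
    return -R_KCAL * temperature_k * math.log(ku / kf)


def ddg_on_mutation(kf_wt, ku_wt, kf_mut, ku_mut, temperature_k=298.15):
    """Loss of stability on mutation in kcal/mol (positive = destabilizing)."""
    return (unfolding_free_energy(kf_wt, ku_wt, temperature_k)
            - unfolding_free_energy(kf_mut, ku_mut, temperature_k))


def tandem_stability(n_domain_rates, c_domain_rates, temperature_k=298.15):
    """Total stability of a two-domain construct from the two measurable steps.

    n_domain_rates: (kf, ku) of the N-terminal domain with an unfolded
                    C-terminal neighbour.
    c_domain_rates: (kf, ku) of the C-terminal domain with a folded
                    N-terminal neighbour.
    """
    return (unfolding_free_energy(*n_domain_rates, temperature_k)
            + unfolding_free_energy(*c_domain_rates, temperature_k))


# Same folding rate (an arbitrary 10 s^-1), k_u raised from 0.071 to 15 s^-1.
print(f"ddG = {ddg_on_mutation(10.0, 0.071, 10.0, 15.0):.2f} kcal/mol")

# Hypothetical rate constants for the two steps of a tandem construct.
print(f"dG(total) = {tandem_stability((100.0, 0.01), (500.0, 0.05)):.2f} kcal/mol")
```

Because only ratios of rate constants enter the expression, an error in the extrapolated k_f or k_u propagates logarithmically: a two-fold error changes ΔG by about 0.4 kcal·mol⁻¹ at 25 °C.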
As a full analysis of the folding of individual domains and the sometimes very complex two-domain constructs is necessary to fully characterize spectrin repeats, a systematic analysis of the thermodynamic and kinetic effects of pathogenic (and non-pathogenic) variants in their natural environment is prohibitively laborious. Here, for instance, we investigate nine mutations in seven different α-spectrin domains, as well as two mutations in different domains of β-spectrin and one mutation of dystrophin. We had previously characterized three different wild-type single repeat proteins plus two tandems [32,37]. To have achieved these results using the natural spectrin domains, we would have had to characterize at least ten new single wild-type domain constructs and at least nine wild-type two-domain constructs. The use of model proteins that have been previously well characterized is a useful alternative strategy, in particular if the same or a closely equivalent mutation may be created in more than one model domain.

Comparing the effects of mutations in single- and multidomain systems
Here we find that the point mutations in the single domains result in proteins that are somewhat destabilized relative to the WT domain (Table 1). This is manifested by slower folding and faster unfolding. However, none of the mutations, with the exception of those that mimic L207P, are exceptionally destabilizing. Indeed, other L→P substitutions (e.g. those that mimic L260P) are benign at the single-domain level. Certainly, the difference between pathogenic and non-pathogenic variants is not clear from the isolated domains.
For the non-pathogenic mutations, the effect of a mutation in the two-domain system is approximately the same as the effect in the isolated domain. All stabilizing interactions between the domains are intact. This is shown in Fig. 4. A mutation in one domain has no effect on the stability of the neighbouring domain.
However, the case for the pathogenic mutations is entirely different. With one exception, all pathogenic mutations caused a loss of the stabilizing interactions between the domains. The result is quite remarkable (Fig. 4). Thus, the pathogenic mutations, again with one exception, result in a loss of stability of the system of ~5 kcal·mol⁻¹ or more, an increase of ~3.5-4 kcal·mol⁻¹ over and above the effect of the mutation in the single domain. The one exception was the mutation E106D in both R15*16 and R16*17. This mutation was created to model the mutation D791E in human α-spectrin, which has been shown to cause hereditary elliptocytosis [36]. This substitution has very little effect in any of the single domains, and no effect at all on the inter-domain stabilizing interactions. We infer that this mutation has site-specific effects within the spectrin molecule, possibly removing an unknown interaction site within the human spectrin heterodimer or an interaction with other cytoskeletal proteins.
Undoubtedly, spectrin repeats gain significant stability through nearest-neighbour effects, mediated through the linker helix. The stabilization conferred by these effects (~4 kcal·mol⁻¹ per interface) is very significant when compared to the stability of spectrin domains that we have studied, which ranges from 3.5 to 6.5 kcal·mol⁻¹. In a previous study, Johnson et al. [29] investigated a single pathogenic mutation (Q471P) in the context of a five-repeat fragment of α-spectrin. The mutant protein had a lower thermal stability relative to WT, consistent with the experiments reported here. However, a detailed thermodynamic study was not performed, and indeed is not possible using purely equilibrium methods [33]. Here we have used model protein systems to help illustrate the effects of pathogenic mutations that destroy the stabilizing interactions between domains. What is quite remarkable is the much greater destabilization caused by the mutations that result in disease compared to non-pathogenic mutations in the same model systems (Fig. 4). We have also identified a mutation with a functional effect, perhaps disruption of a binding site (D791E), as well as a mutation that is likely to cause disease by directly destabilizing a single domain (L207P), although, in the latter case, we note that, if a domain is destabilized to the extent that it is unfolded, inter-domain interactions will be lost. Importantly, structural modelling in the absence of biophysical data would not have predicted such drastic effects for the mutations investigated here.
Fig. 3. The folding pathways of R1516 and R1617 are essentially the same. The N-terminal domain (pink) folds first, followed by the C-terminal domain (blue). Unfolding is the reverse of this process. This is a consequence of the relative folding and unfolding rate constants (Table 2).

Protein expression and purification
The mutants were selected according to sequence alignments (Figs S1 and S2). Mutagenesis was performed using a QuikChange site-directed mutagenesis kit (Agilent Technologies, Santa Clara, CA, USA), and the identity of the mutants was confirmed by DNA sequencing. Protein expression and purification were performed as described previously [37]. Note that, in our previous studies, we always used extended domains to ensure that we did not artificially destabilize the proteins by making domain boundaries too short [37][38][39]. We use the same extended domains here, but number them to agree with the numbering convention described by MacDonald and Pozharski [21]. Thus, the numbering in the present paper is different from that in our previous work. For example, the residue numbered 1 in this paper is in fact the 5th residue in our previous studies [40][41][42].

Stability measurements
The stability of the mutant single-domain proteins was determined by equilibrium denaturation using urea as the denaturant. Folding was monitored on the basis of intrinsic tryptophan fluorescence, measured using a Perkin Elmer (Waltham, MA, USA) fluorescence spectrometer with a final protein concentration of 1 μM. Dithiothreitol was added to a final concentration of 5 mM for R17 and R1617 proteins. All experiments were performed at 25 ± 0.1 °C in 50 mM sodium phosphate buffer (pH 7.0). The data were fitted to a two-state transition curve as described previously [43,44].
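For orientation, the general form of a two-state denaturation curve is sketched below. It assumes the usual linear dependence of the free energy of unfolding on denaturant concentration and linear native and denatured baselines; it is not necessarily the exact equation of refs [43,44], and the function names, baseline values, m-value and midpoint are all hypothetical.

```python
import math

RT = 1.987e-3 * 298.15  # kcal/mol at 25 degrees C


def fraction_unfolded(urea, m_value, midpoint):
    """Fraction of unfolded protein at a given urea concentration (M).

    Assumes dG(urea) = m_value * (midpoint - urea), with m_value in
    kcal mol^-1 M^-1 and midpoint in M.
    """
    dg = m_value * (midpoint - urea)
    k_unfold = math.exp(-dg / RT)
    return k_unfold / (1.0 + k_unfold)


def fluorescence(urea, m_value, midpoint, native=(1.0, 0.0), denatured=(0.2, 0.0)):
    """Observed signal; each baseline is (intercept, slope) and linear in urea."""
    fu = fraction_unfolded(urea, m_value, midpoint)
    native_signal = native[0] + native[1] * urea
    denatured_signal = denatured[0] + denatured[1] * urea
    return (1.0 - fu) * native_signal + fu * denatured_signal


def stability_in_water(m_value, midpoint):
    """Free energy of unfolding extrapolated to 0 M urea (kcal/mol)."""
    return m_value * midpoint


# Hypothetical domain: m = 1.8 kcal mol^-1 M^-1, midpoint = 3.0 M urea.
for urea in (0.0, 2.0, 3.0, 4.0, 8.0):
    print(f"{urea:.1f} M urea: signal = {fluorescence(urea, 1.8, 3.0):.3f}")
```

Fitting such a curve to measured fluorescence yields the m-value and midpoint, whose product gives the stability in the absence of denaturant.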

Kinetic measurements
Kinetic experiments on the mutant proteins were performed using a stopped-flow fluorimeter (SX.18MX; Applied Photophysics, Leatherhead, Surrey, UK) at 25 ± 0.1 °C in 50 mM sodium phosphate buffer (pH 7.0). The final protein concentration was 1 μM, with 5 mM dithiothreitol added for R17 and R1617. Samples were excited at a wavelength of 280 nm, and the emission was monitored above 320 nm. At least six overlying traces were obtained at all concentrations of urea. Single-jump experiments on all proteins were performed using 10:1 mixing (buffer:protein). Double-jump experiments were performed on tandem repeats R1516 and R1617. For R1617, interrupted unfolding experiments allowed the folding rate of R17 to be observed: proteins were initially unfolded in urea at a 1:5 ratio (protein:urea) for a delay time of 500 ms, and then jumped into refolding solutions at a 1:10 ratio (protein:buffer). For R1516, interrupted refolding experiments allowed the unfolding rate of R15 to be observed: unfolded protein in 8 M urea was allowed to refold to a final concentration of 4 M with a delay time of 100 ms, and then jumped into unfolding solutions. Data for all experiments were fitted using Kaleidagraph (Synergy Software, Reading, PA, USA).
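The urea concentration reached after each mixing step follows from simple volume-weighted averaging. The helper below is a sketch with a hypothetical function name; the stock concentrations in the examples (protein unfolded in 8 M urea, buffer containing no urea, equal-volume mixing to reach 4 M) are assumptions made for illustration and are not stated in the protocol.

```python
def mixed_concentration(parts_a, conc_a, parts_b, conc_b):
    """Denaturant concentration (M) after mixing two solutions by volume ratio."""
    return (parts_a * conc_a + parts_b * conc_b) / (parts_a + parts_b)


# Assumed example: 1 part protein in 8 M urea mixed with 10 parts urea-free buffer.
single_jump = mixed_concentration(1, 8.0, 10, 0.0)
print(f"single jump, 1:10 dilution: {single_jump:.2f} M urea")

# Assumed example: equal volumes of 8 M urea and urea-free buffer give 4 M.
first_jump = mixed_concentration(1, 8.0, 1, 0.0)
print(f"first jump, 1:1 dilution: {first_jump:.2f} M urea")
```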

Supporting information
Additional supporting information may be found in the online version of this article at the publisher's web site:
Doc. S1. Analysis of kinetic data for two-domain spectrin fragments.
Fig. S1. Alignments used to identify suitable mutation sites for this study.
Fig. S2. Model proteins used in this study.
Fig. S3. Equilibrium denaturation curves for all single-domain proteins described in this study.
Fig. S4. The linking helix is conserved in spectrin repeats.
Fig. S5. Chevron plot showing folding and unfolding arms.
Fig. S6. Comparison of the folding of R15 and R16 alone and in R1516.
Fig. S7. Comparison of the folding of R16 and R17 alone and in R1617.
Fig. S8. The mutation E106D does not disrupt interdomain interactions in R1516.
Fig. S9. The mutation I18V does not disrupt interdomain interactions in R1617.
Fig. S10. The mutation E105P destroys interactions between the domains.