CCG•CGG interruptions in high penetrance SCA8 families increase RAN translation and protein toxicity

Spinocerebellar ataxia type 8 (SCA8), a dominantly inherited neurodegenerative disorder caused by a CTG•CAG expansion, is unusual because most individuals who carry the mutation do not develop ataxia. To understand the variable penetrance of SCA8, we studied the molecular differences between highly penetrant families and the more common sporadic cases (82%) using a large cohort of SCA8 families (N=77). We show that repeat expansion mutations from individuals with two or more affected family members have CCG•CGG interruptions at a higher frequency than sporadic SCA8 cases and that the number of CCG•CGG interruptions correlates with age at onset. At the molecular level, CCG•CGG interruptions increase RNA hairpin stability and steady state levels of SCA8 RAN polyAla and polySer proteins. Additionally, the CCG•CGG interruptions, which encode arginine interruptions in the polyGln frame, increase the toxicity of the resulting proteins. In summary, CCG•CGG interruptions increase polyAla and polySer RAN protein levels, polyGln protein toxicity and disease penetrance, and provide novel insight into the molecular differences between SCA8 families with high vs. low disease penetrance.


Introduction
Spinocerebellar ataxia type 8 (SCA8) is a microsatellite expansion disorder caused by a bidirectionally transcribed CTG•CAG repeat expansion mutation within the ATXN8OS/ATXN8 genes (Koob et al, 1999; Moseley et al, 2006). This slowly progressive cerebellar ataxia is typically characterized by ataxia, spasticity, dysarthria and nystagmus; however, extra-cerebellar features including psychiatric disturbances and developmental delays have been reported (Ayhan et al, 2014; Day et al, 2000; [...]

[...] the repeat tract. Interestingly, one to four CCG interruptions were detected in multiple configurations among affected members of a large highly penetrant SCA8 family (MN-A) and the number of interruptions often increases when passed from one generation to the next (Moseley et al., 2000b).

Repeat interruptions have been reported to have different modifying effects in a number of other microsatellite disorders. For several of these disorders (SCA1, SCA2 and FXS), sequence interruptions appear to stabilize repeat tracts found on unexpanded alleles, and the loss of interruptions predisposes repeat tracts to expand above the pathogenic threshold (Chung et al, 1993; Gunter et al, 1998; Imbert et al, 1996; Kunst & Warren, 1994; Pulst et al, 1996; Sanpei et al, 1996). In other cases, [...]

To investigate the effects of sequence interruptions in SCA8, we performed a detailed genetic evaluation of expanded SCA8 alleles from a large cohort of SCA8 families (N=77) including 199 expansion carriers (n=111 affected, n=88 asymptomatic). Disease onset ranged from birth to 79 years with an average age of onset of 33.7 years (Table EV1). Although the mutation is transmitted in an autosomal dominant pattern, surprisingly 82% (63/77) of these families had sporadic ataxia with no family history of disease, 5% (4/77) had family histories that appeared recessive and only 13% (10/77) showed the expected autosomal dominant inheritance pattern (Fig 1A).
Interestingly, four of the sporadic and two familial cases are homozygous and have two expanded alleles. These data and previous reports of expansion alleles in unaffected family members and in the general population (Cellini et al, 2001; Ikeda et al., 2004; Moseley et al, 2000a; Stevanin et al., 2000; Worth et al., 2000; Zeman et al, 2004) highlight the need to understand the molecular basis of the variable penetrance found in SCA8 families.

SCA8 repeat length does not correlate with age of onset or predict disease status
Similar to previous reports (Ayhan et al., 2014; Ikeda et al., 2004; Juvonen et al., 2000; Zeman et al., 2004), we found no correlation in the number of SCA8 repeats and age of onset (Fig EV1A), no significant difference in repeat length between affected patients (median: 113 repeats) and asymptomatic carriers (median: 98 repeats; p=0.0672; Table EV1) and a wide and overlapping range of repeat lengths in affected (54-1455) and [...] Table EV1). The lack of correlation of repeat length and disease status is often seen in individual SCA8 families. For example, in Fig 1B, individual I-2 carries an expansion of 1000 repeats yet remains asymptomatic, while individual II-1 has an expansion of 849 repeats and presented with disease at one year of age. Similarly, in Fig 1C, individual II-1 presented with disease at age 40 with 133 combined repeats while her mother and two siblings, who carry SCA8 expansions of similar lengths, remain asymptomatic. Taken together, these data provide additional evidence that repeat length is not a reliable predictor of disease or age of onset and suggest other genetic or environmental modifiers contribute to the variable penetrance of SCA8.

To better understand the effects of CCG•CGG interruptions on disease penetrance, we compared the sequences of SCA8 expansion alleles in families with high (≥3 affected) versus low disease penetrance. The seven-generation MN-A family (Day et al., 2000; Koob et al., 1999), the largest SCA8 family reported to date, has a much higher disease penetrance than most SCA8 families (Ikeda et al., 2004) and CCG•CGG interruptions [...] individual II-4 who was not affected at the time of examination but subsequently showed signs of ataxia and in individual III-5 who was asymptomatic at age 41 (Fig 1E).

CCG•CGG interruptions were found at a higher frequency in families with multiple affected individuals: 100% (5/5) of families with three or more affected individuals, 28.6% (2/7) of families with two affected individuals and 13.9% (5/36) of sporadic cases.

Overall, CCG•CGG interruptions were found at a higher frequency in SCA8 families with 2 or more affected members compared to sporadic cases (n=48; p=0.0047; Table 1 [...] widely among SCA8 families (Fig 1G), the number of CCG•CGG interruptions is inversely correlated with, and accounts for 37% of the variation in age of onset (R²=0.3709; p=0.0016; Fig 1H).
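
The frequency comparison reported above can be checked from the stated counts: 7 of 12 families with two or more affected members (5/5 plus 2/7) versus 5 of 36 sporadic cases, for n=48 in total. The following is a minimal sketch of a two-sided Fisher's exact test, the test named in the statistical methods, written with the Python standard library only. The function name and the 2x2 layout are illustrative assumptions, not the authors' analysis code.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x interrupted families in row 1,
        # with all table margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Counts taken from the text, as (interrupted, not interrupted).
multi_affected = (7, 5)   # 5/5 families with >=3 affected plus 2/7 with 2 affected
sporadic = (5, 31)        # 5/36 sporadic cases

p_value = fisher_exact_two_sided(*multi_affected, *sporadic)
print(f"p = {p_value:.4f}")
```

With these counts the sketch gives p ≈ 0.0047, in line with the value reported in the text.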

Taken together, these data demonstrate that CCG•CGG interruptions increase disease penetrance and that the number of interruptions, and not repeat length, is inversely correlated with age at onset in SCA8.

To better understand the molecular effects of interrupted alleles we examined if constructs containing CCG•CGG interruptions are more toxic to cells than pure expansion constructs. T98 glial cells were transfected with length-matched constructs containing pure or interrupted expansions cloned from patient DNA and expressed in the CAG direction (Fig 2A). Interrupted expansions were cloned from individuals from the high-penetrance multigeneration families shown in Fig 1F ([...] Fig EV2). Taken together, these data indicate that CGG interruptions increase the toxicity of CAG repeats independent of RNA levels.

Arginine-encoding CGG interruptions increase toxicity of polyGln proteins
Next, we tested the hypothesis that CGG interruptions increase the toxicity of expanded alleles by affecting RAN and polyglutamine proteins expressed from the CAG repeat.

First, we examined if the arginine interruptions in the polyGln(Arg) proteins increase their toxicity compared to pure polyGln proteins. To perform these experiments, we generated minigene constructs to express polyGln and polyGln(Arg) using non-hairpin forming alternative codons (Fig 3A). This enables the toxicity of pure and interrupted proteins to be assessed individually and independent of possible effects from CAG expansion RNAs and RAN proteins. We focused these experiments on pure and interrupted polyGln proteins because non-hairpin forming alternative codons are available for both Gln and Arg. Transient transfections in T98 cells show that interrupted polyGln(Arg) proteins expressed with alternative codons increased cell death by 25% (p<0.05; Fig 3B) and decreased cell viability by 10% compared to pure polyGln proteins (p<0.05; Fig 3C), independent of RNA levels (Fig EV3A). Protein blot and immunofluorescence analyses show that the pure and arginine interrupted polyGln proteins have different properties. For example, the interrupted polyGln(Arg) proteins migrate further into the gel (Fig 3D, EV3B) and show droplet-like nuclear staining not found with pure polyGln proteins (Fig 3G). These changes may contribute to the increased toxicity of the polyGln(Arg) proteins. Surprisingly, substantially less polyGln(Arg) compared to pure polyGln protein was detected by 1C2 antibody (Fig 3D-F). This may be caused by reduced affinity of the 1C2 antibody for the interrupted protein or incomplete extraction of polyGln(Arg) proteins from nuclear aggregates.
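
To make the coding consequence of a CGG interruption concrete, the sketch below translates a short CAG tract containing one CGG in all three reading frames using the standard genetic code. The tract `(CAG)3 CGG (CAG)3` is a hypothetical example chosen for brevity, not a patient allele, and the frame labels simply follow the protein names used in the text. The outcomes in the Ser and Ala frames follow from the genetic code and are not results reported in this section.

```python
# Minimal codon table covering only the codons that occur in a CAG tract
# with CGG interruptions (standard genetic code).
CODONS = {
    "CAG": "Q",  # Gln
    "CGG": "R",  # Arg
    "AGC": "S",  # Ser
    "GGC": "G",  # Gly
    "GCA": "A",  # Ala
    "GCG": "A",  # Ala
}

def translate(seq, frame):
    """Translate seq in frame 0, 1 or 2, ignoring a trailing partial codon."""
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    return "".join(CODONS[c] for c in codons)

# Hypothetical illustrative tract: (CAG)3 CGG (CAG)3
tract = "CAG" * 3 + "CGG" + "CAG" * 3

for frame, name in enumerate(["polyGln", "polySer", "polyAla"]):
    print(name, translate(tract, frame))
```

In this sketch the CAG frame reads Gln with a single Arg, the AGC frame reads Ser with a single Gly, and the GCA frame remains uninterrupted Ala because GCG and GCA both encode alanine.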

Taken together, these data demonstrate that arginine interruptions increase the toxicity of polyGln expansion proteins and that the increased toxicity of the interrupted polyGln(Arg) proteins is independent of possible CAG RNA gain-of-function or RAN protein.

[...] Protein blots showed even higher increases (7.8-fold) in steady state levels of [...]

The markedly reduced penetrance is one of the most puzzling features of SCA8 (Ikeda et al., 2004; Koob et al., 1999; Stevanin et al., 2000; Worth et al., 2000). Here we show that 82% of SCA8 families in a large cohort of SCA8 families have only a single affected [...] et al, 1989; Tian et al, 2000; Zu et al, 2020) and which has been recently shown to be a major driver of RAN translation (Zu et al., 2020), is also increased by CGG interruptions. Additionally, CGG interruptions introduce arginine amino acids into the polyGln proteins, which increases their toxicity. Taken together, these data demonstrate that CCG•CGG interruptions act as cis-modifiers of SCA8 and provide a molecular explanation for the dramatic variations in disease penetrance among SCA8 families.

We found CCG•CGG interruptions on expanded alleles in all families in our cohort with three or more cases of SCA8. CCG•CGG interruptions were also identified in sporadic SCA8 cases, but at a lower frequency. Additionally, we confirm that repeat length in SCA8 is a poor predictor of disease penetrance (Ikeda et al., 2004; Stevanin et al., 2000; Worth et al., 2000). Taken together, these data indicate that the inclusion of sequence information during genetic testing, specifically the presence or absence of CCG•CGG interruptions, will provide patients and families with additional information relevant to disease penetrance. Sequence analyses will also further our understanding [...]

[...] tract. This method does not provide the sequence configuration.
If the expansion size was too large to perform PCR across the repeat or we were unable to draw blood from the subject, the repeat length was estimated by a commercial diagnostic company.
Families found to have non-CGG interruptions were excluded from analysis of CGG interruptions and disease penetrance (n=58 families sequenced in total).

[...] terminal tag in the CAG frame. Additionally, construct names denote the total CAG tract length which, due to repeat instability during cloning, may not be the same total tract length as the patient alleles used to clone the repeat sequences.

[...] (1:10,000), and anti-GAPDH antibody (1:5,000) overnight at 4°C in blocking solution.

The membrane was incubated with species-specific HRP-conjugated secondary antibody (Amersham) in blocking solution, and bands were visualized with the ECL plus Western Blotting Detection System (Amersham). Quantification of protein expression was performed using ImageJ. For dot blot quantification, Myc antibody signal for empty vector transfections was used to perform background reduction. All protein levels are normalised to pure repeat expansion protein levels.

[...] (1:1,000), or for 1 hr at 37°C with anti-myc antibody (1:1,000). Cells were incubated with AlexaFluor conjugated secondary antibodies for 1 hour at room temperature and were mounted with ProLong Gold Antifade (ThermoScientific). Representative images were taken using the ZEISS LSM 800 confocal microscope.

[...] Fisher's exact test. Linear regression analyses were performed to assess the relationship between age of onset and repeat length or interruption number. All other statistical analyses were performed using unpaired two-tailed Student's t-test or a one-way ANOVA with a Tukey's multiple comparison test, as appropriate. Data are reported as mean ± SEM or mean ± SD.

[...] An additional n=10 families, representing n=19 expansion carriers, were sequenced and found to carry different interruptions.

Expanded View Table [...]

[Figure axis label: Rel. polySer levels (a.u.)]

[...] The folding free energy (ΔG) of hairpin structures for pure CAG and CGG interrupted repeat tracts for different interruption configurations, as predicted by m-fold (Zuker, 2003). Filled symbols represent sequences used for UV melting analyses. (C) The folding free energy (ΔG) of hairpin structures for SCA8 patient repeat expansions (Figure 1G) and pure repeat tracts of the same length, as predicted by m-fold. Patient alleles are as follows: 48 repeats in length, (CAG)7(CGGCAG)18(CAG)5; 53 repeats in length, (CAG)8(CGGCAG)14(CAG)2CGG(CAG)5CGG(CAG)8; and 52 repeats in length, (CAG)7(CGGCAG)16(CAG)4CGG(CAG)8.
Each symbol represents a single predicted hairpin structure; multiple hairpin structures, including branched hairpins, are predicted for SCA8 patient alleles and (CAG)53 (Zuker, 2003).

[...] Example UV melting absorbance curves (for Figure 5A) for pure and interrupted RNA oligos measured at 260 nm monitored between 25°C and 95°C, recorded at 1°C intervals.
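
The compact allele notation used in the figure legend above, for example (CAG)7(CGGCAG)18(CAG)5, can be expanded programmatically to confirm the stated repeat lengths and to count interruptions. The sketch below is an illustrative helper, not part of the published methods; the function name `expand_allele` and the parsing rule (a parenthesised unit followed by a count is repeated, a bare unit is taken once) are assumptions about how the notation is read.

```python
import re

def expand_allele(notation):
    """Expand notation such as '(CAG)7(CGGCAG)18(CAG)5' into a DNA string.

    A parenthesised unit followed by a count is repeated that many times;
    a bare unit such as 'CGG' is taken once.
    """
    parts = []
    for unit, count, bare in re.findall(r"\(([ACGT]+)\)(\d+)|([ACGT]+)", notation):
        parts.append(unit * int(count) if unit else bare)
    return "".join(parts)

# Patient alleles as written in the legend, keyed by the stated repeat length.
alleles = {
    48: "(CAG)7(CGGCAG)18(CAG)5",
    53: "(CAG)8(CGGCAG)14(CAG)2CGG(CAG)5CGG(CAG)8",
    52: "(CAG)7(CGGCAG)16(CAG)4CGG(CAG)8",
}

for stated_length, notation in alleles.items():
    seq = expand_allele(notation)
    triplets = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # Print stated length, computed triplet count, and number of CGG triplets.
    print(stated_length, len(triplets), triplets.count("CGG"))
```

For all three alleles the computed triplet count equals the length stated in the legend, with 18, 16 and 17 CGG triplets respectively.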