Sequences in the cytoplasmic tail of SARS-CoV-2 spike facilitate syncytia formation

The spike (S) protein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) binds the cell surface protein ACE2 to mediate fusion of the viral membrane with target cells 1–4. S comprises a large external domain, a transmembrane domain (TMD) and a short cytoplasmic tail 5,6. To elucidate the intracellular trafficking of S protein in host cells we applied proteomics to identify cellular factors that interact with its cytoplasmic tail. We confirm interactions with components of the COPI, COPII and SNX27/retromer vesicle coats, and with FERM domain actin regulators and the WIPI3 autophagy component. The interaction with COPII promotes efficient exit from the endoplasmic reticulum (ER), and although COPI-binding should retain S in the early Golgi system where viral budding occurs, the binding is weakened by a suboptimal histidine residue in the recognition motif. As a result, S leaks to the surface where it accumulates as it lacks an endocytosis motif of the type found in many other coronaviruses 7,8. It is known that when at the surface S can direct cell:cell fusion leading to the formation of multinucleate syncytia 9–11. Thus, the trafficking signals in the cytoplasmic tail of S protein indicate that syncytia formation is not an inadvertent by-product of infection but rather a key aspect of the replicative cycle of SARS-CoV-2 and potential cause of pathological symptoms.

To dissect the roles of the different coat proteins we mapped the regions that they bind on the 37-residue S protein tail (Fig. 1b,c). The tail comprises two distinct sections (Fig. 1d). The membrane proximal half (1234-1254) contains eight cysteines which are known to be palmitoylated in the equivalent region in SARS and other coronaviruses and, once modified, are likely to be embedded in the surface of the bilayer 25,26. The distal half of the tail (1255-1273) lacks cysteines and so will project into the cytoplasm. Testing GST fusions to these two halves showed that all the interactors bound to the distal region, with the exception of SNX27, which bound exclusively to the cysteine-rich region (Fig. 1c, Extended Data Figs. 1b and 2a). To map binding at higher resolution we tested tails with adjacent pairs of residues mutated to alanine (Fig. 1b). COPII binding was reduced by mutations in the acidic stretch DEDDSE that contains three copies of the di-acidic ER exit motif that binds to the Sec24 subunit of the coat 27,28. In contrast, COPI binding required the residues in the C-terminal KXHXX motif that was also found to be required for this interaction in SARS S protein 22. The FERM domain proteins required residues between the COPI and COPII binding sites, and SNX27 binding required residues in the N-terminal half of the tail nearest the TMD.
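The layout of the tail described above can be checked with a short script. This is a minimal sketch, not the authors' analysis: it assumes the 36-residue sequence ending in KLHYT and assumes its first residue is T1238 (consistent with the 1238TSC, 1261SEPV, K1269, H1271 and T1273 positions named in the text). The di-acidic pattern `[DE]x[DE]` and the helper names are illustrative assumptions; how the three copies of the motif were counted in the original work is not stated.

```python
import re

# Assumption: residues 1238-1273 of the SARS-CoV-2 S tail, numbered so that
# the final threonine is T1273.
TAIL = "TSCCSCLKGCCSCGSCCKFDEDDSEPVLKGVKLHYT"
FIRST_RESIDUE = 1238


def residue_number(index: int) -> int:
    """Convert a 0-based string index into S protein residue numbering."""
    return FIRST_RESIDUE + index


def diacidic_sites(seq: str) -> list[tuple[int, str]]:
    """Return (start residue, triplet) for every overlapping [DE]x[DE] match."""
    # A lookahead is used so that overlapping triplets are all reported.
    pattern = re.compile(r"(?=([DE].[DE]))")
    return [(residue_number(m.start()), m.group(1)) for m in pattern.finditer(seq)]


def cysteines_between(seq: str, start: int, end: int) -> list[int]:
    """Return residue numbers of cysteines within [start, end] inclusive."""
    return [
        residue_number(i)
        for i, aa in enumerate(seq)
        if aa == "C" and start <= residue_number(i) <= end
    ]


if __name__ == "__main__":
    print("di-acidic triplets:", diacidic_sites(TAIL))
    print("cysteines 1234-1254:", cysteines_between(TAIL, 1234, 1254))
    print("cysteines 1255-1273:", cysteines_between(TAIL, 1255, 1273))
```

Under these assumptions the scan finds three overlapping di-acidic triplets, all within DEDDSE, and places all eight cysteines in the membrane proximal half.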
To further validate the interactions with SNX27, WIPI3 and the FERM domain proteins, we used recombinant proteins to test direct binding. Moesin is known to interact with plasma membrane proteins via its N-terminal FERM domain, and this part of the protein bound directly to the S protein tail, with residues 1261SEPV being essential (Fig. 2a). The autophagy regulator WIPI3, when expressed in E. coli, also bound directly to the membrane distal half of the tail (Fig. 2b). Recombinant SNX27 also bound to the tail in vitro, and residues 1238TSC next to the transmembrane domain were important; the same residues were required for recruitment of SNX27 from cell lysate (Fig. 2c, Extended Data Fig. 2). SNX27 associates with retromer via the latter's VPS26 subunit, and we found that VPS26 is recruited to the tail of S protein by the addition of SNX27, indicating that the tail can bind SNX27 whilst it is in a complex with retromer (Fig. 2d).
Interestingly, some tail mutants that lost SNX27 binding retained binding to retromer indicating that this complex can also bind elsewhere in the tail.
Mutation of residues required for SNX27 or moesin binding in full length S protein did not detectably alter its intracellular distribution or accumulation on the plasma membrane (Extended Data Fig. 3), indicating that these interactions do not have a role in cell surface delivery. Indeed, the binding of SNX27, although very efficient, is to the region of the tail that will be palmitoylated in host cells and so its in vivo significance remains unclear. We thus examined the contribution of the COPI and COPII binding sites to the subcellular distribution of S protein. Mutation of the acidic residues in the COPII binding region greatly reduced cell surface expression, with S protein accumulating in the ER, indicating that these residues direct efficient egress of the newly-made S protein into the secretory pathway (Fig. 3a-c). The COPI binding region comprises KLHYT which differs somewhat from the canonical KXKXX or KKXX C-terminal COPI binding motif 29,30. To test COPI binding we expressed a chimeric protein comprised of the S protein tail fused to the extracellular and TMD domains of the cell surface protein CD86. Immunoprecipitation of this chimera from transfected cells revealed co-precipitation with COPI and this was lost when lysine K1269 was mutated to alanine (Fig. 3d,e). When H1271 was altered to a canonical lysine, binding was substantially increased in both co-precipitation and affinity chromatography of cytosol, indicating that S protein has a suboptimal COPI binding site (Fig. 3e,f). In the above alanine scanning of the tail, mutation of the terminal residue T1273 to alanine was found to increase COPI binding (Fig. 1b), and this effect was recapitulated with the equivalent mutation in the CD86 chimera, suggesting the C-terminal threonine is a further feature of the tail that reduces its affinity for COPI (Fig. 3e).
Extension of the tail with a C-terminal epitope tag also resulted in a loss of COPI binding indicating that, despite containing a histidine, the C-terminus is being recognised like a KXKXX motif that has to be at the C-terminus 30 . This also explains why the COPI interaction was not detected in recent proteomic searches for interaction partners of SARS-CoV-2 proteins as these used C-terminally tagged proteins 31,32 .
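The positional logic of the C-terminal COPI motif can be illustrated with a small classifier. This is a minimal sketch under stated assumptions: the function name and the returned labels are hypothetical, the last five residues of the tail are taken as KLHYT (K1269 to T1273), and the HA epitope sequence YPYDVPDYA is used only as an example of a C-terminal tag, since the text does not say which tag was used. The classifier reports sequence pattern only and says nothing about binding affinity.

```python
import re

# Patterns anchored at the C-terminus; X is any residue.
# Order matters only for readability: the patterns are mutually exclusive.
C_TERMINAL_MOTIFS = [
    ("KKXX", re.compile(r"KK..$")),
    ("KXKXX", re.compile(r"K.K..$")),
    ("KXHXX", re.compile(r"K.H..$")),
]


def classify_c_terminus(seq: str) -> str:
    """Return the name of the first C-terminal motif matched, or 'none'."""
    for name, pattern in C_TERMINAL_MOTIFS:
        if pattern.search(seq):
            return name
    return "none"


if __name__ == "__main__":
    variants = {
        "wild type": "KLHYT",
        "K1269A": "ALHYT",
        "H1271K": "KLKYT",
        "T1273A": "KLHYA",
        # Illustrative assumption: an HA tag appended after T1273.
        "wild type + C-terminal tag": "KLHYT" + "YPYDVPDYA",
    }
    for label, c_term in variants.items():
        print(f"{label}: {classify_c_terminus(c_term)}")
```

Because every pattern is anchored with `$`, appending any tag moves the lysine away from the -5 position, so the tagged tail no longer matches, in line with the loss of COPI binding reported for the tagged construct.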
Incorporation of the K1269A COPI binding site mutation into full length S protein caused, at most, only a small increase in the cell surface expression (Fig. 3g,h). In contrast, the H1271K and T1273A mutations that increased COPI binding both caused S protein to instead accumulate intracellularly with substantial co-localisation with the ER (Fig. 3g,h).
Thus, the COPI binding site in S has conserved features that reduce its in vivo efficacy, and so allow it to reach the cell surface. The S proteins of some other coronaviruses, including those that have histidine at the -3 position in the tail, have been found to be efficiently endocytosed if they reach the surface 7,8. In these cases, endocytosis requires a tyrosine-containing motif that resembles the classic Yxxφ signal, but the S protein of SARS-CoV-2 lacks such a motif and, consistent with this, we found that it not only accumulated on the surface but did not show efficient endocytic uptake (Fig. 4a and Extended Data Fig. 3).
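The absence of a Yxxφ-like signal can be checked by pattern matching. This is a minimal sketch, assuming that φ means a bulky hydrophobic residue (L, I, M, F or V) and that the tail sequence is the 36-residue stretch ending in KLHYT; the function name is hypothetical and the second sequence in the usage section is an invented string, not a real coronavirus tail.

```python
import re

# Assumption: phi (bulky hydrophobic) is taken as any of L, I, M, F, V.
YXXPHI = re.compile(r"(?=(Y..[LIMFV]))")

# Assumption: residues 1238-1273 of the SARS-CoV-2 S tail.
SARS2_TAIL = "TSCCSCLKGCCSCGSCCKFDEDDSEPVLKGVKLHYT"


def find_yxxphi(seq: str) -> list[tuple[int, str]]:
    """Return (0-based index, matched tetrapeptide) for each Yxx-phi match."""
    return [(m.start(), m.group(1)) for m in YXXPHI.finditer(seq)]


if __name__ == "__main__":
    # The only tyrosine in the tail is followed by a single residue (T),
    # so no Yxx-phi match is possible.
    print("SARS-CoV-2 tail:", find_yxxphi(SARS2_TAIL))
    # Hypothetical sequence, used only to show what a positive hit looks like.
    print("invented example:", find_yxxphi("AAAYQRLAAA"))
```

A sequence match of this kind only flags candidate signals; whether a given motif drives endocytosis still has to be tested experimentally, as in the uptake assay described in the text.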
What might be the reason for S protein to accumulate at the cell surface? SARS-CoV-2, like other coronaviruses, buds into intracellular membranes and so S protein that has reached the plasma membrane will not contribute to virion formation but it is in a position to cause infected cells to fuse to adjacent cells and so facilitate spread without virion release. We thus tested the effect of the mutations in the COPI binding site on the degree of cell fusion induced by S protein. 293T cells were transfected with a plasmid expressing S protein and mixed with Vero cells which endogenously express the S protein receptor ACE2, and cell fusion then followed with a fluorescent assay 33 . The K1269A mutant that prevents COPI binding caused a small but reproducible increase in cell fusion (Fig. 4b,c).
In contrast, the H1271K mutation that binds better to COPI resulted in greatly reduced cell fusion, with the mutant S showing reduced levels of S1/S2 cleavage, consistent with it not moving beyond the early Golgi (Fig. 4d).
The data presented here argue that the S protein of SARS-CoV-2 has three features which facilitate accumulation on the plasma membrane. Firstly, a region containing di-acidic COPII binding motifs directs efficient exit from the ER. Secondly, the COPI-binding site is suboptimal, which allows S protein to escape the Golgi apparatus. Consistent with this, the S protein of the coronavirus porcine epidemic diarrhea virus (PEDV) has a related C-terminal sequence (-KVHVQ) and was found to bind COPI with a much lower affinity than canonical KXKXX motifs (Fig. 4a) 34. Finally, the S protein of SARS-CoV-2 is not efficiently endocytosed, consistent with it lacking a tyrosine-containing motif of the sort that is found in many coronaviruses, including PEDV, and has been shown to either induce endocytosis or prevent movement beyond the Golgi depending on the virus.
Once at the cell surface, S protein is able to induce cell fusion and hence the formation of multinucleate syncytia. For other coronaviruses, including SARS, it is known that in virus-infected cells the M protein holds some S protein in the Golgi so as to direct its packaging in virions 13,22,38. However, this does not preclude some S reaching the surface. Indeed, syncytia have been observed in cultured airway epithelial cells infected with SARS-CoV-2 in vitro, and in post mortem samples of patients who have died of COVID-19 9,10,39. Thus, it is clear that some SARS-CoV-2 S can reach the surface during an infection. Our analysis of the tail of S protein suggests that this is not simply an irrelevant side reaction of virion production but rather a feature of S that is a consequence of the sequence of its cytoplasmic tail. As with other viruses, syncytium formation may be advantageous to SARS-CoV-2, for example by helping it to evade immune surveillance 40. Indeed, it has been argued that syncytia formation may be a common infection strategy amongst respiratory viruses 41.
It also seems quite conceivable that the formation of large syncytia increases viral pathogenicity by destabilising airway epithelia or creating holes that are more challenging to repair than the loss of a single infected cell.
Our findings thus argue that syncytia formation by SARS-CoV-2 should be viewed as a potential target for therapeutic strategies, especially as the process appears entirely dependent on the cell surface protease TMPRSS2 to make the activating S2' cleavage, whereas viral entry can be facilitated by either TMPRSS2 or lysosomal cathepsins 1,9 .

Data Availability
Mass spectrometry data used in this study are summarised in Supplementary Table 1. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE 42 partner repository with the dataset identifier PXDXXXXX (https://www.ebi.ac.uk/pride/archive/projects/PXDXXXXX). We made use of information in the UniProt database (https://www.uniprot.org/). All reagents generated by this study are available from the corresponding author on request.

Acknowledgments
We thank Manu Hegde for comments on the manuscript, John James and Natalya

Competing Interests
The authors declare no competing interests.

Methods
Plasmids. Details of the plasmids used in this report can be accessed from Supplementary Table 2. Full length S protein constructs used for expression in mammalian cells: a sequence encoding SARS-CoV-2 S was codon optimised for expression in mammalian cells and cloned into pcDNA3.1+ (modified to be compatible with the PiggyBac transposase system) using the restriction sites NheI and NotI. Where indicated, an HA tag was inserted after the signal peptide by introduction into the forward primer, amplification by PCR and insertion into pcDNA3.1+. Key residues in the cytoplasmic tail of S were mutated as indicated by introducing mutations into primers, amplification of a small region at the 3' end of the gene and insertion using the restriction sites BstEII and NotI. For GFP-CD86 chimeric fusions, GFP and a GAGAGS linker were inserted immediately downstream of the signal peptide of CD86 (the cDNA from John James) using Gibson assembly and inserted into modified pcDNA3.1+ using the restriction sites NheI and NotI. DNA fragments containing a short luminal region of CD86 (with a membrane-proximal insertion of two aspartic acid residues) and the TMD of CD86 fused to different mutant forms of the cytoplasmic tail of S were either synthesised (Genewiz) or amplified by PCR. The short fragments were cloned into the GFP-CD86 vector using the restriction sites BbvCI and NotI. GFP, with a short C-terminal linker, was cloned into pcDNA3.1+ using the restriction sites NheI and NotI.
From an overnight starter culture, cells were grown in 2xTY medium containing 100 µg/mL ampicillin (or 50 µg/mL kanamycin for 6xHis-VPS26) and 34 µg/mL chloramphenicol at 37ºC in a shaking incubator. When the culture reached OD600 = 0.6-0.8, the temperature was lowered to 16ºC, protein expression was induced with 100 µM isopropyl β-D-1-thiogalactopyranoside (IPTG), and the culture was incubated overnight.
Bacterial cells were harvested by centrifugation at 4,000 x g at 4ºC for 15 minutes and were mechanically resuspended on ice in lysis buffer containing 50 mM Tris, pH 7.4, 150 mM NaCl, 1 mM EDTA, 5 mM 2-mercaptoethanol, 1% Triton X-100, and supplemented with protease inhibitor cocktail (cOmplete, Roche). Cells were lysed by sonication and the lysates were clarified by centrifugation at 20,000 x g at 4ºC for 15 minutes. Clarified lysates were flash frozen in liquid nitrogen and thawed as needed for the binding assays.
GST-pulldowns using 293T cell lysates. Pull downs for mass spectrometry: clarified lysates from 450 mL 2xTY cultures containing bacteria expressing recombinant GST, GST-S tails (product of pJC149, pJC150, pJC247) were thawed. 100 μL of glutathione Sepharose 4B bead slurry (GE17-0756-01) was washed twice with lysis buffer (50 mM Tris, pH 7.4, 150 mM NaCl, 1 mM EDTA, 5 mM 2-mercaptoethanol, 1% Triton X-100) by centrifugation at 100 x g for 1 minute at 4ºC and aspiration of the washing buffer. Clarified bacterial lysates were added to the glutathione Sepharose beads and incubated at 4ºC for 1 hour on a tube roller. 293T cells (from four confluent T175 flasks per GST-tagged bait) were collected by scraping and lysed with lysis buffer supplemented with protease inhibitor cocktail (EDTA-free, cOmplete, Roche). The lysate was clarified by centrifugation for 5 minutes at 17,000 x g and pre-cleared with 100 μL of glutathione Sepharose bead slurry per bait. Beads loaded with recombinant GST-tagged baits were washed once with ice-cold lysis buffer, once with lysis buffer supplemented with 500 mM NaCl, and once again with lysis buffer. Around 5% of the beads were kept aside as an input control and the remaining beads were incubated with the pre-cleared 293T cell lysate for 2-4 hours on a tube roller at 4ºC. Beads were washed twice with lysis buffer, transferred to 0.8 mL centrifuge columns (Pierce 89869B) and washed twice more. Columns were brought to room temperature and eluted 5 times with 100 µL of elution buffer (1.5 M NaCl in lysis buffer) by centrifugation at 100 x g for 1 minute; for the final elution the sample was centrifuged at 17,000 x g for 1 minute. Eluates were pooled together and concentrated down to around 75 μL using an Amicon Ultra 0.5 mL 3,000 NMWL centrifugal filter
Database search parameters were set with a precursor tolerance of 10 ppm and a fragment ion mass tolerance of 0.2 Da. One missed enzyme cleavage was allowed and variable modifications for oxidized methionine, carbamidomethyl cysteine, pyroglutamic acid, phosphorylated serine, threonine and tyrosine were included. MS/MS data were validated using the Scaffold programme (Proteome Software Inc., USA). All data were additionally interrogated manually. The data presented were exported from Scaffold as total spectral counts, with the protein threshold set at 80%, the minimum number of peptides set at 2 and the peptide threshold set at 50%. Lower scoring matches and those of <5% probability were not shown.
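To make the two tolerance settings concrete, the sketch below shows how a relative tolerance in ppm and an absolute tolerance in Da translate into acceptance windows. It is a generic illustration, not the search engine's implementation; the function names and the m/z values in the usage section are hypothetical, and only the 10 ppm and 0.2 Da figures come from the text.

```python
PRECURSOR_TOL_PPM = 10.0  # from the text: precursor tolerance of 10 ppm
FRAGMENT_TOL_DA = 0.2     # from the text: fragment ion tolerance of 0.2 Da


def ppm_window(theoretical_mz: float, tol_ppm: float) -> tuple[float, float]:
    """Return the (low, high) m/z window for a relative tolerance in ppm."""
    delta = theoretical_mz * tol_ppm / 1e6
    return theoretical_mz - delta, theoretical_mz + delta


def precursor_match(theoretical_mz: float, observed_mz: float,
                    tol_ppm: float = PRECURSOR_TOL_PPM) -> bool:
    """True if the observed precursor lies within the ppm window."""
    low, high = ppm_window(theoretical_mz, tol_ppm)
    return low <= observed_mz <= high


def fragment_match(theoretical_mz: float, observed_mz: float,
                   tol_da: float = FRAGMENT_TOL_DA) -> bool:
    """True if the observed fragment lies within the absolute Da tolerance."""
    return abs(observed_mz - theoretical_mz) <= tol_da


if __name__ == "__main__":
    # Hypothetical m/z values chosen only to illustrate the arithmetic.
    print(ppm_window(1000.0, PRECURSOR_TOL_PPM))
    print(precursor_match(1000.0, 1000.005))
    print(fragment_match(500.10, 500.25))
```

A ppm tolerance scales with m/z, so the precursor window widens for heavier ions, whereas the fragment window has the same width in Da at every m/z.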
Analysis of mass spectral intensities. All raw files were processed with MaxQuant v1.5.5.1 using standard settings and searched against the UniProt Human Reviewed KB with the Andromeda search engine integrated into the MaxQuant software suite 43,44 .
Enzyme search specificity was Trypsin/P for both endoproteinases. Up to two missed cleavages for each peptide were allowed. Carbamidomethylation of cysteines was set as fixed modification with oxidized methionine and protein N-acetylation considered as variable modifications. The search was performed with an initial mass tolerance of 6 ppm for the precursor ion and 0.5 Da for MS/MS spectra. The false discovery rate was fixed at 1% at the peptide and protein level. Statistical analysis was carried out using the Perseus module of MaxQuant 45. Prior to statistical analysis, peptides mapped to known contaminants, reverse hits and protein groups only identified by site were removed. Only protein groups identified with at least two peptides, one of which was unique, and two quantitation events were considered for data analysis. Each protein had to be detected in
Internalised anti-HA AF488 conjugate was inaccessible to quenching and therefore any associated AF488 signal was equated to levels of internalised S. Conversely, only anti-HA AF488 conjugate at the cell surface was accessible by the anti-mouse AF647 secondary antibody and therefore any associated AF647 signal was equated to levels of non-internalised S.

Comparison of internal and external levels of S by immunofluorescence. U2OS cells were seeded at a density of 2 x 10^4 cells/cm^2 in 6-well plates in culture medium in a humidified incubator at 37ºC with 5% CO2. 24 hours after seeding, cells were transfected with 1-2 μg of plasmid DNA encoding different N-terminally HA-tagged S cytoplasmic tail mutants using PEI. 24 hours after transfection, cells were washed once in EDTA solution and dissociated from the flask in trypsin for 2 minutes at 37ºC and seeded onto coated microscope slides (Hendley-Essex) in culture medium in a humidified incubator at 37ºC with 5% CO2. 24 hours after seeding, cells were washed with PBS and fixed in 4% PFA in PBS for 20 minutes at room temperature. Cells were incubated in an anti-HA AF647

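[Figure residue: alignment of S protein cytoplasmic tail sequences. Row labels recovered: SARS-CoV-1, MERS-CoV, HCoV-HKU1, HCoV-OC43. Only two sequences survive extraction: TSCCSCLKGCCSCGSCCKFDEDDSEPVLKGVKLHYT (unlabelled first row) and TSCCSCLKGACSCGSCCKFDEDDSEPVLKGVKLHYT (row labelled SARS-CoV-1).]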