Structure of SARS-CoV-2 ORF8, a rapidly evolving immune evasion protein

Significance The structure of the SARS-CoV-2 ORF8 protein reveals two novel intermolecular interfaces layered onto an ORF7 fold. One is mediated by a disulfide bond, the other is noncovalent, and both are novel with respect to SARS-CoV. The structural analysis here establishes a molecular framework for understanding the rapid evolution of ORF8, its contributions to COVID-19 pathogenesis, and the potential for its neutralization by antibodies.

T he severity of the current COVID-19 pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) relative to past outbreaks of Middle East respiratory syndrome, SARS, and other betacoronaviruses in humans begs the question as to its molecular basis. The accessory protein ORF8 is one of the most rapidly evolving betacoronavirus proteins (1)(2)(3)(4)(5)(6)(7). While ORF8 expression is not strictly essential for SARS-CoV and SARS-CoV-2 replication, a 29-nucleotide deletion (Δ29) that occurred early in human to human transmission of SARS-CoV, splitting ORF8 into ORF8a and ORF8b, is correlated with milder disease (8). A 382-nucleotide deletion (Δ382) in SARS-CoV-2 (9, 10) was also found to correlate with milder disease and a lower incidence of hypoxia (11).
SARS-CoV-2 ORF8 is a 121-amino acid (aa) protein consisting of an N-terminal signal sequence followed by a predicted Ig-like fold (12). With <20% sequence identity to SARS-CoV ORF8, SARS-CoV-2 ORF8 is remarkably divergent. ORF8 proteins from both viruses possess a signal sequence for endoplasmic reticulum (ER) import. Within the lumen of the ER, SARS-CoV-2 ORF8 interacts with a variety of host proteins, including many factors involved in ER-associated degradation (13). Presumably, ORF8 is secreted, rather than retained in the ER, since ORF8 antibodies are one of the principal markers of SARS-CoV-2 infections (14). Several functions have been proposed for SARS-CoV-2 ORF8. ORF8 disrupts IFN-I signaling when exogenously overexpressed in cells (15). It has been shown that ORF8 of SARS-CoV-2, but not ORF8 or ORF8a/ORF8b of SARS-CoV, down-regulates MHC-I in cells (16).
These observations suggest the relationship between ORF8 structure, function, and sequence variation may be pivotal for understanding the emergence of SARS-CoV-2 as a deadly human pathogen. Yet not only is there no three-dimensional structure of any ORF8 protein from any coronavirus, there are no homologs of known structure with sequence identity sufficient for a reliable alignment. SARS and SARS-CoV-2 ORF7a are the most closely related templates of known structure (17), yet their core is approximately half the size of ORF8, and their primary sequence identity is negligible. Therefore, we determined the crystal structure of SARS-CoV-2 ORF8. The structure confirms the expected Ig-like fold and overall similarity of the core fold to SARS-CoV-2 ORF7a. The structure reveals two novel dimer interfaces for SARS-CoV-2 ORF8 unique relative to all but its most recent ancestors in bats. Together, our results set the foundation for elucidating essential aspects of ORF8 biology to be leveraged for the development of novel therapeutics.

Results
We generated SARS-CoV-2 ORF8 protein by expression in Escherichia coli and oxidative refolding. The structure of SARS-CoV-2 ORF8 was determined by X-ray crystallography at a resolution of 2.04 Å ( Fig. 1 A and B). Side-chain density was visible throughout most of the density map, and an atomic model was built ab initio into the density (Fig. 1C). ORF8 crystallized as a covalent dimer with three sets of intramolecular disulfide bonds per monomer and a single intermolecular disulfide bond formed by Cys20 of each monomer (Fig. 1B). The core of each ORF8 monomer consists of two antiparallel β-sheets ( Fig. 1 D and E). The smaller sheet consists of β2, β5, and β6, while the larger is formed from β3, β4, β7, and β8. β8 also contributes to a short parallel β-sheet with β1 ( Fig. 1 D and E).
ORF8 has a 16% sequence identity with the SARS-CoV-2 ORF7a protein. The Ig-like fold of ORF7a (Protein Data Bank [PDB] ID code 6W37) aligns with the ORF8 monomer with a

Significance
The structure of the SARS-CoV-2 ORF8 protein reveals two novel intermolecular interfaces layered onto an ORF7 fold. One is mediated by a disulfide bond, the other is noncovalent, and both are novel with respect to SARS-CoV. The structural analysis here establishes a molecular framework for understanding the rapid evolution of ORF8, its contributions to COVID-19 pathogenesis, and the potential for its neutralization by antibodies.  distance matrix alignment (DALI) Z score = 4.5 and an rmsd = 2.5 ( Fig. 2B). Based on the structural alignment, ORF8 and ORF7a share two sets of structural disulfide linkages that are central to the Ig-like fold ( Fig. 2 A and C). Punctuated between what would be β3 and β5 of ORF7a is an ORF8-specific region of ∼35 aa from residues 46 to 83 that are structurally distinct from ORF7a and other Ig-like folds ( Fig. 2 A, B, and D). These residues are responsible for a third, ORF8-specific, disulfide ( Fig. 2 A and C).
SARS-CoV-2 ORF8 separates into two distinct species when analyzed by size exclusion chromatography (SEC). Comparison of the elution peak volumes with molecular weight standards suggests dimeric and monomeric forms (Fig. 3A). The dimeric form of ORF8 has been observed using a tobacco BY2 cell expression system (18). The asymmetric unit within the crystal contains the dimeric form (Fig. 3B) and exhibits imperfect twofold noncrystallographic symmetry. This same disulfide-linked dimer was observed in the selenomethionine (SeMet) C2 1 crystal that provided the initial phasing information used for structure determination as well as an independent structure of SARS-CoV-2 ORF8 subsequently determined (PDB ID code 7JX6). The dimer is linked by an intermolecular disulfide bridge formed between two copies of Cys20 (Fig. 3B). Generation of the surface electrostatic potential for each monomer shows the interfaces are complementary (Fig. 3B). The dimeric interface amounts to ∼1,320 Å 2 in buried surface area (19). The intermolecular bonds are primarily contributed by β1 and β8 residues and the loop joining β3 and β4 (   Fig. 4. Conserved and unique features of SARS-CoV and SARS-CoV-2 ORF8. Primary sequence alignment of SARS-CoV-2 ORF8, SARS-CoV ORF8, and closely related ORF8 homologs found in bat betacoronavirus strains (3). Secondary structure assignments (blue cartoon arrows) correspond to the SARS-CoV-2 ORF8 structure. Putative and conserved biochemical features are highlighted: cysteines−disulfides (salmon), cis-proline (blue), residue 84 (green; SARS-CoV-2 ORF8 numbering), N-glycosylation site (yellow; predicted site for SARS-CoV-2) (12). Asterisks designate residues contributing to the "covalent" dimer interface. Open squares designate residues contributing to the alternate dimeric interface. Dots designate intervals of 10 aa according to SARS-CoV-2 ORF8 numbering. interaction with its symmetry-related counterpart. This central hydrophobic region is flanked by salt bridges between Arg115 and Asp119 (Fig. 3C). The ends of the interface are stabilized by a "clamp" loop which makes main-chain hydrogen bonding interactions between Phe120 of one subunit and Ala51 and Arg52 of the other (Fig. 3D). The guanidino group of Arg52, in turn, reaches across and forms a hydrogen bond with the carbonyl group of the Ile121 main chain. An additional hydrogen bond is formed between e-amino group of Lys53 and Ser24. These features are near identical on the opposite side of the interface, owing to the symmetry of the dimer. Sequence alignment of SARS-CoV-2 ORF8 and its closely related bat betacoronavirus RaTG13 and SL-CoVZC45 orthologs revealed the six Cys forming intramolecular disulfide bridges are all conserved. These Cys are also conserved in SARS-CoV and most of its relatives (Fig. 4), and are also present in the corresponding regions of SARS-CoV ORF8a and ORF8b. However, Cys20 is not conserved in SARS-CoV ORF8 and bat viruses clustering phylogenetically with human SARS-CoV (Fig. 4). The residues immediately surrounding Cys20 are also conserved in the most recent bat precursors of SARS-CoV-2. The features thus required for the overall fold of the ORF8 monomer are well preserved across SARS-CoV, SARS-CoV-2, and related betacoronavirus. The covalent dimer is, however, an evolutionarily recent addition among human betacoronaviruses unique to SARS-CoV-2.
Phylogenetic analysis of SARS-CoV-2 strains revealed two predominant isoforms of ORF8 in circulation, containing either Leu84 or Ser84 (4, 7). The structure reported here is of the Leu84 form. In the ORF8 structure, residue Leu84 is flanked by the disulfide-forming Cys83 on one side and by Pro85 on the other (Fig. 1C). Pro85 adopts the unusual cis conformation. Both Cys83 and Pro85 are conserved among ORF8 orthologs. Given the position of Leu84, with its solvent-exposed side chain, it seems unlikely mutation would influence overall tertiary structure. Leu84 is also distal to both novel SARS-CoV-2−specific dimer interfaces. The biological role of residue 84 remains to be B QECSLQSCTQHQPYVVDDPCPI HFYSKWYI RVGARKSAPLI ELCVDEAGSKSPI QYI DI GNYTVSCLPFTI NCQEPKLGSLVVRCSFYEDFLEYHDVRVVLDFI determined, but its unusual positioning controlled by a disulfide and cis Pro together suggests its likely role in function. Another major region of sequence unique to SARS-CoV-2 and its closest relatives begins immediately after Cys61 and extends until just before the Cys83−Leu84−Pro85 conserved motif. Structure alignment of chains A and B suggests this region is somewhat plastic (Fig. 5A). Further comparison with PDB 7JX6 shows how this unique region can adopt different conformations (Fig. 5A). A SARS-CoV-2−specific 73 YIDI 76 motif occurs at the center of this unique region. The YIDI motif is responsible for stabilizing an extensive noncovalent dimer interface in the crystal, scored as highly significant (19) on the basis of its 1,700 Å 2 of buried surface area and hydrophobicity. This suggests the noncovalent dimer seen in the crystal is a special feature of SARS-CoV-2 absent in SARS-CoV. This same noncovalent dimer interface was observed in the SeMet C2 1 crystal that provided the initial phasing information used for structure determination. It did not, however, appear in PDB ID code 7JX6, which was determined in space group P4 3 2 1 2. The combination of Leu95, Ile58, Val49, and Pro56 forms a hydrophobic interaction with Tyr73 of the YIDI motif (Fig. 5 C and D). Crystallographic contacts revealed an extensive array of hydrophobic interactions between residues 71 and 75 of chain A that interdigitate with the corresponding residues of a symmetry-related copy of chain B, distinct from the B subunit of the covalent dimer ( Fig. 5 E and F). The center of the interface comprises a twostranded parallel β-sheet that is distinct from the core β-sheets of the monomeric Ig-like fold. Taken in combination with the Cys20mediated covalent dimer interface, the structure shows that two sequence regions unique to SARS-CoV-2 control the oligomerization and crystal packing of ORF8, and potentially mediate higher-order macromolecular assemblies unique to SARS-CoV-2 (Fig. 5G). These observations show how recent evolutionary changes in the sequence of SARS-CoV-2 relative to its more benign precursors could contribute to a unique higher-order assembly mediating unique functions in immune evasion and suppression.

Discussion
The structure of SARS-CoV-2 ORF8 provides a framework for understanding the relationship between sequence features, viral evolution, and pathogenesis, but many key questions remain unanswered. Many putative cellular interactors have been identified (13,20), but the mechanism of action of ORF8 with respect to these interactors is still unclear. As structures are determined of ORF8 complexes with host receptors, we would expect to see ORF8 bound in these complexes as a covalent dimer. It is less certain whether to expect the YIDI motifmediated noncovalent dimer interface to persist in some functional complexes. Given the high solvent exposure of these hydrophobic residues and their uniqueness to SARS-CoV-2 and its closest relatives, it is reasonable to expect them to be important for pathogenesis. It may be that these residues engage directly with host targets in some cases, which would presumably not be compatible with the persistence of the noncovalent ORF8 dimer. Alternatively, other regions may mediate direct interactions with receptors, while the YIDI interface promotes higher-order assembly and increases the avidity of binding to receptor oligomers, or induces the oligomerization of receptors. It is intriguing that ORF8 antibodies are major serological markers of SARS-CoV-2 infection. It will be important to define the immunogenic epitopes on SARS-CoV-2 ORF8, and whether they include the YIDI motif. To the extent that antibodies react with putative functional surfaces of ORF8 such as the YIDI motif, it will be important to determine whether the antibodies are neutralizing with respect to the pathogenic functions of ORF8 and, if so, whether these properties can be exploited therapeutically.

Materials and Methods
Protein Expression and Purification. The gene for wild-type SARS-CoV-2 ORF8 18 to 121 was subcloned from a complementary DNA kindly provided by D. Gordon and N. Krogan, University of California, San Francisco, CA into the pET His6 tobacco etch virus (TEV) ligation independent cloning (LIC) cloning vector (2B-T). The plasmid was transformed into E. coli strain BL21(DE3) Rosetta pLysS (QB3 MacroLab, University of California, Berkeley), then expressed overnight (0.5 mM isopropyl β-D-1-thiogalactopyranoside) at 20°C in Luria Broth containing ampicillin and chloramphenicol. Cells were pelleted and resuspended in lysis buffer [50 mM Tris pH 8.0, 2 mM (ethylenedinitrilo)tetraacetic acid (EDTA), 100 mM NaCl, 1 mM dithiothreitol, 0.5% Triton-X100] supplemented with protease inhibitors (Roche) and lysed by sonication. Lysate was clarified by centrifugation, and the pellet was washed with lysis buffer and sonicated for an additional 10 min to homogenize. Suspension was again clarified by centrifugation, and the pellet was resuspended in solubilization buffer (100 mM Tris pH 8.5, 6 M guanidine hydrochloride, 10 mM reduced glutathione) followed by Dounce homogenization and incubated at room temperature with rocking for 1 h. Insoluble particulates were removed by centrifugation, and the supernatant was applied to nickel-charged agarose (GE Healthcare) preequilibrated in solubilization buffer. The resin was washed with solubilization buffer, and bound His-tagged ORF8 eluted with elution buffer (100 mM Tris pH 8.5, 6 M guanidine hydrochloride, 10 mM reduced glutathione, 350 mM imidazole). Solubilized ORF8 was added drop-wise to a 50-fold excess of cold refolding buffer (50 mM Tris pH 8.0, 500 mM L-arginine, 2 mM EDTA, 5 mM reduced glutathione, 0.5 mM oxidized glutathione, 0.2 mM phenylmethylsulfonyl fluoride) over a period of 2 h with gentle stirring followed by overnight incubation at 4°C. The refolding solution was filtered, concentrated, and applied to a Superdex S75 size-exclusion column equilibrated in buffer (20 mM Tris·HCl, pH 8.0, 150 mM NaCl, 1 mM EDTA). Folded ORF8 eluted as two peaks corresponding to monomer and dimer. Both peaks were pooled and incubated with TEV protease overnight. Cleavage products were passed through nickel-charged agarose (GE Healthcare) to remove any uncleaved ORF8. The flow-through was concentrated and applied to a Superdex S75 size-exclusion column equilibrated in buffer (20 mM Tris·HCl, pH 8.0, 150 mM NaCl, 1 mM EDTA). Both monomer and dimer peaks were pooled and concentrated.
The plasmid containing the ORF8 V32M, L84M double-mutant coding sequence was generated by QuikChange site-directed mutagenesis according to the manufacturer's instructions (Agilent). SeMet-labeled ORF8 double-mutant protein was expressed using SelenoMet Medium Base according to manufacturer's instructions (Molecular Dimensions). Purification was carried out in the same manner as wild type, but the TEV cleavage step was omitted.
The expression construct has been made available at http://www.addgene.org.
X-Ray Crystallography. Crystals of wild-type ORF8 were grown using the hanging drop vapor diffusion method at 18°C. Two microliters of the protein sample (7.8 mg·mL −1 ) was mixed with 2 μL of reservoir solution and suspended over a 500-μL reservoir of 100 mM sodium dihydrogen phosphate pH 6.5, 12% (wt/vol) PEG8000. Crystals appeared after 5 d and continued to grow for approximately 1 wk. Crystals were transferred into cryoprotectant (100 mM sodium dihydrogen phosphate pH 6.5, 12% [wt/vol] PEG8000, 20 mM Tris pH 8.0, 150 mM NaCl, 1 mM EDTA, 30% [vol/vol] glycerol) and flash frozen by plunging into liquid nitrogen. A native dataset was collected from a single crystal of space group P4 1 2 1 2 under cryogenic conditions (100 K) at a wavelength of 1.00001 Å using a Dectris Pilatus3 S 6M detector (Beamline 5.0.2, Advanced Light Source, Lawrence Berkeley National Laboratory). Crystals of SeMet-labeled V32M/L84M ORF8 were produced using the same reservoir condition as wild type and a lower concentration of protein (2.5 mg·mL −1 ). Crystals exhibiting a different morphology to wild type appeared after 2 d and continued to grow for ∼4 d. A single-wavelength anomalous dispersion selenium peak dataset in space group C2 1 was collected at a wavelength of 0.97903 Å.
The diffraction data were indexed and integrated using X-ray detector software (21). Integrated reflections were scaled, merged, and truncated using the CCP4 software suit (22). Initial phases were obtained for the SeMet dataset using the Phenix autosol pipeline (23). This solution led to a map that enabled the full ORF8 main chain to be traced, which was partially refined and then used as a search model for molecular replacement with respect to the native dataset using Python-based Hierarchical EnviroNment for Integrated Xtallography (PHASER) (24). The space group of the native dataset is P4 1 2 1 2 with one ORF8 dimer in the asymmetric unit. Iterative rounds of manual model building and refinement were carried out using Coot (25) and Phenix Refine (23), respectively (for statistics, see SI Appendix, Table S1). Structural figures were produced using the program PyMOL (https://pymol.org/2/). Sequence Alignment. Primary sequences of ORF8 and ORF7 from human and bat isolates were collected from GenBank. Sequences used for alignment were selected based on previous phylogenetic analyses (12). Primary sequence alignment of ORF8 homologs was performed using the ClustalOmega server (26). Sequence alignment of ORF7a and ORF8 and secondary structure annotations were guided by Define Secondary Structure of Proteins through the DALI server (27) and manually adjusted based on visual inspection of the structures.
Data Availability. Coordinates have been deposited in the PDB, http://www. wwpdb.org (PDB ID code 7JTL). The expression construct has been made available at http://www.addgene.org.