TARGETED PROTEOMICS FOR THE DETECTION OF SARS-COV-2 PROTEINS

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative agent of coronavirus disease 2019 (COVID-19). Rapid, sensitive and specific diagnosis of SARS-CoV-2 by fast and unambiguous testing is widely recognized to be critical in responding to the current outbreak. Since the current testing capacity of conventional PCR based methods is insufficient because of shortages of supplies such as RNA extraction kits and PCR reagents, alternative and/or complementary testing assays should be developed. Here, we exploit the potential of targeted mass spectrometry based proteomic technologies to address the current issue of insufficient SARS-CoV-2 diagnostic testing capacity. We have assessed the limit of detection by parallel reaction monitoring (PRM) on an Orbitrap Eclipse mass spectrometer for target tryptic peptides of several SARS-CoV-2 proteins from a sample of virus infected Vero cells. For Nucleocapsid protein the limit of detection was found to be in the mid-attomole range (0.9 x 10^-12 g), which would theoretically correspond to approximately 10,000 SARS-CoV-2 particles, under the assumption that all viral proteins are assembled in macromolecular virus particles. Whether or not this sensitivity is sufficient to play a role in SARS-CoV-2 detection in patient material such as swabs or body fluids largely depends on the amount of viral proteins present in such samples and is the subject of further research. If it is, mass spectrometry based methods could serve as a complementary protein based diagnostic tool, and further steps should focus on sample preparation protocols and on improvements in sample throughput.


INTRODUCTION
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative agent of coronavirus disease 2019 (COVID-19), which is a severe respiratory disease 1 . The virus is contagious in humans and the World Health Organization (WHO) has designated the ongoing pandemic of COVID-19 a Public Health Emergency of International Concern 2 . As of now, over 176,000 deaths have been reported worldwide, and this is probably an underestimate because of a lack of testing capacity in large parts of the world.
Tests provide rapid and unambiguous evidence of viral infection and further treatment is usually based on the outcome of such tests. Importantly, reducing the time required to identify SARS-CoV-2 infections will significantly contribute to limiting the enormous social and economic consequences of this outbreak, which is paralyzing societies worldwide. Thus, the rapid, sensitive and specific diagnosis of SARS-CoV-2 is widely recognized to be critical in responding to this outbreak, but also for long-term improvements in patient care. Conventional methods for diagnostic testing of viral infections, which are also widely used for SARS-CoV-2 testing, are based on polymerase chain reaction (PCR) or other (multiplexed) nucleic-acid based technologies. Since the emergence of the virus in late 2019 it has become clear that additional diagnostic tools that target SARS-CoV-2 should be developed to complement existing tools in a "proactive approach" (as proposed by the Coronaviridae Study Group of the International Committee on Taxonomy of Viruses) 1 . Alternative and/or complementary SARS-CoV-2-specific diagnostic tests are urgently needed since the current testing capacity is insufficient, among other reasons because of shortages of supplies such as RNA extraction kits and PCR reagents and delivery issues for primers and probes. Furthermore, additional testing assays are helpful in research to better understand the biological activity and potential of the virus.
Besides PCR based approaches, immunoassays have been employed in the detection of other viruses. In addition, mass spectrometry (MS) based techniques have been applied, for instance to detect influenza virus proteins 4 and human metapneumovirus (HMPV) in clinical samples 5 . Recent developments in targeted proteomics methods such as parallel reaction monitoring (PRM) and in Orbitrap mass spectrometry have brought a substantial increase in sensitivity over the past years. Mass spectrometry based approaches have been used in a few SARS-CoV-2 studies recently (David E. Gordon …). Here, we exploit the potential of mass spectrometry based proteomics technology to address the issue of insufficient SARS-CoV-2 diagnostic testing capacity. For this, we first assess the limit of detection by PRM on an Orbitrap Eclipse for specific tryptic peptides of SARS-CoV-2 proteins. The sensitivity was found to be in the mid-attomole range (~0.9 pg) for Nucleocapsid (NCAP), which is the most abundant viral protein and therefore the most likely target candidate for tests. A rough calculation indicates that this level of sensitivity should be sufficient to detect protein amounts theoretically corresponding to approximately 10,000 SARS-CoV-2 particles. Whether or not this sensitivity is sufficient to play a role in SARS-CoV-2 detection in patient material such as swabs, mucus and other body fluids largely depends on the amount of viral proteins present in such samples. If it is, subsequent steps should focus on sample preparation protocols that are in agreement with validated virus inactivation procedures and on improvements in sample throughput.
Finally, providing novel mass spectrometry based diagnostic tools that complement genomic approaches is also the major goal of the recently formed COVID-19 mass spectrometry coalition (www.covid19-msc.org). The aim of this 'proof-of-concept' study is to highlight the potential of mass spectrometry in identifying SARS-CoV-2 proteins for diagnostics and research.

Virus sample
Vero E6 cells were maintained in Dulbecco's modified Eagle's medium (DMEM, Gibco) supplemented with 10 % fetal calf serum (FCS), HEPES, sodium bicarbonate, penicillin (10,000 IU/mL) and streptomycin (10,000 IU/mL) at 37 °C in a humidified CO2 incubator. SARS-CoV-2 (isolate BetaCoV/Munich/BavPat1/2020; kindly provided by Dr. C. Drosten) was propagated to passage 3 on Vero E6 (ATCC® CRL 1586™) cells in Opti-MEM I (1X) + GlutaMAX (Gibco), supplemented with penicillin (10,000 IU/mL) and streptomycin (10,000 IU/mL) at 37 °C in a humidified CO2 incubator. Stocks were produced by infecting Vero E6 cells at a multiplicity of infection (MOI) of 0.01 and incubating the cells for 72 hours at 37 °C in a humidified CO2 incubator. The culture supernatant was cleared by centrifugation and frozen in aliquots at −80 °C. Stock titers were determined by preparing 10-fold serial dilutions in Opti-MEM I (1X) + GlutaMAX. Aliquots of each dilution were added to monolayers of Vero E6 cells in the same medium in a 96-well plate. Twenty-four replicates were performed per virus stock. Plates were incubated at 37 °C for 5 days and then screened for cytopathic effect. The TCID50 was calculated according to the method of Spearman & Kärber. All work with infectious SARS-CoV and SARS-CoV-2 was performed in a Class II Biosafety Cabinet under BSL-3 conditions at Erasmus University Medical Center.
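As an illustration of the Spearman & Kärber calculation mentioned above, the following minimal Python sketch computes a 50 % endpoint from a plate readout. The well counts are hypothetical and are not data from this study; the sketch also assumes 24 wells per dilution, equally spaced dilutions, a fully positive first dilution and a fully negative last dilution. The function name is ours.

```python
def spearman_karber_log10_endpoint(log10_dilutions, positive_wells, wells_per_dilution):
    """Return log10 of the 50 % endpoint dilution (Spearman-Karber estimate).

    log10_dilutions: log10 of each dilution, most concentrated first
                     (e.g. -1, -2, ...), equally spaced.
    positive_wells:  number of wells showing cytopathic effect per dilution.
    """
    # log10 of the dilution factor (1.0 for a 10-fold series)
    step = log10_dilutions[0] - log10_dilutions[1]
    # fraction of positive wells at each dilution
    proportions = [n / wells_per_dilution for n in positive_wells]
    # log10(endpoint) = L - d * (sum(p) - 0.5), with L the most concentrated dilution
    return log10_dilutions[0] - step * (sum(proportions) - 0.5)


# Hypothetical readout, for illustration only (not measured values)
log10_dilutions = [-1, -2, -3, -4, -5, -6, -7, -8]
positive_wells = [24, 24, 24, 24, 12, 0, 0, 0]

endpoint = spearman_karber_log10_endpoint(log10_dilutions, positive_wells, 24)
titer_per_inoculum = 10 ** (-endpoint)  # TCID50 per inoculated volume
print(f"log10 endpoint dilution: {endpoint:.2f}")
print(f"TCID50 per inoculum volume: {titer_per_inoculum:.3g}")
```

Dividing the result by the inoculum volume per well would give the titer in TCID50/mL; that volume is not stated in the text and is therefore left out here.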

Sample preparation for MS
A 90 % confluent T75 flask of Vero E6 cells was infected at a MOI of 0.3 and incubated for 24 hours at 37 °C in a humidified CO2 incubator. Next, cells were collected by scraping and the medium was removed after centrifuging at 400 g for 5 min. Cells were lysed in 2X Laemmli buffer (final concentration; Bio-Rad) and boiled at 95 °C for 20 min to inactivate the virus. Proteins were reduced and alkylated with DTT (Sigma) and IAA (Sigma) and precipitated using chloroform/methanol 10 . The protein pellet was then dissolved in 100 µl of a 50 mM Tris/HCl buffer (pH 8.0) with 2 M urea. Proteins were quantified using the BCA protein kit (ThermoFisher Scientific / Pierce, #23225); peptides were quantified with a quantitative colorimetric peptide assay (ThermoFisher Scientific / Pierce, #23275). Fifty µg of protein was digested with 1 µg trypsin (Thermo) overnight at room temperature. The peptide digest was cleaned on a 50 mg tC18 Sep-Pak cartridge (Waters) and the peptides were eluted with 2 ml acetonitrile/water (1:1) with 0.05 % TFA. For PRM measurements, peptide samples with concentrations ranging from 0 to 25 ng/µl were prepared. For global proteomics, peptides were fractionated off-line using high pH reversed-phase (ThermoFisher / Pierce, #84868) into four fractions.

LC-MS
Peptide mixtures were trapped on a 2 cm x 100 μm Pepmap C18 column (ThermoFisher Scientific, #164564) and separated on an in-house packed 50 cm x 75 μm capillary column with 1.9 μm Reprosil-Pur C18 beads (Dr. Maisch) at a flow rate of 250 nL/min on an EASY-nLC 1200 (ThermoFisher Scientific), using a linear gradient of 0-32 % acetonitrile (in 0.1 % formic acid) over 60 or 90 min. The eluate was directly sprayed into the mass spectrometer by means of electrospray ionization (ESI).
For targeted proteomics, a parallel reaction monitoring (PRM) regime was used to target a predefined set of peptides on an Orbitrap Eclipse Tribrid mass spectrometer (ThermoFisher Scientific) operating in positive mode and running Tune version 3.3. Precursors were selected in the quadrupole with an isolation width of 0.7 m/z and fragmented with HCD using 30 % collision energy (CE). See Supplementary Table 2 for the isolation list. For global proteomics, data were recorded on an Orbitrap Fusion Lumos Tribrid mass spectrometer (ThermoFisher Scientific) in data dependent acquisition (DDA) mode. All MS1 and MS2 spectra were recorded in the orbitrap at 30,000 resolution in profile mode and with standard AGC target settings. The injection time mode was set to dynamic with a minimum of 9 points across the peak. Blanks were injected first, followed by the samples in order of increasing peptide input amount, to avoid carry-over from more concentrated samples into subsequent ones.

Data analysis
Mass spectrometry data were analyzed using Mascot v 2.6.2 within the Proteome Discoverer v 2.3 (PD, ThermoFisher Scientific) framework or with MaxQuant v 1.6.10.43 (www.maxquant.org), all with standard settings (note: fragment tolerance set to 20 ppm). PRM data were analyzed with Skyline (skyline.ms). Spectra and chromatograms were visualized in PD 2.3, Skyline or the PDV proteomics viewer (pdv.zhang-lab.org). For global proteome analyses the UniprotKB SARS2 database (https://covid-19.uniprot.org/; 14 entries) was concatenated with the UniprotKB database, taxonomy Chlorocebus (African green monkey). The total number of protein sequence entries that were searched was 20,751.

RESULTS
We started by analyzing the global proteome of Vero cells infected with SARS-CoV-2 using standard bottom-up proteomics. After off-line high pH reversed-phase (RP) fractionation, peptide LC-MS was performed on an Orbitrap Lumos and RAW files were combined during data analysis. SARS-CoV-2 proteins were measured with high sequence coverage as exemplified in Figures 1-3. Based on a semi-quantitative label free (LFQ) analysis of MaxQuant output data, we estimate that 4-5 % of the total proteome of this sample (composed of Vero cells, viral proteins inside cells and viral particles outside of cells in the supernatant) is made up of viral proteins. Of all SARS-CoV-2 proteins covered, Nucleocapsid (NCAP) is the most abundant one, making up > 88 % of all signal intensity as calculated from MaxQuant intensity values. Therefore, if intensity values can be used as a proxy for total protein mass, almost 90 % of the SARS-CoV-2 proteome would consist of NCAP. Moreover, the high number of identified Chlorocebus proteins (>6,000; see Supplementary Table 1) suggests that it is possible not only to study SARS-CoV-2 proteins, but also to investigate the effects of viral infection on the host cell proteome in great detail.
Based on the extensive sequence coverage for NCAP and several other SARS-CoV-2 proteins we established a list of peptide targets that can be used for PRM targeting. These molecular finger prints are used to program the mass spectrometer software in such a way that it acts as a filter to let only those specific SARS-CoV-2 proteolytic fragments pass. This way, a specific set of target peptides/proteins can be searched for in basically any sample from which proteins can be isolated (e.g., in vitro cell cultures, patient derived samples).
For data visualization in this paper, three tryptic peptides with a high mass spectrometric response were selected from the global proteome data set as targets for PRM, i.e. GFYAEGSR (NCAP_SARS2), ADETQALPQR (NCAP_SARS2) and EITVATSR (VME1_SARS2). Importantly, there are potentially a few dozen specific SARS-CoV-2 peptides that could be used for targeting, although some of these may show a slightly lower mass spectrometric response.
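To make the targeting step concrete, the sketch below computes monoisotopic precursor m/z values for the three peptides and the quadrupole isolation window for the 0.7 m/z isolation width given in the Methods. It is an illustration only: the doubly charged precursor state is our assumption (the actual isolation list is in Supplementary Table 2), and the function names are ours. None of the three peptides contains cysteine, so the alkylation step does not change their masses.

```python
# Monoisotopic residue masses (Da) of the amino acids present in the three targets
RESIDUE_MASS = {
    "A": 71.03711, "D": 115.02694, "E": 129.04259, "F": 147.06841,
    "G": 57.02146, "I": 113.08406, "L": 113.08406, "P": 97.05276,
    "Q": 128.05858, "R": 156.10111, "S": 87.03203, "T": 101.04768,
    "V": 99.06841, "Y": 163.06333,
}
WATER = 18.01056   # Da, added once per peptide (free N- and C-terminus)
PROTON = 1.00728   # Da


def precursor_mz(sequence, charge=2):
    """Monoisotopic m/z of an unmodified peptide; charge=2 is an assumption."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral_mass + charge * PROTON) / charge


def isolation_window(mz, width=0.7):
    """Lower and upper m/z limits of an isolation window centred on mz."""
    half = width / 2
    return mz - half, mz + half


targets = ["GFYAEGSR", "ADETQALPQR", "EITVATSR"]
for peptide in targets:
    mz = precursor_mz(peptide)
    low, high = isolation_window(mz)
    print(f"{peptide}: [M+2H]2+ = {mz:.4f}, window {low:.4f}-{high:.4f}")
```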
Our test sample (i.e., Vero cells infected with SARS-CoV-2, referred to as 'cells+virus') contained 2.0 mg/ml protein based on a BCA assay. The results of the colorimetric peptide quantification after digestion were in agreement with this concentration. A dilution series was prepared from this sample and the injected total peptide quantities ranged from 50 ng down to 20 pg. These extensively diluted samples were then subjected to PRM on an Orbitrap Eclipse. Figures 4 and 5 show the results of this PRM assay. The six most intense (Top6) fragment ion peaks are shown in different colors as overlapping (in terms of retention time) peaks. The chromatogram excerpts are shown from top to bottom and left to right for decreasing total protein input concentrations. The lower right chromatogram in each panel shows the Top6 fragment ions in the sample corresponding to 20 pg total protein input, which could thus be regarded as the limit of detection (LOD). It should be noted that all PRM assays are performed on peptide targets that are present in a complex matrix, i.e. a Vero cell lysate.
Based on a quick and provisional calculation we can estimate the number of virus particles that could in principle be detected at this LOD. One assumption is that the general composition of a virion is roughly 10 % RNA, 70 % proteins and 20 % lipids (weight percentages) and that the genomic RNA is ~30,000 nt 3 . This would then correspond to roughly 9.0E6 Da RNA and 6.3E7 Da total protein per virion. Based on the MaxQuant LFQ intensity data we assume that ~90 % of the total SARS-CoV-2 proteome mass is composed of NCAP. This would be 5.7E7 Da, which for a protein with a molecular weight of 46 kDa corresponds to 1,240 molecules. This number of Nucleocapsid copies per virion is in the same ballpark as the copy numbers of 2,600 that have been reported for Nucleocapsid of other RNA viruses (e.g., 11,12 ). Furthermore, the most dilute sample in which specific SARS-CoV-2 target peptides can still be detected (20 pg sample) is estimated to contain 5 % viral proteins. This represents 0.9 pg NCAP (20 amol, or 12E6 molecules) and would correspond to ~10,000 virus particles, under the assumption that all NCAP present is assembled into virus particles.
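The arithmetic of this estimate can be retraced step by step with the short Python sketch below. All inputs are the assumptions stated in the text; the average mass of 300 Da per nucleotide is not stated explicitly but is implied by 9.0E6 Da for ~30,000 nt, and the variable names are ours.

```python
AVOGADRO = 6.022e23  # molecules per mol

# Assumed virion composition (weight fractions) and genome size, as in the text
RNA_FRACTION = 0.10
PROTEIN_FRACTION = 0.70
GENOME_NT = 30_000
DA_PER_NT = 300          # implied by 9.0E6 Da for ~30,000 nt
NCAP_SHARE = 0.90        # share of viral protein mass that is NCAP
NCAP_MW = 46_000         # Da (equivalently g/mol)

# Per-virion masses
rna_da = GENOME_NT * DA_PER_NT                 # 9.0E6 Da RNA
virion_da = rna_da / RNA_FRACTION              # 9.0E7 Da total
protein_da = virion_da * PROTEIN_FRACTION      # 6.3E7 Da protein
ncap_da = protein_da * NCAP_SHARE              # ~5.7E7 Da NCAP
ncap_copies_per_virion = ncap_da / NCAP_MW

# Limit-of-detection sample
lod_input_g = 20e-12      # 20 pg total protein input
viral_fraction = 0.05     # ~5 % of the proteome is viral
ncap_g = lod_input_g * viral_fraction * NCAP_SHARE   # 0.9 pg NCAP
ncap_amol = ncap_g / NCAP_MW * 1e18
ncap_molecules = ncap_g / NCAP_MW * AVOGADRO
virus_particles = ncap_molecules / ncap_copies_per_virion

print(f"NCAP copies per virion: {ncap_copies_per_virion:.0f}")
print(f"NCAP at LOD: {ncap_g * 1e12:.2f} pg = {ncap_amol:.1f} amol "
      f"= {ncap_molecules:.2e} molecules")
print(f"Equivalent virus particles: {virus_particles:.0f}")
```

Without intermediate rounding this gives about 1,230 NCAP copies per virion (1,240 when 5.7E7 Da is used, as in the text), about 20 amol of NCAP and roughly 9,600 particles, consistent with the order-of-magnitude figure of ~10,000.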
The above numbers are based on several assumptions. For the accurate quantitation of viral proteins, known quantities of heavy isotope labeled peptide analogs (AQUA) should be used as spike-in in LC-MS analyses.

CONCLUSIONS & PERSPECTIVES
We have shown that tryptic peptides of SARS-CoV-2 proteins (NCAP, VME1) can be detected down to the mid-attomole range by targeted orbitrap mass spectrometry. Our calculations indicate that this level of sensitivity should be sufficient to detect protein amounts theoretically corresponding to approximately 10,000 SARS-CoV-2 particles. Whether this sensitivity is sufficient for this technology to play a role in diagnostics of COVID-19 largely depends on the amount of virus material that is present in patient material such as nasopharyngeal swabs, mucus and gargle solution; this remains to be seen and is the subject of further research. It is likely that the majority of viral proteins in swab samples consists of (unpackaged) proteins present within cells. In addition, it is currently unknown what proportion of SARS-CoV-2 particles in clinical specimens is infectious. Genome copy numbers determined by RT-qPCR on virus culture supernatant suggest that infectious titers are around 2 logs lower than genome copy numbers for coronaviruses (data not shown).
PRM sensitivity in terms of numbers of detected virus particles is, as expected, not as high as that of RT-qPCR, which reaches a 95 % hit rate at about 100 copies of RNA genome equivalent per reaction 13 . A major difference compared to conventional methods of viral diagnostics is that proteins are analyzed, as opposed to RNA in the case of (RT-q)PCR. Therefore, protein based methods could present an orthogonal and complementary way of diagnosing SARS-CoV-2 infection and disease.
The level of sensitivity is most likely sufficient for many applications in SARS-CoV-2 research. Moreover, the excellent label free quantitation capacity over a wide concentration range outperforms immunoassays and makes this method particularly useful for the study of infection courses over time. By using heavy isotope labeled peptide analogs (AQUA), PRM analysis can be improved both in terms of sensitivity and specificity 14 . Using AQUA peptides as spike-in, it would even be possible to absolutely quantitate viral proteins. This allows one to accurately monitor abundances of SARS-CoV-2 proteins in e.g. time series, which could be useful to study the course of infection and for solving questions on viral load.
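The principle of absolute quantitation with a heavy labeled spike-in can be sketched as follows. This is a minimal illustration, not the procedure used in this study: the peak areas and the spiked amount are hypothetical, the function names are ours, and the sketch assumes complete digestion and an identical MS response for the light and heavy peptide.

```python
NCAP_MW = 46_000  # g/mol, molecular weight of NCAP as used in the text


def absolute_amount_amol(auc_light, auc_heavy, heavy_spike_amol):
    """Amount of endogenous (light) peptide from the light/heavy area ratio."""
    if auc_heavy <= 0:
        raise ValueError("heavy peptide signal must be positive")
    return auc_light / auc_heavy * heavy_spike_amol


def amol_to_pg(amount_amol, molecular_weight):
    """Convert attomoles of a protein to picograms (1 amol = 1e-18 mol)."""
    return amount_amol * 1e-18 * molecular_weight * 1e12


# Hypothetical summed fragment-ion areas and spike-in amount (illustration only)
light_area = 2.0e5
heavy_area = 1.0e6
spiked_amol = 100.0

ncap_amol = absolute_amount_amol(light_area, heavy_area, spiked_amol)
ncap_pg = amol_to_pg(ncap_amol, NCAP_MW)
print(f"NCAP: {ncap_amol:.1f} amol = {ncap_pg:.2f} pg")
```

Because each NCAP molecule yields one copy of a given tryptic peptide, the peptide amount in amol can be read directly as the protein amount under the assumptions above.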
The level of sensitivity established with the targeted proteomics methodology described here opens up ways to explore the use of mass spectrometry as a diagnostic tool to detect viral infection in patient material. If this sensitivity turns out to be sufficient for the detection of SARS-CoV-2 proteins in e.g. nasopharyngeal swabs, mucus or blood, subsequent steps should focus on the optimization of faster sample preparation procedures and on improvements in analysis throughput.

DATA AVAILABILITY
All raw mass spectrometry data were uploaded to the PRIDE repository (www.ebi.ac.uk/pride/) under accession number PXD018760.

[Figure legend fragment; beginning not recovered] … Eclipse. The summed AUC values for the Top6 fragment ions of each peptide were taken for relative quantitation. 'Input' is total protein input from the 'cells+virus' sample; insets are zoom-ins of the input range 0-300 pg.