Transcriptomic similarities and differences in host response between SARS-CoV-2 and other viral infections

Summary
The pandemic 2019 novel coronavirus disease (COVID-19) shares certain clinical characteristics with other acute viral infections. We studied the whole-blood transcriptomic host response to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) using RNAseq from 24 healthy controls and 62 prospectively enrolled patients with COVID-19. We then compared these data to non-COVID-19 viral infections, curated from 23 independent studies profiling 1,855 blood samples covering six viruses (influenza, respiratory syncytial virus (RSV), human rhinovirus (HRV), severe acute respiratory syndrome coronavirus 1 (SARS-CoV-1), Ebola, dengue). We show that gene expression changes in COVID-19 versus non-COVID-19 viral infections are highly correlated (r = 0.74, p < 0.001). However, we also found 416 genes specific to COVID-19. Inspection of top genes revealed dynamic immune evasion and counter host responses specific to COVID-19. Statistical deconvolution of cell proportions showed that many cell type proportions shift concordantly. Discordantly increased in COVID-19 were CD56bright natural killer cells and M2 macrophages. The concordant and discordant responses mapped out here provide a window to explore the pathophysiology of the host response to SARS-CoV-2.


Introduction
A novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has developed into a global pandemic, resulting in more than 47.9 million cases and 1,221,871 deaths across 235 countries as we write (WHO, accessed 5 Nov 2020) (Zhou et al., 2020). For context, this pandemic has surpassed the severe acute respiratory syndrome coronavirus 1 (SARS-CoV-1) 2003 pandemic by almost 6,000-fold in total cases: SARS-CoV-1 resulted in 8,098 cases, took 12 months to contain, and had a 9.6% mortality rate (World Health Organization (WHO), accessed 1 Jun 2020). The novel SARS-CoV-2 virus, the causative agent of 2019 novel coronavirus disease (COVID-19), is highly communicable, and despite urgent and resource-intensive efforts globally, we have no proven vaccine or efficacious treatment available (Callaway, 2020).
Early in a pandemic, it is imperative to understand what is similar in the host response to the novel virus when compared to other known viruses in order to rapidly rule in or rule out recyclable treatments and/or vaccination strategies. At the same time, it is also critical to understand the differences in this disease in order to search for novel therapeutics. The human immune system has evolved over millions of years to protect the host from microbes (Medzhitov, 2007; Longo et al., 2015). Understanding the overlap, or lack thereof, for the most basic immunological features, such as the virus's ability to inhibit the interferon response or to infect host cells with an antibody-dependent infection enhancement, can drive medicine rapidly in a life-saving direction (Jaume et al., 2012; Wang et al., 2016; Mesev et al., 2019; Blanco-Melo et al., 2020). In the last decade alone, we have already responded to pandemics of H1N1, chikungunya, Zika, and near-pandemics of two other coronaviruses, SARS-CoV-1 and Middle East respiratory syndrome-related coronavirus (MERS), from which valuable insights can be applied (Morens and Fauci).
Our approach involves a multi-cohort analysis of transcriptomic host response data to investigate host inflammation. The core discovery method leverages biological, clinical, and technical heterogeneity across data sets to identify generalizable disease biomarkers.
We have repeatedly demonstrated that host response can be a generalizable, sensitive, and specific diagnostic and prognostic marker for the presence, type, and severity of infections (Sweeney et al., 2015, 2016b, 2018a), notably viral infections (Andres-Terre et al., 2015), but also in autoimmune diseases, vaccination, tuberculosis, cancer, and organ transplant (Khatri et al., 2013; Chen et al., 2014; Andres-Terre et al., 2015; Sweeney et al., 2015, 2016a, 2016b, 2018a, 2018b; Sweeney and Khatri, 2015; Warsinske et al., 2018a, 2018b; Haynes et al., 2020; Mayhew et al., 2020). We have shown in methodological work that this method produces results with the greatest reproducibility in independent cohorts.
In this work, we used RNAseq to profile whole blood samples from 62 patients with COVID-19 prospectively enrolled in Athens, Greece, together with 24 healthy controls (HCs). We simultaneously compiled a database of clinical viral infections from 23 studies of >1,800 samples to represent the conserved immune response to a broad range of viral infections including influenza, respiratory syncytial virus (RSV), human rhinovirus (HRV), SARS-CoV-1, Ebola, and dengue. Here, we report the results of a comparison of host responses to SARS-CoV-2 and other viruses. We mapped out their similarities and differences at the gene level, pathway level, and cell proportion level, as a first step toward a better understanding of this novel pandemic virus, and demonstrate that a large portion of the response is in fact similar to that seen in previous viral infections. This is immensely valuable, as it is this conserved host response that allows for pandemic preparedness and response. Our implementation of computational methods comparing SARS-CoV-2 to known circulating viruses yields a COVID-19-specific gene signature for differentiating the host response, which warrants further investigation.

Differential expression analysis of transcriptome profiles of patients with COVID-19
We prospectively enrolled 62 patients with COVID-19 and 24 HCs and sequenced RNA from their whole blood (Table 1). Differential expression analysis of the 86 peripheral blood samples identified 2,002 differentially expressed genes (771 over-expressed, 1,231 under-expressed; Figure 1A, Table S2A) with absolute Hedges' g effect size (ES), which is the difference between groups as a proportion of the variability in the groups, ≥1 and false discovery rate (FDR) ≤0.05%, referred to as the "COVID-19 signature". We performed pathway enrichment analysis of the COVID-19 signature using Gene Ontology (GO) terms. The 30 most significant pathways for the 771 over-expressed genes included neutrophil activation, innate immune response, immune response to viral infection, type-I interferon signaling, and cytokine production (Figure 1B), and those for the 1,231 under-expressed genes included lymphocyte differentiation and T-cell activation and regulation (Figure 1C). These results suggest that, in response to SARS-CoV-2 infection, T cells are suppressed, whereas neutrophils are activated as a hallmark of its overwhelming host response represented in the transcriptomic changes. High neutrophil-to-lymphocyte ratios have been observed as a marker of severity in sepsis, cancer, and pneumonia (Diao et al., 2020; Lagunas-Rangel, 2020; Liu et al., 2020; Qin et al., 2020).
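The effect-size threshold above can be illustrated with a minimal sketch of Hedges' g for a single gene. This is an assumption-laden illustration, not the authors' pipeline: the function name `hedges_g`, the expression values, and the use of the common small-sample correction factor 1 − 3/(4(n1 + n2) − 9) are all choices made here for the example.

```python
import math
import statistics

def hedges_g(cases, controls):
    """Bias-corrected standardized mean difference (Hedges' g) between two groups."""
    n1, n2 = len(cases), len(controls)
    mean_diff = statistics.mean(cases) - statistics.mean(controls)
    # Pooled variance built from the two sample variances.
    pooled_var = ((n1 - 1) * statistics.variance(cases)
                  + (n2 - 1) * statistics.variance(controls)) / (n1 + n2 - 2)
    cohens_d = mean_diff / math.sqrt(pooled_var)
    # Common approximation of the small-sample correction factor (assumed here).
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return cohens_d * correction

# Hypothetical log2 expression values for one gene (not data from the study).
covid = [8.1, 8.4, 7.9, 8.6, 8.3]
healthy = [6.9, 7.2, 7.0, 7.4, 7.1]

g = hedges_g(covid, healthy)
# Effect-size half of the signature criterion only; the FDR filter is separate.
passes_es_cutoff = abs(g) >= 1
```

A positive g indicates higher expression in cases than in controls, and swapping the two groups flips the sign.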
We identified 23 studies of acute viral infection and from these selected 14 as our discovery set for a non-COVID-19 viral signature (Table 2); the remaining 9 were held out for validation. Statistical power analysis (Hedges and Pigott, 2001) found that even with high inter-study heterogeneity, we had more than 80% statistical power at p value = 0.01 for detecting absolute Hedges' g ES > 0.43 in these data sets (Figure S2). The multi-cohort analysis of 1,324 transcriptome profiles (652 patients with non-COVID-19 viral infections, 672 HCs) from these 14 studies using MetaIntegrator identified 635 differentially expressed genes (314 over-expressed, 321 under-expressed). The area under the curve (AUC) of a receiver operating characteristic (ROC) curve represents the ability of the score to correctly identify true positives and true negatives. The closer the value is to 1, the better the performance of the test, for example, a test that discriminates whether a patient or sample is virally infected or healthy. ROC plots for all of the discovery data sets using this signature illustrate the high sensitivity and specificity of this gene list, indicating genes that are highly discriminatory and hence likely to represent this conserved signature (Figure 2A, Table S2A). We refer to these 635 genes in short as the "non-COVID-19 viral signature". Similar to the COVID-19 signature, GO analysis of over- and under-expressed genes in the non-COVID-19 viral signature identified a similar set of pathways highlighted by neutrophil and T-cell activation, respectively (Figures 2B and 2C).
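As a rough illustration of what the AUC measures, the sketch below computes it as the probability that a randomly chosen infected sample scores higher than a randomly chosen healthy one, with ties counted as one half. The function name `roc_auc` and the scores are hypothetical; this is the standard rank-based formulation, not necessarily the implementation used by the authors.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical signature scores (not data from the study).
infected = [2.1, 1.8, 2.5, 0.9]
healthy = [0.3, 1.0, 0.5, 0.2]

auc = roc_auc(infected, healthy)  # 15 of 16 pairs are ordered correctly
```

An AUC of 0.5 corresponds to a score that ranks cases and controls no better than chance, and 1.0 to perfect separation.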

Validation of host response genes to viral infections in multiple independent data sets
Next, we confirmed that the non-COVID-19 viral signature is conserved across viruses by validating it in several independent data sets. We calculated the non-COVID-19 viral score for a sample as the difference in geometric means of the over-expressed and under-expressed genes. First, in four independent studies consisting of 236 samples (178 viral infections, 58 HCs; Table 3), the score accurately distinguished patients with a respiratory viral infection (influenza, HRV, or RSV) from HCs (Figure 3A). Second, we investigated whether the non-COVID-19 viral signature is observed in other severe viral infections including Ebola, dengue, and SARS-CoV-1 in five independent studies (50 HCs, 54 SARS-CoV-1, 37 Ebola, 154 dengue). In each study, the non-COVID-19 viral score also distinguished patients with a viral infection from HCs with high accuracy (Figure 3B). Third, we tested whether the non-COVID-19 viral signature would also distinguish patients with COVID-19 from HCs. We calculated the non-COVID-19 viral score for each of the 62 patients with COVID-19 together with the 24 HCs using the conormalized expression data. We found that the non-COVID-19 viral score separated patients with COVID-19 from HCs with an AUC of 0.96 (Figure 3C), similar to SARS-CoV-1 (AUC = 0.98).
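The per-sample score described above, the difference in geometric means of over-expressed and under-expressed genes, can be sketched as follows. The gene names, expression values, and function name `signature_score` are hypothetical placeholders, and the sketch assumes strictly positive expression values, which the geometric mean requires.

```python
import statistics

def signature_score(expression, over_genes, under_genes):
    """Geometric mean of over-expressed genes minus that of under-expressed genes."""
    # Genes missing from the sample are skipped (an assumption of this sketch).
    over = [expression[gene] for gene in over_genes if gene in expression]
    under = [expression[gene] for gene in under_genes if gene in expression]
    return statistics.geometric_mean(over) - statistics.geometric_mean(under)

# Hypothetical expression values for one sample (not data from the study).
sample = {"UP1": 8.0, "UP2": 2.0, "DOWN1": 4.0, "DOWN2": 1.0}

score = signature_score(sample, ["UP1", "UP2"], ["DOWN1", "DOWN2"])
# Geometric means are sqrt(8 * 2) = 4 and sqrt(4 * 1) = 2, so the score is about 2.
```

A higher score indicates a sample whose expression pattern looks more like the signature; scores from cases and controls can then be compared with an ROC curve.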

Comparison of COVID-19 profile with non-COVID-19 viral infection profile
Next, we investigated similarities and differences in host response to SARS-CoV-2 and other respiratory viruses by comparing change in expression with respect to HCs across 9,818 genes that were present across all data sets. When considering the entire transcriptome, there was high correlation (r = 0.74, p < 0.001) between change in expression in response to SARS-CoV-2 or other respiratory viruses (Hedges' g ES from the COVID-19 vs HC comparison is plotted against ES from the non-COVID-19 vs HC comparison in Figure 4A). We visualized the "2,002 COVID-19 signature genes" and "635 non-COVID-19 signature genes" in the same ES scatterplot in different colors to highlight their relationships (Figure 4A and Table S2A). We observe that 7,626 of the 9,818 profiled genes (77.7%), uncolored in the middle (gray, with higher density in the center shown by contours), are not signature genes in either COVID-19 or non-COVID-19 viral infections. Given the high correlation (r = 0.74), it is not surprising that 223 genes are concordantly over-expressed (Hedges' g ES ≥1, FDR ≤0.05%) and 220 genes are concordantly under-expressed (Hedges' g ES ≤−1, FDR ≤0.05%). Of the remaining genes from the "non-COVID-19 signature", 90 genes are over-expressed and 100 genes are under-expressed in non-COVID-19; however, these had ES between −1 and 1 in the distribution of the COVID-19 ESs. Likewise, of the remaining genes from the "COVID-19 signature", 547 genes are over-expressed and 1,010 genes are under-expressed in COVID-19 but had ES between −1 and 1 in the distribution of the non-COVID-19 ESs. We found only two genes that were completely discordant, that is, oppositely regulated in COVID-19 and non-COVID-19 viral infections: aconitase 1 (ACO1) is over-expressed in COVID-19 and under-expressed in non-COVID-19 viral infections, and atlastin GTPase 3 (ATL3) is over-expressed in non-COVID-19 viral infections and under-expressed in COVID-19.
Interestingly, ACO1 is involved in iron metabolism, and heme appears to be interlinked with COVID-19 pathophysiology (Hopp et al., 2020). ATL3 is required for endoplasmic reticulum (ER) membrane junctions and may be linked to viral replication sites (Monel et al., 2019).
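The grouping of genes by their effect size in each comparison versus HCs, which yields nine groups (three possible calls in COVID-19 times three in non-COVID-19), can be sketched as below. The function names, gene labels, and values are hypothetical, and the FDR cutoff is a placeholder parameter whose scale is assumed to match however the FDR values are reported.

```python
def call_direction(es, fdr, es_cutoff=1.0, fdr_cutoff=0.05):
    """Return 'over', 'under', or 'none' for one gene in one comparison versus HCs."""
    if fdr <= fdr_cutoff:  # cutoff value and scale are assumptions of this sketch
        if es >= es_cutoff:
            return "over"
        if es <= -es_cutoff:
            return "under"
    return "none"

def gene_group(covid, non_covid):
    """Pair the calls from the two comparisons; each argument is an (ES, FDR) tuple."""
    return call_direction(*covid), call_direction(*non_covid)

# Hypothetical (ES, FDR) pairs: (COVID-19 vs HC, non-COVID-19 vs HC).
genes = {
    "geneA": ((1.6, 0.001), (1.3, 0.002)),   # concordantly over-expressed
    "geneB": ((1.2, 0.010), (0.4, 0.200)),   # COVID-19 signature only
    "geneC": ((1.1, 0.010), (-1.2, 0.010)),  # discordant
    "geneD": ((0.1, 0.800), (-0.2, 0.700)),  # in neither signature
}

groups = {name: gene_group(covid, non_covid)
          for name, (covid, non_covid) in genes.items()}
```

In this scheme a discordant gene is one whose pair is ("over", "under") or ("under", "over"), the pattern reported for ACO1 and ATL3.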
Therefore, in order to identify a statistically significant set of genes differentially expressed in patients with COVID-19 compared to those with other viral infections, we employed COCONUT to conormalize the two disease types into a single matrix for comparison of 62 patients with COVID-19 versus 652 patients with non-COVID-19 viral infection. Conormalization with COCONUT allows for pooling of data across data sets while simultaneously removing batch-to-batch technical variance in a bias-free manner (Sweeney et al., 2016b). At Hedges' g |ES| ≥1 with FDR ≤0.05%, we found 416 genes, which we refer to as the "COVID-19-specific gene signature": 114 over-expressed and 302 under-expressed in patients with COVID-19 relative to those with non-COVID-19 viral infection (Figure 4B, Tables S2A and S2B). To illustrate the gain in identification of genes to investigate and to reiterate the value of this statistical method, this set of genes from Figure 4B is highlighted in the same scatterplot as Figure 4A (Figure 4C). Unlike the "COVID-19 and non-COVID-19 viral signatures", the pathway analysis of this gene set did not identify any statistically significant GO terms, potentially indicating novel pathophysiology unique to COVID-19. This combination of genes may include those less well annotated within pathways and thus less likely to result in statistically significant assignment to a pathway. Nonetheless, top-ranked but statistically insignificant GO terms include muscle contraction, regulation of epithelial cell proliferation, and biological processes involved in lung and respiratory development for the 114 positive genes, as well as pathways related to T-cell homeostasis and T-cell differentiation for the 302 negative genes. The significance of these pathways in connection with clinical manifestation needs to be investigated further.
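The FDR filter applied alongside the effect-size cutoff can be illustrated with the Benjamini-Hochberg adjustment. This is an assumption: the text does not state which FDR procedure was used, and the function name and p values below are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical per-gene p values (not data from the study).
p_values = [0.01, 0.04, 0.03, 0.5]
fdr = benjamini_hochberg(p_values)
```

A gene would then enter a signature only if both its adjusted value and its absolute effect size pass their respective cutoffs.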

Similarities and differences in pathways between COVID-19 and non-COVID-19 viral infection
We expanded our comparison of significant pathways in response to SARS-CoV-2 versus non-COVID-19 viruses by including all pathways instead of only the 30 most significant pathways. We found that pathways for over-expressed genes are highly concordant between patients with COVID-19 and non-COVID-19 viral infections (Figure 5A), whereas pathways for under-expressed genes are discordant (Figure 5B).  , as well as gene lists from the 9 groups by quadrant in Figure 4A (Figure 5C, Table S2A). To check the dependency of GO term enrichment results on the cutoffs for selecting signature genes, we tested three additional cutoffs (less or more stringent than the chosen one) each for the COVID-19 vs HC, non-COVID-19 vs HC, and COVID-19 vs non-COVID-19 comparisons. The results for over-expressed, under-expressed, and all genes from each cutoff together with the 9 gene sets from Figure 4A show a merging and comprehensive picture of pathway analysis results (Table S3, Figure S3), allowing one to focus on pathways of interest, either commonly significant across gene sets or uniquely significant in a gene set or a combination of genes of interest.

Similarities and differences in changes in immune cell proportions between COVID-19 and non-COVID-19 viral infection
We estimated proportions of 25 immune cell types in bulk gene expression in blood samples from patients with COVID-19 or non-COVID-19 viral infections using immunoStates. In patients with COVID-19, we found that immune cells from the myeloid lineage (M1 macrophages, neutrophils, and mast cells) increased significantly (FDR ≤10%) and lymphoid cells (CD4+ and CD8+ alpha-beta T cells, B cells) decreased significantly (FDR ≤10%) during viral infection (Figure 6A, Table S4). These results are in line with recent reports demonstrating increased neutrophil and decreased T-cell counts in patients with COVID-19 (Diao et al., 2020; Liu et al., 2020; Qin et al., 2020). In patients with non-COVID-19 viral infections, we observed a significant increase in proportion for myeloid cells (M1 macrophages, CD14+ monocytes, mast cells) and a significant decrease in proportion for lymphoid cells (CD4+ and CD8+ T cells, gamma-delta T cells, B cells) (Figures 6B and S4). Indeed, when considering changes within each data set, M1 macrophages, plasmacytoid dendritic cells, CD14+ monocytes, CD4+ T cells, and total T cells changed consistently in the same direction across all viral infections including COVID-19 (Figure 6B).
We observed an overall correlation of 0.493 (p = 0.017) for change in cellular proportions in patients with COVID-19 compared to non-COVID-19 viral infections (Figure 6C, Table S4), where all but 6 cell types changed in the same direction, though not all changes were statistically significant. We again observed increased neutrophil and decreased T-cell counts in COVID-19, which is in line with a recent study that compared COVID-19 to the 2009 H1N1. Cell types that increased in COVID-19 relative to non-COVID-19 were CD56bright natural killer (NK) cells, M2 macrophages, and total NK cells. Those that decreased in non-COVID-19 relative to COVID-19 were CD56dim NK cells, memory B cells, and eosinophils. Although change in memory B cells was not statistically significant, the direction of change is expected as patients

Discussion
Understanding the pathophysiology of COVID-19 is critical to finding new treatments. Defining the portion of the host response to a novel pandemic virus that is similar to the response to currently circulating viral infections is imperative because, in the early months, treatment options are unknown and vaccines are non-existent; this knowledge can potentially inform the repurposing of drugs that have passed United States Food and Drug Administration (FDA) safety trials. Simultaneously, identifying the biology of the host response that is not similar to circulating viruses may help rank the order in which drugs are repurposed, if they do not bolster areas of the immune system succumbing to a virus for which we have no direct immune memory, or may offer novel targets for new drugs. Here, we take a host response approach using peripheral blood transcriptomics of the immune response to COVID-19 (n = 62) compared to 652 non-COVID-19 viral infections spanning 6 viruses. While the vast majority of the host immune response appears to be similar between COVID-19 and other viruses, which is valuable information under pandemic circumstances, our study highlights some key differences.
The scatterplot of the correlation of the differential expression (relative to HCs) of non-COVID-19 viral infections versus COVID-19 infections illustrates this large proportion of concordance and seemingly small amount of discordance (Figure 4). We found only two genes, ACO1 and ATL3, that were expressed in opposite directions using this method. ACO1 was over-expressed in COVID-19 versus HC and under-expressed in non-COVID-19 viral infections versus HC, whereas ATL3 was regulated in the opposite direction (Figure 4). Viral replication can occur in infected cells due to a hindrance of the function of the immune cells drawn in to kill infected cells; in addition, there are reports of SARS-CoV-1 and SARS-CoV-2 directly infecting immune cells themselves (Gu et al., 2005; Hu et al., 2012; Pontelli et al., 2020). As our data are from whole blood RNA, we cannot conclude precisely which of these mechanisms is responsible for the shifts in these genes' expression; however, prior reports suggest that both genes may be involved in viral replication and immune evasion. ACO1 is an iron-sulfur protein that regulates ferritin and transferrin. When cellular iron levels are low, the protein binds to iron-responsive elements, which represses translation of ferritin (a protein that stores iron) and simultaneously stabilizes the normally rapidly degraded transferrin receptor mRNA, allowing for translation of the receptor and more cellular uptake of iron, which is required for proliferation (Koeller et al., 1989). High levels of ferritin are also indicative of macrophage activation syndrome and have been observed in patients with COVID-19 (Ravelli, 2002; Bataille et al., 2020; Dimopoulos et al., 2020; Giamarellos-Bourboulis et al., 2020). ATL3 is a member of the integral membrane GTPases. Proper formation of ER tubules is affected by mutations in this gene. Viruses are known to target host organelles to enter a host cell and avoid destruction (Inoue and Tsai, 2013).
Lack of ATL3 results in delayed cargo exit and coat assembly for budding from the ER which is necessary for export of cytokines and chemokines in response to infection; ATL3 has been linked directly to viral replication in Zika (Monel et al., 2019), although Zika was not studied here.
The power of using COCONUT to combine heterogeneous data sets allowed for a pooled, head-to-head comparison of COVID-19 with non-COVID-19 viral infections, resulting in a 416-gene "COVID-19-specific gene signature" (Table S2B). Interestingly, the differentially expressed genes in this analysis were not enriched for any GO terms. However, gene ontology annotation is biased toward genes that are heavily annotated and studied, often referred to as the "streetlight effect", so absence of evidence does not denote evidence of absence of a coordinated differential response (Haynes et al., 2018a, 2018b; Tomczak et al., 2018). Conversely, this novel combination of genes with these particular effect sizes warrants further investigation as a potential route for novel discoveries (Damelin et al., 2017; Haynes et al., 2018b). Simply reviewing what is known of the immunological function of the top two over- and top two under-expressed genes ranked by Hedges' g ES contextualizes ACO1 and ATL3 further, with hints of a battle of host versus "novel" pathogen, never encountered by the immune system before. The impact on the function of host immune cells during SARS-CoV-1 and MERS infection is driven by their non-structural proteins and affects the normal production of cytokines compared to that of currently circulating viral infections, such as the repression of interferons (IFNs) (Hu et al., 2012; Shah et al., 2020). Recently, Blanco-Melo et al. revealed a dysregulated host response indicative of reduced innate antiviral defenses coupled with excessive cytokine production using cell lines and ferrets, correlating with two deceased patients with COVID-19 (Blanco-Melo et al., 2020), a phenomenon of novel virus escape mechanisms from host defenses, which we complement here with even larger numbers of entirely human data.
The most under-expressed gene in the "COVID-19-specific gene signature" is ZC3H13. Knocking this gene down was associated with less N6-methyladenosine (m6A) RNA methylation, an epigenetic modification commonly found in the viral RNA genomes of hepatitis C virus (HCV), Zika, dengue, yellow fever, and West Nile virus (Wen et al., 2018). Depletion of m6A methyltransferases increases HCV viral particle production (Gokhale et al., 2016), which would imply more SARS-CoV-2 viral replication. ATL3, as mentioned, is also included in the "COVID-19-specific gene signature" and is under-expressed in COVID-19. When ATL3 was knocked down, there was less Zika replication, implying that the under-expression is a host counteractive protective mechanism. The second most under-expressed gene is AMIGO1, a gene about which very little is known; however, recent studies on this family of genes (Kuja-Panula et al., 2003) suggest a cell adhesion function. Cell adhesion molecules are a key component of combatting pathogen infections, without which the host may not mount an appropriate response (Etzioni, 1996). Since the "COVID-19-specific gene signature" is derived from direct comparison of COVID-19 versus non-COVID-19 infections, under-expression of ZC3H13, AMIGO1, and ATL3 in COVID-19 equates to higher expression in non-COVID-19 infections. One possible interpretation of this under-expression of ZC3H13 and AMIGO1 in COVID-19, which could be investigated in future studies, is that this novel virus may be inhibiting their expression to escape the host responses that are otherwise functional for previously circulating viral infections.
If the under-expression of ATL3 in the "COVID-19-specific gene signature" indeed illustrates the tipping of the scales between microbe and host and, as in Zika infections, lower expression of this gene results in less viral replication, this would imply a protective mechanism rather than host immune evasion. In fact, coronaviruses bud into the ER-Golgi intermediate compartment, and in MERS, the C-terminal domain of the M protein was found to contain a trans-Golgi localization signal (Perrier et al., 2019); thus, the role of ATL3 as a way to control viral protein budding presents an exciting avenue for future work. Furthermore, the top two over-expressed genes of the "COVID-19-specific gene signature" are coiled-coil and C2 domain containing 2A (CC2D2A) and human homeostatic iron regulator or high FE2+ (HFE). CC2D2A plays a critical role in cilia formation (Veleri et al., 2014). Primary cilia, microtubule-based sensory organelles that detect mechanical and chemical stimuli, are found in almost all cells in the body (Garcia-Gonzalo and Reiter, 2012). Following T-cell receptor signaling, the ciliary trafficking machinery is used to provide spatial control of immune synapses at the interface with the antigen-presenting cell for signaling (Stephen et al., 2018). HFE is a non-classical major histocompatibility (

OPEN ACCESS
iScience 24, 101947, January 22, 2021
molecules at the cell surface (Hollerer et al., 2017). While largely responsible for presenting "endogenous" peptides, during viral infection, this class of HLA is responsible for loading of viral peptides at the ER and trafficking those to the cell surface (Hollerer et al., 2017). HFE is essential in this function as these peptides are presented to T cells or NK cells. The two genes, therefore, are both involved in effective immune signaling between virally infected cells and the host. HFE is pleiotropic in function, and it binds with the transferrin receptor, thus reducing affinity for iron-loaded transferrin and resulting in less cytoplasmic iron (Taneri et al., 2020). ACO1 is bifunctional as well: it is a key modulator of mitochondrial iron metabolism and also an essential enzyme in the Krebs cycle (Wood, 2006). Iron metabolism and ATP production are essential for the function of the cell and the proliferation of immune cells. Here, we observe over-expression of CC2D2A, HFE, and ACO1 in COVID-19 infections and lower expression in non-COVID-19 previously circulating infections. We interpret this COVID-19 over-expression of genes not intensely involved in non-COVID-19 infections as avenues for future exploration as possible counteractive measures for the novel immune evasion alluded to by the under-expression of ZC3H13 and AMIGO1 described above. All of these genes and their functions need to be molecularly investigated to determine their true role; here, we use them as an illustration of both novel immune evasion and immune defense systems. These measures and countermeasures will likely be somewhat different for each patient as they progress through the disease.
We see in this cohort gene expression indicative of a beneficial host response whereby HLA class I molecules present viral peptides to the host response for identification and destruction via overexpression of HFE and CC2D2A, carefully managed iron metabolism and energy production via HFE and ACO1. However, how much over-expression is needed in order to overcome the SARS-CoV-2 virus is not known, and not surprisingly, there are trials underway for the use of pegylated interferon alpha in patients with COVID-19 (2020). This drug is FDA approved for treatment of viral infections such as HCV (Tan et al., 2004;Nile et al., 2020) and showed promise in combination with ribavirin in patients with MERS (Omrani et al., 2014), as one of its mechanisms of action increases MHC class I function (Nile et al., 2020).
Within this signature, we also find genes commonly studied in cancer (e.g., TP53, AKT, VEGF, and CYCS). Interestingly, primary cilia house a number of oncogenic molecules including smoothened, KRAS, epidermal growth factor receptor, and platelet-derived growth factor receptor (Jenks et al., 2018), and thus, their role in the immune response to COVID-19 needs further investigation. Of the 416 COVID-19-specific genes, we also observe multiple superfamily members of ATP-binding cassette transporters, which facilitate the interaction of multiple immune cells with various classes of lipids. In macrophages and lymphocytes, this alters the plasticity of the cell, dampening the immune response to viral invasion (Hubler and Kennedy, 2016). In addition to ZC3H13, this gene set includes many other zinc finger proteins. Zinc (Zn2+) homeostasis in the cell is tightly regulated, as viruses need Zn2+ for newly synthesized viral proteins (Lazarczyk and Favre, 2008).
In place of GO terms directly derived from our ''COVID-19-specific gene signature'', Figure 5 illustrates the comparison of COVID-19 versus HC to non-COVID-19 versus HC GO terms. We found many downregulated pathways are discordant when comparing to HCs. Within these, a cluster of pathways that are high in COVID-19 and low in non-COVID-19 viral infections involve ribosome-related processes. In SARS-CoV-1 infections, it was determined that viral nsp1 disrupts ribosomal translation of host mRNA while allowing viral translation to continue (Huang et al., 2011). An opposite cluster of pathways that are high in non-COVID-19 viral infections and low in COVID-19 positively regulate cell-cell adhesion, cell activation, leukocyte activation, and immune response-activating cell surface receptor signaling, suggesting a less effective immune response in patients with COVID-19. Of particular interest was the observation that while both diseases had enriched GO terms for type-1 interferon signaling pathways, the significance of this enrichment was lower in COVID-19 ( Figure 5). The inspection of the 6 genes above mirrors these discordant pathway findings, supporting the concept of novel biology specific to COVID-19 within a largely similar response to other viruses.
Interestingly, the immune cell proportions are mostly consistent across COVID-19 and non-COVID-19 data sets. Our results are in line with several recent studies that found a high neutrophil-to-lymphocyte ratio in patients with COVID-19 (Diao et al., 2020; Lagunas-Rangel, 2020; Liu et al., 2020; Qin et al., 2020). Expansion of CD56bright NK cells is common in many viral infections, as part of recognizing and killing virally infected cells while orchestrating adaptive immune responses (Vivier et al., 2008). Comparing patients with COVID-19 to HCs shows an increase in NK cells (Figure 6A), largely driven by the CD56bright population. When compared to non-COVID-19 viral infections, the increase in NK cell (via CD56bright NK cell) proportion remains high in the COVID-19 infections. This phenomenon was also directly observed in a study using mass spectrometry to measure cell abundance over time in patients with COVID-19; among the factors most explanatory in those who recovered, the most dynamic cells included CD56dim NK cells (Sun et al., 2020).
When comparing COVID-19 to non-COVID-19 viral infections, we see that M1 macrophage proportions are similar to those of other viral diseases, but the elevated M2 response is discordant. M1 macrophages are pro-inflammatory and kill invaders, whereas M2 macrophages are considered anti-inflammatory and reparative. A large body of work in bacterial sepsis found that individuals with high M1 profiles had increased mortality, whereas those with a more evenly balanced M1/M2 were more likely to survive (Benoit et al., 2008). However, in general, monocytotropic viruses including SARS-CoV-1 have evolved mechanisms to interfere with effective macrophage polarization (Hu et al., 2012), favoring the M2 population for immune evasion. For example, virus-induced macrophage depletion is executed by viruses that carry pro-antiapoptotic proteins, which initially reduce the number of M1s to skew the population toward M2 and avoid attack, and then further suppress the production and action of type I IFNs, stunting the progression of M1 macrophage polarization (Laura C Miller, 2015). The shift we see in the proportion of M2 macrophages in COVID-19 versus non-COVID-19 viral infections indicates that this novel pathogen may be executing these immune evasion techniques with a high degree of success. We see that eosinophils and CD56dim natural killer (NK) cells are lower in COVID-19 versus non-COVID-19 infections, which was replicated in a system-level study over time using mass cytometry and Olink assays, where both cell types increased in abundance from a low level at the acute phase to a normal level in the recovery phase (Rodriguez et al., 2020). In addition, decreased B cells and increased M2 macrophages were observed in a study of 3939 patients with COVID-19 from China, and these findings offer many avenues for novel therapies.
In conclusion, we here provide bulk RNAseq profiling of peripheral blood in COVID-19 in comparison to HCs, from which we derived a signature of 2002 genes for investigating the biology and potentially the pathophysiology of this disease, the "COVID-19 signature genes". We compiled an extensive database of non-COVID-19 viral infections across many platforms, ages, diseases, and locations globally to compare to HCs using metaintegration, deriving a set of 635 genes representing the host response to known viral pathogens, the "non-COVID-19 signature genes". We then used COCONUT to conormalize all of the data and directly compare COVID-19 to non-COVID-19 viral infections, resulting in a signature of 416 genes, the "COVID-19-specific gene signature". We used all of these analyses to identify both the similarities and differences in the underlying host response. While we found that a large proportion of the host response is similar to that of other infections, we also identified key differences in individual genes, pathways, and cellularity that are suggestive of the clinical differences observed in COVID-19. The genes ACO1 and ATL3 were identified at the intersection of the gene signatures for COVID-19 versus HCs and non-COVID-19 versus HCs, and were further contextualized when considering the top-ranking genes of the novel "COVID-19-specific gene signature". This suggests that we have illuminated novel biology of the host immune response to a novel viral infection, but our findings will need to be replicated in further clinical studies.
In summary, COVID-19 gene expression is highly correlated with gene expression in known viral infections and shows similar shifts in the immune cell proportions known to play a role in viral response. It also shows discordant shifts in immune cells that are novel and consistent with other recent publications. This is key information at the onset of a pandemic, allowing us to leverage our prior and mounting knowledge of viral infection. Our computational methods allowed a head-to-head comparison of COVID-19 and non-COVID-19 viral infections, resulting in a novel 416-gene signature in which many of the genes with the largest Hedges' g ES have well-known immune functions. However, GO terms were not significant, suggesting that it is the magnitude and combination of these genes that discriminates the host response to this novel virus. The signature can be disseminated to the scientific community at large to investigate whether this novel combination of genes yields any targetable pathophysiology.

Limitations of the study
Our study has some limitations due to the design of using public data for the non-COVID-19 comparison. First, due to the limited nature of clinical studies during a pandemic, we had just 62 patients with COVID-19 compared to >650 with other viral infections, creating class imbalance in their comparison. Second, we did not investigate the effects of severity on host response, as severity data were mostly unavailable. It is possible that differences in severity between this COVID-19 cohort and the other viral cohorts were a confounder in our analysis. Third, we analyzed differential expression at single pre-set significance and effect size thresholds. Choosing different thresholds (e.g., thresholds based on 80% statistical power in each analysis) would have identified different sets of differentially expressed genes. We provide Hedges' g ES and FDR values for all genes (Table S2A) to enable re-analysis of these genes based on thresholds that others may deem more appropriate. Figure S3 is also provided to show the GO term enrichment results at varying cutoffs.
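The threshold-based re-analysis described above can be illustrated with a minimal sketch. It assumes the standard definition of Hedges' g (a pooled standardized mean difference with a small-sample correction) and a simple table of per-gene effect sizes and FDR values. The gene names, values, and cutoffs below are hypothetical placeholders and are not taken from Table S2A; the functions `hedges_g` and `filter_genes` are illustrative helpers, not part of the study's pipeline.

```python
import math
from statistics import mean, variance


def hedges_g(case, control):
    """Hedges' g: pooled standardized mean difference with small-sample correction."""
    n1, n2 = len(case), len(control)
    # Pooled variance from the two sample variances
    pooled_var = ((n1 - 1) * variance(case) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    d = (mean(case) - mean(control)) / math.sqrt(pooled_var)
    # Common approximation of the small-sample bias correction factor
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * correction


def filter_genes(rows, min_abs_es, max_fdr):
    """Return genes whose absolute effect size and FDR pass the chosen cutoffs."""
    return [r["gene"] for r in rows
            if abs(r["es"]) >= min_abs_es and r["fdr"] <= max_fdr]


# Hypothetical per-gene results (placeholder names and numbers)
rows = [
    {"gene": "GENE_A", "es": 1.40, "fdr": 0.001},
    {"gene": "GENE_B", "es": -0.90, "fdr": 0.030},
    {"gene": "GENE_C", "es": 0.50, "fdr": 0.200},
]

g_example = hedges_g([2, 4, 6], [1, 2, 3])           # toy expression values
strict = filter_genes(rows, min_abs_es=1.0, max_fdr=0.05)
lenient = filter_genes(rows, min_abs_es=0.8, max_fdr=0.05)
```

With these placeholder values, relaxing the effect size cutoff changes which genes are retained, which is the sensitivity to thresholds that the limitation describes.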

Material availability
This study did not generate any new unique reagents and/or materials.

Data and code availability
The publicly available studies can be accessed on GEO under their respective study IDs. The COVID-19 cohort is deposited in the Gene Expression Omnibus (GEO) database: GSE152641. Results were generated using the R packages COCONUT and MetaIntegrator; both methods have been published and are publicly available. The RNAseq pipeline used to process the COVID-19 cohort is described in the methods section.

Methods
All methods can be found in the accompanying Transparent Methods supplemental file.
[Figure residue: heatmap of GO term enrichment; row labels (GO terms) and color key ("Value") omitted.]

SupplementaryTable4_Immunostates.pdf, related to Figure 6.

COVID-19 samples from Hellenic Sepsis Study Cohort
A total of 76 adult patients with SARS-CoV-2 pneumonia were prospectively enrolled from April 1st to May 4th by departments participating in the Hellenic Sepsis Study Group (www.sepsis.gr). Patients were enrolled within the first 24 hours of hospital admission. The inclusion criterion was identification of a new lower respiratory tract infection due to COVID-19, defined as the presence of a new infiltrate on chest X-ray or chest computed tomography indicative of COVID-19 in a patient without any contact with any healthcare facility in the last 90 days. SARS-CoV-2 was detected by positive molecular testing of respiratory secretions. For patients who required mechanical ventilation (MV), blood sampling was performed within the first 24 hours from MV. Exclusion criteria were infection by the human immunodeficiency virus, neutropenia, and any previous intake of immunosuppressive medication (corticosteroids, anti-cytokine biologicals, and biological response modifiers). The studies were conducted under approval number 30/20 by the National Ethics Committee of Greece. Written informed consent was provided by patients or by first-degree relatives in cases where patients were unable to consent.