Pathogenesis, Symptomatology, and Transmission of SARS-CoV-2 through Analysis of Viral Genomics and Structure

The novel coronavirus SARS-CoV-2, which emerged in late 2019, has since spread around the world and infected hundreds of millions of people with coronavirus disease 2019 (COVID-19). While this viral species was unknown prior to January 2020, its similarity to other coronaviruses that infect humans has allowed for rapid insight into the mechanisms that it uses to infect human hosts, as well as the ways in which the human immune system can respond. Here, we contextualize SARS-CoV-2 among other coronaviruses and identify what is known and what can be inferred about its behavior once inside a human host. Because the genomic content of coronaviruses, which specifies the virus’s structure, is highly conserved, early genomic analysis provided a significant head start in predicting viral pathogenesis and in understanding potential differences among variants. The pathogenesis of the virus offers insights into symptomatology, transmission, and individual susceptibility. Additionally, prior research into interactions between the human immune system and coronaviruses has identified how these viruses can evade the immune system’s protective mechanisms. We also explore systems-level research into the regulatory and proteomic effects of SARS-CoV-2 infection and the immune response. Understanding the structure and behavior of the virus serves to contextualize the many facets of the COVID-19 pandemic and can influence efforts to control the virus and treat the disease.


Introduction
The current coronavirus disease 2019 (COVID-19) pandemic, caused by the Severe acute respiratory syndrome-related coronavirus 2 (SARS-CoV-2) virus, represents an acute global health crisis. Symptoms of the disease can range from mild to severe or fatal [1] and can affect a variety of organs and systems [2]. Outcomes of infection can include acute respiratory distress (ARDS) and acute lung injury, as well as damage to other organ systems [2,3]. Understanding the progression of the disease, including these diverse symptoms, depends on understanding how the virus interacts with the host. Additionally, the fundamental biology of the virus can provide insights into how it is transmitted among people, which can, in turn, inform efforts to control its spread. As a result, a thorough understanding of the pathogenesis of SARS-CoV-2 is a critical foundation on which to build an understanding of COVID-19 and the pandemic as a whole.
The rapid identification and release of the genomic sequence of the virus in January 2020 [4] provided early insight into the virus in a comparative genomic context. The viral genomic sequence clusters with known coronaviruses (order Nidovirales, family Coronaviridae, subfamily Orthocoronavirinae). Phylogenetic analysis of the coronaviruses reveals four major subclades, each corresponding to a genus: the alpha, beta, gamma, and delta coronaviruses. Among them, alpha and beta coronaviruses infect mammalian species, gamma coronaviruses infect avian species, and delta coronaviruses infect both mammalian and avian species [5]. The novel virus now known as SARS-CoV-2 was identified as a beta coronavirus belonging to the B lineage based on phylogenetic analysis of a polymerase chain reaction (PCR) amplicon fragment from five patients along with the full genomic sequence [6]. This lineage also includes the Severe acute respiratory syndrome-related coronavirus (SARS-CoV-1) that caused the 2002-2003 outbreak of Severe Acute Respiratory Syndrome (SARS) in humans [6]. (Note that these subclades are not to be confused with variants of concern within SARS-CoV-2 labeled with Greek letters; i.e., the Delta variant of SARS-CoV-2 is still a beta coronavirus.) Because viral structure and mechanisms of pathogenicity are highly conserved within the order, this phylogenetic analysis provided a basis for forming hypotheses about how the virus interacts with hosts, including which tissues, organs, and systems would be most susceptible to SARS-CoV-2 infection. Coronaviruses that infect humans (HCoV) are not common, but prior research into other HCoV such as SARS-CoV-1 and Middle East respiratory syndrome-related coronavirus (MERS-CoV), as well as other viruses infecting humans such as a variety of influenza species, established a strong foundation that accelerated the pace of SARS-CoV-2 research.
Coronaviruses are large viruses that can be identified by their distinctive "crown-like" shape ( Figure 1). Their spherical virions are made from lipid envelopes ranging from 100 to 160 nanometers in which peplomers (protruding structures) of two to three spike (S) glycoproteins are anchored, creating the crown [7,8]. These spikes, which are critical to both viral pathogenesis and to the response by the host immune response, have been visualized using cryo-electron microscopy [9]. Because they induce the human immune response, they are also the target of many proposed therapeutic agents [10,11]. Viral pathogenesis is typically broken down into three major components: entry, replication, and spread [12]. However, in order to draw a more complete picture of pathogenesis, it is also necessary to examine how infection manifests clinically, identify systems-level interactions between the virus and the human body, and consider the possible effects of variation or evolutionary change on pathogenesis and virulence. Thus, clinical medicine and traditional biology are both important pieces of the puzzle of SARS-CoV-2 presentation and pathogenesis.

Structure of Coronaviruses
Genome structure is highly conserved among coronaviruses, meaning that the relationship between the SARS-CoV-2 genome and its pathogenesis can be inferred from prior research in related viral species. The genomes of viruses in the Nidovirales order share several fundamental characteristics. They are non-segmented, which means the viral genome is a single continuous strand of RNA, and are enveloped, which means that the genome and capsid are encased by a lipid bilayer. Coronaviruses have large positive-sense RNA (ssRNA+) genomes ranging from 27 to 32 kilobases in length [13,14]. The SARS-CoV-2 genome lies in the middle of this range at 29,903 bp [14]. Genome organization is highly conserved within the order [13]. There are three major genomic regions: one containing the replicase gene, one containing the genes encoding structural proteins, and interspersed accessory genes [13] (Figure 1). The replicase gene comprises about two-thirds of the genome and consists of two open reading frames that are translated with ribosomal frameshifting [13]. This polypeptide is then translated into 16 non-structural proteins (nsp), except in gammacoronaviruses where nsp1 is absent, that form the replication machinery used to synthesize viral RNA [15]. The remaining third of the genome encodes structural proteins, including the spike (S), membrane, envelope, and nucleocapsid proteins. Additional accessory genes are sometimes present between these CoV-2 belongs, are known to bind to the CEACAM1 protein, 5-N-acetyl-9-O-acetyl neuraminic acid, and to angiotensin-converting enzyme 2 (ACE2) [18]. This recognition is driven by domains in the S1 subunit [20]. SARS-CoV-2 has a high affinity for human ACE2, which is expressed in the vascular epithelium, other epithelial cells, and cardiovascular and renal tissues [21,22], as well as many others [23]. The binding process is guided by the molecular structure of the spike protein, which is structured in three segments: an ectodomain, a transmembrane anchor, and an intracellular tail [24]. The ectodomain forms the crown-like structures on the viral membrane and contains two subdomains known as the S1 and S2 subunits [25]. The S1 (Nterminal) domain forms the head of the crown and contains the receptor binding motif, and the S2 (C-terminal) domain forms the stalk that supports the head [25]. The S1 subunit guides the binding of the virus to the host cell, and the S2 subunit guides the fusion process [24].
After the binding of the S1 subunit to an entry point, the spike protein of coronaviruses is often cleaved at the S1/S2 boundary into the S1 and S2 subunits by a host protease [20,26,27]. This proteolytic priming is important because it prepares the S protein for fusion [26,27]. The two subunits remain bound by van der Waals forces, with the S1 subunit stabilizing the S2 subunit throughout the membrane fusion process [20]. Cleavage at a second site within S2 (S2') activates S for fusion by inducing conformational changes [20]. Similar to SARS-CoV-1, SARS-CoV-2 exhibits redundancy in which host proteases can cleave the S protein [28]. Both transmembrane protease serine protease-2 (TMPRSS-2) and cathepsins B/L have been shown to mediate SARS-CoV-2 S protein proteolytic priming, and small molecule inhibition of these enzymes fully inhibited viral entry in vitro [28,29]. Other proteases known to cleave the S1/S2 boundary in coronaviruses include TMPRSS-4, trypsin, furin, cathepsins, and human airway trypsin-like protease (HAT) [29].
Unlike in SARS-CoV-1, a second cleavage site featuring a furin-like binding motif is also present near the S1/S2 boundary in SARS-CoV-2 [30]. This site is found in HCoV belonging to the A and C lineages of beta coronavirus, including MERS-CoV, but not in the other known members of the B lineage of beta coronavirus that contains SARS-CoV-1 and SARS-CoV-2 [30]. It is associated with increased virulence in other viral species [30] and may facilitate membrane fusion of SARS-CoV-2 in the absence of other proteases that prime the S1/S2 site [31]. However, given that proteases such as HAT are likely to be present in targets like the human airway, the extent to which this site has had a real-world effect on the spread of SARS-CoV-2 was initially unclear [31]. Subsequent research has supported this site as an important contributor to pathogenesis: in vitro analyses have reported that it bolsters pathogenicity specifically in cell lines derived from human airway cells (Calu3 cell line) [32,33,34] and that furin inhibitors reduced pathogenic effects in VeroE6 cells [35].
Electron microscopy suggests that in some coronaviruses, including SARS-CoV-1 and MERS-CoV, a six-helix bundle separates the two subunits in the postfusion conformation, and the unusual length of this bundle facilitates membrane fusion through the release of additional energy [5]. The viral membrane can then fuse with the endosomal membrane to release the viral genome into the host cytoplasm. Once the virus enters a host cell, the replicase gene is translated and assembled into the viral replicase complex. This complex then synthesizes the double-stranded RNA (dsRNA) genome from the genomic ssRNA(+). The dsRNA genome is transcribed and replicated to create viral mRNAs and new ssRNA(+) genomes [13,36]. From there, the virus can spread into other cells. In SARS-CoV-2, the insertion of the furin-like binding site near the S1/S2 boundary is also thought to increase cell-cell adhesion, making it possible for the viral genome to spread directly from cell to cell rather than needing to propagate the virion itself [37]. In this way, the genome of SARS-CoV-2 provides insight into the pathogenic behavior of the virus.
Evidence also suggests that SARS-CoV-2 may take advantage of the specific structure of endothelial cells to enter the circulatory system. Endothelial cells are specialized epithelial cells [38] that form a barrier between the bloodstream and surrounding tissues. The endothelium facilitates nutrient, oxygen, and cellular exchange between the blood and vascularized tissues [39]. The luminal (interior) surface of the endothelium is lined with glycocalyx, a network of both membrane-bound and soluble proteins and carbohydrates, primarily proteoglycans and glycoproteins [40,41]. The glycocalyx varies in thickness from 0.5 microns in the capillaries to 4.5 microns in the carotid arteries and forms a meshwork that localizes both endothelial-and plasma-derived signals to the inner vessel wall [40]. Heparan sulfate is the dominant proteoglycan in the glycocalyx, representing 50-90% of glycocalyx proteoglycan content [42]. The SARS-CoV-2 spike protein can bind directly to heparan sulfate, which serves in part as a scaffolding molecule to facilitate ACE2 binding and entry into endothelial cells [41]. A heparan sulfate binding site has also been identified near the ACE2 binding site on the viral receptor binding domain (RBD), and modeling has suggested that heparan sulfate binding yields an open conformation that facilitates binding to ACE2 on the cell surface [41]. Degrading or removing heparan sulfate was associated with decreased binding [41]. Heparan sulfate may also interact with the S1/S2 proteolytic cleavage site and other binding sites to promote binding affinity [43]. Notably, treatment with soluble heparan sulfate or even heparin (a commonly used anticoagulant and vasodilator that is similar in structure to heparan sulfate [44]) potently blocked spike protein binding and viral infection [41]. This finding is particularly interesting because degradation of heparan sulfate in the glycocalyx has previously been identified as an important contributor to ARDS and sepsis [45], two common and severe outcomes of COVID-19, and suggests that heparan sulfate could be a target for pharmaceutical inhibition of cell entry by SARS-CoV-2 [46,47,48,49,50]. Together, this evidence suggests that heparan sulfate can serve as an important adhesion molecule for SARS-CoV-2 cell entry. It may represent a therapeutic target but has not been pursued as much as other candidate targets [10].

Immune Evasion Strategies
Research in other HCoV provides some indication of how SARS-CoV-2 infection can proceed despite human immune defenses. Infecting the epithelium can help viruses such as SARS-CoV-1 bypass the physical barriers, such as mucus, that comprise the immune system's first line of defense [51]. Once the virus infiltrates host cells, it is adept at evading detection. CD163+ and CD68+ macrophage cells are especially crucial for the establishment of SARS-CoV-1 in the body [51]. These cells most likely serve as viral reservoirs that help shield SARS-CoV-1 from the innate immune response. According to a study on the viral dissemination of SARS-CoV-1 in Chinese macaques, viral RNA could be detected in some monocytes throughout the process of differentiation into dendritic cells [51]. This lack of active viral replication allows SARS-CoV-1 to escape the innate immune response because reduced levels of detectable viral RNA allow the virus to avoid both natural killer cells and Toll-like receptors [51]. Even during replication, SARS-CoV-1 is able to mask its dsRNA genome from detection by the immune system. Although dsRNA is a pathogen-associated molecular pattern that would typically initiate a response from the innate immune system [52], in vitro analysis of nidoviruses including SARS-CoV-1 suggests that these viruses can induce the development of double-membrane vesicles that protect the dsRNA signature from being detected by the host immune system [53]. This protective envelope can therefore insulate these coronaviruses from the innate immune system's detection mechanism [54].
HCoVs are also known to interfere with the host immune response, rather than just evade it. For example, the virulence of SARS-CoV-2 is increased by nsp1, which can suppress host gene expression by stalling mRNA translation and inducing endonucleolytic cleavage and mRNA degradation [55]. SARS-CoV-1 also evades the immune response by interfering with type I IFN induction signaling, which is a mechanism that leads to cellular resistance to viral infections. SARS-CoV-1 employs methods such as ubiquitination and degradation of RNA sensor adaptor molecules MAVS and TRAF3/6 [56]. Also, MERS-CoV downregulates antigen presentation via MHC class I and MHC class II, which leads to a reduction in T cell activation [56]. These evasion mechanisms, in turn, may facilitate systemic infection. Coronaviruses such as SARS-CoV-1 are also able to evade the humoral immune response through other mechanisms, such as inhibiting certain cytokine pathways or down-regulating antigen presentation by the cells [53].

Host Cell Susceptibility
ACE2 and TMPRSS-2 have been identified as the primary entry portal and as a critical protease, respectively, in facilitating the entry of SARS-CoV-1 and SARS-CoV-2 into a target cell [9,28,57,58,59]. This finding has led to a hypothesized role for the expression of these molecules in determining which cells, tissues, and organs are most susceptible to SARS-CoV-2 infection. ACE2 is expressed in numerous organs, such as the heart, kidney, and intestine, but it is most prominently expressed in alveolar epithelial cells; this pattern of expression is expected to contribute to the virus' association with lung pathology [21,60,61] as well as that of SARS [62]. A retrospective observational study reported indirect evidence that certain antineoplastic therapies, such as the chemotherapy drug gemcitabine, may reduce risk of SARS-CoV-2 infection in patients with cancer, possibly via decreased ACE2 expression [63]. Additionally, the addition of the furin site insertion at the S1/S2 boundary means that SARS-CoV-2 does not require TMPRSS-2 when furin, an ubiquitously expressed endoprotease [64], is present, enabling cell-cell fusion independent of TMPRSS-2 availability [65].

Clinical Presentation of COVID-19
SARS-CoV-2 pathogenesis is closely linked with the clinical presentation of the COVID-19 disease. Reports have described diverse symptom profiles associated with COVID-19, with a great deal of variability both within and between institutions and regions. Definitions for nonsevere, severe, and critical COVID-19, along with treatment recommendations, are available from the World Health Organization living guidelines [76]. A large study from Wuhan, China conducted early in the pandemic identified fever and cough as the two most common symptoms that patients reported at hospital admission [77], while a retrospective study in China described the clinical presentations of patients infected with SARS-CoV-2 as including lower respiratory tract infection with fever, dry cough, and dyspnea (shortness of breath) [78]. This study [78] noted that upper respiratory tract symptoms were less common, suggesting that the virus preferentially targets cells located in the lower respiratory tract. However, data from the New York City region [79,80] showed variable rates of fever as a presenting symptom, suggesting that symptoms may not be consistent across individuals. For example, even within New York City, one study [79] identified low oxygen saturation (<90% without the use of supplemental oxygen or ventilation support) in 20.4% of patients upon presentation, with fever being present in 30.7%, while another study [80] reported cough (79.4%), fever (77.1%), and dyspnea (56.5%) as the most common presenting symptoms; both of these studies considered only hospitalized patients. A later study reported radiographic findings such as ground-glass opacity and bilateral patchy shadowing in the lungs of many hospitalized patients, with most COVID-19 patients having lymphocytopenia, or low levels of lymphocytes (a type of white blood cell) [77]. Patients may also experience loss of smell, myalgias (muscle aches), fatigue, or headache. Gastrointestinal symptoms can also present [81], and the CDC includes nausea and vomiting, as well as congestion and runny nose, on its list of symptoms consistent with COVID-19 [1]. An analysis of an app-based survey of 500,000 individuals in the U.S. found that among those tested for SARS-CoV-2, a loss of taste or smell, fever, and a cough were significant predictors of a positive test result [82]. It is important to note that in this study, the predictive value of symptoms may be underestimated if they are not specific to COVID-19. This underestimation could occur because the outcome measured was a positive, as opposed to a negative, COVID-19 test result, meaning an association would be more easily identified for symptoms that were primarily or exclusively found with COVID-19. At the time the surveys were conducted, due to limits in U.S. testing infrastructure, respondents typically needed to have some symptoms known to be specific to COVID-19 in order to qualify for testing. Widespread testing of asymptomatic individuals may therefore provide additional insight into the range of symptoms associated with COVID-19.
Consistent with the wide range of symptoms observed and the pathogenic mechanisms described above, COVID-19 can affect a variety of systems within the body in addition to causing respiratory problems [ [89,90]. One study on autopsy samples suggested that SARS-CoV-2 may be able to enter the central nervous system via the neural-mucosal interface [91]. However, a study of 41 autopsied brains [92] found no evidence that the virus can actually infect the central nervous system. Although there was viral RNA in some brain samples, it was only found in very small amounts, and no viral protein was found. The RNA may have been in the blood vessels or blood components and not in the brain tissue itself. Instead, the neuropathological effects of COVID-19 are more likely to be caused indirectly by hypoxia, coagulopathy, or inflammatory processes rather than by infection in the brain [92]. COVID-19 has been associated with an increased incidence of large vessel stroke, particularly in patients under the age of 40 [93], and other thrombotic events including pulmonary embolism and deep vein thrombosis [94]. The mechanism behind these complications has been suggested to be related to coagulopathy, with reports indicating the presence of antiphospholipid antibodies [95] and elevated levels of ddimer and fibrinogen degradation products in deceased patients [96]. Other viral infections have been associated with coagulation defects and changes to the coagulation cascade; notably, SARS was also found to lead to disseminated intravascular coagulation and was associated with both pulmonary embolism and deep vein thrombosis [97]. The mechanism behind these insults has been suggested to be related to inflammation-induced increases in the von Willebrand factor clotting protein, leading to a pro-coagulative state [97]. Abnormal clotting (thromboinflammation or coagulopathy) has been increasingly discussed recently as a possible key mechanism in many cases of severe COVID-19, and may be associated with the high ddimer levels often observed in severe cases [98,99,100]. This excessive clotting in lung capillaries has been suggested to be related to a dysregulated activation of the complement system, part of the innate immune system [101,102].
Finally, concerns have been raised about long-term sequelae of COVID-19. Some COVID-19 patients have reported that various somatic symptoms (such as shortness of breath, fatigue, chest pain) and psychological (depression, anxiety or mild cognitive impairment) symptoms can last for months after infection [103]. Such long-term affects occur both in adults [104] and children [105]. Sustained symptoms affecting a variety of biological systems have been reported across many studies (e.g., [103,106,107]). The phenomenon of "long COVID" is not fully understood although various possible explanations have been proposed, including damage caused by immune response to infection as well as by the infection itself, in addition to negative consequences of the experience of lengthy illness and hospitalization. However, a lack of consistency among definitions used in different studies makes it difficult to develop precise definitions or identify specific symptoms associated with long-term effects of COVID-19 [108,109]. Patient and family support groups for "long haulers" have been formed online, and patient-driven efforts to collect data about post-acute COVID-19 provide valuable sources of information (e.g., [106]). The specific relationship between viral pathogenesis and these reported sequelae remains to be uncovered, however.

Pediatric Presentation
The presentation of COVID-19 infection can vary greatly among pediatric patients and, in some cases, manifests in distinct ways from COVID-19 in adults. Evidence suggests that children and adolescents tend to have mostly asymptomatic infections and that those who are symptomatic typically exhibit mild illness [110,111,112,113]. One review examined symptoms reported in 17 studies of children infected with COVID-19 during the early months of the COVID-19 epidemic in China and one study from Singapore [114]. In the more than a thousand cases described, the most common reports were for mild symptoms such as fever, dry cough, fatigue, nasal congestion and/or runny nose, while three children were reported to be asymptomatic. Severe lower respiratory infection was described in only one of the pediatric cases reviewed. Gastrointestinal symptoms such as vomiting or diarrhea were occasionally reported. Radiologic findings were not always reported in the case studies reviewed, but when they were mentioned they included bronchial thickening, ground-glass opacities, and/or inflammatory lesions [114]. Neurological symptoms have also been reported [115].
These analyses indicate that most pediatric cases of COVID-19 are not severe. Indeed, it is estimated that less than 1% of pediatric cases result in critical illness [112,116], although reporting suggests that pediatric hospitalizations may be greater with the emergence of the Delta variant of concern (VOC) [117,118,119]. Serious complications and, in relatively rare cases, deaths have occurred [120]. Of particular interest, children have occasionally experienced a serious inflammatory syndrome, multisystem inflammatory syndrome in children (MIS-C), following COVID-19 infection [121]. This syndrome is similar in some respects to Kawasaki disease, including Kawasaki disease shock syndrome [122,123,124], and is thought to be a distinct clinical manifestation of SARS-CoV-2 due to its distinct cytokine profile and the presence of burr cells in peripheral blood smears [125,126]. MIS-C has been associated with heart failure in some cases [127]. A small number of case studies have identified presentations similar to MIS-C in adults associated with SARS-CoV-2 [128,129,130,131]. However, not all cases of severe COVID-19 in children are characterizable as MIS-C. A recent study [132] described demographic and clinical variables associated with MIS-C in comparison with non-MIS-C severe acute COVID-19 in young people in the United States. Efforts to characterize long-term sequelae of SARS-CoV-2 infection in children face the same challenges as in adults, but long-term effects remain a concern in pediatric patients [105,133,134], although some early studies have suggested that they may be less of a concern than in adults [135,136,137]. Research is ongoing into the differences between the pediatric and adult immune responses to SARS-CoV-2, and future research may shed light on the factors that lead to MIS-C; it is also unknown whether the relative advantages of children against severe COVID-19 will remain in the face of current and future variants [138].

Cytokine Release Syndrome
The inflammatory response was identified early on as a potential driver of COVID-19 outcomes due to existing research in SARS and emerging research in COVID-19. While too low of an inflammatory response is a concern because it will fail to eliminate the immune threat [139], excessive pro-inflammatory cytokine activity can cascade [140] and cause cell damage, among other problems [141]. A dysregulated immune response can cause significant damage to the host [142,143,144] including pathogenesis associated with sepsis. Sepsis, which can lead to multi-organ failure and death [145,146], is traditionally associated with bacterial infections. However, sepsis associated with viral infections may be underidentified [147], and sepsis has emerged as a major concern associated with SARS-CoV-2 infection [148]. Hyperactivity of the pro-inflammatory response due to lung infection is commonly associated with acute lung injury and more rarely with the more severe manifestation, ARDS, which can arise from pneumonia, SARS, and COVID-19 [140,145]. Damage to the capillary endothelium can cause leaks that disrupt the balance between pro-inflammatory cytokines and their regulators [149], and heightened inflammation in the lungs can also serve as a source for systemic inflammation, or sepsis, and potentially multi-organ failure [145]. The shift from local to systemic inflammation is a phenomenon often referred to broadly as a cytokine storm [145] or, more precisely, as cytokine release syndrome [150].
Cytokine dysregulation is therefore a significant concern in the context of COVID-19. In addition to the known role of cytokines in ARDS and lung infection more broadly, immunohistological analysis at autopsy of deceased SARS patients revealed that ACE2-expressing cells that were infected by SARS-CoV-1 showed elevated expression of the cytokines IL-6, IL-1β, and TNF-α [151]. Similarly, the introduction of the S protein from SARS-CoV-1 to mouse macrophages was found to increase production of IL-6 and TNF-α [152]. For SARS-CoV-2 infection leading to COVID-19, early reports described a cytokine storm syndrome-like response in patients with particularly severe infections [60, 153,154]. Sepsis has been identified as a major contributor to COVID-19-related death. Among patients hospitalized with COVID-19 in Wuhan, China, 112 out of 191 (59%) developed sepsis, including all 54 of the non-survivors [78].
While IL-6 is sometimes used as a biomarker for cytokine storm activity in sepsis [145], the relationship between cytokine profiles and the risks associated with sepsis may be more complex. One study of patients with and at risk for ARDS, specifically those who were intubated for medical ventilation, found that shortly after the onset of ARDS, anti-inflammatory cytokine concentration in BALF increased relative to the concentration of pro-inflammatory cytokines [149]. The results suggest that an increase in pro-inflammatory cytokines such as IL-6 may signal the onset of ARDS, but recovery depends on an increased anti-inflammatory response [149]. However, patients with severe ARDS were excluded from this study. Another analysis of over 1,400 pneumonia patients in the United States reported that IL-6, tumor necrosis factor (TNF), and IL-10 were elevated at intake in patients who developed severe sepsis and/or ultimately died [155]. However, unlike the study analyzing pro-and anti-inflammatory cytokines in ARDS patients [149], this study reported that unbalanced pro-/anti-inflammatory cytokine profiles were rare. This discrepancy could be related to the fact that the sepsis study measured only three cytokines. Although IL-6 has traditionally been considered pro-inflammatory, its pleiotropic effects via both classical and trans-signaling allow it to play an integral role in both the inflammatory and anti-inflammatory responses [156], leading it to be associated with both healthy and pathological responses to viral threat [157]. While the cytokine levels observed in COVID-19 patients fall outside of the normal range, they are not as high as typically found in patients with ARDS [158]. Regardless of variation in the anti-inflammatory response, prior work has therefore made it clear that pulmonary infection and injury are associated with systemic inflammation and with sepsis. Inflammation has received significant interest both in regards to the pathology of COVID-19 as well as potential avenues for treatment, as the relationship between the cytokine storm and the pathophysiology of COVID-19 has led to the suggestion that a number of immunomodulatory pharmaceutical interventions could hold therapeutic value for the treatment of COVID-19 [10,159].

Insights from Systems Biology
Systems biology provides a cross-disciplinary analytical paradigm through which the host response to an infection can be analyzed. This field integrates the "omics" fields (genomics, transcriptomics, proteomics, metabolomics, etc.) using bioinformatics and other computational approaches. Over the last decade, systems biology approaches have been used widely to study the pathogenesis of diverse types of life-threatening acute and chronic infectious diseases [160]. Omics-based studies have also provided meaningful information regarding host immune responses and surrogate protein markers in several viral, bacterial and protozoan infections [161]. Though the complex pathogenesis and clinical manifestations of SARS-CoV-2 infection are not yet fully understood, omics technologies offer the opportunity for discovery-driven analysis of biological changes associated with SARS-CoV-2 infection.

Transcriptomics
Through transcriptomic analysis, the effect of a viral infection on gene expression can be assessed. Transcriptomic analyses, whether in vivo or in situ, can potentially reveal insights into viral pathogenesis by elucidating the host response to the virus. For example, infection by some viruses, including by the coronaviruses SARS-CoV-2, SARS-CoV-1, and MERS-CoV, is associated with the upregulation of ACE2 in human embryonic kidney cells and human airway epithelial cells [60]. This finding suggests that SARS-CoV-2 facilitates the positive regulation of its own transmission between host cells [60]. The host immune response also likely plays a key role in mediating infection-associated pathologies. Therefore, transcriptomics is one critical tool for characterizing the host response in order to gain insight into viral pathogenesis. For this reason, the application of omics technologies to the process of characterizing the host response is expected to provide novel insights into how hosts respond to SARS-CoV-2 infection and how these changes might influence COVID-19 outcomes.
Several studies have examined the cellular response to SARS-CoV-2 in vitro in comparison to other viruses. One study [162] compared the transcriptional responses of three human cell lines to SARS-CoV-2 and to other respiratory viruses, including MERS-CoV, SARS-CoV-1, Human parainfluenza virus 3, Respiratory syncytial virus, and Influenza A virus. The transcriptional response differed between the SARS-CoV-1 infected cells and the cells infected by other viruses, with changes in differential expression specific to each infection type. Where SARS-CoV-2 was able to replicate efficiently, differential expression analysis revealed that the transcriptional response was significantly different from the response to all of the other viruses tested. A unique pro-inflammatory cytokine signature associated with SARS-CoV-2 was present in cells exposed to both high and low doses of the virus, with the cytokines IL-6 and IL1RA uniquely elevated in response to SARS-CoV-2 relative to other viruses. However, one cell line showed significant IFN-I or IFN-III expression when exposed to high, but not low, doses of SARS-CoV-2, suggesting that IFN induction is dependent on the extent of exposure. These results suggest that SARS-CoV-2 induces a limited antiviral state with low IFN-I or IFN-III expression and a moderate IFN-stimulated gene response, in contrast to other viruses. Other respiratory viruses have been found to encode antagonists to the IFN response [163,164], including SARS-CoV-1 [165] and MERS-CoV [166].
The analysis of SARS-CoV-2 suggested that this transcriptional state was specific to cells expressing ACE2, as it was not observed in cells lacking expression of this protein except with ACE2 supplementation and at very high (10-fold increase) level of SARS-CoV-2 exposure [162]. In another study, direct stimulation with inflammatory cytokines such as type I interferons (e.g., IFNβ) was also associated with the upregulation of ACE2 in human bronchial epithelial cells, with treated groups showing four-fold higher ACE2 expression than control groups at 18 hours post-treatment [167]. This hypothesis was further supported by studies showing that several nsp in SARS-CoV-2 suppress interferon activity [168] and that the SARS-CoV-2 ORF3b gene suppresses IFNB1 promoter activity (IFN-I induction) more efficiently than the SARS-CoV-1 ORF3b gene [169]. Taken together, these findings suggest that a unique cytokine profile is associated with the response to the SARS-CoV-2 virus, and that this response differs depending on the magnitude of exposure.
Susceptibility and IFN induction may also vary by cell type. Using poly(A) bulk RNA-seq to analyzed dynamic transcriptional responses to SARS-CoV-2 and SARS-CoV-1 revealed negligible susceptibility of cells from the H1299 line (< 0.08 viral read percentage of total reads) compared to those from the Caco-2 and Calu-3 lines (>10% of viral reads) [170]. This finding suggests that the risk of infection varies among cell types, and that cell type could influence which hosts are more or less susceptible. Based on visual inspection of microscopy images alongside transcriptional profiling, the authors also showed distinct responses among the host cell lines evaluated [170]. In contrast to Caco-2, Calu-3 cells infected with SARS-CoV-2 showed signs of impaired growth and cell death at 24 hours post infection, as well as moderate IFN induction with a strong up-regulation of IFN-stimulated genes. Interestingly, the results were similar to those reported in Calu-3 cells exposed to much higher levels of SARS-CoV-2 [162], as described above. This finding suggests that IFN induction in Calu-3 cells is not dependent on the level of exposure, in contrast to A549-ACE2 cells. The discrepancy could be explained by the observations that Calu-3 cells are highly susceptible to SARS-CoV-2 and show rapid viral replication [29], whereas A549 cells are incompatible with SARS-CoV-2 infection [171]. This discrepancy raises the concern that in vitro models may vary in their similarity to the human response, underscoring the importance of follow-up studies in additional models.
As a result, transcriptional analysis of patient tissue is an important application of omics technology to understanding COVID-19. Several studies have collected blood samples from COVID-19 patients and analyzed them using RNA-Seq [172,173,174,175,176,177]. Analyzing gene expression in the blood is valuable to understanding host-pathogen interactions because of the potential to identify alterations associated with the immune response and to gain insights into inflammation, among other potential insights [172]. One study compared gene expression in 39 COVID-19 inpatients admitted with community-acquired pneumonia to that of control donors using whole blood cell transcriptomes [172]. They also evaluated the effect of mild versus severe disease. A greater number of differentially expressed genes were found in severe patients compared to controls than in mild patients compared to controls. They also identified that the transcriptional profiles clustered into five groups and that the groups could not be explained by disease severity. Most severe cases fell into two clusters associated with increased inflammation and granulocyte and neutrophil activation. The presence of these clusters suggests the possibility that personalized medicine could be useful in the treatment of COVID-19 [172]. Longitudinal analysis of granulocytes from patients with mild versus severe COVID-19 revealed that granulocyte activation-associated factors differentiated the disease states, with greater numbers of differentially expressed genes early in disease course [172]. This study therefore revealed distinct patterns associated with COVID-19 and identified genes and pathways associated with each cluster.
Many other studies have also identified transcriptomic signatures associated with the immune response and inflammation. Other studies have profiled the transcriptome of BALF [174] and the nasopharynx [178]. One study used single-cell transcriptomics techniques to investigate cell types including brain and choroid plexus cells compared to healthy controls and controls with influenza; among other signals of neuroinflammation, this study reported cortical T cells only in COVID-19 patients [179]. Transcriptomic analysis can thus provide insight into the pathogenesis of SARS-CoV-2 and may also be useful in identifying candidate therapeutics [172].

Proteomics
Proteomics analysis offers an opportunity to characterize the response to a pathogen at a level above transcriptomics. Especially early on, this primarily involved evaluating the effect of the virus on cell lines. One early proteomics study investigated changes associated with in vitro SARS-CoV-2 infection using Caco-2 cells [180]. This study reported that SARS-CoV-2 induced alterations in multiple vital physiological pathways, including translation, splicing, carbon metabolism and nucleic acid metabolism in the host cells. Another area of interest is whether SARS-CoV-2 is likely to induce similar changes to other HCoV. For example, because of the high level of sequence homology between SARS-CoV-2 and SARS-CoV-1, it has been hypothesized that sera from convalescent SARS-CoV-1 patients might show some efficacy in cross-neutralizing SARS-CoV-2-S-driven entry [28]. However, despite the high level of sequence homology, certain protein structures might be immunologically distinct, which would be likely to prohibit effective cross-neutralization across different SARS species [181]. Consequently, proteomic analyses of SARS-CoV-1 might also provide some essential information regarding the new pathogen [182,183].
Proteomics research has been able to get ahead of the timeline for development of omics-level big data sets specific to SARS-CoV-2 by adopting a comparative bioinformatics approach. Data hubs such as UniProt [184], NCBI Genome Database [185], The Immune Epitope Database and Analysis Resource [186], and The Virus Pathogen Resource [187] contain a wealth of data from studies in other viruses and even HCoV. Such databases facilitate the systems-level reconstruction of protein-protein interaction networks, providing opportunities to generate hypotheses about the mechanism of action of SARS-CoV-2 and identify potential drug targets.
In an initial study [188], 26 of the 29 SARS-CoV-2 proteins were cloned and expressed in HEK293T kidney cells, allowing for the identification of 332 high-confidence human proteins interacting with them. Notably, this study suggested that SARS-CoV-2 interacts with innate immunity pathways. Ranking pathogens by the similarity between their interactomes and that of SARS-CoV-2 suggested West Nile virus, Mycobacterium tuberculosis, and human papillomavirus infections as the top three hits. The fact that the host-pathogen interactome of the bacterium Mycobacterium tuberculosis was found to be similar to that of SARS-CoV-2 suggests that changes related to lung pathology might comprise a significant contributor to these expression profiles. Additionally, it was suggested that the envelope protein, E, could disrupt host bromodomain-containing proteins, i.e., BRD2 and BRD4, that bind to histones, and the spike protein could likely intervene in viral fusion by modulating the GOLGA7-ZDHHC5 acyltransferase complex to increase palmitoylation, which is a post-translational modification that affects how proteins interact with membranes [189].
An example of an application of this in silico approach comes from another study [190], which used patient-derived peripheral blood mononuclear cells to identify 251 host proteins targeted by SARS-CoV-2. This study also reported that more than 200 host proteins were disrupted following infection. In particular, a network analysis showed that nsp9 and nsp10 interacted with NF-Kappa-B-Repressing Factor, which encodes a transcriptional repressor that mediates repression of genes responsive to Nuclear Factor kappa-light-chain-enhancer of activated Bcells. These genes are important to pro-, and potentially also anti-, inflammatory signaling [191]. This finding could explain the exacerbation of the immune response that shapes the pathology and the high cytokine levels characteristic of COVID-19, possibly due to the chemotaxis of neutrophils mediated by IL-8 and IL-6. Finally, it was suggested [192] that the E protein of both SARS-CoV-1 and SARS-CoV-2 has a conserved Bcl-2 Homology 3-like motif, which could inhibit anti-apoptosis proteins, e.g., BCL2, and trigger the apoptosis of T cells. Several compounds are known to disrupt the host-pathogen protein interactome, largely through the inhibition of host proteins. Therefore, this research identifies candidate targets for intervention and suggests that drugs modulating protein-level interactions between virus and host could be relevant to treating COVID-19.
As with other approaches, analyzing the patterns found in infected versus healthy human subjects is also important. COVID-19 infection has been associated with quantitative changes in transcripts, proteins, metabolites, and lipids in patient blood samples [193]. One longitudinal study [194] compared COVID-19 patients to symptomatic controls who were PCR-negative for SARS-CoV-2. The longitudinal nature of this study allowed it to account for differences in the scale of inter-versus intraindividual changes. At the time of first sampling, common functions of proteins upregulated in COVID-19 patients relative to controls were related to immune system mediation, coagulation, lipid homeostasis, and protease inhibition. They compared these data to the patient-specific timepoints associated with the highest levels of SARS-CoV-2 antibodies and found that the actin-binding protein gelsolin, which is involved in recovery from disease, showed the steepest decline between those two time points. Immunoglobulins comprised the only proteins that were significantly different between the COVID-19 and control patients at both of these timepoints. The most significantly downregulated proteins between these time points were related to inflammation, while the most significantly upregulated proteins were immunoglobulins. Proteins related to coagulation also increased between the two timepoints. The selection of a symptomatic control cohort rather than healthy comparisons also suggests that the results are more likely to highlight the response to SARS-CoV-2 and COVID-19 specifically, rather than to disease more broadly. This study also compared the disease course in patients who ultimately survived to those who died and found that ITIH4, a protein associated with the inflammatory response to trauma, may be a biomarker useful to identifying patients at risk of death. Thus, these results indicate the value of studying patients in a longitudinal manner over the disease course. By revealing which genes are perturbed during SARS-CoV-2 infection, proteomicsbased analyses can thus provide novel insights into host-virus interaction and serve to generate new avenues of investigation for therapeutics.

Viral Virulence
Like that of SARS-CoV-1, the entry of SARS-CoV-2 into host cells is mediated by interactions between the viral spike glycoprotein, S, and human ACE2 (hACE2) [20, 28,195,196,197,198,199,200]. Differences in how the S proteins of the two viruses interact with hACE2 could partially account for the increased transmissibility of SARS-CoV-2. Studies have reported conflicting binding constants for the S-hACE2 interaction, though they have agreed that the SARS-CoV-2 S protein binds with equal, if not greater, affinity than the SARS-CoV-1 S protein does [9,20,198]. The C-terminal domain of the SARS-CoV-2 S protein in particular was identified as the key region of the virus that interacts with hACE2, and the crystal structure of the C-terminal domain of the SARS-CoV-2 S protein in complex with hACE2 reveals stronger interaction and a higher affinity for receptor binding than that of SARS-CoV-1 [199]. Among the 14 key binding residues identified in the SARS-CoV-1 S protein, eight are conserved in SARS-CoV-2, and the remaining six are semi-conservatively substituted, potentially explaining variation in binding affinity [20,198]. Studies of crystal structure have shown that the RBD of the SARS-CoV-2 S protein, like that of other coronaviruses, undergoes stochastic hinge-like movement that flips it from a "closed" conformation, in which key binding residues are hidden at the interface between protomers, to an "open" one [9,20]. Spike proteins cleaved at the furin-like binding site are substantially more likely to take an open conformation (66%) than those that are uncleaved (17%) [201]. Because the RBD plays such a critical role in viral entry, blocking its interaction with ACE2 could represent a promising therapeutic approach. Nevertheless, despite the high structural homology between the SARS-CoV-2 RBD and that of SARS-CoV-1, monoclonal antibodies targeting SARS-CoV-1 RBD failed to bind to SARS-CoV-2-RBD [9]. However, in early research, sera from convalescent SARS patients were found to inhibit SARS-CoV-2 viral entry in vitro, albeit with lower efficiency than it inhibited SARS-CoV-1 [28].
Comparative genomic analysis reveals that several regions of the coronavirus genome are likely critical to virulence. The S1 domain of the spike protein, which contains the receptor binding motif, evolves more rapidly than the S2 domain [18,19]. However, even within the S1 domain, some regions are more conserved than others, with the receptors in S1's N-terminal domain (S1-NTD) evolving more rapidly than those in its C-terminal domain (S1-CTD) [19]. Both S1-NTD and S1-CTD are involved in receptor binding and can function as RBDs to bind proteins and sugars [18], but RBDs in the S1-NTD typically bind to sugars, while those in the S1-CTD recognize protein receptors [5]. Viral receptors show higher affinity with protein receptors than sugar receptors [5], which suggests that positive selection on or relaxed conservation of the S1-NTD might reduce the risk of a deleterious mutation that would prevent binding. The SARS-CoV-2 S protein also contains an RRAR furin recognition site at the S1/S2 junction [9,20], setting it apart from both bat coronavirus RaTG13, with which it shares 96% genome sequence identity, and SARS-CoV-1 [202]. Such furin cleavage sites are commonly found in highly virulent influenza viruses [203,204]. The furin recognition site at the S1/S2 junction is likely to increase pathogenicity via destabilization of the spike protein during fusion to ACE2 and the facilitation of cell-cell adhesion [9,20,37,201,203,204]. These factors may influence the virulence of SARS-CoV-2 relative to other beta coronaviruses. Additionally, a major concern has been the emergence of SARS-CoV-2 variants with increased virulence. The extent to which evolution within SARS-CoV-2 may affect pathogenesis is reviewed below.

Molecular Signatures, Transmission, and Variants of Concern
Genetic variation in SARS-CoV-2 has been used to elucidate patterns over time and space. Many mutations are neutral in their effect and can be used to trace transmission patterns. Such signatures within SARS-CoV-2 have provided insights during outbreak investigations [205,206,207]. Similar mutations observed in several patients may indicate that the patients belong to the same transmission group. The tracking of SARS-CoV-2 mutations is recognized as an essential tool for controlling future outbreaks and tracing the path of the spread of SARS-CoV-2. In the first months of the pandemic in early 2020, early genomic surveillance efforts in Guangdong, China revealed that local transmission rates were low and that most cases arising in the province were imported [208]. Since then, efforts have varied widely among countries: for example, the U.K. has coordinated a national database of viral genomes [209], but efforts to collect this type of data in the United States have been more limited [210]. Studies have applied phylogenetic analyses of viral genomes to determine the source of local COVID-19 outbreaks in Connecticut (USA), [211], the New York City area (USA) [212], and Iceland [213]. There has been an ongoing effort to collect SARS-CoV-2 genomes throughout the COVID-19 outbreak, and as of summer 2021, millions of genome sequences have been collected from patients. The sequencing data can be found at GISAID [214], NCBI [215], and COVID-19 data portal [216].
Ongoing evolution can be observed in genomic data collected through molecular surveillance efforts. In some cases, mutations can produce functional changes that can impact pathogenesis. One early example is the spike protein mutation D614G, which appeared in March 2020 and became dominant worldwide by the end of May 2020 [217,218]. This variant was associated with increased infectivity and increased viral load, but not with more severe disease outcomes [217,219]. This increased virulence is likely achieved by altering the conformation of the S1 domain to facilitate binding to ACE2 [219]. Similarly, the N439K mutation within the RBD of the spike protein is likely associated with increased transmissibility and enhanced binding affinity for hACE2, although it is also not thought to affect disease outcomes [220]. In contrast, a mutation in ORF8 that was identified in Singapore in the early months of 2020 was associated with cases of COVID-19 that were less likely to require treatment with supplemental oxygen [221], and a deletion surrounding the furin site insertion at the S1/S2 boundary has been identified only rarely in clinical settings [222], suggesting that these mutations may disadvantage viral pathogenesis in human hosts. Thus, mutations have been associated with both virological and clinical differences in pathogenesis.
Several VOCs have also been identified and designated through molecular surveillance efforts [223]. The Alpha variant (lineage B.1.1.7) was first observed in the U.K. in October 2020 before it quickly spread around the world [224]. Other variants meriting further investigation have also been identified, including the Beta variant (B.1.351 lineage) first identified in South Africa and the Gamma variant (P.1 lineage) initially associated with outbreaks in Brazil. These lineages share independently acquired mutations that may affect pathogenicity [225,226,227,228,229]. For example, they are all associated with a greater binding affinity for hACE2 than that of the wildtype variant [227,230,231], but they were not found to have more efficient cell entry than the wildtype virus [232]. A fourth VOC, the Delta variant (B.1.617.2 and AY.1, AY.2, and AY.3 lineages), was identified in India in late 2020 [233]. Some of the mutations associated with this lineage may alter fusogenicity and enhance furin cleavage, among other effects associated with increased pathogenicity [234]. The changes in these VOC demonstrate how ongoing evolution in SARS-CoV-2 can drive changes in how the virus interacts with host cells.

Quantifying Viral Presence
Assessing whether a virus is present in a sample is a more complex task than it initially seems. Many diagnostic tests rely on real-time polymerase chain reaction (RT-PCR) to test for the presence versus absence of a virus [235]. They may report the cycle threshold (Ct) indicating the number of doubling cycles required for the target (in this case, SARS-CoV-2) to become detectable. A lower Ct therefore corresponds to a higher viral load. The Ct that corresponds to a positive can vary widely, but is often around 35. This information is sufficient to answer many questions, since an amplicon must be present in order to be duplicated in RT-PCR. For example, if a patient is presenting with COVID-19 symptoms, a positive RT-PCR test can confirm the diagnosis.
However, RT-PCR analysis alone cannot provide the information needed to determine whether a virus is present at sufficient levels to be infectious [236]. Some studies have therefore taken the additional step of cultivating samples in vitro in order to observe whether cells become infected with SARS-CoV-2. One study collected upper respiratory tract samples from COVID-19 patients, analyzed them with RT-PCR to determine the cycle threshold, and then attempted to cultivate the SARS-CoV-2 virus in VeroE6 cells [236]. This study found that out of 246 samples, less than half (103) produced a positive culture. Moreover, at a Ct of 35, only 5 out of 60 samples grew in vitro. Therefore, the RT-PCR-confirmed presence of SARS-CoV-2 in a sample does not necessarily indicate that the virus is present at a high enough concentration to grow and/or spread.

Mechanisms of Transmission
When a human host is infected with a virus and is contagious, person-to-person viral transmission can occur through several possible mechanisms. When a contagious individual sneezes, coughs, or exhales, they produce respiratory droplets that can contain a large number of viral particles [237]. Viral particles can enter the body of a new host when they then come in contact with the oral, nasal, eye, or other mucus membranes [237]. The primary terms typically used to discuss the transmission of viruses via respiratory droplets are droplet, aerosol, and contact transmission [238]. The distinction between droplet and aerosol transmission is typically anchored on whether a particle containing the virus is larger or smaller than 5 micrometers (μm) [239,240]. Droplet transmission typically refers to contact with large droplets that fall quickly to the ground at close range, such as breathing in droplets produced by a sneeze [237,239]. Aerosol transmission typically refers to much smaller particles (less than 5 μm) produced by sneezing, coughing, or exhaling [237,238] that can remain suspended over a longer period of time and potentially to be moved by air currents [237]. It is also possible that viral particles deposited on surfaces via large respiratory droplets could later be aerosolized [237]. The transmission of viral particles that have settled on a surface is typically referred to as contact or fomite transmission [237,241]. Any respiratory droplets that settle on a surface could contribute to fomite transmission [237]. Droplet and contact transmission are both well-accepted modes of transmission for many viruses associated with common human illnesses, including influenza and rhinovirus [237].
The extent to which aerosol transmission contributes to the spread of respiratory viruses is more widely debated. In influenza A, for example, viral particles can be detected in aerosols produced by infected individuals, but it is not clear to what extent these particles drive the spread of influenza A infection [237,238,242,243,244]. Regardless of its role in the spread of influenza A, however, aerosol transmission likely played a role in outbreaks such as the 1918 Spanish Influenza (H1N1) and 2009 "swine flu" (pH1N1) [244]. All three of these mechanisms have been identified as possible contributors to the transmission of HCoVs [237], including the highly pathogenic coronaviruses SARS-CoV-1 and MERS-CoV [245,246]. Transmission of SARS-CoV-1 is thought to proceed primarily through droplet transmission, but aerosol transmission is also considered possible [237, 247,248], and fomite transmission may have also played an important role in some outbreaks [249]. Similarly, the primary mechanism of MERS transmission is thought to be droplets because inter-individual transmission appears to be associated with close interpersonal contact (e.g., household or healthcare settings), but aerosolized particles of the MERS virus have been reported to persist much more robustly than influenza A under a range of environmental conditions [250,251]. However, few of these analyses have sought to grow positive samples in culture and thus to confirm their potential to infect new hosts.
Contact, droplet, and aerosol transmission are therefore all worth evaluating when considering possible modes of transmission for a respiratory virus like SARS-CoV-2. The stability of the SARS-CoV-2 virus both in aerosols and on a variety of surfaces was found to be similar to that of SARS-CoV-1 [252]. Droplet-based and contact transmission were initially put forward as the greatest concern for the spread of SARS-CoV-2 [253], with droplet transmission considered the dominant mechanism driving the spread of the virus [254] because the risk of fomite transmission under real-world conditions is likely to be substantially lower than the conditions used for experimental analyses [255]. The COVID-19 pandemic has, however, exposed significant discrepancies in how terms pertaining to airborne viral particles are interpreted in different contexts [239]. The 5-μm distinction between "droplets" and "aerosols" is typical in the biological literature but is likely an artifact of historical science rather than a meaningful boundary in biology or physics [240]. Additionally, various ambient conditions such as air flow can influence how particles of different sizes fall or spread [239]. Despite initial skepticism about airborne transmission of SARS-CoV-2 through small particles [240], evidence now suggests that small particles can contribute to SARS-CoV-2 transmission [252, 256,257,258]. For example, one early study detected SARS-CoV-2 viral particles in air samples taken from hospitals treating COVID-19 patients, although the infectivity of these samples was not assessed [259]. Subsequently, other studies have been successful in growing SARS-CoV-2 in culture with samples taken from the air [260,261] while others have not [262,263] (see [264] for a systematic review of available findings as of July 2020). The fact that viable SARS-CoV-2 may exist in aerosolized particles calls into question whether some axioms of COVID-19 prevention, such as 2-meter social distancing, are sufficient [240,260,265].

Symptoms and Viral Spread
Other aspects of pathogenesis are also important to understanding how the virus spreads, especially the relationship between symptoms, viral shedding, and contagiousness. Symptoms associated with reported cases of COVID-19 range from mild to severe [1], but some individuals who contract COVID-19 remain asymptomatic throughout the duration of the illness [266]. The incubation period, or the time period between exposure and the onset of symptoms, has been estimated at five to eight days, with means of 4.91 (95% confidence interval (CI) 4.35-5.69) and 7.54 (95% CI 6.76-8.56) reported in two different Asian cities and a median of 5 (IQR 1 to 6) reported in a small number of patients in a Beijing hospital [267,268].
However, the exact relationship between contagiousness and viral shedding remains unclear. Estimates suggest that viral shedding can, in some cases, begin as early as 12.3 days (95% CI 5.9-17.0) before the onset of symptoms, although this was found to be very rare, with less than 0.1% of transmission events occurring 7 or more days before symptom onset [269]. Transmissibility appeared to peak around the onset of symptoms (95% CI -0.9 -0.9 days), and only 44% (95% CI 30-57%) of transmission events were estimated to occur from presymptomatic contacts [269]. A peak in viral load corresponding to the onset of symptoms was also confirmed by another study [236]. As these trends became apparent, concerns arose due to the potential for individuals who did not yet show symptoms to transmit the virus [270]. Recovered individuals may also be able to transmit the virus after their symptoms cease. A study of the communicable period based on twenty-four individuals who tested positive for SARS-CoV-2 prior to or without developing symptoms estimated that individuals may be contagious for one to twenty-one days, but they note that this estimate may be low [266]. In an early study, viral nucleic acids were reported to remain at observable levels in the respiratory specimens of recovering hospitalized COVID-19 patients for a median of 20 days and with a maximum observed duration through 37 days, when data collection for the study ceased [78].
As more estimates of the duration of viral shedding were released, they converged around approximately three weeks from first positive PCR test and/or onset of symptoms (which, if present, are usually identified within three days of the initial PCR test). For example, in some studies, viral shedding was reported for up to 28 days following symptom onset [271] and for one to 24 days from first positive PCR test, with a median of 12 days [67]. On the other hand, almost 70% of patients were reported to still have symptoms at the time that viral shedding ceased, although all symptoms reduced in prevalence between onset and cessation of viral shedding [272]. The median time that elapsed between the onset of symptoms and cessation of viral RNA shedding was 23 days and between first positive PCR test and cessation of viral shedding was 17 days [272]. The fact that this study reported symptom onset to predate the first positive PCR test by an average of three days, however, suggests that there may be some methodological differences between it and related studies. Furthermore, an analysis of residents of a nursing home with a known SARS-CoV-2 case measured similar viral load in residents who were asymptomatic regardless of whether they later developed symptoms, and the load in the asymptomatic residents was comparable to that of residents who displayed either typical or atypical symptoms [273]. Taken together, these results suggest that the presence or absence of symptoms are not reliable predictors of viral shedding or of SARS-CoV-2 status (e.g, [274]). However, it should be noted that viral shedding is not necessarily a robust indicator of contagiousness. The risk of spreading the infection was low after ten days from the onset of symptoms, as viral load in sputum was found to be unlikely to pose a significant risk based on efforts to culture samples in vitro [271]. The relationship between symptoms, detectable levels of the virus, and risk of viral spread is therefore complex.
The extent to which asymptomatic or presymptomatic individuals are able to transmit SARS-CoV-2 has been a question of high scientific and community interest. Early reports (February and March 2020) described transmission from presymptomatic SARS-CoV-2-positive individuals to close family contacts [275,276]. One of these reports [276] also included a description of an individual who tested positive for SARS-CoV-2 but never developed symptoms. Later analyses also sought to estimate the proportion of infections that could be traced back to a presymptomatic or asymptomatic individual (e.g., [277]). Estimates of the proportion of individuals with asymptomatic infections have varied widely. The proportion of asymptomatic individuals on board the Diamond Princess cruise ship, which was the site of an early COVID-19 outbreak, was estimated at 17.9% [278]. In contrast, a model using the prevalence of antibodies among residents of Wuhan, China estimated a much higher rate of asymptomatic cases, at approximately 7 in 8, or 87.5% [279]. An analysis of the populations of care homes in London found that, among the residents (median age 85), the rate of asymptomatic infection was 43.8%, and among the caretakers (median age 47), the rate was 49.1% [280]. The duration of viral shedding may also be longer in individuals with asymptomatic cases of COVID-19 compared to those who do show symptoms [281]. As a result, the potential for individuals who do not know they have COVID-19 to spread the virus raises significant concerns. In Singapore and Tianjin, two cities studied to estimate incubation period, an estimated 40-50% and 60-80% of cases, respectively, were considered to be caused by contact with asymptomatic individuals [267]. An analysis of viral spread in the Italian town of Vo', which was the site of an early COVID-19 outbreak, revealed that 42.5% of cases were asymptomatic and that the rate was similar across age groups [282]. The argument was thus made that the town's lockdown was imperative for controlling the spread of COVID-19 because it isolated asymptomatic individuals. While more models are likely to emerge to better explore the effect of asymptomatic individuals on SARS-CoV-2 transmission, these results suggest that strategies for identifying and containing asymptomatic but contagious individuals are important for managing community spread.

Estimating the Fatality Rate
Estimating the occurrence of asymptomatic and mild COVID-19 cases is important to identifying the mortality rate associated with COVID-19. The mortality rate of greatest interest would be the total number of fatalities as a fraction of the total number of people infected. One commonly reported metric is the case fatality rate (CFR), which compares the number of COVID-19 related deaths to the number of confirmed or suspected cases. However, in locations without universal testing protocols, it is impossible to identify all infected individuals because so many asymptomatic or mild cases go undetected. Therefore, a more informative metric is the infection fatality rate (IFR), which compares the known deaths to the estimated number of cases. It thus requires the same numerator as CFR, but divides by an approximation of the total number of cases rather than only the observed/suspected cases. IFR varies regionally, with some locations observed to have IFRs as low as 0.17% while others are as high as 1.7% [283].
Estimates of CFR at the national and continental level and IFR at the continent level is maintained by the Centre for Evidence-Based Medicine [284]. Several meta-analyses have also sought to estimate IFR at the global scale. These estimates have varied; one peer-reviewed study aggregated data from 24 other studies and estimated IFR at 0.68% (95% CI 0.53%-0.82%), but a preprint that aggregated data from 139 countries calculated a global IFR of 1.04% (95% CI 0.77%-1.38%) when false negatives were considered in the model [283,285]. A similar prevalence estimate was identified through a repeated cross-sectional serosurvey conducted in New York City that estimated the IFR as 0.97% [286]. Examination of serosurvey-based estimates of IFR identified convergence on a global IFR estimate of 0.60% (95% CI 0.42%-0.77%) [283]. All of these studies note that IFR varies widely by location, and it is also expected to vary with demographic and health-related variables such as age, sex, prevalence of comorbidities, and access to healthcare and testing [287]. Estimates of infection rates are becoming more feasible as more data becomes available for modeling and will be bolstered as serological testing becomes more common and more widely available. However, this research may be complicated due to the emergence of variants over time, as well as the varying availability and acceptance of vaccines in different communities and locations.

Dynamics of Transmission
Disease spread dynamics can be estimated using R0, the basic reproduction number, and Rt, the effective reproduction number. Accurate estimates of both are crucial to understanding the dynamics of infection and to predicting the effects of different interventions. R0 is the average number of new (secondary) infections caused by one infected person, assuming a wholly susceptible population [288], and is one of the most important epidemiological parameters [289]. A simple mechanistic model used to describe infectious disease dynamics is a susceptibleinfected-recovered compartmental model [290,291]. In this model, individuals move through three states: susceptible, infected, and recovered; two parameters, and , specify the rate at which the infectious recover, and the infection transmission rate, respectively, and R0 is estimated as the ratio of and [289,292]. A pathogen can invade a susceptible population only if R0 > 1 [289,293]. The spread of an infectious disease at a particular time t can be quantified by Rt, the effective reproduction number, which assumes that part of the population has already recovered (and thus gained immunity to reinfection) or that mitigating interventions have been put into place. For example, if only a fraction St of the population is still susceptible, Rt = St x R0. When Rt is greater than 1, an epidemic grows (i.e., the proportion of the population that is infectious increases); when Rt is less than 1, the proportion of the population that is infectious decreases. R0 and Rt can be estimated directly from epidemiological data or inferred using susceptible-infected-recovered-type models. To capture the dynamics of SARS-CoV-2 accurately, the addition of a fourth compartment, i.e. a susceptible-exposed-infectiousrecovered model, may be appropriate because such models account for the relative lengths of incubation and infectious periods [294].
Original estimates of R0 for COVID-19 lie in the range R0=1.4-6.5 [295,296,297]. Variation in R0 is expected between different populations, and the estimated values of R0 discussed below are for specific populations in specific environments. The different estimates of R0 should not necessarily be interpreted as a range of estimates of the same underlying parameter. In one study of international cases, the predicted value was R0=1.7 [298]. In China (both Hubei province and nationwide), the value was predicted to lie in the range R0=2.0-3.6 [295,299,300].
Another estimate based on a cruise ship where an outbreak occurred predicted R0=2. 28 [301]. Susceptible-exposed-infectious-recovered model-derived estimates of R0 range from 2.0 -6.5 in China [302,303,304,305] to R0=4.8 in France [306]. Using the same model as for the French population, a study estimated R0=2.6 in South Korea [306], which is consistent with other studies [307]. From a meta-analysis of studies estimating R0, [296] the median R0 was estimated to be 2.79 (IQR 1.16) based on twelve studies published between January 1 and February 7, 2020.
Inference of the effective reproduction number can provide insight into how populations respond to an infection and the effectiveness of interventions. In China, Rt was predicted to lie in the range 1.6-2.6 in January 2020, before travel restrictions [308]. Rt decreased from 2.35 one week before travel restrictions were imposed (January 23, 2020), to 1.05 one week after. Using their model, the authors also estimated the probability of new outbreaks occurring. Assuming individual-level variation in transmission comparable to that of MERS or SARS, the probability of a single individual exporting the virus and causing a large outbreak is 17-25%, and assuming variation like that of SARS and transmission patterns like those observed for COVID-19 in Wuhan, the probability of a large outbreak occurring after ≥4 infections exist at a new location is greater than 50%. An independent study came to similar conclusions, finding Rt=2.38 in the twoweek period before January 23 with a decrease to Rt = 1.34 (using data from January 24 to February 3) or Rt=0.98 (using data from January 24 to February 8) [297]. In South Korea, Rt was inferred for February through March 2020 in two cities, Daegu (the center of the outbreak) and Seoul [307]. Metro data was also analyzed to estimate the effects of social distancing measures. Rt decreased in Daegu from around 3 to <1 over the period that social distancing measures were introduced. In Seoul, Rt decreased slightly, but remained close to 1 (and larger than Rt in Daegu). These findings indicate that social distancing measures appeared to be effective in containing the infection in Daegu, but in Seoul, Rt remained above 1, meaning secondary outbreaks remained possible. The study also shows the importance of regionspecific analysis: the large decline in case load nationwide was mainly due to the Daegu region and could mask persistence of the epidemic in other regions, such as Seoul and Gyeonggi-do. In Iran, estimates of Rt declined from 4.86 in the first week to 2.1 by the fourth week after the first cases were reported [309]. In Europe, analysis of 11 countries inferred the dynamics of Rt over a time range from the beginning of the outbreak until March 28, 2020, by which point most countries had implemented major interventions (such as stay-at-home orders, public gathering bans, and school closures) [310]. Across all countries, the mean Rt before interventions began was estimated as 3.87; Rt varied considerably, from below 3 in Norway to above 4.5 in Spain. After interventions, Rt decreased by an average of 64% across all countries, with mean Rt=1.43. The lowest predicted value was 0.97 for Norway and the highest was 2.64 for Sweden, which could be related to the fact that Sweden did not implement social distancing measures on the same scale as other countries. The study concludes that while large changes in Rt are observed, it is too early to tell whether the interventions put into place are sufficient to decrease Rt below 1.
Evolution within SARS-CoV-2 has also driven changes in the estimated reproduction number for different populations at different times. As of June 2021, the reproduction number had increased globally relative to 2020, and increased transmissibility over the wildtype variant was observed Data and empirically determined biological mechanisms inform models, while models can be used to try to understand data and systems of interest or to make predictions about possible future dynamics, such as the estimation of capacity needs [316] or the comparison of predicted outcomes among prevention and control strategies [317,318]. Many current efforts to model Rt have also led to tools that assist the visualization of estimates in real time or over recent intervals [319,320]. These are valuable resources, yet it is also important to note that the estimates arise from models containing many assumptions and are dependent on the quality of the data they use, which varies widely by region.

Conclusions
The novel coronavirus SARS-CoV-2 is the third HCoV to emerge in the 21st century, and research into previous HCoVs has provided a strong foundation for characterizing the pathogenesis and transmission of SARS-CoV-2. Critical insights into how the virus interacts with human cells have been gained from previous research into HCoVs and other viral infections.
With the emergence of three devastating HCoV over the past twenty years, emergent viruses are likely to represent an ongoing threat. Contextualizing SARS-CoV-2 alongside other viruses serves not only to provide insights that can be immediately useful for combating this virus itself but may also prove valuable in the face of future viral threats.
Host-pathogen interactions provide a basis not only for understanding COVID-19, but also for developing a response. As with other HCoVs, the immune response to SARS-CoV-2 is likely driven by detection of its spike protein, which allows it to enter cells through ACE2. Epithelial cells have also emerged as the major cellular target of the virus, contextualizing the respiratory and gastrointestinal symptoms that are frequently observed in COVID-19. Many of the mechanisms that facilitate the pathogenesis of SARS-CoV-2 are currently under consideration as possible targets for the treatment or prevention of COVID-19 [10,11]. Research in other viruses also provides a foundation for understanding the transmission of SARS-CoV-2 among people and can therefore inform efforts to control the virus's spread. Airborne forms of transmission (droplet and aerosol transmission) have emerged as the primary modes by which the virus spreads to new hosts. Asymptomatic transmission was also a concern in the SARS outbreak of 2002-03 and, as in the current pandemic, presented challenges for estimating rates of infection [321]. These insights are important for developing a public health response, such as the CDC's shift in its recommendations surrounding masking [322].
Even with the background obtained from research in SARS and MERS, COVID-19 has revealed itself to be a complex and difficult-to-characterize disease that has many possible presentations that vary with age. Variability in presentation, including cases with no respiratory symptoms or with no symptoms altogether, were also reported during the SARS epidemic at the beginning of the 21st century [321]. The variability of both which symptoms present and their severity have presented challenges for public health agencies seeking to provide clear recommendations regarding which symptoms indicate SARS-CoV-2 infection and should prompt isolation. Asymptomatic cases add complexity both to efforts to estimate statistics such as R0 and Rt, which are critical to understanding the transmission of the virus, and IFR, which is an important component of understanding its impact on a given population. The development of diagnostic technologies over the course of the pandemic has facilitated more accurate identification, including of asymptomatic cases [235]. As more cases have been diagnosed, the health conditions and patient characteristics associated with more severe infection have also become more clear, although there are likely to be significant sociocultural elements that also influence these outcomes [323]. While many efforts have focused on adults, and especially older adults because of the susceptibility of this demographic, additional research is needed to understand the presentation of COVID-19 and MIS-C in pediatric patients. As more information is uncovered about the pathogenesis of HCoV and SARS-CoV-2 specifically, the diverse symptomatology of COVID-19 has and likely will continue to conform with the ever-broadening understanding of how SARS-CoV-2 functions within a human host.
While the SARS-CoV-2 virus is very similar to other HCoV in several ways, including in its genomic structure and the structure of the virus itself, there are also some differences that may account for differences in the COVID-19 pandemic compared to the SARS and MERS epidemics of the past two decades. The R0 of SARS-CoV-2 has been estimated to be similar to SARS-CoV-1 but much higher than that of MERS-CoV [324], although a higher R0 has been estimated for some VOC. While the structures of the viruses are very similar, evolution among these species may account for differences in their transmissibility and virulence. For example, the acquisition of a furin cleavage site the S1/S2 boundary within the SARS-CoV-2 S protein may be associated with increased virulence. Additionally, concerns have been raised about the accumulation of mutations within the SARS-CoV-2 species itself, and whether these could influence virulence [325]. These novel variants may be resistant to vaccines and antibody treatments such as Bamlanivimab that were designed based on the wildtype spike protein [10, 326]. As a consequence of reliance on targeting the SARS-CoV-2 spike protein for many therapeutic and prophylactic strategies, increased surveillance is required to rapidly identify and prevent the spread of novel SARS-CoV-2 variants with alterations to the spike protein. The coming of age of genomic technologies has made these types of analyses feasible, and genomics research characterizing changes in SARS-CoV-2 along with temporal and spatial movement is likely to provide additional insights into whether within-species evolution influences the effect of the virus on the human host. Additionally, the rapid development of sequencing technologies over the past decade has made it possible to rapidly characterize the host response to the virus. For example, proteomics analysis of patient-derived cells revealed candidate genes whose regulation is altered by SARS-CoV-2 infection, suggesting possible approaches for pharmaceutical invention and providing insight into which systems are likely to be disrupted in COVID-19 [190]. As more patient data becomes available, the biotechnological advances of the 2000s are expected to allow for more rapid identification of potential drug targets than was feasible during the SARS, or even MERS, pandemic.
Thus, the COVID-19 crisis continues to evolve, but the insights acquired over the past 20 years of HCoV research have provided a solid foundation for understanding the SARS-CoV-2 virus and the disease it causes. As the scientific community continues to respond to COVID-19 and to elucidate more of the relationships between pathogenesis, transmission, host regulatory responses, and symptomatology, this understanding will no doubt continue to evolve and to reveal additional connections among virology, pathogenesis, and health. This review represents a collaboration between scientists from diverse backgrounds to contextualize this virus at the union of many different biological disciplines [327]. At present, understanding the SARS-CoV-2 virus and its pathogenesis is critical to a holistic understanding of the COVID-19 pandemic. In the future, interdisciplinary work on SARS-CoV-2 and COVID-19 may guide a response to a new viral threat.

Acknowledgements
We thank Nick DeVito for assistance with the Evidence-Based Medicine Data Lab COVID-19 TrialsTracker data. We thank Yael Evelyn Marshall who contributed writing (original draft) as well as reviewing and editing of pieces of the text but who did not formally approve the manuscript, as well as Ronnie Russell, who contributed text to and helped develop the structure of the manuscript early in the writing process and Matthias Fax who helped with writing and editing text related to diagnostics. We are also very grateful to James Fraser for suggestions about successes and limitations in the area of computational screening for drug repurposing.
We are grateful to the following contributors for reviewing pieces of the text: Nadia Danilova, James Eberwine and Ipsita Krishnan.