The rapid, massive infection of the scientific literature and authors by COVID-19

(1) Departments of Medicine, of Epidemiology and Population Health, of Biomedical Data Science, and of Statistics, and Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, California, USA (2) Meta-Research Innovation Center Berlin (METRIC-B), QUEST, Berlin Institute of Health, Berlin, Germany (3) SciTech Strategies, Inc., Albuquerque, New Mexico, USA (4) Research Intelligence, Elsevier B.V., Amsterdam, the Netherlands

Contributions: JPAI had the original idea and wrote the first draft of the paper. JB analyzed the data. All authors interpreted the data, contributed to writing the paper, and approved the final version. JPAI is guarantor.
The acute crisis of COVID-19 has led to a major effort by the scientific community to generate evidence about the new coronavirus and its pandemic. Here, we dissect the way that COVID-19 has spread like a rapid, widespread infection in the scientific literature and among researchers, as more and more papers and authors have focused on this exciting, timely topic. We aim to understand which scientific areas and which types of scientists have been most mobilized by the pandemic, and we discuss the implications of this rapid "covidization" of the research enterprise.

METHODS
We used a copy of the Scopus database 1 and limited the analysis to publications indexed (loaded) in Scopus in 2020 only, and with a publication year of 2020 or greater. To evaluate publication dates by month, we used the publication month and year where available. When the publication month was not available, or when the publication date exceeded the indexing date, we used the indexing date. This accounts, for example, for cases where an article is published today, but the official journal issue is due later. Our evaluation targets the date at which publications became available to the public rather than official publication dates.
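The date rule described above can be sketched as a small function. This is a minimal illustration, not the code used for the analysis: the function name, its parameters, and the reading that either condition (missing month, or publication date later than indexing date) triggers the fallback are assumptions.

```python
from datetime import date
from typing import Optional, Tuple


def availability_month(pub_year: int,
                       pub_month: Optional[int],
                       indexed_on: date) -> Tuple[int, int]:
    """Return the (year, month) at which a record is treated as publicly available.

    Hypothetical helper: names and signature are illustrative only.
    """
    indexed = (indexed_on.year, indexed_on.month)
    # No publication month on record: fall back to the indexing date.
    if pub_month is None:
        return indexed
    # Official issue date lies after the indexing date (e.g. article online
    # now, journal issue due later): use the indexing date instead.
    if (pub_year, pub_month) > indexed:
        return indexed
    # Otherwise trust the recorded publication month and year.
    return (pub_year, pub_month)
```

For instance, a record with an official issue date of January 2021 that was indexed in November 2020 would be counted under November 2020.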
We further focused on the 2,759,916 authors who have a Scopus-indexed publication in the first 11 months of 2020 and who have also authored in their entire career at least 5 Scopus-indexed papers classified as articles, reviews or conference papers. This allows exclusion of authors with limited presence in the scientific literature as well as some author IDs that may represent split fragments of the publication record of some more prolific authors.
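The inclusion criteria for authors can be expressed as a simple filter. The sketch below assumes a hypothetical per-author record (the class and field names are not from Scopus) and only encodes the two conditions stated above.

```python
from dataclasses import dataclass
from typing import List

# Document types that count toward the career threshold, as stated in the text.
ELIGIBLE_TYPES = {"article", "review", "conference paper"}
MIN_CAREER_PAPERS = 5


@dataclass
class AuthorRecord:
    """Hypothetical author record; field names are illustrative assumptions."""
    author_id: str
    career_doc_types: List[str]   # document type of every career item
    published_in_window: bool     # has an indexed item in the first 11 months of 2020


def is_included(author: AuthorRecord) -> bool:
    """True if the author meets both inclusion criteria."""
    eligible = sum(1 for t in author.career_doc_types if t in ELIGIBLE_TYPES)
    return author.published_in_window and eligible >= MIN_CAREER_PAPERS
```

The career threshold is what screens out most short, fragmentary author IDs, since a split fragment rarely accumulates five eligible papers.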

Field classification
All authors were assigned to their most common field and subfield discipline of their career. We used the Science Metrix classification of science, which is a standard mapping of all science into 21 main fields and 174 subfield disciplines. 2,3
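Assigning an author to the most common field of their career amounts to taking the mode of the field labels of their papers. The following is a minimal sketch under assumptions: the function name is hypothetical, and the alphabetical tie-break is an arbitrary choice, since the text does not say how ties are resolved.

```python
from collections import Counter
from typing import Iterable, Optional


def main_category(labels: Iterable[str]) -> Optional[str]:
    """Return the most frequent label among an author's papers.

    Ties are broken alphabetically (an assumption made for determinism).
    Returns None when the author has no labelled papers.
    """
    counts = Counter(labels)
    if not counts:
        return None
    top = max(counts.values())
    return min(name for name, n in counts.items() if n == top)
```

The same function can be applied once with field labels and once with subfield labels to obtain both levels of the classification.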

Influential scientists
We also examine how COVID-19 has affected the publication portfolio of researchers whose work has the largest citation impact in the literature. On the one hand, these scientists are already well established and thus may have less need or interest to venture into a new field. On the other hand, these scientists are also more productive and competitive, therefore they may be faster in moving into a rapidly emerging, new important frontier. We used the career-long statistics calculated with the Scopus database of November 1, 2020, using the code as provided with the supplemental data recently published for the most cited authors across science. [4][5][6] Each author has been assigned to a main field and main subfield based on the largest proportion of publications across fields and analysis is restricted to the top 2% authors per Science Metrix subfield. We have developed a composite citation indicator 4,5 and accordingly 140,885 scientists can be classified as being in the top 2% of their main subfield discipline based on the citations that their work received in 2019. Of those, 118,916 were active and had published at least 1 paper also in 2020.

Topics of prominence
In order to visualize the growth and spread of the COVID-19 scientific literature across scientific fields and over time, we used a circle of scientific fields that has been previously developed 7 and which places the 333 Scopus journal categories sequentially around the perimeter of a circle. There are 27 high-level categories that are placed first and ordered in a manner that emerges naturally from a meta-analysis of the layouts of other science maps created using multiple databases and methods. 8 Each of the 27 categories is assigned a separate color. The remaining 306 lower-level journal categories are then ordered within the corresponding high-level categories using factor analyses based on citation patterns. Each of the 333 journal categories thus has a fixed position on the perimeter of the circle.
The full Scopus citation graph of well over 50 million articles and 1 billion citation links was used to cluster articles into over 90,000 topics using established methods. 9 Each topic is assigned a position within the circle based on triangulation of the positions of its constituent papers, each of which takes on the positional characteristics of its journal category. Topics are colored by their dominant journal category and area-sized proportionally based on the number of objects (e.g., papers, authors) being counted for the particular analysis. This circle of science and topic visualization are used in Elsevier's SciVal tool. For the display of authors per topic, we have assigned authors to one topic by taking the topic with the highest proportion of publications per author.
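To illustrate how a topic can inherit a position from its constituent papers, the sketch below places journal categories evenly on a unit circle and positions a topic at the mean of its papers' coordinates. Both the even spacing and the use of a simple centroid for "triangulation" are assumptions made for illustration; the actual layout procedure is described in the cited references.

```python
import math
from typing import List, Tuple


def category_xy(index: int, n_categories: int = 333) -> Tuple[float, float]:
    """Position of a journal category on the unit circle (even spacing assumed)."""
    angle = 2.0 * math.pi * index / n_categories
    return (math.cos(angle), math.sin(angle))


def topic_position(paper_category_indices: List[int],
                   n_categories: int = 333) -> Tuple[float, float]:
    """Centroid of the papers in a topic, each placed at its journal category."""
    if not paper_category_indices:
        raise ValueError("a topic needs at least one paper")
    points = [category_xy(i, n_categories) for i in paper_category_indices]
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    return (x, y)
```

Under this reading, a topic whose papers all sit in one journal category lies on the perimeter, while a topic drawing on distant categories is pulled toward the interior of the circle.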

Prolific authors and authors with high citation impact of their COVID-19 publication record
We also mapped the most prolific authors of the published COVID-19 corpus and the authors whose COVID-19 publications to date had the highest citation impact.
For productivity, we ranked the authors by decreasing number of COVID-19 published items. We show detailed data on extremely prolific authors with 30 or more COVID-19 published items to date.
Citation impact was assessed with the previously proposed citation indicator 4-6 that combines information on 6 indices: total citations, Hirsch h-index, Schreiber hm-index, citations to single-authored papers, citations to first- or single-authored papers, and citations to first-, single- or last-authored papers. This avoids focusing simply on a single traditional metric such as citations, where it is expected that the authors of the earliest highly-cited papers would practically monopolize the top of the list, even if they had published a single paper and they were co-authors among many other authors. Self-citations are excluded from all calculations. 5,6 We present descriptive data on the institution, country and two most common scientific subfields (per Science Metrix classification) for the top-300 authors in that list.
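One way to combine six indices into a single score is to log-transform each index, scale it by the largest value observed across authors, and sum the six scaled terms. The sketch below follows that scheme as an assumption; this section does not spell out the formula, so the exact definition should be taken from the cited references. The dictionary keys are hypothetical labels for the six indices.

```python
import math
from typing import Dict

# Hypothetical keys: total citations, h-index, hm-index, citations to
# single-authored, first/single-authored, and first/single/last-authored papers.
INDICES = ("nc", "h", "hm", "ncs", "ncsf", "ncsfl")


def composite_scores(authors: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Sum of log(1 + index) / max log(1 + index) over the six indices."""
    maxima = {
        key: max(math.log1p(a[key]) for a in authors.values())
        for key in INDICES
    }
    scores = {}
    for author_id, a in authors.items():
        # Skip an index if nobody scores above zero on it (avoids 0/0).
        scores[author_id] = sum(
            math.log1p(a[key]) / maxima[key]
            for key in INDICES
            if maxima[key] > 0
        )
    return scores
```

With a score of this kind, a middle author on one very highly cited paper scores near the maximum on total citations only, whereas an author with several first- or last-authored papers accumulates credit across all six terms.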
We avoid comparisons based on statistical tests, as the analyses presented here were intended to be descriptive and exploratory.

RESULTS
Among the 2,759,916 authors who have published anything that is Scopus-indexed in the first 11 months of 2020 and who have also authored in their entire career at least 5 Scopus-indexed papers that are classified as articles, reviews or conference papers, by the end of November 2020, 144,403 of these authors (5.2%) had at least one published and indexed COVID-19 paper.

Scientific fields and subfields
Among the 2,759,916 authors, at the field level the highest "infection" rates with COVID-19 publications were seen in authors whose main field in their career had been Public Health and in Clinical Medicine: 11.3% (6,388/56,516) and 11.1% (92,570/833,060) of authors in these two fields, respectively, were "infected" by the end of 1 0 fields. The lowest percentage was seen in the field of Physics & Astronomy (0.7%), from which even 1,779 authors had their work "infected" by COVID-19. At the subfield discipline level, the highest "infection" rate of authors was seen (Table 1) in Emergency and Critical Care Medicine (26.3%). However, "infection" rates were higher than 10% (i.e. at least one in ten authors in that field had authored something on COVID-19) in 32 subfield disciplines and higher than 5% (at least one in twenty authors) in 71 subfield disciplines. Almost all (173/174) subfields (except for Automobile Design & Engineering) had some authors publishing on COVID-19. Supplementary Table 1 gives detailed data for COVID-19 "infection rates" of authors across all subfield disciplines.
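The "infection" rates are plain proportions, and the reported percentages can be reproduced from the counts given in the text. The helper name below is hypothetical; the counts are those reported above.

```python
def infection_rate(infected: int, total: int) -> float:
    """Percentage of authors with at least one COVID-19 paper, to one decimal."""
    return round(100.0 * infected / total, 1)


# Counts as reported in the text.
overall = infection_rate(144_403, 2_759_916)        # all eligible authors
public_health = infection_rate(6_388, 56_516)       # Public Health
clinical_medicine = infection_rate(92_570, 833_060) # Clinical Medicine

print(overall, public_health, clinical_medicine)    # 5.2 11.3 11.1
```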
27% of the authors published their COVID-19 research primarily in a subfield discipline that was not among the top 3 subfield disciplines where they had published most commonly during their career. Sometimes the fields of expertise of authors seemed remote from COVID-19, e.g. an expert on solar cells publishing on COVID-19 in healthcare personnel. Even experts specializing in their past work on very remote disciplines such as fisheries, ornithology, entomology or architecture had published on COVID-19.

Influential scientists and COVID-19 publications
Influential scientists were even more likely to be "infected" with COVID-19 (Supplementary Table 2). Among the 118,916 influential scientists active in publishing in 2020, 15,803 (13.3%) had been "infected" by COVID-19 in their publications in the first 11 months of 2020. The "infection" rate was the highest in the fields of Clinical Medicine (27.7%) and Public Health (26.8%). Among subfield disciplines, the highest "infection" rate of such active, influential authors was seen ( Medicine (58.1%), Allergy (50.2%) and Virology (48.0%). However, "infection" rates were higher than 10% (i.e. at least one in ten authors in that field had authored something on COVID-19) in 83 of 174 subfield disciplines across science and higher than 5% (at least one in twenty authors) in 116 subfield disciplines. Figure 1 shows the growth and spread of COVID-19 papers, authors of COVID-19 papers, and high-impact authors of COVID-19 papers (those who belong to the top-2% of impact, as discussed previously) across scientific topics. As shown, there is a strong response of the literature and of the scientific workforce in some specific thematic areas, but there is also increasing and substantial involvement of scientists and respective publications, even in remote topics.

Productivity for COVID-19 publications
A total of 1,560 author IDs in Scopus had 10 or more Scopus-indexed COVID-19 published items. Setting a threshold of at least 15, 20, 25, and 30 items, the number of such extremely prolific author IDs was 483, 216, 107, and 67, respectively. Table 2 shows the 65 authors with the highest productivity (30 or more COVID-19 published items indexed in Scopus; 2 authors had their papers split in two ID profiles each, which we merged). Of these 65 authors, 3 were BMJ news journalists, one was an anonymous Lancet editorial column, and one was an audio interview editor at the New England Journal of Medicine. Among the remaining 60 scientists, the most common countries were Italy (n=10), China (n=9), USA (n=8), Hong Kong (n=6), India (n=5), and UK (n=5).
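Counting prolific authors at several thresholds, with split ID profiles merged, can be sketched as follows. The function names, the mapping used to merge profiles, and the order of operations (merge first, then count) are assumptions for illustration, and the sample IDs in the usage note are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable


def merge_split_ids(items_per_id: Dict[str, int],
                    same_author: Dict[str, str]) -> Dict[str, int]:
    """Add up item counts of ID profiles that belong to the same person.

    `same_author` maps a duplicate ID to the ID it should be merged into.
    """
    merged: Dict[str, int] = defaultdict(int)
    for author_id, n_items in items_per_id.items():
        merged[same_author.get(author_id, author_id)] += n_items
    return dict(merged)


def count_at_or_above(items_per_author: Dict[str, int],
                      thresholds: Iterable[int]) -> Dict[int, int]:
    """Number of authors with at least t items, for each threshold t."""
    return {
        t: sum(1 for n in items_per_author.values() if n >= t)
        for t in thresholds
    }
```

For example, with invented counts `{"id1": 31, "id2": 12, "id2b": 20, "id3": 9}` and `{"id2b": "id2"}` as the merge map, two authors reach the 30-item threshold after merging, whereas only one ID does before merging.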

Authors with highest citation impact for COVID-19 publications
Supplementary Table 3

DISCUSSION
Approximately 2.8% of the scientific literature published in the first 11 months of 2020 and more than 4% of all scientists publishing in that period were "infected" in their published work by COVID-19. The relative proportion of COVID-19 papers increased rapidly over time. The most influential scientists across science were even more commonly engaged with COVID-19 research. Roughly one in seven active, influential scientists quickly added or adjusted their publishing portfolio to include COVID-19. Scientists in some scientific fields were highly engaged with COVID-19 work, with rates exceeding 1 in 10 for scientists publishing in Clinical Medicine and Public Health, and exceeding 1 in 4 when the most influential scientists working in these fields were considered. Some subfields have even more massive involvement of scientists in COVID-19 work. However, almost every single subfield had some scientists publishing on COVID-19. The spread of COVID-19 interests across the map of science was rapid and extensive.
Our data probably even underestimate the explosive growth of COVID-19-related work, since some papers are published but not yet indexed, while some others have been released only as preprints (a popular method of disseminating information in the COVID-19 era) 10,11 and most COVID-19 preprints appear in medRxiv, 12 a repository not yet covered by Scopus. More than 100,000 COVID-19 papers were probably published in 2020. Undoubtedly many more papers will continue to be published in 2021 and beyond. Therefore, while 4.5% of the publishing scientific community and 13.3% of the most influential scientists had already authored COVID-19 publications at the time of our analysis, these proportions may become much larger in the future.
Many authors had published an astonishingly large number of COVID-19 items, and 65 had published 30 or more in such a short time. Given delays in indexing, these numbers may underestimate the hyperprolific productivity. The concentration of hyperprolific authors in countries like China, Hong Kong, and Italy may be related to the early outbreak of the pandemic in these countries, as well as prevalent co-authorship practices in these countries. Importantly, meritorious productivity versus sloppiness is difficult to disentangle without examining each case in depth.
We also addressed the citation impact of authors for their COVID-19 work. The top ranks included many journalists and editors who publish many news stories and editorials in their highly visible general medical and science journals. This news/editorial function may be helpful. These published items may be readily used for citations, as they are often published well in advance of the scientific work to which they refer. However, the quality, standards and validity of rapidly deployed non-peer-reviewed items are unknown. Flashy news, media, and editorializing may be prominent during the pandemic. [13][14][15][16] It is unknown whether non-peer-reviewed news stories and in-house editorials in major journals help against the "infodemic" or sometimes make things worse. Excluding journalists and editors of prestigious journals, the key countries of the authors with the highest composite citation indicator tended to be similar to the countries of the most prolific authors. A few subfields accounted for the lion's share of the authors with the highest composite citation indicator.
The rapid response of the scientific community to the COVID-19 crisis is largely a welcome phenomenon. Many scientists quickly focused their attention on an urgent situation and an entirely new pathogen and disease. This demonstrates that the scientific community has sufficient flexibility to shift attention rapidly to major issues. Much was swiftly learned about COVID-19. On the other hand, the quality of the published work was not assessed in our analysis, but several evaluations raise concerns about many of the COVID-19 publications being of low quality. [17][18][19] Massive productivity has been described in the pre-COVID era as affecting researchers across many fields 20 and may also be a feature of COVID-19 research. Extreme productivity would be worrisome if it sacrifices quality.
The spread of COVID-19 publications to topics and authors traditionally working beyond key relevant disciplines further testifies to the great attractiveness of COVID-19 as a field of investigation. The favorable aspect of this expansion is the ability to bring in specialists with expertise in diverse fields, thus fostering interdisciplinarity in a multidimensional crisis. However, if many scientists have ventured to work and publish in areas where they lack fundamental expertise, their contributions may be problematic or outright erroneous.
Furthermore, there has been a rapid mobilization of funding into COVID-19 research, with some areas, e.g. vaccine development, earmarked for urgent work. This may have worked as an additional attractor of scientists to this rapidly expanding field. However, urgency does not guarantee good quality and robustness. Much of the produced publication record may not be very informative and some may be fundamentally flawed. Flaws go beyond retractions, which account for <0.1% of published COVID-19 work. 21,22
Certain limitations should be discussed. First, current Scopus data have high precision and recall (98.1% and 94.4%, respectively), 1 but some authors may be split in two or more records and some ID records may include papers from two or more authors. These errors may affect single authors but are unlikely to affect the overall picture obtained in these analyses. Second, field and subfield classification follows a well-known established method, though published items are not precisely categorizable in scientific fields. Third, data on citation impact of COVID-19 authors are too early to appraise with confidence, and the ranking of specific scientists is highly tenuous and can quickly change with relatively small changes in citation counts. The bigger picture of author characteristics rather than specific names should be the focus of these data. Fourth, since many COVID-19 accepted papers and preprints are not yet indexed in Scopus, fields with slower publication and indexing may be relatively under-represented in the analyses.
As the pandemic matures, the science of COVID-19 should also mature. Important remaining questions can be raised about the extent and duration of this "covidization" of research. Will scientists continue to flock from different disciplines into COVID-19 research? What consequences might this have for other areas of important investigation: could non-COVID-19 topics be unfairly neglected? Is the response proportional to the magnitude of the crisis? What is the validity and utility of all these publications? Tracking both the pandemic and the scientific response to the pandemic will be useful to make decisions about planning for the growth, reallocation of interest, and old-versus-new priorities for science.