Training deep learning algorithms with weakly labeled pneumonia chest X-ray data for COVID-19 detection

The novel Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has caused a pandemic that has so far resulted in over 2.7 million infections and over 190,000 deaths, and these numbers continue to grow. Respiratory disorders in COVID-19 commonly present as viral pneumonia-like opacities in chest X-ray (CXR) images, which are used as an adjunct to the reverse transcription-polymerase chain reaction (RT-PCR) test for confirming the disease and evaluating its progression. The surge places high demand on medical services, including radiology expertise. However, there is insufficient training data for developing image-based automated decision support tools that could alleviate the radiological burden. We address this insufficiency by expanding the training data distribution with weakly-labeled images, pooled from publicly available CXR collections, that show pneumonia-related opacities. We use the images in a stage-wise, strategic approach and train convolutional neural network-based algorithms to detect COVID-19 infections in CXRs. We observe that weakly-labeled data augmentation improves performance on the baseline test data compared to non-augmented training: it expands the learned feature space to encompass variability in the unseen test distribution, which enhances inter-class discrimination and reduces intra-class similarity and generalization error. Augmentation with COVID-19 CXRs from individual collections significantly improves performance, compared to both baseline non-augmented training and weakly-labeled augmentation, in detecting COVID-19-like viral pneumonia in the publicly available COVID-19 CXR collections. This underscores that COVID-19 CXRs have a distinct pattern, and hence a distinct distribution, unlike non-COVID-19 viral pneumonia and pneumonia caused by other infectious agents.

medRxiv preprint doi: https://doi.org/10.1101/2020.05.04.20090803; this version posted May 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

While not recommended as a primary diagnostic tool due to risk of increased transmission, chest radiography and computed tomography (CT) scans are used to screen/confirm respiratory damage in COVID-19 disease and evaluate its progression [3]. CT scans are reported to be less specific than

It is customary to train and test a DL model with data from the same target distribution so that it offers probabilistic predictions for assigning medical images to their respective categories. Often this idealized setup is not possible due to limited data availability or weak labels. In the present situation, despite a large number of cases worldwide, very limited COVID-19 CXR image data is publicly available to train DL models whose goal is to distinguish CXR images showing COVID-19-related viral pneumonia from those showing pneumonia caused by non-COVID-19 viral, bacterial, and other pathogens. Acquiring such data remains a goal for medical societies such as the Radiological Society of North America (RSNA)² and the Imaging COVID-19 AI Initiative in Europe³. Large amounts of training data enable a diversified feature space across categories that help

¹ https://www.acr.org/Advocacy-and-Economics/ACR-Position-Statements/Recommendations-for-Chest-Radiography-and-CT-for-Suspected-COVID19-Infection
² https://press.rsna.org/timssnet/media/pressreleases/14_pr_target.cfm?ID=2167
³ https://imagingcovid19ai.eu/

In this work, we use weakly-labeled CXR images that are pooled from publicly available

Table 1 shows the distribution of data extracted from the datasets identified above and used for

Broadly, our workflow consists of the following steps: First, we preprocess the images to make

Table 2. For model validation, we allocated a randomly selected 20% of the training data. The performance achieved by the models is shown in Table 3.
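
The random 20% validation allocation described above can be illustrated with a minimal sketch. The function name, the seed, and the file names are hypothetical, and whether the original split was stratified by class is not stated in the text, so this sketch assumes a plain random split.

```python
import random


def split_train_validation(samples, validation_fraction=0.2, seed=42):
    """Randomly hold out a fraction of the training samples for validation."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(samples)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_val = round(len(shuffled) * validation_fraction)
    # The first n_val shuffled items become validation data, the rest training data.
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical file names standing in for the training CXRs.
train_files = [f"cxr_{i:04d}.png" for i in range(100)]
train_subset, val_subset = split_train_validation(train_files)
print(len(train_subset), len(val_subset))
```

Fixing the seed keeps the held-out subset identical across runs, which makes comparisons between models easier to interpret.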

It can be observed that the VGG-16 model demonstrates superior performance in terms of accuracy and AUC with the hold-out test data. The Xception model gives higher precision and specificity than the other models. However, considering the F-score and MCC, which give a balanced measure of precision and sensitivity, the VGG-16 model outperformed the others in classifying the pediatric CXRs
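
To make the relationship between these metrics concrete, the following sketch computes accuracy, precision, sensitivity, specificity, F-score, and MCC from the four cells of a binary confusion matrix. It assumes a two-class task; the function name and the counts are made up for illustration and are not results from this study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    pr_sum = precision + sensitivity
    # F-score: harmonic mean of precision and sensitivity.
    f_score = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0
    # Matthews correlation coefficient uses all four cells of the matrix.
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
        "mcc": mcc,
    }


# Illustrative counts only (not taken from the paper's tables).
metrics = binary_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the F-score ignores true negatives while MCC uses every cell, the two can rank models differently when the classes are imbalanced.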

As observed in Table 4

The learned behavior of the baseline trained VGG-16 model with the pediatric CXR collection is interpreted through Grad-CAM visualizations and is shown in Fig. 4
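
As a rough illustration of what a Grad-CAM heat map is, the sketch below implements only the combination step: each feature map is weighted by the spatially averaged gradient of the class score, the weighted maps are summed, and a ReLU is applied. It assumes the activations and gradients of the last convolutional layer have already been extracted by a deep learning framework; the function name and the toy values are hypothetical.

```python
def grad_cam(activations, gradients):
    """Combine K feature maps (K x H x W) with their gradients into an H x W map."""
    k = len(activations)
    h, w = len(activations[0]), len(activations[0][0])
    # Channel weight: gradient of the class score averaged over all positions.
    weights = [sum(sum(row) for row in gradients[c]) / (h * w) for c in range(k)]
    # Weighted sum of the feature maps, followed by a ReLU.
    cam = [
        [max(0.0, sum(weights[c] * activations[c][i][j] for c in range(k)))
         for j in range(w)]
        for i in range(h)
    ]
    # Scale to [0, 1] so the map can be overlaid on the image as a heat map.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy 2-channel, 2x2 example (values are illustrative only).
activations = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],
    [[-1.0, -1.0], [-1.0, -1.0]],
]
heatmap = grad_cam(activations, gradients)
print(heatmap)
```

In practice the resulting map is upsampled to the input resolution before being overlaid on the CXR.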

The weakly labeled images are further stored to augment the baseline training data to improve performance in categorizing the test data from the pediatric, Twitter COVID-19, and Montreal COVID-19 CXR collections. We also augmented the baseline with the COVID-19 CXR collections to study their effect on improving performance with the baseline test data. The performance metrics achieved with the baseline test data using different combinations of the augmented training data are shown in
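
One way to organize such an experiment is to enumerate the baseline training set combined with every non-empty subset of the augmenting collections. The sketch below is an assumption about how this bookkeeping could be done, not the authors' pipeline; the function name, collection names, file names, and labels are all hypothetical.

```python
from itertools import combinations


def augmented_training_sets(baseline, extra_collections):
    """Yield (names, samples) for the baseline plus each non-empty combination of extras."""
    names = sorted(extra_collections)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            samples = list(baseline)  # always start from the baseline data
            for name in combo:
                samples.extend(extra_collections[name])
            yield combo, samples


# Hypothetical (file name, label) pairs standing in for real CXR collections.
baseline = [("ped_001.png", "bacterial"), ("ped_002.png", "viral")]
extras = {
    "weak_collection_a": [("a_001.png", "viral")],
    "weak_collection_b": [("b_001.png", "bacterial"), ("b_002.png", "viral")],
}
for combo, samples in augmented_training_sets(baseline, extras):
    print(combo, len(samples))
```

Each yielded set would then be used to retrain the model and be evaluated on the same fixed test data, so that differences in the metrics can be attributed to the augmentation.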

We also studied the effect of weakly labeled data augmentation with the test data from the Twitter and Montreal COVID-19 CXR collections. The results are shown in Table 6.

Unlike the model trained on non-augmented data, which failed to localize salient ROI in a test CXR showing COVID-19 viral pneumonia (Fig. 4), the model trained on the baseline augmented with COVID-19 CXRs from one collection delivered superior localization performance on test CXR samples from the other collection. Fig. 6a shows the learned interpretation of these trained models in the form of heat maps and class activation maps.

It is observed that the models are correctly focusing on the salient ROI, matching with the GT