A collaborative online AI engine for CT-based COVID-19 diagnosis

Artificial intelligence can potentially play a substantial role in streamlining the chest computed tomography (CT) diagnosis of COVID-19 patients. However, several critical hurdles have impeded the development of robust AI models, including the deficiency, isolation, and heterogeneity of CT data generated from diverse institutions. These issues lead to a lack of generalization and therefore prevent AI models from being applied in clinical practice. To overcome this, we proposed a federated learning-based Unified CT-COVID AI Diagnostic Initiative (UCADI, http://www.ai-ct-covid.team/), a decentralized architecture in which the AI model is distributed to and executed at each host institution holding the data sources or client ends, for training and inference without sharing individual patient data. Specifically, we first developed an initial AI CT model based on data collected from three Tongji hospitals in Wuhan. After model evaluation, we found that the initial model could identify COVID-19 from Tongji CT test data at near radiologist-level performance (97.5% sensitivity) but performed worse when tested on COVID-19 cases from Wuhan Union Hospital (72% sensitivity), indicating a lack of model generalization. Next, we used the publicly available UCADI framework to build a federated model that integrated COVID-19 CT cases from the Tongji hospitals and Wuhan Union Hospital (WU) without transferring the WU data. The federated model not only performed similarly on the Tongji test data but also improved detection sensitivity (98%) on the WU test cases. The UCADI framework will allow participants worldwide to use and contribute to the model, to deliver a real-world, globally built and validated clinical CT-COVID AI tool. This effort directly supports United Nations Sustainable Development Goal 3, Good Health and Well-Being, and allows sharing and transferring of knowledge to fight this devastating disease around the world.

Introduction

Reverse transcription polymerase chain reaction (RT-PCR) has been the primary modality to detect viral nucleotides in specimens from patients with suspected COVID-19 infection and remains the gold standard for active disease confirmation. However, due to the greatly variable disease course in different patients, the detection sensitivity is only 60%-71% 1-3, leading to considerable false-negative results. Symptomatic COVID-19 patients and asymptomatic carriers with false-negative RT-PCR results pose a significant public threat to the community, as they may be contagious. As such, clinicians and researchers have made tremendous efforts to search for alternative and/or complementary modalities to improve diagnostic accuracy for COVID-19.

COVID-19 patients present with certain unique radiological features on chest computed tomography (CT) scans, including ground-glass opacity, interlobular septal thickening, and consolidation, that have been used to differentiate COVID-19 from other bacterial or viral pneumonia and from healthy individuals 4-7. CT has been utilized for the diagnosis of COVID-19 in some countries and regions, with reported sensitivity of 56%-98% 2,3. However, these radiologic features are not specific to COVID-19 pneumonia, and diagnostic accuracy depends heavily on radiologists' experience. In particular, insufficient empirical understanding of the radiological morphology characteristic of this previously unknown pneumonia has resulted in inconsistent sensitivity and specificity among radiologists in identifying and assessing COVID-19. A recent study reported substantial differences in specificity among radiologists in differentiating COVID-19 from other viral pneumonia 8. Meanwhile, CT-based diagnostic approaches have led to substantial challenges, as many suspected cases will eventually need further confirmation.
An intelligent automatic method could help to address this clinical deficiency in current CT approaches. Successful development of such a method depends on a tremendous amount of imaging data with high-quality clinical annotation for training an artificial intelligence (AI) model. We confronted several challenges in developing a robust and universal AI tool for precise COVID-19 diagnosis: 1) data deficiency: our high-quality CT datasets were only a small sampling of the full infected cohorts, and therefore it is unlikely that we captured the full set of radiological features; 2) data isolation: data derived across multiple centers were difficult to transfer for training due to security, privacy, and data size concerns; and 3) data heterogeneity: datasets were generated by different scanner machines, which introduces an additional layer of complexity to training because every vendor provides some unique capabilities. Furthermore, it is unknown whether COVID-19 patients in diverse geographic locations, ethnic groups, or demographics show similar or distinct CT image patterns. All of these factors may contribute to a lack of generalization for an AI model, which is a serious issue for a global AI clinical solution.

To solve this problem, we propose here a Unified CT-COVID AI Diagnostic Initiative (UCADI) to deliver an AI-based CT diagnostic tool. We base our developmental philosophy on the concept of federated learning, which enables machine learning engineers and medical data scientists to work seamlessly and collectively with decentralized CT data without sharing individual patient data; every participating institution can therefore contribute its CT-COVID training results to a continuously evolving and improving central AI model and help provide people worldwide with an effective AI model for precise CT-COVID diagnosis (Fig. 1).

Next, we validated the predictive performance of the CNN through a classification task: a four-class pneumonia partition covering the four featured clinical diagnoses considered when determining suspected cases of COVID-19. This task aimed at distinguishing COVID-19 (Fig. 3, i) from three types of non-COVID-19 cases (Fig. 3, ii), including other viral pneumonia, bacterial pneumonia, and healthy cases (d, e, and f in Fig. 3). We selected 20% of the 1,036 CT cases in the training and validation set for 5-fold cross-validation. In this validation, the CNN achieved an overall sensitivity of 77.2% and specificity of 91.9%.
On the Tongji test set, the CNN differentiated COVID-19 from the three types of non-COVID-19 cases, whereas six radiologists obtained an average sensitivity of 79% (87.5%, 90%, 55%, 80%, 68%, and 93%, respectively, and 90% for the majority vote among the six radiologists) and an average specificity of 90% (92%, 97%, 89%, 95%, 88%, and 79%, respectively, and 95.6% for the majority vote) (Fig. 4). In the Tongji dataset, the CNN showed performance approaching that of expert radiologists. To examine the reliability of the model, we performed class activation mapping (CAM) analysis on raw CT images in both the validation and test datasets 9.
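For context, class activation mapping highlights the image regions that most influenced the network's prediction. Below is a minimal, hedged sketch of how a CAM could be computed for a 3D CNN whose head is a single linear layer over globally pooled features; the attribute names `features` and `classifier` are assumptions for illustration and are not the actual UCADI model code.

```python
import torch

def class_activation_map(model, volume, target_class):
    """Sketch of CAM for a 3D CNN with a global-pooling + linear head.
    volume: tensor of shape (1, channels, depth, height, width)."""
    captured = {}
    def hook(_module, _inputs, output):
        captured["maps"] = output.detach()
    handle = model.features.register_forward_hook(hook)   # assumed attribute name
    with torch.no_grad():
        model(volume)
    handle.remove()
    fmap = captured["maps"][0]                             # (C, d, h, w) feature maps
    weights = model.classifier.weight[target_class]        # assumed linear head, shape (C,)
    cam = torch.relu(torch.einsum("c,cdhw->dhw", weights, fmap))
    return cam / (cam.max() + 1e-8)                        # normalized heat map
```

The resulting map can then be upsampled to the original slice resolution and overlaid on the CT images to check whether the model attends to infected lung regions.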

The CNN performed with sensitivity-specificity comparable to all six radiologists in differentiating COVID-19 from non-COVID-19 cases (Fig. 4a). Meanwhile, the CNN also performed with sensitivity-specificity equivalent to the average radiologist in assessing the three severities (e, f, g in Fig. 4). However, the CNN showed insufficient capability in determining other viral pneumonia (Fig. 4b), bacterial pneumonia (Fig. 4c), and healthy cases (Fig. 4d).

To test the generalization of the initial model, which was trained exclusively on data from Tongji hospitals, we evaluated its predictive performance using CT data from 100 confirmed COVID-19 cases generated at Wuhan Union Hospital. The detection sensitivity of the model was only 72%, compared with 97% on reserved testing data from Tongji hospitals. This demonstrated a lack of generalization for the initial model.

The global online AI diagnostic engine enabled with federated learning

To overcome this hurdle, we proposed a federated learning framework to facilitate UCADI, a global joint effort to generate an AI model based on large-scale data and the integration of diverse ethnic patient groups. In the traditional AI approach, sensitive user data from different sources are gathered and transferred to a central hub where models are trained and generated. Federated learning, proposed by Google 10, is in contrast a decentralized architecture in which the AI model is distributed to and executed at each host institution holding the data sources or client ends, for training and inference. The local copies of the AI model at the host institutions eliminate the network latencies and costs incurred by sharing large volumes of data with a central server. Most importantly, this privacy-by-design strategy enables medical centers to collaborate on the development of models without needing to directly share sensitive clinical data with each other.
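To make the decentralized training loop concrete, here is a minimal sketch of federated averaging in the style popularized by Google, assuming a PyTorch model whose weights are exchanged as a state_dict; it illustrates the general pattern only and is not the UCADI implementation, which additionally encrypts the exchanged parameters (see Methods).

```python
import copy
import torch

def local_update(global_state, model, loader, optimizer_fn, local_epochs=1):
    """Client (hospital) side: load the shared global weights, train on local CT
    data, and return only the updated weights -- patient images never leave the site."""
    model.load_state_dict(global_state)
    optimizer = optimizer_fn(model.parameters())
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(local_epochs):
        for volumes, labels in loader:
            optimizer.zero_grad()
            loss_fn(model(volumes), labels).backward()
            optimizer.step()
    return copy.deepcopy(model.state_dict()), len(loader.dataset)

def aggregate(client_updates):
    """Server side: weighted average of client weights (federated averaging).
    client_updates is a list of (state_dict, num_local_samples) pairs."""
    total = sum(n_samples for _, n_samples in client_updates)
    keys = client_updates[0][0].keys()
    return {
        k: sum(state[k].float() * (n / total) for state, n in client_updates)
        for k in keys
    }
```

In each federated round, every participating hospital would run local_update, the central server would call aggregate on the returned weights, and the averaged model would be redistributed for the next round.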

The updated global model parameters are then returned to the participants, which enables continuation of local training. The framework is highly flexible, allowing hospitals to join or leave the UCADI initiative at any moment, because it is not tied to any specific data cohort.

With this framework, we deployed two experiments to validate the federated learning concept on the CT COVID-19 data. First, we trained three models, one for each of the three Tongji hospital datasets; we then transferred the datasets to three physically independent computer servers, respectively, and trained a Tongji federated model in a simulation mode (detailed in Methods). As shown in Fig. 4e-h, the federated model performed close to the centrally trained initial model and better than the models trained on any single hospital's data.

We need a global joint effort to fight the virus. The first challenge we have confronted in this war is to deliver precise and effective diagnoses. In this study, we introduce a globally collaborative AI initiative framework, UCADI, to assist radiologists and to streamline and accelerate CT-based diagnosis. First, we developed an initial CNN model that achieved performance comparable to that of expert radiologists in classifying pneumonia to identify COVID-19 and in additionally assessing the severity of identified COVID-19 cases. Furthermore, we developed a federated learning framework, based on which hospitals worldwide can join UCADI to jointly train an AI-CT model for COVID-19 diagnosis. With CT data from multiple Wuhan hospitals, we confirmed the effectiveness of this federated learning approach. We have shared the initial model and the federated learning programmatic API source code (https://github.com/HUST-EIC-AI-LAB/) and encourage hospitals worldwide to join UCADI and form an international collaboration to fight the virus with a globally trained AI application. It is worth noting that there is still room for improvement in the technical implementation of the framework: 1) the number of local training iterations before global parameter updating. The number of local training iterations has a direct influence on the training efficiency and effectiveness.

We collected CT cases from three Tongji Hospitals. In addition, we collected an independent cohort including 507 COVID-19 pneumonia CT cases confirmed by chest CT from Union Hospital, Wuhan, China. This cohort was used to test the performance of the initial model and of the multi-hospital model built with the federated learning framework.

We processed the raw CT image data to reduce the computing burden. We used a sampling method to select 5 subsets of CT slices from all sequential images of one CT case, using random starting positions and scalable sampling intervals on the transverse view, to capture the infected lung regions. All 5 processed subsets were fed separately to the CNN to obtain averaged predictive probabilities, which effectively incorporates the contributions of different lung levels across all CT slices. To further improve computing efficiency, we resized each slice from 512 to 128 pixels in width and height, rescaled the CT lung window to a range from -1200 to 600, and normalized the values via Z-score before feeding them to the CNN.
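As a rough illustration of this preprocessing pipeline, the sketch below assumes a NumPy volume of Hounsfield units; the number of slices per subset and the crude stride-based downsampling are arbitrary choices for the example, not the exact parameters used in the study.

```python
import numpy as np

def preprocess_case(volume_hu, n_subsets=5, slices_per_subset=16, seed=None):
    """Sketch of the described preprocessing: sample 5 slice subsets with random
    starting positions, downsample 512x512 -> 128x128, clip to the lung window
    [-1200, 600], and Z-score normalize. volume_hu: (num_slices, 512, 512) in HU."""
    rng = np.random.default_rng(seed)
    num_slices = volume_hu.shape[0]
    interval = max(1, num_slices // slices_per_subset)   # scalable sampling interval
    subsets = []
    for _ in range(n_subsets):
        start = rng.integers(0, interval)                # random starting position
        idx = np.arange(start, num_slices, interval)[:slices_per_subset]
        sub = volume_hu[idx]
        sub = sub[:, ::4, ::4]                           # crude 512 -> 128 downsampling
        sub = np.clip(sub, -1200, 600)                   # lung window from the text
        sub = (sub - sub.mean()) / (sub.std() + 1e-8)    # Z-score normalization
        subsets.append(sub.astype(np.float32))
    return subsets  # each subset is fed to the CNN; predicted probabilities are averaged
```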

Building AI model using pooled data
The dataset was split into a training and validation set with 1,036 cases (10 Sino-French New City Hospital, 12 Optical Valley Hospital). We particularly considered a balanced distribution of the 4 classes in the test set. We initially trained a four-class CNN (Fig. 2) based on 3D-DenseNet 12, a densely connected convolutional network that has shown remarkable advantages in classifying CT images. We customized its architecture to contain 14 3D-convolution layers distributed in 6 dense blocks and 2 transmit blocks (Fig. 2b).

For all experiments, we used the same architecture (3D-DenseNet) as in the data-centralized training and the same set of local training hyperparameters for all clients, with an SGD optimizer: batch size of 35, learning rate of 0.01, momentum of 0.9, and weight decay of 5e-4.
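For concreteness, the following is a minimal sketch of a client's local training setup using these hyperparameters, assuming a PyTorch implementation; SmallCT3DNet is a simple stand-in model for illustration, not the customized 3D-DenseNet described above.

```python
import torch
import torch.nn as nn

class SmallCT3DNet(nn.Module):
    """Tiny 3D CNN stand-in for the customized 3D-DenseNet (illustration only)."""
    def __init__(self, num_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):                                 # x: (N, 1, depth, 128, 128)
        return self.classifier(self.features(x).flatten(1))

model = SmallCT3DNet()
# Hyperparameters as stated in the text: SGD, lr 0.01, momentum 0.9, weight decay 5e-4
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)

def train_one_epoch(model, loader, optimizer, device="cpu"):
    """One pass over a client's local data before parameters are shared
    (the batch size of 35 is a property of the data loader)."""
    criterion = nn.CrossEntropyLoss()
    model.to(device).train()
    for volumes, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(volumes.to(device)), labels.to(device))
        loss.backward()
        optimizer.step()
```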

In experiment I, we set the number of local training iterations per federated round to one, which means each client trains with its local data once before sending information to the central server (cloud). We conducted the training process on a workstation equipped with 3 Tesla V100 GPUs, and training took 16 hours to finish. In experiment II, we set the number of federated rounds

Privacy-preserving setup:
We use a variant of additively homomorphic encryption, called Learning with Errors (LWE)-based encryption, to achieve privacy preservation. This encryption method allows us to leak no information about the participants to the honest-but-curious parameter (cloud) server.
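To illustrate the additive-homomorphism property this relies on, here is a toy, deliberately insecure LWE-style example with a symmetric key and tiny parameters: the server can add ciphertexts (for example, quantized model updates) without learning the underlying values, and only a key holder can decrypt the sum. This is a conceptual sketch, not the encryption code used in the framework.

```python
import random

# Toy LWE-style additively homomorphic encryption (illustration only;
# parameters are far too small to be secure).
N, Q, DELTA = 64, 1 << 40, 1 << 20          # dimension, modulus, message scale

def keygen(rng):
    return [rng.randrange(Q) for _ in range(N)]

def encrypt(secret, m, rng):
    """Encrypt a small non-negative integer m (e.g. a quantized weight)."""
    a = [rng.randrange(Q) for _ in range(N)]
    e = rng.randrange(-8, 9)                 # small noise term
    b = (sum(ai * si for ai, si in zip(a, secret)) + DELTA * m + e) % Q
    return a, b

def add_ciphertexts(ct1, ct2):
    """Server side: add two ciphertexts without learning either message."""
    (a1, b1), (a2, b2) = ct1, ct2
    return [(x + y) % Q for x, y in zip(a1, a2)], (b1 + b2) % Q

def decrypt(secret, ct):
    a, b = ct
    phase = (b - sum(ai * si for ai, si in zip(a, secret))) % Q
    return round(phase / DELTA) % (Q // DELTA)

rng = random.Random(0)
sk = keygen(rng)
c1, c2 = encrypt(sk, 5, rng), encrypt(sk, 7, rng)
assert decrypt(sk, add_ciphertexts(c1, c2)) == 12   # sum recovered from ciphertexts only
```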


Figure 2 | Data and strategy. a, Number of CT studies and total images. b, The CNN was developed based on 3D-DenseNet, consisting of 6 dense blocks (green), 2 transmit blocks (white), and an output layer (gray). Preprocessed 128 x 128-pixel CT images of one case were fed through the network's 14 3D-convolution layers and a number of functions embedded in the 3D blocks, finally producing the predicted classification result. c, The CNN classified each CT case into one of 4 types and further assessed the severity as I, II, or III if the case was predicted as COVID-19.