Identifying those at risk of reattendance at discharge from emergency departments using explainable machine learning

Short-term reattendances to emergency departments are a key quality-of-care indicator. Identifying patients at increased risk of early reattendance can help reduce the number of patients with missed or undertreated illness or injury, and could support appropriate discharges with focused interventions. In this manuscript we present a retrospective, single-centre study in which we create and evaluate a machine-learnt classifier trained to identify patients at risk of reattendance within 72 hours of discharge from an emergency department. On a patient hold-out test set, our highest performing classifier obtained an AUROC of 0.748 and an average precision of 0.250, demonstrating that machine-learning algorithms can be used to classify patients, with moderate performance, into low- and high-risk groups for reattendance. In parallel to our predictive model we train an explanation model, capable of explaining predictions at an attendance level, which can be used to help inform the design of interventional strategies.

Figure 1. Segregation of the study data into training and the two hold-out test sets. Discarded attendances were those that occurred in either the first 30 days or the last 72 hours of the temporal test set, to avoid information leakage between the training and temporal test sets and because the reattendance status could not be robustly calculated for attendances occurring in the last 72 hours of the dataset. Reattendance rates (bottom row of shaded boxes) display the observed 72-hour reattendance rate for each cohort.

In this manuscript we discuss a machine-learnt model, utilizing data extracted from historical (coded, inpatient) discharge summaries, alongside clinical data recorded at the current attendance, such as observations and the results of standard triage processes, to identify patients at increased risk of short-term reattendance following an emergency department attendance. In addition to our predictive model, we construct an explanation model which allows us to evaluate the trends our model has learned and to explain our model's predictions at an attendance level.

Table 1. Most frequently occurring ICD10 codes for attendances in the training set. The left column denotes the noted conditions (as specified by ICD10 codes) and the right column the number of attendances in the training set noted to have this condition. A given condition is only associated with a small fraction of attendances, but in total 38.9% of attendances resulting in discharge have at least one associated condition. Conditions are generated by extracting them from the (ICD10) coded discharge summaries held in a patient's electronic health record.
[…] (see Supplementary Figure 2 for further details). This is then dichotomized (less than 72 hours) to annotate each attendance with whether the discharge was followed by another attendance by the same patient within 72 hours. This formulation allowed us to frame the predictive task as a binary classification problem.
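As an illustration of this labelling step, the sketch below derives the binary 72-hour reattendance flag from per-patient attendance timestamps. The DataFrame and column names (`patient_id`, `arrival_time`, `discharge_time`) are hypothetical stand-ins, not the study's actual schema.

```python
import pandas as pd

def label_72h_reattendance(attendances: pd.DataFrame) -> pd.DataFrame:
    """Annotate each attendance with a binary 72-hour reattendance flag."""
    df = attendances.sort_values(["patient_id", "arrival_time"]).copy()
    # Arrival time of the same patient's next attendance (NaT if none).
    next_arrival = df.groupby("patient_id")["arrival_time"].shift(-1)
    # Time from this discharge to the patient's next arrival.
    time_to_reattendance = next_arrival - df["discharge_time"]
    # Dichotomize at 72 hours; comparisons against NaT evaluate to False,
    # so patients with no further attendance are labelled negative.
    df["reattended_72h"] = (
        time_to_reattendance <= pd.Timedelta(hours=72)
    ).astype(int)
    return df
```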

Predictive modelling

We separated our data into a training set and two independent test sets (Figure 1). The relation between patients in each dataset is displayed in Supplementary Figure 1, demonstrating patient exclusivity between the training set and the patient hold-out test set. The temporal test set is discussed in the Supplementary Information only.
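A minimal sketch of such a patient-exclusive split is shown below, reusing the labelled frame from the previous snippet; scikit-learn's GroupShuffleSplit guarantees that no patient contributes attendances to both sides of the split. The split fraction is a placeholder, not the study's actual proportion.

```python
from sklearn.model_selection import GroupShuffleSplit

# Split at the patient level so every attendance by a given patient
# falls on exactly one side of the split.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(df, groups=df["patient_id"]))
train_df, holdout_df = df.iloc[train_idx], df.iloc[test_idx]
```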

As our machine-learnt classifier, we used a gradient boosted decision tree as implemented in the XGBoost framework16. […] Features were chosen with a greedy forward selection process: at each step we added a variable to the feature set, where the variable selected was the one which increased the CV score by the largest amount. We sequentially added variables to the feature set in this manner until all variables were included. The optimal feature set for our final model was the set that yielded the highest CV score (Supplementary Table 2).
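The following sketch illustrates this greedy forward selection loop under stated assumptions: a feature DataFrame `X`, labels `y`, patient identifiers `groups`, and a list of candidate column names. The XGBoost hyperparameters shown are placeholders rather than the tuned values from the study.

```python
from sklearn.model_selection import GroupKFold, cross_val_score
from xgboost import XGBClassifier

def greedy_forward_selection(X, y, groups, candidates):
    """Greedily add the variable giving the largest CV AUROC gain."""
    candidates = list(candidates)
    selected, history = [], []
    cv = GroupKFold(n_splits=5)  # grouped 5-fold CV at the patient level
    while candidates:
        scores = {}
        for col in candidates:
            model = XGBClassifier(n_estimators=200, eval_metric="logloss")
            scores[col] = cross_val_score(
                model, X[selected + [col]], y,
                groups=groups, cv=cv, scoring="roc_auc",
            ).mean()
        best = max(scores, key=scores.get)  # largest CV AUROC increase
        selected.append(best)
        candidates.remove(best)
        history.append((list(selected), scores[best]))
    # Return the feature set whose CV AUROC was highest overall.
    return max(history, key=lambda t: t[1])[0]
```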

We evaluate our final model's performance (the average output of the five models trained during cross-validation) on the two hold-out test sets.

Model explainability

To explain the predictions of our model we make use of the TreeExplainer algorithm in the SHAP Python library17-19.
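As a concrete illustration, the snippet below shows the basic TreeExplainer usage this entails; `model` and `X` are assumed stand-ins for a fitted fold model and the attendance feature matrix, not the study's actual objects.

```python
import shap

# Attendance-level feature attributions for the reattendance model.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # shape: (attendances, features)
# shap_values[i, j] quantifies feature j's contribution to attendance i's
# prediction, relative to the baseline explainer.expected_value.
shap.summary_plot(shap_values, X)  # global summary view
```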
Table 2. Performance on the validation set for models using individual features (models a-n) and sets of features (models o and p). Metrics are evaluated on the training set using grouped 5-fold CV at the patient level; we report the mean of each metric across the five validation folds. All models' hyperparameters were tuned as described in the Methods section to optimize the CV AUROC.

[…] 2.8-3.3%) reattendance rate). When we included the full one-hot encoded matrix denoting whether the patient had a history of the given condition (a sketch of this encoding is given below), our model (model n) […] consistent with other studies that previous emergency department usage is an important consideration when assessing a patient's reattendance risk8. Three models (models k, j, and l, Table 2) make use of coded information describing the reason for the […]

Finally, we investigated models using larger feature sets combining variables (models o and p in Table 2). Firstly, we trained a model using only the three variables which were most predictive in unison, as determined by our greedy forward feature selection process (see Methods and Supplementary Table 2). This model (model o in Table 2) used just the condition indicators, the chief complaint recorded at triage, and the number of times the patient visited the ED in the previous 30 days. Ultimately, it obtained a validation AUROC of 0.742, demonstrating that using multiple variables is more predictive of reattendance than any single variable. We also evaluated our highest performing model, as determined by our feature selection process, which used eight more of the available variables (diagnosis, condition count, hour of day, Manchester Triage Score, arrival mode, weekday, triage discriminator and age). Despite using several more variables, the model's validation AUROC only increased to 0.753.
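The sketch below illustrates one way to build the condition-indicator (one-hot) matrix referenced above, assuming a hypothetical long-format table `history` with one row per coded condition per attendance; the column names are illustrative, not the study's actual schema.

```python
import pandas as pd

# Binary indicator matrix: one column per ICD10 code, one row per
# attendance, 1 if the patient's prior discharge summaries note the code.
condition_matrix = (
    pd.crosstab(history["attendance_id"], history["icd10_code"])
    .clip(upper=1)        # counts -> binary history indicators
    .add_prefix("cond_")  # e.g. cond_I10, cond_E11
)
```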

Next, we applied our final model (model p, Table 2) to the patient-wise hold-out test set, the evaluation of which is presented […]

To investigate what our model has learned we made use of the TreeExplainer algorithm18; a demonstration of the global explanation of our reattendance model is presented in Figure 3. In Figure 3a the SHAP values (which quantify, at an instance […]

Figure: UMAP embedding of the attendance-level prediction explanations, coloured by the patient's condition count at the given attendance. This embedding was created by clustering the prediction explanations (generated using the TreeExplainer algorithm) for each emergency department attendance using the UMAP algorithm. Generally, closer data points share a more similar explanation for their predicted reattendance risk.
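A sketch of the embedding step described in the caption above, reusing `shap_values` from the earlier TreeExplainer snippet; it assumes the umap-learn package, and the UMAP settings shown are defaults rather than the study's configuration.

```python
import umap

# Embed each attendance's SHAP explanation vector into two dimensions;
# nearby points share a similar explanation for their predicted risk.
embedding = umap.UMAP(n_components=2, random_state=0).fit_transform(shap_values)
# embedding[:, 0] and embedding[:, 1] can then be scattered and coloured
# by, e.g., the patient's condition count, as in the figure above.
```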
[…] in production. Conversely, this does mean that the model will not necessarily generalize to different EDs without first training on their local data; this will be particularly prominent in EDs whose catchment zone has very different demographics to Southampton, which would entail differing disease prevalence and characteristics at presentation to the emergency department.

Despite this, since our model uses only variables that are either in the standard UK emergency care dataset or regularly available to EDs nationally, it is possible to evaluate this model directly in other EDs with little alteration. External validation of our model using data from different EDs is essential before prospective deployment beyond the department at which the training data were sourced.

In conclusion, we have constructed and retrospectively evaluated a gradient boosted decision tree classifier capable of predicting a patient's risk of reattendance within 72 hours of discharge from an emergency department.