Distributed Learning from Multi-Site Observational Health Data for Zero-Inflated Count Outcomes

Background: Multi-site studies facilitate the study of rare outcomes or exposures by integrating patient information from several distinct care sites. Due to patient privacy concerns, sharing of patient-level information among collaborating sites is often prohibited, suggesting a need for privacy-preserving data analysis methods. Several such methods exist, but have been shown to sometimes result in biased estimation or require extensive communication among sites.
Objective: We present a communication-efficient, privacy-preserving method for performing distributed regression on Electronic Health Records (EHR) data across multiple sites for zero-inflated count outcomes. Our approach is motivated by two real-world data problems: examining risk factors associated with pediatric avoidable hospitalization and modeling frequency of serious adverse events in colorectal cancer patients.
Methods: We use hurdle regression, a two-part (logistic-Poisson) regression model, to characterize the effects of risk factors on zero-inflated count outcomes. We develop a one-shot algorithm for performing hurdle regression (ODAH) across multiple sites, using individual patient data at one site and aggregated data from all other sites to approximate the complete data log likelihood. We evaluate ODAH through extensive simulations and an application to EHR data from the Children's Hospital of Philadelphia (CHOP) and the OneFlorida Clinical Research Consortium. We compare ODAH estimates to those from meta-analysis and pooled analysis (all patient data pooled together, the gold standard).
Results: In simulations, ODAH estimates exhibited bias relative to the gold standard of less than 0.1% across several settings. In contrast, meta-analysis estimates exhibited relative bias up to 12.7%, largely dependent on event rate. When applying ODAH to CHOP data, relative biases for estimates in both components of the hurdle model were less than 5.1%, while meta-analysis estimates exhibited relative bias as high as 63.6%. When analyzing OneFlorida data, ODAH relative biases were less than 10% for eight of the ten estimated coefficients, while meta-analysis estimates again showed substantially greater bias.
Conclusions: Our simulations and real-world applications suggest ODAH is a promising method for performing privacy-preserving distributed learning on EHR data when modeling zero-inflated count outcomes.

emergency department visit [3]. In scenarios with zero-inflated count outcomes, it is sometimes postulated that the processes generating zero and positive counts are systematically different and are defined by distinct distributions. For these cases, data can be represented with a hurdle model, which consists of two components: one for estimating the probability of a non-zero count (typically a logistic regression), and another for estimating a count given that it is greater than zero (typically a zero-truncated count model, commonly Poisson or Negative Binomial regression) [2]. The sequential nature of the hurdle model allows each component's set of parameters to be estimated separately, which benefits computational efficiency.
A chief concern regarding EHR use is patient privacy. EHRs contain sensitive patient-level identifiers, often preventing data sharing across institutions. As a result, many clinical data analyses are performed at single sites; these analyses are often underpowered, with results specific to a certain sub-population due to choice of site.
In light of the limitations of single-site analyses, many have stressed the importance of multi-site studies [4][5][6]. Multi-site studies allow for integration of clinical information from several sites by using distributed data networks (DDNs). DDNs, made up of several health care institutions, are designed to assist analyses of medical product safety and comparative effectiveness research, among many other uses [6]. By integrating patient information from several sources, DDNs facilitate the study of rare outcomes or exposures, often featuring larger and more inclusive samples from the target population. An example of a successful DDN is the Observational Health Data Sciences and Informatics (OHDSI) consortium, whose primary purpose is to develop open-source tools to be shared across multiple centers for use in collaborative observational health data research [7]. Another successful DDN is PEDSnet (pedsnet.org), a multi-site network made up of eight large pediatric health systems [8]. PEDSnet contains patient information for over 6 million children, offering substantial opportunity to enhance the quality of pediatric EHR research.
Ideally, one would analyze data in a multi-site study by pooling all patient-level data together at a central site prior to analysis. This is not always possible due to concerns regarding patient privacy and confidentiality; the Health Insurance Portability and Accountability Act of 1996 (HIPAA) established a privacy rule to regulate use of protected health information (PHI) often found in EHRs, requiring de-identification of PHI prior to secondary use in biomedical research [9]. De-identified PHI have proven to be susceptible to re-identification, causing concern among patients [10][11].

Existing Methods for Privacy-Preserving Data Analysis
There are several established methods for performing privacy-preserving data analysis aside from using de-identified data alone. A common approach for analyzing data in multi-site studies is to use meta-analysis, where only aggregate established measures are shared across sites. Meta-analysis is a long-standing popular choice for privacy-preserving analysis, notably in several OHDSI studies [12][13][14]. While relatively easy to implement, meta-analysis has been shown to result in biased or imprecise effect estimates in the context of rare outcomes or exposures, as well as with smaller sample sizes [15]. Another favorable option is to use distributed regression, primarily designed to break up computationally-intensive tasks into smaller, more manageable tasks. Each of the smaller tasks is computed in parallel at separate centers without sharing raw data. One example of a distributed algorithm is WebDISCO [15,18]. Like the aforementioned methods, ODAL and ODAC preserve privacy by avoiding data sharing at the observation level, but offer an advantage in terms of efficiency, requiring only one or two rounds of communication between centers.

Goal of This Study
We build upon the framework of communication-efficient distributed algorithms and present an algorithm for hurdle regression, ODAH (One-shot Distributed Algorithm for performing Hurdle regression), for modeling zero-inflated count data. We evaluate ODAH through an extensive simulation study before applying our method to two real-world data use cases: analyzing risk factors of pediatric avoidable hospitalization using data from the Children's Hospital of Philadelphia and modeling serious adverse event frequency for colorectal cancer patients using data from the OneFlorida Clinical Research Consortium.

Poisson-Logit Hurdle Model
A hurdle model is an altered count model in which the processes generating zero and positive counts are not constrained to be the same; it is designed to cope with zero-inflated count outcomes [2]. To derive the hurdle model, we consider the two processes making up the model independently. First, we model the proportion of zero counts with a Bernoulli process using a logit link. Let $w_1, w_2, \dots, w_n \in \{0,1\}$ be independent realizations of a binary response variable $W$, such that $P(w_i = 1) = p_i$ and $P(w_i = 0) = 1 - p_i$. The logit of the probability $p_i$ is modeled as a linear combination of explanatory variables $X$ and regression coefficients $\beta$:

$$\operatorname{logit}(p_i) = \log\frac{p_i}{1 - p_i} = x_i^\top \beta.$$

Next, positive counts are modeled using a zero-truncated Poisson model. Let $y_i$ denote the count outcome for observation $i$, with $w_i = 1$ if $y_i > 0$ and $w_i = 0$ otherwise. Thus, $p_i$ can be interpreted as the probability that the "hurdle is crossed", resulting in a non-zero count. In the context of zero-inflated counts, we assume $P(y_i = 0)$ is much greater than $P(y_i > 0)$.
For observations where the realization from the logistic model is 1, positive counts follow a zero-truncated Poisson distribution such that

$$P(y_i = k \mid y_i > 0) = \frac{\lambda_i^{k} e^{-\lambda_i}}{k!\,(1 - e^{-\lambda_i})}, \quad k = 1, 2, \dots$$

Thus, we can write the mixture probability mass function of the Poisson hurdle model as

$$P(y_i = k) = \begin{cases} 1 - p_i, & k = 0, \\ p_i \dfrac{\lambda_i^{k} e^{-\lambda_i}}{k!\,(1 - e^{-\lambda_i})}, & k = 1, 2, \dots \end{cases}$$

Modeling the rate parameter $\lambda_i$ using a log link, we can express the log of $\lambda_i$ as a linear combination of explanatory variables $Z$ and regression coefficients $\gamma$:

$$\log \lambda_i = z_i^\top \gamma.$$

All rights reserved. No reuse allowed without permission. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
(Figure 1 here.)
We write the log-likelihood of the Poisson hurdle model as $L(\beta, \gamma) = L_1(\beta) + L_2(\gamma)$, with

$$L_1(\beta) = \sum_{i=1}^{n} \left[ w_i\, x_i^\top \beta - \log\left(1 + e^{x_i^\top \beta}\right) \right],$$

$$L_2(\gamma) = \sum_{i:\, y_i > 0} \left[ y_i\, z_i^\top \gamma - e^{z_i^\top \gamma} - \log\left(1 - e^{-e^{z_i^\top \gamma}}\right) - \log y_i! \right].$$

Note that this factors into two components such that $\beta$ and $\gamma$ are separable; the Hessian matrix is block diagonal, so $\beta$ and $\gamma$ are information orthogonal. Thus, there will not be any loss of information in estimating each set of parameters separately. This property is useful in the context of distributed regression, reducing computational complexity.
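The separability of the two components can be illustrated with a minimal sketch. The function names below (`logistic_loglik`, `truncated_poisson_loglik`, `hurdle_loglik`) are hypothetical and are not taken from the authors' software; the code simply evaluates a logistic log-likelihood on the zero/non-zero indicator and a zero-truncated Poisson log-likelihood on the positive counts, under the assumption that both components use the linear predictors described above.

```python
import math

def logistic_loglik(beta, X, w):
    """Log-likelihood of the logistic (zero vs. non-zero) component."""
    total = 0.0
    for x_i, w_i in zip(X, w):
        eta = sum(b * x for b, x in zip(beta, x_i))
        # w * eta - log(1 + exp(eta)), written in an overflow-safe form
        total += w_i * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total

def truncated_poisson_loglik(gamma, Z, y):
    """Log-likelihood of the zero-truncated Poisson component (y > 0 only)."""
    total = 0.0
    for z_i, y_i in zip(Z, y):
        if y_i == 0:
            continue  # zero counts carry no information about gamma
        eta = sum(g * z for g, z in zip(gamma, z_i))
        lam = math.exp(eta)
        # y*log(lam) - lam - log(y!) - log(1 - exp(-lam))
        total += y_i * eta - lam - math.lgamma(y_i + 1) - math.log(-math.expm1(-lam))
    return total

def hurdle_loglik(beta, gamma, X, Z, y):
    """Sum of the two components; each depends on only one parameter vector."""
    w = [1 if y_i > 0 else 0 for y_i in y]
    return logistic_loglik(beta, X, w) + truncated_poisson_loglik(gamma, Z, y)
```

Because `beta` appears only in the first term and `gamma` only in the second, each term can be maximized on its own, which is the property the distributed algorithm relies on.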
While less common than traditional regression models for count data, hurdle models have been used successfully in various health contexts with substantial zero inflation. For instance, Negative Binomial-Logit hurdle models were utilized to estimate risk of vaccine adverse events for clinical trial participants, as well as to estimate cigarette and marijuana use among youth e-cigarette users [19,20]. Hurdle regression has also been used in other specialized contexts, such as in estimating spatiotemporal patterns of emergency department use and quantifying association between preventive dental behaviors and caries prevalence [21,22]. In contrast to zero-inflated Poisson or Negative Binomial regression models, hurdle models have only one source of zero counts, indicating that all individuals in the study sample are at risk of a particular outcome.
This is more appropriate in many clinical settings, offering improved interpretation of estimated model coefficients.

Parameter Estimation using Distributed Hurdle Regression: ODAH
Suppose we have clinical data stored in $K$ sites, where the $j$th site has a sample size $n_j$ and the total sample size across sites is $N = \sum_{j=1}^{K} n_j$.
The copyright holder for this this version posted December 19, 2020. ; https://doi.org/10.1101/2020.12.17.20248194 doi: medRxiv preprint
The surrogate log likelihoods for the two model components are constructed around $\bar{\beta}$ and $\bar{\gamma}$, the initial estimates for the algorithm. Here, $L_{11}(\beta)$ and $L_{21}(\gamma)$ are log-likelihoods computed using patient-level data at the lead site for the logistic and zero-truncated components, respectively.
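For orientation, surrogate likelihoods of this kind are commonly written in the form below. This is a sketch under assumptions, not a reproduction of the authors' display: it assumes each $L_{1j}$ and $L_{2j}$ is the component log-likelihood at site $j$ scaled by that site's sample size, that the lead site is indexed $j = 1$, and that the combined gradients are sample-size-weighted averages over the $K$ sites (for the zero-truncated component, the weights would presumably be the numbers of patients with a positive count).

```latex
\tilde{L}_1(\beta) = L_{11}(\beta)
  + \left\{ \nabla L_1(\bar{\beta}) - \nabla L_{11}(\bar{\beta}) \right\}^{\top} \beta,
\qquad
\nabla L_1(\bar{\beta}) = \frac{1}{N} \sum_{j=1}^{K} n_j \, \nabla L_{1j}(\bar{\beta}),

\tilde{L}_2(\gamma) = L_{21}(\gamma)
  + \left\{ \nabla L_2(\bar{\gamma}) - \nabla L_{21}(\bar{\gamma}) \right\}^{\top} \gamma .
```

Under this form, only the gradients evaluated at $\bar{\beta}$ and $\bar{\gamma}$ need to leave the collaborating sites; everything else is computed from lead-site data.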
Well-chosen $\bar{\beta}$ and $\bar{\gamma}$ will increase the accuracy of the resulting estimates $\tilde{\beta}$ and $\tilde{\gamma}$, respectively. In this work, $\bar{\beta}$ and $\bar{\gamma}$ are estimates computed from a fixed-effects meta-analysis using all $K$ sites, that is, inverse-variance weighted sums of estimates from the $K$ sites. For $\bar{\beta}$ (with $\bar{\gamma}$ similar),

$$\bar{\beta} = \left( \sum_{j=1}^{K} \hat{V}_j^{-1} \right)^{-1} \sum_{j=1}^{K} \hat{V}_j^{-1} \hat{\beta}_j,$$

where $\hat{\beta}_j$ and $\hat{V}_j$ denote the estimate and its estimated variance obtained by fitting the model at site $j$.
When using a meta-analysis estimate to initiate ODAH, two non-iterative rounds of communication are necessary for transferring information across sites; thus, our approach is considered a one-shot approach for performing distributed inference. ODAH requires each collaborating site to first fit the hurdle model of interest using its own data before sending parameter point and variance estimates to the lead site. After fitting its own hurdle model, a user at the lead site can then initiate ODAH by computing initial estimates using meta-analysis and sending these estimates to the collaborating sites for computing gradients. These gradients are then sent to the lead site to construct the surrogate log likelihood function. Using only gradients and patient-level data from the lead site, we obtain parameter estimates by maximizing each surrogate likelihood function with respect to the parameter of interest.
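The inverse-variance weighting used for the initial estimates can be sketched as follows. The function name `fixed_effects_meta` is hypothetical, and the sketch makes a simplifying assumption that is not stated in the text: each coefficient is pooled separately using its own variance (that is, the off-diagonal covariance terms are ignored).

```python
def fixed_effects_meta(estimates, variances):
    """Coefficient-wise inverse-variance weighted average across sites.

    estimates[j][k] and variances[j][k] hold the point estimate and its
    variance for coefficient k at site j. Returns (pooled, pooled_variance).
    """
    n_coef = len(estimates[0])
    pooled, pooled_var = [], []
    for k in range(n_coef):
        weights = [1.0 / v[k] for v in variances]  # precision of each site
        total_w = sum(weights)
        pooled.append(sum(wt * est[k] for wt, est in zip(weights, estimates)) / total_w)
        pooled_var.append(1.0 / total_w)
    return pooled, pooled_var
```

Sites with smaller variances (typically larger sites or sites with more events) receive more weight, which is why this initial estimate can be unstable when events are rare at individual sites.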
The ODAH algorithm is outlined in detail below.
Algorithm:
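The communication pattern described in prose above can be sketched for the logistic component as follows. This is an illustrative reconstruction rather than the authors' algorithm listing: all function names are hypothetical, the surrogate is assumed to be the lead-site average log-likelihood plus a linear gradient-correction term, and a plain Newton iteration is assumed for the maximization.

```python
import math

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def avg_gradient(beta, X, w):
    """Gradient of the average logistic log-likelihood at one site."""
    n, p = len(X), len(beta)
    grad = [0.0] * p
    for x_i, w_i in zip(X, w):
        resid = w_i - sigmoid(sum(b * x for b, x in zip(beta, x_i)))
        for k in range(p):
            grad[k] += resid * x_i[k] / n
    return grad

def avg_hessian(beta, X):
    """Hessian of the average logistic log-likelihood at one site."""
    n, p = len(X), len(beta)
    hess = [[0.0] * p for _ in range(p)]
    for x_i in X:
        mu = sigmoid(sum(b * x for b, x in zip(beta, x_i)))
        for k in range(p):
            for l in range(p):
                hess[k][l] -= mu * (1.0 - mu) * x_i[k] * x_i[l] / n
    return hess

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def odah_logistic(lead_X, lead_w, site_grads, site_sizes, beta_bar, iters=25):
    """Maximize the assumed surrogate log-likelihood at the lead site.

    site_grads[j] is the average gradient at beta_bar reported by site j
    (lead site included) and site_sizes[j] is that site's sample size.
    """
    N, p = sum(site_sizes), len(beta_bar)
    global_grad = [sum(n * g[k] for n, g in zip(site_sizes, site_grads)) / N
                   for k in range(p)]
    lead_grad = avg_gradient(beta_bar, lead_X, lead_w)
    shift = [global_grad[k] - lead_grad[k] for k in range(p)]  # correction term
    beta = list(beta_bar)
    for _ in range(iters):  # Newton steps on the surrogate
        g = [a + s for a, s in zip(avg_gradient(beta, lead_X, lead_w), shift)]
        step = solve(avg_hessian(beta, lead_X), g)
        beta = [b - d for b, d in zip(beta, step)]
    return beta
```

In use, each collaborating site would call `avg_gradient` on its own data at the shared initial estimate and return only that vector; the lead site then calls `odah_logistic`. One property of this construction is that if the initial estimate already equals the pooled maximum likelihood estimate, the surrogate maximizer coincides with it.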

Simulation Study
To evaluate ODAH empirically in a controlled setting, we conducted a simulation study primarily to compare the performance of ODAH to that of meta-analysis, which does not incorporate any subject-level data. We additionally compare the performance of ODAH to that obtained when using only lead site data and when using all subject-level data (pooled, the gold standard).
In our simulations, a count outcome $Y$ was associated with two risk factors, $X_1$ and $X_2$. $X_1$ was generated using a truncated Normal distribution emulating the number of primary care visits per year for each patient in the Children's Hospital of Philadelphia (CHOP) data ($X_1 \sim N(3, 2)$, $X_1 \in (0, 18)$), while $X_2$ was generated using a Bernoulli distribution with the probability of success representing that of public insurance use among CHOP patients in our sample ($X_2 \sim \text{Bernoulli}(0.33)$). Our covariate of interest was $X_2$, with $X_1$ assumed to be a confounder. $Y$ was generated from the Poisson-Logit hurdle model described above. Motivated by our rare-event real-world data applications, we primarily sought to examine how varying levels of low outcome prevalence and event rate affect the performance of ODAH relative to pooled analysis. We explored four rare-event prevalence settings while holding event rate constant at 0.03 (event rate in CHOP data): 5%, 2.5% (near CHOP data prevalence), 1%, and 0.5%. To evaluate the effect of event rate on method performance, we explored additional event rates of 0.25, 0.01, and 0.005 while holding outcome prevalence constant at 2.5%.
In all settings, we fixed the number of sites at $K = 10$ and the total population size at $N$ = 200,000. In settings where outcome prevalence or event rate vary, we set $n_1 = n_2 = \dots = n_{10}$, so all sites had the same number of observations. We also explored the effect of the lead site being larger than the other sites.
For each setting, we evaluated estimation accuracy using lead site data, meta-analysis, ODAH, and pooled data across 1,000 simulations in terms of relative bias to the pooled estimate.
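The data-generating process described above can be sketched as follows. Several details are assumptions made only for illustration: the second argument of the Normal distribution is treated as a standard deviation, the coefficient values passed in are hypothetical (the text does not list them), and the function names are not taken from the authors' code.

```python
import math
import random

def sample_truncated_normal(rng, mean, sd, low, high):
    """Rejection sampling from a Normal restricted to (low, high)."""
    while True:
        x = rng.gauss(mean, sd)
        if low < x < high:
            return x

def sample_zero_truncated_poisson(rng, lam):
    """Inverse-CDF sampling from Poisson(lam) conditioned on y >= 1."""
    u = rng.random() * (1.0 - math.exp(-lam))  # rescale to the mass above zero
    y, prob = 1, lam * math.exp(-lam)          # untruncated P(Y = 1)
    cum = prob
    while u > cum and prob > 0.0:
        y += 1
        prob *= lam / y
        cum += prob
    return y

def simulate_site(rng, n, beta, gamma):
    """Draw n patients (x1, x2, y) from a Poisson-Logit hurdle model."""
    rows = []
    for _ in range(n):
        x1 = sample_truncated_normal(rng, 3.0, 2.0, 0.0, 18.0)
        x2 = 1 if rng.random() < 0.33 else 0
        p = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x1 + beta[2] * x2)))
        if rng.random() < p:  # hurdle crossed: draw a positive count
            lam = math.exp(gamma[0] + gamma[1] * x1 + gamma[2] * x2)
            y = sample_zero_truncated_poisson(rng, lam)
        else:
            y = 0
        rows.append((x1, x2, y))
    return rows
```

For example, `simulate_site(random.Random(0), 20000, (-3.7, 0.0, 0.3), (-3.5, 0.0, 0.3))` would produce one site with a prevalence of a few percent; repeating the call for each of the ten sites gives a full simulated network.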

Real-World Data Applications
To examine the performance of ODAH in practice, we applied our algorithm to two real-world use cases featuring zero-inflated count outcomes.

Children's Hospital of Philadelphia: Pediatric Avoidable Hospitalization
The CHOP system provides care to about 400,000 children per year and includes a large, multi-state outpatient network, as well as one of the largest inpatient facilities for pediatric patients residing in the greater Philadelphia region. Data for this study were extracted from the CHOP EHR system for outpatient, emergency department, and inpatient visits for patients with at least two primary care facility visits from January 2009 to December 2017.
To mimic a scenario in which different sites do not have access to patient-level information at other sites, we assigned patients to the primary care site they attended most often during the study period and carried out analysis as if patient-level information could not be shared across primary care sites. In total, patients were assigned to 27 different primary care sites; we selected six of these sites to illustrate our method, made up of 70,818 patients (Table 2). The largest site of these six, Site 4, was chosen to be the lead site.
For this study, we sought to examine risk factors associated with pediatric avoidable hospitalization (AH); about one-third of pediatric healthcare costs are associated with hospital admissions, the majority of which are unplanned [24]. Unplanned hospitalizations associated with a diagnosis treatable at the primary care level are considered avoidable [25]. By studying which risk factors are most strongly associated with AH, hospital systems can identify patient subpopulations for which primary care should be improved, ideally leading to an overall reduction in hospital costs or admissions [26]. Because pediatric hospitalization is uncommon, integrating data across hospital systems can lead to more robust inference, increasing power to detect differences in rates of AH among patients.
To evaluate ODAH, we modeled total number of AHs given a collection of EHR variables: gender, race (Caucasian or other), mean age (across all visits), primary care visits per year, and insurance type (public or private). While the majority of patients who experience an AH in these data only experience one, 22% experience more than one, suggesting an advantage of using Poisson regression over logistic regression alone (Figure 3). This, combined with substantial zero-inflation, makes Poisson-Logit hurdle regression an appropriate method for modeling these data. The probability of at least one AH is estimated using a logistic regression model. For patients with at least one AH, the total number of hospitalizations is estimated using a zero-truncated Poisson regression model.
(Table 2 here)

[27]. Figure 4 shows the geographic locations of OneFlorida partners.
(Figure 4 here)
To apply ODAH to OneFlorida data, we identified a study population of patients with colorectal cancer (CRC) who use FOLFIRI, an FDA-approved standard-of-care first-line chemotherapy treatment in patients with metastatic CRC, as their CRC treatment. We focused on assessing drug safety in terms of the occurrences of serious adverse events (SAEs). To define an SAE, we followed the FDA definition of SAEs and the Common Terminology Criteria for Adverse Events (CTCAE) v 5.0, and the number of SAEs was counted for each patient within 180 days after first FOLFIRI prescription [28]. We removed the chronic conditions that occurred before prescription. A set of covariates and risk factors for all patients were extracted from patients' medical records for this analysis, including patients' demographic variables (age, race, Hispanic ethnicity status, and gender) on the day of CRC diagnosis. We also calculated each patient's Charlson comorbidity index (CCI) using their medical history.
Since OneFlorida data are centralized, we were able both to carry out analysis as if patient-level information could not be shared across clinical sites (as was done in our CHOP application) and to fit a hurdle regression model using pooled analysis, which served as the gold standard.
In total, our analysis included 660 patients from three clinical sites, with Site 3 being the largest and serving as the lead site (Table 3). To evaluate ODAH using these data, we modeled serious adverse event frequency given the extracted clinical information noted above for each patient.
(Table 3 here)
As in our simulations, we evaluated method performance by calculating relative bias to the pooled estimate for lead site analysis, meta-analysis, and ODAH. To estimate variance of ODAH parameter estimates, we used the inverse of the Hessian matrix produced when optimizing the surrogate log likelihood function of each hurdle model component.
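The evaluation metric can be sketched in a few lines. The exact formula is an assumption here (absolute difference from the pooled estimate, divided by the absolute pooled estimate, expressed as a percentage), and the function name `relative_bias_percent` is hypothetical.

```python
def relative_bias_percent(estimate, pooled):
    """Absolute bias relative to the pooled (gold standard) estimate, in %."""
    return abs(estimate - pooled) / abs(pooled) * 100.0

# The same absolute error of 0.05 looks very different depending on the
# size of the pooled coefficient it is compared against.
moderate_effect = relative_bias_percent(1.05, 1.00)
near_zero_effect = relative_bias_percent(0.06, 0.01)
```

Because the pooled estimate sits in the denominator, this metric is sensitive to coefficients close to zero: a small absolute discrepancy can translate into a large relative bias.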

Simulation Results
Figure 5 depicts simulation results from evaluating method performance across all scenarios described in Table 2. Across settings, there was no discernible difference in method performance for estimating $\beta_2$, the regression coefficient associated with $X_2$ in the logistic component of the hurdle model. We therefore present the simulation results for estimating $\gamma_2$, leaving $\beta_2$ estimation results for the Supplement. Because select iterations produced outlying estimates when using lead site analysis, the median bias for the lead site estimate across iterations is reported rather than the mean.
(Figure 5 here)
When lead site size and event rate were fixed at 20,000 and $\lambda = 0.03$, respectively, we varied outcome prevalence to see how each method performed relative to pooled analysis, the gold standard (Figure 5A). In all prevalence levels examined, ODAH performed nearly as well as pooled analysis, with negligible difference in the estimate's bias and variance; bias in the ODAH estimate relative to the pooled estimate was less than 0.1% for each prevalence level.
When lead site size and outcome prevalence were fixed at 20,000 and 2.5%, respectively, we varied event rate to examine its impact on estimating $\gamma_2$ in a low prevalence setting (Figure 5B). For all methods, variance of estimates decreased with increasing event rate. ODAH and meta-analysis estimates were nearly identical to pooled estimates when event rates were set to $\lambda = 0.25$ and 0.03, exhibiting negligible bias relative to the pooled estimate (ODAH bias < 0.1%, meta-analysis bias < 1.9%). When the event rate was set to $\lambda = 0.01$ and 0.005, ODAH again exhibited negligible relative bias (< 0.1%) but meta-analysis exhibited larger bias relative to the pooled estimate (4.57% and 12.7%, respectively). Lead site analysis exhibited the largest variance of all methods examined, maintaining low bias relative to the pooled estimate when $\lambda = 0.25$, 0.03 and 0.01 (< 1.1%) but larger bias when $\lambda = 0.005$ (5.31%).
When examining the effect of increasing lead site size while fixing outcome prevalence and event rate at 2.5% and $\lambda = 0.03$, respectively, there was not substantial evidence for lead site size affecting ODAH or meta-analysis performance relative to pooled analysis (Figure 5C). Variance of lead site analysis estimates decreased with increasing lead site size.

Meta-analysis estimates were more biased, with relative bias ranging from 4.15% (gender covariate) to 63.6% (primary care visits per year covariate).

Log relative risk estimates (corresponding to the Poisson component of the hurdle model) were nearly identical when using ODAH and pooled analysis. Meta-analysis performed similarly to ODAH across all coefficients, but ODAH always achieved the smaller relative bias to pooled estimates. ODAH relative bias was < 0.50% for all covariates, while meta-analysis relative bias ranged from 5.89% (PC visits per year) to 11.7% (race).

SAE (OneFlorida) Results
Results from using ODAH to model SAE frequency in colorectal cancer patients using data from OneFlorida are shown in Figure 7, displayed similarly to the CHOP AH results. In this application, we again see our method performing well in terms of relative bias to pooled estimation. For four of the five log odds ratios estimated in the logistic component of the hurdle model, relative biases produced by ODAH were less than 7%. The lone exception, the gender coefficient, reflected greater bias due to its near-zero effect size (reflecting an odds ratio of 1).
Similar results were observed in the zero-truncated Poisson component, with relative biases to the pooled estimates less than 10% for four of the five estimated log relative risks. The age coefficient had higher relative bias, again due to negligible effect size. In both components, meta-analysis tended to perform worse relative to pooled estimation. The largest difference in estimation can be seen in the coefficients reflecting association of SAE frequency with Hispanic ethnicity, where meta-analysis relative bias was 71% in the logistic component and 276% in the Poisson component (compared to 5.3% and 1.8% for ODAH, respectively).
(Figure 7 here)

Discussion
We introduced a non-iterative, privacy-preserving algorithm for performing distributed hurdle regression with zero-inflated count outcomes. As demonstrated by simulations and real-world EHR applications, our method consistently produced parameter estimates comparable to and sometimes better than those produced by meta-analysis. Our method's utility is especially evident in settings featuring a count outcome with severe zero-inflation and very low event rate, where we found that only meta-analysis tended to produce biased estimates.
There are many advantages to using our method for performing privacy-preserving data analysis. By using a form of distributed regression, our approach is generally well-suited for multi-site studies which are on-going. The surrogate likelihood method takes advantage of patient-level data still being accessible by collaborating sites, allowing collaborators to engage in limited inter-site communication to produce less biased results than would be obtained via meta-analysis, which is better suited for studies that are already completed. Further, most existing distributed regression techniques require iterative communication among sites to produce accurate estimates. ODAH requires two rounds of non-iterative communication between the lead site and all other sites before surrogate likelihood functions can be maximized to obtain accurate, precise parameter estimates. This is particularly advantageous in big data settings, where iterative procedures have a high computational burden in terms of memory and processing time. Additionally, due to the separability of hurdle model components, each component's likelihood function can be maximized independently, reducing computational complexity.
Our simulation results suggest that lead site size relative to total population size does not have a discernible effect on the performance of any method other than analysis using only data at the lead site. However, since the surrogate log likelihood function only uses individual-level data stored in the lead site, we recommend that the lead site be as large as possible; this helps ensure the surrogate likelihood is a close approximation to the complete data log likelihood.
The main limitation of ODAH is the requirement of relative homogeneity among the data to be analyzed. This is an implication of the surrogate likelihood construction, which approximates the complete data log likelihood in part by using a sample-size-weighted sum of gradients. This implicitly assumes that study data are independent and identically distributed across all sites, which is not the case in real-world settings. As evidenced by Figure 8, geographical heterogeneity among the patient population can occur in the covariates, with some locations having substantially different demographic makeups than others.
We recommend that those who implement ODAH ensure patient demographics are largely similar across institutions, or alternatively perform subgroup analysis for several relatively homogeneous subsets of institutions. Additionally, the Poisson component of our method does not currently account for over-dispersion in the outcome; in our real data application, we did not find strong evidence of dispersion. Finally, there were discrepancies when comparing simulation and data analysis results in terms of bias in the hurdle model's logistic component estimates. We suspect this is due to simulated data not fully capturing the true distribution of the CHOP data, namely covariate imbalance. For example, 52% of patients who had at least one AH used public insurance, compared to 32% of patients who did not have an AH. We seek to address these limitations in future work.
(Figure 8 here)
Work continues to be done in our group on constructing methods for performing non-iterative, privacy-preserving distributed inference. We seek to eventually cover a wide array of outcome types, namely binary, count, and time-to-event outcomes. As we continue to develop a collection of privacy-preserving algorithms, we believe ODAH is worthy of consideration when one seeks to perform distributed regression on zero-inflated count outcome data.

Funding
This work is supported in part by National Institutes of Health Grants 1R01LM012607 (ME, CL, RD and YC) and 1R01AI130460 (ME, CL, RD and YC).

None declared.
Abbreviations
EHR: electronic health record
ODAH: One-shot Distributed Algorithm for performing Hurdle regression
CHOP: Children's Hospital of Philadelphia
DDN: distributed data network
OHDSI: Observational Health Data Sciences and Informatics
PHI: protected health information
AH: avoidable hospitalization
OR: odds ratio
RR: relative risk
Figure 3. Distribution of total number of avoidable hospitalizations (AHs) for patients with at least one AH in CHOP data sample. Axis labels: Total Avoidable Hospitalizations (Zero Truncated); Proportion of Patients.
Table 3. Summary statistics describing patient population across three OneFlorida clinical sites.
Figure 6. Plots depicting results from CHOP avoidable hospitalization application. Log odds ratio (A) and log relative risk (B) estimates (along with corresponding 95% confidence intervals) for each covariate in the fitted hurdle model. Dashed horizontal line represents pooled estimate, our gold standard for comparing methods. Legend: Local, Meta, ODAH, Pooled. Panel B: 95% confidence intervals of log relative risk estimates.