A Bayesian Approach to Real-Time Monitoring and Forecasting of Chinese Foodborne Diseases

Foodborne diseases have a big impact on public health and are often underreported. This is because a lot of patients delay treatment when they suffer from foodborne diseases. In Hunan Province (China), a total of 21,226 confirmed foodborne disease cases were reported from 1 March 2015 to 28 February 2016 by the Foodborne Surveillance Database (FSD) of the China National Centre for Food Safety Risk Assessment (CFSA). The purpose of this study was to make use of the daily number of visiting patients to forecast the daily true number of patients. Our main contribution is that we take the reporting delays into consideration and propose a Bayesian hierarchical model for this forecast problem. The data shows that there were 21,226 confirmed cases reported among 21,866 visiting patients, a proportion as high as 97%. Given this observation, the Bayesian hierarchical model was established to predict the daily true number of patients using the number of visiting patients. We propose several scoring rules to assess the performance of different nowcasting procedures. We conclude that Bayesian nowcasting with consideration of right truncation of the reporting delays has a good performance for short-term forecasting, and could effectively predict the epidemic trends of foodborne diseases. Meanwhile, this approach could provide a methodological basis for future foodborne disease monitoring and control strategies, which are crucial for public health.


Introduction
Patients with foodborne diseases often have a lack of awareness about the severity, which may cause them to delay seeing a doctor. Such kinds of delay may easily cover up the outbreak of food safety incidents, which is not conducive to the timely control of the disease outbreaks. Taking the occurred but not yet reported events into account, and tracking the true number of daily cases are essential to rapidly and accurately evaluate current epidemic trends. This stimulated us to study foodborne surveillance data to provide up-to-date information on the growth of the epidemic and current trends, so that from the local health department agency to the national public health agency or institute (e.g., the Center for Disease Control and Prevention in the USA, European Food Safety Agency, the European Centre for Disease Prevention and Control, etc.) can judge whether an outbreak is ongoing, assess the impact of control measures and implement capacity planning. Previous studies on foodborne disease events have mainly focused on analyzing public surveillance data, estimating the actual incidence of foodborne disease in a country [1][2][3][4][5], and evaluating the burden of disease caused by various types of pathogens [6][7][8][9][10]. Heino et al. [11] proposed a randomized framework to identify the increasing number of foodborne disease events; Neill et al. [12] and Xiao [13,14] applied spatial statistical methods to explore the spatial aggregation of foodborne diseases. When diseases broke out in a certain area, the number of reported cases would surge correspondingly compared to the baseline data, which could be detected by anomaly pattern algorithms through numerical changes [15,16]. Guo et al. [17] designed a detection model of foodborne disease events and risk assessment by integrating the big-data of population, traffic, food production and sales and social media data under a spatio-temporal framework. The above studies ignore the reporting delay factor. However, the additional delays between onset date and reporting date in the public health surveillance database should not be ignored when processing tracking procedures. Lawless [18] firstly estimated the number of events that had occurred but not yet been reported and developed an approach to conduct robust predictions when incorporating random effects based on lately reporting data. Later, a robust algorithm to perform tracking procedures called "nowcasts" was used to correct for reporting delays [19]. Based on such concept, Hohle and Heiden [20] proposed a Bayesian nowcasting algorithm to deal with the short-term forecasting of the daily number of reported cases. Salmon et al. [21] improved the outbreak detection algorithms by taking reporting delays into account. Considering that the number of cases can increase dramatically in a matter of days in emerging food safety incidents, in this paper, we focus on how to make use of the daily number of visiting patients to forecast the daily true number of the patients, and we present how the proposed Bayesian nowcasting model could provide a more precise information of epidemic trends.

Data Material for Foodborne Diseases
Our data, collected from the Foodborne Surveillance Database (FSD) of the China National Centre for Food Safety Risk Assessment (CFSA, Beijing, China), covered confirmed cases in all cities in Hunan Province, China, spanning a year from 1 March 2015 to 28 February 2016. Table 1 shows examples of our collected data information. Each confirmed case consists of symptom onset date (the self-reported date when the patient was attacked by a foodborne disease), and the visit date (when the patients went to see a doctor). The reporting delay is divided into two phases ( Figure 1): Phase 1 is the delay between the onset date and visiting date; phase 2 is the delay between visiting date and the reporting date. In our data set, there are 21,866 patient visits and 21,226 reported confirmed cases.  Generally, the monitoring at the CFSA can be performed on time series aggregated by the date of symptom onset and the date of report arrival at the surveillance database. Specifically, till one day, we define "reported cases" as the currently available counts of patients confirmed by a doctor; and define "occurred cases" as the real number of patients attacked. The goal of nowcasting is to predict the true number of counts from the currently available counts. Figure 2 plots the trend of foodborne disease, where the blue lines indicate the daily number of "occurred cases" and the red lines indicate the trend line. The curve of "occurred cases" starts to get into a high growth phase from May, and continues to peak till the beginning of November. From the end of November, the number of cases begins to decrease. Figure 3 shows the daily counts of "reported cases" (blue bar) and "occurred cases" (red bar); on 20 July 2015, 371 reported cases of 430 occurred cases (i.e., 86.3%, from 16 July to 20 July) have information available due to the reporting delay. As more reported data comes in, the number of reported cases (blue bar) is getting closer to the number of occurred cases (red bar) in Figure 3. Generally, the monitoring at the CFSA can be performed on time series aggregated by the date of symptom onset and the date of report arrival at the surveillance database. Specifically, till one day, we define "reported cases" as the currently available counts of patients confirmed by a doctor; and define "occurred cases" as the real number of patients attacked. The goal of nowcasting is to predict the true number of counts from the currently available counts. Figure 2 plots the trend of foodborne disease, where the blue lines indicate the daily number of "occurred cases" and the red lines indicate the trend line. The curve of "occurred cases" starts to get into a high growth phase from May, and continues to peak till the beginning of November. From the end of November, the number of cases begins to decrease. Figure 3 shows the daily counts of "reported cases" (blue bar) and "occurred cases" (red bar); on 20 July 2015, 371 reported cases of 430 occurred cases (i.e., 86.3%, from 16 July to 20 July) have information available due to the reporting delay. As more reported data comes in, the number of reported cases (blue bar) is getting closer to the number of occurred cases (red bar) in Figure 3.   Figure 4 shows the histogram of observed frequency of delay on the basis of all cases, from which we see that many patients go to see a doctor on the first day when they have foodborne disease, more people have a few days' delay and seldom have delays of more than one week. To facilitate understanding, we assume the maximum delay occurs up to 5 days due to the 3 principle getting from the information in Figure 4. Note that the data will become less reliable and the information contained is not accurate when the delay becomes very large. We denote , = 0,1, … ,5, as the observed proportion of the number of patients with delay days to the total number of patients given a time span. Note that 5 includes delay with days larger than or equal to 5 days, and  Generally, the monitoring at the CFSA can be performed on time series aggregated by the date of symptom onset and the date of report arrival at the surveillance database. Specifically, till one day, we define "reported cases" as the currently available counts of patients confirmed by a doctor; and define "occurred cases" as the real number of patients attacked. The goal of nowcasting is to predict the true number of counts from the currently available counts. Figure 2 plots the trend of foodborne disease, where the blue lines indicate the daily number of "occurred cases" and the red lines indicate the trend line. The curve of "occurred cases" starts to get into a high growth phase from May, and continues to peak till the beginning of November. From the end of November, the number of cases begins to decrease. Figure 3 shows the daily counts of "reported cases" (blue bar) and "occurred cases" (red bar); on 20 July 2015, 371 reported cases of 430 occurred cases (i.e., 86.3%, from 16 July to 20 July) have information available due to the reporting delay. As more reported data comes in, the number of reported cases (blue bar) is getting closer to the number of occurred cases (red bar) in Figure 3.   Figure 4 shows the histogram of observed frequency of delay on the basis of all cases, from which we see that many patients go to see a doctor on the first day when they have foodborne disease, more people have a few days' delay and seldom have delays of more than one week. To facilitate understanding, we assume the maximum delay occurs up to 5 days due to the 3 principle getting from the information in Figure 4. Note that the data will become less reliable and the information contained is not accurate when the delay becomes very large. We denote , = 0,1, … ,5, as the observed proportion of the number of patients with delay days to the total number of patients given a time span. Note that 5 includes delay with days larger than or equal to 5 days, and  Figure 4 shows the histogram of observed frequency of delay on the basis of all cases, from which we see that many patients go to see a doctor on the first day when they have foodborne disease, more people have a few days' delay and seldom have delays of more than one week. To facilitate understanding, we assume the maximum delay occurs up to 5 days due to the 3σ principle getting from the information in Figure 4. Note that the data will become less reliable and the information contained is not accurate when the delay becomes very large. We denote p d , d = 0, 1, . . . , 5, as the observed proportion of the number of patients with delay d days to the total number of patients given a time span. Note that p 5 includes delay with days larger than or equal to 5 days, and ∑ 5 d=0 p d = 1. From a seasonal perspective of delay distribution, Figure 5 suggests that the distribution of reporting delay does not change significantly with time, which therefore motivates us to assume that the patient behavior (corresponding to delay distribution) in a given region is stable with four seasons, if nothing else intervenes (e.g., policy).

Bayesian Nowcasting
Nowcasting is defined as the prediction of the present-the very near future or the very recent past. Recently, it has been regarded as a useful tool to real-time monitor the disease surveillance data [19,20], personal pro-health outdoor activities [22], and forecast the epidemics in public health settings [23]. In this paper, we develop a Bayesian nowcasting with truncation (BNT) approach to forecast the daily total number of cases in the presence of reporting delays, which can correct the current number of cases, thus to provide more accurate daily information on current epidemic trends. Below, we introduce necessary notation and assumptions. From a seasonal perspective of delay distribution, Figure 5 suggests that the distribution of reporting delay does not change significantly with time, which therefore motivates us to assume that the patient behavior (corresponding to delay distribution) in a given region is stable with four seasons, if nothing else intervenes (e.g., policy).  From a seasonal perspective of delay distribution, Figure 5 suggests that the distribution of reporting delay does not change significantly with time, which therefore motivates us to assume that the patient behavior (corresponding to delay distribution) in a given region is stable with four seasons, if nothing else intervenes (e.g., policy).

Bayesian Nowcasting
Nowcasting is defined as the prediction of the present-the very near future or the very recent past. Recently, it has been regarded as a useful tool to real-time monitor the disease surveillance data [19,20], personal pro-health outdoor activities [22], and forecast the epidemics in public health settings [23]. In this paper, we develop a Bayesian nowcasting with truncation (BNT) approach to forecast the daily total number of cases in the presence of reporting delays, which can correct the current number of cases, thus to provide more accurate daily information on current epidemic trends. Below, we introduce necessary notation and assumptions.

Bayesian Nowcasting
Nowcasting is defined as the prediction of the present-the very near future or the very recent past. Recently, it has been regarded as a useful tool to real-time monitor the disease surveillance data [19,20], personal pro-health outdoor activities [22], and forecast the epidemics in public health settings [23]. In this paper, we develop a Bayesian nowcasting with truncation (BNT) approach to forecast the daily total number of cases in the presence of reporting delays, which can correct the current number of cases, thus to provide more accurate daily information on current epidemic trends. Below, we introduce necessary notation and assumptions.

Notation and Assumptions
We place our study in a discrete time setting where each unit represents a day. We use the notation of Lawless [18] to describe the prediction of the currently actual number of patients in the presence of delay. Let n t,d be the number of patients with foodborne disease at time t but reported with a delay of d days, which means that, these n t,d patients get foodborne disease at time t, but they arrive on hospital at time t + d. t takes values on {0, · · · , T}, T denotes the current day or "now", and d takes values on {0, · · · , D}. Typically, one can assume that the maximum delay occurs up to D days since the data will become less reliable and the information contained is not accurate when the delay time d becomes very large. In our study, reports with a delay larger than D are included in the category of delay being "D days". Note that when d > T − t, we could not know n t,d , because at time T, the patients have not gone to hospital yet. So our data is right-truncated type of data. Formally, n t,d to be observed cases reported (those who go to hospital already) until time T. Thus with the limit of maximal delay, the real number of cases occurred at time t, 0 ≤ t ≤ T, is: Note that N t is right-truncated for t larger than T − D. Our goal is to estimate the unobserved right-truncated N t . We use Figure 6 to visualize the data structure. Each row indicates the total number of cases occurred at time t, the shaded box N t − N t,T represents the cases that occurred but not yet reported. In order to facilitate model description, we divided the solution space into three parts: A t = {(t, d) : 0 ≤ t ≤ T, 0 ≤ d ≤ D} covers all observed data and unobserved data while O t = {(t, d) : 0 ≤ t ≤ T, 0 ≤ d ≤ min(T − t, D)} contains observed n O t only and U t = A t \O t contains unobserved n U t only. We draw them as the light gray trapezoid and as the darker gray triangle in Figure 6 respectively.

Notation and Assumptions
We place our study in a discrete time setting where each unit represents a day. We use the notation of Lawless [18] to describe the prediction of the currently actual number of patients in the presence of delay. Let , be the number of patients with foodborne disease at time but reported with a delay of days, which means that, these , patients get foodborne disease at time , but they arrive on hospital at time + . takes values on {0, ⋯ , }, denotes the current day or "now", and takes values on {0, ⋯ , D}. Typically, one can assume that the maximum delay occurs up to days since the data will become less reliable and the information contained is not accurate when the delay time becomes very large. In our study, reports with a delay larger than are included in the category of delay being "D days". Note that when > − , we could not know , , because at time , the patients have not gone to hospital yet. So our data is right-truncated type of data. Formally, we devote , ≜ ( , ) = ∑ , min ( − , ) =0 to be observed cases reported (those who go to hospital already) until time . Thus with the limit of maximal delay, the real number of cases occurred at time , 0 ≤ ≤ , is: , − < , e. g. , truncateddata Note that is right-truncated for larger than − . Our goal is to estimate the unobserved right-truncated . We use Figure 6 to visualize the data structure. Each row indicates the total number of cases occurred at time t, the shaded box − , represents the cases that occurred but not yet reported. In order to facilitate model description, we divided the solution space into three parts: = only. We draw them as the light gray trapezoid and as the darker gray triangle in Figure 6 respectively.

Predict the Distribution for N t
Define p d as the probability that one patient will have a delay of d days. p d 's should satisfy the following equation ∑ D d=0 p d . With reference to Kalbfleisch and Lawless's [24] and Zeger et al.'s [25] assumptions: the number of daily illness follows an underlying inhomogeneous Poisson process .t. N t ∼ Po(λ t ), t = 0, · · · , T. A reasonable data generating process is as follows: (n t,0 , n t,1 , · · · , n t,D ) | N t ∼ Multinomial(N t , (p 0 , p 1 , , · · · , p D )) where Po(·) denotes the Poisson distribution and Multinomial(·, ·) denotes the multinomial distribution.
We try to infer the unknown λ t and p d parameters for t = 0, · · · , T; d = 0, · · · , D. Once these parameters are estimated, we could predict the distribution of N t given the observed information.
Using the Bayesian method, we assume that conjugate priors and the full hierarchical Bayes model could be described as follows: where GD(α, β) is the generalized Dirichlet (GD) distribution with parameters α = (α 0 , α 1 , , · · · , α D ) , β = (β 0 , β 1 , , · · · , β D ) . We denote p (p D , p D−1 , , · · · , p 1 ) . Ga(a λ , b λ ) as the Gamma distribution with parameters a λ , b λ which are known constants as the informative prior related to the morbidity. For this hierarchical model, the marginal (prior) distribution of N t is negative binomial distribution with the mean and variance as follows: By application of Bayes theorem, we can predict the distribution of N t , t = T − D, · · · , T given N t,T from the observed counts N t,T : From the above equality, we could calculate the posterior distribution of N t | N t,T as follows: (i) The integration f (λ t ) f (N t | λ t )dλ t could be computed by directly computing the density of negative binomial distribution N t . (ii) A Monte Carlo method could be used to approximate the second factor of the above equation.
By this Monte Carlo method, we first sample p from generalized Dirichlet distribution, and then we compute the likelihood function and calculate the average of all data: For example, when we calculate f (N t = nN t,T ), the Monte Carlo approximation is: where p (s) is the sth sample from GD(α, β). Therefore, we can use the average to approximate f (N t = nN t,T ), for t = T − D, · · · , T and n = 0, 1, · · · , N max , where N max is sufficiently large.

Setup for Hyper-Parameters
We applied the Bayesian hierarchical model proposed above to the foodborne surveillance data time series of Hunan Province in China. To decide the hyper-parameter α, β in the prior of p and a λ , b λ in the prior of λ t , we use the empirical Bayes ideas. Analyzing the observed data from 1 March 2015 to 28 February 2016, we found that the mean and variance of N t from the dataset are 57 and 982 respectively. Thus solving the following equations, we get the values for hyperparameters a λ , b λ : By the delay distribution shown in Figure 4, we take the maximum delay D = 5 days. Retrospectively, we can transform the historical available data into a final reporting triangle, which clearly shows the delay information. For example, in Figure 7, each cell n t,d indicates a count number that occurred at a given time point t but reported with a delay of d while the sum over all counts in a row is the corresponding N t,T for each day t shown on the right bars, where T = "31 July 2015". Also we can work out the cumulative frequency for each delay based on the whole dataset, which provides informative prior for p. where ( ) is the th sample from ( , ). Therefore, we can use the average to approximate ( = | | , ), for = − , ⋯ , and = 0,1, ⋯ , , where is sufficiently large.

Setup for Hyper-Parameters
We applied the Bayesian hierarchical model proposed above to the foodborne surveillance data time series of Hunan Province in China. To decide the hyper-parameter , in the prior of and , in the prior of , we use the empirical Bayes ideas. Analyzing the observed data from 1 March 2015 to 28 February 2016, we found that the mean and variance of from the dataset are 57 and 982 respectively. Thus solving the following equations, we get the values for hyperparameters , : By the delay distribution shown in Figure 4, we take the maximum delay = 5 days. Retrospectively, we can transform the historical available data into a final reporting triangle, which clearly shows the delay information. For example, in Figure 7, each cell , indicates a count number that occurred at a given time point but reported with a delay of while the sum over all counts in a row is the corresponding , for each day shown on the right bars, where = "31 July 2015". Also we can work out the cumulative frequency for each delay based on the whole dataset, which provides informative prior for .

Daily Surveillance
Below we apply the proposed BNT approach to the data set of foodborne patients described above. Here we set a time span 5 days to do the nowcasting. For each current day T, we try to predict

Daily Surveillance
Below we apply the proposed BNT approach to the data set of foodborne patients described above. Here we set a time span 5 days to do the nowcasting. For each current day T, we try to predict the unobserved occurred cases N t 's for t ∈ {T − 4, T − 3, T − 2, T − 1, T}. We give an animation of the details of our nowcasting process in the Supplementary Materials File S1. To explain the animation, we show four pictures for the continuous four days from 15 to 18 July in Figure 8. These pictures give the nowcasting process for predicting the numbers N t 's of cases really occurred to the current days in July. Figure 8a takes 15 July as the current day, the numbers of reported cases until 15 July are drawn in blue, and the numbers of real occurred cases are drawn in red. Since there are 5 days for occurred cases to be reported, the accurate numbers of occurred cases for the current 5 days from 11 to 15 July may not be available as of 15 July. The goal is to predict the numbers of occurred cases for these current 5 days. In Figure 8a, the prediction of these numbers are drawn in yellow, the bold yellow bar indicates the median of the prediction distribution of occurred cases, the two short yellow bars indicate the 2.5% and 97.5% quantiles of the prediction distribution. The predicted number of occurred cases for 15 July is very close to the real number, although the interval between two bounds of the prediction distribution for 15 July is longer. The predicted number for 14 July is a little higher than the real number, although the interval is much shorter than that for 15 July. Similarly, Figure 8b-d show the predictions on 16-18 July, respectively, and it can be seen that interval for 15 July becomes shorter and shorter. From the four pictures drawn from the animation given in the Supplementary Materials File S1, we can see the predicted numbers are quite close to the real numbers of occurred cases for these four days, and the intervals all cover the real numbers of occurred cases. the unobserved occurred cases 's for ∈ { − 4, − 3, − 2, − 1, }. We give an animation of the details of our nowcasting process in the Supplementary Materials File S1. To explain the animation, we show four pictures for the continuous four days from 15 to 18 July in Figure 8. These pictures give the nowcasting process for predicting the numbers 's of cases really occurred to the current days in July. Figure 8a takes 15 July as the current day, the numbers of reported cases until 15 July are drawn in blue, and the numbers of real occurred cases are drawn in red. Since there are 5 days for occurred cases to be reported, the accurate numbers of occurred cases for the current 5 days from 11 to 15 July may not be available as of 15 July. The goal is to predict the numbers of occurred cases for these current 5 days. In Figure 8a, the prediction of these numbers are drawn in yellow, the bold yellow bar indicates the median of the prediction distribution of occurred cases, the two short yellow bars indicate the 2.5% and 97.5% quantiles of the prediction distribution. The predicted number of occurred cases for 15 July is very close to the real number, although the interval between two bounds of the prediction distribution for 15 July is longer. The predicted number for 14 July is a little higher than the real number, although the interval is much shorter than that for 15 July. Similarly, Figure 8b-d show the predictions on 16-18 July, respectively, and it can be seen that interval for 15 July becomes shorter and shorter. From the four pictures drawn from the animation given in the Supplementary Materials File S1, we can see the predicted numbers are quite close to the real numbers of occurred cases for these four days, and the intervals all cover the real numbers of occurred cases.

Evaluating the Nowcasting
In this section, three different nowcasting procedures will be compared: the Lawless [18] frequency (LF) method with the consideration of the right-truncated nature and two Bayesian procedures, one is BNT method proposed in this paper and Bayesian nowcasting with no truncation (BNnT) method by Hohle and Heiden [20] ignoring the right truncation.
Before doing comparisons, we predict the numbers of occurred cases for 5 days on each current day T. We take T from 30 June 2015 to 30 August 2015, totally 63 days and so 5 × 63 = 315 nowcastings will be calculated. Considering a specific time T (for example "now") we predict 's for ∈ { − 4, − 3, − 2, − 1, }. To evaluate the performance of different nowcasting approaches, we use three scoring rules:

Evaluating the Nowcasting
In this section, three different nowcasting procedures will be compared: the Lawless [18] frequency (LF) method with the consideration of the right-truncated nature and two Bayesian procedures, one is BNT method proposed in this paper and Bayesian nowcasting with no truncation (BNnT) method by Hohle and Heiden [20] ignoring the right truncation.
Before doing comparisons, we predict the numbers of occurred cases for 5 days on each current day T. We take T from 30 June 2015 to 30 August 2015, totally 63 days and so 5 × 63 = 315 nowcastings will be calculated. Considering a specific time T (for example "now") we predict N t 's for t ∈ {T − 4, T − 3, T − 2, T − 1, T}. To evaluate the performance of different nowcasting approaches, we use three scoring rules: (i) Logarithmic score (logS) [26]: (ii) Ranking probability score (RPS) [27,28]: where P T t is the predictive distribution for time t based on the information available at T and with N t being the number of occurred cases. f P T t (·) is the probability mass function (PMF) of the predictive distribution P T t , and where F P T t (·) denotes the cumulative distribution function (CDF) of the predictive distribution P T t . And from the data prior information, here we choose N max = 300. (iii) The proportion of times that the observed value lay outside the equal-tailed 95% predicted interval (OutCI).
Such rules allow investigating calibration and sharpness of predictive distribution. The higher probability of the forecast distribution for the actual observed value is, the better the prediction is. It means that the lower the score is, the better the performance is. Table 2 gives the mean scores obtained by averaging over the 315 nowcastings, which illustrates that BNT method proposed in this paper has an overall best performance based on these three evaluation rules compared to other existing methods (BNnT method and LF method). We consider the comparison among LF method, BNT method proposed in this paper and BNnT method, here we also predict the numbers of occurred cases for 5 days from the current day 0 to 4 days ago. Figure 9 shows the mean scores of RPS obtained by averaging for t ∈ {T − 4, T − 3, T − 2, T − 1, T} from 315 nowcastings. The mean scores are the largest for delay = 0 among all delays. When the delay increases, more information is collected, and the LF method (red line) and BNT method (blue line) have a better performance with delay ≤ 2. When delay ≥ 2 days, the BNT method appears to perform best.
days ago. Figure 9 shows the mean scores of RPS obtained by averaging for ∈ { − 4, − 3, − 2, − 1, } from 315 nowcastings. The mean scores are the largest for delay = 0 among all delays. When the delay increases, more information is collected, and the LF method (red line) and BNT method (blue line) have a better performance with delay ≤ 2 . When delay ≥ 2 days, the BNT method appears to perform best. Finally, we compare the mean scores of RPS among LF method, BNT method proposed in this paper and BNnT method with considering the time span from 1 July to 30 August. Figure 10 shows that BNT method (blue line) outperforms the LF method (red line) and BNnT method (green line). Finally, we compare the mean scores of RPS among LF method, BNT method proposed in this paper and BNnT method with considering the time span from 1 July to 30 August. Figure 10 shows that BNT method (blue line) outperforms the LF method (red line) and BNnT method (green line).

Discussion
In this paper, we introduced a Bayesian hierarchal approach to monitor and forecast in real-time Chinese foodborne disease outbreaks based on the public health surveillance data if reporting delays are present. Such delay-adjusting tracking procedures can provide daily information on the epidemic trends by predicting the daily true number of patients.
In fact, the delay could be divided into two phases. Phase 1 is the delay from the onset to the doctor visit, and this is the delay of the patients in seeking medical treatments. Since the foodborne disease is not a serious disease requiring immediate emergency, many patients are not in a hurry to go to the hospital for treatment, which may cover up the outbreak of a food safety incident. Phase 2 is the delay between the visit date and the hospital reporting date. The delay distribution in this stage is very complicated. Due to the difference for medical level of each hospital, the efficiency of the experiment, the doctor's understanding of the foodborne disease and the definition of the cases, etc., the diagnosis report will be delayed. This is the delay in phase 2. References [19][20][21] used the number of patients reported in the hospital to predict the number of occurred cases. The cycle is very long from the onset to the diagnosis reported, which couldn't predict the number of cases, or grasp the

Discussion
In this paper, we introduced a Bayesian hierarchal approach to monitor and forecast in real-time Chinese foodborne disease outbreaks based on the public health surveillance data if reporting delays are present. Such delay-adjusting tracking procedures can provide daily information on the epidemic trends by predicting the daily true number of patients.
In fact, the delay could be divided into two phases. Phase 1 is the delay from the onset to the doctor visit, and this is the delay of the patients in seeking medical treatments. Since the foodborne disease is not a serious disease requiring immediate emergency, many patients are not in a hurry to go to the hospital for treatment, which may cover up the outbreak of a food safety incident. Phase 2 is the delay between the visit date and the hospital reporting date. The delay distribution in this stage is very complicated. Due to the difference for medical level of each hospital, the efficiency of the experiment, the doctor's understanding of the foodborne disease and the definition of the cases, etc., the diagnosis report will be delayed. This is the delay in phase 2. References [19][20][21] used the number of patients reported in the hospital to predict the number of occurred cases. The cycle is very long from the onset to the diagnosis reported, which couldn't predict the number of cases, or grasp the trends of the disease in a timely way. Therefore, in view of this, our model has the following advantages: First, when the data is complete and there is hospital-confirmed report data in hand, our model uses the number of confirmed patients to predict the number of occurred cases, a procedure similar to that of [20,21]. Then, if the data is incomplete, for example, we only have the visiting data till today, we could use the number of visiting patients to predict the occurred cases, which may lead to a very small overestimation of the number of occurred cases, but least not underestimate them. However, it makes the prediction ahead of days (the delay in Phase 2). In fact, our collected data shows that there are 21,226 confirmed cases among 21,866 visiting patients, a proportion as high as 97%. Early warning of disease outbreaks could avoid long delays from patient visit to disease confirmation, which can greatly shorten the forecast period, detecting possible food safety incidents in a more timely fashion. It is very significant in this regard.
As future work, since the number of visits is used to predict the number of occurred cases that will be overestimated (at least not underestimated), we will use the empirical data to obtain the proportion of confirmed cases among the visiting patients. Under this condition, the information of the proportion can be used to predict the number of occurred cases. We will also consider applying a compound Poisson model to solve the problem of overestimation of the number of occurred cases. Meanwhile, other covariates including city, age, occupation, etc., will also be considered. How to analyze and model the foodborne disease surveillance data, and how to detect the outbreak based on historical data, even the casual inference to pathogenic factors, are all issues to be studied in the future.