Multiple Ensemble Neural Network Models with Fuzzy Response Aggregation for Predicting COVID-19 Time Series: The Case of Mexico

In this paper, a multiple ensemble neural network model with fuzzy response aggregation for the COVID-19 time series is presented. Ensemble neural networks are composed of a set of modules, which are used to produce several predictions under different conditions. The modules are simple neural networks. Fuzzy logic is then used to aggregate the responses of the predictor modules, thereby improving the final prediction by combining the outputs of the modules in an intelligent way. Fuzzy logic handles the uncertainty in the process of making a final decision about the prediction. The complete model was tested for the case of predicting the COVID-19 time series in Mexico, at the level of the states and the whole country. The simulation results of the multiple ensemble neural network models with fuzzy response integration show very good predicted values on the validation data set. In fact, the prediction errors of the multiple ensemble neural networks are significantly lower than those of traditional monolithic neural networks, demonstrating the advantages of the proposed approach.


Introduction
Recently, we have witnessed the rapid propagation of the COVID-19 Coronavirus around the world, appearing initially in China and then spreading to neighboring countries, like Thailand, Korea, and Japan, after that to Europe and America, and later to Africa. In particular, Italy, Spain, France, the United Kingdom, and Germany have been hit hard by the propagation of the COVID-19 virus, with many confirmed cases and deaths at the time of writing. After that, the virus spread to the American continent, where the United States and Canada were also hit hard. Finally, the virus arrived in Mexico, where it is now becoming a large problem, with almost 50,000 confirmed cases as of 18 May 2020.
In relation to COVID-19 prediction, we can mention the following work. In Chen et al. [1], the authors outline the prediction of the SARS-CoV-2 (2019-nCoV) 3C-like protease structure. In Fan et al. [2], the authors outline an approach for the prediction of the epidemic spread of the coronavirus, driven by the spring festival transportation in China. In Goh et al. [3], the authors discuss the rigidity of the outer shell predicted by a protein intrinsic disorder model, thereby shedding light on COVID-19 infectivity. In Grifoni et al. [4], a bioinformatics approach that can predict candidate targets for immune responses to SARS-CoV-2 was presented. In He [5], the author discusses what could still be done to control COVID-19 outbreaks in addition to the usual measures of isolation and contact tracing that most countries are imposing. In Huang et al. [6], a spatial-temporal distribution of COVID-19 in China and its prediction were described. In Ibrahim et al. [7], the authors describe the prediction of the

Nonlinear Autoregressive Neural Networks
The NAR (nonlinear autoregressive) neural network uses past values of the time series to estimate predicted future values. The NAR neural network model consists of one input layer, one or more hidden layers, and one output layer. NAR is a dynamic and recurrent network with feedback connections [16]. NAR is used in one-step-ahead or multi-step-ahead time-series forecasting. The NAR model is expressed mathematically in Equation (1):

y(t) = F(y(t - 1), y(t - 2), ..., y(t - d)), (1)

where y(t) is the value of the considered time series y at time t, d is the time delay, and F denotes the transfer function [17]. In this case, two NAR networks are used in the ensemble, one with the Levenberg-Marquardt and the other with the Bayesian regularization training algorithm. In Figure 1, the NAR neural network architecture is illustrated in more detail.
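As a concrete illustration of Equation (1), the sketch below builds the lagged design matrix for a NAR model with delay d and fits a linear F by least squares. The linear fit is a simplified stand-in for the Levenberg-Marquardt- or Bayesian-regularization-trained networks used in the paper.

```python
import numpy as np

def nar_design_matrix(y, d):
    # For each t >= d, the inputs are the d most recent past values and
    # the target is y(t): y(t) = F(y(t-1), ..., y(t-d)) as in Equation (1).
    X = np.array([y[t - d:t][::-1] for t in range(d, len(y))])
    targets = y[d:]
    return X, targets

# Toy series; a linear F fitted by least squares stands in for the network.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
X, targets = nar_design_matrix(y, d=3)
Xb = np.column_stack([X, np.ones(len(X))])      # add a bias column
coef, *_ = np.linalg.lstsq(Xb, targets, rcond=None)

# One-step-ahead forecast from the last d observed values.
last = y[-3:][::-1]                              # [y(t-1), y(t-2), y(t-3)]
y_next = float(np.append(last, 1.0) @ coef)      # continues the linear trend
```

For this perfectly linear toy series, the least-squares fit is exact, so the one-step forecast continues the trend (y_next is 9.0).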

Function Fitting Neural Network
The FITNET (function fitting neural network) is another commonly used Multi-Layer Perceptron (MLP), a class of feedforward artificial neural network (ANN), that contains one hidden layer. A feed-forward network with one hidden layer and enough neurons in the hidden layer can fit any finite input-output mapping problem. The FITNET model is trained on a set of inputs to produce an associated set of target outputs, and is used for curve fitting and regression. In Figure 2, the general architecture of an artificial neural network (ANN) is shown: X n is the input neuron, W ij and W k j are the weights, n represents the number of neurons, and Y is the neuron output [18][19][20]. The learning or training algorithm used in FITNET is the well-known Levenberg-Marquardt method, chosen because it is very fast for time-series data.

Neural networks, such as the NAR and the FITNET, and fuzzy systems are commonly used for time-series forecasting. In fact, fuzzy systems, NAR, and FITNET have been used in many areas. For example, a model was constructed for both snow-free and snowy areas to forecast monthly and daily albedo [21], wheel-wear prediction models based on NAR demonstrated being useful in predicting dynamic changes of wheel diameters [22], and FITNET was used for atomic coordinate prediction of carbon nanotubes [23]. On the other hand, neuro-fuzzy systems were used for prediction of the quality of a rubber curing process [24] and cardiovascular disease risk level prediction [25]. Here, the fuzzy integrator and the NAR and FITNET neural networks, each with one hidden layer, are used to predict 10 days ahead for 12 states in Mexico and the whole country, using the confirmed and death cases of COVID-19. The Levenberg-Marquardt backpropagation (trainlm) is used as the training algorithm, purelin as the transfer function, and three feedback delays. The number of epochs is 500, there are 10 neurons in the hidden layer, and the learning rate is 0.01. The Mexican dataset was obtained from Mexico's Government website [26].
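The 10-days-ahead forecasts described above can be produced recursively from any one-step model by feeding each prediction back as an input. The sketch below illustrates this with a hypothetical stand-in model; in the paper, the actual one-step predictors are the trained NAR and FITNET networks with three feedback delays.

```python
def forecast_ahead(model, history, d, steps):
    """Recursive multi-step forecasting: each one-step prediction is
    fed back as an input for the next step (d = number of delays)."""
    window = list(history[-d:])        # most recent d observations
    preds = []
    for _ in range(steps):
        x = window[-d:][::-1]          # [y(t-1), ..., y(t-d)]
        y_hat = model(x)
        preds.append(y_hat)
        window.append(y_hat)           # predicted value becomes an input
    return preds

# Hypothetical stand-in model: continue the local linear trend.
model = lambda x: 2.0 * x[0] - x[1]
out = forecast_ahead(model, [10.0, 12.0, 14.0], d=2, steps=10)
```

With this trend-following stand-in and history ending in 12, 14, the 10 forecasts are 16, 18, ..., 34.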

Proposed Method
In Figure 3, the main architecture of the ensemble neural network model is shown. We have a dataset of COVID-19 confirmed and death cases, which consists of 12 states in Mexico and the total data of the country. In modules 1 and 2 of the ensemble, we use the NAR neural network with different parameters, and in module 3 we use the FITNET neural network to train and learn from the given information. The mean square error (MSE) between the trained and actual data is computed with Equation (2):

MSE = (1/N) * sum_{i=1}^{N} (x_i - y_i)^2, (2)

where N is the size of the training data, x_i are the actual values, and y_i are the trained values obtained for sample i [27]. Regarding the general architecture of Figure 3, the main reasoning behind it is the following. We have one ensemble for each state in Mexico (in the figure, from 1 to N). Each ensemble has three modules, which consist of the simple neural networks (NAR and FITNET). The reason for using three modules in each ensemble is that this architecture has provided good results in previous work. Each ensemble then has its own fuzzy aggregator to produce the final prediction of the ensemble.
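The error measure of Equation (2) can be sketched directly; the function below computes the MSE between the actual values and the values produced by a trained module.

```python
import numpy as np

def mse(actual, trained):
    """Mean square error between actual values x_i and trained values y_i,
    as in Equation (2): (1/N) * sum (x_i - y_i)^2."""
    actual = np.asarray(actual, dtype=float)
    trained = np.asarray(trained, dtype=float)
    return float(np.mean((actual - trained) ** 2))

err = mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])   # (0 + 0.25 + 1.0) / 3
```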
The normalized mean square errors are used in the fuzzy integrator of Figure 4 to produce the weights w 1 , w 2 , and w 3 , and Equation (5) then combines the module predictions to obtain the total prediction PT:

PT = (w 1 p 1 + w 2 p 2 + w 3 p 3 ) / (w 1 + w 2 + w 3 ), (5)

where w 1 , w 2 , and w 3 are the weights of modules 1, 2, and 3, and p 1 , p 2 , and p 3 are the predicted values of modules 1, 2, and 3, respectively.
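The weighted combination of module predictions can be sketched as follows; note that writing it as a weighted mean (renormalizing the weights so they sum to one) is an assumption about the exact form of Equation (5).

```python
def aggregate(preds, weights):
    """Fuzzy-weighted combination of module predictions: a weighted
    mean, so the fuzzy weights need not sum to one beforehand."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, preds)) / total

# Three module predictions with the first module weighted most heavily.
pt = aggregate([100.0, 110.0, 130.0], [0.8, 0.5, 0.2])
```

Here pt = (80 + 55 + 26) / 1.5, which lands closest to the most trusted module's prediction.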
The structure of the fuzzy integrator system is shown in Figure 4, which is formed by the inputs before fuzzification, the fuzzy inference system (integrator), and the fuzzy outputs after defuzzification.
The inputs e 1 , e 2 , and e 3 consist of the normalized mean square errors (NMSEs) of the three neural networks used to predict: e 1 is the NMSE of module 1, e 2 is the NMSE of module 2, and e 3 is the NMSE of module 3. The fuzzy inference system consists of three fuzzy rules, and the three outputs are w 1 , w 2 , and w 3 , which are obtained with the weighted mean in the defuzzification process. The main idea of this fuzzy system is to model the process of assigning the weights to the predictions of the modules according to the individual errors of the modules obtained with Equation (2). So, for example, if the error of module 1 is low and the errors of the other modules are high, then we assign a high weight to module 1 and low weights to the other ones. The advantage of using a fuzzy approach with linguistic variables is that the process of assigning the weights has a level of uncertainty, which is modeled with the membership functions and fuzzy reasoning. Figure 5 illustrates the input membership functions of e 1 , e 2 , and e 3 , which are the NMSEs of the neural networks in modules 1, 2, and 3, respectively. The fuzzy values considered are low, medium, and large. The inputs e 1 , e 2 , and e 3 have been normalized in the range between 0 and 1.
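Since the integrator expects its inputs in [0, 1], one simple way to obtain them is min-max scaling of the modules' errors; the exact normalization used in the paper is not spelled out here, so the sketch below is an assumption.

```python
def normalize_errors(errors):
    """Min-max scale the modules' MSEs into [0, 1] so they can serve
    as the fuzzy integrator's inputs e1, e2, e3 (assumed scheme)."""
    lo, hi = min(errors), max(errors)
    if hi == lo:
        return [0.0 for _ in errors]   # identical errors all map to 0
    return [(e - lo) / (hi - lo) for e in errors]

e = normalize_errors([0.02, 0.05, 0.08])   # smallest error maps to 0
```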
Figure 6 illustrates the output membership functions of w 1 , w 2 , and w 3 , which are the weights assigned according to the errors e 1 , e 2 , and e 3 , respectively. The fuzzy values considered are low, medium, and high. The outputs w 1 , w 2 , and w 3 have been normalized in the range between 0 and 1.
The decision to use Gaussian membership functions was made after experimenting with Triangular, Trapezoidal, and Gaussian functions; the best results were achieved with Gaussians, both in terms of the smoothness of the output prediction and in terms of accuracy. In the paper, we only report the final design of the Gaussian membership functions that we obtained.
The fuzzy system contains three fuzzy rules, which are the following:

1. If (e 1 is small) and (e 2 is medium) and (e 3 is large), then (w 1 is high), (w 2 is medium), and (w 3 is small).
2. If (e 1 is large) and (e 2 is small) and (e 3 is medium), then (w 1 is small), (w 2 is high), and (w 3 is medium).
3. If (e 1 is medium) and (e 2 is large) and (e 3 is small), then (w 1 is medium), (w 2 is small), and (w 3 is high).
These fuzzy rules express the knowledge of how to combine predictions based on their corresponding errors. Basically, the rules assign the weights (outputs) used in performing the average based on the fuzzy values of the errors in the modules. The reason for preferring Mamdani over Sugeno modeling is that a Mamdani fuzzy model is more interpretable in terms of the fuzzy rules (completely linguistic) and is also easier to design. The advantage of using fuzzy logic here is that we are able to handle the uncertainty in making a combined prediction, which is similar to combining the opinions of three experts.
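A minimal sketch of the fuzzy integrator described above, under stated assumptions: the Gaussian membership-function parameters are illustrative (the paper tunes them manually), the output fuzzy sets are replaced by singletons as a simplification of the weighted-mean defuzzification, and AND is taken as the minimum.

```python
import math

def gaussmf(x, c, sigma):
    # Gaussian membership function: mu(x) = exp(-(x - c)^2 / (2 sigma^2))
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

# Assumed (center, spread) pairs for the input fuzzy values on [0, 1].
MF = {"small": (0.0, 0.2), "medium": (0.5, 0.2), "large": (1.0, 0.2)}
# Output singletons standing in for the low/medium/high weight sets.
OUT = {"small": 0.1, "medium": 0.5, "high": 0.9}

RULES = [
    # (antecedent labels for e1, e2, e3) -> (consequent labels for w1, w2, w3)
    (("small", "medium", "large"), ("high", "medium", "small")),
    (("large", "small", "medium"), ("small", "high", "medium")),
    (("medium", "large", "small"), ("medium", "small", "high")),
]

def integrate(e):
    """Fire the three rules (AND = min) and defuzzify each output weight
    as a firing-strength weighted mean over the rule consequents."""
    firing = [min(gaussmf(e[i], *MF[ant[i]]) for i in range(3))
              for ant, _ in RULES]
    total = sum(firing)
    weights = []
    for j in range(3):
        num = sum(f * OUT[RULES[k][1][j]] for k, f in enumerate(firing))
        weights.append(num / total)
    return weights

# Module 1 has the lowest error, so it should receive the highest weight.
w = integrate([0.05, 0.5, 0.95])
```

With these inputs, rule 1 fires strongly and the others barely at all, so the defuzzified weights come out ordered w1 > w2 > w3, matching the intent of the rules.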

Knowledge Representation of the Fuzzy System
In this Section, we show the knowledge representation of the fuzzy system with Gaussian membership functions. The membership value µ(x) is the degree to which a given input x belongs to a membership function, with 0 ≤ µ(x) ≤ 1. The Gaussian membership function is defined in Equation (6):

µ(x) = exp(-(x - c)^2 / (2σ^2)), (6)

where the center c and the spread σ are the design parameters. The particular parameter values for the membership functions were defined considering the three possible fuzzy values, which are Low, Medium, and High, and were assigned and adjusted in a manual way [28][29][30][31][32].

Simulation Results
Figure 7 shows the comparison of results for confirmed cases prediction in Mexico using the different neural network models: the two monolithic models, FITNET and NAR, versus the Modular Neural Network with Fuzzy integration (MNNF), which uses a fuzzy logic integrator. Table 1 shows a comparison of the predicted values for confirmed cases of COVID-19, 10 days ahead, for Mexico (the whole country). In Figure 8, we can note that the proposed MNNF model has lower errors. Table 2 shows the relative errors of prediction for the states and the whole country. Figure 9 shows the comparison of results for death cases prediction in Mexico using the same models. Table 3 shows a comparison of the predicted values for death cases of COVID-19, 10 days ahead, for Mexico (the whole country). Table 4 shows the relative errors of prediction of death cases for the states and the whole country. Figure 10 shows the comparison of the percentage RMSE in death cases for the different neural network models for the 12 states and the whole country, where 1 is Baja California, 2 Ciudad de Mexico, 3 Coahuila, and so on for the remaining states. Finally, we show in Figure 11 an example of the prediction in one particular state of Mexico (Sinaloa).
We are predicting 10 days ahead (data not previously seen by the model), and in Figure 11 we show the results of the proposed MNNF model compared to the NAR and FIT models. We can clearly appreciate how the proposed MNNF model follows the real data very closely, while the other models drift apart after day 5 and lose prediction value. Our explanation of this behavior is that the proposed MNNF uses fuzzy logic to aggregate the results of the modules, and in this way the uncertainty in making a prediction is managed appropriately.

Conclusions
In this paper, a new approach with multiple ensemble neural network models and fuzzy response aggregation for the COVID-19 time series was proposed. Ensemble neural networks were used to produce several predictions under different conditions. Fuzzy logic was then used to aggregate the responses of the predictor modules, thereby improving the final prediction by combining the outputs of the modules in a proper way. Fuzzy logic helps in handling the uncertainty in the process of making a final decision about the prediction. The complete model was tested for the case of predicting the COVID-19 time series in Mexico, at the level of the states and the whole country. Simulation results of the multiple ensemble neural network models with fuzzy response integration show very good predicted values on the validation data set. In fact, the prediction errors of the multiple ensemble neural networks were significantly lower than those of monolithic neural networks, clearly showing the advantages of the proposed approach. We have to say that the proposed model can be viewed as a general prediction model because it can be applied to other time periods of the COVID-19 time series. For example, in the case of Mexico, the time series currently shows an increasing trend, which is what is presented in this paper, but eventually there will be a turning point and the series will decrease; the model will not have any problem with this. Once we have new data with a decreasing trend, we will train the simple neural networks again and use the same architecture of multiple ensembles and fuzzy aggregators to produce the new predictions in a decreasing fashion.
As future work, we plan to apply the same type of model to other COVID-19 data sets from other countries. In addition, we can also consider other time-series prediction problems, like in finance or economics. Also, regarding the model, we can optimize the structure of the neural networks using meta-heuristics, and we can use type-2 fuzzy logic in the response integration, expecting that results should improve, like in related works [33,34]. Finally, we envision improving the work in this paper by using adaptive fuzzy and neural network techniques, like in [35,36], or applying the proposed models in other kinds of applications [37,38].