Marine Predators Algorithm for Forecasting Confirmed Cases of COVID-19 in Italy, USA, Iran and Korea

The current pandemic of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has received wide attention from scholars and researchers. The vast increase in infected people is a significant challenge for each country and for the international community in general. Forecasting the number of infected people (so-called confirmed cases) is a critical issue that helps in understanding the fast spread of COVID-19. Therefore, in this article, we present an improved version of the adaptive neuro-fuzzy inference system (ANFIS) model to forecast the number of infected people in four countries: Italy, Iran, Korea, and the USA. The improved version of ANFIS is based on a new nature-inspired optimizer, called the marine predators algorithm (MPA). The MPA is used to optimize the ANFIS parameters, enhancing its forecasting performance. Official datasets of the four countries are used to evaluate the proposed MPA-ANFIS. Moreover, we compare MPA-ANFIS to several previous methods to evaluate its forecasting performance. Overall, the outcomes show that MPA-ANFIS outperforms all compared methods in almost all performance measures, namely Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Squared Relative Error (RMSRE), and Coefficient of Determination ($R^2$). For instance, according to the results on the testing set, the $R^2$ of the proposed model is 96.48%, 98.59%, 98.74%, and 95.95% for Korea, Italy, Iran, and the USA, respectively. Moreover, the MAE is 60.31, 3951.94, 217.27, and 12,979 for Korea, Italy, Iran, and the USA, respectively.


Introduction
Coronaviruses are a family of viruses that include serious human pathogens. They cause gastrointestinal, hepatic, neurological, and severe respiratory diseases, and they are distributed mainly among humans, bats, mice, livestock, and wild animals [1][2][3]. The last two decades witnessed three coronavirus outbreaks, SARS-CoV, MERS-CoV, and SARS-CoV-2 (COVID-19), in 2003, 2012, and 2019, respectively. These three outbreaks have confirmed human-to-human and animal-to-animal transmission [4].
[...] two types of strategies, called Lévy and Brownian motion, which are selected by the predators for optimal foraging. Therefore, in this study, we leverage the MPA to optimize the ANFIS parameters.
In our previous study [19], we proposed an enhanced ANFIS forecasting model, called FPASSA-ANFIS, and used it to forecast the number of infected people in China. Although that model performed well, combining two metaheuristics, the salp swarm algorithm (SSA) and the flower pollination algorithm (FPA), made it somewhat complex. It also needs further improvement, especially for large-scale datasets, and its exploration ability is less effective than its exploitation ability. Therefore, this study applies a new metaheuristic called the marine predators algorithm (MPA) [32], which simulates the relation between predator and prey in the ocean using Brownian and Lévy movements.
The developed MPA-ANFIS approach begins by setting the initial values of its parameters. The historical COVID-19 data of the specified country are then split into a training set and a testing set. Next, we initialize a set of solutions, each of which encodes a configuration of the ANFIS parameters. We then compute the performance of the ANFIS model on the training set for each configuration, using the root mean squared error (RMSE) as the objective function, and determine the best configuration. The MPA operators are used to update the other solutions. Once the terminal condition is reached, the best solution is used to build the ANFIS model, and the testing set is used to assess it. The final step is forecasting the COVID-19 cases.
The primary contributions and objectives are listed as follows:
1. We propose a robust time-series model for forecasting the number of infected people (confirmed cases) of SARS-CoV-2 in several countries: Iran, Italy, Korea, and the USA.
2. We improve the performance of the ANFIS model using a novel optimization method, MPA, which has not been applied in previous studies since it was proposed only in recent months.
3. We evaluate the proposed MPA-ANFIS with official datasets and by comparing it with several previous forecasting methods.
The remaining sections of this study are arranged as follows: Section 2 presents the preliminaries of ANFIS and MPA. Section 3 presents the MPA-ANFIS method. Experiments and results are described in Section 4 and discussed in Section 5. Finally, the conclusion is presented in Section 6.

Adaptive Neuro-Fuzzy Inference System
In general, ANFIS creates a mapping between inputs and outputs by employing IF-THEN rules (also known as the Takagi-Sugeno inference model). The basic structure of ANFIS is shown in Figure 1. As shown in this figure, the inputs of Layer 1 are x and y, and the output of node i in this layer is denoted by $O_{1i}$. Equation (4) defines the output of Layer 3 in terms of $w_i$, the output of the ith node of the previous layer. The output of Layer 4 is given by Equation (5), where f is a function that combines the inputs and the parameters of the network, and the consequent parameters of node i are $r_i$, $q_i$, and $p_i$. Finally, the output of Layer 5 is given by Equation (6).
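As a rough illustration of the five layers described above, the following sketch evaluates a two-rule, first-order Takagi-Sugeno system. The Gaussian membership functions, the product used for the rule firing strengths, and all names and parameter values (`gaussian`, `anfis_forward`, `premise`, `consequent`) are assumptions made for illustration; the paper's own Equations (1)-(6) are not reproduced here.

```python
import math

def gaussian(v, center, width):
    """Gaussian membership degree of v (assumed membership shape)."""
    return math.exp(-((v - center) / width) ** 2)

def anfis_forward(x, y, premise, consequent):
    """Forward pass of a first-order Takagi-Sugeno ANFIS.

    premise:    per rule, ((cx, sx), (cy, sy)) membership parameters
    consequent: per rule, (p, q, r) so that f_i = p*x + q*y + r
    """
    # Layers 1-2: membership degrees and rule firing strengths (product).
    w = [gaussian(x, cx, sx) * gaussian(y, cy, sy)
         for (cx, sx), (cy, sy) in premise]
    # Layer 3: normalise the firing strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: weight each rule's linear consequent.
    o4 = [wb * (p * x + q * y + r)
          for wb, (p, q, r) in zip(w_bar, consequent)]
    # Layer 5: sum the weighted consequents into a single output.
    return sum(o4)

# Hypothetical parameters for two rules.
premise = [((0.0, 1.0), (0.0, 1.0)), ((1.0, 1.0), (1.0, 1.0))]
consequent = [(1.0, 1.0, 0.0), (2.0, 0.0, 1.0)]
print(anfis_forward(0.5, 0.5, premise, consequent))
```

At the input (0.5, 0.5) both rules fire equally, so the output is the plain average of the two consequents. In MPA-ANFIS, values such as those in `premise` and `consequent` would be the quantities tuned by the optimizer.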

Marine Predators Algorithm
In this section, the formulation of the marine predators algorithm [32] is introduced. Similar to other metaheuristic (MH) techniques, the MPA starts by assigning random values to a set of solutions within the search space, which is formulated as:

$$X = LB + r_1 \times (UB - LB) \qquad (7)$$

In Equation (7), LB refers to the lower boundary of the search space, UB is the upper boundary, and $r_1 \in [0, 1]$ is a random number. The MPA treats both prey and predator as search agents, since while the predator searches for the prey, the prey itself is searching for its food. Therefore, the elite (the matrix of the top predators) is updated at the end of each generation. Following [32], the elite and the prey (X) are both formulated as matrices with one row per search agent and one column per dimension of the search space.
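The initialization and the construction of the elite matrix can be sketched as follows. This is a minimal illustration that assumes scalar bounds shared by all dimensions, a minimization problem, and an elite matrix formed by replicating the fittest solution once per agent, as in the original MPA formulation [32]. The names `init_prey`, `build_elite`, and the `sphere` test objective are hypothetical.

```python
import random

def init_prey(n, dim, lb, ub, rng):
    """Draw n positions uniformly in [lb, ub]: X = LB + r1 * (UB - LB)."""
    return [[lb + rng.random() * (ub - lb) for _ in range(dim)]
            for _ in range(n)]

def build_elite(prey, fitness):
    """Replicate the fittest prey (lowest objective value) once per agent."""
    best = min(prey, key=fitness)
    return [list(best) for _ in prey]

def sphere(x):
    """Toy objective used only for this example."""
    return sum(v * v for v in x)

rng = random.Random(0)
prey = init_prey(n=5, dim=3, lb=-10.0, ub=10.0, rng=rng)
elite = build_elite(prey, sphere)
print(elite[0], sphere(elite[0]))
```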
The next step is to update the position of the prey X. This is performed in three stages that depend on the velocity ratio between prey and predator and that together emulate the entire relation between them. The details of each stage are discussed in the following subsections.

Stage 1: High-Velocity Ratio
In this stage, the predator is moving faster than the prey X in the exploration phase, and this occurs in the first third of the total number of generations (i.e., $t < \frac{1}{3} t_{\max}$). The prey is updated in this stage through a step $S_i$, where $R \in [0, 1]$ is a vector of uniform random numbers, $P = 0.5$ is a constant, and $R_B$ is a random vector that represents the Brownian motion. The operator ⊗ indicates element-wise multiplication.
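A minimal sketch of this stage is given below. It assumes the update rule of the original MPA formulation [32], in which the step is $S_i = R_B \otimes (Elite_i - R_B \otimes X_i)$ and the prey moves to $X_i + P \cdot R \otimes S_i$, with $R_B$ drawn from a standard normal distribution. The function name `stage1_update` is hypothetical.

```python
import random

P = 0.5  # constant from the text

def stage1_update(prey, elite, rng):
    """High-velocity-ratio step (Brownian exploration) for every agent."""
    new_prey = []
    for x_i, e_i in zip(prey, elite):
        row = []
        for x, e in zip(x_i, e_i):
            r_b = rng.gauss(0.0, 1.0)   # Brownian component, N(0, 1)
            r = rng.random()            # uniform component in [0, 1)
            step = r_b * (e - r_b * x)  # element of S_i
            row.append(x + P * r * step)
        new_prey.append(row)
    return new_prey

rng = random.Random(42)
prey = [[2.0, -1.0], [0.5, 3.0]]
elite = [[1.0, 1.0], [1.0, 1.0]]
print(stage1_update(prey, elite, rng))
```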

Stage 2: Unit Velocity Ratio
In this stage, the prey and predator are moving in the same area, and this movement simulates the process of searching for the prey/food. This stage also marks the transition of the MPA from exploration to exploitation, and both have the same chance of occurring. Following [32], exploration is performed by the predator, while exploitation is performed by the prey. The prey is assumed to move by Lévy flight and the predator by Brownian motion, as defined in Equations (11) and (12) for $\frac{1}{3} t_{\max} < t < \frac{2}{3} t_{\max}$, where $R_L$ represents random numbers following a Lévy distribution. Equations (11) and (12) are applied to the first half of the population, which represents the exploitation. The second half of the population is updated using CF, a parameter that controls the step size of the predator's movement, and $t_{\max}$, the total number of generations.
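The two halves of this stage can be sketched as follows. The update rules and the form $CF = (1 - t/t_{\max})^{2t/t_{\max}}$ are assumed from the original MPA formulation [32]. The Lévy numbers are generated with Mantegna's algorithm using an exponent of 1.5, which is a common choice and an assumption here. The names `levy`, `step_size_factor`, and `stage2_update` are hypothetical.

```python
import math
import random

P = 0.5     # constant from the text
BETA = 1.5  # assumed Lévy exponent

def levy(rng, beta=BETA):
    """One Lévy-distributed step via Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
             ) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)

def step_size_factor(t, t_max):
    """CF shrinks from 1 at t = 0 to 0 at t = t_max."""
    return (1 - t / t_max) ** (2 * t / t_max)

def stage2_update(prey, elite, t, t_max, rng):
    """Unit-velocity-ratio step: Lévy for the first half, Brownian for the rest."""
    cf = step_size_factor(t, t_max)
    half = len(prey) // 2
    new_prey = []
    for i, (x_i, e_i) in enumerate(zip(prey, elite)):
        row = []
        for x, e in zip(x_i, e_i):
            if i < half:
                # First half: the prey moves by Lévy flight (exploitation).
                r_l = levy(rng)
                step = r_l * (e - r_l * x)
                row.append(x + P * rng.random() * step)
            else:
                # Second half: the predator moves by Brownian motion.
                r_b = rng.gauss(0.0, 1.0)
                step = r_b * (r_b * e - x)
                row.append(e + P * cf * step)
        new_prey.append(row)
    return new_prey

rng = random.Random(7)
prey = [[2.0, -1.0], [0.5, 3.0], [4.0, 4.0], [-2.0, 0.0]]
elite = [[1.0, 1.0] for _ in prey]
print(stage2_update(prey, elite, t=50, t_max=100, rng=rng))
```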

Stage 3: Low-Velocity Ratio
This is the last stage of the optimization process, which occurs when the predator moves faster than the prey. It corresponds to the exploitation phase, when $t > \frac{2}{3} t_{\max}$.
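A minimal sketch of this stage follows, assuming the rule of the original MPA formulation [32]: $S_i = R_L \otimes (R_L \otimes Elite_i - X_i)$ and $X_i = Elite_i + P \cdot CF \otimes S_i$. To keep the example short, the Lévy sampler is passed in as a callable; `stage3_update` and `crude_heavy_tail` are hypothetical names, and the latter is only a crude heavy-tailed stand-in rather than a calibrated Lévy generator.

```python
import random

P = 0.5  # constant from the text

def stage3_update(prey, elite, cf, levy_step):
    """Low-velocity-ratio step: the predator moves by Lévy flight."""
    new_prey = []
    for x_i, e_i in zip(prey, elite):
        row = []
        for x, e in zip(x_i, e_i):
            r_l = levy_step()
            step = r_l * (r_l * e - x)  # element of S_i
            row.append(e + P * cf * step)
        new_prey.append(row)
    return new_prey

rng = random.Random(1)

def crude_heavy_tail():
    """Ratio of normals raised to a power: heavy-tailed stand-in sampler."""
    return rng.gauss(0.0, 1.0) / abs(rng.gauss(0.0, 1.0)) ** (1 / 1.5)

prey = [[2.0, -1.0], [0.5, 3.0]]
elite = [[1.0, 1.0], [1.0, 1.0]]
print(stage3_update(prey, elite, cf=0.1, levy_step=crude_heavy_tail))
```

Because the new position is centred on the elite and scaled by CF, the agents contract towards the best solution as CF decays.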

Eddy Formation and FADs' Effect
Environmental factors, such as eddy formation and fish aggregating devices (FADs), also affect the behavior of marine predators. In Equation (17), which formulates the effect of FADs, FAD = 0.2, and U is a binary solution that is obtained by generating a random solution and converting it to a binary one using the threshold 0.2. $r \in [0, 1]$ represents a random number, and $r_1$ and $r_2$ are indices of the prey.
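The FADs effect can be sketched as below. The two-branch rule is assumed from the original MPA formulation [32]: with probability FADs, an agent takes a jump scaled by CF to a random point masked by U; otherwise, it moves along the difference of two randomly chosen prey. Scalar bounds and the function name `fads_update` are assumptions made for illustration.

```python
import random

FADS = 0.2  # value given in the text

def fads_update(prey, lb, ub, cf, rng):
    """Apply the FADs effect to every agent."""
    n = len(prey)
    new_prey = []
    for x_i in prey:
        r = rng.random()
        if r <= FADS:
            # Jump towards a random point; U masks individual dimensions.
            row = []
            for x in x_i:
                u = 0.0 if rng.random() < FADS else 1.0
                row.append(x + cf * (lb + rng.random() * (ub - lb)) * u)
        else:
            # Move along the difference of two randomly indexed prey.
            r1, r2 = rng.randrange(n), rng.randrange(n)
            scale = FADS * (1 - r) + r
            row = [x + scale * (a - b)
                   for x, a, b in zip(x_i, prey[r1], prey[r2])]
        new_prey.append(row)
    return new_prey

rng = random.Random(3)
prey = [[2.0, -1.0], [0.5, 3.0], [4.0, 4.0]]
print(fads_update(prey, lb=-10.0, ub=10.0, cf=0.5, rng=rng))
```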

Marine Memory
Following [32], the marine predator has a memory that remembers the good position that it has reached. In general, the fitness value of each solution is compared with the previous fitness value, and the best one is saved in memory. The pseudo-code of MPA is presented below.
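The memory-saving step alone can be sketched as follows. This illustrates only that single step and assumes a minimization problem, so a lower fitness value is better; the function name `apply_memory` is hypothetical.

```python
def apply_memory(old_prey, old_fit, new_prey, new_fit):
    """Keep, for each agent, whichever of its old or new position is fitter."""
    kept_prey, kept_fit = [], []
    for x_old, f_old, x_new, f_new in zip(old_prey, old_fit, new_prey, new_fit):
        if f_new < f_old:
            kept_prey.append(x_new)
            kept_fit.append(f_new)
        else:
            kept_prey.append(x_old)
            kept_fit.append(f_old)
    return kept_prey, kept_fit

old_prey, old_fit = [[1.0], [2.0]], [1.0, 4.0]
new_prey, new_fit = [[3.0], [0.5]], [9.0, 0.25]
print(apply_memory(old_prey, old_fit, new_prey, new_fit))
```

In this example the first agent keeps its old position, while the second adopts the new one.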

The Proposed Method
This section introduces the proposed method, called MPA-ANFIS. The goal of MPA-ANFIS is to forecast the number of COVID-19 cases in four countries, namely Italy, the USA, Iran, and Korea.
The proposed method improves ANFIS by optimizing its parameters. The ANFIS model was selected because it is widely used in forecasting tasks and can work effectively with uncertainty, fuzziness, and ambiguity in the problem. MPA is a new optimization algorithm that shows good performance in selecting the best ANFIS parameters compared to other methods.
MPA-ANFIS is constructed using the five layers of the ANFIS model, where Layer 1 receives the input data and Layer 5 produces the results. The main goal of MPA is to optimize the ANFIS weights that lie between Layers 4 and 5. This optimization takes place in the training phase.
MPA-ANFIS receives the number of confirmed cases and their dates, and the input data are then arranged in a time-series format. Because the data differ across the four countries, the autocorrelation function (ACF) is applied to perform this step: it searches for patterns in the data and helps select the most suitable number of lags. Lags with an autocorrelation greater than 0.2 are recommended; therefore, in this study, 6 lags were selected for the USA dataset, 5 lags for both the Korean and Iranian datasets, and 7 lags for the Italian dataset. With these settings, the input data were formed.
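The lag selection and the construction of the time-series inputs can be sketched as follows. This is an illustrative reading of the step above that assumes the standard sample autocorrelation and a sliding window of previous values as inputs; the function names and the toy series are hypothetical and are not the official case counts.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return cov / var

def significant_lags(series, max_lag, threshold=0.2):
    """Lags whose autocorrelation exceeds the threshold (0.2 in the text)."""
    return [k for k in range(1, max_lag + 1)
            if autocorrelation(series, k) > threshold]

def make_lagged(series, n_lags):
    """Turn a series into (inputs, targets), each input holding n_lags values."""
    inputs = [series[t - n_lags:t] for t in range(n_lags, len(series))]
    targets = [series[t] for t in range(n_lags, len(series))]
    return inputs, targets

series = [float(v) for v in range(1, 11)]  # toy data
print(significant_lags(series, max_lag=5))
inputs, targets = make_lagged(series, n_lags=3)
print(inputs[0], targets[0])
```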
The entire dataset was divided into two groups. The first group (i.e., training set) contained 75% of the data, while the rest was used as a testing set. ANFIS applies the fuzzy c-means method, and the cluster number was set to seven.
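The 75%/25% division can be sketched as below. The sketch assumes that the split is chronological, so the earliest samples form the training set; the text does not state this explicitly, and the function name `chronological_split` is hypothetical.

```python
def chronological_split(samples, train_fraction=0.75):
    """Split ordered samples into an early training part and a late testing part."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]

days = list(range(77))  # one entry per day of the 77-day record
train, test = chronological_split(days)
print(len(train), len(test))
```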
To evaluate the quality of the candidate parameters, the mean squared error (MSE) was applied (Equation (18)). The MSE computes the error between the target data, g, and the output produced by the model, d. The size of the population is defined by the variable $N_a$. As an optimization method, MPA-ANFIS starts by creating a population (X) of candidate solutions. After that, the objective function is applied to evaluate the solutions individually. In each iteration, the value of the MSE is checked, and the solution with the lowest MSE is saved as the best solution. MPA-ANFIS repeats these steps until the stop criterion is met, and the best ANFIS parameters are passed to the testing stage, where the optimized ANFIS model computes the final results.
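The evaluation of candidate solutions can be sketched as follows. The MSE is computed in its standard form, the mean of the squared differences between targets and outputs. The linear `toy_model` is a hypothetical stand-in for the ANFIS network, and `select_best` is a hypothetical helper that keeps the candidate with the lowest MSE.

```python
def mse(targets, outputs):
    """Mean squared error between target values g and model outputs d."""
    return sum((g - d) ** 2 for g, d in zip(targets, outputs)) / len(targets)

def select_best(population, model, inputs, targets):
    """Evaluate every candidate and return the one with the lowest MSE."""
    best_solution, best_error = None, float("inf")
    for solution in population:
        outputs = [model(solution, x) for x in inputs]
        error = mse(targets, outputs)
        if error < best_error:
            best_solution, best_error = solution, error
    return best_solution, best_error

def toy_model(params, x):
    """Stand-in for ANFIS: a line with slope a and intercept b."""
    a, b = params
    return a * x + b

inputs = [0.0, 1.0, 2.0, 3.0]
targets = [1.0, 3.0, 5.0, 7.0]
population = [(1.0, 0.0), (2.0, 1.0), (3.0, -1.0)]
print(select_best(population, toy_model, inputs, targets))
```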
MPA-ANFIS was evaluated using well-known performance measures, namely root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), root mean squared relative error (RMSRE), and coefficient of determination ($R^2$). The MPA-ANFIS stages are illustrated in Figure 2.

Data
We used the datasets of reported cases of COVID-19 in four countries. They were obtained from the website of the World Health Organization (WHO) [5]. The datasets included the daily confirmed cases in four countries, the USA, Korea, Iran, and Italy. The total days for each country equaled 77 days, from 22 January 2020 to 7 April 2020. Seventy-five percent of the dataset was applied to train the proposed method, and the rest was applied in the testing phase.

Performance Measures and Parameter Setting
In this study, a set of metrics was used to assess the MPA-ANFIS approach and the other models. These metrics are defined in Table 1.
In Table 1, $N_s$, $Y_p$, and $Y$ are the number of samples, the original COVID-19 dataset, and its prediction, respectively, and $\bar{Y}$ is the average of $Y$. The best model is the one with the smallest values of the error metrics and the highest value of $R^2$.
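The five measures can be computed as in the sketch below. It uses the standard textbook definitions, which are assumed to match Table 1; MAPE is returned as a percentage and $R^2$ as a fraction, and the actual values must be non-zero for the relative errors. The function name `forecast_metrics` and the sample numbers are hypothetical.

```python
import math

def forecast_metrics(actual, predicted):
    """Return RMSE, MAE, MAPE (%), RMSRE, and R2 for two equal-length series."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    relative = [e / a for e, a in zip(errors, actual)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(r) for r in relative) / n,
        "RMSRE": math.sqrt(sum(r * r for r in relative) / n),
        "R2": 1.0 - ss_res / ss_tot,
    }

actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
print(forecast_metrics(actual, predicted))
```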

In addition, Table 2 shows the parameter values of all compared algorithms, including the original adaptive neuro-fuzzy inference system (ANFIS) and ANFIS enhanced with the genetic algorithm (GA), particle swarm optimizer (PSO), artificial bee colony (ABC), the hybrid of the flower pollination algorithm and salp swarm algorithm (FPASSA), and the sine-cosine algorithm (SCA). Some general parameters were shared by all tested algorithms: the number of solutions was set to 25, the total number of generations was 100, and each algorithm was run 30 times independently [26,27,33,34].

Results
The forecasting results of the MPA-ANFIS model and the other models on the testing set are given in Tables 3-6, where the best results are shown in bold. For the USA dataset, MPA-ANFIS predicted numbers of confirmed COVID-19 cases closest to the target numbers, since it had the smallest RMSE, MAE, MAPE, and RMSRE, as well as the highest $R^2$. The ranking of the other models varied with the performance measure.
For the Iranian dataset, MPA-ANFIS provided better results than the other models on all measures except the RMSE, for which it ranked second after GA. PSO, GA, and MPA had nearly the same performance, with MPA ranked first overall. The other three models (i.e., ABC, SCA, and FPASSA) also behaved similarly to one another for Iran, except that SCA had the lowest $R^2$, nearly 20%.
For Italy, MPA had the best results in terms of RMSE, MAE, MAPE, and $R^2$. However, in terms of RMSRE, GA provided the smallest value, followed by PSO and MPA in the second and third ranks, respectively.
Finally, for Korea's COVID-19 dataset, the GA-based ANFIS was the best algorithm in terms of MAE, MAPE, and $R^2$, whereas the developed MPA-ANFIS was better than the others in terms of RMSE and RMSRE.
Figures 3-6 depict the original COVID-19 data and the forecasts for each country. Figure 3 shows that the prediction methods commonly forecast continued growth of COVID-19 in the USA; therefore, the USA government needs to implement stricter policies to reduce the infection. For Iran (Figure 4), all methods forecast continued growth except SCA, which predicted that the number of cases would go down and become nearly stable after several days; however, we disregarded the SCA results because its $R^2$ was poor and its RMSE was very high. Therefore, our recommendation for Iran was to put more restrictions on people's movement and maintain social distancing, since this is one of the greatest problems facing Middle-Eastern countries. For Italy (Figure 6), the best algorithms, MPA, PSO, GA, and FPASSA, forecast exponential growth of COVID-19. For Korea (Figure 5), the situation had already become stable.

Discussion
In this paper, we proposed a modified ANFIS model using a new optimization algorithm, called MPA, to forecast the number of confirmed cases of COVID-19 in four different countries, Italy, Iran, Korea, and the USA.
By analyzing the relation of confirmed cases (RCC) between the confirmed cases and the four countries' areas, we noted a positive relation in all countries. Italy, whose area (301,339 km²) is the second smallest of the four, had the highest RCC, 10.29%, whereas the USA, with the largest area (9,525,067 km²), had the smallest RCC, 0.44%. The RCC of Korea (100,210 km²) was 4.25%, and the RCC of Iran (1,648,195 km²) was 1.13%.
From the analysis of forecasting confirmed COVID-19 cases for the four countries, it could be observed that the confirmed cases rate increased between 2% and 42% in Italy and between 8% and 40% in the USA, whereas, in Iran and Korea, it increased between 3% and 13% and 0.5% and 3%, respectively.
In this study, we proposed an alternative COVID-19 forecasting model that improves the quality of the ANFIS model using MPA. The proposed model was applied to COVID-19 datasets from four countries. The main aim of using those datasets was to test the ability of MPA-ANFIS to work with data collected from different countries, each of which has its own dynamics and internal conditions.
The results of the improved ANFIS using MPA suggest that the COVID-19 curves for the USA, Iran, and Italy had an exponential form, while for Korea the number of cases increased only slightly after 13 March. From the previous analysis, it can be concluded that the developed MPA-ANFIS model provided better results than the other models on most measures over the tested datasets. However, the proposed MPA-ANFIS has some limitations; for example, its computational time appeared to be higher than that of other models in some cases. In addition, the structure of ANFIS needs improvement to avoid over-fitting, which occurs when the model is trained on the training set but cannot provide the optimal response when the testing set is applied to the learned model. Furthermore, the traditional MPA still needs improvement, since an analysis of its behavior showed that its exploitation ability is weaker than its exploration ability.
For more improvement and investigation, the mobility and transportation data between countries and within a country need to be addressed in future work, which may reveal the real reason for this terrifying spread of COVID-19. However, access to these records requires more time.

Conclusions
With the rapid worldwide spread of SARS-CoV-2 (COVID-19), it is very important to forecast the number of infected people (confirmed cases) to help governments and organizations do the necessary planning to face this severe pandemic. To this end, this study proposed an efficient forecasting model using an enhanced ANFIS model. The MPA was used to optimize the ANFIS parameters. The proposed MPA-ANFIS was used to forecast the number of infected people in four different countries, namely Italy, Iran, Korea, and the USA, using the historical records of these countries, which have been updated daily since the beginning of 2020. The proposed MPA-ANFIS was evaluated by comparing it to some existing forecasting models. The outcomes showed that MPA-ANFIS could forecast the number of cases based on the time-series data. Over all the experiments, MPA-ANFIS outperformed the compared models on several measures, such as MAPE, RMSRE, MAE, $R^2$, and RMSE.
In future work, the forecasting of the number of confirmed cases of COVID-19 can be improved using the mobility and transportation data of each country, which may explain the rapid rise and spread of COVID-19.