Covid-19 Outbreak Progression in Italian Regions: Approaching the Peak by the End of March in Northern Italy and First Week of April in Southern Italy.

Epidemiological figures of the SARS-CoV-2 epidemic in Italy are higher than those observed in China. Our objective was to model the SARS-CoV-2 outbreak progression in Italian regions vs. Lombardy to assess the epidemic's progression. Our setting was Italy, and especially Lombardy, which is experiencing a heavy burden of SARS-CoV-2 infections. The peak of new daily cases was reached on March 29th, while it was delayed in Central and Southern Italian regions compared with Northern ones. In our models, we estimated the basic reproduction number (R0), which represents the average number of people that can be infected by a person who has already acquired the infection, both by fitting the exponential growth rate of the infection across a 1-month period and by using day-by-day assessments based on single observations. We used the susceptible-exposed-infected-removed (SEIR) compartment model to predict the spread of the pandemic in Italy. The two methods yield consistent values, although the first method, based on an exponential fit, should provide a better estimate, being computed on the entire time series. Taking into account the growth rate of the infection across a 1-month period, each infected person in Lombardy has infected 4 other people (3.6 based on data of April 23rd), compared with R0 = 2.68 reported in the Chinese city of Wuhan. According to our model, Piedmont, Veneto, Emilia Romagna, Tuscany, and Marche will reach an R0 value of up to 3.5. R0 was 3.11 for Lazio and 3.14 for Campania, the latter being the highest value among the Southern Italian regions, followed by Apulia (3.11), Sicily (2.99), Abruzzo (3.0), Calabria (2.84), Basilicata (2.66), and Molise (2.6). R0 has decreased in Lombardy and the Northern regions, while it has increased in Central and Southern regions.
The expected peak of the SEIR model is set at the end of March at the national level, with Southern Italian regions reaching the peak in the first days of April. Regarding the strengths and limitations of this study, our model is based on assumptions that might not exactly correspond to the evolution of the epidemic. What we know about the SARS-CoV-2 epidemic is based on Chinese data that seem to differ from the Italian data; Lombardy is experiencing an evolution of the epidemic that seems unique within Italy and Europe, probably due to demographic and environmental factors.


Epidemiological Figures
According to the Italian National Institute of Health (ISS), by March 29th more than 101,000 people in Italy had tested positive for SARS-CoV-2 since the beginning of the epidemic (75,500 currently positive and 14,600 healed) [1]. About 58% of cases are male (median age: 62 years). Detailed epidemiological figures provided by the ISS show that men represent the majority of cases between 50 and 89 years of age (range 55%-66%), while in the younger age groups males and females are equally represented among people who tested positive for SARS-CoV-2. Men also represent the vast majority of deceased people in all age groups from 30 to 89 years (range 66%-82%) [2].
Regional and Campania (n = 126), all other regions of Central and Southern Italy currently have fewer than 100 patients admitted to the ICUs of their regional healthcare systems [1].
Based on these figures, the SARS-CoV-2 outbreak is clearly putting overwhelming pressure mainly on Lombardy and the Northern regions of the Po Valley (Padana Plain), but the peak of the epidemic has not yet been reached. Until now, Southern regions seemed to be less affected by SARS-CoV-2, although a large number of people, mainly students attending universities in Northern Italy, returned from the Po Valley to their families in the South in the middle of the outbreak, representing a potential factor able to accelerate the spread of the viral infection.
Here we present an attempt to predict the peak of the outbreak in Italy, which is expected at the national level by the end of March, and the different progression of the epidemic in Southern Italian regions compared with Lombardy.

Modeling the Covid-19 Outbreak Progression in Southern Italian Regions vs. Lombardy
The basic reproduction number (R0) is the average number of people that can be infected by a person who has already acquired the infection. R0 is a metric of how contagious a disease is, and its correct estimation is extremely important for epidemiologists, especially when facing new diseases such as SARS-CoV-2. R0 can be computed in different ways. In our models, we estimated R0 both by fitting the exponential growth rate of the infection across a 1-month period and by using a day-by-day assessment based on single observations [3]. This study uses the susceptible-exposed-infected-removed (SEIR) compartment model [4] to predict the spread of the pandemic in Italy. Our efforts could support the adoption of preventive measures and the study of the epidemic's progression across Southern regions as opposed to the national trend. The SEIR prediction, however, can be biased by the estimate of the basic reproductive number R0. It must also be noted that R0 appears to be correlated with weather conditions, the reproductive index decreasing as air temperature and relative humidity increase, according to the relation reported in [5]. This means that the transmission of SARS-CoV-2 could decrease with the warmer season, and that some specific features of the outbreak in Lombardy and the Po Valley might be explained by taking climatic variables into account.

Modeling the Basic Reproductive Number R0
Exponential Framework Estimation
The exponential estimation is based on the work of Wu et al. [6], which in turn builds on that of Zhao et al. [7], where the epidemic curve follows exponential growth. As of the date of this study (23 April 2020), epidemic growth was still nearly exponential, and the fitted model had many inlier data points.
The method is based on a non-linear least-squares framework for estimating the intrinsic growth rate γ, from which $R_0 = 1/M(-\gamma)$, where M is the moment-generating function of the probability distribution of the serial interval T_g of the infection (so that M(−γ) is the Laplace transform of that distribution evaluated at γ). R0 is estimated assuming 100% susceptibility to SARS-CoV-2, as in the early stage in Wuhan [8]. Figure 2 shows the R0 estimates computed for the Southern Italian regions and for the initial outbreak region (Lombardy). According to our model, each infected person in Lombardy has infected about 4 people (3.6 based on data of April 23rd). R0 decreases to 3.14 for Campania, which shows the highest value among the Southern Italian regions, followed by Apulia (3.11), Sicily (2.99), Abruzzo (3.0), Calabria (2.84), Basilicata (2.66), and Molise (2.6).
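As a rough illustration of the exponential framework, the sketch below fits y(t) ≈ a·e^(γt) to a case series by non-linear least squares and converts the growth rate into R0 = 1/M(−γ). It assumes, hypothetically, a gamma-distributed serial interval with mean 7.5 days (T_g in the text) and standard deviation 3.4 days; for a gamma distribution with shape k and scale θ, M(−γ) = (1 + γθ)^(−k), so R0 = (1 + γθ)^k. The amplitude a is solved in closed form for each candidate γ and γ is then found by golden-section search, a simple stand-in for the optimizer used in [6] and [7]; the function names are illustrative.

```python
import math


def profile_sse(rate, t, y):
    """SSE of y ~ a*exp(rate*t), with the amplitude a solved in closed form."""
    e = [math.exp(rate * ti) for ti in t]
    a = sum(yi * ei for yi, ei in zip(y, e)) / sum(ei * ei for ei in e)
    return sum((yi - a * ei) ** 2 for yi, ei in zip(y, e))


def fit_growth_rate(t, y, lo=-0.5, hi=1.0, iters=200):
    """Non-linear least-squares growth rate via golden-section search.

    Assumes the profiled SSE has a single minimum inside [lo, hi].
    """
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = profile_sse(c, t, y), profile_sse(d, t, y)
    for _ in range(iters):
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = profile_sse(c, t, y)
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = profile_sse(d, t, y)
    return (a + b) / 2.0


def r0_gamma_serial_interval(rate, mean=7.5, sd=3.4):
    """R0 = 1 / M(-rate) for a gamma serial interval.

    mean=7.5 days comes from the text; sd=3.4 days is an assumption.
    For gamma(shape k, scale theta), M(-rate) = (1 + rate*theta) ** (-k).
    """
    k = (mean / sd) ** 2
    theta = sd ** 2 / mean
    return (1.0 + rate * theta) ** k
```

On synthetic data growing at γ = 0.2 per day, the fit recovers γ ≈ 0.2 and, under these serial-interval assumptions, gives R0 ≈ 3.7; with real data the result is sensitive to the assumed mean and spread of T_g.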

Daily Basis Estimation of the Reproductive Number
R0 is an average value, but it can also be computed day by day to monitor the transmission of the infection. Being an average, it can be skewed by super-spreading events, in which an infected individual infects an unexpectedly large number of people. In Italy, such an event can be generated not only by an individual but also by the perturbation of a susceptible population, as occurred in Apulia and Sicily with the uncontrolled arrival of a large group of people from areas experiencing an outbreak. Super-spreading events are not necessarily a bad sign, as they can indicate that fewer people are perpetuating the epidemic. Super-spreaders may also be easier to identify and contain, since their symptoms are likely to be more severe. In short, R0 is a moving target. Tracking every case and every transmission of a disease is extremely difficult, so the estimation of R0 is a complex and challenging issue, and estimates often change as new data become available.
If we define Y(t) as the number of infected people with symptoms at time t, the exponential growth rate is $\lambda = \ln(Y(t))/t$.
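Under the assumption of pure exponential growth from Y(0) cases at t = 0, the growth rate follows in one step:

$$Y(t) = Y(0)\,e^{\lambda t} \;\Longrightarrow\; \ln Y(t) = \ln Y(0) + \lambda t \;\Longrightarrow\; \lambda = \frac{\ln Y(t) - \ln Y(0)}{t},$$

which reduces to $\lambda = \ln(Y(t))/t$ when the series starts from a single case, Y(0) = 1. For example, Y(10) = e^2 ≈ 7.39 cases gives λ = 0.2 per day.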
Let us consider T_g = 7.5 days as the generation time (i.e., the serial interval) and T_l = 5.2 days as the latent or incubation time (values taken from [6]). The infectious time is T_i = T_g − T_l, and the ratio of the exposed period to the generation time is ρ = T_l/T_g. The basic reproductive number can then be approximated as:

$$R_0 = 1 + \lambda T_g + \rho(1-\rho)(\lambda T_g)^2$$

In order to estimate R0, it is first necessary to find λ and hence the number of infected people Y(t), which combines confirmed and suspected cases; here "suspect" corresponds to the number of individuals screened with the test who have been confirmed. Figure 3 and Table 1 show the R0 values estimated on a daily basis for the Italian regions and for the initial outbreak in Lombardy, where about 40 cases were confirmed out of 100 suspects (Figure 4). The two methods yield consistent values, although the first method, based on an exponential fit, should provide a better estimate, having been computed on the entire time series. Figure 3 also allows a comparison with the Wuhan value of R0 = 2.68 reported in [6].
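A minimal sketch of the daily estimate, using T_g = 7.5 and T_l = 5.2 days from the text and the approximation R0 = 1 + λT_g + ρ(1 − ρ)(λT_g)^2 with ρ = T_l/T_g. The growth rate is taken as λ = ln(Y(t))/t, which assumes the cumulative series is indexed from t = 0 with Y(0) = 1; the function names are illustrative.

```python
import math

T_G = 7.5  # generation time (serial interval) in days, from the text
T_L = 5.2  # latent (incubation) time in days, from the text


def growth_rate(y_t, t):
    """lambda = ln(Y(t)) / t, assuming the series starts from Y(0) = 1."""
    return math.log(y_t) / t


def r0_from_growth_rate(lam, tg=T_G, tl=T_L):
    """R0 = 1 + lam*Tg + rho*(1 - rho)*(lam*Tg)**2 with rho = Tl/Tg."""
    rho = tl / tg
    x = lam * tg
    return 1.0 + x + rho * (1.0 - rho) * x * x


def daily_r0(cumulative_cases):
    """Day-by-day R0 for days t >= 1 of a cumulative series indexed from t = 0."""
    return [r0_from_growth_rate(growth_rate(y, t))
            for t, y in enumerate(cumulative_cases)
            if t >= 1 and y > 0]
```

For λ = 0.2 per day, λT_g = 1.5 and ρ(1 − ρ) ≈ 0.2126, so R0 = 1 + 1.5 + 0.4784 = 2.9784.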

Modeling Transmission in Italy
We used the susceptible-exposed-infectious-recovered (SEIR) model [4,7] to simulate the epidemic since its onset in January 2020. It extends the SIR model, which has three compartments, by adding the exposed compartment E to account for the incubation period of the infection, as shown in Figure 5. These compartments are modeled over time and capture the changes in the population. Given the total population N, then N = S + E + I + R, where: "S" (susceptible) is the portion of the population without vaccine coverage or immunity; "E" (exposed) is the portion of the population that has been infected but is still in the incubation period and does not infect other individuals; "I" (infectious) is the portion of N that is infectious and may infect others, resulting in either death or recovery; "R" (recovered) is the number of infectious people who have healed and become immune. This model captures the dynamics of these compartments over time by four ordinary differential equations. One of the most important aspects of these ordinary differential equations is equilibrium, which is found by setting their time derivatives to 0. The two equilibria are the disease-free equilibrium (DFE) and the endemic equilibrium (EE).
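A short worked check of the stability threshold, assuming the transfer rates used by this model (contact rate β = R0/T_i, α = 1/T_l, γ = 1/T_i) and its infected-compartment equations dE/dt = βSI/N − αE and dI/dt = αE − γI. Linearizing around the disease-free equilibrium (S, E, I, R) = (N, 0, 0, 0), so that S/N ≈ 1:

$$\frac{d}{dt}\begin{pmatrix}E\\ I\end{pmatrix} \approx \begin{pmatrix}-\alpha & \beta\\ \alpha & -\gamma\end{pmatrix}\begin{pmatrix}E\\ I\end{pmatrix}, \qquad \operatorname{tr} = -(\alpha+\gamma) < 0, \qquad \det = \alpha(\gamma-\beta).$$

Both eigenvalues have negative real parts exactly when the determinant is positive, i.e. when

$$\frac{\beta}{\gamma} = \frac{R_0/T_i}{1/T_i} = R_0 < 1,$$

while for R0 > 1 the determinant is negative, one eigenvalue is positive, and the DFE is unstable.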
Besides equilibrium, stability is related to the basic reproductive number: the DFE is stable if R0 < 1, while for R0 > 1 the DFE is unstable and the EE is stable. The four equations are:

$$\frac{dS}{dt} = -\frac{\beta S I}{N}, \qquad \frac{dE}{dt} = \frac{\beta S I}{N} - \alpha E, \qquad \frac{dI}{dt} = \alpha E - \gamma I, \qquad \frac{dR}{dt} = \gamma I,$$

where β = R0/T_i, α = 1/T_l, and γ = 1/T_i, with T_i and T_l, as defined above, being the infectious and incubation periods, respectively. The contact rate β is the rate at which an infected individual transmits the infection to susceptible contacts per unit time step dt. The number of individuals transferred from the susceptible to the exposed state in a time step Δt is β·S·I/N·Δt. The force of infection is βI(t)/N, i.e., the per-capita rate at which susceptible individuals acquire the infection. In the same time step, αE(t)Δt cases are transferred from the exposed to the infectious compartment, and γI(t)Δt cases from the infectious compartment to the "removed" one. We assumed a closed population, i.e., a fixed population with no births, no deaths, and no introduction of new individuals. From the above ODE system, d/dt[S(t) + E(t) + I(t) + R(t)] = 0, so the population N is constant at any time: S(t) + E(t) + I(t) + R(t) = N for any t ≥ 0. Individuals in the exposed state are infected but not yet infectious. The population is assumed to be well mixed, and the latent and infectious times of the pathogen are assumed to be exponentially distributed. In this letter, the contact rate β varies over time, as observed for SARS-CoV-2: it increased in the early stages because of public unawareness of the disease and then decreased with government control measures. The contact rate follows a logistic trend, estimated day by day [9], in which t is the number of days after January 31st (when the first cases were found in Italy), σ is a regularization parameter, and b is the bias. A training procedure was performed on the observed data to find the optimal parameters (C, σ, b).
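A minimal forward-Euler sketch of this closed-population SEIR model (dS/dt = −βSI/N, dE/dt = βSI/N − αE, dI/dt = αE − γI, dR/dt = γI), assuming T_l = 5.2 days and T_i = T_g − T_l = 2.3 days from the text. The logistic form of β(t) below, C/(1 + exp((t − b)/σ)), is a hypothetical parameterization chosen only because it decreases over time; the paper's exact formula is not reproduced here, and the initial conditions in any run are illustrative.

```python
import math


def logistic_contact_rate(t, C, sigma, b):
    """HYPOTHETICAL decreasing logistic contact rate beta(t).

    C is the early-epidemic plateau, b the day at which beta falls to C/2,
    and sigma sets how fast it falls. Assumed form, not the paper's formula.
    """
    return C / (1.0 + math.exp((t - b) / sigma))


def simulate_seir(N, E0, I0, beta_fn, T_l=5.2, T_i=2.3, days=150, dt=0.1):
    """Forward-Euler integration of the closed-population SEIR model.

    beta_fn(t) returns the contact rate at time t (days); alpha = 1/T_l and
    gamma = 1/T_i. dt should divide one day evenly. Returns one
    (day, S, E, I, R) tuple per simulated day, starting at day 0.
    """
    alpha, gamma = 1.0 / T_l, 1.0 / T_i
    S, E, I, R = float(N - E0 - I0), float(E0), float(I0), 0.0
    steps_per_day = round(1.0 / dt)
    history = [(0, S, E, I, R)]
    for step in range(1, days * steps_per_day + 1):
        t = (step - 1) * dt
        # Flows between compartments during one time step dt
        s_to_e = beta_fn(t) * S * I / N * dt
        e_to_i = alpha * E * dt
        i_to_r = gamma * I * dt
        S -= s_to_e
        E += s_to_e - e_to_i
        I += e_to_i - i_to_r
        R += i_to_r
        # Record the state once per simulated day
        if step % steps_per_day == 0:
            history.append((step // steps_per_day, S, E, I, R))
    return history
```

A constant contact rate β = R0/T_i can be passed as `lambda t: 3.0 / 2.3`; passing `lambda t: logistic_contact_rate(t, C, sigma, b)` instead mimics a contact rate that falls after control measures. Because the four flows cancel exactly, S + E + I + R stays equal to N up to rounding, matching the closed-population assumption.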
We set the number of exposed people to twice the number of infected people after the lockdown, which is in line with the predictions and the observed values. Figure 6a shows the peak of new daily cases on 29 March together with its Gaussian fit, while in Figure 6b (red curve) the expected peak of the SEIR model falls in the second half of April at the national level. Southern Italian regions are expected to reach the peak later, in the second half of April. Regarding the peak of active cases, it must be noted that, according to [10], several cases are undocumented, so the amplitude of the peak also accounts for a small portion of undocumented cases. In a study carried out in China before the lockdown, the authors estimated that 86% of all infections were undocumented, highlighting the importance of setting up quarantine procedures to limit the spread. Mobility is another important factor in the diffusion of the virus: [11] showed that travel restrictions are useful in the early stage of an outbreak.
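The text does not say how the Gaussian of the new daily cases was fitted. One simple option, sketched here under that caveat, is Caruana's method: fit a parabola to ln(cases) by ordinary least squares and read off the peak day μ, width s, and height A of A·exp(−(t − μ)²/(2s²)). This is a closed-form substitute for a full non-linear Gaussian fit, and the function names are illustrative.

```python
import math


def solve3(M, v):
    """Solve a 3x3 linear system M x = v by Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(M)
    out = []
    for col in range(3):
        Mc = [row[:] for row in M]
        for r in range(3):
            Mc[r][col] = v[r]
        out.append(det(Mc) / d)
    return out


def fit_gaussian_peak(days, cases):
    """Caruana's method: least-squares fit of ln(cases) = a + b*u + c*u^2,
    with u = t - t0 centred for conditioning, converted to a Gaussian
    A * exp(-(t - mu)^2 / (2 s^2)). Returns (A, mu, s)."""
    pts = [(t, math.log(y)) for t, y in zip(days, cases) if y > 0]
    t0 = sum(t for t, _ in pts) / len(pts)
    S = [sum((t - t0) ** k for t, _ in pts) for k in range(5)]
    T = [sum((t - t0) ** k * ly for t, ly in pts) for k in range(3)]
    # Normal equations for the basis [1, u, u^2]
    M = [[S[0], S[1], S[2]], [S[1], S[2], S[3]], [S[2], S[3], S[4]]]
    a, b, c = solve3(M, T)
    if c >= 0:
        raise ValueError("no peak: the log-parabola is not concave")
    mu = t0 - b / (2 * c)
    s = math.sqrt(-1 / (2 * c))
    A = math.exp(a - b * b / (4 * c))
    return A, mu, s
```

Taking logarithms gives small daily counts the same weight as large ones, so on noisy data a weighted or fully non-linear fit is usually preferable; the log-parabola is mainly useful as a quick estimate or as a starting point for such a fit.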
Figure 6. Prediction for Italy of the peak of new daily cases, which occurred on March 29th (a), and of the exposed, infected, and deceased people (b). The peak of active cases, estimated at more than 180,000, is expected to occur in the second half of April.

Conclusions
This paper has presented a study of SARS-CoV-2 in Italy, analyzing the evolution of the epidemic at the regional level. We tested two different techniques to compute the basic reproduction number, one based on a daily estimate and the other on the whole study period. The daily estimate is used by the SEIR compartment model to estimate the epidemic peak at the national level. We correctly predicted the peak of new daily cases, which occurred on March 29th, and gave an estimate of the peak of active cases at the national level. Future work will focus on forecasting new daily cases by training deep neural networks, whose outputs will be fed into a dynamic SEIR model to obtain better estimates of the epidemic peak.
Author Contributions: C.D., P.P., and A.M. conceived, wrote and revised the manuscript. All authors have read and agreed to the published version of the manuscript.