Numerical identification of epidemic thresholds for susceptible-infected-recovered model on finite-size networks

Epidemic threshold has always been a very hot topic for studying epidemic dynamics on complex networks. The previous studies have provided different theoretical predictions of the epidemic threshold for the susceptible-infected-recovered (SIR) model, but the numerical verification of these theoretical predictions is still lacking. Considering that the large fluctuation of the outbreak size occurs near the epidemic threshold, we propose a novel numerical identification method of SIR epidemic threshold by analyzing the peak of the epidemic variability. Extensive experiments on synthetic and real-world networks demonstrate that the variability measure can successfully give the numerical threshold for the SIR model. The heterogeneous mean-field prediction agrees very well with the numerical threshold, except the case that the networks are disassortative, in which the quenched mean-field prediction is relatively close to the numerical threshold. Moreover, the numerical method presented is also suitable for the susceptible-infected-susceptible model. This work helps to verify the theoretical analysis of epidemic threshold and would promote further studies on the phase transition of epidemic dynamics.

Epidemic threshold, which is one of the most important features of the epidemic dynamics, has attracted much attention recently. The existing studies have provided different theoretical predictions for epidemic threshold of the susceptible-infected-recovered (SIR) model on complex networks, while the numerical verification of these theoretical predictions is still lacking. As a result, it is very necessary to develop an effective numerical measure for identifying the SIR epidemic threshold. In this paper, the numerical identification of the SIR epidemic threshold is systematically studied. We present a numerical method by analyzing the peak of the epidemic variability to identify the epidemic threshold. To understand the effectiveness of the variability measure, the distribution of outbreaks sizes is investigated near the epidemic threshold on random regular networks. Based on the analysis of the cutoff hypothesis of the outbreak size distribution, we find that the variability measure can provide an excellent identification of the epidemic threshold. We further use the variability measure to verify the existing theoretical predictions on scale-free and real networks. The results show that the heterogeneous meanfield (HMF) prediction agrees very well with the numerical threshold, except the case that the networks are disassortative, in which the quenched mean-field (QMF) prediction is relatively close to the numerical threshold. The numerical method presented can effectively identify SIR epidemic thresholds on various networks, and could be extended to other dynamical processes such as information diffusion and behavior spreading. This work provides us a deep understanding of epidemic threshold and would promote further studies on phase transition of epidemic dynamics.

I. INTRODUCTION
Models for disease propagation are at the core of our understanding about epidemic dynamics on complex networks. 1,2 Two epidemic models of particular importance are the susceptible-infected-susceptible (SIS) and susceptibleinfected-recovered (SIR) models. 3 At each time step, an infected node can transmit a disease to each of its susceptible neighbors with probability b. At the same time, the infected nodes become susceptible again in the SIS model or recover in the SIR model with probability l. In the SIS model, a critical value of the effective transmission rate k ¼ b=l separates the absorbing phase with only healthy nodes from the active phase with a stationary density of infected nodes. Differently, no steady state is allowed in the SIR model, but a threshold still exists above which the final fraction of recovered nodes is finite. 4 The traditional theoretical study on the epidemic threshold of the SIS model is based on the heterogeneous meanfield (HMF) theory, which means that all the nodes within a given degree are considered to be statistically equivalent. 5,6 According to the HMF theory, the epidemic threshold of SIS model 7,8 is given by k HMF c ¼ hki hk 2 i , where hki and hk 2 i are the first and second moments of degree distribution P(k), 9 respectively. As the quenched structure of the network and dynamical correlations between the state of adjacent nodes is a) neglected in the HMF theory, 10 researchers have proposed an important improvement over the HMF theory-quenched mean-field (QMF) theory. The QMF theory fully preserves the actual quenched structure of the network described as its adjacency matrix, and the epidemic threshold is predicted to be [11][12][13] where K N is the maximum eigenvalue of the adjacency matrix of a given network. Considering that the existing theories more or less have some limitations (e.g., the HMF theory neglects the quenched structure of the network; QMF theory ignores dynamical correlations 14 ), some numerical methods such as the finite-size scaling analysis, 15 susceptibility, 16 and lifetime measure 17 have been proposed to check the accuracies of different theoretical predictions for the SIS model. Among these existing methods, the relatively common one for a network with finite size N is the susceptibility measure where q denotes the outbreak size. In Ref. 18, the susceptibility measure has been shown to be very effective for identifying the SIS epidemic thresholds on various networks. For another paradigmatic epidemic model, the SIR model, there have been a lot of theoretical studies on its epidemic threshold. The earliest theoretical study on the SIR epidemic threshold is based on the assumption of homogeneous mixing, showing that the SIR epidemic threshold is inversely proportional to the average connectivity hki. 3 At the HMF level, 19 the epidemic threshold of SIR model takes the value On networks with power-law scaling PðkÞ $ k Àc , 9,20 where c is the degree exponent, the HMF approach predicts a vanishing threshold for scale-free networks with c 3 and the finite threshold for c > 3. 4 Mapping the SIR model to a bond percolation process, 21 the epidemic threshold coincides with the result of Eq. (3) for a SIR model with unit infection time (e.g., l ¼ 1), and when the infection times vary among infected nodes (e.g., l < 1), the epidemic threshold is given by 4 According to the QMF theory, 11 the epidemic threshold of the SIR model has the same expression as Eq. (1). However, the QMF result is even qualitatively not correct, as the QMF predicts a vanishing threshold for power-law distributed networks with c > 3 that is in conflict with the visually numerical results. 22 Although the numerical threshold of the SIS model has attracted much attention, [15][16][17][18] the systematic study of the numerical identification of the SIR epidemic threshold is still insufficient. It is well known that the outbreak size becomes finite above the threshold k c . 21 However, as the value of k increases, the outbreak size continuously changes from an infinitesimal fraction to a finite fraction in the SIR model, 23 and thus, it is difficult to determine the value of k at which the outbreak size turns to be finite. To our knowledge, there has no effective numerical method for identifying the SIR epidemic threshold in previous studies. In this work, we perform extensive numerical simulations of the SIR model on networks with finite size, and present a numerical identification method by analyzing the peak of the epidemic variability 24,25 (i.e., the maximal value of the epidemic variability) to identify the epidemic threshold. The effectiveness of the numerical measure is checked on random regular networks (RRN), where the HMF theory is exact. To get a deep understanding of the validity of the numerical method, we investigate the distribution of outbreaks sizes near the epidemic threshold. The robustness of the variability measure is confirmed by the analysis of cutoff hypothesis of the outbreak size distribution. We further employ the variability measure to verify the theoretical predictions on scale-free networks and real-world networks, where the results indicate that the HMF prediction agrees very well with the numerical threshold, except the case that the networks are disassortative, in which the QMF prediction is relatively close to the numerical threshold.

II. AN EFFECTIVE NUMERICAL IDENTIFICATION MEASURE
In this section, we give the detailed description of the simulation of SIR model, propose the numerical identification measure of the epidemic threshold, and make deep analysis of the effectiveness of the numerical measure. In the SIR model, each node can be one of three states which are the susceptible state, infected state, and recovered state, respectively. At the beginning, one node is randomly selected as the initial infected (i.e., seed), and all other nodes are susceptible. At each time step t, each susceptible node i becomes infected with probability 1 À ð1 À bÞ n i if it has one or more infected neighbors, where n i is the number of its infected neighbors. At the same time, all infected nodes recover (or die) at rate l and the recovered nodes acquire permanent immunity. Time increases by Dt ¼ 1, and the dynamical process terminates when all infected nodes are recovered. In this paper, l is set to 1, unless otherwise specified.

A. Proposing the numerical identification measure
The susceptibility measure can not only identify an effective SIS epidemic threshold, 18 but can also be used to determine the critical point of the percolation process. 26 Since the connection between SIR and bond percolation is made through the assimilation of the size of the percolating giant component with the final number of recovered individuals, 21 we check the effectiveness of susceptibility measure v for the SIR model on RRN, where all nodes have exactly the same degree k. On these networks, the HMF prediction We compare the HMF prediction with the numerical threshold identified by the susceptibility measure k v p in Fig. 1(a), where the result shows that the numerical threshold of the SIR model identified by v is significantly larger than 1=ðk À 1Þ.
Considering that the fluctuation of the outbreak size is large near the epidemic threshold, we try to identify the epidemic threshold by the variability measure 24,25 which is a standard measure to determine the critical point in equilibrium phase on magnetic system. 27 The inset of Fig.  1(a) shows that the variability D exhibits a peak over a wide range of k, so we estimate the epidemic threshold from the position of the peak of the variability k D p . On RRN with different values of k, we find that k D p is always consistent with the HMF prediction. When the degree k is given, we further consider the relationship between the epidemic threshold and the network size N in Fig. 1(b), where the numerical thresholds k v p and k D p do not change with N. k D p is very close to the HMF prediction, while there is an obvious gap between k v p and HMF prediction. From the above, we know that the variability D can identify an effective SIR epidemic threshold, while the epidemic threshold identified by the susceptibility v is overestimated on RRN. Thus, a new problem has arisen: Why the variability D performs well but the susceptibility v goes awry for the SIR model?
B. Analysis of the effectiveness of numerical identification measure Next, we make an analysis of the effectiveness of the numerical identification measures above, by investigating the distribution of outbreak sizes which has the strong heterogeneity near the epidemic threshold. On RRN with k ¼ 10, where the epidemic threshold k c ¼ 1=ðk À 1Þ ¼ 1=9, Fig.  2(a) shows the distribution of outbreak sizes near k c . The outbreak sizes follow approximately an exponential distribution at k ¼ 0:1 that is smaller than k c . At k ¼ k c , the outbreak sizes follow a power-law distribution PðqÞ $ q a with a cutoff at some values, where a ' À1:5. [28][29][30] Since the disease may die out quickly or infect a subset of nodes when k > k c , the distribution of outbreak sizes is bimodal, 31,32 with two peaks occurring at q ¼ 1=N and q ' 0:2 for k ¼ 0:12, respectively.  Moreover, the theoretical distribution of outbreak sizes (see Appendix for details) is compared with the results obtained by numerical simulations in Fig. 2(a), where the theoretical probability from Eq. (A4) is consistent with the numerical results for relatively small outbreak sizes (q < 0:05). At the epidemic threshold, the theoretical results also show that the outbreak sizes obey a power-law distribution with the exponent of about À1.5. When k > k c , some large outbreak sizes constitute a lump in the numerical scattergram, but the probability of large outbreak sizes cannot be solved from Eq. (A4). We thus speculate that the nonignorable lump may affect the numerical identification of SIR epidemic threshold for the susceptibility measure.
To verify the speculation, Fig. 2(b) investigates the effectiveness of the susceptibility measures under some cutoff hypotheses. We set the cutoff value of the outbreak size as r c , which means that only the outbreak sizes with q r c are used to numerically count the susceptibility v under the cutoff hypothesis. Three kinds of r c are considered, where r c ¼ 0:05 corresponds to the maximum value of small outbreak size before the lump appears in the numerical scattergram, r c ¼ 0:2 means that the numerical scattergram consists of a part of the lump, and r c ¼ 0:4 means that there is a complete lump in the numerical scattergram. As shown in Fig. 2(b), the susceptibility measure can indeed give a quite effective estimate of the SIR epidemic threshold when the whole lump is ignored (i.e., r c ¼ 0:05). With the increase of r c , the position of peak value of susceptibility v gradually shifts to the right for large outbreak sizes are considered. This shows that the susceptibility v loses its effectiveness in identifying the SIR epidemic threshold due to the existence of the lump.
In Fig. 2(c), the robustness of the variability D is further checked in theory. As the numerical distribution of the large outbreak sizes is concentrated, we assume that there is a lump located at r c with Pðr c Þ ¼ 1 À R q<r c PðqÞ in the theoretical probability distribution diagram of outbreak sizes while k > k c . Based on such theoretical distribution, we plot the variability measure as a function of k for different values of r c in Fig. 2(c). Since the variability D measures the heterogeneity of the outbreak size distribution, which is strongest at the epidemic threshold, 28-30 the peak position of the variability measure does not change with the position of the lump.
From the above analysis, we can conclude that the variability D is effective in identifying the epidemic threshold of SIR model, while the bimodal distribution of outbreak sizes for k > k c leads to the overestimation of the SIR epidemic threshold when using the susceptibility v.

III. VERIFICATION OF THE THEORETICAL PREDICTIONS ON SCALE-FREE AND REAL NETWORKS
We further verify the theoretical predictions on scalefree and real networks, by comparing them with the numerical thresholds from the variability D.

A. Scale-free networks
We build scale-free networks (SFN) with degree distribution PðkÞ $ k Àc based on the configuration model. 9 The so-called structural cutoff k max $ N 1=2 and natural cutoff k max $ N 1=cÀ1 (Ref. 33) are considered to constrain the maximum possible degree k max on SFN. Differently, the degreedegree correlations vanish on scale-free networks with structural off, while the disassortative degree-degree correlations exist when c < 3 for scale-free networks with natural cutoff, 33 because high degree vertices connect preferably to low degree ones in this case. We consider the SIR model on SFN with structural cutoff in Figs. 3(a) and 3(c), where the SIR epidemic threshold increases monotonically with the degree exponent c, and the variation of epidemic threshold with network size N is approximately linear in logarithmic relationship. 16 The HMF prediction k HMF c is very close to the numerical threshold k D p , while there is an obvious difference between the QMF prediction k QMF c and k D p . The SFN with natural cutoff are considered in Figs. 3(b)  and 3(d), where the variations of epidemic threshold with c and N are similar to the results on SFN with structural cutoff. When c > 3, the HMF prediction is close to the numerical threshold, while there is a gap between the QMF prediction and the numerical threshold. Since the disassortative degreedegree correlations exist for c < 3, there is a slight difference between k HMF c and k D p . In Fig. 3(d), the distinction between k HMF c and k D p becomes large with the increase of N for c ¼ 2:25, while in such case the QMF prediction is always close to the numerical threshold since the principle eigenvector is delocalized when 2 < c 5=2. 34 It can been seen from the above analysis that the numerical method presented provides quantitive indexes for the observations in Ref. 22, where the HMF theory is relatively accurate for the SIR model.

B. Real-world networks
To check the performance of the variability D on realworld networks, [35][36][37][38][39][40][41][42] Fig. 4 depicts D as a function of k for Hamsterster full network, 35 which contains friendships and family links between users of the website hamsterster.com, and Facebook (NIPS) network, 41 which contains Facebook user-user friendships. The numerical results intuitively show that the variability D always reaches a maximum value near the value of k above which the outbreak size q becomes finite. The theoretical predictions of the HMF theory and of the QMF theory are quite close to the numerical threshold identified by D on Hamsterster full network, but they become poor on Facebook (NIPS) network. Considering the difference in the behaviors of the theoretical predictions on the two networks above, the detailed comparisons between the numerical and theoretical thresholds on other real networks are presented in Table I. The results indicate that although the HMF prediction and the numerical threshold k D p ðSIRÞ are nearly the same for assortative networks, there is an obvious difference between them for the networks showing significant disassortative mixing. The QMF prediction is relatively worse than the HMF prediction for assortative networks, but the former is close to k D p ðSIRÞ for some disassortative networks (e.g., Router views, 37 CAIDI, 37 and email contacts 42 ). These findings numerically show the accuracies of existing theoretical predictions on real-world networks.

IV. CONCLUSION AND DISCUSSION
In summary, we have studied the numerical identification of SIR epidemic threshold on complex networks with finite size. We have checked the effectiveness of the susceptibility measure for SIR on RRN, where the HMF is exact. The results showed an obvious gap between the numerical threshold identified by the susceptibility v and the HMF prediction. Then, we proposed the numerical identification method by analyzing the peak of the epidemic variability, and found that the numerical threshold identified by the variability measure D agrees very well with the HMF prediction on RRN.
In order to get a deep understanding of the effectiveness of the two numerical measures above, we have analyzed the distribution of outbreak sizes near the epidemic threshold k c . The outbreak sizes follow approximately an exponential distribution when k < k c . At the epidemic threshold, the outbreak sizes follow a power-law distribution with the exponent of about À1.5. When k > k c , the numerical distribution of outbreak sizes is bimodal with two peaks occurring at q ¼ 1=N and O(1), respectively. The probability of small outbreak sizes in theory is consistent with that obtained by numerical simulations, but the probability of large outbreak sizes that constitute a lump in the numerical scattergram cannot be obtained theoretically. Based on the analysis of the cutoff hypothesis of the outbreak size distribution, we found that the susceptibility measure can give a fairly effective SIR epidemic threshold when the lump is ignored. Since the variability measure reflects the heterogeneity of the outbreak size distribution, it is always effective in identifying the epidemic threshold, where the distribution of outbreak sizes has the very strong heterogeneity.
We further employed the variability measure to verify the theoretical predictions on scale-free and real networks. The HMF prediction is close to the numerical threshold on most of the networks, but on SFN with natural cutoff and degree exponent c < 5=2, it becomes poor due to the existence of disassortative mixing. Similarly, the HMF prediction agrees well with the numerical method on real networks with assortative mixing, while it becomes very poor for disassortative networks, where the QMF prediction is relatively close to the numerical threshold. These findings provide quantitive indexes for the accuracies of existing theoretical predictions from the perspective of simulation.
As part of the discussion, we have considered the epidemic threshold for l < 1 in Fig. 5. The results on RRN and SFN all show that the numerical thresholds for l ¼ 0:1 are a little larger than those for l ¼ 0:5. As shown in the inset, l ! 0 leads to an epidemic threshold close to Eq. (4), while l ! 1 leads to an epidemic threshold close to Eq. (3). It should be pointed out that the numerical threshold of SIR model is inclined to the theoretical prediction k c ¼ hki hk 2 iþðlÀ2Þhki from the edge-based compartmental theory 43,44 when 0 < l < 1. These findings could be complementary to some existing results. 4 Moreover, we have tried applying the variability measure to the identification of the SIS epidemic threshold. As shown in Fig. 6 and Table I, the numerical threshold k D p from the variability measure agrees very well with the threshold k v p identified by the susceptibility measure, whose validity for the SIS model has been confirmed in Ref. 18. This shows that the variability measure can also provide an effective estimate of the SIS epidemic threshold.
We have put forward a numerical method for identifying the epidemic threshold for SIR model, which is also suitable for the SIS model. This method can effectively identify epidemic thresholds on various networks, and could be extended to other dynamical processes such as information diffusion and behavior spreading. Further work should be done to check the effectiveness of this method on more complicated networks (e.g., temporal networks 45 and multilayer networks 46 ). Besides, the accurate analytic approximation of the epidemic threshold for general networks remains an TABLE I. Characteristics and epidemic thresholds of real-world networks. N is the network size, k max is the maximum degree, r is the correlation coefficient of the degrees, k HMF c and k QMF c are the HMF and QMF predictions for SIR, respectively, k D p (SIR) denotes the numerical threshold of SIR identified by D, and k D p (SIS) and k v p (SIS) represent the numerical thresholds of SIS identified by D and v, respectively.  The threshold k c vs. c for l ¼ 0:1 (c) and l ¼ 0:5 (d) on SFN, where solid and empty symbols denote SFN with structural cutoff k max $ N 1=2 and natural cutoff k max $ N 1=cÀ1 , respectively. "Circles" and "squares" denote k HMF c and k D p , respectively. Inset: k c as a function of l on RRN with N ¼ 10 4 and k ¼ 6. The results are averaged over 10 2 Â 10 6 independent realizations on 10 2 networks. important problem. This work helps to verify theoretical analysis of epidemic threshold and would promote further studies on phase transition of epidemic dynamics.