Prediction of Epidemic Spread of the 2019 Novel Coronavirus Driven by Spring Festival Transportation in China: A Population-Based Study

After the 2019 novel coronavirus (2019-nCoV) outbreak, we used data from the 2013–2018 China Migrants Dynamic Survey (CMDS) to estimate the distribution and scale of the more than 5 million migrants residing in Wuhan who returned to their hometown communities in Hubei Province or other provinces at the end of 2019. At the provincial level, Wuhan’s migrants are concentrated in Hubei Province (approximately 75%), and their numbers decrease layer by layer through the surrounding provinces, giving a clear spatial pattern of circle layers and echelons. Within Hubei Province, the number of migrants decreases along a gradient from east to west. About 66% of the migrants originating in Hubei come from Wuhan’s outlying districts and the surrounding prefecture-level cities. These migrants originate from 94 districts and counties in Hubei Province, and the top 30 districts and counties together account for more than 80% of them. Wuhan’s migrants include a large proportion of middle-aged, high-risk individuals. Their social characteristics include nuclear family migration (84%), migration with families of 3–4 members (71%), rural household registration (85%), and working or doing business (84%) as the main reason for migration. Using a quasi-experimental analysis framework, we found that the number of Wuhan’s migrants originating from a region was highly correlated with the daily number of confirmed cases in that region. By comparing the epidemic situation across regions, we also found that the number of confirmed cases may be underestimated in some provinces and in some cities in Hubei Province, while in other regions the number of cases has increased rapidly. These results can help guide epidemic prevention and control in various regions.


Introduction
The outbreak of a new coronavirus (2019-nCoV) has spread internationally since the Wuhan Municipal Health Commission, China, first reported cases on 31 December 2019 [1][2][3][4]. On 26 January 2020, the WHO announced that the risk of a 2019-nCoV epidemic was high both in China and at the global level [5].
This study analysed the floating population that had lived in Wuhan for more than one month; short-term migrants and students were not included. Unlike other models of the dynamics of this epidemic, ours used information on the CMDS respondents and their family members to explore the origins of Wuhan's migrant population. This information covers their return destinations, population characteristics, family structures and other metrics. This approach can inform the preparation of prevention strategies and the assessment of the resources needed to treat and contain the epidemic.

Evidence before This Study
On 27 January 2020, the National Health Commission of China released a report stating that 2019-nCoV could be transmitted not only via respiratory droplets but also via direct contact. 2019-nCoV has now spread nationally and worldwide. Because data on the size and origins of Wuhan's floating population are lacking, it has been difficult for the Chinese government to allocate medical resources in real time and to implement effective public health interventions.

Added Value of This Study
We used data from the Wuhan Floating Population Monitoring Survey to estimate the size and origins of the migrant population in Wuhan. We also described the socio-demographic characteristics of this population and used modelling techniques to compare confirmed cases across regions. We found that three-quarters of Wuhan's floating population are from Hubei Province and that nearly 85% migrated with nuclear families. Most families have 3 to 4 members, and most individuals are migrant workers from rural areas with low levels of education. We compared the values predicted by the model with the actual values to analyse the profile of the epidemic in various regions since 25 January 2020. The spread of 2019-nCoV has varied greatly between regions, and the epidemic in some regions may be underestimated. There may also be unknown factors, such as structural factors in some regions, that deserve further attention.

Implications of All the Available Evidence
The majority of the floating population left Wuhan before the authorities "closed off" the city, so our analysis can help identify the key geographic areas for prevention and control. The results indicate that the floating population of Wuhan is centred in Hubei Province and the surrounding provinces. Local governments in these areas must therefore act quickly and effectively to prevent further spread of 2019-nCoV. Higher-level governments must also strengthen their assistance to these areas, for example by sending medical workers and medical supplies, to prevent 2019-nCoV from becoming a new pandemic. At the same time, it is important to increase surveillance in areas where the epidemic may be underestimated. Loopholes in prevention and control should be identified promptly to reduce the risk of a new round of transmission.
China has been deeply involved in the globalisation process, and even China's central and western regions have become important links in the global production and trade chain. Therefore, while our research is aimed at China in the current era of migration, this research has practical implications for global public health and disease control, as floating populations are increasing in size all over the world and relationships between countries are becoming increasingly close. Thus, other countries should pay attention to the epidemic situation in specific geographic areas of China to prevent secondary and international transmission of the 2019-nCoV.

Data Sources
The data used in this study come from the 2013-2018 China Migrants Dynamic Survey (CMDS) and from the tabulated data on 5 million migrants in Wuhan recently released by the Wuhan municipal government. The survey used a multi-stage stratified sampling method and collected data with structured questionnaires. The survey data are limited to the mainland provinces, municipalities and autonomous regions, together with Hainan Province, so the populations of Hong Kong and Macao were excluded from this study. A total of 11,999 samples of the resident floating population in Wuhan from 2013 to 2018 were extracted from the survey dataset.
The survey design defined Wuhan's floating population as people who came from other cities and districts, were aged 15 and over, had resided in Wuhan for more than one month, and did not hold household registration in Wuhan. Table 1 presents the sample distribution of the resident floating population in Wuhan over time. The sample size was 1999 in 2013 and 2000 in each of the other years.

Data Processing
Using information on the floating population in Wuhan and their family members, we analysed their return destinations and structural characteristics with descriptive statistics. Table 2 presents the provincial-level distribution of the origins of Wuhan's floating population over the past several years. The sample size for each province is quite stable over time. Hubei Province, which contains the city of Wuhan, was the province of origin for 75% of the floating population, and approximately 25% originated in other provinces.
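The 75% figure is the Hubei share of the pooled 2013-2018 sample. The sketch below shows that calculation using the Hubei counts and annual sample sizes from Table 2. Pooling by summing counts and sample sizes is our reading of the text rather than the authors' code, and the function names are hypothetical.

```python
# Annual sample sizes and Hubei-origin counts, copied from Table 2.
YEARS = [2013, 2014, 2015, 2016, 2017, 2018]
TOTALS = [1999, 2000, 2000, 2000, 2000, 2000]
HUBEI = [1514, 1508, 1487, 1465, 1477, 1547]


def yearly_shares(counts, totals):
    """Share of each year's sample that originates from one province."""
    return [c / n for c, n in zip(counts, totals)]


def pooled_share(counts, totals):
    """Share over all years combined: summed counts divided by summed sample sizes."""
    return sum(counts) / sum(totals)


if __name__ == "__main__":
    for year, share in zip(YEARS, yearly_shares(HUBEI, TOTALS)):
        print(f"{year}: {share:.3f}")
    print(f"pooled ({sum(HUBEI)}/{sum(TOTALS)}): {pooled_share(HUBEI, TOTALS):.4f}")
```

The annual sample sizes are almost identical, so this pooled share is practically the same as the simple average of the six yearly shares.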
Data for 2019 are not available, so we relied on historical survey data, in which location information is limited to the province of household registration. The 2017 data provided by the Hubei Provincial Health Commission include more detailed information on prefectures, cities, districts and counties. The analysis was therefore divided into two parts. The first part analysed the origins of Wuhan's floating population at the provincial level using the historical data, and the second analysed the floating population within Hubei Province using the 2017 data. The sample distribution in each province is stable over time. For the provincial-level analysis we therefore pooled all samples from the previous years (i.e., the mean over 6 years of data collection) to ensure robust results.
Table 2. Province of household registration of the sampled floating population in Wuhan, 2013–2018 (number of respondents)
Province         2013  2014  2015  2016  2017  2018   Total
Total            1999  2000  2000  2000  2000  2000  11,999
Hubei            1514  1508  1487  1465  1477  1547    8998
Henan             113   134   109   159   170   125     810
Anhui              59    58    55    53    56    46     327
Hunan              57    46    68    54    41    36     302
Jiangxi            58    40    53    57    49    34     291
Chongqing          34    29    34    33    33    35     198
Zhejiang           22    29    25    33    25    33     167
Sichuan            22    30    45    21    22    27     167
Fujian             14    17    16    15    39    19     120
Jiangsu            38    13    16    19    13    11     110
Shandong           12    18    11    13     8    12      74
Guangdong           7     8    18    18    14     8      73
Hebei            …     0     1     5     …
…
Tianjin             1     0     0     1     0     1       3
Shanghai            0     1     0     1     0     1       3
Inner Mongolia      1     0     0     0     1     0       2
Xizang              0     0     1     0     0     1       2
Ningxia             0     0     1     0     0     0       1
The known infection features of 2019-nCoV are that middle-aged and elderly people are at high risk of infection and that transmission can occur between individuals, families and communities. Given these features, we assessed six main variables: age group, educational level, pattern of migration, number of migrating family members per household, type of household registration, and reason for migration.
We defined these variables as follows: (1) age group was classified as under 20, 21-30, 31-40, 41-50, 51-60, and over 60; (2) educational level was divided into junior high school and below, high school/secondary school, and college and above; (3) pattern of migration was divided into independent migration, nuclear family migration, and extended family migration; (4) number of migrating family members per household was classified as 1, 2, 3, 4, and 5 or more; (5) type of household registration was divided into rural and urban; (6) reason for migration was classified as working or doing business, family relocation, or other reasons.
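The age and family-size groupings above amount to simple binning rules. The sketch below illustrates them with hypothetical helper names. Placing age 20 in the youngest group is an assumption, because the text labels that group "under 20" and starts the next group at 21.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first five age groups; ages above 60 fall
# into the last group. Putting age 20 in the youngest group is an assumption,
# since the text labels that group "under 20" and starts the next one at 21.
AGE_UPPER_BOUNDS = [20, 30, 40, 50, 60]
AGE_LABELS = ["under 20", "21-30", "31-40", "41-50", "51-60", "over 60"]


def age_group(age: int) -> str:
    """Map an age in years to one of the six age groups."""
    return AGE_LABELS[bisect_left(AGE_UPPER_BOUNDS, age)]


def family_size_group(members: int) -> str:
    """Collapse the number of migrating family members into 1, 2, 3, 4 or 5 or more."""
    if members < 1:
        raise ValueError("a household has at least one migrating member")
    return "5 or more" if members >= 5 else str(members)


if __name__ == "__main__":
    for age in (18, 20, 35, 60, 67):
        print(age, age_group(age))
    for size in (1, 3, 6):
        print(size, family_size_group(size))
```

`bisect_left` returns the index of the first upper bound that is greater than or equal to the age. That index maps directly onto the label list, so no chain of if/else branches is needed.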

Quasi-Experimental Design
The analyses assume a theoretical model of 2019-nCoV transmission. We treated the 5 million members of Wuhan's floating population, who had returned to their hometowns by 23 January 2020, as potentially infected persons. To estimate the dynamics of the epidemic, we added several factors to our statistical model: demographic characteristics, the situation of medical diagnosis, government prevention and control, the number of confirmed cases, and undisclosed data. After controlling for certain factors, we analysed those that could not be controlled, such as government intervention and statistical reporting.
Specifically, we first analysed the correlation between the number of members of Wuhan's floating population originating from each region and the daily number of confirmed cases in that region. We then examined differences among regions, using a transmission rate as a reference for comparison. The comparative analysis focused on possible underestimation of case numbers and on the virus transmission rate, to assess the likely extent of the epidemic in different regions.

Prediction Model Setting
Finally, we predicted the floating population of Wuhan using statistical methods and compared it with the number of confirmed 2019-nCoV cases in each region to identify regional differences in 2019-nCoV infection. We also predicted the forthcoming epidemic trend at the prefecture and province levels, based on the proportion of Wuhan's floating population originating from these areas.
Human-to-human transmission of the 2019-nCoV has been confirmed. Four sets of factors that may influence regional differences appear to be involved: (1) Demographic factors, such as short-term business travellers between Wuhan and other regions, college students in Wuhan returning to their homes in other regions, Spring Festival tourists from Wuhan to other regions, and trans-regional floating populations for Spring Festival family reunions from or across Wuhan; (2) Intervention factors, such as medical treatments and governmental preventative measures; (3) Information disclosure and the information release system; and (4) Other unknown factors.
We considered all these factors and made the following assumptions about the social environment of 2019-nCoV transmission. First, the government took the unprecedented measure of sealing off Wuhan on 23 January 2020. We nevertheless assumed that by then the entire floating population of Wuhan, all short-term business travellers to Wuhan and all college students in Wuhan had returned to their hometowns throughout China. The reason is that 24 January 2020 was Spring Festival Eve (the 2020 Spring Festival holiday normally ran from 24 to 30 January). Moreover, the Spring Festival travel period had begun at least a week before this date, leaving plenty of time for these people to leave the city.
However, the number of people in Wuhan who travelled to reunite with their families in other cities during the Spring Festival vacation may be negligible. The sealing-off of the city and other preventive measures taken across the country may have prevented them from travelling. Second, we treated the influence of college students in Wuhan as an invariant factor. College students are young and healthy, have fixed travel routes, come from regions scattered evenly across the country, and travelled home on or around 10 January. We therefore assumed that their influence on virus transmission was the same for all regions. Third, we assumed that the regional medical centres in Hubei Province had the same medical treatment capacity. The outbreak emerged so quickly that these centres would have had the same level of emergency preparedness. Finally, we assumed that the above factors would not change dramatically until Wuhan's floating population returned en masse after the Spring Festival vacation.

Description of Statistical Analysis
To estimate the floating population in the cities of Hubei Province and across the country, we first needed to determine the floating population residing in Wuhan in 2019. The Wuhan city government has not released its 2019 statistics, so data from previous years were used for this prediction. Table 3 presents the prediction based on these statistics, indicating that approximately 2.43 million migrants had lived in Wuhan for more than six months in 2019. However, combining this prediction with the survey data would have mixed inconsistent statistical definitions; this study uses the survey data to estimate the origins of Wuhan's returning floating population. The government statistics department counts people who have lived in Wuhan for more than 6 months, whereas the survey respondents had lived in Wuhan for over one month. A shorter minimum period of residence produces a larger population estimate. The total floating population of Wuhan under the CMDS definition is therefore larger than the figure reported by the government statistics department.
On 22 January 2020, Xinhua News Agency (an official government media outlet) interviewed the mayor of Wuhan and reported that more than 5 million members of the floating population had returned to their hometowns before the Spring Festival holiday. This figure (over 5 million) is more than twice the value predicted in this study (2.43 million). This indicates that the news report was based on a shorter minimum period of residence, consistent with the one-month definition in the data we used. In the absence of more rigorous and authoritative totals, we took 5 million people as Wuhan's floating population. We used this figure to estimate the scale and distribution of those who returned to their hometowns during the Festival.

National Distribution and Social Characteristics of the Origins of Wuhan's Floating Population
Table 4 presents, based on the sample survey data, the estimated proportions of Wuhan's floating population by province of origin. It also gives the corresponding population sizes, assuming a total floating population of 5 million. (Table 4 note: population sizes are based on a total floating population in Wuhan of about 5 million; CI = confidence interval.)
The national distribution of the migrants presents obvious spatial characteristics of circle layers and echelons at provincial level (Table 4 and Figure 1).
(1) Hubei Province is the central area of origin of Wuhan's floating population, accounting for 75% of the population (95% CI: 74.21% to 75.76%). Based on a total of 5 million people, approximately 3.75 million members of Wuhan's floating population hold household registration in Hubei Province (95% CI: 3,710,227 to 3,788,125).
(2) Henan, Anhui, Jiangxi and Hunan Provinces form the first circle layer. Henan Province, the origin of about 337,000 members of the floating population, had the highest proportion in this layer, approximately 6.7% (95% CI: 315,401 to 360,712 people). According to the 2017 city-level data, Xinyang, Zhumadian, Shangqiu and Nanyang cities in Henan Province accounted for 38.82%, 20.59%, 12.94% and 10.59%, respectively, of the floating population from Henan living in Wuhan. Together these four cities make up approximately 83% of the total. The proportions from Anhui, Hunan and Jiangxi Provinces were 2.7%, 2.5% and 2.4%, respectively, corresponding to floating populations in Wuhan of 136,000, 126,000 and 121,000.
(4) Shandong, Guangdong, Hebei, Gansu, Guangxi, Heilongjiang, Shaanxi, Shanxi and Guizhou Provinces are at the third circle layer, with a proportion of 0.19% to 0.62% and a corresponding population of 10,000 to 30,000.
(5) Some provinces and municipalities, including Qinghai, Liaoning, Yunnan, Jilin and Beijing, are located in the fourth circle layer, accounting for 0.08-0.16% of the floating population, equating to 4000-8000 people.
(6) The remaining provinces and municipalities, such as Hainan, Xinjiang, Tianjin, Shanghai, Inner Mongolia, Tibet and Ningxia, are at the fifth circle layer, with a floating population proportion of less than 0.04%, corresponding to ≤2000 people.
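The estimates in the items above scale a sample proportion, and its confidence interval, by the assumed total of 5 million. The text does not say which interval method was used, so the sketch below assumes a normal-approximation (Wald) interval. For Hubei (8998 of 11,999 respondents) this gives roughly 74.21% to 75.76%, matching the percentage interval quoted in item (1). The population bounds it produces (about 3,710,700 to 3,788,200) differ from Table 4 by a few hundred people, so the authors may have used a different interval. The function names are hypothetical.

```python
import math

TOTAL_FLOATING_POPULATION = 5_000_000  # total assumed in the study


def proportion_ci(count, n, z=1.96):
    """Sample proportion with a normal-approximation (Wald) confidence interval."""
    p = count / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


def scale_to_population(count, n, total=TOTAL_FLOATING_POPULATION):
    """Convert a sample share and its interval into numbers of people."""
    p, lo, hi = proportion_ci(count, n)
    return round(p * total), round(lo * total), round(hi * total)


if __name__ == "__main__":
    n = 11_999  # pooled CMDS sample, 2013-2018
    for province, count in [("Hubei", 8998), ("Henan", 810), ("Anhui", 327)]:
        p, lo, hi = proportion_ci(count, n)
        est, est_lo, est_hi = scale_to_population(count, n)
        print(f"{province}: {100 * p:.2f}% ({100 * lo:.2f}-{100 * hi:.2f}), "
              f"{est:,} people ({est_lo:,}-{est_hi:,})")
```

An exact (Clopper-Pearson) or Wilson interval could be substituted in `proportion_ci` to test which method best reproduces the published bounds.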
As presented in the table above, this population is mainly 21-40 years old, but the population aged over 40, which is susceptible and at high risk, is also very large. The distribution is as follows: (1) The susceptible, high-risk population is concentrated in Hubei Province. More than 800,000 people from Hubei are aged 41-50, 180,000 are aged 51-60, and 40,000 are over 60.
(2) Henan and Anhui Provinces have larger susceptible and high-risk populations, of more than 100,000 and nearly 40,000, respectively.
(5) The three provinces of northeast China, namely Heilongjiang, Jilin and Liaoning, have relatively large susceptible and high-risk populations: approximately 7000 in Heilongjiang and approximately 3000 in Jilin and Liaoning.
Transmission among family members is a major route of infection. Table 5 details the characteristics of family migration in the floating population at the provincial level. The vast majority of the floating population migrates to Wuhan as nuclear families (84.42%), and most families comprise 3-4 members (71.44%). The distribution is as follows: (1) The number of nuclear family households in the Wuhan floating population that originates from Hubei Province is 3.1425 million, accounting for 62.85% of the total floating population of Wuhan. Households with 3-4 family members number 2,693,500, accounting for 53.87% of the total. This population therefore carries a high risk of 2019-nCoV transmission, both within families and onward.
(2) Families from Henan, Anhui, Hunan and Jiangxi Provinces make up a large proportion of the floating population of Wuhan. Those from Henan total nearly 300,000 households, of which more than 240,000 have 3-4 family members. Each of the other three provinces contributes approximately 110,000 families. This includes nearly 100,000 households with 3-4 family members from Anhui and more than 80,000 each from Hunan and Jiangxi.
(3) The number of families in the floating population of Wuhan from Chongqing Municipality and Zhejiang, Sichuan, Fujian and Jiangsu Provinces is 40,000-80,000, and the number of households with 3-4 family members is 30,000-60,000.
(4) The number of families in the floating population of Wuhan that originate from 7 other provinces, namely Shandong, Guangdong, Hebei, Gansu, Guangxi, Heilongjiang and Shaanxi, is 10,000-30,000 households, and the number of households with 3-4 family members is approximately 20,000. The remaining provinces comprise fewer than 10,000 households.
In rural areas, several factors make it easy for the virus to spread from households to communities. These include a lack of medical resources and investment, weak health prevention and control, low health awareness, and insufficient awareness of infectious diseases. As shown in Table 5, the floating population in Wuhan is dominated by people with rural household registration (85.14%). Working or doing business is the main reason for their having moved to Wuhan (84.29%). Epidemic prevention and control in rural areas is therefore of critical importance. The distribution is as follows: (1) Members of Wuhan's floating population who originate from Hubei Province and hold rural household registration account for 63.73% of the total, or 3,186,500 people. Those from Hubei Province who migrated to work or do business account for 62.71% of the total, or 3,135,500 people.
(2) Henan, Anhui, Hunan and Jiangxi Provinces each contribute more than 100,000 members to Wuhan's floating population who hold rural household registration and migrated to work. The figure for Henan is approximately 300,000.
(2) Approximately 50-60% of the floating population from Hubei, Sichuan, Hebei, Fujian, Jiangsu, Hunan, Guizhou, Shandong, Shanxi, Tibet and Gansu had been educated to junior high school level or below.
(3) The floating population from the three municipalities of Beijing, Tianjin and Shanghai has a high level of education, with over 66% having received tertiary education. The floating population from the remaining provinces has a medium-to-high level of education.
Overall, these data indicate that there is a large middle-aged and older, high-risk floating population in Wuhan. Its social characteristics include having moved to Wuhan as a nuclear family of 3-4 members, holding rural household registration, and often having a lower level of education. These characteristics are consistent with conditions favouring the wide spread of 2019-nCoV.

Distribution and Social Characteristics of the Origins of Wuhan's Migrants in Hubei Province
According to the foregoing analysis, 75% of Wuhan's floating population hold household registration in Hubei Province, equating to approximately 3.75 million people. Because such a large proportion of Wuhan's floating population originates elsewhere in Hubei Province, the epidemic is less likely to spread across the country. However, all regions in Hubei Province are under tremendous pressure from the spread of the epidemic. We therefore used the 2017 CMDS data to analyse the distribution of the floating population across regions within Hubei Province. Table 6 and Figure 2 present the distribution of the origins of Wuhan's floating population within Hubei Province. The proportion of the floating population gradually decreases from east to west across Hubei Province, and there are great differences between cities. The distribution is as follows: (1) Xiaogan, Wuhan and Huanggang form the first echelon, accounting for 23.4%, 19.6% and 14% of the total, respectively. First, the cross-regional floating population from within Wuhan numbers about 734,000 (95% CI: 658,900 to 813,000). The district- and county-level analysis indicates that members of the floating population from the outlying Huangpi and Xinzhou Districts move into the main urban area of Hankou. The epidemic situation in Huangpi and Xinzhou Districts therefore needs special attention. Second, members of Wuhan's floating population who originate from Xiaogan make up the largest proportion, approximately 878,000 people (95% CI: 798,100 to 962,500). Members who originate from Huanggang make up the third-largest proportion, approximately 525,000 people (95% CI: 460,600 to 595,800).
(2) The three Directly Managed by Province (DMP) cities (Xiantao, Qianjiang and Tianmen), taken together, and Jingzhou form the second echelon. Each accounts for approximately 330,000 people, or approximately 9% of Wuhan's floating population from Hubei Province (95% CI: 280,000 to 390,000).
(4) Xiangyang, Ezhou, Yichang, Enshi, and Shiyan belong to the fourth echelon, accounting for less than 3% of the floating population of Wuhan, equating to fewer than 100,000 people.
Overall, Wuhan's outlying districts together with the surrounding cities of Xiaogan and Huanggang and the three DMP cities are the origins of the largest share (66%) of Wuhan's floating population from Hubei Province, equating to approximately 2.475 million people. We used district- and county-level variables to estimate the floating population within Hubei Province, and Table 7 presents the results. The survey covered 94 districts and counties, including Huangpi, Xinzhou, Jiangxia, Caidian and Hannan in Wuhan, as well as cross-region active migrants in some major urban areas.
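The 66% figure is a sum of the echelon shares, applied to the roughly 3.75 million migrants of Hubei origin. The quick arithmetic check below assumes that the three DMP cities together account for about 9%, which is what the 66% total implies.

```python
HUBEI_ORIGIN_TOTAL = 3_750_000  # 75% of the assumed 5 million migrants

# Shares of the Hubei-origin floating population from the first and second
# echelons; treating 9% as the combined share of the three DMP cities is an
# assumption implied by the 66% total.
core_shares = {
    "Xiaogan": 0.234,
    "Wuhan (outlying districts)": 0.196,
    "Huanggang": 0.14,
    "DMP cities (combined)": 0.09,
}

for origin, share in core_shares.items():
    print(f"{origin}: {share * HUBEI_ORIGIN_TOTAL:,.0f}")

combined_share = sum(core_shares.values())
print(f"combined: {combined_share:.0%} = {combined_share * HUBEI_ORIGIN_TOTAL:,.0f} people")
```

The individual products (for example, 0.234 × 3.75 million ≈ 877,500 for Xiaogan) also serve as a quick consistency check on the city-level counts.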
The top 10 districts and counties of Hubei Province in terms of floating population are Huangpi, Hanchuan, Xiantao, Xinzhou, Hong'an, Yunmeng, Honghu, Macheng, Xiaonan, and Xiaochang. That is, ≥100,000 people from each of these districts and counties are part of the floating population of Wuhan, with the top 3 districts and counties, Huangpi, Hanchuan and Xiantao, having ≥200,000 people in Wuhan's floating population.
The remaining districts and counties have fewer than 10,000 people in Wuhan's floating population.
In general, the members of Wuhan's floating population from Hubei Province are concentrated in certain districts and counties. The top 30 districts and counties together account for more than 80% of this population, and the distribution shows a clear exponential decline.
We then analysed the social characteristics of the migrants from Hubei Province by age, type of migration, number of migrating family members, type of household registration, and reason for travelling to Wuhan.
Table 8 shows the following numbers of people in the susceptible and high-risk groups aged over 40. There are approximately 300,000 from Xiaogan, approximately 180,000 from Wuhan (cross-region migration), and approximately 150,000 from Huanggang. There are also approximately 100,000 each from the DMP cities and Jingzhou, and 30,000-50,000 from Jingmen, Suizhou, Xianning and Huangshi. Fewer than 30,000 people in this age group from each of Xiangyang, Ezhou, Yichang, Enshi and Shiyan have moved to Wuhan.
Table 8 also details the migration characteristics of Wuhan's floating population from Hubei Province. Migration with a nuclear family is the main pattern, accounting for nearly 80% of the total, or 2.985 million households. The proportion of households with 3-4 family members is approximately 67%, or 2.53 million households. Specifically, 740,000 nuclear families originate from Xiaogan, 400,000-600,000 from the inner suburbs of Wuhan and from Huanggang, and approximately 260,000 from the DMP cities and Jingzhou. More than 100,000 nuclear families originate from Jingmen, Suizhou, Xianning and Huangshi, while fewer than 100,000 originate from Xiangyang, Ezhou, Yichang, Enshi and Shiyan. The distribution of households with 3-4 members is similar to that of nuclear families.
Table 8 also presents the household registration and reasons for migration of Wuhan's floating population from within Hubei Province. Rural household registration accounts for 83%, equating to approximately 3.12 million people. The proportion who were working or doing business in urban areas is 77%, or 2.89 million people. The distribution by city is similar to that of the migration types and other variables above and is not reported here.
According to Table 8, Wuhan's floating population from Hubei Province has a higher overall educational level than the national level. Approximately 52% were educated to junior high school level or below, approximately 29% to high school/secondary school level, and approximately 19% to college level and above. The cities surrounding Wuhan contribute large numbers of people to its floating population, namely Xiaogan, Huanggang, Huangshi, Suizhou, the DMP cities, Xianning and Ezhou. Among those from these cities, more than 50% have an educational level of junior high school or below, rising to more than 60% in Xiaogan. Awareness of health protection and of the need for timely treatment may therefore be low in this part of the floating population, heightening the risk of large-scale transmission of 2019-nCoV.
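The top-30 statistic is a cumulative share: sort the district counts in descending order and divide the sum of the first k by the total. The function name below is hypothetical. The counts in the usage example are hypothetical placeholders, not values from Table 7.

```python
def top_k_cumulative_share(counts, k):
    """Fraction of the total contributed by the k largest counts."""
    if k < 0:
        raise ValueError("k must be non-negative")
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    return sum(ordered[:k]) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical district counts (illustration only, not Table 7 values).
    counts = [250_000, 210_000, 120_000, 60_000, 30_000,
              15_000, 8_000, 4_000, 2_000, 1_000]
    for k in (1, 3, 5):
        print(f"top {k}: {top_k_cumulative_share(counts, k):.1%}")
```

Applied to the 94 district and county counts in Table 7 with k = 30, this would give the "more than 80%" figure.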

Prediction of Epidemic Trends within Hubei Province
The floating population in Wuhan may serve as a useful predictor of the trend of the 2019-nCoV outbreak. The Pearson's correlation coefficient between two quantities increased from 0.65 on 25 January 2020 to 0.84 on 31 January 2020 (Table 9). The first is the proportion of Wuhan's floating population originating from each region of Hubei; the second is the number of confirmed 2019-nCoV cases in that region. Regions that contribute more people to Wuhan's floating residential population therefore tend to have more confirmed cases.
Table 9 notes: … Table 6); 3 Ratio = confirmed cases (on 31 January 2020) / floating population from Wuhan (unit: 10,000 people); 4 DMP (Directly Managed by the Province) cities include Xiantao, Qianjiang and Tianmen; 5 the Pearson's correlation coefficient is calculated from the number of members of the floating population in Wuhan and the number of confirmed cases per day.
We assumed that the effect of the floating population on the transmission of 2019-nCoV is consistent across Hubei Province. We selected three prefectures that contribute the greatest number of people to the floating population of Wuhan (Xiaogan, Huanggang and Jingmen) as reference prefectures to predict the epidemic trend of 2019-nCoV at the prefecture level. The prefectures can be divided into three groups since 28 January 2020 (
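Notes 3 and 5 of Table 9 define two quantities. One is the ratio of confirmed cases per 10,000 migrants from a region; the other is the Pearson correlation between migrant numbers and confirmed cases across regions, computed day by day. The dependency-free sketch below shows both under hypothetical function names. The inputs are hypothetical placeholders, not Table 9 values.

```python
import math


def cases_per_10k(confirmed_cases, migrants):
    """Confirmed cases per 10,000 members of Wuhan's floating population."""
    return confirmed_cases / (migrants / 10_000)


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical regions (not Table 9 values): migrants from each region
    # and that region's confirmed cases on two consecutive days.
    migrants = [800_000, 500_000, 300_000, 100_000, 50_000]
    daily_cases = {
        "day 1": [120, 60, 70, 20, 10],
        "day 2": [300, 180, 110, 40, 15],
    }
    for day, cases in daily_cases.items():
        ratios = [round(cases_per_10k(c, m), 2) for c, m in zip(cases, migrants)]
        print(day, ratios, round(pearson(migrants, cases), 3))
```

Recomputing the coefficient for each day's case counts is what produces a rising series such as 0.65 to 0.84.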

Prediction of Epidemic Trends outside Hubei Provinces
The floating population of Wuhan originated from outside Hubei Province may have promoted the spread of 2019-nCoV. Table 10 compares the number of individuals travelling from Wuhan to other provinces and the daily number of confirmed cases for those other provinces. Analysis revealed that the correlation coefficient at the provincial level was lower than at the prefecture level within Hubei Province, but the correlation coefficient increased from 0.4 on 25 January 2020 to 0.63 on 31 January 2020.  Table 10 also shows the ratio of confirmed cases in each province to the proportion of people in the floating population in Wuhan who originate from each of these provinces, on 28 January 2020. We divide provinces into two categories based on their short-term travel populations in Wuhan, and Wuhan's travelling population to other provinces during the Spring Festival holiday. The first category comprises those provinces that have large-scale short-term business trips or tourist populations in Wuhan during the Spring Festival holiday, namely Beijing, Shanghai, Tianjin, and Hainan. Obviously, such a high level of inter-provincial population mobility may exacerbate the spread of 2019-nCoV. For example, the high ratio of confirmed cases in Guangdong Province may be due to the large short-term travel populations visiting Shenzhen and Guangzhou and Wuhan, while the high ratio of confirmed cases in Hainan Province may result from the outbound tourist population from Wuhan to Hainan during the Spring Festival holiday. In Table 10, the results are divided into two parts: the correlation coefficient of the first category of provinces, which reaches a maximum of 0.96, and the correlation coefficient of the second category of provinces, which increased from 0.56 to 0.7. This abovementioned second category comprise the other 25 provinces that have small short-term business trip groups or tourist populations in Wuhan during the Spring Festival holiday. 
We assumed that the effect of the floating population on the spread of 2019-nCoV was consistent across the country. Based on the growth in confirmed cases since 25 January 2020 (Figure 4), we divided the other 25 provinces into three groups: (1) provinces with a rapid increase in the number of confirmed cases, namely Zhejiang, Shandong, Guangxi, Shaanxi, Liaoning, and Yunnan; (2) provinces with a moderate increase, namely Hunan, Chongqing, Sichuan, Fujian, Jiangsu, Hebei, Gansu, Heilongjiang, Shanxi, Guizhou, Qinghai, Jilin, Xinjiang, Inner Mongolia, Tibet, and Ningxia; and (3) provinces with a small increase, namely Henan, Anhui, and Jiangxi. In Table 10, if Henan and Zhejiang are excluded from the second category, the correlation coefficient on 31 January 2020 is 0.93.
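The sensitivity check described above, recomputing the correlation after leaving out Henan and Zhejiang, can be sketched as follows. This again assumes Pearson's r; the function `correlation_excluding` and the toy regions `X` and `Y` are hypothetical, chosen so that one region has many migrants but few cases and another the reverse.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def correlation_excluding(migrants, cases, excluded=()):
    """Pearson r between migrant counts and confirmed cases, leaving out the named regions."""
    skip = set(excluded)
    kept = [p for p in sorted(migrants) if p not in skip]
    return pearson_r([migrants[p] for p in kept], [cases[p] for p in kept])


# Hypothetical toy data: "X" has many migrants but few cases, "Y" few migrants but many cases.
migrants = {"A": 100, "B": 200, "C": 300, "D": 400, "X": 900, "Y": 150}
cases = {"A": 10, "B": 20, "C": 30, "D": 40, "X": 15, "Y": 80}

r_all = correlation_excluding(migrants, cases)
r_without_outliers = correlation_excluding(migrants, cases, excluded=("X", "Y"))
```

Removing two regions that deviate in opposite directions can swing the coefficient sharply (here from negative to 1), which is why such exclusions are best reported alongside the full-sample value.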
We selected four provinces (Henan, Hunan, Sichuan, and Zhejiang) as reference provinces to predict the epidemic trend of 2019-nCoV in each province, and found the following. (1) The epidemic growth model of Henan Province does not fit the situation in most other provinces: except in Anhui and Jiangxi, the actual number of confirmed cases in other provinces was higher than that predicted by the Henan model. As Henan, Anhui, and Jiangxi have large floating populations in Wuhan, the slow increase in the number of confirmed cases in these three provinces may result from effective measures taken to control the spread of 2019-nCoV, or from a lack of sufficient diagnostic capacity to detect suspected cases. (2) The epidemic growth model for Hunan and Sichuan Provinces predicts a rapid increase in the number of confirmed cases in Henan, Anhui, and Jiangxi Provinces. Thus, if the epidemics in Hunan and Sichuan follow a typical evolutionary pattern, the current numbers of confirmed cases in Henan, Anhui, and Jiangxi may be greatly underestimated. For example, the predicted number of confirmed cases in Henan on 31 January 2020 was between 860 and 889, but the officially announced number was only 168. In contrast, the numbers of confirmed cases in Zhejiang, Shandong, Guangxi, Shaanxi, Liaoning, and Yunnan Provinces were higher than predicted, which may reflect uncontrollable local factors that need further investigation.
(3) The epidemic growth model for Zhejiang Province predicts a rapid increase in the number of confirmed cases in most provinces, especially Jiangsu and Fujian, which are adjacent to Zhejiang. It is important to investigate why Zhejiang has so many confirmed cases, and whether outbreaks in Jiangsu and Fujian Provinces have gone undetected or the expected cases have simply not yet appeared.
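The reference-province comparison can be illustrated under a simple proportional assumption, in line with the assumption that the effect of the floating population is consistent across the country: a target province's expected cases on each day equal the reference province's cases scaled by the ratio of the two provinces' migrant counts in Wuhan. The exact form of the growth models is not given in this section, so this is a hypothetical sketch; `predict_from_reference`, `predicted_range`, and every number below are illustrative assumptions, not the study's data. Using two reference provinces yields a range, analogous to the 860 to 889 range reported for Henan.

```python
from typing import Sequence


def predict_from_reference(ref_cases: Sequence[float], ref_migrants: float,
                           target_migrants: float) -> list[float]:
    """Scale a reference province's daily cumulative confirmed cases to a target province.

    HYPOTHETICAL model: cases are assumed proportional to the number of Wuhan migrants,
    with the same per-migrant rate as in the reference province.
    """
    scale = target_migrants / ref_migrants
    return [c * scale for c in ref_cases]


def predicted_range(references, target_migrants: float) -> tuple[float, float]:
    """Min and max latest-day prediction across several (cases, migrants) reference pairs."""
    latest = [predict_from_reference(cases, migrants, target_migrants)[-1]
              for cases, migrants in references]
    return min(latest), max(latest)


# Illustrative toy inputs only.
ref_a = ([10, 20, 40, 80], 50_000)    # reference province A: daily cumulative cases, migrants in Wuhan
ref_b = ([12, 25, 50, 100], 60_000)   # reference province B
target_migrants = 150_000             # target province's migrants in Wuhan
target_actual = [12, 18, 25, 31]      # target province's reported cumulative cases

low, high = predicted_range([ref_a, ref_b], target_migrants)
# Reported count below every reference-based prediction suggests possible under-ascertainment.
underestimated = target_actual[-1] < low
```

The comparison runs in both directions: a reported count below the range points to possible under-ascertainment (as argued for Henan, Anhui, and Jiangxi), while one above it flags a province, such as Zhejiang, whose growth the migrant size alone does not explain.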
Overall, the epidemic pattern predicted from Hunan and Sichuan Provinces fits the actual trend of the 2019-nCoV outbreak best. However, the current numbers of confirmed cases in Henan, Anhui, and Jiangxi Provinces are likely to be underestimated, especially as these provinces contain extensive rural areas with large populations and limited medical resources. The higher-than-predicted numbers of confirmed cases in Zhejiang, Shandong, Guangxi, Shaanxi, Liaoning, and Yunnan Provinces may be affected by other unknown or uncontrollable random factors that need further investigation.

Discussion
To prevent or mitigate the spread of an emerging infectious disease and its negative effects, public health interventions mainly target three types of population: the population in the source area, the floating population leaving the source area, and the population travelling from the infected area to other areas. The Spring Festival in 2020 fell much earlier than in previous years, at a time when human-to-human transmission of the new coronavirus had only just been recognised. When the Wuhan Municipal Government decided on 23 January 2020 to "close the city" to control the outflow of people, more than 5 million people had already left Wuhan for the Spring Festival holiday, and it was too late to contain the entire potentially infected population in the epidemic area. China's high-speed railway and expressway networks have developed rapidly, allowing people leaving the source area to reach every part of the country quickly, which makes it very difficult to quarantine them at transport stations. In addition, infection is followed by an incubation period, which further complicates screening at transport stations and is an important reason why many cities across the country have implemented "city closure" policies.
After 2019-nCoV was confirmed to be transmissible from human to human, the Chinese government implemented a top-down national mobilisation. It fully investigated and isolated the population of Wuhan, publicised the severity of the epidemic, and raised public awareness of and vigilance towards infectious disease prevention through messages on television, mobile communications, and the Internet. In addition, according to the latest epidemic surveillance, the incubation period of the coronavirus is 3 to 7 days, with an upper limit of 14 days. For this reason, the central government issued an executive order extending the end of the Spring Festival holiday from 30 January to 2 February 2020, and many provinces have required firms not to resume work before 9 February, except those essential to the national economy and people's livelihoods. The extension is intended to stop people from leaving home early to return to work, thereby minimising the risk of renewed spread caused by large-scale population movement.
This study has several limitations. First, our analysis did not include other large mobile populations. One is college students: Wuhan has more than 1 million college students, the largest number of any city in China and the world. Others include short-term business travellers, transit passengers, and tourists. Official media reported that these populations would exceed 30 million during the Spring Festival holiday, and their influence can be seen in the daily numbers of confirmed 2019-nCoV cases. Although few permanent residents of Wuhan hold household registrations from provinces and cities such as Beijing, Shanghai, Tianjin, Hainan, and Guangdong (home to the megacities of Shenzhen and Guangzhou), these provinces and cities still exchange large temporary floating populations with Wuhan because of their large populations and well-developed economies. As a result, the numbers of confirmed cases in these areas are far higher than in most other provinces that have large floating populations in Wuhan.
Second, our sample is subject to some bias. The data on the origins of Wuhan's floating population do not include Hong Kong, Macao, or international migrants, so we cannot estimate the size of the populations travelling to these regions. Cases have already been confirmed in neighbouring Asian countries, Europe, North America, and Australia. Third, owing to limited interdisciplinary research capacity, our model does not incorporate infectious disease models such as the SIR model to further analyse the potential and scale of 2019-nCoV spread, which may limit the value of this research for the prevention and control of 2019-nCoV infections. Finally, the results are mainly applicable up to the end of the Spring Festival holiday; once large numbers of people return to work or study, the spread of the epidemic will become more complicated.
We believe that these limitations can be overcome. Using big data, such as location data from transportation systems and the mobile Internet, short-term floating populations could be included to better estimate the size and flow of the population leaving Wuhan. Unfortunately, we have not yet seen a rigorous study that uses big data to analyse population outflow from the epicentre of an epidemic, which suggests that there is still a long way to go in the research and application of big data in national and global public health.

Conclusions
At the time of writing (29 January 2020), all provinces in China had reported confirmed or suspected cases of 2019-nCoV, every prefecture and city in Hubei Province had confirmed cases, and transmission of 2019-nCoV had shifted from imported cases to inter-regional spread. Because more than 5 million migrants had left Wuhan before the city was closed, our research reveals a high correlation between the size of Wuhan's floating population and the number of confirmed cases. Fortunately, the origins of Wuhan's floating population are highly concentrated in Hubei Province and its surrounding provinces: migrants with Hubei household registrations account for 75%, and more than 80% of these migrants come from the top 30 districts and counties. This means that some areas face a very high risk of outbreaks, but it also allows prevention and control resources to be concentrated in those areas to avoid large-scale spread to other regions.
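The concentration statistic cited above, the share of migrants accounted for by the top 30 districts and counties, is a cumulative top-k share. A minimal sketch, assuming per-district migrant counts are available; the function names `top_k_share` and `min_k_for_share` and the six-district toy data are hypothetical:

```python
def top_k_share(counts, k):
    """Fraction of the total contributed by the k largest counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must sum to a positive number")
    return sum(sorted(counts, reverse=True)[:k]) / total


def min_k_for_share(counts, threshold):
    """Smallest number of districts whose combined count reaches `threshold` (a fraction) of the total."""
    total = sum(counts)
    cumulative = 0
    for k, c in enumerate(sorted(counts, reverse=True), start=1):
        cumulative += c
        if cumulative / total >= threshold:
            return k
    return len(counts)


# Hypothetical migrant counts for six districts (illustration only).
district_counts = [5, 50, 2, 30, 10, 3]
share_top2 = top_k_share(district_counts, 2)       # (50 + 30) / 100
k_80 = min_k_for_share(district_counts, 0.8)       # districts needed to reach 80%
```

Applied to the 94 districts and counties in Hubei, the reported result corresponds to the top 30 exceeding the 80% threshold.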
More than 5 million members of Wuhan's floating population have returned to their hometowns; as potential carriers, they may become sources of further transmission. Because of China's urban-rural dual structure, most of these people are rural migrant workers with low levels of education; our results show that 85% of the migrants have rural household registrations. As they frequently work outdoors or overtime and may have poor diets and nutrition, they may be more susceptible to infection. Moreover, most of them travel with 3–4 family members, and susceptible, high-risk people over 40 years old make up a large proportion of this floating population, which creates favourable conditions for the transmission of 2019-nCoV within families. To make matters worse, the rural areas to which these people return have very limited medical and public health services, and gatherings during the Spring Festival increase the risk of community transmission.
The number of confirmed cases of 2019-nCoV continues to increase every day across China. On the one hand, our model analysis indicates that the correlation between the size of Wuhan's floating population and the number of confirmed cases has continued to increase over time; by 28 January, the correlation coefficient in Hubei Province had reached 0.78, which suggests that the size of Wuhan's floating population is an important parameter for predicting the epidemic. On the other hand, the effect of the size of Wuhan's floating population is heterogeneous across regions. Some areas, including Henan, Anhui, and Jiangxi Provinces, Xiaogan City, Jingzhou City, and the three county-level cities directly administered by the provincial government, have large floating populations in Wuhan yet relatively few confirmed cases of 2019-nCoV. We believe that the epidemic in these areas may be underestimated. Given the serious consequences of delayed diagnosis and lapses in infection control for suspected or confirmed cases during the 2003 SARS epidemic, surveillance should be strengthened in these areas to determine why so few cases have been confirmed there.
Author Contributions: L.L. conceived and proposed the research idea; C.F. and C.Y. collected the data; C.F. undertook the main research work, including research methods, data analysis, and manuscript writing. C.F., L.L., W.G., A.Y., C.Y., M.J., M.R., P.X., H.L. and Y.W. participated in draft review, contributed to data interpretation, and approved the final manuscript. All authors have read and agreed to the published version of the manuscript.