Origin Detection During Food-borne Disease Outbreaks – A Case Study of the 2011 EHEC / HUS Outbreak in Germany

The key challenge during food-borne disease outbreaks, e.g. the 2011 EHEC/HUS outbreak in Germany, is the design of efficient mitigation strategies based on a timely identification of the outbreak’s spatial origin. Standard public health procedures typically use case-control studies and tracings along food shipping chains. These methods are time-consuming and suffer from biased data collected slowly in patient interviews. Here we apply a recently developed network-theoretical method to identify the spatial origin of food-borne disease outbreaks. In this setting, the network captures the transportation routes of contaminated foods. The technique only requires spatial information on case reports regularly collected by public health institutions and a model for the underlying food distribution network. The approach is based on the idea of replacing the conventional geographic distance with an effective distance that is derived from the topological structure of the underlying food distribution network. We show that this approach can efficiently identify the most probable epicenters of food-borne disease outbreaks. We assess and discuss the method in the context of the 2011 EHEC epidemic. Based on plausible assumptions on the structure of the national food distribution network, the approach can correctly localize the origin of the 2011 German EHEC/HUS outbreak.


Introduction
Due to intensified mass production, facilitated world-wide shipping and novel food manufacturing methods, food-borne disease outbreaks occur more frequently, with increasing impacts on society, public health institutions, the economy, and the food industry 1. An estimated 60% of annual gastrointestinal illnesses for each adult in the general population of the United States is caused by food-borne diseases 2. Moreover, diarrhoea is the second leading cause of morbidity and mortality among children under five years worldwide 3. Food-borne diseases impose an enormous financial burden on health care services, routine surveillance and public health investigations, and trigger substantial productivity impacts and product recalls by the food industry. For seven food-borne pathogens, an annual burden of $6.5-$34.5 billion in the United States alone was estimated 4.
One of the most substantial challenges in this context is determining the spatial origin of the contaminated food vehicle that causes the epidemic, for earlier and more effective disease containment. Several factors make detection of the food-borne disease outbreak origin challenging, e.g. population growth, changing eating habits, globalization of food supply chains, production and processing innovations, and microbiological adaptation 1, 5. Furthermore, public health institutes have limited resources to solve issues such as underreporting and low specificity in the association between aetiology and food vehicle 6. Origin reconstruction is a complex problem because the effects of contaminated food typically occur with a significant time lag and incidence patterns are geographically incoherent. Additionally, specific transport pathways are generally not monitored. More importantly, food distribution networks are multi-scale, spanning length scales of hundreds to thousands of kilometers and delivering to and within spatially heterogeneous populations. Consequently, it is generically impossible to estimate the geographic origin of the phenomenon based on geometric aspects of the spatial distribution of reported cases. Public health investigations identified evidence concerning the infection source for only 66% of the outbreaks 7. These practical difficulties were particularly striking during the German 2011 EHEC (enterohemorrhagic Escherichia coli) outbreak, which affected 3,842 people with unusually high rates of severe HUS (hemolytic-uremic syndrome) cases and mortality. The EHEC/HUS outbreak raised awareness of timely and efficient origin reconstruction methods and their importance to society, public health institutions, risk assessment authorities and the food industry 2.
There is no general procedure for food-borne disease outbreak investigations that fits a particular event perfectly. However, the World Health Organization (WHO) 8 provides practical standard guidelines for the investigation and control of food-borne disease outbreaks as a multidisciplinary task that requires information from many sources. First, an unusual accumulation of disease reports has to be detected and defined as an outbreak. After pathogen specification, initial cases are investigated with regard to common factors, and clinical and food specimens are sampled. The corresponding microbiological 'fingerprinting' of strains may also identify case relatedness and/or potential sources of contamination. From associated food and environmental samples, backward tracings are initiated to determine the origin. Furthermore, a case definition can be established to identify outbreak-related cases and to collect their information on a standardized questionnaire. Using these data, analytical investigations, such as case-control and cohort studies, are performed to test hypotheses about the transmission vehicle and origin. The outbreak source is determined by combining all collected information; otherwise, further analytical studies are required. Finally, the potential origin and transmission routes are verified using forward tracings from contamination to the outbreak cases. Several attempts to improve traceability of food products to their geographical origin have been developed, including technical innovations 9, microbiological advances 10, or food forensics 11. However, detection of outbreak origin remains time-consuming and cost-intensive.
Network theory and network models have become the most important tools for understanding and predicting epidemics in general 12, 13, 14. The majority of studies focus on spatial disease dynamic systems in which networks quantify the coupling strength or transportation fluxes between spatially distributed populations. Almost all studies aim at understanding and forecasting the future time course of an epidemic based on the topological connectivity of the underlying transport networks 15, 16. Furthermore, most studies focus on human-to-human transmissible diseases. Little work has been done, however, on the inverse problem, also known as the 'zero patient' problem in epidemics. Shah and Zaman 17, 18 developed a universal source detection maximum likelihood estimate, which assumes virus spread in a general graph along a breadth-first-search tree, and derived theoretical thresholds for the detection probability. Pinto et al. 19 [...] However, these methods require comprehensive knowledge of the transmission network, which is rarely the case.
Here we apply a recently developed network-geometric approach for epicenter reconstruction 25 to food-borne diseases. This approach is based on a plausible redefinition of spatial separation and the introduction of an effective distance derived from the underlying food distribution network, in combination with viewing the contagion process from the perspective of a specific node in the network. Using the effective distance method, complex spreading patterns can be mapped onto simple, regular wave propagation patterns if and only if the actual outbreak origin is chosen as the reference node. In this way, the method can determine the correct outbreak origin based on the degree of regularity of the measured prevalence distribution when viewed from the effective distance perspective. This reconstruction is successful without knowledge of the detailed infection hierarchy. Here, the network captures the transportation of the contaminated food rather than the mobility pattern of humans.
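The effective-distance idea can be illustrated with a minimal sketch. It assumes the form commonly used in the effective-distance literature, in which a single step from node m to node n has length 1 - ln P, where P is the fraction of flux leaving m that arrives at n, and the effective distance between two nodes is the length of the shortest such path. The function names, the matrix layout `p[n][m]` and the ranking criterion (a case-weighted mean effective distance) are assumptions made for illustration only; the criterion is a simple stand-in and not necessarily the regularity measure used in this study.

```python
import heapq
import math


def effective_distances(p, origin):
    """Shortest-path effective distance from `origin` to every node.

    p[n][m] is the assumed fraction of flux leaving node m that arrives
    at node n; a single step m -> n is given the length 1 - ln(p[n][m]).
    """
    size = len(p)
    dist = [math.inf] * size
    dist[origin] = 0.0
    heap = [(0.0, origin)]
    while heap:
        d, m = heapq.heappop(heap)
        if d > dist[m]:
            continue  # stale heap entry
        for n in range(size):
            if n != m and p[n][m] > 0:
                cand = d + 1.0 - math.log(p[n][m])
                if cand < dist[n]:
                    dist[n] = cand
                    heapq.heappush(heap, (cand, n))
    return dist


def rank_origins(p, cases):
    """Rank candidate origins, most plausible first.

    Stand-in criterion (an assumption): the case-weighted mean effective
    distance from the candidate to all affected nodes, smaller is better.
    Expects at least one reported case.
    """
    total = sum(cases)
    scores = []
    for origin in range(len(p)):
        dist = effective_distances(p, origin)
        mean = sum(c * d for c, d in zip(cases, dist) if c > 0) / total
        scores.append((mean, origin))
    return [origin for _, origin in sorted(scores)]
```

In a three-node chain where most cases are reported at one end, that end is ranked first, because all affected nodes are effectively close to it.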

German EHEC O104:H4/HUS outbreak 2011
Regarding the number of severe HUS cases, the 2011 EHEC/HUS outbreak in Germany has been the largest E. coli outbreak reported worldwide. Between May 2 and July 26, 2011, 3,842 outbreak-associated EHEC cases were reported to the Robert Koch-Institute (RKI), the German Federal Public Health and Surveillance Institute. These included 855 severe HUS cases (22.3%), and 53 patients (1.4%) died. The outbreak was caused by a rare serotype, O104:H4, which infected predominantly adults (median age, 43 years), particularly women (68%), and resulted in high HUS and mortality rates 26. In previous years, between 925 and 1,283 cases were reported annually, mostly in children. The majority of the infection cases were observed in Northern Germany, which resulted in a higher incidence (number of cases per 100,000 inhabitants) for the corresponding districts than the overall incidence for Germany (see Fig. 1). Extensive investigations were conducted by the Task Force EHEC, which included a matched case-control study, a recipe-based restaurant cohort study, and backward and forward tracings 27. The entire process was complicated, resource-demanding and time-consuming. All investigations required a large amount of data that are typically biased, incomplete, erroneous, and sometimes contradictory.
The tracings require a large number of trained personnel, and their success depends on the results of epidemiological studies. Only the combination of several study designs finally led to the determination of sprouts as the transmission vehicle and the identification of their origin, a farm in Bienenbüttel located in the district of Uelzen, Lower Saxony. On June 10, 38 days after outbreak onset, the public was advised to avoid sprout consumption and the responsible production farm was closed.

Network-theoretic origin detection
We consider a model network for spatial food distribution, where nodes represent administrative districts in Germany. Links quantify the amount of goods that are shipped from one node to another per unit time. Note that in the following, we let [...]. For what follows, only the relative flux fractions [...] (1) are required to specify the network. These quantities can be interpreted as an effective coupling between two districts that is induced by the food distribution between them. We consider them as a proxy from which spreading propensities between districts can be derived.
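As a minimal sketch of the relative flux fractions, one may normalize each link by the total flux leaving its source node. The normalization direction, the matrix layout `w[n][m]` (amount shipped from node m to node n), the exclusion of self-links and the function name `flux_fractions` are assumptions made here for illustration.

```python
def flux_fractions(w):
    """Normalize a flux matrix by the total outgoing flux of each source.

    w[n][m] is the (assumed) amount shipped from node m to node n.
    Returns p with p[n][m] = w[n][m] / sum_k w[k][m], i.e. the share of
    everything leaving m that arrives at n. Self-links are ignored.
    """
    size = len(w)
    p = [[0.0] * size for _ in range(size)]
    for m in range(size):
        out_total = sum(w[k][m] for k in range(size) if k != m)
        if out_total == 0:
            continue  # node m ships nothing: leave its column at zero
        for n in range(size):
            if n != m:
                p[n][m] = w[n][m] / out_total
    return p
```

With this convention every column of the result sums to one for a node that ships anything at all, so only relative and not absolute shipping volumes enter the subsequent analysis.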
Because precise measurements of food distribution pathways are not available, we consider an established, approximate heuristic from the social sciences, economics and transportation theory known as the gravity model 30, 31. This approach accounts for the observation that traffic flow between locations increases monotonically with their population sizes and decreases algebraically with distance, leading to the relationship [...] (2). Plausible choices for these parameters can be found in the following way: First, we assume that the coupling strength between two locations increases with the number of connections that can be formed between elements of the populations. This implies that [...]. Additionally, the coupling strength should be proportional to a mean value of the origin and destination population sizes, while leverage by large population nodes should be attenuated. Accounting for this, we choose the geometric average [...] (3). Furthermore, we let the coupling strength decrease with distance. The corresponding tail exponent is consistent with quantitative assessments of human mobility and transportation networks 34, 35, i.e. [...] (4). Finally, we fix the scale parameter (in km) in Eq. (2) to be of the order of the average linear extent of a district. With these assumptions, the parameters in the gravity model are [...] and [...] km.
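A gravity-type coupling of this kind can be sketched as follows. The functional form (population product raised to exponents, divided by an algebraically decaying distance term), the tail exponent `gamma=2.0`, the scale `d0=30.0` km and planar coordinates in km are all hypothetical choices made for illustration and are not the values used in this study; only the exponents of 0.5, which give the geometric average of the two population sizes, follow directly from the text.

```python
import math


def gravity_weights(pop, coords, alpha=0.5, beta=0.5, gamma=2.0, d0=30.0):
    """Gravity-model link weights between all pairs of districts.

    Assumed form: w[n][m] = pop[n]**alpha * pop[m]**beta / (1 + d/d0)**gamma,
    where d is the planar distance (km) between districts n and m.
    alpha = beta = 0.5 gives the geometric average of the populations;
    gamma and d0 are placeholders, not values taken from the paper.
    """
    size = len(pop)
    w = [[0.0] * size for _ in range(size)]
    for n in range(size):
        for m in range(size):
            if n == m:
                continue  # no self-coupling
            dx = coords[n][0] - coords[m][0]
            dy = coords[n][1] - coords[m][1]
            d = math.hypot(dx, dy)
            w[n][m] = (pop[n] ** alpha) * (pop[m] ** beta) / (1.0 + d / d0) ** gamma
    return w
```

Because the population exponents are equal, the resulting weights are symmetric, and every pair of districts receives a non-zero weight, which is why a subsequent sparsification step is needed.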

PLOS Currents Outbreaks
Although we choose these parameter values as base values, we also investigated the robustness of our results against variations in the exponents and found the results to be quite robust.
The gravity model generates a fully connected network with strongly heterogeneous weights, in contrast to realistic mobility or transportation networks, which possess a sparse topology. In order to obtain a more realistic model for food distribution that exhibits topological sparseness of connections, we follow a procedure recently introduced by Serrano et al. 36. The idea of this approach is that only those links are retained that are statistically significant with respect to a random null model in which traffic is distributed uniformly among the links of a node.
Following this idea, we first compute the flux fraction of each link, Eq. (4), and then test every link against the null model. This approach yields a network skeleton of statistically significant links. Following this procedure, the resulting network has an overall connectivity of 18%, see Fig. 2B.
One of the characteristic features of transportation networks in general, which is also captured by the above gravity model, is their multiscale structure. Although short-range links are usually strongest, the algebraic tail in Eq. (2) yields long-range connections that can dominate spreading phenomena evolving on these networks.

Qualitatively, this is illustrated in Fig. 3A, which depicts a simple planar quasi-lattice network in which every node is connected only to its spatially adjacent nodes. Additionally, a few long-range, random connections are added. Because of the long-range connections in the network, an initially localized spreading process quickly attains a spatially incoherent structure. As a consequence, it is no longer possible to predict with ordinary diffusion when a spreading process will arrive at a given location in the network. More importantly, it is difficult to reconstruct the outbreak origin from a snapshot (or a sequence of snapshots) of the spatio-temporal pattern of spread alone, based on conventional planar distance measures and two-dimensional geometry.
Effectively, two nodes that are connected by a long-range link in a multiscale network system are more adjacent than their spatial distance would suggest. Based on this basic and intuitive insight, a recent study 25 introduced the concept of effective distance to network-driven contagion or spreading phenomena. The most important result of this study is that spatio-temporally complex patterns of spreading can be mapped onto simple, regular wave front patterns when conventional distance is replaced by a suitably chosen effective distance. This not only permits calculations of arrival times at any node in the network but, more importantly, the identification of outbreak origins, as will be explained in more detail below. The effective distance approach has been shown to work in the context of infectious disease dynamics on a global scale.
The effective distance method assumes that, irrespective of the details of the local dynamics of a spreading process, the proliferation of the contagion throughout the network is determined by the coupling between nodes, and that this coupling is quantified by the elements of the flux matrix. Given an initial outbreak location, a contagion process can take a multitude of paths to any other node in the network, and each path is taken with a certain probability. Consider a path that starts at the outbreak location and ends at a given node, with a sequence of intermediate steps at other nodes, Eq. (7). The probability of the contagion process taking this path is assumed to be given by the product of the probabilities of each step, Eq. (8). Here, for every link in the network, the single-step probability is the probability that contaminated food at one node is moved to the adjacent node. The fundamental assumption in Brockmann and Helbing 25 is that the single-step probability is identified with the flux fraction that is determined by the underlying transportation network, Eq. (9).

Then, we define the effective length of a multi-leg path by Eq. (10), in terms of the number of links composing the path and the corresponding path probability. For the sake of motivation and interpretation, we can decompose the path length into contributions by direct links, Eq. (11).

Here, the effective length of a direct link is given by Eq. (12). This relation establishes a connection between network topological features and effective distance. The functional form is chosen such that a number of important features are fulfilled: (i) The length of a link decreases with increasing transition probability. That is, for large values of the transition probability the effective length is small, and for vanishing transition probability the effective length diverges. (ii) The effective length of a multistep path as defined in Eq. (7) is the sum of the effective lengths of each segment in the path. (iii) Given two paths that occur with certainty (e.g. with probability one for each link), but have a different number of segments, the path that has more segments also has a larger effective length.
Generically, transportation networks are strongly heterogeneous such that, in an ensemble of paths with a given origin and destination, the dynamics are dominated by the most probable path and therefore the path of minimum effective length 25. The effective distance is defined as the minimum effective length of a path from origin to destination, Eq. (13).
From the perspective of a chosen root or reference node, one can compute the shortest path tree, which is the collection of shortest effective paths to all other nodes in the network. This shortest path tree is equivalent to the most probable contagion hierarchy that a spreading process will take through the network.
Fig. 3B illustrates the advantages of this approach in an artificial multi-scale network. From the perspective of the outbreak origin, the shortest path tree of the root node is shown, and the radial distance in the new map corresponds to the effective distance from the root node to the remaining nodes in the network. The same spreading process that appears to be spatio-temporally complex in the conventional metric layout is equivalent to a regular, constant-speed spreading wave in the effective distance representation. Consequently, one can calculate arrival times based on effective distance alone. In fact, in Brockmann and Helbing 25 it was shown that effective distance from the outbreak origin and arrival time strongly correlate in real scenarios, e.g. the 2003 SARS epidemic and the 2009 H1N1 pandemic influenza outbreak.
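For reference, the definitions introduced in this section can be collected in one place. The following is a sketch in the notation commonly used for the effective distance framework of Ref. 25; the symbol names ($F_{mn}$, $P_{mn}$, $\Gamma$, $\lambda$, $d_{mn}$, $D_{mn}$) are assumptions made here and may differ from the original typesetting.

```latex
\begin{align}
P_{mn} &= \frac{F_{mn}}{\sum_{k} F_{kn}}
  && \text{flux fraction from node } n \text{ to node } m \\
\Gamma &= \{ n_0, n_1, \dots, n_L \}
  && \text{path with } L \text{ links} \\
F(\Gamma) &= \prod_{i=1}^{L} P_{n_i n_{i-1}}
  && \text{path probability} \\
\lambda(\Gamma) &= L - \log F(\Gamma) = \sum_{i=1}^{L} d_{n_i n_{i-1}}
  && \text{effective length of a path} \\
d_{mn} &= 1 - \log P_{mn}
  && \text{effective length of a direct link} \\
D_{mn} &= \min_{\Gamma} \lambda(\Gamma)
  && \text{effective distance from } n \text{ to } m
\end{align}
```

In this form the three properties listed above can be checked directly: $d_{mn}$ decreases with $P_{mn}$ and diverges as $P_{mn} \to 0$, the logarithm turns the product of step probabilities into a sum of link lengths, and the constant offset of one per link makes a certain path with more segments longer.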
The most relevant consequence of the effective distance approach is that the pattern exhibits a regular concentric wave front structure only from the perspective of the actual outbreak origin. From the perspective of any other node in the network, the pattern exhibits a more or less disordered structure. Fig. 3C illustrates this.
The panels depict the same dynamics as in the other panels, but from the perspective of a randomly chosen reference node. Clearly, any spatial regularity is absent. One can now make use of this observation, i.e. the fact that the spreading pattern is regular only from the perspective of the actual outbreak location, to reconstruct the outbreak origin.
Given a snapshot of the disease spread, e.g. the disease incidence at every node, one computes the effective distance perspective for each node in the network and quantifies from which node the pattern appears to be most regular. The node with maximum regularity is considered to be the most likely outbreak origin. In the following we apply this approach to the 2011 EHEC/HUS outbreak in Germany.
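Computing the effective distance perspective of a node amounts to a shortest-path search in which every link is assigned the length one minus the logarithm of its flux fraction. The following is a minimal sketch using Dijkstra's algorithm, assuming a dense matrix `P` whose entry `P[m, n]` is the flux fraction from node `n` to node `m`; the function name and the matrix convention are illustrative assumptions.

```python
import heapq
import numpy as np

def effective_distances(P, root):
    """Effective distance from `root` to every node, plus the shortest path tree.

    P[m, n] is the assumed flux fraction from n to m. A link has effective
    length 1 - log(P[m, n]), and lengths add up along a path.
    Returns (dist, parent); parent[m] is the predecessor of m in the tree.
    """
    n_nodes = P.shape[0]
    dist = np.full(n_nodes, np.inf)
    parent = np.full(n_nodes, -1)
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, n = heapq.heappop(heap)
        if d > dist[n]:
            continue                      # stale heap entry
        for m in np.nonzero(P[:, n])[0]:  # nodes reachable from n in one step
            cand = d + 1.0 - np.log(P[m, n])
            if cand < dist[m]:
                dist[m] = cand
                parent[m] = n
                heapq.heappush(heap, (cand, m))
    return dist, parent
```

Calling the function once per node and stacking the resulting `dist` vectors as columns gives a matrix of effective distances from every candidate origin to every other node.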

Detection of the German EHEC/HUS outbreak origin
Given the gravity model network for food transportation, we first compute the shortest path tree for every potential root node, see Fig. 4 for examples. Next, a temporal snapshot of the EHEC incidence pattern is analyzed in each of the shortest path tree representations, i.e. from the perspective of all network nodes as potential candidate origins of the outbreak. The incidence pattern typically consists of a subset of nodes with nonzero incidence. From the perspective of the actual outbreak origin, the effective distance to these affected nodes should be small and exhibit a small variance, a consequence of the concentricity of the spreading pattern in the effective distance representation. Therefore, in order to quantify the regularity of the incidence pattern from every potential outbreak origin, we compute the average and standard deviation of effective distances to the nodes with nonzero incidence (the subset of affected nodes) 25.
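The two statistics can be computed for all candidate origins at once. The sketch below assumes a precomputed matrix `D` with `D[m, n]` the effective distance from candidate origin `n` to node `m`, and it weights every affected node equally; whether affected nodes should instead be weighted by their case counts is not specified here, so the unweighted form is an assumption of this example.

```python
import numpy as np

def concentricity(D, incidence):
    """Mean and standard deviation of effective distance to affected nodes.

    D[m, n]: effective distance from candidate origin n to node m.
    incidence[m] > 0 marks node m as affected in the snapshot.
    Returns one (mean, std) entry per candidate origin.
    """
    affected = np.asarray(incidence) > 0
    sub = np.asarray(D, dtype=float)[affected, :]  # rows restricted to affected nodes
    return sub.mean(axis=0), sub.std(axis=0)
```

A candidate for which both returned values are small sees the affected nodes on a tight ring close to itself in the effective distance representation.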

Fig. 4: Shortest path trees and effective distance among districts in Germany.
Each column depicts the shortest path tree for a sample root node (red); from left to right: districts Uelzen, Göttingen, and Oberalbkreis. The top row depicts the tree embedded in the conventional geographic representation; the bottom row illustrates the shortest path tree in a layout in which the radial distance is proportional to the effective distance from the root node, in the same way as in Fig. 3. The shortest path tree represents the most probable paths that a contagion process takes from an initial outbreak in the root node.

In combination, small mean and variance are equivalent to high concentricity and thus to a high likelihood that the chosen reference node is the outbreak origin, see Eq. (14).
We used the publicly available E. coli case count data with report dates between calendar weeks 18 and 26 of 2011 28. According to the Task Force EHEC, this corresponds to the entire outbreak duration from May 2nd until July 4th, 2011 26. Fig. 5 shows the results of origin detection when the effective distance approach, in combination with a gravity model for food distribution, is applied to the EHEC incidence data. Since a clustering of E. coli infections was noticed on May 19th, 2011 (outbreak week 3), we computed the mean and standard deviation pair for the weekly snapshots and for every node in the network, treating every node as a potential outbreak origin. When both quantities are small, the resulting spreading pattern is most concentric in the effective distance perspective. Fig. 5 shows that already in week 3 of the event, district Uelzen is identified as the potential origin of the outbreak; this is also true for weeks 6 and 7. In week 5 the method incorrectly identifies district Lüneburg as the likely outbreak origin, and Uelzen ranks third in the epicenter reconstruction.
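To turn the pair of quantities into a ranking of candidate districts, some rule for combining them is needed. One simple possibility, shown here purely as an assumption, is the Euclidean distance of each (mean, standard deviation) pair from the origin of the scatter plot, with both quantities weighted equally; the function name and the tie-breaking by input order are likewise illustrative.

```python
import numpy as np

def rank_candidates(mu, sigma, names):
    """Order candidate origins by the distance of (mu, sigma) from (0, 0).

    mu, sigma: per-candidate mean and standard deviation of effective distance.
    names: labels of the candidates, in the same order.
    Equal weighting of both quantities is an assumption of this sketch.
    """
    score = np.hypot(np.asarray(mu, dtype=float), np.asarray(sigma, dtype=float))
    order = np.argsort(score, kind="stable")  # smallest score first, ties keep input order
    return [names[i] for i in order]
```

Repeating this for each weekly snapshot gives one ranking per week, so that changes in the top candidate over time can be followed.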

Note that the geographic center of district Lüneburg is as close to Bienenbüttel (the alleged location of the contaminated sprouts) as the geographic center of Uelzen (ca. 20 km). Note also that the overall distribution of mean and standard deviation pairs differs considerably for each temporal snapshot of EHEC incidence. Districts close to the actual outbreak location exhibit small values of both quantities. The table in Fig. 6 ranks the candidate outbreak locations for weeks 2 to 8. The ranks were computed by comparing the effective distance to the origin in the scatter plot. For all time windows except weeks 4 and 8, the correct district ranks among the top candidates for the EHEC outbreak origin. Note that other potential outbreak origins are typically districts in close geographic proximity to the actual outbreak location. This implies that even if the origin cannot be identified on the scale of a single district, potential candidates according to the effective distance method are confined to a small region in the vicinity of the actual outbreak location, for instance the set of neighboring districts.

Fig. 7: For each district as a potential outbreak origin, we computed the correlation coefficient between the arrival time at every other node and the effective distance to that node. The magnitude of the correlation coefficient is color-coded from blue to red, corresponding to low and high correlation, respectively. High correlation, corresponding to a high likelihood of being the outbreak origin, is observed in a spatially coherent region in Northern Germany.

The effective distance method also provides an alternative approach to outbreak origin reconstruction. An important result presented in Ref. 25 is that arrival times of a network-driven contagion process correlate strongly with effective distance. In fact, the arrival time of the process at a node increases linearly with the effective distance of that node from the initial outbreak node. Again, arrival time and effective distance only correlate strongly when the actual outbreak origin is chosen as the reference node. To supplement the above analysis, we computed the correlation coefficient of arrival times (i.e. the week of the first reported case of EHEC/HUS in a given district) with effective distance, considering each of the 412 districts as the potential outbreak origin. We then ranked these correlation coefficients. Fig. 7 depicts the magnitude of the correlation coefficient in a map of all German districts. Clearly, this method identifies a well-defined region in Northern Germany as containing the likely outbreak location. Note that, in contrast to the incidence patterns, the correlation coefficient varies smoothly with distance from the epicenter somewhere in Northern Germany. When correlation coefficients are ranked according to magnitude, the correct origin district Uelzen only ranks 30th out of 412 districts. However, the difference in correlation coefficients is small among the top-ranked districts, see the table in Fig. 8. The reason for the comparatively low performance of the correlation-based outbreak reconstruction could be that the temporal resolution of the data is too coarse and fluctuations dominate the signal. For instance, travel-related cases could warp the infection pattern. We conclude that outbreak origin reconstruction based on the topological features of the wave front in effective distance, as presented in Fig. 5 and the table in Fig. 6, is a more reliable technique for the detection of the outbreak origin than the correlation approach. Also, the topological approach, unlike the correlation-based approach, requires only single temporal snapshots of incidence, which is an additional advantage.
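The correlation-based ranking can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes a precomputed effective distance matrix, encodes districts without any reported case as NaN, and ranks candidates by the signed Pearson coefficient (largest first), since a positive linear relation is expected for the true origin. The names `correlation_ranking`, `eff_dist` and `arrival_week` are hypothetical.

```python
import numpy as np

def correlation_ranking(eff_dist, arrival_week):
    """Correlate arrival times with effective distance for every candidate origin.

    eff_dist[k, n]:  effective distance from candidate k to district n
    arrival_week[n]: week of the first reported case in district n (NaN if none)
    """
    eff_dist = np.asarray(eff_dist, dtype=float)
    arrival = np.asarray(arrival_week, dtype=float)
    reached = ~np.isnan(arrival)
    n = eff_dist.shape[0]
    corr = np.full(n, np.nan)
    for k in range(n):
        mask = reached.copy()
        mask[k] = False                       # use every *other* district
        if mask.sum() < 3:
            continue                          # too few points for a correlation
        x, y = eff_dist[k, mask], arrival[mask]
        if x.std() > 0 and y.std() > 0:       # Pearson r is undefined otherwise
            corr[k] = np.corrcoef(x, y)[0, 1]
    # Candidates with undefined correlation are placed last.
    order = np.argsort(-np.nan_to_num(corr, nan=-np.inf))
    return order, corr

# Toy chain of four districts in which the outbreak starts in district 0.
toy_dist = np.abs(np.subtract.outer(np.arange(4), np.arange(4)))
order, corr = correlation_ranking(toy_dist, [0, 1, 2, 3])
print("ranking:", order)
print("correlations:", np.round(corr, 3))
```

With weekly arrival data, many districts share the same arrival week, which is one way the coarse temporal resolution mentioned above can flatten the differences between top-ranked candidates.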

Discussion and conclusion
We introduced a fast and efficient approach for the identification of the origin during food-borne disease outbreaks and evaluated the approach in the context of the 2011 EHEC/HUS outbreak in Germany. A clear advantage of the method is its robust performance on the basis of limited case report data and plausible topological assumptions concerning the underlying food distribution network. When applied to the 2011 EHEC/HUS outbreak in Germany, our method was able to identify an outbreak origin in close proximity to the actual outbreak location (Uelzen, Lower Saxony). Already three days (May 22nd, 2011) after spatial infection clustering, the effective distance approach was able to reconstruct the actual origin. This is particularly promising, as in the context of EHEC/HUS, conventional outbreak investigations, including case-control and cohort studies as well as sample testing and tracing along the food-shipping chain, wrongly suggested tomatoes, leafy salads and cucumbers as contaminated foods. When specific suspicions arose that cucumbers imported in Hamburg were the infection source, our method classified Hamburg as a very unlikely origin. The consideration of such contradictory information could have led to more spatially targeted sample testing and could therefore have improved the efficiency of the outbreak investigations.


Fig. 1: E. coli incidence in Germany during the 2011 EHEC/HUS outbreak. (A) Each panel depicts a different outbreak week (May 30th until June 20th, 2011). Color intensity quantifies infection counts for each of the German districts (data source: 28, map source: 29). The alleged origin of the outbreak (district Uelzen) is marked in blue. (B) Time course of E. coli incidence for selected districts. For reference, the overall German incidence per district is shown in black.

the population size of origin, destination, and their geographic distance, respectively. The non-negative exponents and the distance scale are parameters of the gravity model 32,

(3)
Furthermore, we let the coupling strength decrease with distance. The corresponding tail exponent is consistent with the quantitative assessments of human mobility and transportation networks 34, 35, i.e.

If, at each node, traffic were randomly distributed among the remaining other nodes, a null model would produce the same flux fraction on every link. Thus, we only retain links that possess a flux fraction larger than this null-model value, i.e. if condition (5) holds for each node.

Fig. 2: Multiscale food distribution in Germany. (A) A map of German districts; hues correspond to the regional network modules obtained by modularity maximization 37; color intensity quantifies population density. The origin of the 2011 EHEC/HUS outbreak is marked by a white circle in Bienenbüttel, located in the district Uelzen. (B) German food shipping network constructed from a gravity model. Each district is represented by a network node; coloring corresponds to the link strength. The network has a connectivity of 18.1%.
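A minimal Python sketch of how a gravity-model network with link pruning of this kind could be assembled. The functional form of the distance decay, all default parameter values, and the names `gravity_network`, `alpha`, `beta`, `d0` and `mu` are assumptions made for illustration; the parameter values actually used for the German network are not reproduced here. The pruning threshold 1/(M-1) is the flux fraction that a uniform random distribution of traffic among the remaining M-1 nodes would give.

```python
import numpy as np

def gravity_network(pop, dist, alpha=1.0, beta=1.0, d0=100.0, mu=1.0):
    """Build a pruned, row-normalized flux matrix from a gravity model.

    pop[n]:     population of district n
    dist[n, m]: geographic distance between districts n and m (km)
    Returns (P, connectivity), where P[n, m] is the fraction of traffic
    leaving n that goes to m after pruning weak links.
    """
    pop = np.asarray(pop, dtype=float)
    dist = np.asarray(dist, dtype=float)
    M = len(pop)
    # ASSUMED gravity law: population product times a power-law distance decay.
    flux = np.outer(pop**alpha, pop**beta) * (1.0 + dist / d0) ** (-(1.0 + mu))
    np.fill_diagonal(flux, 0.0)                      # no self-links
    frac = flux / flux.sum(axis=1, keepdims=True)    # flux fractions per origin
    keep = frac > 1.0 / (M - 1)                      # stronger than the null model
    pruned = np.where(keep, flux, 0.0)
    row_sum = pruned.sum(axis=1, keepdims=True)
    P = np.divide(pruned, row_sum, out=np.zeros_like(pruned), where=row_sum > 0)
    connectivity = keep.sum() / (M * (M - 1))        # share of retained links
    return P, connectivity

# Toy example: four equally sized districts on a line, one of them remote.
x = np.array([0.0, 10.0, 20.0, 300.0])
dist = np.abs(x[:, None] - x[None, :])
P, connectivity = gravity_network([1000] * 4, dist)
print(np.round(P, 3))
print("connectivity:", round(connectivity, 3))
```

In the toy example the links from the three nearby districts to the remote one fall below the null-model threshold and are dropped, which is how pruning produces a sparse network.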

worldwide spread of SARS in 2003 and pandemic influenza H1N1 in 2009. The effective distance method assumes that, irrespective of the details of the local dynamics of a spreading process, the proliferation of the contagion throughout the network is determined by the coupling between nodes, and that this coupling is quantified by the elements of the flux matrix. Given an initial outbreak location, a contagion process can take a multitude of paths to any other node in the network. Each path is taken with a certain probability. Consider a path that starts at the outbreak location and ends at a target node, with a sequence of intermediate steps at other nodes, such that

Fig. 3: Effective distance and outbreak origin reconstruction in multi-scale network contagion processes. (A) Each panel depicts a temporal snapshot (from left to right at equidistant time intervals) of a simple contagion process in which infected nodes (red) deliver the infection to connected nodes at a fixed rate before they recover at another rate (SIR dynamics 38). The network consists of 512 nodes on a quasi-triangular, random lattice. Each node is connected to its nearest local neighbors. In addition to the local lattice structure, 128 long-range links exist between randomly chosen pairs of nodes. The origin of the outbreak is marked in green. Because of long-range connectivity, the pattern quickly loses spatial structure and becomes chaotic, such that it is difficult to predict from metric cues alone when the contagion arrives at a given node. More importantly, long-range connectivity leads to a loss of spatial coherence, and it becomes impossible to determine the origin of the outbreak. (B) The same pattern as in (A) is shown in the effective distance perspective from the outbreak origin. The depicted tree is the shortest path tree, i.e. the most probable spreading path of the contagion process. Radial distance is proportional to effective distance as defined in the text. In this alternative representation, the complex pattern in the conventional view is mapped onto a simple propagating wave front, and arrival times are easily computed. (C) The regularity of the pattern is only present from the perspective of the actual outbreak origin. When the contagion process is viewed from any other node (here the node depicted in blue), the pattern lacks regularity.


Fig. 4: Shortest path trees and effective distance among districts in Germany.
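How shortest path trees and effective distances of this kind can be obtained is sketched below in Python. The sketch assumes the definition from Ref. 25, in which a single step along a link with flux fraction P has length 1 - ln P and the effective distance is the length of the shortest such path. The indexing convention (`P[u][v]` is the fraction of traffic leaving `u` that goes to `v`) and the name `effective_distances` are choices made for this illustration.

```python
import heapq
import math

def effective_distances(P, source):
    """Dijkstra search on step lengths 1 - ln(P[u][v]).

    P[u][v]: fraction of traffic leaving node u that goes to node v (0 = no link)
    Returns (dist, parent): effective distance from `source` to every node and
    the predecessor of every node in the shortest path tree.
    """
    n_nodes = len(P)
    dist = [math.inf] * n_nodes
    parent = [None] * n_nodes
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                          # outdated heap entry
        for v in range(n_nodes):
            p = P[u][v]
            if v == u or p <= 0:
                continue                      # skip self-links and missing links
            nd = d + 1.0 - math.log(p)        # step length along the link u -> v
            if nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent

# Toy network: the route 0 -> 1 -> 2 is shorter than the weak direct link 0 -> 2.
P = [[0.0, 0.9, 0.1],
     [0.5, 0.0, 0.5],
     [0.1, 0.9, 0.0]]
dist, parent = effective_distances(P, source=0)
print([round(d, 3) for d in dist], parent)
```

Running the search once per district yields the full matrix of effective distances; the `parent` list encodes the shortest path tree rooted at the chosen source.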

Fig. 5: EHEC/HUS outbreak origin reconstruction. Each panel depicts a scatter plot of mean and standard deviation (see Eqs. (14)) of effective distances from candidate nodes to the subset of nodes that have nonzero incidence in a given week after outbreak onset. All districts are considered as potential candidates for the outbreak origin. Symbol size quantifies the population size of each district; blueness quantifies incidence in the respective week. A few large districts are labeled. The district with combined minimal mean and variance (closest to the origin) has a high likelihood of being the actual 2011 EHEC/HUS outbreak origin. The actual outbreak origin Uelzen is marked by a red cross.



Fig. 6: EHEC/HUS outbreak origin reconstruction. For each week 2-9 relative to the beginning of the EHEC/HUS outbreak and for each node in the network, a rank was computed based on minimization of a concentricity score. District Uelzen, the actual outbreak district, is robustly ranked among the top-ranked districts; in weeks 3, 6 and 7, Uelzen is ranked first. We considered all 412 districts. For each district, the distance provided in parentheses represents the approximate distance to the actual outbreak location Bienenbüttel in district Uelzen.


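Fig. 7: Correlation of effective distance and arrival time during the German EHEC/HUS outbreak, 2011. For each district as a potential outbreak origin, we computed the correlation coefficient of arrival time at every other node and effective distance from the candidate origin to that node. The magnitude of the correlation coefficient is color-coded from blue to red, corresponding to low and high correlation, respectively. High correlation, corresponding to a high likelihood of being the outbreak origin, is observed in a spatially coherent region in Northern Germany.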


Fig. 8: Effective distance and arrival time analysis. For each district as a potential outbreak origin, we computed the Pearson correlation of arrival time and effective distance and ranked all districts with respect to correlation magnitude. The actual outbreak origin Uelzen is ranked at position 30. High-correlation districts all lie in Northern Germany.

The authors of Refs. 17, 18 developed a universal source detection maximum likelihood estimate, which assumes virus spread in a general graph along a breadth-first-search tree, and derived theoretical thresholds for the detection probability. Pinto et al. 19 extended this estimate for partially observed transmission trees. Alternative origin reconstruction methods are based on shortest paths or consequent diameter from transmission trees 20, 21. Prakash et al. 22 and Fioriti and Chinnici 23 developed methods based on spectral techniques to identify a (set of) origin node(s) on a transmission network. They utilize a close relationship of source estimation and node centrality, as shown by Comin and da Fontoura Costa 24.