Population Density, Poor Sanitation, and Enteric Infections in Nueva Santa Rosa, Guatemala

Poor sanitation could pose greater risk for enteric pathogen transmission at higher human population densities because of greater potential for pathogens to infect new hosts through environmentally mediated and person-to-person transmission. We hypothesized that incidence and prevalence of diarrhea, enteric protozoans, and soil-transmitted helminth infections would be higher in high-population-density areas compared with low-population-density areas, and that poor sanitation would pose greater risk for these enteric infections at high density compared with low density. We tested our hypotheses using 6 years of clinic-based diarrhea surveillance (2007–2013) including 4,360 geolocated diarrhea cases tested for 13 pathogens and a 2010 cross-sectional survey that measured environmental exposures from 204 households (920 people) and tested 701 stool specimens for enteric parasites. We found that population density was not a key determinant of enteric infection nor a strong effect modifier of risk posed by poor household sanitation in this setting.


SUPPLEMENTAL INFORMATION
VICo surveillance stool specimen collection and testing. Clinic staff collected a stool specimen from enrolled individuals (whole stool or rectal swab if the former was not possible). The details of specimen collection and testing have been previously described. [17][18][19] Stool samples were stored at 4°C (rectal swabs in Cary-Blair media) and transported in temperature-monitored containers (4°C) within 24 hours of collection to the laboratory at the Cuilapa Regional Hospital for initial analysis. Samples were tested for the presence of soil-transmitted helminth (STH) infections (Ascaris lumbricoides, Trichuris trichiura, hookworm [Ancylostoma or Necator]), protozoan parasites (Giardia lamblia, Entamoeba histolytica, Entamoeba coli, Blastocystis hominis), and tapeworms (Hymenolepis nana, Hymenolepis diminuta) by direct smear microscopic examination [23]; for bacteria (Salmonella spp., Shigella spp., Campylobacter spp.) by direct culture [24]; for Escherichia coli pathotypes (enterotoxigenic E. coli, enteropathogenic E. coli, and Shiga toxin-producing E. coli) using conventional polymerase chain reaction [25]; for rotavirus (group A) by using a commercial qualitative enzyme immunoassay (IDEIA Rotavirus test kits; Oxoid Ltd., Ely, United Kingdom) [17]; and for norovirus (genogroups I and II) using a standard monoplex quantitative reverse transcription polymerase chain reaction. [18,26] The laboratory at the Universidad del Valle de Guatemala performed quality control assessments for all assays.
Selecting a dichotomous cut point for population density in the cross-sectional survey. We selected a systematic sample of 51 households (25% of 204) across the range of observed densities. An analyst prepared standardized aerial images of 100 × 100 m centered on each sampled household; the images were blinded and did not include any information identifying the estimated density. Three independent investigators from the United States and Guatemala (BFA, CJ, JMC) then classified each household image as "high" or "low" density based on a qualitative assessment of the aerial image, with no restriction other than that there were two density classes. We derived an investigator consensus classification for each of the 51 sampled households by majority vote across the three reviewers. Agreement between the two primary reviewers (BFA, CJ) was 94% (48/51), and all three reviewers classified the image identically for 69% (35/51) of households. We used the ROCR package in R to calculate agreement between every possible density cut point in the subsample and the investigator consensus classification, which served as the gold standard. [1] We used the cut point that maximized agreement (5,348 persons/km²; 74th percentile of the distribution; average classification accuracy 85%) to define high and low population density. Supplemental Figures 4 and 5 summarize the population density distribution and illustrate the average accuracy for different cut points.
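The cut point search can be illustrated with a short Python sketch (the analysis itself used the ROCR package in R, so this is not the authors' code). It derives a consensus label by majority vote and then evaluates every observed density as a candidate cut point, keeping the one with the highest agreement. The function names, the reviewer votes, and the eight density values are hypothetical and are not study data; the sketch assumes a household is classified as high density when its density is at or above the candidate cut point.

```python
def majority_vote(votes):
    """Consensus label: True (high density) if more than half of reviewers say so."""
    return sum(votes) * 2 > len(votes)


def best_cut_point(densities, consensus_high):
    """Return (cut_point, accuracy) that maximizes agreement with the consensus.

    Every observed density is tried as a candidate cut point; a household is
    predicted "high" when its density is >= the candidate (an assumption).
    Ties are resolved in favor of the lowest cut point.
    """
    best_cut, best_acc = None, -1.0
    for cut in sorted(set(densities)):
        predicted = [d >= cut for d in densities]
        agree = sum(p == c for p, c in zip(predicted, consensus_high))
        acc = agree / len(densities)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut, best_acc


# Hypothetical toy data (persons/km^2) and reviewer votes, not study data.
densities = [800, 1500, 2300, 4100, 5200, 5600, 7000, 9100]
reviewer_votes = [
    (False, False, False), (False, False, False), (False, True, False),
    (False, False, False), (True, True, True), (False, True, False),
    (True, True, True), (True, True, False),
]
consensus_high = [majority_vote(v) for v in reviewer_votes]

cut, acc = best_cut_point(densities, consensus_high)
print(cut, acc)
```

In this toy example two cut points tie at the highest accuracy, which is why a tie-breaking rule has to be stated; the sketch keeps the lower one.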
Attempt to characterize neighborhood sanitation. We attempted to include sanitation measures at the neighborhood level (defined as a radius of 50 m around each study household), based on the sanitation information available for the random sample of 204 surveyed households among the 10,770 roofs identified in aerial imagery of the Nueva Santa Rosa municipality. We used a k-nearest neighbor algorithm [2] to estimate sanitation conditions for all 10,770 roofs in the community, assuming that sanitation conditions are highly spatially correlated. For each roof, the algorithm identified the k nearest neighbors among the 204 surveyed households based on Euclidean distance and classified the sanitation conditions of the living structure by majority vote. We used five-fold cross-validation to select the k that minimized the classification error in the training set of 204 households. The cross-validated classification error for predicted sanitation conditions in the 204 study households was > 38% for all values of k, which exceeded our prespecified maximum error rate of 20%. We therefore concluded that the data contained insufficient information to accurately predict sanitation conditions for roofs in the study region based only on geographic location, and we limited the analysis to household sanitation conditions.
Statistical analysis details. Our parameter of interest for the association between enteric infections and the independent and combined exposures of population density and sanitation was the prevalence ratio (PR). The PR associated with poor sanitation (A = 1) within each stratum of population density (D = d) for outcome Y is:
$$PR(d) = \frac{E_W[\Pr(Y = 1 \mid A = 1, D = d, W)]}{E_W[\Pr(Y = 1 \mid A = 0, D = d, W)]}$$
The marginal PR is averaged over covariates W.
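To make the stratum-specific marginal PR concrete, the following minimal Python sketch averages covariate-specific risks over the distribution of W separately for poor and improved sanitation, then takes their ratio within a density stratum. It is a simple plug-in standardization, not the targeted minimum loss-based estimator used in the analysis. The function name, the two covariate strata, their weights, and all risks are hypothetical values chosen for illustration, and the sketch assumes a single discrete covariate.

```python
def marginal_pr(risk, w_weights, d):
    """Marginal prevalence ratio for poor sanitation within density stratum d.

    risk[(a, d, w)] is an assumed P(Y = 1 | A = a, D = d, W = w);
    w_weights[w] is an assumed P(W = w) over which risks are averaged.
    """
    num = sum(wt * risk[(1, d, w)] for w, wt in w_weights.items())
    den = sum(wt * risk[(0, d, w)] for w, wt in w_weights.items())
    return num / den


# Hypothetical covariate distribution and risks, not study estimates.
w_weights = {"lower_wealth": 0.5, "higher_wealth": 0.5}
risk = {
    # (A, D, W): risk of infection
    (1, 0, "lower_wealth"): 0.20, (1, 0, "higher_wealth"): 0.10,
    (0, 0, "lower_wealth"): 0.10, (0, 0, "higher_wealth"): 0.05,
    (1, 1, "lower_wealth"): 0.30, (1, 1, "higher_wealth"): 0.12,
    (0, 1, "lower_wealth"): 0.10, (0, 1, "higher_wealth"): 0.04,
}

pr_low = marginal_pr(risk, w_weights, d=0)   # approximately 2.0
pr_high = marginal_pr(risk, w_weights, d=1)  # approximately 3.0
print(pr_low, pr_high)
```

Averaging the numerator and the denominator separately before dividing is what distinguishes a marginal PR from an average of covariate-specific ratios.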
We examined whether the association between enteric infections and poor sanitation was modified by population density on the additive scale because we were interested in whether the effect of poor sanitation would be greater in high-density compared with low-density households, with the aim of targeting future interventions to specific populations. [32] We quantified effect modification with the relative excess risk due to interaction (RERI), which assesses whether the effect of the two exposures together exceeds the sum of their effects when considered separately. [32] For the prevalence of an outcome under two dichotomous conditions ($p_{AD}$, with A indicating poor sanitation and D indicating high density), the RERI is:
$$RERI = \frac{p_{11} - p_{10} - p_{01} + p_{00}}{p_{00}}$$
A RERI value > 0 indicates positive effect modification.
Since this analysis relied on existing data, for our second objective (to determine whether poor sanitation poses a greater risk at high-population density compared with low-population density) we calculated the minimum detectable effect for the stratified PRs and the RERI associated with poor sanitation given the size of the study, the empirical distribution of poor sanitation by high and low density, and assuming 8% outcome prevalence in the improved sanitation group. The study was sufficiently large to detect a PR associated with poor sanitation of 2.13 (low density) and 2.81 (high density) with 80% power and a two-sided alpha of 5%. Using a simulation-based approach, [3,4] and assuming a PR = 2 associated with poor sanitation in the low-density stratum, we estimated that the minimum detectable RERI given the design was 2.25.
We computed adjusted estimates using targeted minimum loss-based estimation, which is a double-robust approach to adjust for covariates (W). [5] We used a data-adaptive ensemble machine learning algorithm [6] to flexibly control for covariates in all adjusted analyses; the algorithm included the following model selection approaches: main effects log-linear regression, stepwise Akaike Information Criterion, [7] generalized additive models, [8] and glmnet (lasso) regression. [9] We considered the following covariates: age, sex, household head education, people per room, biofuel use, wealth index quartile, handwashing location within 10 m of the toilet stocked with water and soap, and drinking water supply. The wealth index was the first principal component from a principal components analysis [10] using the following household assets and income variables: refrigerator, computer, radio, clothes washer, clothes dryer, car/truck, television, telephone, microwave, watch, bicycle, motorcycle/scooter, and reported household income. The wealth index provides a relative measure of wealth within the study population. We selected covariates based on our hypothesized causal model (Figure 1) [31] to block any backdoor paths between sanitation conditions and enteric infections. We calculated percentile 95% confidence intervals for all parameters of interest using a nonparametric bootstrap that resampled households with replacement with 1,000 iterations. [5,11,32] We conducted all data management and statistical analysis in R version 3.0.3 (www.r-project.org).
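The RERI and the household-level percentile bootstrap can be sketched together in Python. The sketch computes an unadjusted RERI from stratum prevalences using the standard definition, (p11 - p10 - p01 + p00) / p00, where the first index denotes poor sanitation and the second denotes high density, and then resamples whole households with replacement to form a percentile interval. It omits covariate adjustment entirely, so it is not the estimator used in the analysis. The function names, the record layout, the random seed, and the 18 household records are hypothetical, and the choice to skip resamples in which the RERI is undefined is an assumption of this sketch.

```python
import random


def reri(households):
    """Unadjusted RERI = (p11 - p10 - p01 + p00) / p00.

    Each record is (poor_sanitation, high_density, n_people, n_cases).
    Returns None if any stratum is empty or p00 is zero.
    """
    people, cases = {}, {}
    for a, d, n, c in households:
        people[(a, d)] = people.get((a, d), 0) + n
        cases[(a, d)] = cases.get((a, d), 0) + c
    strata = [(1, 1), (1, 0), (0, 1), (0, 0)]
    if any(people.get(s, 0) == 0 for s in strata) or cases[(0, 0)] == 0:
        return None
    p = {s: cases[s] / people[s] for s in strata}
    return (p[(1, 1)] - p[(1, 0)] - p[(0, 1)] + p[(0, 0)]) / p[(0, 0)]


def bootstrap_percentile_ci(households, stat, iterations=1000, alpha=0.05, seed=1):
    """Percentile CI from resampling households (not individuals) with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(iterations):
        resample = rng.choices(households, k=len(households))
        value = stat(resample)
        if value is not None:  # assumption: drop resamples with undefined RERI
            estimates.append(value)
    estimates.sort()
    lo = estimates[int((alpha / 2) * len(estimates))]
    hi = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lo, hi


# Hypothetical household records, not study data.
households = (
    [(0, 0, 5, 1), (0, 0, 5, 0), (0, 0, 5, 1), (0, 0, 5, 0), (0, 0, 5, 0)]
    + [(1, 0, 5, 1), (1, 0, 5, 1), (1, 0, 5, 1), (1, 0, 5, 1), (1, 0, 5, 0)]
    + [(0, 1, 5, 1), (0, 1, 5, 0), (0, 1, 5, 1), (0, 1, 5, 0)]
    + [(1, 1, 5, 2), (1, 1, 5, 1), (1, 1, 5, 2), (1, 1, 5, 1)]
)

point = reri(households)  # approximately 1.5 for these toy records
lo, hi = bootstrap_percentile_ci(households, reri)
print(point, lo, hi)
```

Resampling at the household level keeps members of the same household together in each replicate, which respects the within-household correlation of outcomes.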
Exploratory analyses. Following our primary analysis, we conducted a series of exploratory analyses to describe the major confounders of the relationship between poor household sanitation conditions and enteric infections. We also mapped the geographic distribution of study households and cases of enteric infection to examine the spatial relationship between sanitation conditions, population density, enteric infections, and other potentially important exposures.
We found that the composite wealth index was the single largest source of confounding for the positive associations between poor sanitation and enteric infections in Table 4. Wealth was a strong predictor of STH infection: the A. lumbricoides infection prevalence across the four quartiles of the wealth index, from lowest to highest, was 22%, 7%, 3%, and 1% (Supplemental Table 5). We observed a similar pattern of decreased infection prevalence for E. coli with increasing wealth, although the magnitude of the gradient across wealth quartiles was less striking. Neither diarrhea nor G. lamblia infection exhibited the same extreme pattern of reduced infection prevalence with increasing wealth quartile (Supplemental Table 5). We observed a clear concentration of A. lumbricoides cases (81%) in a single village named Jumaytepeque (Supplemental Figure 1). Although Jumaytepeque was similar in population density to the municipal center of Nueva Santa Rosa, its A. lumbricoides prevalence was 9-fold higher (27% versus 3%). Jumaytepeque also had the single largest concentration of households in the bottom quartile of the wealth index (61% in the bottom quartile), predominantly poor sanitation conditions (80% classified as poor), and abundant soil floors (56%). We did not observe this same type of extreme spatial aggregation of other enteric infections in the study population (Supplemental Figures 1-3).

Sensitivity analysis for soil-transmitted helminth detection.
We had some concern that the fecal parasite concentrator assay used in the study could have low sensitivity for STH. [12] The laboratory also tested a subsample of 324 stool specimens for STH using the Kato-Katz method as part of a separate internal validation study. In a sensitivity analysis, we classified individuals as positive for STH outcomes if they were positive by either the fecal parasite concentrator assay or the Kato-Katz assay. For the 377 individuals who were only tested with the fecal parasite concentrator assay, their outcomes did not change in this analysis. The use of the composite outcome definition led to additional cases of T. trichiura (N = 14) and A. lumbricoides (N = 11) but not of hookworm. We re-estimated the association between poor sanitation and STH infection using the composite outcome definition, and the results are given in Supplemental Table 4.

SUPPLEMENTAL FIGURE 4. Geographic distribution of 10,770 identified roofs, 204 study households, and 67 diarrhea cases in the municipality of Nueva Santa Rosa, Guatemala, 2010. Inset 1 includes the municipal center of Nueva Santa Rosa, and Inset 2 includes the town of Jumaytepeque, which differ in their environmental and wealth conditions but not in population density. The median [inter-quartile range] population density in persons per km² is similar for study households in Inset 1 (4,966 [2,340, 9,104]) and Inset 2 (4,966 [1,528, 9,167]).

SUPPLEMENTAL FIGURE 5. Geographic distribution of 10,770 identified roofs, 204 study households, and 48 Giardia lamblia cases in the municipality of Nueva Santa Rosa, Guatemala, 2010. Inset 1 includes the municipal center of Nueva Santa Rosa, and Inset 2 includes the town of Jumaytepeque, which differ in their environmental and wealth conditions but not in population density. The median [inter-quartile range] population density in persons per km² is similar for study households in Inset 1 (4,966 [2,340, 9,104]) and Inset 2 (4,966 [1,528, 9,167]).