Predicting the Unmet Need for Biologically Targeted Coverage of Insecticide-Treated Nets in Kenya

In some countries the biological targeting of universal malaria prevention may offer optimal impact on disease and significant cost savings compared with approaches that presume universal risk. Spatially defined data on coverage of treated nets from recent national household surveys in Kenya were used within a Bayesian geostatistical framework to predict treated net coverage nationally. When combined with the distributions of malaria risk and population, an estimated 8.1 million people were not protected with treated nets in 2010 in biologically defined priority areas. After adjusting for the proportion of nets in use that were not long lasting, an estimated 5.5 to 6.3 million long-lasting treated nets would be required to achieve universal coverage in 2010 in Kenya in at-risk areas, compared with 16.4 to 18.1 million nets if not restricted to areas of greatest malaria risk. In Kenya, this evidence-based approach could save the national program at least 55 million US dollars.

The insecticide-treated net (ITN) coverage data, the analysis of predictors of ITN coverage, and the details of the Bayesian geostatistical model procedures are presented here.
1.1. Semivariogram of the ITN coverage data. The variogram or semivariogram is a graphical summary of the spatial autocorrelation structure in the data. The semivariogram was calculated using the variogram function, and an exponential model was fitted using the variofit function in R (R version 2.10.1, the R Foundation for Statistical Computing, http://www.r-project.org/). Only data with a sample size of ≥ 40 persons (N = 942) were used to construct the semivariogram. This sample size restriction was imposed to minimize the effect of random variations inherent in small samples. 1,2 The semivariogram showed spatial autocorrelation in the combined ITN coverage data up to 2 decimal degrees, or 222 km at the equator (Figure S1).
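To illustrate the quantity summarized above, the following minimal Python sketch computes a classical empirical semivariogram and evaluates an exponential model. It is not the R code used in the analysis: the function names, the binning scheme, and the toy values are assumptions made for this example.

```python
import math
from itertools import combinations


def empirical_semivariogram(coords, values, bin_edges):
    """Classical estimator: half the mean squared difference of all pairs in each distance bin."""
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for (p, zp), (q, zq) in combinations(zip(coords, values), 2):
        h = math.dist(p, q)  # separation in the units of coords (e.g. decimal degrees)
        for b in range(n_bins):
            if bin_edges[b] <= h < bin_edges[b + 1]:
                sums[b] += (zp - zq) ** 2
                counts[b] += 1
                break
    # Bins that contain no pairs are reported as NaN.
    return [0.5 * s / c if c else float("nan") for s, c in zip(sums, counts)]


def exponential_model(h, nugget, partial_sill, range_param):
    """Exponential semivariogram model evaluated at separation h."""
    return nugget + partial_sill * (1.0 - math.exp(-h / range_param))


# Hypothetical toy data: three survey points on a line with their ITN coverage.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
coverage = [0.2, 0.3, 0.5]
gamma = empirical_semivariogram(coords, coverage, bin_edges=[0.5, 1.5, 2.5])
```

In practice, the sample-size filter of ≥ 40 persons would be applied before calling the function, and the model parameters would be fitted to the binned values, for example by least squares.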

Analysis of the predictors of ITN coverage.
In Kenya, urbanization, 3 physical access to sources of ITNs, 3 and poverty 3 , 4 have all been shown to be determinants of ITN coverage. For these covariates to be used for the prediction of ITN coverage, however, they must be universally available at similar spatial resolutions for both the survey locations and across the prediction surface. For this study only urbanization and distance to ITN sources met this criterion. Poverty data for Kenya were only available for relatively large administrative areas such as locations and constituencies and were too coarse for consideration as mapped covariates in this study.
1.2.1. Urbanization. To define urban extents in Kenya, the 1999 national census urban-rural definition of enumeration areas (EA) was used. 5 An enumeration area was defined as including part of a village, a whole village, or a group of villages, usually comprising not more than 100 households (circa 500 people). 5 The EA maps for 67 of 69 districts in Kenya were either obtained from the Kenya National Bureau of Statistics or digitized in-house from 1:50,000 topographical maps, and each EA was classified as urban or rural. The two districts where EA data were not available were Nairobi and Mombasa, both of which were exclusively urban. The EA map was gridded into a 1 × 1 km surface and the urban/rural classifications were extracted to each survey location using ArcGIS 9.3 (ESRI Inc., Redlands, CA) extraction tools.

Distance to sources of ITN.
A list of all antenatal health facilities where ITNs were distributed to pregnant women and mothers of young children, either highly subsidized or at no cost, for the entire period of distribution was provided by Population Services International (PSI). For this study, those facilities that distributed PSI bed nets from May 2005 to February 2009 were selected, corresponding to the period from the month when LLINs were first introduced in the Kenya public sector to the month when the FSD survey, the most recent of the national household ITN surveys, ended. The 2006 free mass campaign implemented by the Ministry of Health (MoH) used health facilities, market centers, schools, churches, and other community focal points as delivery sites for LLINs. Additional countrywide focal distribution data were extracted from the website of the Against Malaria Foundation [http://www.againstmalaria.com/Distribution_Countries.aspx]. For the districts in the North Eastern province, health facilities and villages where ITNs were distributed by the MENTOR initiative, United Nations Children's Fund (UNICEF), and the Kenya Red Cross were assembled. The health facilities, market centers, schools, and other sites that were used by the various distribution agents for either the routine or free mass distribution of ITNs were mapped using national geographic information systems databases of health service providers 6; schools 7; and settlements 8,9; and a variety of smaller databases developed as part of research projects or development programs. To compute distances to the mapped distribution points, a 100 × 100 m spatial resolution Euclidean distance (km) surface to ITN sources was generated and extracted to each survey location using ArcGIS 9.3.
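The distance covariate can be sketched in Python as a nearest-source calculation. This is a point-based simplification, not the ArcGIS raster workflow described above: it assumes coordinates already projected to kilometres, and the function names and example coordinates are hypothetical. The 3 km threshold is the one stated in the mean definition of the geostatistical model.

```python
import math


def nearest_source_distance_km(location, sources):
    """Straight-line distance (km) from one location to its closest ITN source.

    Coordinates are assumed to be projected (x, y) pairs in kilometres.
    """
    return min(math.dist(location, s) for s in sources)


def distance_covariate(locations, sources, threshold_km=3.0):
    """Return (distance_km, far_flag) per location; far_flag marks distance >= threshold."""
    out = []
    for loc in locations:
        d = nearest_source_distance_km(loc, sources)
        out.append((d, d >= threshold_km))
    return out


# Hypothetical coordinates, for illustration only.
sources = [(3.0, 4.0), (10.0, 1.0)]
surveys = [(0.0, 0.0), (10.0, 0.0)]
result = distance_covariate(surveys, sources)
```

A raster distance surface gives the same value up to the grid resolution, since each cell stores the distance from its center to the nearest source.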

Regression analysis.
The relationships of urbanization and distance to ITN distribution points with the reported ITN coverage were examined using regression analysis in Stata/SE Version 10 (Stata Corp., College Station, TX). A univariate non-spatial binomial logistic regression model was implemented for each of these predictors, with ITN coverage as the dependent variable. The results of the univariate analyses were used to determine the independent influence of each candidate predictor on ITN coverage and to identify those that qualified for inclusion in the Bayesian geostatistical model. Predictors with Wald's P < 0.2 were considered to have qualified for inclusion in the geostatistical model to predict ITN coverage.
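For a single binary predictor, the odds ratio from a univariate logistic regression on individual-level data equals the cross-product ratio of the corresponding 2 × 2 table, with a Wald interval built on the log scale. The Python sketch below shows that calculation under simplifying assumptions: it ignores survey clustering, it is not the Stata code used in the analysis, and the counts are hypothetical.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio, Wald 95% CI, and two-sided Wald P value from a 2 x 2 table.

    a, b: ITN users / non-users among the exposed group (e.g. urban)
    c, d: ITN users / non-users among the reference group (e.g. rural)
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    p_value = math.erfc(abs(log_or / se) / math.sqrt(2))  # two-sided normal tail
    return math.exp(log_or), lower, upper, p_value


# Hypothetical counts, for illustration only.
or_, lower, upper, p_value = odds_ratio_wald(a=60, b=40, c=45, d=55)
qualifies = p_value < 0.2  # screening threshold for inclusion in the geostatistical model
```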
The regression analysis showed that urban clusters (odds ratio [OR]: 1.85, 95% confidence interval [CI]: 1.32-2.59; P < 0.001) had significantly higher mean ITN coverage than rural clusters, whereas those outside the mean distance to ITN sources (OR: 0.59, 95% CI: 0.45-0.78; P < 0.001) had lower mean ITN coverage. Both covariates met the criterion of P < 0.20 for inclusion in the Bayesian geostatistical model and were therefore used for predicting ITN coverage.

SI 2.1. Model overview. The underlying value of ITN coverage among all ages, $ITN(x_i)$, at each location $x_i$ for the year 2009 was modeled as an inverse logit transformation of a spatially structured field superimposed with additional random variation $\epsilon(x_i)$. 10 The count of individuals sleeping under an ITN the night before the survey, $N_i^+$, from the total sample of $N_i$ in each survey was modeled as a conditionally independent binomial variate. 11 The spatial component was represented by a stationary Gaussian process $f(x_i)$ with mean $\mu$ and covariance $C$, defined below. The unstructured component $\epsilon(x_i)$ was represented as Gaussian with zero mean and variance $V$. Both the inference and prediction stages were coded using Python (PyMC version 2.0). 12

Mean definition. The mean component $\mu$ was modeled as a linear function of whether the prediction location was urban rather than rural (denoted by the indicator variable $1_{ur}[x]$) and whether it was at a distance to ITN sources of ≥ 3 km rather than < 3 km ($1_{wb}[x]$). The mean component was therefore defined by three parameters:

$$\mu(x) = \beta_x + \beta_{ur} 1_{ur}[x] + \beta_{wb} 1_{wb}[x]$$

where $\beta_x$ denotes the intercept and $\beta_{ur}$ and $\beta_{wb}$ are the coefficients of the two indicators. Each survey was referenced temporally using the mid-point (in decimal years) between the recorded start and end months.
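The structure of the model described above (linear mean, inverse logit link, binomial likelihood) can be sketched in plain Python. This is an illustration only, not the PyMC code used in the study: the function names, the coefficient names other than the intercept, and the numeric values are assumptions, and the Gaussian process itself is represented by a single supplied field value.

```python
import math


def inv_logit(z):
    """Map a value on the logit scale to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def mean_component(is_urban, is_far, beta_x, beta_ur, beta_wb):
    """Linear mean: intercept plus coefficients on the urban and >= 3 km indicators."""
    return beta_x + beta_ur * float(is_urban) + beta_wb * float(is_far)


def binomial_log_likelihood(n_pos, n_total, coverage):
    """Log probability of n_pos ITN users out of n_total given the underlying coverage."""
    log_choose = (math.lgamma(n_total + 1) - math.lgamma(n_pos + 1)
                  - math.lgamma(n_total - n_pos + 1))
    return (log_choose + n_pos * math.log(coverage)
            + (n_total - n_pos) * math.log(1.0 - coverage))


def survey_log_likelihood(n_pos, n_total, f_value, eps_value):
    """Coverage is the inverse logit of the structured field plus unstructured noise."""
    return binomial_log_likelihood(n_pos, n_total, inv_logit(f_value + eps_value))


# Hypothetical values: an urban survey within 3 km of an ITN source.
mu = mean_component(is_urban=True, is_far=False, beta_x=-1.0, beta_ur=0.5, beta_wb=-0.3)
loglik = survey_log_likelihood(n_pos=20, n_total=50, f_value=mu, eps_value=0.0)
```

In the full model the field value at each location would be a draw from the Gaussian process with this mean, and the log-likelihoods of all surveys would be summed because the counts are conditionally independent.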

Covariance definition.
The spatial covariance was modeled with a short-range and a long-range component. The Matern covariance function, used for the short-range component, includes a parameter $\nu$ that controls the degree of differentiability, whereas the exponential covariance function forces the long-range component to be infinitely differentiable. 13 To allow for spatial anisotropy and to accommodate the effect of the curvature of the earth on point-to-point separations, the spatial distance between a pair of points $x_i$ and $x_j$ was computed as the great-circle distance $d_{GC}(x_i, x_j)$ multiplied by a factor that depends on the angle of inclination $\theta(x_i, x_j)$ of the vector pointing from $x_i$ to $x_j$. $\theta$ was computed as if latitude and longitude were Euclidean coordinates (on a cylindrical projection). Pairs of points for which $\theta$ was close to an unknown angle $\lambda$ were relatively highly correlated. The magnitude of this 'eccentricity' was controlled by an unknown parameter $\omega$.

SI 2.2. Model implementation and output. Bayesian inference was implemented using Markov chain Monte Carlo to generate samples from the posterior distribution of the Gaussian field $f(x_i)$ at each data location and of the unobserved parameters of the mean, covariance function, and Gaussian random noise component.
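The anisotropic distance described under the covariance definition can be illustrated with a short Python sketch. The great-circle part uses the standard haversine formula. The directional factor, sqrt(1 − ω² cos²(θ − λ)), is an assumed form chosen because it matches the verbal description (pairs aligned with λ have a shortened effective distance and are therefore more highly correlated); it is not taken from the source. The Earth radius and function names are likewise assumptions of this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def great_circle_km(p, q):
    """Haversine great-circle distance between (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def anisotropic_distance_km(p, q, lam, omega):
    """Great-circle distance scaled by an assumed direction-dependent factor.

    theta is the inclination of the vector from p to q, computed as if longitude
    and latitude were Euclidean coordinates; lam is the preferred angle (radians)
    and omega in [0, 1) is the eccentricity.
    """
    theta = math.atan2(q[1] - p[1], q[0] - p[0])
    factor = math.sqrt(1.0 - omega ** 2 * math.cos(theta - lam) ** 2)
    return great_circle_km(p, q) * factor


# Two points on the equator, 2 decimal degrees of longitude apart.
d_iso = great_circle_km((36.0, 0.0), (38.0, 0.0))
d_aligned = anisotropic_distance_km((36.0, 0.0), (38.0, 0.0), lam=0.0, omega=0.6)
```

With ω = 0 the factor is 1 and the distance reduces to the great-circle distance, so this form contains the isotropic case.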
Samples were generated from the mid-year 2009 mean of the posterior distribution of $f(x_i)$ at each prediction location. For each sample of the joint posterior, predictions were made at points on a regular 1 × 1 km spatial grid across Kenya. Model output therefore consisted of samples from the predicted posterior distribution of the 2009 mean ITN coverage at each grid location, which were used to generate point estimates, computed as the mean of each set of posterior samples (Figure 2A in the main text).
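The final summarization step can be sketched in Python as follows. The point estimate per grid cell is the mean of its posterior samples, as stated above. The second function shows one plausible way to combine the coverage surface with population and malaria-risk layers to count unprotected people; that combination rule, the function names, and all numbers are assumptions for illustration.

```python
from statistics import fmean


def point_estimates(samples_by_cell):
    """Posterior mean ITN coverage per grid cell from its set of posterior samples."""
    return {cell: fmean(samples) for cell, samples in samples_by_cell.items()}


def unprotected_population(coverage_by_cell, population_by_cell, at_risk_cells):
    """Assumed rule: people in at-risk cells multiplied by the uncovered proportion."""
    return sum(population_by_cell[c] * (1.0 - coverage_by_cell[c]) for c in at_risk_cells)


# Hypothetical two-cell example.
samples = {"cell_a": [0.2, 0.4], "cell_b": [0.5, 0.7]}
population = {"cell_a": 1000, "cell_b": 500}
coverage = point_estimates(samples)
unmet = unprotected_population(coverage, population, at_risk_cells=["cell_a"])
```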