Multiple Indicator Cluster Survey 2003 in Afghanistan: Outdated Sampling Frame and the Effect of Sampling Weights on Estimates of Maternal and Child Health Coverage

Due to an urgent need for information on the coverage of health services for women and children after the fall of the Taliban regime in Afghanistan, a multiple indicator cluster survey (MICS) was conducted in 2003 using the outdated 1979 census as the sampling frame. When 2004 pre-census data became available, population-sampling weights were generated based on the survey-sampling scheme. Using these weights, the population estimates for seven maternal and child healthcare-coverage indicators were generated and compared with the unweighted MICS 2003 estimates. The use of sampling weights provided unbiased estimates of population parameters. Results of the comparison of weighted and unweighted estimates showed some wide differences for individual provincial estimates and confidence intervals. However, the mean, median and absolute mean of the differences between weighted and unweighted estimates and their confidence intervals were close to zero for all indicators at the national level. Ranking of the five highest and the five lowest provinces on weighted and unweighted estimates also yielded similar results. The general consistency of results suggests that outdated sampling frames can be appropriate for use in similar situations to obtain initial estimates from household surveys to guide policy and programming directions. However, the power to detect change from these estimates is lower than originally planned, requiring a greater tolerance for error when the data are used as a baseline for evaluation. The generalizability of using outdated sampling frames in similar settings is qualified by the specific characteristics of the MICS 2003: a low replacement rate of clusters and a zero probability of inclusion of clusters created after the 1979 census.


INTRODUCTION
The Afghanistan Ministry of Public Health (MoPH) initiated a strategy to reconstruct the health system in 2002 with a focus on laying "the foundations for equitable, quality health care for the people of Afghanistan" (1). The MoPH and other stakeholders required baseline population-level health data for planning and evaluation of this health strategy.
Information was particularly needed on the coverage of health services to identify provinces with the greatest problems and to provide a reasonable starting point to gauge future change in the health sector. In the post-Taliban period, the first population-based health survey of national scope was conducted by the United Nations Children's Fund (UNICEF) and the Central Statistics Office (CSO) for the MoPH in 2003. This Multiple Indicator Cluster Survey (MICS) used data of the outdated population census from 1979 for sampling of households. This pragmatic decision was guided by the lack of a national census since 1979 and the urgent need to collect information on the coverage of health services across the country (2). However, questions persisted about the accuracy of the 2003 MICS estimates, given the substantial changes that occurred in the population since the sampling frame was constructed in 1979. An opportunity presented itself to re-assess the 2003 estimates when the CSO conducted a pre-census enumeration in 2004 and, in 2006, published the national and provincial census figures (3).
Population surveys, such as MICS, are important tools for planning, monitoring, and evaluation of health programmes in developing countries. The results of these surveys are used for summative evaluations and for influencing significant policy decisions on allocation of resources, continuation, and restructuring of programmes (4). In recent times, the 'instrumental' use of such results has increased as a greater proportion of decisions on programme oversight is directly based on these results (5). The estimates from the MICS 2003 have been put to 'instrumental' use as official health indicators for Afghanistan and have been used as benchmarks for health policy (6). Although the MICS 2003 was the first quantitative assessment of coverage of services targeted to women and children in the post-Taliban period, a further study was needed to assess whether these estimates would be adequate for providing baseline estimates for future evaluation of healthcare coverage in Afghanistan (7).
The basic approach in population-based surveys is to collect information from a random sample of people that is representative of the population (8). The sampling and data-collection are usually conducted in multiple stages to overcome the constraints of time, money, and logistics. In order for the results to reflect the situation in the population from which the data are collected, the sampling scheme must be incorporated in the analysis. This usually requires the use of sampling weights and statistical techniques to account for the multi-stage sampling design. The purpose of weighting sample data is to ensure the representativeness of the sample vis-a-vis the study population. The inverse of the selection probability of a sampled unit is used as the sampling weight for that unit. The population estimates generated without sampling weights could be biased (8,9). Evaluations of programmes based on the 'instrumental' use of these survey results can be adversely affected by this potential bias and lead to incorrect conclusions. The field of summative evaluation of health programmes can benefit from applied research on this aspect of survey methods. This is especially true in post-conflict settings where the lack of good, routine health information systems, vital registration systems, and census data makes household surveys indispensable for information on the health of the population (10). The scarcity of reliable, comprehensive data is considered one of the greatest challenges in planning and evaluating post-conflict reconstruction of the health systems (11).
The clusters for the MICS 2003 were systematically sampled from the 1979 census using the probability proportional to size (PPS) technique. Therefore, the sample was assumed to be self-weighted, and hence, unweighted estimates for coverage of health services were generated. The present study used the 2004 Afghanistan pre-census figures to generate a set of sampling weights and calculate provincial and national estimates for the coverage of seven maternal and child health services from the MICS 2003 in Afghanistan. We compared the weighted and unweighted estimates to study the effect of these sampling weights on the bias and precision of survey estimates and discuss the implications for baseline assessment and evaluation of health programmes.

Sampling frame
The target population for this study was the settled population of Afghanistan.

Sample-size and sample design
The indicators relating to vaccinations required the largest sample-size. The smallest target group for these indicators was children aged 12-23 months. An earlier MICS conducted in the eastern region of Afghanistan estimated that an average of 0.26 children aged 12-23 months lived in each household (12). The survey planners concluded that a precision level of ±10% of the estimated prevalence was desired at the provincial level. With these specified, assuming a design effect of 1.5 and a prevalence of 50%, the needed sample-size was 138-144 children aged 12-23 months in every province, which would be met by surveying 550 households in every province. Under the standard assumption of the above parameters being constant, the sampling error would be lower for indicators where the target age-group was wider, e.g. supplementation of vitamin A for children aged 6-59 months.
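The arithmetic behind these figures can be checked with a short sketch. It assumes the conventional formula for estimating a proportion under a cluster design, n = deff × z² × p(1 − p) / d², with z = 1.96 for a 95% confidence level and the ±10% margin treated as 10 percentage points. The survey planners' exact formula is not stated here, so the function and variable names are illustrative only.

```python
import math


def required_sample_size(prevalence, margin, design_effect, z=1.96):
    """Sample size for a proportion (assumed textbook formula, not from the survey report)."""
    return design_effect * z ** 2 * prevalence * (1 - prevalence) / margin ** 2


# Parameters quoted in the text: prevalence 50%, precision +/-10 points, design effect 1.5
n_children = required_sample_size(prevalence=0.50, margin=0.10, design_effect=1.5)

children_per_household = 0.26  # average from the earlier eastern-region MICS

# Households needed to reach n_children, and children expected from 550 households
households_needed = n_children / children_per_household
children_from_550 = 550 * children_per_household

print(f"children needed per province: {n_children:.1f}")
print(f"households needed: {math.ceil(households_needed)}")
print(f"children expected from 550 households: {children_from_550:.0f}")
```

Under these assumptions the formula gives about 144 children per province, and 550 households would be expected to yield about 143 children aged 12-23 months, which is consistent with the 138-144 range reported.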
A stratified multi-stage cluster-sampling design was used for the sampling of households in the 32 provinces, where each province was a stratum. In each of the 32 provinces, a cluster was a village or a town. Information on the number of households in each village and town of every province was collected from the 1979 census database. In total, 20 clusters were systematically selected without replacement in each province with probability of selection being proportional to size (PPS), where size was the number of households in a cluster. Villages and towns (mahals) with their number of households were listed in geographical order, and from cumulative households, after a random start, subsequent clusters were selected after a fixed interval. These clusters were specified as the primary sampling units (PSUs). To collect information on 550 households per province, the total number of households surveyed in every cluster ranged from 27 to 28. The 32 provinces were included as 32 strata during analysis of data.
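The selection procedure described above (cumulative household counts, a random start, and a fixed interval) can be sketched as follows. This is a minimal illustration, not the survey team's actual program: the village sizes are randomly generated placeholders, and the sketch assumes every cluster is smaller than the sampling interval so that no cluster is selected twice.

```python
import random
from itertools import accumulate


def pps_systematic_sample(sizes, n_select, rng=None):
    """Systematic PPS selection of clusters.

    sizes: household counts per cluster, already listed in geographical order.
    Returns the list indices of the selected clusters.
    """
    rng = rng or random.Random()
    cumulative = list(accumulate(sizes))
    interval = cumulative[-1] / n_select      # fixed sampling interval
    start = rng.random() * interval           # random start in [0, interval)
    selected = []
    idx = 0
    for j in range(n_select):
        point = start + j * interval
        # Advance to the cluster whose cumulative range contains the point
        while idx < len(cumulative) - 1 and cumulative[idx] <= point:
            idx += 1
        selected.append(idx)
    return selected


# Hypothetical province with 300 villages of 20-400 households each
rng = random.Random(2003)
village_sizes = [rng.randint(20, 400) for _ in range(300)]
chosen = pps_systematic_sample(village_sizes, n_select=20, rng=rng)
print(chosen)
```

With this scheme a cluster's chance of selection is the number of clusters selected times its share of the province's households, which is why the design was expected to be self-weighting when the size measures are current.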
Clusters initially inaccessible for reasons, such as flood or absence of village head, were covered at a later date. Only one cluster in one province could not be reached for security reasons (a clash between two rival villages). No cases where one selected cluster had split into two or more clusters were encountered during the survey. In each sampled cluster, the number and location of households were verified with the elderly local residents, and a sketch-map indicating well-known landmarks, such as mosques, schools, and health centres, was prepared. In cases where a selected cluster from the 1979 list had been destroyed by war, it was replaced by the next cluster from the household listing. The replacement rate was less than 10% in all the provinces. Clusters that emerged after 1979 could not be selected since the 1979 census list was used as the sampling frame.

Selection of households in a cluster
A household was defined as the people (men and women) usually taking their meals from the same cooking-pot and those who share household assets and accumulate their earnings to procure food and other household materials. The possibility of a dwelling/structure being inhabited by more than one household was considered, and the surveyors were instructed to count each household separately in such cases. Every sampled cluster was partitioned into segments of approximately 55 households each, and one segment was randomly selected. All the households in the selected segment were listed separately even if they lived in the same structure, such as an apartment house or multi-family compound, and every alternate household was interviewed with a random start (1st or 2nd). If a selected household was absent on the day of interview, up to two additional efforts were made on later dates. In cases where no interview could be conducted after three attempts, the selected household was replaced by the nearest household next door. Data were collected for all the respondents meeting the eligibility requirements for an indicator in a sampled household. Households where eligible respondents refused to participate in the survey were replaced by the nearest household next door.
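The within-cluster steps (segmentation, random choice of one segment, and interviewing every alternate household from a random start) can be illustrated with a small sketch. It assumes segments are contiguous blocks of the household listing and of nearly equal size, which is a simplification of field segmentation; the cluster size of 330 households is hypothetical.

```python
import random


def select_households(n_households, segment_size=55, rng=None):
    """Return (number of segments, indices of households selected for interview)."""
    rng = rng or random.Random()
    n_segments = max(1, round(n_households / segment_size))
    # Boundaries of nearly equal contiguous segments (assumed layout)
    bounds = [round(s * n_households / n_segments) for s in range(n_segments + 1)]
    segment = rng.randrange(n_segments)           # one segment chosen at random
    listed = list(range(bounds[segment], bounds[segment + 1]))
    start = rng.randrange(2)                      # random start: 1st or 2nd household
    return n_segments, listed[start::2]           # every alternate household


n_segments, interviewed = select_households(330, rng=random.Random(1))
print(n_segments, len(interviewed))
```

For a segment of 55 households, taking every alternate household yields 28 interviews when the start is the first household and 27 when it is the second, matching the 27 to 28 households surveyed per cluster.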

Pre-census data-collection
During 2004, the Central Statistics Office (CSO) of Afghanistan sent teams to conduct door-to-door enumeration in all the 32 provinces. In 29 provinces, complete enumeration was conducted, and in three provinces with areas where conditions were deemed too dangerous to send field workers, only partial enumeration was possible. This pre-census laid the groundwork for future censuses by providing codes for each province, district, village, subvillage (in large villages), urban sector (nahia), and block. Households were also numbered and counted. Standardized quality-assurance procedures were followed, including several layers of supervisory teams and systematic re-collection of data from selected sites to ensure consistency. Based on this work, the CSO published the official census figures for all the provinces in 2006 (3). While the figures for 29 provinces were based on complete enumeration, the figures for the three insecure provinces were based on partial enumeration supplemented by projections based on demographic models.

Generation of sampling weights based on 2004 pre-census
Although it was designed to be self-weighted, the MICS 2003 sample could not be considered self-weighted. There were significant changes in the number and distribution of households in the country during 1979-2004 due to displacement and growth of population over time. The list of villages and towns based on the 1979 census was outdated and incomplete as new villages had come into existence while some villages had been displaced due to war and natural disasters, such as floods and droughts (13). Therefore, the sampling design was used for generating sampling weights in the present study. The sampling weight for every sampled household in a province was the inverse of the selection probability of that household.
The formula to generate the sampling weight for a household (h) in sampled segment (i) within the sampled cluster (k) in province P was as follows:

$$W_{pih} = \frac{a_j}{a_p} \times b_{pk} \times \frac{c_{pil}}{c_{pih}}$$

where $a_p$ = number of primary sampling units (PSUs) selected in province P; $a_j$ = number of PSUs in province P; $b_{pk}$ = number of segment(s) in a selected PSU k in province P; $c_{pih}$ = number of households selected in a selected segment i in PSU k in province P; and $c_{pil}$ = number of households in a selected segment i in PSU k in province P. The $W_{pih}$ value for each household was used as its sampling weight for provincial estimates.
For national estimates, an additional factor was applied for a household (h) in province P. This factor was defined in terms of $N_{ph}$, the total number of households in province P. The sampling weight for national estimates, $W^{n}_{pih}$, combined the provincial weight $W_{pih}$ with this additional factor, and the $W^{n}_{pih}$ value for each household was used as its sampling weight for national estimates.
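As an illustration of the provincial weight, the sketch below takes the weight to be the inverse of the product of three stage-wise selection fractions implied by the symbol definitions: clusters selected out of clusters in the province, one segment out of the segments in the cluster, and households selected out of households in the segment. This reading is an assumption made for the example, and all input numbers are hypothetical.

```python
def provincial_weight(a_p, a_j, b_pk, c_pih, c_pil):
    """Inverse selection probability of a household (assumed three-stage form).

    a_p:   PSUs selected in the province
    a_j:   PSUs in the province
    b_pk:  segments in the selected PSU (one segment is selected)
    c_pih: households selected in the selected segment
    c_pil: households in the selected segment
    """
    p_cluster = a_p / a_j
    p_segment = 1 / b_pk
    p_household = c_pih / c_pil
    return 1 / (p_cluster * p_segment * p_household)


# Hypothetical household: 20 of 400 PSUs, 1 of 6 segments, 28 of 55 households
w = provincial_weight(a_p=20, a_j=400, b_pk=6, c_pih=28, c_pil=55)
print(round(w, 2))
```

Each factor is the reciprocal of one stage's sampling fraction, so a household in a large cluster with many segments represents more households than one in a small, single-segment cluster.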
The provincial and national sampling weights were normalized to sum to the sample-size. Two provinces, Panjsher and Daykundi, were created after the 2003 MICS from Parwan and Uruzgan respectively. The 2006 census figures for Panjsher and Daykundi were therefore combined with those for Parwan and Uruzgan respectively, and the combined figures were used for generating sampling weights for Parwan and Uruzgan.
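Normalizing weights to sum to the sample-size amounts to rescaling every weight by the same constant. A minimal sketch with hypothetical raw weights and outcomes shows that this rescaling leaves a weighted proportion unchanged:

```python
def normalize_weights(weights):
    """Rescale weights so that they sum to the number of sampled units."""
    n = len(weights)
    total = sum(weights)
    return [w * n / total for w in weights]


def weighted_proportion(weights, outcomes):
    """Weighted share of units with outcome 1."""
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


raw = [235.7, 120.0, 310.5, 98.2]   # hypothetical raw sampling weights
outcomes = [1, 0, 1, 1]             # hypothetical 0/1 indicator values
norm = normalize_weights(raw)

print(sum(norm))
print(weighted_proportion(raw, outcomes), weighted_proportion(norm, outcomes))
```

Because only the relative sizes of the weights matter for proportions, normalization affects the apparent sample-size in the output but not the point estimates.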
The SVYTAB command in the Stata software was used for the calculation of variance estimates taking the design of the survey into account (14). By default, the SVY set of commands computes standard errors using a linearized variance estimator based on a first-order Taylor series approximation (15). In the non-survey context, this variance estimator is referred to as the robust variance estimator (Huber-White sandwich estimator). Each province was specified as the stratum and each cluster as the PSU. The weighted estimates and confidence intervals were calculated by specifying sampling weights in the SVY command. The reported indicators were proportions which used total numbers of women or children as denominators. Since these denominators are not fixed for a given province but are random variables, we estimated the variance of a ratio. This estimation is done automatically when this type of analysis is specified in the Stata program. For proportions, the confidence interval was derived using a logit transformation so that the interval lies between 0 and 1 (16).
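The two ideas in this paragraph, a linearized variance for a ratio-type proportion and a logit-transformed confidence interval, can be sketched in plain Python. This is an illustrative approximation rather than a reproduction of Stata's output: it assumes with-replacement selection of PSUs within strata (no finite population correction), at least two PSUs per stratum, and a normal critical value of 1.96, whereas survey software typically uses a t-distribution with design-based degrees of freedom. The records are hypothetical.

```python
import math
from collections import defaultdict


def weighted_proportion_se(records):
    """records: (stratum, psu, weight, y) tuples with y in {0, 1}.

    Returns (p_hat, se) using Taylor linearization of sum(w*y) / sum(w).
    """
    records = list(records)
    w_total = sum(w for _, _, w, _ in records)
    p_hat = sum(w * y for _, _, w, y in records) / w_total
    # PSU totals of the linearized variable z = w * (y - p_hat) / sum(w)
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * (y - p_hat) / w_total
    variance = 0.0
    for totals in psu_totals.values():
        n_h = len(totals)                       # PSUs in this stratum
        mean_h = sum(totals.values()) / n_h
        variance += n_h / (n_h - 1) * sum((t - mean_h) ** 2 for t in totals.values())
    return p_hat, math.sqrt(variance)


def logit_ci(p, se, z=1.96):
    """Confidence interval built on the logit scale, then back-transformed."""
    centre = math.log(p / (1 - p))
    se_logit = se / (p * (1 - p))               # delta-method standard error
    expit = lambda x: 1 / (1 + math.exp(-x))
    return expit(centre - z * se_logit), expit(centre + z * se_logit)


# Hypothetical data: two strata, two PSUs each
records = [
    ("A", 1, 2.0, 1), ("A", 1, 2.0, 1), ("A", 2, 1.0, 0), ("A", 2, 1.0, 1),
    ("B", 1, 3.0, 1), ("B", 1, 3.0, 0), ("B", 2, 1.5, 1), ("B", 2, 1.5, 1),
]
p_hat, se = weighted_proportion_se(records)
lower, upper = logit_ci(p_hat, se)
print(p_hat, se, lower, upper)
```

The back-transformation guarantees limits strictly between 0 and 1, which matters for indicators close to 100%, where a symmetric interval could exceed the admissible range.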
UNICEF defined the coverage variables as the proportion of the population not covered so that a higher point estimate represents a worse situation, i.e. lower coverage. We retained these definitions to be comparable with the original MICS 2003 report published by UNICEF (2). The differences between weighted and unweighted point estimates and confidence intervals were calculated by subtracting unweighted estimates from weighted estimates. We also calculated the range, mean, median, and absolute mean (the mean of the absolute values) of the differences for each of the seven indicators.
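The summary statistics of the differences can be computed as in the following sketch. The four pairs of provincial estimates are hypothetical and serve only to show the calculation (weighted minus unweighted).

```python
from statistics import mean, median


def summarize_differences(weighted, unweighted):
    """Range, mean, median, and absolute mean of weighted minus unweighted."""
    diffs = [w - u for w, u in zip(weighted, unweighted)]
    return {
        "range": (min(diffs), max(diffs)),
        "mean": mean(diffs),
        "median": median(diffs),
        "absolute_mean": mean(abs(d) for d in diffs),
    }


# Hypothetical provincial estimates (percentage not covered)
weighted = [80.0, 62.5, 91.0, 70.0]
unweighted = [82.0, 60.5, 90.0, 75.0]
summary = summarize_differences(weighted, unweighted)
print(summary)
```

The mean can be near zero even when individual differences are large, because positive and negative differences cancel; the absolute mean does not cancel and so indicates the typical size of the shift.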
The following three indicators describe the coverage of health services for women: (a) percentage of women, aged 15-49 years, who delivered in the past two years before the survey and were attended during delivery by unskilled health personnel, i.e. excluding doctor, nurse, or midwife; (b) percentage of women currently married or in union, aged 15-49 years, who were not using a contraceptive method; and (c) percentage of women, aged 15-49 years, who delivered in the past two years before the survey and received antenatal care only from unskilled health personnel, i.e. excluding doctor, nurse, or midwife.
The following four indicators provide information on the coverage of health services to children: (a) percentage of children, aged 6-59 months, who did not receive at least one high-dose vitamin A supplement in the last six months; (b) percentage of children, aged 9-59 months, who were not immunized against measles; (c) percentage of children, aged 12-23 months, who did not receive three doses of DPT immunization; and (d) percentage of children, aged less than five years, who did not receive BCG immunization.

Ethical issues
Although a formal ethics committee did not exist in Afghanistan to review the MICS questionnaire, representatives from the MoPH, Ministry of Rural Rehabilitation and Development, Kabul University, international agencies, and non-governmental organizations were involved in the technical review of the entire questionnaire and survey methodology. Voluntary consent was taken at the beginning of the questionnaire by the interviewer who read out the statement before administering the questionnaire.

Comparison of weighted and unweighted point estimates
Among the three indicators relating to the coverage of health services for women, the widest range of differences between weighted and unweighted point estimates was for the percentage of deliveries conducted by unskilled birth attendants. The difference ranged from -13.35 (Samangan) to 5.21 (Badghis), with a difference in the national estimate of 1.77 (Table 1). Among the four indicators relating to the coverage of health services for children, the widest range of differences between weighted and unweighted estimates was for the percentage of children, aged less than five years, who did not receive BCG immunization. The difference ranged from -16.5 (Faryab) to 17.91 (Takhar), with a difference in the national estimate of 0.79. Across all the provinces, the median difference between weighted and unweighted point estimates was close to zero for every indicator (Fig. 1). The interquartile range of differences for the four indicators relating to the coverage of health services for children was wider than that for the three indicators relating to the coverage of health services for women. In total, more than 90% of unweighted estimates were within 10 percentage points of weighted estimates (Table 1). The share of provincial differences that were negative ranged from a high of 65% for delivery by unskilled birth attendant to a low of 35% for couples not using a method to delay pregnancy. The average difference (weighted minus unweighted) across the seven indicators ranged from -1.52 to -0.06 percentage points, and the average absolute difference ranged from 0.75 to 4.07 percentage points. The difference in national point estimates ranged from -1.82 to 2.19 percentage points across the seven indicators.
The provinces were ranked for each indicator based on the weighted and unweighted point estimates, and the provinces with the five highest and the five lowest values were compared. The provinces included among the five highest and the five lowest were similar, although the comparative ranking within the groups of five was not identical. Four of five provinces were the same for all indicators, except the indicator on delivery by unskilled attendants, where only three of the five lowest-ranked provinces were the same. Both weighted and unweighted point estimates reflected a relatively better situation for children compared to women in Afghanistan.
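The ranking comparison can be expressed as an overlap count between the sets of five lowest and five highest provinces under each set of estimates. The sketch below uses placeholder province labels and made-up values; it is not the study's data.

```python
def top_bottom_overlap(weighted, unweighted, k=5):
    """Count provinces shared by the k lowest and k highest groups.

    weighted, unweighted: dicts mapping province -> estimate.
    Returns (overlap among k lowest, overlap among k highest).
    """
    def extremes(estimates):
        ranked = sorted(estimates, key=estimates.get)
        return set(ranked[:k]), set(ranked[-k:])

    low_w, high_w = extremes(weighted)
    low_u, high_u = extremes(unweighted)
    return len(low_w & low_u), len(high_w & high_u)


# Hypothetical estimates for twelve placeholder provinces P01..P12
weighted = {f"P{i:02d}": 5.0 * i for i in range(1, 13)}
unweighted = dict(weighted)
unweighted["P05"] = 32.0   # moves P05 out of the five lowest
unweighted["P06"] = 24.0   # moves P06 into the five lowest

overlap_low, overlap_high = top_bottom_overlap(weighted, unweighted)
print(overlap_low, overlap_high)
```

Comparing set membership rather than exact rank positions matches the observation that the groups of five were similar even when the ordering within each group differed.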

Comparison of weighted and unweighted confidence intervals
Among the three indicators relating to the coverage of health services for women, the widest difference between weighted and unweighted confidence intervals was for the percentage of deliveries conducted by unskilled birth attendants (Table 2). The difference in confidence intervals ranged from -5.68 (Ghazni) to 20.67 (Paktya), with a mean of 2.91, an absolute mean of 3.52, and a median of 1.97. Among the four indicators relating to the coverage of health services for children, the widest range of the difference between weighted and unweighted confidence intervals was for the percentage of children, aged less than five years, who did not receive BCG immunization. The difference ranged from -15.34 (Faryab) to 15.9 (Takhar), with a mean of 1.14, an absolute mean of 4.55, and a median of 0.94.
The median difference between weighted and unweighted confidence intervals ranged from -0.04 to 1.97 for the seven indicators looking across all the provinces (Fig. 2). The interquartile range of differences for the four indicators relating to the coverage of health services for children was wider than that for the three indicators relating to the coverage of health services for women. In total, more than 90% of unweighted confidence intervals were within 10 percentage points of weighted confidence intervals (Table 2). The share of provincial differences between weighted and unweighted confidence intervals that were negative ranged from a high of 50% for children aged 9-59 months not receiving measles immunization to a low of 15% for delivery by unskilled birth attendant. The average difference (weighted minus unweighted) across the seven indicators ranged from 0.34 to 2.91 percentage points, and the average absolute difference ranged from 1.67 to 5.67 percentage points. The difference in national confidence intervals ranged from 0.34 to 2.47 percentage points. The provinces were ranked for each indicator based on the confidence intervals, and the provinces with the five widest and the five narrowest values were compared. In total, four provinces included among the five narrowest were the same for all the indicators, except for unskilled birth attendants at delivery and lack of measles immunization, where only three provinces were the same. There was greater heterogeneity between the rankings of the five provinces with the widest values of the confidence intervals. On average, three provinces were different for each indicator, with up to five provinces different for BCG immunization.

DISCUSSION
Estimates generated using sampling weights were unbiased compared to unweighted estimates. The use of sampling weights is a widely accepted method for descriptive analyses as it adjusts the sample to be representative of the population from which it is derived (8,9). In this study, data were collected in 2003 but with a sampling frame from 1979. The sampling weights generated based on the 2004 pre-census data improved the generalizability of the results for the population living in Afghanistan in 2003.
The use of sampling weights generated from the data on distribution of villages and household populations in the 2004 pre-census allowed reduction in bias and adjustment of precision in estimates. This study provided a unique opportunity to measure the bias that can arise from using an outdated sampling frame for estimating baseline measures of the coverage of health services in post-conflict countries. The use of sampling weights leads to larger variances and, thus, widening of confidence intervals. In the present study, we found that the sampling weights were associated with differences in point estimates and confidence intervals for provincial and national estimates. Comparison of weighted and unweighted estimates resulted in some wide differences in magnitude for individual provincial estimates and confidence intervals but, in general, these differences did not lead to different conclusions about the cross-sectional point estimates made at the national level. The mean, absolute mean, and median of difference between weighted and unweighted estimates and confidence intervals were close to zero for all the indicators.
The MICS 2003 was originally intended to generate estimates for children aged 12-23 months, with a precision level of ±10% (of estimated prevalence) at the provincial level (2). Of the seven indicators analyzed in this study, the indicator on DPT immunization was directly related to this age-group. The study found that more than 50% (19 of 32) of weighted estimates for this indicator had a precision level lower than the intended level of ±10% (of estimated prevalence). Confidence intervals wider than 20 percentage points were also found for other coverage indicators. All the indicators included in the study were gathered for descriptive analysis of coverage of health services and to estimate the proportion of individuals in the population who have a certain characteristic. The widening of confidence intervals is unfortunate but not critical because the weighted estimates with confidence intervals offer a valid description of the population and estimators for the baseline assessment of coverage of health services in post-conflict Afghanistan. However, since the variances are higher than originally anticipated, policy-makers will need to have a higher tolerance for error in assessing future change. The MICS 2003 will allow policymakers to make plausible inferences about future changes in health services but they may not reach the probability levels frequently expected in scientific research (17).
The ranking of the five highest and the five lowest provinces on weighted and unweighted estimates and confidence intervals also yielded similar results. For management and evaluation purposes, this allows stakeholders to identify appropriately which provinces need the most improvement and where extra effort is needed. The Government has continued to emphasize expanding the coverage of basic health services represented by these coverage indicators and, by using additional data on the quality of services through a Balanced Scorecard, has focused on improving the quality of these services, especially in provinces where there are deficiencies (18).
The general consistency of the results calculated with and without the sampling weights suggests that outdated sampling frames may be acceptable for use in similar contexts to obtain baseline estimates from household surveys to guide policy decisions, although at a lower level of statistical probability than originally planned. However, this conclusion may not be generalizable to other similar settings because of specific characteristics of the MICS 2003. First, data of the MICS 2003 were collected using a probability-based sampling technique in a scientifically-rigorous manner to keep the replacement rate for selected clusters low (below 10% in all provinces). Second, clusters created after 1979 had a zero probability of selection in the MICS 2003 sample. In this study, data from these clusters were used for generating the sampling weights and calculating the weighted estimates and confidence intervals. The use of weights to adjust for these clusters involves the assumption of homogeneity across clusters created before and after 1979. A difference in characteristics between these two groups of clusters would violate this assumption and bias the weighted estimates. Two techniques could have been used to test this assumption: first, aerial photographs of villages to cross-check and supplement the listing of households available in the 1979 census and subsequent use of these updated lists for sampling (19); second, a similar survey from a representative sample of households right after the 2004 pre-census. The aerial photograph technique was used for the 1972 Demographic and Family Guidance Survey of the settled population of Afghanistan (19). There was no national census conducted in Afghanistan before the 1972 survey. Aerial photographs supplemented the information available from (a) topographic series maps and (b) lists of villages with crude population estimates.
These photographs were used for household prelisting, boundary marking, sampling, and quality control re-interviewing. Unfortunately, neither of these techniques was possible at the time of MICS in 2003. In such situations, the use of sampling weights derived from sources of information available later in time is a pragmatic choice to correct the bias in the health service-coverage indicators due to outdated sampling frames.