Cluster designs to assess the prevalence of acute malnutrition by lot quality assurance sampling: a validation study by computer simulation

Traditional lot quality assurance sampling (LQAS) methods require simple random sampling to guarantee valid results. However, cluster sampling has been proposed to reduce the number of random starting points. This study uses simulations to examine the classification error of two such designs, a 67×3 (67 clusters of three observations) and a 33×6 (33 clusters of six observations) sampling scheme to assess the prevalence of global acute malnutrition (GAM). Further, we explore the use of a 67×3 sequential sampling scheme for LQAS classification of GAM prevalence. Results indicate that, for independent clusters with moderate intracluster correlation for the GAM outcome, the three sampling designs maintain approximate validity for LQAS analysis. Sequential sampling can substantially reduce the average sample size that is required for data collection. The presence of intercluster correlation can impact dramatically the classification error that is associated with LQAS analysis.


Introduction
In the last 20 years, development organizations working in international health have increasingly adopted lot quality assurance sampling (LQAS) to assess health care parameters. Nearly all of the 805 studies that were identified in a recent review of LQAS implemented between January 1984 and December 2004 employed traditional LQAS sampling methods (Robertson, 2006), in which simple random sampling (SRS) is used for data collection. The exceptions are studies in which a two-stage LQAS design was combined with cluster sampling to assess neonatal tetanus eradication (World Health Organization, 2001, 2002), and a study in which small clusters instead of SRS were used to assess the prevalence of global acute malnutrition (GAM) by LQAS analysis methods (Deitchler et al., 2007).
In the international health setting, small sample sizes (e.g. n = 19) have often been used for LQAS assessment of service provision indicators (Valadez et al., 2003). These small sample sizes have made LQAS feasible for use by local managers (Valadez, 1991). However, use of LQAS for assessment of anthropometric indicators requires large sample sizes owing to the increased precision that is needed for hypothesis testing. Using SRS with large sample sizes increases time and cost, as data collection for each observation in the sample can require travel to a different site. Sampling observations in batches, or clusters, is an alternative method which reduces the number of site visits that are needed to complete data collection. However, if the observations within each cluster are highly correlated with respect to the outcome being assessed, cluster sampling leads to increased misclassification with the LQAS analysis method. In contrast, cluster sampling could be a viable option if it does not undermine the validity of the independence assumption for hypothesis testing, as required by LQAS. Deitchler et al. (2007, 2008) field tested both a 67 × 3 and a 33 × 6 cluster design (67 clusters of size 3 and 33 clusters of size 6 respectively) for LQAS assessment of GAM prevalence in the Siraro woreda of Ethiopia in 2003 and in the administrative units of Fur Baranga and Habila in West Darfur in 2005. The use of a 67 × 3 sequential sampling design was also investigated in the Ethiopia study. In comparison with the 67 × 3 and 33 × 6 designs, the sequential design allowed for a reduction in the total sample size that was required to assess the prevalence of GAM by LQAS analysis methods (Deitchler et al., 2007). Similar sequential designs have been used for categorizing resistance of human immunodeficiency virus to drugs (Bennett et al., 2006). However, those designs relied on SRS for validity.
The current study uses computer simulations to assess the validity of the small cluster approach that was used to assess the prevalence of GAM. The principal sampling strategy uses a cluster model to minimize the number of random sites to visit. We focus on a 67 × 3 and a 33 × 6 cluster design as these were the designs that were tested in Ethiopia and Sudan. Additionally, we develop and investigate a second strategy which applies a sequential sampling scheme to the 67 × 3 cluster design. Here, we use more robust statistical assumptions for the sequential design than had been applied to the work in Ethiopia, to improve the design.

Traditional lot quality assurance sampling methods
LQAS inference uses the binomial approximation to the hypergeometric distribution to test whether the prevalence of a parameter of interest is greater than or equal to some prespecified threshold $P_0$. This is equivalent to the hypothesis test $H_0: P \ge P_0$ versus $H_a: P < P_0$, where $P$ is the true prevalence in the population and $P_0$, the upper threshold, is the prevalence level that the data are tested against. In the case of GAM, $P_0$ represents an unacceptable level of acute malnutrition in the population. It is chosen to reflect the prevalence at which a population would be considered a priority for humanitarian intervention. The null hypothesis is rejected if the number of individuals in the sample exhibiting acute malnutrition, $s$, is less than or equal to an a priori defined critical value $d$ ($s \le d$). This critical value is often referred to as the decision rule in LQAS literature (Valadez, 1991). In addition, LQAS requires that we define a lower threshold $P_a$. The lower threshold reflects the prevalence of GAM at which the population would not be considered a priority for intervention.
As with any hypothesis test, an α- and a β-error are associated with LQAS. The α-error is the highest probability that the null hypothesis is incorrectly rejected. In the case of GAM, this would mean concluding that the assessment area does not have a high level of acute malnutrition when in fact it does. This probability is controlled for at the upper threshold: $P(s \le d \mid P = P_0) \le \alpha$. The β-error is the highest probability that we incorrectly fail to reject the null hypothesis. This would mean concluding that the assessment area does have a high level of acute malnutrition when in fact it does not. The β-error is controlled for at the lower threshold: $P(s > d \mid P = P_a) \le \beta$. The critical value is chosen to approximate the desired α and β given the upper and lower thresholds, and the sample size. In practice, it is difficult to attain the α- and β-errors exactly owing to the discrete nature of the binomial distribution. Further, more than one critical value can achieve the specified constraints. The actual error probabilities for a specific sample size, and upper and lower thresholds, therefore depend on the critical value $d$ that is chosen. In this study, we investigate the upper and lower thresholds that were field tested in Ethiopia and Sudan (Deitchler et al., 2007, 2008). Three couplets (i.e. upper-lower threshold pairs) are investigated: the upper thresholds of 10%, 15% and 20%, and the respective lower thresholds of 5%, 10% and 15%. The 10%-5% and 15%-10% couplets are of primary concern as these are the benchmarks that are most commonly used by humanitarian agencies to assess the severity of GAM prevalence (Food and Agriculture Organization and Food Security Analysis Unit, 2006). The 20%-15% couplet is of secondary consideration as prevalences of GAM above 20% are fairly rare, even in emergency settings (Médecins sans Frontières, 1995).
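As an illustration of how a decision rule translates into error probabilities under SRS, the following minimal sketch evaluates the binomial α- and β-errors for a given sample size and critical value. The function names are hypothetical; the example uses n = 201 and the 10%-5% couplet with d = 13, the critical value quoted in the caption of Table 3, and it assumes independent observations.

```python
from math import comb


def binom_cdf(d, n, p):
    """P(S <= d) for S ~ Binomial(n, p)."""
    return sum(comb(n, s) * p**s * (1 - p) ** (n - s) for s in range(d + 1))


def lqas_errors(n, d, p_upper, p_lower):
    """Binomial alpha- and beta-errors of the rule 'reject H0 if s <= d'.

    alpha: probability of rejecting H0 when the prevalence equals p_upper (P0).
    beta:  probability of failing to reject H0 when it equals p_lower (Pa).
    """
    alpha = binom_cdf(d, n, p_upper)
    beta = 1.0 - binom_cdf(d, n, p_lower)
    return alpha, beta


# 10%-5% couplet, n = 201 (67 clusters of 3), decision rule d = 13
alpha, beta = lqas_errors(n=201, d=13, p_upper=0.10, p_lower=0.05)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```

Scanning `d` over a range of integers with the same function shows why the targets can only be approximated: each unit change in `d` moves both errors by a discrete step.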
For each upper and lower threshold couplet, we determined the critical value subject to the constraints of an α-error of approximately 0.10 and a β-error of approximately 0.20 for samples

Lot quality assurance sampling methods for sequential cluster designs
In this section we investigate a sequential cluster design to test the same three null hypotheses as above. The sequential cluster design differs from traditional LQAS as a decision can be made to reject or accept the null hypothesis after each individual cluster has been observed. In a k × m sequential sampling design, there are at most k stages of sampling. At each stage, m sampling elements are observed, for a maximum of n = km possible observations. At the ith stage of sampling, we define a rejection rule $r_i$, an acceptance rule $a_i$ and the cumulative number of outcomes $s_i$ (in our application, an outcome is a child exhibiting GAM). If $s_i \ge a_i$, then we conclude that the prevalence of GAM is greater than or equal to $P_0$, and sampling stops. Likewise, if $s_i \le r_i$, then we conclude that the prevalence is less than $P_0$, and sampling stops. Otherwise, if $r_i < s_i < a_i$, sampling proceeds to the next stage. If no decision is made by the time that the final (kth) stage of sampling is reached, then a decision is made to reject if $s_n \le (a_n + r_n)/2$ and to accept if $s_n > (a_n + r_n)/2$. Wald outlined the calculation of LQAS critical values at each stage of a sequential design applied to observations that are selected by SRS (Wald, 1947). These critical values are linear in the individual observations. We adapt this theory to accommodate clusters of size m (m > 1), under the assumption that observations within each cluster are independent. Namely, $a_i$ and $r_i$ are defined as functions of the sampling stage i, the cluster size m, the thresholds $P_0$ and $P_a$, and the target classification errors α and β. These critical values are linear in the sampling stage and thus reflect a cluster sampling design.
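For orientation, the standard Wald sequential probability ratio boundaries for binomial data can be written per stage of m observations as below. This is a sketch of the textbook form under the error convention used here (α is the probability of concluding that prevalence is below $P_0$ when $P = P_0$; β is the probability of concluding that it is at or above $P_0$ when $P = P_a$). It is an assumption that the adapted rules take exactly this form; the symbols $g$ and $c$ are introduced only for this illustration.

```latex
\begin{aligned}
g   &= \log\frac{P_0\,(1-P_a)}{P_a\,(1-P_0)}, \qquad
c    = \frac{1}{g}\,\log\frac{1-P_a}{1-P_0},\\[4pt]
a_i &= \frac{1}{g}\,\log\frac{1-\alpha}{\beta} + c\,m\,i,\\[4pt]
r_i &= \frac{1}{g}\,\log\frac{\alpha}{1-\beta} + c\,m\,i,
\qquad i = 1,\dots,k .
\end{aligned}
```

For the 10%-5% couplet with α = 0.10, β = 0.20 and m = 3, these expressions give $c \approx 0.072$, so both boundaries rise by about 0.22 per cluster, with intercepts of roughly 2.01 for $a_i$ and −2.78 for $r_i$. The common slope $c$ lies between $P_a$ and $P_0$, as expected.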
One of the benefits of sequential designs is the potential for reduction of the overall sample size that is required for data collection. With respect to the outcome of acute malnutrition, a reduction in sample size could lead to a more rapid response to an emergency situation. The average sample number ASN, or the average number of clusters that are sampled to reject or accept the null hypothesis, characterizes this reduction. The average sample size is equal to the number of sampling elements per cluster times ASN (m × ASN); a formula for it is given by Aroian (1976).
The Wald critical values rely on the assumption that the number of possible observations is unbounded. However, in virtually all applications, this is not so. When the number of possible observations is bounded, the design is said to be truncated. The use of Wald critical values in truncated sequential designs does not generally yield the appropriate α and β (Wald, 1947). Aroian (1965, 1976) suggested treating a sequential sample as a random walk to calculate the classification error for a truncated design directly. We used Aroian's direct method to calculate the true classification error for a range of sequential designs varied over the parameter space of α and β to arrive within the desired targets of classification error.
Here we investigate a 67 × 3 sequential sampling design with application to the three upper-lower threshold couplets of interest. In terms of the above notation, k = 67, m = 3, n = 201 and the upper bound for ASN is 67. For each upper and lower threshold couplet, we determine the acceptance and rejection rules by using Wald theory. We calculated critical values for a range of α- and β-errors around the target levels of 0.10 and 0.20 respectively. The final critical values that are chosen are those that yield the true α and β nearest to the desired levels as calculated by using the direct method. For both the 15%-10% and 20%-15% couplets, we could not find a design that yielded the desired α- and β-targets. For these couplets we selected the design that jointly minimized the α- and β-errors. For the 10%-5% couplet we expect an α of 0.10 and a β of 0.16. For the 15%-10% couplet, we expect an α of 0.10 and a β of 0.24. And, for the 20%-15% couplet, we expect an α of 0.17 and a β of 0.22. The critical values for each couplet are given in Table 2.
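The idea of evaluating a truncated design directly can be sketched as a forward recursion over the stages: track the probability of each cumulative count among samples that have not yet stopped, and accumulate the stopping probabilities. The code below is an illustration in that spirit, not a reproduction of Aroian's algorithm or of the critical values in Table 2. It assumes independent observations, textbook Wald boundaries left unrounded, and hypothetical function names.

```python
from math import comb, log


def wald_boundaries(p0, pa, alpha, beta, k, m):
    """Assumed textbook Wald boundaries, evaluated after each of k clusters of size m."""
    g = log(p0 * (1 - pa) / (pa * (1 - p0)))
    slope = m * log((1 - pa) / (1 - p0)) / g
    accept = [log((1 - alpha) / beta) / g + slope * i for i in range(1, k + 1)]
    reject = [log(alpha / (1 - beta)) / g + slope * i for i in range(1, k + 1)]
    return accept, reject


def operating_characteristics(p, accept, reject, m):
    """Exact P(conclude prevalence < P0) and ASN (in clusters) at true prevalence p."""
    k = len(accept)
    # Distribution of the number of cases in one cluster of m independent children
    step = [comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(m + 1)]
    alive = {0: 1.0}  # P(cumulative count = s and no decision yet)
    p_low = 0.0
    asn = 0.0
    for i in range(k):
        asn += sum(alive.values())  # probability that cluster i + 1 is sampled
        nxt = {}
        for s, pr in alive.items():
            for j, pj in enumerate(step):
                nxt[s + j] = nxt.get(s + j, 0.0) + pr * pj
        alive = {}
        last = i == k - 1
        midpoint = (accept[i] + reject[i]) / 2
        for s, pr in nxt.items():
            if s <= reject[i] or (last and s <= midpoint):
                p_low += pr  # reject H0: prevalence judged below P0
            elif s >= accept[i] or last:
                pass  # accept H0: prevalence judged at or above P0
            else:
                alive[s] = pr  # continue to the next cluster
    return p_low, asn


accept, reject = wald_boundaries(0.10, 0.05, alpha=0.10, beta=0.20, k=67, m=3)
p_low_null, asn_null = operating_characteristics(0.10, accept, reject, m=3)
p_low_alt, asn_alt = operating_characteristics(0.05, accept, reject, m=3)
print(f"alpha = {p_low_null:.3f}, ASN under P0 = {asn_null:.1f} clusters")
print(f"beta  = {1 - p_low_alt:.3f}, ASN under Pa = {asn_alt:.1f} clusters")
```

Repeating the calculation over a grid of nominal `alpha` and `beta` values and keeping the pair whose exact errors are closest to 0.10 and 0.20 mirrors the search described above.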

Simulation validation of cluster designs for lot quality assurance sampling analysis
One key assumption in LQAS theory is that SRS is used for data collection of binary outcomes (Hoshaw-Woodard, 2001; Valadez, 1991). Cluster sampling often results in an intracluster correlation (correlation between subjects within the same cluster with respect to the outcome of interest). For the cluster designs that are of concern here, intracluster correlation could result from within-household correlation (i.e. correlation of GAM between multiple children sampled in one household) or from correlation of GAM between multiple households sampled within the same cluster (Deitchler et al., 2007). Intercluster correlation (correlation between subjects in different clusters) is also possible although this is likely to be minimal for acute malnutrition and can be assumed to be less than or equal to the intracluster correlation (Fenn et al., 2004; Reed, 2000). Validation of the 67 × 3, 33 × 6 and sequential cluster designs requires assessing the effect of these potential correlations on the α- and β-errors that are associated with LQAS hypothesis testing.
For the cluster sampling techniques that are investigated here, we assume that intracluster correlation is homogeneous and non-negative. Intercluster correlation is also assumed to be homogeneous and non-negative, and less than or equal to the intracluster correlation. This study confines the investigation to the intercluster and intracluster correlations of 0.00, 0.05, 0.10, 0.15, 0.20 and 0.25, because these provide a broad set of acceptable alternatives. Kalton's work on cluster sampling suggests that intracluster correlation is usually less than 0.15 for most indicators (Kalton, 1983). The well-documented multiple causes of malnutrition along with the age-dependent vulnerability of children to acute malnutrition (Shrimpton et al., 2001; United Nations Children's Fund, 1990) further suggest that a low intracluster correlation is likely. Moreover, a review of demographic and health surveys that were conducted in 46 developing countries reported intracluster correlations of less than 0.10 for acute malnutrition in 90% of the countries that were studied (Fenn et al., 2004) and intracluster correlations of less than 0.05 were reported for GAM in field applications of the 67 × 3 and 33 × 6 designs in Sudan (Deitchler et al., 2008). With these considerations in mind, we expect intracluster correlations using the three cluster sampling schemes used here to be less than 0.05 in most field settings. Intracluster correlation levels equal to and above 0.05 for GAM, although unlikely, are investigated in this study to understand the effect of unusually high levels of intracluster correlation on LQAS classification error for these designs.

Table 3. Simulation results for the 67 × 3 and 33 × 6 designs: α- and β-errors for the 10%-5% couplet with varied intercluster and intracluster correlation and d = 13

Simulation methods
To reproduce the correlation structure arising from the 67 × 3 and 33 × 6 sampling schemes and the 67 × 3 sequential sampling scheme, it is necessary to generate correlated binary vectors $D$ such that $D \sim (P, \Sigma)$, where $P$ is the n × 1 mean vector of Ps and $\Sigma$ is the n × n variance-covariance matrix describing the correlation structure. For each couplet, samples of size 201 and 198 (for the 67 × 3 and 33 × 6 designs respectively) were generated under the various intercluster and intracluster correlation constraints. This procedure was repeated 10 000 times for each couplet and intercluster-intracluster correlation pair for each design. All simulations were performed by using the statistical package R version 2.6.0 (R Development Core Team, 2007). The simulation methodology is described in detail in Appendix A.
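A rough sketch of one such simulation run is given below in Python rather than R. It thresholds latent standard normal variables, as in Appendix A, but builds them from shared random effects instead of drawing from an n-dimensional multivariate normal; the two constructions give the same correlation structure when the intercluster correlation does not exceed the intracluster correlation. Note that `tau_intra` and `tau_inter` are correlations of the latent normals, not of the binary outcomes, and that all function names and the choice of 2000 replicates are assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_cluster_sample(k, m, p, tau_intra, tau_inter, rng):
    """One k x m sample of correlated binary outcomes with marginal mean p."""
    if not 0.0 <= tau_inter <= tau_intra < 1.0:
        raise ValueError("need 0 <= tau_inter <= tau_intra < 1")
    cut = NormalDist().inv_cdf(p)  # D = 1 when the latent normal falls below this
    shared = sqrt(tau_inter) * rng.gauss(0.0, 1.0)  # common to every observation
    sample = []
    for _ in range(k):
        cluster_effect = sqrt(tau_intra - tau_inter) * rng.gauss(0.0, 1.0)
        cluster = []
        for _ in range(m):
            noise = sqrt(1.0 - tau_intra) * rng.gauss(0.0, 1.0)
            y = shared + cluster_effect + noise  # unit variance by construction
            cluster.append(1 if y <= cut else 0)
        sample.append(cluster)
    return sample


def estimate_alpha(k, m, d, p0, tau_intra, tau_inter, reps, seed=1):
    """Monte Carlo estimate of P(s <= d) when the true prevalence is p0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = simulate_cluster_sample(k, m, p0, tau_intra, tau_inter, rng)
        s = sum(sum(cluster) for cluster in sample)
        if s <= d:
            rejections += 1
    return rejections / reps


# 67 x 3 design, 10%-5% couplet, d = 13: independent versus correlated clusters
alpha_srs = estimate_alpha(67, 3, 13, 0.10, tau_intra=0.0, tau_inter=0.0, reps=2000)
alpha_corr = estimate_alpha(67, 3, 13, 0.10, tau_intra=0.131, tau_inter=0.131, reps=2000)
print(f"alpha (no correlation) = {alpha_srs:.3f}")
print(f"alpha (latent tau 0.131 within and between clusters) = {alpha_corr:.3f}")
```

The β-error can be estimated in the same way by simulating at the lower threshold and counting the samples with s > d.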

Cluster sampling strategy: the 67 × 3 and 33 × 6 designs
Tables 3-5 contain the results of the simulations for the 67 × 3 and 33 × 6 designs along with the estimated standard errors. As expected, those simulations with an intercluster and intracluster correlation equal to 0 for GAM demonstrate α- and β-errors that are approximately equal to the binomial α- and β-errors that are presented in Table 1, as this situation corresponds to SRS. In the correlated samples, the least effect on α- and β-error occurs when the intercluster correlation equals 0. For example, in the case of the 67 × 3 design, if the intercluster correlation is equal to 0 and the intracluster correlation is less than or equal to 0.25, the 10%-5% couplet maintains the desired error limits of α ≤ 0.10 and β ≤ 0.20 (Table 3). With intracluster correlations less than 0.10 the 15%-10% couplet performs approximately within the desired error limits (Table 4). Although the 20%-15% couplet has errors that are slightly above the desired limits at this correlation level, these were expected from the outset as the targets were untenable under SRS (Table 5).
In the case of the 33 × 6 design, assuming an intercluster correlation equal to 0, the 10%-5% couplet conforms to the desired error limits of α ≤ 0.10 and β ≤ 0.20 for intracluster correlations up to 0.10 (Table 3); the 15%-10% couplet conforms approximately to the desired error limits when the intracluster correlation equals 0, and, as expected, the 20%-15% couplet does not attain the desired performance (Tables 4 and 5).
In cases where both the intercluster and the intracluster correlation are greater than 0, there is a substantial increase in the α-error for both the 67 × 3 and the 33 × 6 designs, though the β-error is less affected. This result suggests that, when intercluster correlation is greater than 0, larger samples may be required to attain the desired α- and β-levels. When clusters are selected by random methods, it is, however, reasonable to assume an intercluster correlation equal to 0 for LQAS assessment of GAM prevalence with the 67 × 3 or 33 × 6 design.

Sequential sampling strategy: the 67 × 3 sequential design
Table 6 shows the simulation results for the 67 × 3 sequential design. As expected, when intercluster and intracluster correlations are equal to 0, the results closely approximate the α- and β-errors that were calculated under SRS. Additionally, the least effect on the α- and β-errors occurs in simulations where the intercluster correlation is equal to 0. Assuming an intercluster correlation equal to 0 and an intracluster correlation as high as 0.25, the α-error is 0.16 or less and the β-error is 0.25 or less for the 10%-5% couplet. For the 15%-10% couplet, the α- and β-errors are 0.14 or less and 0.30 or less respectively. The errors for the 20%-15% couplet are slightly higher, with the α-error 0.211 or less and the β-error 0.284 or less. For all simulated sequential samples, ASN is substantially less than the maximum of 67. For the 10%-5% couplet, the maximum ASN is approximately 23 under the null hypothesis and 34 under the alternative (n = 69 and n = 102 respectively). For the 15%-10% couplet, the maximum ASN is approximately 35 under the null hypothesis and 50 under the alternative (n = 105 and n = 150 respectively) and, for the 20%-15% couplet, the maximum ASN is 40 under the null hypothesis and 47 under the alternative (n = 120 and n = 141 respectively).
This result suggests that the 67 × 3 sequential design could be utilized to decrease the total number of clusters sampled, and thus the overall sample size that is required for data collection. A slightly elevated level of misclassification, beyond α ≤ 0.10 and β ≤ 0.20, would need to be acceptable for the 15%-10% and 20%-15% couplets but, in cases where uncorrelated clusters and a low intracluster correlation can be assumed for GAM, the design may be appropriate to use.

Discussion
This study uses computer simulations to assess three cluster sampling schemes that were field tested in Ethiopia to assess the prevalence of GAM by LQAS analysis methods (Deitchler et al., 2007). The simulation results show that the 67 × 3 and 33 × 6 cluster designs conform to the desired error limits of α ≤ 0.10 and β ≤ 0.20 for the 10%-5% and 15%-10% couplets at numerous intracluster correlation levels when the intercluster correlation is equal to 0. It stands to reason that the 67 × 3 design conforms to the desired α- and β-limits at higher intracluster correlation levels than the 33 × 6 design for both the 10%-5% and 15%-10% couplets. For the 10%-5% couplet, the 67 × 3 design maintains the desired error limits when the intercluster correlation is 0 and the intracluster correlation is as high as 0.25. For the 15%-10% couplet, the 67 × 3 design maintains α and β approximately equal to 0.10 and 0.20 when the intercluster correlation is equal to 0 and the intracluster correlation is less than 0.10. Therefore, when clusters can be assumed independent and correlation within the clusters can be assumed to be less than 0.10, the 67 × 3 design can be an effective method to reduce the number of sites that would otherwise need to be visited by SRS of the same size. In cases where the clusters can be assumed independent and correlation within the clusters less than 0.15, the 33 × 6 design can also be an effective method for assessing the prevalence of GAM, allowing for LQAS inference within the desired error limits for the 10%-5% couplet. To maintain the same error limits for the 15%-10% couplet with the 33 × 6 design, there can be no intracluster correlation. Intuitively, we expect the 67 × 3 design to perform within the desired error limits at higher levels of intracluster correlation than the 33 × 6 design, as smaller clusters would suffer less from intracluster correlation.
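The intuition about cluster size can be made concrete with the standard design-effect approximation for equal-sized, mutually independent clusters. This is a textbook relation brought in for illustration, not a result of the simulations; ρ denotes the intracluster correlation of the binary outcome, and deff and $n_{\mathrm{eff}}$ are symbols introduced here.

```latex
\mathrm{deff} = 1 + (m - 1)\,\rho, \qquad
n_{\mathrm{eff}} = \frac{k\,m}{\mathrm{deff}}
% 67 x 3 design, rho = 0.10: deff = 1.2, n_eff = 201 / 1.2 = 167.5
% 33 x 6 design, rho = 0.10: deff = 1.5, n_eff = 198 / 1.5 = 132
```

At the same intracluster correlation of 0.10, the 67 × 3 design therefore retains the information of roughly 168 independent observations, against roughly 132 for the 33 × 6 design.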
The simulation results for the 67 × 3 sequential design indicate a potential time advantage over the 67 × 3 and 33 × 6 cluster designs because the total sample required for data collection is likely to be smaller. However, notwithstanding two exceptions, the simulation results indicate that the α- and β-errors for all intercluster and intracluster correlation levels, for each threshold couplet, exceed the desired α- and β-limits of 0.10 and 0.20 respectively. Use of the sequential design with these maximal sample sizes would therefore be recommended only when it is acceptable to deviate slightly from the above-stated limits of α and β.
The results of this simulation study demonstrate that information about the intracluster correlation of GAM is needed to use the 67 × 3, 33 × 6 and 67 × 3 sequential sampling designs reliably for LQAS assessment of the prevalence of GAM. The review of demographic and health surveys by Fenn et al. (2004) suggests that most field settings will have an acute malnutrition intracluster correlation of less than 0.10, whereas the field application of the 67 × 3 and 33 × 6 designs in Sudan of Deitchler et al. (2008) suggests that an intracluster correlation of less than 0.05 is likely. These studies provide useful information about the plausible upper limit of intracluster correlation for acute malnutrition. However, investigators rarely know in advance the exact intracluster correlation that exists in a field setting where a malnutrition assessment will be conducted. Until there is more clarity about the conditions in which the upper levels of 0.05-0.10 intracluster correlation of GAM would be expected, or possibly exceeded, investigators desiring strict adherence to the stated LQAS error limits of α ≤ 0.10 and β ≤ 0.20 may prefer to err on the side of caution by using the better performing 67 × 3 design, whereas investigators who require data rapidly may prefer instead to use the 67 × 3 sequential design. Finally, those investigators seeking a balance between limited classification error and potential expediency of data collection may find that the 33 × 6 design meets their data requirements best.
The results of this study support use of the cluster designs that were used in Ethiopia and Sudan (Deitchler et al., 2007, 2008) for detecting threshold levels of GAM prevalence by LQAS analysis methods. Further, the findings from this study provide useful information to investigators who need to decide which design (i.e. a 67 × 3, 33 × 6 or 67 × 3 sequential design) best suits their analytic needs, with respect to expediency of data collection, and desired limits of classification error. The cluster sampling schemes that were analysed here offer both time efficient and statistically valid alternatives to the conventional methodology for assessment of acute malnutrition in emergency settings.
There are few discrete probabilistic distributions which easily lend themselves to simulation of correlated binary observations. We outline a specific method for generating binary random vectors that is based on truncation of multivariate normal random variables.

A.1. Simulation
For a k × m cluster sample (k is the number of clusters and m is the size of each cluster for a total sample of size n), it is necessary to generate clusters with specific intercluster and intracluster correlation subject to the constraint that the intercluster correlation is less than or equal to the intracluster correlation. Let $\boldsymbol{\tau}_1 = \tau_1 \mathbf{1}\mathbf{1}^T + (1 - \tau_1)\mathbf{I}$ and $\boldsymbol{\tau}_2 = \tau_2 \mathbf{1}\mathbf{1}^T$, where $\mathbf{1}$ is an m × 1 column vector of 1s and $\mathbf{I}$ is the m × m identity matrix. Then the desired correlation structure $A$ is a block matrix of dimension n × n with $\boldsymbol{\tau}_1$ on the diagonal blocks and $\boldsymbol{\tau}_2$ on the off-diagonal blocks.
To achieve this structure for a binary random vector, first generate a realization $Y$ from the multivariate normal distribution of dimension n with mean equal to the zero vector and variance-covariance matrix equal to the above-described correlation matrix $A$. Each component of the multivariate normal realization $Y$ ($Y_i$, i = 1, . . . , n) is marginally distributed as a normal random variable with mean 0 and unit variance, and the correlation between any two components $Y_i$ and $Y_j$ is given by the (i, j)th entry of $A$.
To attain the binary sample with the desired correlation structure, let $D_i = 1$ if $Y_i \le \Phi^{-1}(P)$ and $D_i = 0$ otherwise, for i = 1, 2, . . . , n, where $\Phi(\cdot)$ denotes the cumulative distribution function of the standard normal distribution and $P$ is chosen to reflect the prevalence of malnutrition ($P = P_0$ when simulating under the null hypothesis and $P = P_a$ when simulating under the alternative hypothesis). Then $D_i$ is a Bernoulli random variable with mean $P\{Y_i \le \Phi^{-1}(P)\} = P$ for i = 1, . . . , n and $D = (D_1, D_2, \ldots, D_n)$ is the resulting correlated sample of binary outcomes. The correlation between any two of the resulting binary components $D_i$ and $D_j$ ($i \ne j$) is given by

$$\operatorname{corr}(D_i, D_j) = \frac{\displaystyle\int_{-\infty}^{\Phi^{-1}(P)} \int_{-\infty}^{\Phi^{-1}(P)} \phi_2(y_i, y_j; \tau)\, dy_i\, dy_j - P^2}{P(1 - P)}, \qquad (1)$$

where $\phi_2(\cdot, \cdot; \tau)$ denotes the standard bivariate normal density with correlation $\tau$, the (i, j)th entry of $A$. The goal is to choose $\tau$ and $P$ such that the resulting correlation is a specific value $\rho$. Define the function $\rho(P, \tau) \equiv \operatorname{corr}(D_i, D_j)$. Fig. 1 plots $\rho(P, \tau)$ against $\tau$ for a range of $P$. Although the double integral in equation (1) has no closed form solution, numerical integration yields highly precise approximations and can be implemented in many software packages. Here, numerical integration was performed by using the mvtnorm library in R version 2.6.0. This approximation is used to simulate binary outcomes with correlation $\rho$ and mean $P$. For example, simulating binary outcomes with correlation $\rho = 0.05$ and mean $P = 0.10$ requires simulation in the multivariate normal with correlation $\tau = 0.131$. Table 7 outlines the values of $\tau$ that were used to simulate binary outcomes with correlation $\rho$ and mean $P$ in this study.
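The mapping between the latent correlation τ and the binary correlation ρ can be checked numerically without a multivariate normal library. The sketch below is not the mvtnorm calculation used in the study: for τ ≥ 0 it swaps in an equivalent one-dimensional integral, obtained by writing both latent variables as a shared standard normal component plus independent noise, and evaluates it by Simpson's rule. The function names, integration range and step count are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def joint_prob(p, tau, steps=2000, span=8.0):
    """P(Y_i <= c, Y_j <= c), c = inverse normal CDF of p, for 0 <= tau < 1.

    Conditional on a shared component z, the two latent variables are
    independent, so the joint probability is a one-dimensional integral.
    """
    if tau == 0.0:
        return p * p
    c = STD_NORMAL.inv_cdf(p)
    h = 2.0 * span / steps  # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        z = -span + i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        cond = STD_NORMAL.cdf((c - sqrt(tau) * z) / sqrt(1.0 - tau))
        total += weight * STD_NORMAL.pdf(z) * cond * cond
    return total * h / 3.0


def binary_corr(p, tau):
    """rho(P, tau): correlation of the thresholded binary outcomes."""
    return (joint_prob(p, tau) - p * p) / (p * (1.0 - p))


def latent_tau(p, rho, iterations=40):
    """Solve rho(P, tau) = rho for tau by bisection on [0, 0.999]."""
    lo, hi = 0.0, 0.999
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if binary_corr(p, mid) < rho:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


rho_check = binary_corr(0.10, 0.131)
tau_check = latent_tau(0.10, 0.05)
print(f"rho(P = 0.10, tau = 0.131) = {rho_check:.4f}")
print(f"tau giving rho = 0.05 at P = 0.10: {tau_check:.3f}")
```

Bisection is adequate here because ρ(P, τ) increases with τ on the non-negative range that is considered in this study.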