Statistical Modeling for Quality Assurance of Human Papillomavirus DNA Batch Testing

Supplemental digital content is available in the text.

programs. The prevalence of oncogenic HPV infection from this review ranged from 12% to 16%. This range was used for our modeling exercise. [7][8][9][10][11][12][13] Statistical analysis for this project was carried out in the programming language R. We wrote an algorithm to simulate test results stochastically, assuming discrete HPV prevalence of 12%, 13%, 14%, 15%, and 16%-where each prevalence is the simulation condition-and to execute the following analytical steps for each (see Appendix A for more details on the algorithm and examples of the R code).
The algorithm first created the analysis population of 900,000, representing women of the target age for screening. We specified 900,000 draws from the binomial distribution (positive or negative for HPV), which assumes that each draw has a consistent and independent probability of being positive. The probability of each draw equaled the simulation condition (discrete HPV prevalence). We selected 900,000 draws to simulate the individual results of 10,000 microplate assays. This generated a vector of results where each entry was either 1 or 0 (a positive or negative test result, respectively).
Because each 96-well microplate contains 6 control wells (3 positive, 3 negative) and 90 test wells, the vector was divided into 10,000 groups each containing 90 simulated results, and each group had 6 "not applicable" (NA) values appended to the beginning of the vector to represent the control wells. The 96-element vectors were then reshaped to become 12 Â 8 matrices to represent 96-well assay microplates in shape and structure (see Figure 1). The process by which each matrix was shaped followed the order in which laboratory technicians fill the microplate with calibrators and specimens, beginning in the top-left corner and moving columnwise from left to right. We used these matrices as a basis for calculating aggregate measures to be compared with real microplate test results (see Figure 1).
After assembling the 10,000 96-cell matrices, the algorithm aggregated the results to create analytical measures in each quality category.
As noted previously, the quality categories planned for analysis were (1) number of positive cells per matrix, (2) number of clusters of positive cells per matrix, and (3) size of the largest cluster per matrix, where the matrices represent 96-well microplates. Thus, the first tabulation was the number of positive cells per matrix for all matrices in the simulation, giving a distribution of expected results under the simulation conditions (when viewed as a percentage of cells that are positive, this is roughly equal to the HPV prevalence defined as the simulation condition).
Next, we defined a cluster as 2 or more positive cells horizontally or vertically adjacent and defined the size of a cluster to be the number of cells that make up the cluster (see Figure 1). Based on our observation of the manual assay steps, including pipetting reagents and manipulating the microplate, and considering the shape of the microplate, we determined that it was improbable for material to be carried from one well to a diagonal well without also transferring material to an adjacent well; therefore, we did not designate diagonal cells as a site for possible contamination. To generate data for the analysis of these quality categories, the algorithm identified the clusters in each matrix of simulated results to enable their tabulation and analysis. This was carried out by first identifying pairs of adjacent positive cells and then grouping those pairs into clusters.
To identify pairs of adjacent positive cells, the algorithm assigned an identification (ID) number between 1 and 96 to each matrix cell, including those cells with an NA value. The ID numbers were assigned consecutively in the same order by which the matrices were populated, beginning in the top-left corner and moving columnwise from left to right. The algorithm then located pairs of adjacent positive cells by calculating the difference between cell ID numbers. When the difference between the cell ID numbers was equal to ±8, the cells were horizontally adjacent. When the difference between the cell ID numbers was equal to ±1, the cells were vertically adjacent. To ensure that perimeter cells with a difference of ±1 that were located in different columns were not counted as adjacent, the algorithm removed these 11 pairings from consideration as adjacent positive results. The algorithm then returned a list of pairs of adjacent positive cells.
Next, the algorithm considered every combination of pairs in the matrix and combined those with shared cell IDs to identify complete clusters in each matrix. The algorithm then counted the number of clusters per matrix and the size of the largest cluster per matrix to give a distribution for the expectation of these metrics under the simulation conditions.
The algorithm generated results for each matrix in the simulation and created curves of normal distributions for each parameter evaluated (number of positive cells, number and size of clusters in the matrices); the simulation results yielded the frequency distribution in each quality category for the simulation conditions of 12%, 13%, 14%, 15%, and 16%. In the Discussion, we convert these parameters to numbers and clusters of wells in microplates and convert the conditions (12%-16%) to oncogenic HPV prevalence.
To confirm the sample size, we inspected the distribution of results in each quality category when using sizes ranging from 1 matrix to 10,000 matrices. All chosen measures across simulation conditions stabilize well before reaching 10,000 matrices, giving us confidence in the simulation sizes.
In addition, to provide guidance for screening programs in low-resource settings outside of Central America, we ran the analysis described in this article for prevalence rates 8% to 11% and 17% to 23% as a supplemental analysis. 14

Role of the Funding Source
The funding source had no involvement in the development of the methodology.

RESULTS
We present our results for the 3 parameters that were evaluated as tables with numerical values for each condition and parameter.

Number of Positive Cells Per Matrix
For a simulation condition of 12% (i.e., simulating HPV prevalence of 12%), the number of positive cells per matrix ranged from 1 to 23, with 95% of the matrices having between 5 and 17 positive cells (see Table 1). For a simulation condition of 16%, the number of positive cells per matrix ranged from 3 to 29, with 95% of the matrices having between 8 and 22 positive cells, and for conditions between 12% and 16%, results were intermediate. As the simulation condition increased, the middle 95% of the data points shifted toward higher values (see Table 1), and the distribution curve shifted to the right, as did the mode for positive cells found in the distribution (see Figure 2).

Number of Cell Clusters Per Matrix
For a simulation condition of 12%, the number of cell clusters per matrix ranged from 0 to 7, with 95% of the matrices in the simulation having between 0 and 4 clusters. For a simulation condition of 16%, the number of cell clusters per matrix ranged from 0 to 8, with 95% of the matrices having between 0 and 5 clusters (see Table 2).

Size of the Largest Cell Cluster Per Matrix
For a simulation condition of 12%, the size of the largest cell cluster per matrix ranged from 1 (no cluster) to 9 positive cells, with 95% of the matrices in the simulation having a largest cluster size between 1 and 5 positive cells. For a simulation condition of 16%, the size of the largest cell cluster per matrix ranged from 1 (no cluster) to 11 positive cells, with 95% of the matrices in the simulation having a largest cluster size between 1 and 6 positive cells (see Table 3). As the condition of the simulation increases, the frequency distribution of results shifts toward larger numbers for all quality categories.

Results for Additional Prevalence Ranges
For simulation conditions 8% to 11% and 17% to 23%, the frequency distribution of results shifts toward lower numbers for all quality categories as the condition of the simulation decreases and higher numbers as the condition increases (Supplemental Tables 1 http://links.lww.com/LGT/A90, 2 http://links.lww.com/ LGT/A91, and 3 http://links.lww.com/LGT/A92).

DISCUSSION
To discuss our results in terms of a laboratory setting, instead of using the modeling terminology of cells and matrices, we now will refer to results or wells and microplates. To the best of our knowledge, this is the first statistical modeling exercise conducted for evaluating the likelihood of well-to-well contamination in 96-well microplates. Our interpretation of the modeling results as applied to laboratory practice is that microplate results that are not consistent with the simulations are either consistent with well-to-well contamination or do not hold with the assumptions of the simulation.
Our results suggest that HPV DNA screening programs in regions with an oncogenic HPV prevalence of 12% to 16% can reasonably expect to observe between 9 and 17 positive results per microplate in approximately 50% of assays and between 5 and 22 positive results per microplate in approximately 95% of assays. These ranges represent the exterior limits of the middle 50% and middle 95% of the data points when comparing across the 12% to 16% prevalence simulations without aggregating. We would   Similarly, our findings suggest that these same screening programs should anticipate between 0 and 5 positive clusters per microplate, representing at least 95% of all data points of the simulation results under each prevalence condition. Our results also indicate that in general, clusters are likely to be present. Furthermore, screening programs in Central America would not anticipate a cluster larger than 6 positive wells per microplate, with most clusters having 2 to 3 positive wells. Cluster sizes that are not within the range of the modeled results would be suspicious for well-to-well contamination. Screening programs in other regions may anticipate lower or higher numbers of positive wells and clusters, as well as smaller or larger sizes of the largest cluster per microplate, depending on local HPV prevalence. These modeled results can be used to provide quality assurance parameters for screening programs implementing HPV tests conducted in batches on a 12 Â 8 microplate with 6 control wells when microplates are run with 90 samples. For example, in Central America, screening programs may decide that microplates with greater than 22 positive results, with more than 5 clusters, or containing clusters of more than 6 positive wells are suspicious of well-to-well contamination and do not meet quality standards.
While presented individually, the 3 quality categories are interdependent, and therefore, constraining the allowable outcomes in one category also constrains the outcomes in the other 2. For example, when constraining the total number of positive cells, the number of possible clusters is constrained at half the number of positive cells and the possible size of the largest cluster is constrained at the total number of positive cells (see Figure 3). Despite this interdependency, it is necessary to consider all 3 categories in the context of contamination because constraining one category leaves enough freedom in the remaining categories to allow for conditions suspicious of contamination.
Our analysis requires a known underlying oncogenic HPV prevalence, which is a constraint in areas where HPV screening is novel and prevalence is not yet established. It also is limited by the assumption that prevalence is constant throughout a community and, therefore, that samples from women screened together are not more or less likely to have the same results. However, studies have reported that prevalence is not always consistent throughout a community and can vary depending on factors such as age, geographic location, and HIV prevalence. In Central and South America, HPV prevalence is highest in women younger than 25 years, after which it decreases, and is lowest in women aged 35 to 44 years, before reaching a second peak in women older than 45 years. 15 Prevalence may vary between rural and urban areas, and it correlates positively with HIV prevalence in a given population. [16][17][18][19][20] Therefore, screening programs that develop quality assurance tools based on our analysis should reconsider periodically the characteristics of the population being screened and how this may impact expected HPV positivity rates in their assays.
This analysis is limited to providing statistical considerations for the likelihood of well-to-well contamination and does not diagnose causes of observed patterns and outlying positivity rates. For example, our results are not able to identify at which assay step well-to-well contamination may have occurred or, in a plate with a positivity rate below the lower normal limit, what factors may have contributed to these outlier results. Similarly, it does not account for other possible errors outside the test process such as tainted or mislabeled specimens, reagents, or calibrators. Furthermore, because of the nature of our defined quality categories, contamination in a plate that has a low but normal number of positive wells can go unnoticed if, for example, a sole cluster of positive wells does not meet the defined threshold for suspicion. For the quality category of number of positive wells per plate, we consider both upper and lower limits in this analysis. While an increased number of positive wells could be the result of well-to-well contamination, results that fall below the lower limit could indicate (1) a problem with the samples collected or how they were stored or (2) failure to pipette the accurate volume of sample into the assay microplate.

CONCLUSIONS
Our analysis provides guidance that laboratories can use to identify microplates suspicious for well-to-well contamination. Each country or screening program will need to determine the range of acceptable results for each quality category (number of positive wells, number of clusters, and size of largest cluster) based on the oncogenic HPV prevalence in the community screened. Programs will also need to determine the action to be taken whether a microplate fails to meet quality standards, such as repeating the microplate assay or retraining technicians.
Quality assurance parameters developed for screening programs based on these results could be used to identify laboratory technicians who are in need of additional training. These parameters also can be used to develop a simple tool that laboratory technicians can reference when interpreting the results of each microplate, enabling them to monitor their own performance.
For Central America, we developed and piloted such a tool for laboratory technicians. Although results for most of the microplates observed in these screening programs have been within the middle 95% of data points, in cases where results have fallen outside this range, this tool has enabled local laboratory technicians to identify those microplates as suspicious for well-to-well contamination and has helped programs identify laboratory technicians in need of skills reinforcement. Further evaluation of the use of this tool in the field could provide additional guidance for cervical cancer screening programs in other countries.
As discussed as part of the limitations of this analysis, it may take time for screening programs initiating HPV testing to determine the expected oncogenic HPV prevalence in the region they are serving; therefore, programs should carry out periodic reassessments of the quality assurance parameters.
As countries with emerging economies adopt HPV DNA testing as a primary screening test for cervical cancer prevention, quality assurance programs are needed to ensure valid testing and the accuracy of results delivered to patients. A results interpretation tool such as the one discussed in this article is one example of how screening programs can use statistical analyses to establish parameters that can be adopted and implemented at the laboratory level, saving time and resources at the central program level.

APPENDIX A
All analyses for this project were carried out in the statistical programming language R, Version 3.2.4, with which we wrote an algorithm to conduct the simulations and characterize results. The simulation algorithm is as follows. The binomial distribution B(n,p) describes the probability of observing some number of successes in a group of n observations, where each successful outcome has the same independent probability p. Simulated test results were drawn as random variates from the binomial distribution. Each simulation generated 10,000 matrices of test results, representing the results from 10,000 microplates. Multiple simulations were run with different independent probabilities (the simulation condition), representing communities with different underlying HPV prevalence between 12% and 16%.
The backbone of simulating from the binomial distribution in R is the rbinom function from the stats package; we specified one trial of 900,000 observations, where each observation has a consistent probability p of being positive. This generated a vector where each component is either 1 or 0 (a positive or negative test result, respectively).
Using the split, lapply, and matrix base R commands (see code snippet below), the single vector was modified to resemble 10,000 of the 96-well microplates described above. The vector was reshaped into a list with 10,000 elements, where each list element contained 90 of the simulated results. Each list element had 6 NA values appended to the beginning of the vector of 90 results and was reshaped from a vector to a 12 Â 8 matrix (filled columnwise). The resulting matrices represent assay microplates in shape and structure.
To achieve the first analysis goal of characterizing the number of positive cells per matrix, the algorithm took the sum of all the cells in a matrix. The second analysis goal was to characterize the number of clusters of positive cells per matrix. First, the algorithm located pairs of positive cells in the matrices by assigning a cell ID number to each matrix cell, identifying the position of each positive cell, and identifying each possible combination of positive cell IDs in each matrix. Because the cell ID numbers were assigned columnwise from the top left, if the difference between any pair of positive IDs from a matrix was equal to 1, the 2 positives were vertically adjacent in a matrix, and if the difference was 8, the 2 positives were horizontally adjacent in a matrix. To ensure that consecutive positives at the top and bottom borders of the plate were not counted as adjacent, any combinations of bottom-and top-border cells were removed from consideration as adjacent positives. Subsetting the positive ID combinations to only those whose difference was 1 or 8 returned a list of adjacent positives.
A group of 3 positive cells in a row would initially be identified as 2 different adjacent pairs. Therefore, after compiling an R list of adjacent positive cell pairs for each matrix in the simulation, the adjacent pairs were grouped into clusters. Positive cell IDs were defined to be in the same cluster if there existed a path between them only along adjacent positive cells.
Any 2 adjacent pair vectors with a positive cell ID in common must be touching and therefore part of a single cluster. Starting with the positive cell ID numbers in the first vector in the list (representing 1 adjacent pair in the matrix), the following vectors in the list were checked for cell IDs in common with the first pair. Cell IDs from the subsequent pair were added to the IDs in the first vector, creating a cluster (and repeating 1 of the ID numbers). After compiling all the cell clusters, duplicate ID numbers were removed. As a result, each element in the list represented a single cluster of positive cells in the matrix; within each list element was a vector containing the matrix position IDs involved in the cluster. The third analysis goal was to characterize the expected size of positive cell clusters (the numbers of cells that make up the clusters). The algorithm measured the size of clusters by taking the length of each element in the list of clusters on a matrix. We recorded the size of each cluster and noted the size of the largest cell cluster in each matrix.