HPV16 L1 and L2 DNA methylation predicts high-grade cervical intraepithelial neoplasia in women with mildly abnormal cervical cytology

DNA methylation changes in human papillomavirus type 16 (HPV16) DNA are common and might be important for identifying women at increased risk of cervical cancer. Using recently published data from Costa Rica, we developed a classification score to differentiate women with cervical intraepithelial neoplasia grade 2 or 3 (CIN2/3) from those with no evident high-grade lesions. Here, we aim to investigate the performance of the score using data from the UK. Exfoliated cervical cells at baseline and 6-month follow-up were analyzed in 84 women selected from a randomized clinical trial of women undergoing surveillance for low-grade cytology. Selection of women for the methylation study was based on detectable HPV16 in the baseline sample. Purified DNA was bisulfite converted, amplified and pyrosequenced at selected CpG sites in the viral genome (URR, E6, L1 and L2), with blinding of laboratory personnel to the clinical data. The primary measure was a predefined score combining the mean methylation in L1 and any methylation in L2. At the second follow-up visit, 73/84 (87%) women were HPV16 positive and of these 25 had a histopathological diagnosis of CIN2/3. The score was significantly associated with CIN2/3 (area under curve = 0.74, p = 0.002). For a cutoff with 92% sensitivity, colposcopy could have been avoided in 40% (95% CI 27–54%) of HPV16-positive women without CIN2/3; positive predictive value was 44% (32–58%) and negative predictive value was 90% (71–97%). We conclude that quantitative DNA methylation assays could help to improve triage among HPV16-positive women.


Data and methods
The data from Costa Rica were DNA methylation measurements on 172 individuals: 92 who cleared, 36 who persisted and 44 who progressed to CIN2/3. Selected measurements were made in five categories, namely methylation (1) just before clearance; (2) for HPV16 persistence over a shorter period (persister first sample); (3) for HPV16 persistence over a longer period (persister last sample); (4) for CIN2/3 diagnosed in the future (CIN2/3 first sample); and (5) for CIN2/3 diagnosed around the same time as the sample (CIN2/3 last sample).
For derived variables that count the number of CpGs with zero methylation, missing methylation measurements were imputed as zero, because data analysis showed that missing CpG sites in a gene or region were more likely to occur together with zero methylation at other CpGs. For L1 the mean methylation was used, and missing values were not imputed as zero unless all CpGs were missing.
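The two imputation rules can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names `count_zero_cpgs` and `mean_l1_methylation`, the use of `None` for a missing measurement, and taking the L1 mean over observed CpGs only are all assumptions made for the example.

```python
from typing import Optional, Sequence


def count_zero_cpgs(values: Sequence[Optional[float]]) -> int:
    """Count CpG sites with zero methylation, treating missing (None) as zero."""
    return sum(1 for v in values if v is None or v == 0)


def mean_l1_methylation(values: Sequence[Optional[float]]) -> float:
    """Mean over the observed L1 CpGs; zero only if every CpG is missing."""
    observed = [v for v in values if v is not None]
    if not observed:
        return 0.0  # all CpGs missing: imputed as zero
    return sum(observed) / len(observed)
```

With these hypothetical helpers, a region measured as `[0.0, None, 12.5]` has two CpGs counted as zero, while an L1 region measured as `[None, 10.0, 20.0]` has a mean of 15.0.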
Summary tables are first used to show the overall structure of the data in an exploratory analysis. The scores are then fitted using logistic regression models to predict progression to CIN2/3, or persistence. Receiver operating characteristic (ROC) plots and tables show the fitted sensitivities and specificities.
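As an illustration of how ROC points and the area under the curve follow from a score once it has been computed, a minimal sketch is given below. It does not fit the logistic regression itself; the function names `roc_points` and `auc` and the toy data are hypothetical and are not study data.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float],
               labels: Sequence[int]) -> List[Tuple[float, float, float]]:
    """Return (cutoff, sensitivity, specificity) for each distinct score.

    A sample is called positive when its score is >= cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        points.append((cutoff, tp / n_pos, tn / n_neg))
    return points


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve: the probability that a random case scores
    higher than a random control, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up toy data, for illustration only (1 = case, 0 = control).
toy_scores = [0.2, 1.1, 1.9, 2.5, 3.4, 4.8]
toy_labels = [0, 0, 1, 0, 1, 1]
```

Each tuple from `roc_points` corresponds to one row of a table of ROC points: a cutoff with its sensitivity and specificity.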

Exploratory analysis
The main patterns from the CRISP CpGs in Costa Rica are shown in Tables 1 and 2. The fitted score from a logistic regression model was 3.17x_1 + 1.83x_2. Thus it runs from zero to a maximum of 5, but can be re-scaled to run between 0 and 100, say. Figure 1a shows the ROC from the fitted model. The components are plotted in Figure 1b. This shows that L2 contributes the most, due to the relatively large difference between the two groups in Table 1. However, L1 might add information that is useful for high sensitivity. Table 3 shows some ROC points.
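The score and its rescaling can be sketched as below. The coefficients are those quoted in the text; everything else is an assumption for illustration. In particular, both inputs are assumed to lie on a 0 to 1 scale (which is what makes the maximum equal 3.17 + 1.83 = 5), and, by analogy with the secondary classifier, the first input is taken to be the L2 variable and the second the mean L1 methylation. The function names are hypothetical.

```python
# Coefficients reported in the text for the fitted logistic-regression score.
COEF_X1 = 3.17
COEF_X2 = 1.83
MAX_SCORE = COEF_X1 + COEF_X2  # 5 when both inputs are at their maximum of 1


def primary_score(x1: float, x2: float) -> float:
    """Raw score 3.17*x1 + 1.83*x2, with x1 and x2 assumed to lie in [0, 1]."""
    if not (0.0 <= x1 <= 1.0 and 0.0 <= x2 <= 1.0):
        raise ValueError("x1 and x2 are assumed to be proportions in [0, 1]")
    return COEF_X1 * x1 + COEF_X2 * x2


def rescaled_score(x1: float, x2: float) -> float:
    """Score rescaled linearly to run from 0 to 100."""
    return 100.0 * primary_score(x1, x2) / MAX_SCORE
```

Under these assumptions, an input of (1, 0) gives a rescaled score of about 63, which reflects the larger weight on the first component.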

Secondary classifier
The secondary model adds URR and E6 to L1 and L2. Table 5 suggests that E6 and URR might be used together: when both were methylated it was more likely that a woman progressed to disease. The following variables were therefore used in a logistic regression.
Proportion of CRISP L2 CpG sites methylated (x_1); mean methylation of L1 (x_2); and an indicator variable for both E6 and URR having at least one CpG methylated (x_3).
The fitted score was 2.65x_1 + 2.23x_2 + 1.1x_3. Thus the score runs from zero to a maximum of approximately 6 (2.65 + 2.23 + 1.1 = 5.98). Figure 2a shows the ROC from the fitted model. The components are plotted in Figure 2b. Table 4 shows selected ROC points. Contingency Tables 6 and 7 identify where individuals fall in cross-tabulations of the L2 and URR/E6 variables, depending on their outcome. Although limited, these data indicate that E6/URR might be useful after allowing for L2. Finally, it was observed that L1 methylation was higher on average when URR and E6 were methylated (26% against 19%).
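A minimal sketch of how the three variables combine into the secondary score is shown below. The coefficients and variable definitions come from the text; the function names, the representation of E6 and URR as lists of per-CpG methylation values, and the 0 to 1 scale for the L2 proportion and L1 mean are assumptions made for illustration.

```python
from typing import Sequence


def any_methylated(values: Sequence[float]) -> bool:
    """True if at least one CpG in the region has non-zero methylation."""
    return any(v > 0 for v in values)


def secondary_score(l2_prop: float, l1_mean: float,
                    e6: Sequence[float], urr: Sequence[float]) -> float:
    """Score 2.65*x1 + 2.23*x2 + 1.1*x3.

    x1: proportion of L2 CpG sites methylated (assumed 0-1 scale)
    x2: mean methylation of L1 (assumed 0-1 scale)
    x3: 1 if both E6 and URR have at least one methylated CpG, else 0
    """
    x3 = 1.0 if any_methylated(e6) and any_methylated(urr) else 0.0
    return 2.65 * l2_prop + 2.23 * l1_mean + 1.1 * x3
```

The indicator contributes its 1.1 only when both regions show some methylation; methylation in URR alone or E6 alone leaves the score unchanged.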

Classifier 2: predict persistence from clearance
We compare samples from categories (1) clearance and (2) persister first sample. For this comparison, Table 1 suggests that L1 is the only gene/region of use. Table 8 shows that classifying all women with mean L1 methylation of more than 4% as persisters had approximately 69% sensitivity for approximately 38% specificity.
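The single-cutoff rule can be written out as a short sketch. The 4% cutoff is taken from the text; the function names are hypothetical, and mean L1 methylation is assumed to be expressed in percent.

```python
from typing import Sequence, Tuple

CUTOFF_PERCENT = 4.0  # mean L1 methylation cutoff quoted in the text


def is_predicted_persister(l1_mean_percent: float) -> bool:
    """Classify as a persister when mean L1 methylation exceeds 4%."""
    return l1_mean_percent > CUTOFF_PERCENT


def sensitivity_specificity(l1_means: Sequence[float],
                            persisted: Sequence[bool]) -> Tuple[float, float]:
    """Sensitivity and specificity of the cutoff rule against known outcomes."""
    tp = fn = tn = fp = 0
    for value, truth in zip(l1_means, persisted):
        predicted = is_predicted_persister(value)
        if truth and predicted:
            tp += 1
        elif truth:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)
```

Applied to the clearance and persister-first samples, a function of this kind would reproduce the sensitivity and specificity tabulated for the 4% cutoff.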