Risk Scores for Predicting Advanced Colorectal Neoplasia in the Average‐risk Population: A Systematic Review and Meta‐analysis

OBJECTIVES: A systematic review and meta‐analysis was performed to summarize the available evidence on risk scores for predicting advanced colorectal neoplasia (advanced adenomas and cancer) in average‐risk and asymptomatic populations undergoing screening colonoscopy. METHODS: PubMed, EMBASE, and Web of Science databases were searched up to 28 March 2018. Studies that developed or validated a risk score to predict the risk of advanced colorectal neoplasia were included. Two reviewers independently extracted study characteristics, including diagnostic performance indicators, and assessed risk of bias and applicability in the included studies. Meta‐analyses were conducted to determine the overall discrimination of risk scores evaluated by more than 1 study. RESULTS: A total of 22 studies including 17 original risk scores were identified. Risk scores included a median number of 5 risk factors. Factors most commonly included were age, sex, family history in first‐degree relatives, body mass index and smoking. The area under the receiver operating characteristic curve of risk scores ranged from 0.62 to 0.77 in the individual studies and from 0.61 to 0.70 in the meta‐analyses. CONCLUSIONS: Although the majority of available risk scores had relatively weak discriminatory power, they may be of some use for risk stratification in CRC screening. Rather than developing more risk scores based on environmental risk factors, future research should focus on exploring possibilities of enhancing predictive power by combining risk factor data with novel laboratory markers, such as polygenetic risk scores.

factors might be an effective tool for risk stratification. They might help to identify individuals with a higher or lower risk for AN, who should start screening at a younger or older age, or who should undergo screening more or less frequently compared to the average-risk population, thereby focusing colonoscopy resources on those at higher risk. In recent years, a number of such risk scores have been developed, which have shown modest ability to discriminate between individuals with and without CRC and its precursors [28-30]. In addition, some of these risk scores have been expanded and combined with results of blood or stool tests, such as fecal immunochemical tests (FITs) [31]. Furthermore, genetic risk scores are increasingly being developed based on combinations of single nucleotide polymorphisms (SNPs) found to be associated with CRC risk in genome-wide association studies [32,33]. Therefore, the aim of this systematic review was to provide an overview of the development and validation of risk scores and of their composition and discriminatory power for identifying people at high or low risk of AN.

METHODS
This systematic review and meta-analysis was conducted using the methodology recommended by the Cochrane Collaboration [34] and was reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist [35]. Ethical approval and patient informed consent were not necessary because all data were obtained from previously published studies.

Criteria for considering studies for this review
Studies were included if they met all of the following criteria: (1) they were published as an original research article in a peer-reviewed journal; (2) they used data from cohort studies, cross-sectional studies, or randomized controlled trials to develop or validate a risk score (studies using cohort data to validate a score that was derived from case-control studies were also included); (3) they considered at least age and sex, together with other risk factors, laboratory tests, genetic scores, or a combination thereof, to generate a risk score predicting the risk of AN; (4) they included only participants who were considered asymptomatic and at average risk for AN and who underwent screening colonoscopy; and (5) they reported results for the presence of AN as an outcome. Studies were excluded if the outcome included only proximal or distal neoplasia. Studies were also excluded if they were published only as conference proceedings, dissertations, or abstracts, or were not published in English.

Search strategies
PubMed, EMBASE, and Web of Science were searched up to 28 March 2018 to identify relevant publications. The search terms, which are presented in the Supplementary Appendix, aimed to cover expressions for advanced neoplasms, risk scores, and discriminatory accuracy. The reference list of each eligible study was also scanned to identify further papers that fulfilled the aforementioned inclusion criteria.

Selection of studies
After removal of duplicates, titles and abstracts of records were screened according to the inclusion and exclusion criteria. Full texts of the remaining publications were scrutinized. Studies that fulfilled the pre-defined criteria were included.

Data extraction and management
Two authors (LP and KW) independently performed data extraction for all included studies. The following information was abstracted: first author, year of publication, country/region, type of study (according to the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement [36]), study period, number of participants, age and sex of participants, data source, risk factors that were included and/or considered, outcome measures, and area under the receiver operating characteristic curve (AUC) or C-statistic. In case of any disagreement, consensus was obtained by discussion.

Assessment of risk of bias and applicability in included studies
The same authors independently assessed the risk of bias and applicability concerns of the included studies using Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) [37]. Any initial disagreement was resolved through further discussion among the authors.

Statistical analysis
The discrimination of a risk score, i.e. its ability to discriminate between subjects with low and high risk of AN, was measured by the area under the receiver operating characteristic curve (AUC) or C-statistic, which ranges from 0.50 (indicating no discriminating ability) to 1.00 (indicating perfect discriminating ability) [38]. An AUC between 0.70 and 0.80 is typically considered to indicate modest/good discrimination [39]. AUCs were reported separately for score development and score validation where this information was given in the articles. AUCs from validations of the same risk prediction model were pooled using R statistical software (version 3.3.2) and the R "meta" package (version 4.8-1). Heterogeneity across studies was evaluated using Cochran's Q statistic with its P value and the I² statistic. If significant heterogeneity was observed (I² > 50% or P value of the Q statistic < 0.10), pooled estimates were calculated using a random-effects model; otherwise, a fixed-effects model was used [34]. Two-sided P values of 0.05 or lower were considered to be statistically significant.
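The pooling rule described above can be illustrated with a minimal Python sketch. It is not the R "meta" code used in the review. It assumes that AUCs are pooled on their raw scale by inverse-variance weighting, that standard errors are back-calculated from symmetric 95% CIs, and that the random-effects model uses the DerSimonian-Laird estimator of between-study variance; none of these details is stated in the text. The function names and the input values are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% CI


def se_from_ci(lower, upper):
    """Back-calculate a standard error from a symmetric 95% CI (assumption)."""
    return (upper - lower) / (2.0 * Z_95)


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the incomplete-gamma power series."""
    if x <= 0.0 or df <= 0:
        return 1.0
    a, half = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= half / (a + n)
        total += term
        n += 1
    log_p = a * math.log(half) - half - math.lgamma(a) + math.log(total)
    return max(0.0, 1.0 - math.exp(log_p))


def pool_aucs(aucs, ses):
    """Inverse-variance pooling; model chosen by I^2 > 50% or P(Q) < 0.10."""
    w = [1.0 / se ** 2 for se in ses]
    fixed = sum(wi * a for wi, a in zip(w, aucs)) / sum(w)
    q = sum(wi * (a - fixed) ** 2 for wi, a in zip(w, aucs))  # Cochran's Q
    df = len(aucs) - 1
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0.0 else 0.0
    p_q = chi2_sf(q, df)
    if i2 > 50.0 or p_q < 0.10:
        # DerSimonian-Laird between-study variance (assumed estimator)
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)
        w = [1.0 / (se ** 2 + tau2) for se in ses]
        model = "random"
    else:
        model = "fixed"
    pooled = sum(wi * a for wi, a in zip(w, aucs)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    return {"model": model, "auc": pooled,
            "ci": (pooled - Z_95 * se, pooled + Z_95 * se),
            "q": q, "p_q": p_q, "i2": i2}


# Hypothetical validation results (not taken from the included studies)
example_cis = [(0.58, 0.66), (0.60, 0.68), (0.59, 0.67)]
example = pool_aucs([0.62, 0.64, 0.63],
                    [se_from_ci(lo, hi) for lo, hi in example_cis])
print(example["model"], round(example["auc"], 3), round(example["i2"], 1))
```

With only two or three validation studies per score, Q has little power and I² is imprecise, so the choice between the two models in such a sketch should be read with caution.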

RESULTS
The initial electronic search generated 5528 records, including 1459 citations from PubMed, 2163 citations from EMBASE, and 1906 citations from Web of Science. After removal of duplicates (n = 2265) and exclusion based on our pre-selected criteria (n = 3193), 72 records qualified for full-text assessment, including 2 studies that were identified through cross-references. Of those, 50 records were excluded based on the inclusion and exclusion criteria. Finally, a total of 22 studies [28-30,40-58] were included, which evaluated 17 different risk scores. Detailed information on the selection process is presented in Fig. 1. Table 1 summarizes the characteristics of the 17 original risk scores included in this review. Fourteen risk scores were built based on traditional risk factors. Only 3 risk models were developed with a combination of risk factors and laboratory test results (including γ-glutamyltransferase [41]; positive serology of Helicobacter pylori, high triglyceride level and low high-density lipoprotein cholesterol [44]; serum levels of fasting glucose, low-density lipoprotein cholesterol, and carcinoembryonic antigen [46]). No risk scores incorporating FITs or genetic biomarkers with environmental or lifestyle risk factors met our inclusion criteria. Study areas comprised the United States (4 studies), Korea (5 studies), Hong Kong (1 study), China (2 studies), Germany, Poland, Spain, and Japan (1 study each); 1 further study was conducted in 11 different Asian cities. Derivation and validation of risk scores were conducted with various approaches, ranging from derivation sets only to split-sample techniques or the use of separate data. The study periods stretched from 1988 to 2014, with sample sizes ranging from 905 to 96,235.
The American Journal of Gastroenterology, www.nature.com/ajg, Peng et al., Review Article
Most studies included participants aged both younger and older than 50 years; 5 studies [28,30,45,48,52] recruited people aged >50 years and only 3 studies [42,44,46] enrolled subjects aged <50 years. The proportion of female participants ranged from 25.4% to 61.7% in studies that developed a single score for both sexes. The majority of studies selected AN as the primary outcome; only 1 study used a deviating definition: Murchie et al. [43] chose AAs (including cancer) and high-risk polyps (i.e. ≥3 non-AAs), but we focused only on the outcome of AAs. The AUCs were >0.70 in 7 risk prediction models [30,40,41,44,46,49,50], indicating modest discrimination. Imperiale et al. [30] did not report the 95% confidence intervals (CIs) of the AUC, and information on the AUC was missing in 1 study [52]. The AUCs were between 0.60 and 0.70 in the remaining risk scores. The majority of risk scores were based on questionnaire data only, and no consistent differences were seen in the AUCs between traditional risk scores and risk scores including additional laboratory data.
Table 2 provides an overview of the risk factors that were included (marked by "×") or considered but finally not included (marked by "○") in the risk prediction models. Risk scores included a median number of 5 risk factors. The most commonly considered and finally included factors were age, sex, FH in first-degree relatives (FDR), body mass index (BMI) and smoking; other frequently considered factors were alcohol, diabetes, nonsteroidal anti-inflammatory drugs (NSAIDs), aspirin, physical activity, red meat and vegetable consumption, cardiovascular diseases (CVD), and hypertension. There was great variety in the risk factors that were considered, included or excluded. For some scores, potential risk factors were not considered a priori; for some models, these factors were evaluated but not ultimately included; for others, they were considered and finally retained. For instance, 4 studies [43,48,49,53] excluded subjects with first- or second-degree FH of CRC, while 10 original risk scores considered FH as a predictor and finally included it. BMI was examined in 11 original models and finally included in 8 models. Smoking was examined in 14 original risk models and was ultimately excluded from only 1 risk score. The less common risk factors, listed in the right columns of Table 2, were considered or included only once or twice.
An additional summary of studies that validated risk scores previously derived in other papers is presented in Table 3. The score by Yeoh et al. [51] was most commonly validated (9 studies), followed by the scores of Betés et al. [53] (6 studies) and Kaminski et al. [29] (5 studies). Two studies [54,55] validated a previously proposed risk score [59] separately in women and men. Substantial variation was seen in the AUCs even for the same score across different studies. For example, the AUC of the score by Betés et al. [53] varied from 0.56 in the study by Wong et al. [58] from Hong Kong to 0.71 in the study by Chen et al. [49] from China. A possible explanation for this apparent discrepancy is the much larger variation in age in the latter study compared to the former, as reflected in the different standard deviations, which may have led to a larger contribution of age to discriminatory power in the study by Chen et al. [49].
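The proposed explanation, that the same predictor discriminates better in a population with a wider age spread, can be illustrated with a small simulation. The sketch below is purely illustrative and uses no data from the included studies. It assumes a hypothetical logistic model with a fixed per-year effect of age, normally distributed ages, and made-up parameter values; the AUC is computed as the rank-based (Mann-Whitney) statistic.

```python
import math
import random


def auc(scores, labels):
    """AUC as the probability that a case outranks a non-case (ties count 1/2)."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the tied ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def simulate_age_auc(age_sd, n=20000, seed=1):
    """AUC of age alone in a simulated cohort with the given age SD."""
    rng = random.Random(seed)
    ages, outcomes = [], []
    for _ in range(n):
        age = rng.gauss(60.0, age_sd)
        # Hypothetical model: identical per-year log-odds in every cohort
        logit = -2.5 + 0.05 * (age - 60.0)
        p = 1.0 / (1.0 + math.exp(-logit))
        ages.append(age)
        outcomes.append(1 if rng.random() < p else 0)
    return auc(ages, outcomes)


narrow = simulate_age_auc(age_sd=4.0)   # cohort with a narrow age range
wide = simulate_age_auc(age_sd=12.0)    # cohort with a wide age range
print(round(narrow, 3), round(wide, 3))
```

Under these assumptions the effect of age per year is identical in both cohorts, yet the cohort with the wider age spread yields the higher AUC, which is why AUCs of the same score are not directly comparable across populations with different age distributions.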

Assessment of risk of bias and applicability in included studies
Risk of bias and applicability concerns in the included risk scores were assessed using Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) (Table 4). The questionnaire which included the sociodemographic and lifestyle information was considered the index test, and the results of colonoscopy and histology reports were deemed the reference test. Regarding patient selection, 4 studies [43,48,49,53] excluded subjects with FH of CRC in a first- or second-degree relative and 1 study [50] excluded participants with a FH of cancer of any type. Although this may be useful for application in preselected population groups free of FH, it limits applicability in the general population and comparability with other scores. Two studies [45,58] did not provide detailed information about patient selection, so the risk of bias and applicability concerns were rated unclear for this domain. Otherwise, no major risk of bias or applicability concerns were identified.

Meta-analyses of available AUCs in the validation studies of risk scores
We performed meta-analyses of the validation studies that provided AUCs and their 95% CIs for the same score. While the risk score developed by Yeoh et al. [51] was validated in 7 studies reporting AUCs with their 95% CIs, with a total of 93,018 participants and a pooled discrimination of 0.63 (95% CI: 0.60-0.66), the risk score proposed by Cai et al. [50] was validated in only 2 studies involving 3217 participants, but its pooled discriminatory power of 0.70 (95% CI: 0.61-0.79) was the highest among the 7 models. Further details are presented in Table 5 and in the Supplementary Figures.

DISCUSSION
This systematic review summarizes the available evidence on risk scores for predicting AN in asymptomatic populations at average risk. A total of 22 studies including 17 original risk prediction models were identified. Age, sex, FH in FDR, BMI, and smoking were the factors most commonly included in the risk scores. Only 7 scoring systems [30,40,41,44,46,49,50] showed at least modest discriminatory power (AUC ≥ 0.70) in internal or external validation, and meta-analysis of AUCs for 1 risk score [50] indicated that the overall performance was relatively good.
Considerable evidence has shown that incidence and mortality of CRC could be reduced through screening [5,60,61]. However, the implementation of colonoscopy-based screening is usually constrained by insufficient resources [11], low participant compliance [12], and concern about complication rates [13]. Risk scores identified in this review might be used to tailor screening to the risk of AN, so that individuals could make an informed choice among screening modalities according to their score. For example, participants with a higher risk score might preferably be offered screening colonoscopy, during which adenomas can be directly identified and removed, whereas those with a medium or lower risk score might still be encouraged to undergo screening tests that are less invasive than colonoscopy, such as stool tests [8,62]. These risk-adapted screening strategies might improve effectiveness and acceptance of currently employed screening modalities, as they reduce the burden of invasive procedures for those at lower risk while focusing on those at higher risk. Risk-adapted screening strategies might therefore also improve cost-effectiveness of current screening modalities. The use of risk scores could furthermore increase compliance with and uptake of CRC screening, as persons who are aware of their increased risk are more likely to comply with expert recommendations [63-65].
To be useful in clinical practice, risk scores should have good discriminating ability. In this review, the discriminatory power of identified risk scores was generally weak, with only 7 models reaching an AUC of 0.70. Of these, the score by Hong et al. [41] was developed in 24,726 participants and was validated in 24,724 participants. Using a large study population might minimize sampling error and better represent real-world practice. Although an even higher AUC (0.75) was reported for the model of Chen et al. [49] based on a relatively small study population (n = 905 participants), this result has to be interpreted with caution, as the lack of external validation most likely resulted in overestimation of the discriminatory power. The risk score by Cai et al. [50] also showed modest discrimination (AUC = 0.74), possibly because the model included several dietary factors (pickled food and fried food), which demonstrated a strong association with the risk of advanced neoplasia in their study (odds ratio = 2.25 and 1.41, respectively, for regularly vs. occasionally eating pickled food and fried food). Our review and most published results focused on the AUC as a summary measure of the performance of the scores in predicting presence of advanced neoplasia. For specific cutoffs of the risk scores, predictive performance can be expressed in terms of sensitivity and specificity. Increasing cutoffs will reduce sensitivity and increase specificity, whereas decreasing cutoffs will have opposite effects. Definitions of cutoffs in a specific setting should consider additional factors, such as availability of colonoscopy resources or the prevalence of advanced neoplasia in the target population, which is a major additional determinant of positive and negative predictive values of the dichotomized risk scores.
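The relations among cutoff, sensitivity, specificity, prevalence, and predictive values can be made concrete with a short sketch. The scores, outcomes, and prevalence values below are invented for illustration and do not come from any included study. The function names are hypothetical, and a score at or above the cutoff is assumed to be classified as high risk.

```python
def sens_spec_at_cutoff(scores, has_an, cutoff):
    """Sensitivity and specificity when score >= cutoff is called high risk."""
    tp = sum(1 for s, y in zip(scores, has_an) if y and s >= cutoff)
    tn = sum(1 for s, y in zip(scores, has_an) if not y and s < cutoff)
    n_pos = sum(1 for y in has_an if y)
    n_neg = len(has_an) - n_pos
    return tp / n_pos, tn / n_neg


def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values via Bayes' theorem."""
    tp = sensitivity * prevalence
    fp = (1.0 - specificity) * (1.0 - prevalence)
    fn = (1.0 - sensitivity) * prevalence
    tn = specificity * (1.0 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Invented toy data: risk score points and presence of AN (1 = yes)
SCORES = [1, 2, 2, 3, 4, 5, 5, 6]
HAS_AN = [0, 0, 0, 1, 0, 1, 0, 1]

for cutoff in (3, 5):
    sens, spec = sens_spec_at_cutoff(SCORES, HAS_AN, cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")

# Same dichotomized score applied at two hypothetical prevalences of AN
for prevalence in (0.05, 0.10):
    ppv, npv = predictive_values(0.8, 0.6, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

In this toy example, raising the cutoff lowers sensitivity and raises specificity, and the same sensitivity and specificity give a higher positive predictive value and a lower negative predictive value when prevalence is higher.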
When risk scores are used in clinical or community settings, the number of predictors should also be as small as possible and risk factors should be easy to obtain or measure. As recently stated by Wells et al. [66], there should be a balance between the simplicity of the model and the prediction accuracy. Most models included age, gender, FH and lifestyle or dietary factors for predicting CRC or AA. While age, sex and FH may be easily obtained, other lifestyle-related factors such as smoking, alcohol consumption and dietary factors may be more difficult to ascertain [62]. For example, Kaminski et al. [29] and Imperiale et al. [30] measured smoking in pack-years, while Murchie et al. [43] and Kim et al. [47] used smoking status (never, previous or current smoking). Even easily calculable factors like pack-years might be more difficult to obtain in clinical practice than smoking status. Additionally, collection of lifestyle factors, especially lifetime histories, may be prone to recall bias [62]. Although some models that include a number of less easily measured factors might perform better, these complex models might be less useful from a practical or clinical perspective. For example, the score by Yang et al. [46] comprised 8 variables, resulting in a 15-point score that was divided into 5 risk tiers. The complexity of this type of score might limit its use in a clinical or community setting, in spite of its good discrimination (AUC = 0.73). For settings where laboratory or genetic assessments are available, a combination of risk factors and results of laboratory tests or genetic risk scores might produce better risk prediction [39]. There is evidence that risk scores combining traditional risk factors with FIT results or genetic scores can improve discrimination [31,32,67].
In a recent systematic review, however, Usher-Smith and colleagues [68] found no clear improvement in discrimination when models added laboratory test results or genetic biomarkers to traditional risk factors, compared to models consisting only of traditional risk factors. They also found that a small number of risk scores developed from case-control studies that used genetic biomarkers alone showed promising discriminatory power, but population-based samples were lacking to further validate those scores externally. Nevertheless, advances in sample techniques and decreasing costs for laboratory and genetic tests might contribute to making the combination of traditional risk scores with other predictors a feasible risk stratification approach for large populations in the foreseeable future [39].

The risk factors that were most commonly included in the risk scores are well-established CRC risk factors. Age is one of the most important risk factors for developing CRC or AN [14,15]. A recent study [66] showed that a model containing age alone had C-statistics of 0.663 and 0.658 in men and women, respectively, while a model including age plus 14 other variables generated C-statistics of only 0.694 and 0.687 in men and women, respectively, which indicates that age might be considered the most powerful predictor for CRC. Another important risk factor is sex, which was included in all developed risk models. Male sex has been consistently demonstrated to be associated with a higher risk of colorectal adenoma as well as CRC [17], and some studies also suggested that men should begin CRC screening at an earlier age than women [15,16,69,70]. Multiple studies reported that people with one affected FDR have, on average, a 2-fold increased risk of CRC compared to those without FH, and that this risk increases even further for people with three or more affected FDRs [18,71,72]. A positive FH of CRC is thus considered to be an indication for an earlier start of CRC screening in many screening guidelines [73-75]. An elevated BMI or obesity is associated with an increased risk of CRC [76]. A previous study [77] found that a five-unit increase in BMI was associated with a 1.2-fold increased risk of colorectal adenomas. Smoking is also a well-studied risk factor that increases the risk of CRC or adenomas [78]. Two meta-analyses have shown that smoking is associated with a 20-25% increased risk of CRC [79,80].
Strengths of our study include the use of comprehensive search strategies along with well-defined eligibility criteria to identify relevant articles. Two reviewers independently extracted data and assessed the risk of bias in the included studies.
To our knowledge, this is the first review that applies meta-analyses to determine the overall discrimination of existing risk scores in the average-risk population that constitutes the target population for CRC screening. However, several limitations should also be addressed. Firstly, heterogeneity across the pooled studies in the meta-analyses was high. Substantial heterogeneity may arise from diversity of study populations, methods of risk factor ascertainment and differing variables in the prediction models. Unfortunately, due to the limited number of validation studies, we were not able to perform meta-regressions to further investigate the influence of various factors on the observed heterogeneity of AUCs. Secondly, as one of our prerequisites for selecting eligible studies was that risk scores had to be derived or validated in screening settings, we only included four risk models that were developed by combining risk factors and laboratory test indicators. No risk model combining environmental risk factor data with genetic biomarkers was identified. Lastly, most risk scores were developed using data from predominantly Caucasian and Asian populations; they might not be applicable to other populations and need to be externally validated in racially diverse populations.

CONCLUSION
In summary, we identified 17 risk scores for prediction of advanced neoplasms that were derived in average-risk populations. Commonly included risk factors comprise age, sex, FH in FDR, BMI and smoking. Only 7 models showed at least modest discriminatory power in internal or external validation. Ten risk prediction models were validated in various populations with rather heterogeneous results. Parallel assessment of multiple scores in the same population might help to choose the best performing score for a given study population or setting. Rather than developing more risk scores based on environmental risk factors, future research should focus on exploring possibilities of enhancing predictive power by combining risk factor data with novel laboratory markers, such as polygenetic risk scores.