A circRNA signature predicts postoperative recurrence in stage II/III colon cancer

Abstract
Accurate risk stratification for patients with stage II/III colon cancer is pivotal for postoperative treatment decisions. Here, we aimed to identify and validate a circRNA‐based signature that could improve postoperative prognostic stratification for these patients. In this retrospective analysis, we included 667 patients with R0 resected stage II/III colon cancer. Using RNA‐seq analysis of 20 paired frozen tissues collected postoperatively, we profiled differential circRNA expression between patients with and without recurrence, followed by quantitative validation. Together with clinical information, we generated a four‐circRNA‐based cirScore to classify patients into high‐risk and low‐risk groups in the training cohort. Patients with high cirScores in the training cohort had shorter disease‐free survival (DFS) and overall survival (OS) than patients with low cirScores. The prognostic capacity of the classifier was validated in the internal and external cohorts. Loss‐of‐function assays indicated that the selected circRNAs played functional roles in colon cancer progression. Overall, our four‐circRNA‐based classifier is a reliable prognostic tool for postoperative disease recurrence in patients with stage II/III colon cancer.


Introduction
Approximately 60% of patients with colon cancer present with stage II/III disease (Rabeneck et al, 2015). Surgical resection is the only possible cure for these patients (Rabeneck et al, 2015). However, 20-30% of these patients still suffer postoperative recurrence, which results in dismal survival (O'Connell et al, 2008; Andre et al, 2009). Traditionally, adjuvant chemotherapy has been the standard of care for patients with high-risk stage II disease, defined by clinicopathological features such as a T4 lesion and the retrieval of < 12 lymph nodes, and for all patients with stage III colon cancer, defined as N1/N2 M0 disease irrespective of T stage. However, clinicopathological risk factors and microsatellite instability status do not adequately distinguish between patients who have a high or low risk of disease recurrence, and therefore do not indicate which patients are likely to benefit from postoperative chemotherapy (Gray et al, 2007; Morris et al, 2007). In view of this clinical challenge, there is an unmet need for novel recurrence-specific molecular biomarkers that allow for better prognostic stratification and more appropriate therapies for patients with stage II/III colon cancer.
Circular RNA (circRNA), a rediscovered and abundant RNA species, is a type of covalently closed non-coding RNA formed from both exonic and intronic sequences (Morris & Mattick, 2014; Chen & Yang, 2015). circRNAs are characterized by several properties, such as being evolutionarily conserved, having tissue-specific expression, and being more stable than linear miRNA (Jeck et al, 2013; Memczak et al, 2013; Taborda et al, 2017). They can regulate gene expression by acting as sponges for miRNAs, regulate alternative splicing, act as transcriptional regulators, and even encode proteins (Taborda et al, 2017). However, to the best of our knowledge, the value of circRNA-based signatures as novel prognostic biomarkers for colon cancer has not yet been comprehensively investigated.
In this study, we conducted a multicenter, retrospective study to assess the ability of circRNA expression profiles to predict disease recurrence in patients with stage II/III colon cancer. We aimed to identify and validate a circRNA-based signature that could improve postoperative prognostic stratification for these patients.

Selection and validation of candidate circRNAs
Based on the RNA-seq data and bioinformatics analysis, differential expression analysis identified 437 circRNAs (326 upregulated and 111 downregulated, marked as "TNcircles" afterward) between the tumor and adjacent normal tissues by using a soft threshold. The analysis also identified 103 differentially expressed circRNAs (48 upregulated and 55 downregulated, marked as "RNcircles" afterward) between recurrent and non-recurrent tumor tissues. Both TNcircles and RNcircles showed strong classification properties in distinguishing each of the groups (Fig 2A and B). In addition, the differential expression results indicated that circRNAs experienced more prominent changes between the normal and tumor tissues than between the recurrent and non-recurrent tumor tissues (Fig 2A and B).
Next, we investigated whether circRNAs could be used as prognostic biomarkers in patients with stage II/III colon cancer. First, 38 significantly upregulated circRNAs were selected from TNcircles for further validation according to the retaining criteria described in Materials and Methods. In addition, we prioritized 62 circRNAs from RNcircles using the same selection criteria, giving a total of 100 circRNAs for validation assays. To guard against false discoveries arising from the limited sensitivity of RNA-seq and the small sample size, we enrolled 48 recurrent and 48 non-recurrent samples for further validation using the qRT-PCR assay. Among these candidates, 22 circRNAs (10 from TNcircles and 12 from RNcircles) were further selected based on a highly significant difference (P < 0.01; Figs 2C and EV1). We quantified these 22 circRNAs with qRT-PCR in the training cohort (n = 249) and further reduced the number of candidates using the least absolute shrinkage and selection operator (LASSO)-bagging procedure as described in Materials and Methods (Fig 2D). Finally, we obtained four circRNAs that were strongly predictive of DFS, i.e., hsa_circ_0122319, hsa_circ_0087391, hsa_circ_0079480, and hsa_circ_0008039 (Fig 2D). Notably, multivariate Cox regression analysis showed that these four circRNAs were mutually independent (Appendix Table S1). We also observed that the four-circRNA-based risk score (cirScore) generally outperformed each single circRNA in time-dependent AUC analysis (Fig 2E). The circularity and stability of the four selected circRNAs were verified by Sanger sequencing and RNase R treatment. RT-PCR with divergent primers followed by sequencing of the PCR products confirmed the exact back-splice junctions predicted by the bioinformatics analysis (Fig EV2A). We next validated the circularity of these candidates by RNase R treatment, with mouse GAPDH mRNA used as a spike-in for normalization. The results indicated that these circRNAs were more resistant to digestion with RNase R exonuclease than their linear host gene transcripts, which further confirmed that these circRNAs harbor a circular RNA structure (Fig EV2B). Taken together, these results indicated that these circRNAs may serve as novel prognostic biomarkers for colon cancer.

SYUCC = Sun Yat-sen University Cancer Center. SAHSY = the Sixth Affiliated Hospital of Sun Yat-sen University. Training and internal validation sets were randomly selected at a 2:1 ratio from the SYUCC samples.

Construction and validation of the four-circRNA-based prognostic model
Then, a risk score was calculated for each patient using a formula derived from the expression levels of the four circRNAs weighted by their regression coefficients:

$$\text{cirScore} = 0.46 \times \text{Exp}_{\text{hsa\_circ\_0122319}} + (-0.386 \times \text{Exp}_{\text{hsa\_circ\_0087391}}) + 0.293 \times \text{Exp}_{\text{hsa\_circ\_0079480}} + 0.439 \times \text{Exp}_{\text{hsa\_circ\_0008039}}$$

Using the cirScore, we divided patients into high- and low-risk groups at its median value (−0.323) in the training cohort. Survival analysis showed that patients in the high-risk group had a poorer DFS than those in the low-risk group (hazard ratio [HR], 4.38; 95% confidence interval [CI], 2.52-7.64, P < 0.0001; Fig 3A). Moreover, we observed a similar impact of the cirScore on OS (high vs. low risk, HR, 5.13, 95% CI, 2.56-10.16, P < 0.001; Fig 3B).
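The scoring rule above can be illustrated with a short Python sketch. The coefficients and the training-cohort median cutoff (−0.323) come from the text; the second identifier is written as hsa_circ_0087391, as in the list of selected circRNAs. The function names, the expression scale of the inputs, and the choice to call scores strictly above the cutoff "high" are assumptions made for illustration only.

```python
# Coefficients and cutoff as reported in the text; everything else is illustrative.
COEFFICIENTS = {
    "hsa_circ_0122319": 0.46,
    "hsa_circ_0087391": -0.386,
    "hsa_circ_0079480": 0.293,
    "hsa_circ_0008039": 0.439,
}
TRAINING_MEDIAN = -0.323


def cir_score(expression):
    """Weighted sum of the four circRNA expression values.

    `expression` maps each circRNA ID to its expression value on the scale
    used when the coefficients were fitted (not specified here).
    """
    return sum(coef * expression[name] for name, coef in COEFFICIENTS.items())


def risk_group(score, cutoff=TRAINING_MEDIAN):
    """Assumption: a score strictly above the cutoff is labeled high-risk."""
    return "high" if score > cutoff else "low"


# Hypothetical expression values for one patient (not data from the study).
patient = {
    "hsa_circ_0122319": 1.0,
    "hsa_circ_0087391": 1.0,
    "hsa_circ_0079480": 1.0,
    "hsa_circ_0008039": 1.0,
}
score = cir_score(patient)
group = risk_group(score)
```

With these made-up inputs the score is simply the sum of the four coefficients, which makes the arithmetic easy to check by hand.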
After adjustment for baseline clinicopathologic factors, the cirScore remained a powerful and significant predictor of DFS and OS in the training set (HR = 4.64 [95% CI, 2.64-8.17], P < 0.0001 and HR = 5.45 [95% CI, 2.70-11.00], P < 0.0001, respectively). We also noted similar results in the internal validation set (HR = 2.96 [95% CI, 1.37-6.42], P = 0.0058 for DFS and HR = 3.82 [95% CI, 1.44-10.15], P = 0.007 for OS) and in the external validation set (HR = 2.50 [95% CI, 1.16-5.36], P = 0.008 for DFS and HR = 4.15 [95% CI, 1.79-9.64], P = 0.0009 for OS).

Loss-of-function assay of selected circRNAs regulating cell metastasis
We next sought to evaluate the biological roles of the selected circRNAs in colon cancer. Among the four circRNA markers, three circRNAs (hsa_circ_0122319, hsa_circ_0079480, and hsa_circ_0087391) were significantly overexpressed in the recurrent samples and in the colon cancer cells (Figs 2C and EV3A). The circularity of these circRNAs was further verified by RT-PCR with divergent or convergent primers (Figs 4A and EV3B). To assess whether these circRNAs promoted colon cancer progression, SW620 and HCT116 cells with high metastatic potential were used to conduct loss-of-function assays by lentivirus-mediated stable gene silencing. The knockdown efficiency and specificity were verified by qRT-PCR, immunoblotting, and RNA-seq analysis.
The results demonstrated that knockdown of these circRNAs had no effect on the mRNA or protein expression of the host genes (Figs 4B and EV3C and D), and that gene expression profiles were highly similar between the two independent shRNA groups in SW620 and HCT116 cells (Fig EV3E), suggesting that the regulatory effects described below result directly from targeting the circRNAs rather than from off-target effects. Remarkably, knockdown of these circRNAs using two independent shRNAs significantly suppressed the migration capacity of the tested cells (Fig 4C and D). Subsequently, to further determine the oncogenic effects of a representative circRNA in promoting colon cancer metastasis in vivo, hsa_circ_0079480 knockdown and control cells were injected into the distal tip of the mouse spleen using a Hamilton syringe. Six weeks later, the mice were sacrificed, and the spleen and liver were removed and embedded in paraffin. All the mice (N = 8 per group) had tumors that formed in the spleen. Moreover, the number of metastatic nodules in the livers was significantly reduced in mice injected with hsa_circ_0079480 knockdown cells compared with those injected with control colon cancer cells (Fig 4E and F). We further explored the role of hsa_circ_0079480 in lung colonization by injecting colon cancer cells directly into the tail veins of nude mice (N = 8 per group). The mice injected with control colon cancer cells developed a heavy lung metastatic burden as verified by histologic examination, whereas knockdown of hsa_circ_0079480 almost abolished lung metastasis (Fig 4G and H). The loss-of-function assays indicated that the circRNAs might play functional roles in the regulation of colon cancer progression.

Stratified analysis with known risk factors
We further performed stratified survival analyses to assess the prognostic performance of the cirScore against the clinical risk-stratification scheme (i.e., the high- and low-risk stage II and high- and low-risk stage III groups). Stage II disease was considered high-risk if it presented with poorly differentiated or undifferentiated histology (exclusive of mismatch repair-deficient cases), perineural invasion, lymphatic or vascular invasion, or T4 status. Stage III disease was considered high-risk if it was staged T4, N2, or both. All three study cohorts were combined to increase the statistical power of the stratified survival analyses. As a result, in both the low- and high-risk stage II groups, patients with a high cirScore had a shorter DFS (HR = 7.72 [95% CI, 0.9-66.44], P = 0.028 and HR = 2.03 [95% CI, 1.06-3.90], P = 0.0290, respectively; Fig EV4A) than those with a low cirScore. Additionally, in both the non-high-risk and high-risk stage III groups, patients were further stratified by the cirScore into subgroups with significantly different DFS (high vs. low cirScore: HR = 3.37 [95% CI, 1.90-15.97], P < 0.0001 and HR = 7.62 [95% CI, 3.16-18.41], P < 0.0001, respectively; Fig EV4A). Moreover, similar findings were obtained regarding the impact of the cirScore on OS after stratification by the clinical risk-stratification scheme (Fig EV4B). Of note, the result for the low-risk stage II group did not reach statistical significance (P = 0.21; Fig EV4C). Time-dependent receiver operating characteristic (ROC) analyses revealed that the combination of the cirScore with the clinical risk-stratification scheme achieved a superior prognostic accuracy to the clinical risk-stratification scheme alone for DFS and OS in the training set and in the internal and external validation sets (Fig EV4C).
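The clinical risk-stratification rules stated above can be written down as a small Python sketch, which may help when reproducing the stratified subgroups. The rules follow the text; the function and argument names, the string encoding of stage and T/N categories, and the combined subgroup label are hypothetical choices made for illustration.

```python
def clinical_risk(stage, t_stage, n_stage, poor_histology=False,
                  mmr_deficient=False, perineural_invasion=False,
                  lymphovascular_invasion=False):
    """Return "high" or "low" clinical risk for stage II/III colon cancer.

    stage is "II" or "III"; t_stage and n_stage are strings such as "T4a"
    or "N2b" (this encoding is an assumption, not from the source).
    """
    is_t4 = t_stage.upper().startswith("T4")
    if stage == "II":
        # Poor or undifferentiated histology counts only when the tumor
        # is not mismatch repair-deficient.
        high = ((poor_histology and not mmr_deficient)
                or perineural_invasion
                or lymphovascular_invasion
                or is_t4)
    elif stage == "III":
        high = is_t4 or n_stage.upper().startswith("N2")
    else:
        raise ValueError("only stage II and III are covered by these rules")
    return "high" if high else "low"


def stratum_label(stage, clinical, cirscore_group):
    """Combine the clinical group with a cirScore group into one label."""
    return f"stage {stage}, clinical {clinical}-risk, cirScore {cirscore_group}"


# Hypothetical patient: stage III, T3 N1, classified as high by cirScore.
example = stratum_label("III", clinical_risk("III", "T3", "N1"), "high")
```

Each patient then falls into one of eight strata (two stages, two clinical groups, two cirScore groups), which is the layout used in the stratified comparisons.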

Building nomograms and time-dependent ROC analysis
Through a stepwise backward selection process on the basis of the AIC, the cirScore, age at diagnosis, N stage, NI, and VI remained in the final Cox model for DFS (Appendix Table S2). To develop a clinically applicable tool that could provide individualized estimation of the 3- or 5-year DFS, a nomogram was established based on the final Cox model for DFS (Fig 5A). The nomogram achieved a C-index of 0.816 (95% CI, 0.774-0.857), and the calibration plots showed close agreement between the actual DFS probabilities and the DFS predicted by the nomogram in the training set (Fig 5B). The C-indices were 0.789 (95% CI, 0.719-0.859) and 0.694 (95% CI, 0.608-0.780) in the internal and external validation sets, respectively. The actual DFS probabilities were consistent with the

Figure 2. Marker validation and selection from the circRNA-sequencing experiment.
A Expression profiling of differentially expressed circRNAs between the tumor and normal groups. Rows represent circRNAs, and columns represent samples. Rows were ordered by fold change, and columns were ordered by their group. Sample N8 was not included because of its low sequencing library size.
B Expression profiling of differentially expressed circRNAs between the recurrence and non-recurrence groups. Both rows and columns were clustered by unsupervised hierarchical clustering.
C Four of the 22 differentially expressed circRNAs confirmed by qRT-PCR, which were retained after the marker selection procedure. **P < 0.01, Student's t-test, mean ± SD.
D Bar plot showing the resample model inclusion proportion (RMIP) of qualified circRNAs calculated in the training dataset. The red line represents the threshold used to obtain the final markers.
E Time-dependent AUC analysis of individual circRNAs and the cirScore for predicting recurrence in the training dataset. P-values are shown for the indicated comparison of AUC between each marker and the cirScore. Student's t-test, AUC = area under the curve.
Data information: Exact P-values are specified in Appendix Table S5. Source data are available online for this figure.

Discussion
In this study, we developed and validated a novel prognostic tool based on four circRNAs to improve the prognostic stratification for patients with radically resected stage II/III colon cancer. Our results showed that this tool can effectively classify patients with stage II/III colon cancer into groups with low and high risks of disease recurrence. Furthermore, this proposed cirScore provided additional prognostic value to existing clinicopathological prognosticators for stage II/III colon cancer. Of particular importance, this is the first study that demonstrates the clinical utility of the circRNA signature as a postoperative prognostic tool in patients with stage II/III colon cancer. For patients with R0 resected stage III or high-risk stage II colon cancer, adjuvant chemotherapy is considered a standard of care. However, previous evidence suggests that adjuvant chemotherapy, with or without oxaliplatin, conveyed limited benefits to patients with high-risk stage II disease (O'Connor et al, 2011). In contrast, adjuvant chemotherapy has shown a robust efficacy in patients with stage III disease and 6-month oxaliplatin-based chemotherapeutic regimens have become standard adjuvant treatment for these patients since 2004 (Andre et al, 2004). Given the cumulative neurotoxicity associated with oxaliplatin exposure, the International Duration Evaluation of Adjuvant Therapy (IDEA) collaboration conducted a prospective pooled analysis and showed that 3 months of adjuvant therapy appeared to be sufficient in a lower-risk group (defined as patients with T1, T2, or T3/N1 disease), especially when the capecitabine and oxaliplatin combination was chosen. In a higher-risk group (patients with T4, N2, or both), 6 months of adjuvant therapy may be needed, particularly when the fluorouracil and oxaliplatin combination was the chosen regimen (Grothey et al, 2018). 
Notably, our study showed that patient survival was heterogeneous even within the high- or low-risk stage II/III groups; that is, patients with high-risk stage II and low- and high-risk stage III disease could be further stratified by the cirScore into subsets with distinct outcomes, suggesting room for tailoring treatment strategies and avoiding overtreatment or undertreatment in selected patients. Moreover, we proposed prognostic nomograms that allow for individualized estimation of the 3- and 5-year DFS and OS probabilities among patients with radically resected stage II/III colon cancer. Taken together, the cirScore and the associated nomograms may serve as a clinically useful tool to improve surveillance and guide decision making regarding the administration of adjuvant chemotherapy and treatment duration.
Recently, circRNA research has captured the interest of many in the scientific and medical communities (Vicens & Westhof, 2014; Ebbesen et al, 2017). Owing to their unique properties, such as being evolutionarily conserved, having tissue-specific expression, and being more stable than linear miRNA, circRNAs may serve as potential diagnostic or predictive biomarkers for colorectal cancer patients (Ebbesen et al, 2017; Taborda et al, 2017). Some circRNAs, including circHIPK3, circCCDC66, and CiRS-7, have been shown to be associated with prognosis and to regulate cell biological function in colorectal cancer (Hsiao et al, 2017; Weng et al, 2017; Jiang et al, 2018; Zeng et al, 2018). A recent study also demonstrated the existence of abundant exo-circRNAs in the serum of colorectal cancer patients (Li et al, 2015). However, these studies have been limited by the small number of circRNAs screened, the small sample sizes, and the lack of independent validation. Our study included 667 patients and is therefore, to our knowledge, the largest circRNA-based biomarker discovery project conducted in stage II/III colon cancer. The use of the LASSO-based marker selection strategy and the Cox regression model allowed us to integrate multiple circRNAs into one tool, which has a significantly greater prognostic accuracy than that of single circRNAs alone. This method has been successfully applied to establish prognostic prediction models using other biomarkers, such as miRNA (Zhang et al, 2013) and circulating tumor DNA methylation (Xu et al, 2017). For the first time, we built a four-circRNA-based signature using the LASSO-bagging algorithm and the Cox regression model that can predict recurrence in patients with stage II/III colon cancer. Among these four circRNAs, hsa_circ_0008039 has been reported to promote breast cancer progression by regulating the miR-432-5p/E2F3 axis. However, the other novel circRNAs have not been investigated in cancer.

Figure 4. Loss-of-function assay of candidate circRNAs regulating cell invasion.
A RT-PCR products with divergent and convergent primers showing circularization of hsa_circ_0079480 and hsa_circ_0087391. cDNA, complementary DNA; gDNA, genomic DNA.
B qRT-PCR evaluation of the knockdown efficiency of hsa_circ_0079480 and hsa_circ_0087391 in SW620 and HCT116 cells transfected with two unique shRNAs (#1, #2). **P < 0.01, Student's t-test, mean ± SD (n = 3).
C Representative images of the migration phenotype in HCT116 cells with knockdown of candidate circRNAs, scale bar: 100 μm.
D The relative fold change of transwell migration for the indicated knockdown cells over that of control cells. **P < 0.01, Student's t-test, mean ± SD (n = 3).
E, F Representative hematoxylin and eosin (H&E) staining and statistical results of the micro-metastatic nodules in the liver from mice injected with the indicated cells into the spleen for 45 days. White and black arrows indicate the liver metastatic foci, scale bar: 100 μm. N = 8 per group. *P < 0.05; **P < 0.01, Student's t-test, mean ± SD.
G, H Representative H&E staining and statistical results of metastatic lung nodules from mice injected with the indicated cells via the tail vein for 60 days. Five sections were evaluated per lung. Black arrows indicate the lung metastatic foci, scale bar: 100 μm. N = 8 per group. **P < 0.01, Student's t-test, mean ± SD.
Limitations of the present study should be acknowledged. The retrospective nature of this study made it susceptible to inherent biases. Additionally, in view of potential selection bias and the limited sample size, we were not able to determine how the proposed cirScore and nomograms may modify treatment strategies for stage II/III colon cancer, and further prospective trials addressing this issue are needed. Moreover, this study was East Asia-centric, and patient cohorts from other geographical regions are required to validate our findings. Despite these limitations, this study currently represents the best available evidence regarding the potential clinical utility of circRNA-based signatures for prognostic stratification in patients with stage II/III colon cancer. It directly quantifies circRNA expression in fresh colon cancer tissues using a qPCR assay, which makes the approach easy to implement in clinical practice.
In conclusion, the cirScore can effectively classify patients with radically resected stage II/III colon cancer into groups with different risks of recurrence, thereby raising the possibility that circRNAs may be supplementary to the traditional clinicopathological risk factors as a prognostic scheme. Additionally, the proposed nomograms incorporating the cirScore and existing clinical prognosticators might facilitate personalized postoperative surveillance and management of patients with stage II/III colon cancer.

Patient enrollment
We collected frozen tissue samples from patients who met all the following criteria: (i) histologically confirmed stage II/III colon cancer diagnosed between January 1, 2010, and December 31, 2013, according to the 7th edition of the American Joint Committee on Cancer staging scheme; (ii) histologically confirmed R0 resection; and (iii) availability of complete follow-up data. Patients were excluded if they had received previous treatment with any anticancer therapy, had any tumor type other than adenocarcinoma or mucinous carcinoma, or had insufficient RNA (< 5 ng/μl) available. Two pathologists (QN Wu, XJ Fan) reassessed all of the samples, all of which were found to contain more than 70% tumor cells. All the tissue samples were collected from patients with informed consent. Studies were conducted in alignment with the ethical principles for medical research involving human subjects set out in the World Medical Association Declaration of Helsinki and the Department of Health and Human Services Belmont Report and were approved by the ethics committees of both participating institutions.

Bioinformatic analysis
We retrospectively collected 20 pairs of frozen tumor tissues and adjacent normal tissues of the primary site from patients with stage II/III colon cancer in the discovery set, including 10 patients with and 10 patients without recurrence within 5 years after surgery. To identify potentially deregulated circRNAs that correlated with the outcomes of the colon cancer patients, we conducted an RNA-sequencing study and profiled circRNAs by a series of bioinformatic analyses as described below. We employed limma (Ritchie et al, 2015) to identify differentially expressed circRNAs between the tumor and normal groups, or between the recurrence and non-recurrence groups, with a threshold of 1 for the log fold change and a P-value < 0.05. For marker selection, we only considered overexpressed circRNAs for convenience of detection, and 100 circRNAs were selected for further validation according to the following retaining criteria: (i) upregulated circRNAs; (ii) circRNAs located at the junction site of exons; and (iii) fold change > 5.0, P-value < 0.05, and raw intensity of each sample > 200. We applied the qRT-PCR assay to validate the selected circRNAs in a larger cohort for further selection. circRNAs that were validated as having the same expression trend and that had a P-value < 0.05 were considered consistent markers. We next tested those markers on the samples from the training and validation cohorts by qRT-PCR assay.

Identification and quantification of circRNAs from RNA-seq dataset
Raw RNA-sequencing reads of each sample were aligned to the hg38 human genome using TopHat2 software (Kim et al, 2013). Three bioinformatic circRNA detection methods, circRNA_finder (Westholm et al, 2014), CIRI (Gao et al, 2015), and UROBORUS (Song et al, 2016), were used for circRNA identification with default parameters. Next, we filtered out the circRNAs that were expressed in fewer than two samples. To annotate circRNAs, we converted the hg38 coordinates of each circRNA into hg19 using the liftOver program from UCSC (Kent et al, 2002). The nearest protein-coding gene for a circRNA was determined according to its distance from the corresponding circRNA along the genome sequence. All known circRNAs were named with their circBase ID according to the circBase annotation. Novel circRNAs were named according to their rank number in the final summary table.
For classification, all circRNAs were divided into seven types after intersection with known transcripts (Memczak et al, 2013): exonic circRNA, intronic circRNA, 3′ UTR circRNA, 5′ UTR circRNA, antisense circRNA, intergenic circRNA, and ncRNA circRNA. Expression levels of circRNAs were quantified by the number of junction-spanning reads obtained from the UROBORUS tool. The Transcripts Per Million (TPM) values of circRNA reads were calculated to obtain an estimate of relative expression. The circRNAs with a P-value ≤ 0.05 and an absolute log2 fold change ≥ 1 were treated as differentially expressed.
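As a rough illustration of the quantification and thresholding described above, the following Python sketch scales junction-spanning read counts to a per-million value and applies the stated cutoffs (P ≤ 0.05, |log2 fold change| ≥ 1). It is not the authors' pipeline: the normalization by the total junction reads of a sample, the pseudocount, and all names are assumptions, and the P-value is taken as an input rather than computed.

```python
import math


def per_million(junction_counts):
    """Scale junction-spanning read counts of one sample to a per-million value.

    Assumption: each count is divided by the sample's total junction reads;
    the source does not spell out the exact normalization.
    """
    total = sum(junction_counts.values())
    if total == 0:
        raise ValueError("sample has no junction-spanning reads")
    return {circ: count / total * 1e6 for circ, count in junction_counts.items()}


def log2_fold_change(mean_case, mean_control, pseudocount=1.0):
    """log2 ratio of group means; the pseudocount (an assumption) avoids log(0)."""
    return math.log2((mean_case + pseudocount) / (mean_control + pseudocount))


def is_differentially_expressed(log2_fc, p_value):
    """Thresholds as stated in the text: P <= 0.05 and |log2FC| >= 1."""
    return p_value <= 0.05 and abs(log2_fc) >= 1.0


# Hypothetical counts for a two-circRNA sample.
scaled = per_million({"circ_a": 30, "circ_b": 10})
```

A circRNA whose mean expression is 3 in one group and 1 in the other has, with a pseudocount of 1, a log2 fold change of exactly 1 and so sits on the inclusion boundary.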

qRT-PCR assay
Total RNA was isolated with TRIzol reagent (#15596-08, Life Technologies, Carlsbad, USA) and then reverse-transcribed with random hexamers using a PrimeScript RT Reagent Kit (TaKaRa Bio, Inc., Shiga, Japan) according to the manufacturer's protocol. The resulting complementary DNA was analyzed by qRT-PCR performed with SYBR reagent using the IQ5 PCR system (Bio-Rad, Hercules, CA). β-Actin was used as the internal control gene, and data were analyzed using the 2^−ΔΔCt method. Specific divergent primers, convergent primers, and primers for detecting the corresponding host genes were designed by Geneseed Biotech (Guangzhou, China) and synthesized by Sigma-Aldrich (St. Louis, MO, USA). These primer sequences are described in Appendix Supplementary Methods. The circRNA ID, gene symbol, and back-splice junction (BSJ) coordinates for the 22 circRNAs are described in Appendix Table S4.
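For readers unfamiliar with the 2^−ΔΔCt calculation mentioned above, a minimal Python sketch follows. It assumes the standard form of the method, with amplification efficiency close to 100% for both target and reference; the function name and the Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_reference_sample,
                        ct_target_control, ct_reference_control):
    """Fold change of a target in a sample relative to a control (2^-ddCt).

    dCt  = Ct(target) - Ct(reference gene), computed per condition
    ddCt = dCt(sample) - dCt(control)
    """
    delta_sample = ct_target_sample - ct_reference_sample
    delta_control = ct_target_control - ct_reference_control
    return 2.0 ** (-(delta_sample - delta_control))


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in the sample than in the control, with identical reference-gene Ct.
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
```

Here ΔCt is 6 in the sample and 8 in the control, so ΔΔCt is −2 and the relative expression is 2^2 = 4.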
To further narrow down the candidate list, we first filtered out the circRNAs with Wald P ≥ 0.05 in univariate Cox regression analysis of disease-free survival, and 8 circRNAs remained in the training set. DFS and the expression matrices of validated circRNAs were then subjected to the LASSO-bagging procedure. LASSO is a popular method for regression with high-dimensional predictors (Tibshirani, 1997) and is broadly applied to the Cox proportional hazards regression model for survival analysis (Zhang et al, 2013). Here, we applied a multisplit strategy with LASSO to reduce overfitting to the training dataset as described previously (Xu et al, 2017). The algorithm contains the following steps: (i) the data points were bootstrapped 500 times to generate 500 training matrices; (ii) for each matrix with DFS, LASSO Cox regression analysis was performed using 10-fold cross-validation. The tuning parameter λ was chosen by the 1-SE (standard error) rule, which yielded a list of variables that had non-zero beta coefficients in the LASSO fit output; (iii) the variable lists obtained from all matrices were pooled, and the resample model inclusion proportion (RMIP) of each circRNA was calculated (i.e., its observed selection frequency in the 500 resamples); and (iv) using the RMIP as the weight of each variable, all markers were ranked in decreasing order, and we observed a sharp RMIP decrease after the fourth marker. We finally selected the top four markers to build the regression model.
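Steps (iii) and (iv) of the procedure can be sketched in Python as follows. The LASSO Cox fits of step (ii) are not reproduced here; the sketch takes as input the list of variables with non-zero coefficients from each bootstrap resample. The function names are hypothetical, and cutting the ranked list at the largest drop in RMIP is an assumed way to automate the "sharp decrease" that the text describes as an observation.

```python
from collections import Counter


def rmip(selected_per_resample):
    """Resample model inclusion proportion for each variable.

    `selected_per_resample` holds one collection per bootstrap resample,
    listing the variables with non-zero LASSO coefficients in that fit.
    """
    n_resamples = len(selected_per_resample)
    counts = Counter(v for selected in selected_per_resample for v in set(selected))
    return {variable: count / n_resamples for variable, count in counts.items()}


def markers_before_largest_drop(rmip_values):
    """Rank variables by RMIP and keep those before the largest drop.

    Assumption: the cut point is the largest gap between consecutive RMIPs.
    """
    ranked = sorted(rmip_values.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) < 2:
        return [name for name, _ in ranked]
    drops = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    cut = drops.index(max(drops)) + 1
    return [name for name, _ in ranked[:cut]]


# Toy input with four resamples (the study used 500).
toy_selections = [["a", "b"], ["a"], ["a", "b", "c"], ["a", "b"]]
toy_rmip = rmip(toy_selections)
kept = markers_before_largest_drop(toy_rmip)
```

In the toy input, the inclusion proportions are 1.0, 0.75, and 0.25, so the largest drop falls after the second variable and two markers are kept.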

Cell culture and migration assay
The human CRC cell lines and immortalized/non-tumorigenic cells were purchased from the ATCC (Manassas, VA, USA) and cultured under conditions specified by the supplier. All cells tested negative for mycoplasma contamination and were authenticated by STR fingerprinting before use at the Medicine Lab of the Forensic Medicine Department of Sun Yat-sen University. The lentiviruses containing shRNAs targeting circRNAs were purchased from GenePharma (Shanghai, China), and the lentiviral transduction was performed as previously described (Ju et al, 2017). The shRNA sequences targeting the circRNAs were as follows: hsa_circ_0122319 (#1: atgtttactgaatgataaatt; #2: ctgaatgataaattattagtc); hsa_circ_0079480 (#1: gttgttgtttcaagagaattt; #2: gtttcaagagaatttcccaag); hsa_circ_0087391 (#1: cagtcttataaaattatctgc; #2: cttataaaattatctgcaatt). Then, the cell migration assay was conducted to assess the in vitro function of the selected circRNAs as previously described (Ju et al, 2016). To rule out off-target effects, RNA-seq and bioinformatic analyses were performed by the Novogene Corporation (Beijing, China).

Immunoblotting analysis
Immunoblotting analysis was conducted with standard procedures as previously described (Ju et al, 2017). Briefly, cells were lysed in RIPA buffer and normalized using a BCA Protein Assay Kit (Thermo Scientific, Waltham, MA, USA). Proteins were separated by SDS-PAGE and blotted onto a PVDF membrane (Millipore, Billerica, MA, USA). Membranes were probed with the specific primary antibodies and then with peroxidase-conjugated secondary antibodies. β-Actin antibody was used as a loading control. The bands were visualized by enhanced chemiluminescence using Hyperfilm ECL. The following antibodies were used for immunoblotting analysis: PLOD2 antibody (1:800, #ab72939) (Abcam, Cambridge, MA, USA) and β-actin (1:1,000, #3700) (Cell Signaling, Danvers, MA, USA).

In vivo metastasis study
Two xenograft models were used to evaluate the in vivo metastatic effects of circ_0079480, which had shown functional effects in vitro, as previously described (Ju et al, 2018). Female BALB/c nude mice (3-4 weeks old) were obtained from the Animal Center of Guangdong Province (Guangzhou, China) and housed under specific pathogen-free (SPF) conditions. For liver metastasis, the cells (2 × 10^6) in 50 μl PBS were injected into the distal tip of the spleen using a Hamilton syringe (8 mice/group). Six weeks later, the mice were sacrificed and the spleen and liver were removed and embedded in paraffin. The numbers of metastatic nodules in the livers were counted. For lung metastasis, the circ_0079480 knockdown and control cells (2 × 10^6) in 100 μl PBS were injected into the tail vein of nude mice (8 mice/group). Six weeks postinjection, the mice were killed and the lungs were removed and paraffin-embedded. Consecutive sections were made and stained with hematoxylin and eosin (H&E). The micrometastases in the lungs were examined and counted under a dissecting microscope.
All animal experiments were performed in accordance with a protocol approved by our institutional Animal Care and Use Committee. The randomization of animal allocation was done by random numbers generated by the computer. Following experimentation, no animals were excluded from analysis, and no blinding procedure was undertaken. The reporting of mouse studies in this manuscript conforms with the Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines (Kilkenny et al, 2010).

Statistical analysis
For survival analyses, we used the Kaplan-Meier method to analyze the correlation between variables and survival, and the log-rank test to compare survival between groups. We used the Cox regression model for the multivariable survival analysis, and Cox regression coefficients to generate nomograms. Concordance indices (C-indices) were used to measure the discriminative abilities of the nomograms (Harrell et al, 1996). Calibration was assessed by comparing plots of nomogram-predicted survival probabilities with the Kaplan-Meier-estimated probabilities (Iasonos et al, 2008). All statistical tests were two-sided, and P < 0.05 was deemed significant. All analysis scripts were written in R (v3.3.3; R Foundation for Statistical Computing, Vienna, Austria), with the "glmnet" package for LASSO, the "rms" package for development of the nomograms, and the "survivalROC" package for the time-dependent ROC curve analysis.
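To make the concordance index concrete, here is a minimal pure-Python sketch of a Harrell-type C-index for right-censored data. It is a simplified illustration and not the implementation in the "rms" package: pairs with tied event times are skipped, tied risk scores count as half-concordant, and the function name and toy data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell-type C-index for right-censored survival data.

    A pair (i, j) is comparable when subject i has an observed event and a
    strictly shorter follow-up time than subject j. The pair is concordant
    when the subject who failed earlier has the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: follow-up times, event indicators (1 = recurrence, 0 = censored),
# and made-up risk scores.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
c_mixed = concordance_index(toy_times, toy_events, [0.9, 0.1, 0.5, 0.7])
```

In the toy data there are five comparable pairs; the first subject is ranked correctly against all three others, the second against neither of its two, giving 3/5 = 0.6. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering.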
For functional assays, all experiments were repeated three times, and data are presented as mean ± standard deviation (SD) and were evaluated using Student's t-test (unpaired, two-tailed). Sample size was chosen based on the need for statistical power. Differences reached

The paper explained

Problem
Current staging methods seem to have only a limited role in predicting the risk of disease recurrence and the benefit of adjuvant chemotherapy for patients with stage II/III colon cancer. Circular RNA is a novel type of non-coding RNA with potential use as a biomarker; however, whether circRNA-based signatures could serve as novel prognostic biomarkers for stage II/III colon cancer is unknown.

Results
Dysregulated circRNAs showed strong classification properties in distinguishing the recurrent colon cancer patients from non-recurrent colon cancer patients. A novel prognostic tool (cirScore) based on four circRNAs (i.e., hsa_circ_0122319, hsa_circ_0087391, hsa_circ_0079480, and hsa_circ_0008039) is developed and validated to improve the prognostic stratification for patients with radically resected stage II/III colon cancer. The proposed cirScore can effectively classify patients with stage II/III colon cancer into groups with low and high risks of disease recurrence. Loss-of-function assays indicated that the representative circRNAs play functional roles in the sophisticated regulation of colon cancer progression.

Impact
Our current study addresses an important gap, which is the refinement of our prognostic tools for stage II/III colon cancer, by using a novel approach that takes into consideration the circular RNA. The proposed cirScore might be used in the future to guide better and more personalized treatment decisions for patients with stage II/III colon cancer.