An integrated analysis of genome‐wide DNA methylation and gene expression data in hepatocellular carcinoma

Despite progress in the treatment of hepatocellular carcinoma (HCC), 5‐year survival rates remain low. Thus, a more comprehensive approach to explore the mechanism of HCC is needed to provide new leads for targeted therapy. We performed an integrated analysis to discover the relationship between DNA methylation and gene expression in hepatocellular carcinoma (HCC). DNA methylation and gene expression data for HCC were downloaded from The Cancer Genome Atlas (TCGA) database, and differential analysis was performed. Correlation analysis between DNA methylation and gene expression data was then performed in R language. Finally, we selected several crucial genes and evaluated their potential use as diagnostic biomarkers for HCC. In total, 1135 differentially DNA‐methylated CpG sites (DMCs), 377 differentially methylated regions (DMRs), and 1194 differentially expressed genes (DEGs) were identified in HCC. Among the DEGs, 14 genes (ALX3, B4GALNT1,CTHRC1,DLX5,EMX1,IRX3,OTX1,SIX2,TLX1,VASH2,ZIC2,ZIC4,ZIC5, and ZNF695) exhibited changes in DNA methylation in terms of CpG sites or CpG island (CGI) level, of which TLX1 and ZIC4 had the most DMCs (12 and 13, respectively). Further analysis of CTHRC1,ZIC4,SIX2,VASH2,IL17D,TLX1,OTX1, and LART, examining alterations in both DNA methylation and gene expression level in HCC, showed their potential diagnostic value for HCC was better at the gene expression level than that the DNA methylation level. The DNA methylation status of CTHRC1,VASH2, and IL7D was significantly associated with HCC overall survival (P‐value <0.05). This systemic analysis identified a group of novel gene signatures (CTHRC1,ZIC4, and OTX1) that may be regulated by DNA hypermethylation, which may be closely associated with HCC.

Despite progress in the treatment of hepatocellular carcinoma (HCC), 5year survival rates remain low. Thus, a more comprehensive approach to explore the mechanism of HCC is needed to provide new leads for targeted therapy. We performed an integrated analysis to discover the relationship between DNA methylation and gene expression in hepatocellular carcinoma (HCC). DNA methylation and gene expression data for HCC were downloaded from The Cancer Genome Atlas (TCGA) database, and differential analysis was performed. Correlation analysis between DNA methylation and gene expression data was then performed in R language. Finally, we selected several crucial genes and evaluated their potential use as diagnostic biomarkers for HCC. In total, 1135 differentially DNA-methylated CpG sites (DMCs), 377 differentially methylated regions (DMRs), and 1194 differentially expressed genes (DEGs) were identified in HCC. Among the DEGs, 14 genes (ALX3, B4GALNT1, CTHRC1, DLX5, EMX1, IRX3, OTX1, SIX2, TLX1, VASH2, ZIC2, ZIC4, ZIC5, and ZNF695) exhibited changes in DNA methylation in terms of CpG sites or CpG island (CGI) level, of which TLX1 and ZIC4 had the most DMCs (12 and 13, respectively). Further analysis of CTHRC1, ZIC4, SIX2, VASH2, IL17D, TLX1, OTX1, and LART, examining alterations in both DNA methylation and gene expression level in HCC, showed their potential diagnostic value for HCC was better at the gene expression level than that the DNA methylation level. The DNA methylation status of CTHRC1, VASH2, and IL7D was significantly associated with HCC overall survival (P-value <0.05). This systemic analysis identified a group of novel gene signatures (CTHRC1, ZIC4, and OTX1) that may be regulated by DNA hypermethylation, which may be closely associated with HCC.
Hepatocellular carcinoma (HCC) is one of the most common malignant cancers worldwide, leading to about one million deaths annually [1]. HCC is a heterogeneous disease, and the underlying risk factors were liver cirrhosis, alcoholism, chronic hepatitis B virus, and chronic hepatitis C virus. Despite progress in therapy, 5-year survival rates of HCC remain unfavorable [2]. Therefore, it is urgent to deeply explore the mechanism of HCC based on a more comprehensive approach to promote the exploration of targeted therapy. DNA methylation is a major event of epigenetic modifications which are heritable and stable. Recently, DNA methylation has been extensively investigated in cancer [3]. Promoter hypermethylation induces the silencing or downregulation of genes especially for tumor suppressor genes, whereas global DNA hypomethylation leads to genomic instability [4]. Emerging evidences have suggested that altered DNA methylation profiles were involved in early stage of HCC [5,6]. Holmila demonstrated that aberrant VIM and FBLN1 methylation levels in cell-free DNA were potential plasma-based biomarkers for HCC [7]. Nevertheless, DNA methylation alterations in HCC have not been elucidated systemically.
Previous evidences have clearly underscored the crucial role of DNA methylation in the regulation of gene expression in the process of normal development and diseases such as HCC [8]. Despite recent advances that permit genomewide screening in the DNA methylation or mRNA expression level, the mechanisms of DNA methylation involved in the regulation of gene expression remain unclear in HCC. The TCGA database offers large-scale and multigenomic data of over 30 human compressive insight into methylation characters of HCC, we performed differential analysis in DNAmethylated and gene expression level based on the data from TCGA, and then, a integration analysis of DNA methylation and gene expression changes to uncover the regulatory role of DNA methylation [9]. Finally, several crucial genes were selected to validate DNA methylation and gene expression pattern in GEO database and assessed their potential diagnosis value and the correlation with overall HCC survival. Our results may provide some clues for the regulatory role of DNA methylation in HCC.

Data collection
The DNA methylation array data (Illumina Infinium Human Methylation 450 BeadChip), mRNA expression data (IlluminaGA_RNASeqV2.1.0.0), and the corresponding clinical information of HCC were downloaded from TCGA dataset firehose browse (http://firebrowse.org/). According to the corresponding clinical information, the patients with a history of other malignancy were excluded. Consequently, 361 tumor tissues and 41 corresponding adjacent normal tissues from HCC patients were included in this study. According to 361 personal clinical information and methylation data, integrative clinical-methylation-expression analysis was performed using R package. Clinical information includes grade, fibrosis, gender, and stage.

Identification of differentially methylated CpG sites and differentially methylated regions
We removed the CpG sites for which beta value was not available in more than 80% samples, and obtained 395 633 CpG sites. COHCAP package in R [10] was used to analyze the differentially methylated CpG sites (DMCs) and differentially methylated regions (DMRs) between HCC tumor and adjacent normal tissues. Using COHCAP, methylated state and unmethylated state were set, calculating DMRs based on CpG island (CGI) level. Manhattan plot was constructed to explore the distribution of CpG sites according to false discovery rate (FDR) by qqmanpackage in R [11]. The top 200 DMCs were used to perform hierarchical clustering analysis by R package pheatmap.

Identification of differentially expressed genes
Differentially expressed genes (DEGs) between tumor and normal groups were calculated by R package DEseq2 [12]. FDR was calculated from multiple testing corrections of raw P-value via the Benjamini and Hochberg method [13]. Finally, |log2(fold change) | > 2.5 and FDR < 0.01 were set as the threshold for DEGs.

Construction of weighted comethylation network
After filtering CpG sites that were missing in 80% of samples or the bottom 99% variable CpG sites, we selected 7913 sites to perform WCCNA by WGCNA package in R language [14]. The soft threshold was set as 10. The correlation analysis between eigen vectors and tumor-normal state was performed to obtain modules associated with tumor-normal status The threshold was set as| coefficient of correlation |> 0.2 and P-value <0.01. Additionally, CpG sites that were mostly correlated with eigen vectors in modules were selected as hubs. Furthermore, DMC enrichment analysis of modules in comethylation network was performed by Fisher's exact test in R.

Integrated analysis of DNA methylation and gene expression
To explore the relationship between DNA methylation and gene expression, DMR-DEG pairs were gained in shell. Then, the correlation analysis between DMCs and DEGs was performed in R language and those with |coefficient of correlation | > 0.2 and P-value <0.05 were selected as significant ones.

Genomic features and functional enrichment analysis
Genomic features included CGI context and gene context. To explore the genomic features of DMC and CpG sites in modules of comethylation network, Fisher's exact test was performed, and those with P-value < 0.01 were considered as significant ones. To discover the function of differentially methylated DEGs and corresponding genes of CpG sites in modules, enrichment analysis was performed by GenoCodis (http://genecodis.cnb.csic.es/analysis) online [15], including Gene Ontology (GO) [16] and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway [17].

Receiver operating characteristic curve of candidate genes
To assess the diagnostic value of candidate genes, receiver operating characteristic (ROC) curve was constructed by pROC package in R. Furthermore, area under the curve (AUC) was calculated to assess the performance of each gene.

Kaplan-Meier analysis of candidate genes
To further explore the relationship of DNA methylation or mRNA expression level of candidate genes with overall survival time of HCC, the Kaplan-Meier curve analysis was performed. A cohort of 378 HCC patients was downloaded from cBioPortal for Kaplan-Meier analysis (http://www.cb ioportal.org/). P value <0.05 was set as the cutoff of significance; Kaplan-Meier analyses were performed by survival package in R.

Validation of DNA methylation and expression level of candidate genes
The DNA methylation and expression levels of differentially methylated DEGs were validated through GSE89852 and GSE45436 generated from GEO (https://www.ncbi. nlm.nih.gov/) database, respectively, and were analyzed by t-test.

The characteristics of DNA methylation pattern in HCC
To uncover the methylation profile of HCC, we obtained 227 189 DMCs with FDR <0.0001. As is shown in Manhattan plot ( Fig. 1), those CpG sites distributed in all chromosomes. Using the threshold of delta beta >0.2 and FDR <0.0001, we obtained 6510 DMCs totally, including 3903 hypermethylated and 2607 hypomethylated CpG sites (Table S1). Unsupervised hierarchic clustering analysis was performed using the top 200 DMCs based on FDR rank, which showed that the tumor group was in hypermethylated state compared to the normal group (Fig. 2). In table 1, the methylation-clinical information-gene expression association result was obtained by setting | correlation|> 0.2 and P < 0.05 and integrating with the previous methylation-gene expression. There were three clinical features including grade, fibrosis, gender which have correlation with the methylation profile.
To gain more robust results, the threshold was set as followings: delta beta>0.2, FDR<0.05, number of DMCs in each CGI ≥ 2. Consequently, 377 significant DMRs on CGI level were obtained, comprising of 359 hypermethylated DMRs and 18 hypomethylated DMRs, which also suggested that the tumor group was more hypermethylated (Table S2).

Construction of weighted CpG site comethylated network
To explore features of comethylated modules of HCC, WCCNA was performed. Consequently, one module named turquoise including 6723 CpG sites was obtained (Fig. 3). This module was significantly associated with tumor-normal status (coefficient value = À0.4, P-value = 2.8e-17). Enrichment analysis indicated that CpG sites in turquoise module were significantly enriched in DMCs (odds ratio=3.02, 95% confidence interval = 2.68-3.40, P-value = 8.17e-58). Unsupervised hierarchic clustering analysis showed that the top 200 CpG sites detected in turquoise module apparently clustered all samples into two groups, and the tumor group was in the hypomethylated cluster ( Fig. 4). At the methylation-methylation-clinical information-gene expression association analysis, we get no result of methylation-clinical information.  Furthermore, genomic features enrichment analysis revealed that CpG sites in turquoise module significantly enriched in gene body, 3 0 UTR, TSS1500, TSS200 on gene context level, and island, shore, shelf on CGI level. The most significant enrichment was in gene body (odds ratio = 6.98, 95% confidence interval = 6.64-7.34, P-value = 0) and island (odds ratio = 0.52, 95% confidence interval = 0.49-0.55, P-value = 2.48e-109), respectively. Additionally, KEGG pathway enrichment analysis revealed that CpG sites in turquoise module were significantly enriched in neuroactive ligand-receptor interaction, calcium signaling pathway, olfactory transduction, pathways in cancer, basal cell carcinoma, and focal adhesion. Inturquoise module, 10 CpG sites were identified as hubs, which were significantly associated with the module.

Integrative analysis of differentially DNAmethylated data and DEGs
To explore the relationship between DNA methylation and mRNA expression, we combined the two-omics data for further analysis. In brief, 1194 significant DEGs were obtained, consisting of 1017 upregulated genes and 177 downregulated genes (Table S3).

Validation of several crucial genes in other HCC cohorts from GEO database
Based on the above bioinformatics analysis, eight genes (CTHRC1, ZIC4, SIX2, VASH2, IL17D, TLX1, OTX1, and LART), for which DNA-methylated status was correlated with their gene expression, were selected for further analysis. To confirm the change in these genes in DNA methylation and gene expression level among HCC patients in other cohorts, GSE89852 (a dataset presenting DNA methylation data of 74 samples, HCC = 37, normal = 37) and GSE45436 (a dataset presenting gene expression data of 134 samples, HCC = 93, normal = 41) were used, respectively. The DNA methylation and expression level of the selected eight genes were also significantly changed in the GSE89852 and GSE45436 datasets, which was highly consistent with our integrated analysis (Fig. 5A,B).

Assessment of the potential use of several crucial genes as diagnostic markers of HCC
To evaluate their power as diagnostic biomarkers for HCC, we calculated their sensitivity and specificity using ROC curve analysis. For DNA-methylated patterns of CTHRC1, ZIC4, SIX2, VASH2, IL17D,

Overall survival analysis by Kaplan-Meier curves in HCC patients
The correlation between alter of CTHRC1, ZIC4, SIX2, VASH2, IL17D, TLX1, OTX1, LART in DNA methylated or gene expression level and HCC survival was analyzed through Kaplan-Meier curves. Among them, DNA-hypomethylated status of CTHRC1 (P = 0.0356) and IL17D (P = 0.0324) and DNAhypermethylated status of VASH2 (P = 6.28E-5) were significantly associated with poor survival of HCC. Additionally, the upregulation of CTHRC1 was significantly associated with poor survival of HCC (Fig. 7).

Discussion
In our study, we characterized DNA methylation profile of HCC by combining the module in comethylated network. Our results showed that HCC tumor group was hypermethylated compared with matched adjacent tissue. CGI is normally unmethylated [18], but our results showed that DMRs were all hypermethylated based on CGI level, indicating that the hypermethylated CGI may affect the expression of corresponding genes. Interestingly, in the DMR-DEG pairs, all the DEGs were hypermethylated and Fig. 6. The discriminatory ability of the candidate genes between HCC tissues and adjacent nontumor tissues with ROC curve. upregulated. Moreover, 69.7% (92 of 132) DMC-DEG pairs were significantly positively correlated. Moreover, genomic features enrichment analysis indicated that DMCs were significantly enriched in gene body and differentially methylated CGIs in DMR-DEG pairs were all located in gene body. It is well known that CGI hypermethylation in promoter of tumor suppressor genes can drive downregulation of the corresponding genes. This is a classic methylation regulatory pattern in cancer [19]. Ju Dong Yang [20] found potential novel oncogenes (PSRC1, MRE11A, MYO1E), tumor suppressor genes (CFH, MYRIP), implicated in hepatocarcino-genesis that may be regulated by CpG site methylation, and affect prognosis after resection for HCC. David A. Wheeler [21] found significantly mutated genes, including LZTR1, EEF1A1, SF3B1, and SMARCA4, a p53 target gene expression signature correlating with poor survival. Potential therapeutic targets for which inhibitors exist include WNT signaling, MDM4, MET, VEGFA, MCL1, IDH1, TERT, and immune checkpoint proteins CTLA-4, PD-1, and PD-L1. Aberrant DNA methylation in gene body has not been elucidated clearly. In bladder and colon cancer, hypermethylated CpG-rich regions along with altered expression were detected in gene body of PAX6 [22]. Expressed genes were hypermethylated in active X chromosome regions compared with the corresponding silenced genes [23]. Yang et al. [24] demonstrated that the hypermethylation status in gene body promoted corresponding gene expression.So hypermethylated CGI in gene body along with upregulated gene expression may provide another DNA methylation regulatory pattern in HCC.
In WCCN, CpG sites in turquoise module were negatively correlated with tumor-normal status, and the unsupervised hierarchical clustering analysis of above CpG sites suggested a hypermethylation status of normal group. This could provide us a methylation characteristic of HCC in a systematical view. The CpG sites enriched in gene body, CGI, and DMCs suggested that there were also some common traits between differential methylation and WCCN analysis.
Moreover, a series of potentially epigenetic regulated genes were identified. CTHRC1 (collagen triple helix repeat containing-1), an extracellular matrixrelated secreted glycoprotein, was ubiquitously upregulated in many cancer types. For instance, it has been reported that CTHRC1 was upregulated by promoter demethylation in gastric cancer [25]. In our present study, CTHRC1 was hypermethylated at CGI level along with downregulated expression. The methylation level of CTHRC1 acted as a protective factor in HCC, whereas a contrary trend was observed for the expression of CTHRC1 in our analysis. CTHRC1 played an important role in attenuating fibrosis in hepatic fibrosis [26]. Tameda et al. showed that knocking down of CTHRC1 can suppress cell proliferation and motility in HCC cells. Moreover, the protein of CTHRC1 was significantly overexpressed in poorly differentiated HCC [27]. In addition, CTHRC1 activated Wnt/PCP signaling pathway, which promoted cervical cell migration, invasion, and epithelial-to-mesenchymal transition (EMT) in renal cell carcinoma and glioblastoma cells [28][29][30]. Herein, we give additional evidence that upregulation of CTHRC1 may be regulated by aberrant methylation in HCC.
As one member of the zinc finger of the cerebellum family, Zic family member 4 (ZIC4) was hypermethylated and was involved in lymph node in oral cancer [31]. ZIC4 was also associated with progression to muscle-invasive disease in bladder cancer [32]. In addition, orthodenticle homeobox 1 (OTX1) was upregulated and associated with high stage of colorectal cancer [33]. Silencing of OTX1 suppressed the proliferation and migration by promoting cell cycle arrested in HCC [34]. Herein, ZIC4 and OTX1 were all hypermethylated on CGI level and upregulated, implying that DNA methylation alteration may play important roles in HCC.

Supporting information
Additional Supporting Information may be found online in the supporting information section at the end of the article: Table S1. List of 6510 differentially DNA-methylated CpG sites (DMCs). Table S2. List 377 differentially methylated regions (DMRs). Table S3. List of 1194 differentially expressed genes (DEGs).