Discovery of Pancreatic Adenocarcinoma Biomarkers by Untargeted Metabolomics.

Pancreatic ductal adenocarcinoma (PDAC) is one of the most aggressive and lethal cancers, with a 5-year survival rate of less than 5%. In fact, complete surgical resection remains the only curative treatment. However, fewer than 20% of patients are candidates for surgery at the time of presentation. Hence, there is a critical need to identify diagnostic biomarkers with potential clinical utility in this pathology. In this context, metabolomics could be a powerful tool to search for new robust biomarkers. Comparative metabolomic profiling was performed in serum samples from 59 unresectable PDAC patients and 60 healthy controls. Samples were analyzed by using an untargeted metabolomics workflow based on liquid chromatography, coupled to high-resolution mass spectrometry in positive and negative electrospray ionization modes. Univariate and multivariate analysis allowed the identification of potential candidates that were significantly altered in PDAC patients. A panel of nine candidates yielded excellent diagnostic capacities. Pathway analysis revealed four altered pathways in our patients. This study shows the potential of liquid chromatography coupled to high-resolution mass spectrometry as a diagnostic tool for PDAC. Furthermore, it identified novel robust biomarkers with excellent diagnostic capacities.


Introduction
Pancreatic ductal adenocarcinoma (PDAC) represents one of the deadliest neoplasms, with disturbingly close incidence and mortality rates. In 2017, there were 448,000 incident cases of pancreatic cancer (PC) globally, and the number of deaths had increased 2-to 3-fold, from 196,000 in 1990 to 441,000 [1]. Despite improvements in diagnosis and treatment and our growing understanding of the complex molecular events underlying PDAC, the overall 5-year survival rate remains worryingly

LC-HRMS Analysis
RPLC, coupled to HRMS, was used in an untargeted metabolomics approach to detect candidate diagnostic markers for PDAC. Peaks obtained from RPLC-ESI+ and RPLC-ESI-experiments are summarized in Table 1. After alignment and filtering procedures, a data matrix of 1150 and 996 metabolite features was obtained for positive and negative ion modes, respectively. Only monoisotopic peaks (365 and 345 for positive and negative ions, respectively) were selected for further analysis. As a result of the Student's t-test comparison between OS and study case samples, 220 and 113 features were removed as potential background and contaminants. At this point, the remaining features (145 and 152 for positive and negative ions, respectively) were assessed for reproducibility in QC samples. As a result, 61 and 18 features (in positive and negative ion mode, respectively) displayed coefficients of variation higher than 30% and were excluded. The remaining variables (84 and 134 for positive and negative ions, respectively) were evaluated in multivariate analysis. Instrumental repeatability was assessed using principal component analysis (PCA) score plots ( Figure 1). The close clustering of QC samples indicated that the differences observed between HC and PDAC patients were only attributable to biological factors. In parallel, partial least squares-discriminant analysis score plots ( Figure 2) illustrated a clear discrimination between the groups, with acceptable values of R 2 and Q 2 ( Table 1). According to the established criteria, metabolomic data should display R 2 ≥ 0.7 and Q 2 ≥ 0.4 and should not vary by more than 0.2-0.3. No over-fitting was observed, indicating that these Cancers 2020, 12, 1002 3 of 13 models successfully discriminated between HC and PDAC patients. The results of univariate statistical analysis (Student´s t-test or Wilcoxon rank-sum test) were applied to the global set of filtered variables (a total of 218 features, 84 from positive-and 134 from negative-ion analyses) yielding 154 variables with p < 0.05. Accordingly, 86 of these selected variables were tentatively identified by applying the criteria described in the following sections.  Marker View software provided a data matrix with the extracted peaks. This software allowed the application of different filter steps to finally detect the features responsible for the discrimination between the groups. R 2 and Q 2 parameters from the partial least squares-discriminant analysis were used to assess model quality. RSD: relative standard deviation; PCA: principal component analysis.   . Partial least squares-discriminant analysis score plot shows a clear separation between the study groups (pancreatic ductal adenocarcinoma (PDAC) in green and healthy controls (HC) in red) in ESI+ (a) and ESI-(b) modes, suggesting that it is possible to discriminate between PDAC patients and HC.

Pathway Analysis
Pathway analysis of the 86 tentatively identified variables revealed 17 deregulated metabolic pathways, and four of these pathways were statistically significant (p < 0.05) and might be potentially associated with unresectable PDAC (Table 2).  Figure 2. Partial least squares-discriminant analysis score plot shows a clear separation between the study groups (pancreatic ductal adenocarcinoma (PDAC) in green and healthy controls (HC) in red) in ESI+ (a) and ESI-(b) modes, suggesting that it is possible to discriminate between PDAC patients and HC. Marker View software provided a data matrix with the extracted peaks. This software allowed the application of different filter steps to finally detect the features responsible for the discrimination between the groups. R 2 and Q 2 parameters from the partial least squares-discriminant analysis were used to assess model quality. RSD: relative standard deviation; PCA: principal component analysis.

Pathway Analysis
Pathway analysis of the 86 tentatively identified variables revealed 17 deregulated metabolic pathways, and four of these pathways were statistically significant (p < 0.05) and might be potentially associated with unresectable PDAC ( Table 2).

Biomarker Evaluation
The clinical value of all tentatively identified candidates (86 features showing p < 0.05 in univariate analysis) was evaluated by univariate receiver-operating characteristic curves (ROC) analysis (Table  S1). As a result, 19 features displayed an area under the ROC curve (AUC) > 0.8. The individual markers were combined in a multivariate model to develop a more reliable algorithm ( Figure S1). In order to avoid overfitting, the 19 features were cumulatively combined, starting with those with the highest individual AUC values, until a steady AUC value was achieved in the multivariate model. Accordingly, we propose a multivariate model based on nine candidate markers ( Table 3) that discriminates unresectable PDAC patients from HC with an AUC value of 0.992 (95% CI of 0.972-1.000). The corresponding confusion matrix shows that all HC samples and 55 PDAC samples were correctly classified, while four PDAC samples were misclassified ( Figure 3).

Discussion
There is an urgent need to improve diagnostic tools for PDAC, whose incidence is expected to increase worldwide over the next few years [17]. Using an untargeted metabolomics approach based on LC-HRMS and multivariate analysis, we identified a specific molecular signature for PDAC patients. Different omics technologies have been used to find robust and reliable biomarkers for different aspects of PDAC [18,19]. In the present study, we established a nine-biomarker panel with excellent diagnostic capacities for PDAC.
The conversion of cells from a normal to a cancerous state is accompanied by the reprogramming of metabolic pathways, an undisputed hallmark of cancer [20]. We detected four altered biological pathways in these patients: linoleic acid metabolism, glycerolipid metabolism, glycerophospholipid metabolism, and primary bile acid biosynthesis.
The symptoms of PDAC are highly varied and non-specific, usually appearing at more advanced stages of the disease [21], and there has been little metabolomic research on biomarkers for early PDAC. In one of the largest investigations to date, Fest et al. (2019), conducted a prospective study of pre-diagnostic samples from five biobanks; however, they concluded that analyses on a much larger

Discussion
There is an urgent need to improve diagnostic tools for PDAC, whose incidence is expected to increase worldwide over the next few years [17]. Using an untargeted metabolomics approach based on LC-HRMS and multivariate analysis, we identified a specific molecular signature for PDAC patients. Different omics technologies have been used to find robust and reliable biomarkers for different aspects of PDAC [18,19]. In the present study, we established a nine-biomarker panel with excellent diagnostic capacities for PDAC.
The conversion of cells from a normal to a cancerous state is accompanied by the reprogramming of metabolic pathways, an undisputed hallmark of cancer [20]. We detected four altered biological pathways in these patients: linoleic acid metabolism, glycerolipid metabolism, glycerophospholipid metabolism, and primary bile acid biosynthesis.
The symptoms of PDAC are highly varied and non-specific, usually appearing at more advanced stages of the disease [21], and there has been little metabolomic research on biomarkers for early PDAC. In one of the largest investigations to date, Fest et al. (2019), conducted a prospective study of pre-diagnostic samples from five biobanks; however, they concluded that analyses on a much larger scale were needed to obtain metabolomic biomarkers with an adequate predictive value to be clinically useful. Nevertheless, they proposed a panel of the metabolites showing the highest capacity to diagnose early PC, including docosahexaenoic acid. Interestingly, docosapentaenoic acid was altered in our cohort of patients with PDAC [22].
We highlight our finding that the levels of several phospholipids were altered in PDAC versus HC samples, most of which were downregulated in PDAC patients. A possible explanation for these results is the uncontrolled cell proliferation, which leads to phospholipid degradation in order to generate energy for expansion and dissemination [23]. This mechanism was previously proposed by Kühn et al. (2016) in a metabolomic study that demonstrated a correlation between elevated lysoPC (18:0) values and a lesser risk of breast, prostate, and colorectal cancer [24]. Concentrations of this metabolite were also higher in healthy controls than in our patients with PDAC. Similar results were observed in two previous studies of patients with PC. Thus, Fahrmann et al. used one patient cohort to search for biomarkers and another to construct the model, which was then validated in patients with early-stage PDAC. They proposed a panel of five metabolites to detect early-stage PDAC, including lysoPC (18:0) alongsideCA19.9, TIMP1, and LRG1, achieving an AUC of 0.924 [25]. Likewise, in the metabolomic study by Ritchie et al. of two ethnically and geographically diverse populations (USA Caucasian and Japanese), three of the metabolites found to be altered in patients with PC also differed between the patients with PDAC and HC in our study: lysoPC (18:0), lysoPC (18:2), and lysoPC (20:5) [26]. Another recent study of patients with PC also observed significantly lower values of lysoPC (18:2), alongsidePC-594 and two other serum-based choline metabolites, and this metabolite demonstrated 76.8% accuracy, 58.6% sensitivity, and 92.0% specificity for the diagnosis of PC [27]. Our results are also consistent with data recently published by Tao et al., who used untargeted lipidomic analysis in serum and exosomes and proposed a panel of 37 lipids differentially expressed by patients with PC versus HC [6]. The panel included three metabolites also proposed in our study as PDAC biomarkers: lysoPC (14:0), lysoPC (18:3), and lysoPE (20:5). In a previous study of populations in the USA and China, lysoPC (14:0) had also been included as a diagnostic biomarker in a panel of 30 metabolites that differed between patients with PC and HC [28]. Finally, we also observed decreased levels of the diacylglycerol CDP-DG (i-24:0/i-17:0) in PDAC patients. CDP-diacylglycerol is an important branchpoint intermediate in eukaryotic phospholipid biosynthesis and may be a key regulatory molecule in phospholipid metabolism; therefore, decreased levels of this molecule may explain the aforementioned reduction in glycerophospholipid levels [29].
Tumor cells upregulate the production of biosynthetic intermediates to build new cells, while increasing ATP levels and energy production. They use glucose, amino acids, and fatty acids as fuel sources, increasing the uptake of these metabolic substrates to meet increased demands for biosynthesis and energy production [10,30]. This may explain the decreased levels of phenylalanine in the serum of PDAC patients, as reported in various studies [31,32].
Highly proliferative cancer cells demonstrate a strong lipid avidity that is satisfied by increasing the uptake of exogenous lipids and lipoproteins [33], and we also detected alterations in the biosynthesis of fatty acids. Specifically, we detected reduced levels of TG (22:2), 3-oxooctadecanoic acid, and Oleoyl-L-carnitine, and increased levels of adrenic acid. Long-chain acyl-CoA synthetases are responsible for the activation of fatty acids and are commonly deregulated in cancer, which may explain the altered levels observed in these metabolites. We highlight that decreased Oleoyl-L-carnitine levels can cause cancer cachexia, a poor prognostic factor for PDAC [34,35].
Furthermore, significantly higher levels of the two bile acids, glycocholic acid and glycochenodeoxycholic acid 7-sulfate, were observed in PDAC patients, which may be attributable to tumor expansion into the bile duct. It has also been proposed that bile acid reflux into the pancreas can lead to malignant cell transformation by directly activating cancer signaling pathways [36,37]. Notably, the aforementioned study by Xie et al. (2015) in the USA and China identified glycocholic acid as the metabolite with the highest fold change value (3.65) in comparisons between 100 patients with PDAC and 100 healthy controlsin each population [28].
Two of the tentatively identified candidates were dehydroepiandrosterone sulfate (DHAS) and androsterone sulfate, which were decreased in PDAC patients. It has been shown that DHAS inhibits the growth of PC cells because it is a potent inhibitor of glucose-6 phosphate dehydrogenase, the rate-limiting enzyme of the hexose monophosphate shunt, a biochemical pathway that provides a substrate for DNA synthesis in neoplastic tissue [38,39].
One of the most discriminating markers was 4-oxo retinoic acid, which plays important roles in cell development and differentiation and in cancer treatment. Specifically, low and high doses of retinoic acid may cause cell cycle arrest and cancer cell apoptosis, respectively [40]. The increased levels detected in PDAC patients may therefore be a defense mechanism against tumor progression.
Besides their application for cancer diagnosis, metabolomic techniques have also proven useful to predict the response of patients to chemoradiation therapy. In one study in advanced rectal cancer patients, comparison of the lipodomic profile between responders and non-responders yielded a panel of five metabolites that predicted the response with an AUC of 0.95 and included lysoPC (16:0) [41]. Besides being identified in our study, lysoPC (16:0) was also proposed in a panel alongside lysoPC (20:4) for the predictive diagnosis, targeted prevention, and personalized treatment of pancreatic and biliary tract cancer, showing a sensitivity of 90.20% and specificity of 92.31% to discriminate between the two diseases [42].
Finally, 13-HODE levels were lower in PDAC patients than in HCs. This metabolite is regulated by the 15-Lipoxygenase-1 enzyme, which metabolizes linoleic acid to 13-HODE and loses its expression during tumor development in the human pancreas. According to these findings, loss of this enzyme may play an important role in pancreatic carcinogenesis, possibly as a tumor suppressor gene, consequently reducing the production of 13-HODE [43].

Sample Collection
The study included 119 samples: 60 from HC and 59 from unresectable PDAC patients. The baseline characteristics of all participants are detailed in Table 4. PDAC diagnosis was based on clinical evaluation and imaging studies and was histologically confirmed by surgery or imaging-guided biopsy. Blood samples from patients were obtained at Virgen de las Nieves Hospital in Granada (Spain) and the variation in different parameters regarding patients and sampling (fasting state, time of day of sampling, etc.) was minimized to obtain the maximum reproducibility in our study. In this way, fasting blood samples were drawn from the patients in hospital between 8 am and 9 am, before the initiation of any treatment. Written informed consent was obtained from all PDAC patients and HCs before their enrollment in the study, which was approved by the ethics committee of the hospital (Comité Ético de Investigación Provincial de Jaen; code: 1269-M1-19) and followed the principles of the Declaration of Helsinki.  Samples were collected in BD vacutainer SSTII advance tubes (Becton Dickinson, Franklin Lakes, NJ) with silica to activate clotting of the specimen. They were centrifuged for 10 min at 2450 rpm, and the supernatant was then aspirated and stored at −80 • C until analysis. Serum samples from HC were supplied by the Biobank of the Andalusian Public Health System and were obtained and treated in the same manner as the samples from patients.

Metabolite Extraction
All serum samples (75 µL) were kept at 4 • C throughout the analytical process. Proteins were removed using acetonitrile (AcN) (1:8 sample/AcN) and shaken for 2 min. Samples were then centrifuged at 15,200 rpm for 10 min at 4 • C. Supernatants were collected and transferred to HPLC analytical vials and evaporated in a GeneVac HT-8 evaporator (Savant, Holbrook, NY). Dry residues were reconstituted in 210 µL of AcN/water (50:50) with 0.1% formic acid and 250 ppb of internal standards. Finally, samples were shaken for 1 min and stored at −80 • C until their analysis.

LC-HRMS Analysis
Samples were analyzed using an Agilent 1290 LC system (Agilent Technologies, Santa Clara, CA, USA) coupled to a quadrupole-time-of-flight 5600 mass spectrometer (AB SCIEX Q-TOF 5600, Concord, ON, Canada) in ESI+ and ESI-modes.
The Q-TOF 5600 was operated using a TOF method, providing mass selection (80-1600 Da) with a high resolution in combination with information-dependent acquisition (IDA) and allowing fragmentation of the eight most intense ions to simultaneously collect full-scan HRMS and MS/MS information. Ion source parameters and IDA conditions were as follows: gas source 1: 50.00; gas source 2:50.00; curtain gas: 45.00; temperature: 500.00 • C; ionspray voltage floating: 4500.00; TOF masses: Min = 80.0000 Da Max = 1600.0000 Da; accumulation time: 0.2500 sec, IDA accumulation time: 0.1000 sec.

Data Set Creation
PeakView software (version 1.0 with Formula Finder plug-in version 1.0, AB SCIEX, Concord, ON) was used to evaluate the retention time (RT) and mass/charge (m/z) variability of the experiment. Then, MarkerView software (version 1.2.1, AB SCIEX, Concord, ON) was used to process raw LC-HRMS data, performing peak detection, alignment, and data filtering, and yielding a data matrix in which the measured m/z ratio, RT, and ion peak area were defined for each sample.
Data mining was performed by the program algorithm in the RT range of 1-17.20 min for the ESI+ experiment, and in the RT range of 1-13.00 min for the ESI-experiment. The peak intensity cutoff was set to 150 cps, and peak alignment was achieved using RT and m/z tolerances of 0.10 min and 5 ppm, respectively. A filter by "presence" was used to retain only masses that appeared in at least 12 samples in the study groups. Only monoisotopic peaks were considered in order to reduce mass redundancy and enhance selection of the true molecular feature. Next, we identified mass signals differentially expressed between the OS samples and case study samples (HC and PDAC), applying an additional filtering procedure with a t-test (p > 0.05) to remove background and contaminants, keeping the true biological mass signals from LC-HRMS data. Finally, we removed variables from the data matrix with unacceptable reproducibility (relative standard deviation > 30%) in QC samples. The following steps were carried out using Metaboanalyst 4.0 Web Server software.

Normalization and Analytical Validation
Probabilistic quotient normalization based on QC samples, log transformation, and Pareto scaling were used to convert the dataset to transform the data matrix into a more Gaussian-type distribution. Then, unsupervised PCA was carried out to evaluate the performance of the analytical system and detect possible outliers. In this regard, we evaluated the stability of the analytical system by plotting the QC samples on a PCA plot. The statistical quality of the model was assessed by goodness of fit (R 2 ) and goodness of prediction (Q 2 ). In parallel, the Hotelling T2 ellipses in a partial least squares-discriminant analysis score plot enabled the detection of outlier samples in both study groups. The elimination of outliers did not produce an increase in R 2 or Q 2 values.

Statistical Analysis
Filtered normalized variables in ion modes (ESI+ and ESI-) were combined in a single data matrix for global statistical analysis. The Shapiro-Wilk test was used to evaluate the normality of the data (p < 0.05). Next, the Student's t-test or Wilcoxon-rank-sum test was applied to evaluate differences between patients with PDAC and HC. The Benjamin-Hochberg false discovery rate (FDR) correction for multiple comparisons was then performed to minimize the expected proportion of false positives (Type I errors). An FDR-corrected p value of 0.05 in the t-test is generally used in metabolomics as a cutoff threshold. Next, multivariate analysis was conducted to identify the variables responsible for the discrimination between groups.

Biomarker Evaluation
The clinical value of candidates was evaluated using their AUC values, performing univariate and multivariate ROC analyses to assess the clinical value of the candidates as biomarkers, either individually or in combination. The AUC values expressed in this study are flipped (1-AUC) and are, therefore, always presented as >0.5, independently of the case-control ratios.

Biomarker Identification
PeakView software (version 1.0 with Formula Finder plug-in version 1.0, AB SCIEX, Concord, ON) was used to estimate the elemental formula of selected marker compounds from accurate mass, isotopic clustering, and fragmentation patterns. Next, accurate mass queries were performed in compound databases (Metlin, Human Metabolome Database, Lipid Maps, PubChem, ChemSpider), and fragmentation patterns were searched in spectral databases (MassBank, NIST2014) for the tentative structural identification of the molecular formula. According to these criteria, all the metabolite identifications reported in our study should be considered tentative.

Pathway Analysis
Metaboanalyst 4.0 Web Server software was used to identify altered metabolic pathways. The metabolite ID matching was performed with the Human Metabolome Database. The analysis was adjusted by a hypergeometric test, and the impact on pathway topology was based on relative-betweenness centrality.

Conclusions
Our tentative identification of a molecular signature for PDAC patients highlights the potential of LC-HRMS untargeted metabolomics for the discovery of biomarkers. This study offers new insights into the molecular mechanism and signaling pathways of PDAC and proposes novel metabolic biomarkers that might be clinically relevant in the future. The main study limitation is that the patients were in advanced stages of PDAC at the time of their diagnosis; therefore, further research is needed to verify the usefulness of these markers for the diagnosis of patients with early-stage disease.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-6694/12/4/1002/s1, Table S1: Metabolites statistically significant with a tentative identification. Figure S1: ROC curve for the 19-biomarker panel; 100 cross-validations were performed, and the results were averaged to generate the plot (a). Average of predicted class probabilities for each sample in the 100 cross-validations. Given that the algorithm uses a balanced subsampling approach, the classification boundary is located at the center (x = 0.5, dotted line). The confusion matrix shows that all HC samples and 52 PDAC samples were correctly classified, but 4 PDAC samples were incorrectly classified (b).