Nuclear Magnetic Resonance Metabolomics Approach for the Analysis of Major Legume Sprouts Coupled to Chemometrics

Legume sprouts are fresh, nutritious sources of phytochemicals that are attracting increasing attention worldwide owing to their many health benefits. Nuclear magnetic resonance (NMR) was used for the metabolite fingerprinting of four major legume sprouts of the family Fabaceae, with a view to quality control. Thirty-two metabolites belonging to different classes were identified, i.e., fatty acids, sugars, amino acids, nucleobases, organic acids, sterols, alkaloids, and isoflavonoids. Quantitative NMR was employed to assess the levels of the major identified metabolites, and multivariate data analysis was used to assess metabolome heterogeneity among sprout samples. Isoflavones were detected exclusively in Cicer sprouts, whereas Trigonella was characterized by 4-hydroxyisoleucine. Vicia sprouts were distinguished from the other legume sprouts by the presence of L-Dopa, and Lens sprouts by their abundance of acetate. Trigonelline was an alkaloid common to all sprouts, detected at 8–25 µg/mg, suggesting a potential role in legume seed germination; it was found at the highest levels in Trigonella sprouts. The aromatic NMR region data (δ 11.0–5.0 ppm) provided better classification power than the full range (δ 11.0–0.0 ppm), as sprout variations mostly originated from secondary metabolites, which can serve as chemotaxonomic markers.


Introduction
The plants of the family Fabaceae grow worldwide in different climatic regions, and Fabaceae is considered the third largest family of flowering plants, with about 700 genera and 20,000 species. Legume seeds are widely incorporated in the human diet, especially in developing countries, for their rich content of protein (providing about 33% of human dietary protein), nitrogen, starch, dietary fiber, and minerals [1][2][3]. Legumes are rich in phytochemicals, e.g., flavonoids, alkaloids, phenolic acids, and saponins, some of which have proven or proposed health-promoting and medicinal properties, offering protection against several chronic diseases, especially inflammation-based ones [4].
Nevertheless, the presence of antinutrients in legume seeds, e.g., tannins, phytic acid, trypsin inhibitors, and hemagglutinins, limits the nutritional value of many legumes [5].
the identified metabolites to be compared with the previously published LC-MS (liquid chromatography-mass spectrometry) and GC-MS (gas chromatography-mass spectrometry) classification models.
According to previous investigations, the antihyperlipidemic and hypoglycemic properties of Trigonella seeds are strongly related to their amino acid composition, especially 4-hydroxyisoleucine (10), which was detected exclusively in Trigonella sprouts (Figure 1A and Supplementary Figure S3) and is considered the precursor of sotolon (3-hydroxy-4,5-dimethyl-2(5H)-furanone), the potent aroma compound of fenugreek [35,36]. Another potential hypoglycemic agent, the alkaloid trigonelline (25), was identified from its characteristic signals at δ 9.23 (H-2) and 4.44 (N-CH3) (Figure 1B and Supplementary Figures S8, S9, and S12) [37]. Trigonelline was detected in all legumes studied, in accordance with LC-MS findings [28]. Trigonelline is widely distributed in dry legume seeds [38]; however, studies of its presence in germinated legumes are scarce and report only on fenugreek [39]. In plants, trigonelline acts as a reserve molecule that is converted into NAD (nicotinamide adenine dinucleotide) during the germination of, e.g., coffee seeds [40], suggesting a similar role during legume seed germination.
Several studies have shown that sprouted seeds contain higher amino acid levels than the corresponding ungerminated seeds, together with other beneficial constituents such as phenolic compounds, which may be reflected in improved antioxidant activity of the germinated seeds [41].
The 1H-NMR spectra showed low-intensity signals attributed to cytosine (20) at δ 5.70 (H-5) and 8.01 (H-6) in all sprouts (Figure 1B and Supplementary Figure S8). Nucleobases play an important part in regulating many physiological processes in the human body via purine and pyrimidine receptors. Additionally, some cytosine derivatives have been reported to possess diverse biological activities, such as antimicrobial and anticancer properties [42,43].
Regarding the annotation of the malonyl-glucoside forms of isoflavones, assignment was based on the key malonyl CH2 signal (δ 3.17 ppm), together with shifts of the H-6 and H-8 aromatic protons, observed at δ 6.50 and 6.72 in malonyl-genistin (28) and at δ 7.22 and 7.27 in malonyl-daidzin (31), relative to the corresponding glucosides at δ 6.52 and 6.71 in genistin (27) and δ 7.19 and 7.25 in daidzin (30) (Table 1 and Supplementary Figure S10). The down-field shifts were attributed to the de-shielding effect of the malonyl group esterified to a glucose hydroxyl, in agreement with reported data [49]. The unexpected up-field shift of malonyl-genistin H-6 (Supplementary Figure S10) is attributed to hydrogen bonding between the free carboxyl group of the malonyl moiety and the C-5 hydroxyl group of genistin, which slightly shields H-6. This assumption is supported by the observation that H-6 of malonyl-daidzin, which lacks a C-5 hydroxyl group, does not show this pattern (Supplementary Figure S10) [49]. Malonyl isoflavone glucosides were previously detected in chickpea seeds [50].
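The shift comparison above reduces to simple differences, Δδ = δ(malonyl form) − δ(glucoside), where a positive value is a down-field shift and a negative value an up-field shift. The Python sketch below uses only the δ values quoted in this paragraph; the dictionary layout and the function name `shift_changes` are illustrative choices, not part of the original analysis.

```python
# H-6 and H-8 chemical shifts (ppm) as quoted in the text:
# malonyl glucosides versus their parent glucosides.
shifts = {
    "genistin": {"malonyl": {"H-6": 6.50, "H-8": 6.72},
                 "glucoside": {"H-6": 6.52, "H-8": 6.71}},
    "daidzin": {"malonyl": {"H-6": 7.22, "H-8": 7.27},
                "glucoside": {"H-6": 7.19, "H-8": 7.25}},
}


def shift_changes(data):
    """Return {(compound, proton): (delta_delta_ppm, direction)}.

    delta_delta = delta(malonyl form) - delta(glucoside); positive means the
    proton moved down-field (de-shielded), negative means up-field (shielded).
    """
    result = {}
    for compound, forms in data.items():
        for proton, d_malonyl in forms["malonyl"].items():
            dd = round(d_malonyl - forms["glucoside"][proton], 2)
            if dd > 0:
                direction = "down-field"
            elif dd < 0:
                direction = "up-field"
            else:
                direction = "none"
            result[(compound, proton)] = (dd, direction)
    return result


for key, (dd, direction) in shift_changes(shifts).items():
    print(key, f"{dd:+.2f} ppm", direction)
```

With these values, genistin H-6 comes out at −0.02 ppm (up-field), while the other three protons shift down-field by 0.01 to 0.03 ppm, matching the pattern discussed above.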
Isoflavonoids are well recognized for a myriad of biological effects, e.g., antioxidant, estrogenic, antimicrobial, antiosteoporotic, and anticancer properties [54], some of which, such as strong phytoestrogenic effects, are rare in other flavonoid subclasses. Previous findings revealed that germination markedly increased isoflavone content compared with raw seeds, which is likely to enhance antioxidant or estrogenic effects. Germinated Cicer seeds may therefore be a promising isoflavonoid-rich functional food component [50].

Quantification of Major Metabolites Detected via 1H-NMR
1H-NMR was further used to determine the absolute amounts of the identified metabolites in legume sprout extracts for future standardization purposes. NMR has been applied efficiently to the quantification of metabolites in many medicinal plants and foods without requiring reference standards for each analyte [23,26]. For each identified metabolite, a single well-resolved 1H-NMR signal was selected, allowing its absolute quantification in sprout samples (Supplementary Table S1). The concentrations of the identified metabolites are expressed as µg/mg dry powder for the different legume sprout samples in Table 2.
Sugars were the major metabolites in all sprouts, with maximal levels in the Cicer extract (468.3 µg/mg total sugars) and sucrose as the major sugar. This high sugar content contributes to the palatable taste of Cicer sprouts. The identified sugars accounted for 38% to 47% of the dry powder, in accordance with previous reports for other sprouts [5,9,55].
Total choline and betaine were quantified in all specimens, reaching up to 119.1 µg/mg in Vicia, which may rationalize its use as a natural antidiabetic [31]. Similarly, total amino acid content was highest in Vicia samples (266.5 µg/mg), adding to the nutritional value of Vicia sprouts. In contrast, 4-hydroxyisoleucine (51.1 µg/mg) was detected exclusively in Trigonella sprouts, which may be linked to their potential antidiabetic effect.
Absolute quantification by NMR also showed that the highest levels of trigonelline were in Trigonella and Cicer sprouts, at ca. 25 and 18 µg/mg, respectively. Trigonella and Cicer sprouts were also the richest in ω-3 fatty acids, at 21.7 and 20.1 µg linolenic acid equivalents/mg dry matter, respectively (Table 2). All sprouts showed the desirable ω-6 to ω-3 ratio recommended by the WHO and FAO, in agreement with the ratio previously reported for Trigonella sprouts [7]. Vicia sprouts were rich in the antiparkinsonian agent L-Dopa, at 112 µg/mg, confirming previous reports on sprouts of Vicia faba varieties [32]. Cicer sprouts were a good source of isoflavonoids (~350 µg/mg), including the malonylated isoflavone glycosides malonyl-daidzin and malonyl-genistin at 80.2 and 78.9 µg/mg of dried sprout matter, respectively (Table 2).
To the best of our knowledge, this study provides the first comprehensive NMR-based metabolite fingerprinting and standardization of four sprouted legumes for future quality control purposes.

Multivariate Analyses of 1H-NMR Data
Multivariate analysis highlights an advantage of our comparative metabolomics approach in revealing sample relatedness. Principal component analysis (PCA) and orthogonal projection to latent structures-discriminant analysis (OPLS-DA) are widely used to analyze large, complex datasets, to define differences between groups of samples, and to interpret these differences in chemically meaningful terms.

Unsupervised Multivariate PCA of Full-Range 1H-NMR Data
PCA is a widely used unsupervised multivariate method in chemometrics. PCA was performed on the full 1H-NMR region (δ 11.0–0.0 ppm) for all sprouts (Figure 3), and the distance-to-model (DModX) test was used to check for outliers (Supplementary Figure S13). PC1 accounted for 58% of the total variance. The PC1/PC2 score plot (Figure 3A) revealed three distinct clusters for the four examined sprouts. Cicer specimens were located on the far-right side of the plot (positive PC1), whereas the remaining samples were positioned on the left side (negative PC1). Trigonella specimens were discriminated from Vicia and Lens along PC2 (23% of the variance). Replicates of each sprout clustered closely, indicating low technical variability of the extraction method. The metabolites responsible for specimen segregation in the score plot were revealed by the loading plot (Figure 3B), which displays the most discriminatory 1H-NMR signals. Three major groups stood out. The first comprised signals of isoflavonoids (δ 6.96) and sucrose (δ 3.61 and 3.71), which contributed positively to PC1 and were enriched in Cicer. The second comprised signals assigned to asparagine (δ 3.84) and 4-hydroxyisoleucine (δ 1.24), which contributed negatively to both PC1 and PC2 and were abundant in Trigonella. The third comprised sugar signals (δ 3.76 and 3.64), which contributed positively to PC2 and were abundant in all sprouts except Cicer, suggesting that these sugars may be glucose and/or fructose. Signals with smaller loadings included those of L-Dopa (δ 6.73 and 6.75), which was found exclusively in Vicia sprouts and contributed negatively to PC1 and positively to PC2 (Figure 3B).
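The computation behind a score plot, a loading plot, and the "% of total variance" figures can be sketched with NumPy's singular value decomposition. This is a minimal illustration assuming a mean-centred samples × buckets matrix; the function name `pca` and the random placeholder data are hypothetical, and the analysis reported here used custom-written R procedures after scaling to the HMDS signal.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA of a samples x buckets matrix via SVD of the mean-centred data.

    Returns scores (sample coordinates for the score plot), loadings
    (bucket weights for the loading plot) and the fraction of total
    variance explained by each retained component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                        # mean-centre every bucket
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)          # variance share per component
    return scores, loadings, explained[:n_components]


# Hypothetical usage: 12 spectra (4 sprouts x 3 replicates), 275 buckets of 0.04 ppm.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(12, 275))                # placeholder data, not measured spectra
scores, loadings, explained = pca(X_demo)
print(f"PC1 {explained[0]:.0%}, PC2 {explained[1]:.0%}")
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives the PC1/PC2 score plot, and the rows of `loadings` with the largest absolute values point to the most discriminatory buckets.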
To verify that sample discrimination among sprouts is driven mainly by these metabolites, i.e., sucrose, 4-hydroxyisoleucine, and asparagine, box plots of their NMR-derived levels were constructed (Supplementary Figure S14). In agreement with the PCA results, the highest level of sucrose was found in Cicer, whereas Trigonella was the sprout most enriched in 4-hydroxyisoleucine and asparagine. Absolute quantification data for all major compounds detected in all sprouts are provided in Table 2. The PCA results were further confirmed by a heatmap, which revealed a similar clustering pattern (Supplementary Figure S15A).

Unsupervised Multivariate PCA of the Aromatic 1H-NMR Region Data
To improve sample classification and marker identification, PCA was performed for all samples on the more distinctive aromatic 1H-NMR region (δ 11.0–5.0 ppm). This model (Supplementary Figure S16) showed better classification power than the full-range model, with PC1 explaining a higher proportion of the variance (61%). As with the full-range data, three distinct clusters were revealed in the PC1/PC2 score plot (Supplementary Figure S16A), with Cicer specimens again the most distant, located on the far-left side of the plot (negative PC1), whereas the other sprouts were positioned on the right side (positive PC1). Vicia samples were discriminated from Trigonella and Lens along PC2 (30% of the total variance). The observed separation can be explained by the corresponding loading plot (Supplementary Figure S16B): the high isoflavonoid content (δ 6.99 and 7.49) of Cicer specimens contributed negatively to PC1, whereas L-Dopa (δ 6.61, 6.72, and 6.75), abundant in Vicia samples, contributed positively to PC2. Tryptophan (δ 7.33) had a smaller, positive loading on PC1, which helped discriminate Lens and Trigonella sprouts. A heatmap (Supplementary Figure S15B) revealed a similar clustering pattern, in accordance with the PCA results. Moreover, box plots of isoflavonoids, L-Dopa, and tryptophan as the major discriminatory metabolites (Supplementary Figure S14) agreed with the PCA results.
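Restricting the model to the aromatic region amounts to keeping only the bucket columns between δ 11.0 and 5.0 ppm before running PCA. The sketch below assumes, for illustration, 0.04 ppm buckets spanning δ 11.0–0.0 ppm (275 columns, of which 150 fall in the aromatic window); the helper names are hypothetical and the matrix is random placeholder data, so the printed PC1 fractions carry no chemical meaning.

```python
import numpy as np


def bucket_centres(high=11.0, low=0.0, width=0.04):
    """Centres (descending ppm) of equal-width buckets between low and high."""
    n = int(round((high - low) / width))
    return np.round(high - width * (np.arange(n) + 0.5), 4)


def select_region(X, ppm, high=11.0, low=5.0):
    """Keep only the bucket columns whose centre lies within [low, high] ppm."""
    ppm = np.asarray(ppm, dtype=float)
    mask = (ppm >= low) & (ppm <= high)
    return np.asarray(X, dtype=float)[:, mask], ppm[mask]


def pc1_fraction(X):
    """Fraction of total variance captured by PC1 of the mean-centred data."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return s[0] ** 2 / np.sum(s ** 2)


# Hypothetical usage with placeholder data (not measured spectra)
ppm = bucket_centres()                                   # 275 buckets, 11.0-0.0 ppm
X_full = np.random.default_rng(0).normal(size=(12, ppm.size))
X_arom, ppm_arom = select_region(X_full, ppm)            # 150 buckets, 11.0-5.0 ppm
print(X_arom.shape, round(pc1_fraction(X_full), 2), round(pc1_fraction(X_arom), 2))
```

Dropping the aliphatic and sugar region removes the intense primary-metabolite signals, so the remaining variance is dominated by the aromatic (largely secondary-metabolite) buckets.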

Supervised Multivariate OPLS-DA of 1H-NMR Data
Despite the clear separation observed in both the full-range and aromatic 1H-NMR-based PCA, legume metabolite markers were further confirmed by constructing several supervised OPLS-DA models. OPLS-DA is better suited to marker identification because it highlights the variables most relevant to differentiating two sample groups. First, Cicer samples were modelled against the other sprout samples using both the full 1H-NMR region (δ 11.0–0.0 ppm) and the aromatic region (δ 11.0–5.0 ppm) (Figure 4 and Supplementary Figure S17, respectively). The derived score plots showed a clear separation of Cicer from the other samples, with explained variance R² = 0.95 (full range) and 0.97 (aromatic range) and predictive ability Q² = 0.94 (full range) and 0.97 (aromatic range) (Figure 4A and Supplementary Figure S17A). The corresponding S-plots (Figure 4B and Supplementary Figure S17B), in which the covariance p[1] is plotted against the correlation p(corr)[1] of the predictive component, revealed that Cicer was particularly rich in sucrose (δ 3.61 and 3.71) and isoflavonoids (δ 6.99, 7.49, 7.84, and 8.12–8.16).
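The quantities plotted in an S-plot can be illustrated with a minimal NumPy implementation of two-class OPLS-DA with one predictive and one orthogonal component, following the O-PLS algorithm of Trygg and Wold; p[1] is taken as the covariance and p(corr)[1] as the Pearson correlation between each bucket and the predictive score t[1]. This is a sketch under these assumptions, not the SIMCA-P implementation used in this study; it omits cross-validation and therefore yields no R² or Q² values, and `opls_da_splot` and the demo data are hypothetical.

```python
import numpy as np


def opls_da_splot(X, y, n_orth=1):
    """Minimal two-class OPLS-DA with one predictive component.

    X: samples x buckets (assumed already scaled, e.g. Pareto); y: 0/1 labels.
    Returns the predictive score t[1] and the S-plot axes p[1] (covariance)
    and p(corr)[1] (correlation) of each bucket with t[1].
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)
    y = np.asarray(y, dtype=float)
    y = y - y.mean()

    Xf = X.copy()                               # X after removing Y-orthogonal variation
    for _ in range(n_orth):
        w = Xf.T @ y / (y @ y)
        w /= np.linalg.norm(w)                  # predictive weight vector
        t = Xf @ w
        p = Xf.T @ t / (t @ t)
        w_o = p - (w @ p) * w                   # part of the loading orthogonal to w
        norm_o = np.linalg.norm(w_o)
        if norm_o < 1e-12:                      # no orthogonal variation left to remove
            break
        w_o /= norm_o
        t_o = Xf @ w_o                          # orthogonal score, uncorrelated with y
        p_o = Xf.T @ t_o / (t_o @ t_o)
        Xf = Xf - np.outer(t_o, p_o)

    w = Xf.T @ y / (y @ y)
    w /= np.linalg.norm(w)
    t_pred = Xf @ w                             # predictive score t[1]

    # S-plot axes, computed against the centred (scaled) input matrix
    tc = t_pred - t_pred.mean()
    p_cov = X.T @ tc / (X.shape[0] - 1)
    denom = np.sqrt((X ** 2).sum(axis=0) * (tc @ tc))
    denom[denom == 0] = 1.0                     # constant buckets get p(corr) = 0
    p_corr = (X.T @ tc) / denom
    return t_pred, p_cov, p_corr


# Hypothetical usage: 3 "Cicer" vs 9 "other" spectra, bucket 0 enriched in Cicer
rng = np.random.default_rng(1)
X_demo = rng.normal(scale=0.1, size=(12, 11))
y_demo = np.array([1] * 3 + [0] * 9)
X_demo[y_demo == 1, 0] += 2.0
t1, p_cov, p_corr = opls_da_splot(X_demo, y_demo)
print(np.argmax(p_corr))                        # bucket most correlated with class 1
```

Buckets in the upper-right corner of the S-plot (high p[1] and p(corr)[1]) are both intense and reliably associated with the modelled class, which is why they are read as marker signals.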

[Figure caption fragment: ellipses do not denote statistical significance but are added only for better visibility of the clusters discussed. For sample codes and for metabolite identification using 1D- and 2D-NMR, refer to Table 1.]
The PCA and OPLS-DA clustering of 1H-NMR data of legume sprouts confirmed the distinct primary and secondary metabolite profile of Cicer, which had previously appeared in UPLC-MS and GC-MS data analyses [28]. These results suggest Cicer sprouts as a good source of estrogenic isoflavones [52].
To confirm the metabolite markers of Trigonella, which appeared on the far-left side of the PCA score plot (Figure 3A), Trigonella sprouts were modelled against the other sprout samples using both full-region (δ 11.0–0.0 ppm) and aromatic-region (δ 11.0–5.0 ppm) 1H-NMR data (Figure 5 and Supplementary Figure S18, respectively). The derived score plots revealed a clear discrimination between Trigonella and the remaining sprouts (Figure 5A and Supplementary Figure S18A). The corresponding S-plots (Figure 5B and Supplementary Figure S18B) showed that 4-hydroxyisoleucine (δ 1.24), asparagine (δ 3.84), and trigonelline (δ 9.23 and 8.88) were abundant in Trigonella. The study thus confirmed that Trigonella sprouts exclusively contain 4-hydroxyisoleucine and are the richest in the alkaloid trigonelline; both compounds are suggested to mediate the potential antidiabetic and antihyperlipidemic actions of Trigonella sprouts [56,57]. This is in accordance with our previous UPLC-MS and GC-MS analyses [28] and with the 1H-NMR absolute quantification (Table 2).
Vicia and Lens were modelled against each other by OPLS-DA using full-region (δ 11.0–0.0 ppm) and aromatic-region (δ 11.0–5.0 ppm) 1H-NMR data, and the derived score plots (R² = 0.99, Q² = 0.99) showed a clear separation between the two sample groups (Figure 6A and Supplementary Figure S19A). The corresponding S-plots (Figure 6B and Supplementary Figure S19B) showed that Vicia was particularly rich in sugars (δ 3.44–3.76 and 4.00–4.04), which may, however, depend on growing conditions, and, more specifically, was characterized by the exclusive presence of L-Dopa (δ 3.73, 3.75, and 6.61), whereas Lens was richer in acetate (δ 1.92), in agreement with the 1H-NMR absolute quantification (Table 2). This analysis thus distinguished Vicia from Lens samples, identified the markers of each, and confirmed the exclusive presence of the antiparkinsonian agent L-Dopa in Vicia samples, as previously revealed by UPLC-MS and GC-MS analyses [28].

Sprouting Procedures
The sprouting process was performed following the procedure described in Lv et al. [51]. In brief, 100 g of the dried seeds were soaked in 3 volumes of distilled water in glass containers for 8 h at 28 °C, followed by sprouting in glass dishes lined with cotton in the dark. The seeds were moistened with distilled water every 3 h during the germination process and washed twice daily for 3 days to avoid microbial growth. The seedlings were pinched, lyophilized, and then kept at −20 °C until further analysis. Sprouting was carried out in 3 independent biological replicates.

Extraction Procedure and Sample Preparation for NMR Analysis
A one-pot extraction protocol developed by Farag et al. [23] was employed for legume sprout extraction. The lyophilized, deep-frozen legume sprouts were ground with a pestle in a mortar under liquid nitrogen. The powder (120 mg) was homogenized with 5 mL 100% methanol using a Turrax mixer (11,000 rpm) five times for 20 s, with 1 min intervals to prevent heating. Extracts were then vortexed vigorously and centrifuged at 3000× g for 30 min to remove sprout debris. An aliquot of 3 mL was evaporated to complete dryness under nitrogen. Dried extracts were resuspended in 800 µL 100% methanol-d4 containing HMDS (0.94 mM final concentration) and centrifuged (13,000× g for 1 min), and the supernatant was transferred to a 5 mm NMR tube. Three biological replicates of each specimen were analyzed under identical conditions.

NMR Analysis
All spectra were recorded on an Agilent VNMRS 600 NMR spectrometer equipped with a 5 mm inverse-detection cryoprobe, with the following parameters: frequency 599.83 MHz, digital resolution 0.367 Hz/point, pulse width 3 µs (45°), acquisition time 2.7 s, relaxation delay 23.7 s, 160 transients, zero filling to 128 K, and an exponential window function with lb = 0.4. 2D-NMR spectra were recorded using standard CHEMPACK 4.1 pulse sequences (gDQCOSY, gHSQCAD, gHMBCAD) implemented in the Varian VNMRJ 2.2C spectrometer software. The heteronuclear single quantum coherence (HSQC) experiment was optimized for 1J(C,H) = 146 Hz, with DEPT (distortionless enhancement by polarization transfer)-like editing and 13C decoupling. The heteronuclear multiple bond correlation (HMBC) experiment was optimized for a long-range coupling of 8 Hz, with a two-step 1J(C,H) filter (130–165 Hz). Samples were measured in randomized order.
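As a quick consistency check on the 1H acquisition parameters, the arithmetic below computes the time per transient, the approximate total acquisition time per spectrum, the width of a 0.04 ppm bucket in hertz, and the point spacing implied by the acquisition time alone (1/AT). It assumes no dummy scans or instrument overheads, which the text does not state.

```python
# 1H acquisition parameters as reported in the text
spectrometer_mhz = 599.83     # observe frequency: 1 ppm corresponds to 599.83 Hz
acquisition_time_s = 2.7
relaxation_delay_s = 23.7
transients = 160
bucket_ppm = 0.04             # bucket width used for the multivariate analysis

# Time per transient and total time, ignoring dummy scans and overheads (assumption)
repetition_time_s = acquisition_time_s + relaxation_delay_s
total_time_min = transients * repetition_time_s / 60

# Width of one 0.04 ppm bucket expressed in Hz at this field strength
bucket_hz = bucket_ppm * spectrometer_mhz

# Frequency-domain point spacing implied by the acquisition time (1/AT)
resolution_from_at_hz = 1 / acquisition_time_s

print(f"{repetition_time_s:.1f} s/scan, {total_time_min:.1f} min total, "
      f"bucket = {bucket_hz:.1f} Hz, 1/AT = {resolution_from_at_hz:.3f} Hz")
```

This gives 26.4 s per transient and about 70 min per spectrum; 1/AT ≈ 0.370 Hz is close to the reported digital resolution of 0.367 Hz/point, the small difference being consistent with the acquisition time being rounded to 2.7 s.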

NMR Quantification
For metabolite quantification using NMR spectroscopy, the peak areas of the internal standard (HMDS) and selected proton signals belonging to the target compounds were integrated manually for all samples. The following equation was applied for calculating metabolite concentrations (µg/mg dry matter):  Supplementary Table S1.
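As a hedged illustration, and not necessarily identical to the equation applied in this study, the generic internal-standard qNMR relation, in which the ratio of per-proton integrals of analyte and HMDS equals their molar ratio, can be written in Python as below. The dilution factors are assumptions derived from the extraction procedure (120 mg powder in 5 mL, a 3 mL aliquot, resuspension in 800 µL containing 0.94 mM HMDS); HMDS is assumed to give a single signal from 18 equivalent protons, and `qnmr_ug_per_mg` is a hypothetical name.

```python
def qnmr_ug_per_mg(area_analyte, n_h_analyte, mw_analyte, area_std,
                   n_h_std=18, std_conc_mM=0.94, nmr_volume_mL=0.8,
                   powder_mg=120.0, extract_mL=5.0, aliquot_mL=3.0):
    """Hypothetical helper: analyte content (µg per mg dry powder) from one
    well-resolved 1H signal referenced to an internal standard.

    Assumes HMDS contributes n_h_std equivalent protons and that the dilution
    factors follow the extraction procedure described above.
    """
    std_umol = std_conc_mM * nmr_volume_mL                        # mmol/L x mL = µmol HMDS in the tube
    ratio = (area_analyte / n_h_analyte) / (area_std / n_h_std)  # molar ratio analyte/HMDS
    analyte_ug = ratio * std_umol * mw_analyte                    # µmol x g/mol = µg
    powder_in_tube_mg = powder_mg * aliquot_mL / extract_mL       # 120 mg x 3/5 = 72 mg
    return analyte_ug / powder_in_tube_mg


# Hypothetical integrals (placeholders): trigonelline N-CH3 (3H, δ 4.44), M = 137.14 g/mol
print(round(qnmr_ug_per_mg(area_analyte=0.5, n_h_analyte=3,
                           mw_analyte=137.14, area_std=1.0), 2))
```

Because only integral ratios enter the calculation, no calibration curve for each analyte is needed, which is the property of qNMR exploited in this study.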

NMR Data Processing and Multivariate Data Analysis
The methodology followed the protocol of Farag et al. [23]. Briefly, the 1H-NMR spectra were automatically Fourier-transformed to .esp files using ACD/NMR Manager lab version 10.0 software (Toronto, ON, Canada). Spectral intensities were reduced to integrated regions (buckets) of equal width (0.04 ppm) within δ 11.4–0.4 ppm. PCA was performed in R (version 2.9.2) using custom-written procedures after scaling to the HMDS signal, as described elsewhere [58]. OPLS-DA was performed with SIMCA-P version 13.0 (Umetrics, Umeå, Sweden); all variables were mean-centered and Pareto-scaled. Regarding the validity of the NMR-based OPLS models, the Q² and R² values of all calculated models were greater than 0.4 and close to 1, and most models showed a validation regression line crossing zero, with a negative Q² intercept and R² close to 1, which supports model validity. In addition, p-values for each OPLS-DA model were calculated using CV-ANOVA (analysis of variance of cross-validated residuals) and were all below 0.005 (Supplementary Figures S20–S25).
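The bucketing and Pareto-scaling steps described above can be sketched in NumPy: intensities are summed into 0.04 ppm buckets between δ 11.4 and 0.4 ppm (a sum is proportional to the integral for evenly spaced points), optionally divided by a reference integral such as the HMDS signal, and each bucket is then mean-centred and divided by the square root of its standard deviation. The function names are hypothetical, and this is not the ACD/NMR Manager, R, or SIMCA-P implementation.

```python
import numpy as np


def bucket_spectrum(ppm, intensity, high=11.4, low=0.4, width=0.04, reference_area=1.0):
    """Sum a 1D spectrum into equal-width buckets covering (low, high] ppm.

    Returns bucket centres (descending ppm) and bucket values divided by
    reference_area (e.g. the HMDS integral, to normalise between spectra).
    """
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    n_buckets = int(round((high - low) / width))
    centres = high - width * (np.arange(n_buckets) + 0.5)
    inside = (ppm <= high) & (ppm > low)
    # Bucket 0 covers (high - width, high]; clip guards against rounding at the edges
    idx = np.clip(np.floor((high - ppm[inside]) / width).astype(int), 0, n_buckets - 1)
    values = np.zeros(n_buckets)
    np.add.at(values, idx, intensity[inside])    # summed points ~ integral for uniform spacing
    return centres, values / reference_area


def pareto_scale(X):
    """Mean-centre each bucket (column) and divide by the square root of its SD."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0                            # leave constant buckets unscaled
    return (X - X.mean(axis=0)) / np.sqrt(sd)
```

Pareto scaling sits between no scaling and unit-variance scaling: intense signals such as sugars still weigh more than noise, but less overwhelmingly than in unscaled data, which suits marker detection in S-plots.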

Statistical Analysis
NMR quantification data were analyzed using the Co-Stat computer program (version 8, Monterey, CA, USA). Data are expressed as mean ± standard deviation (SD) of the groups. Differences between sample groups were compared by one-way analysis of variance (ANOVA) and were considered statistically significant when p ≤ 0.05.
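The one-way ANOVA applied to the quantification data can be sketched from its sums of squares, with the p-value taken from the F distribution in SciPy (`scipy.stats.f.sf`). The group values in the demo are placeholders rather than data from Table 2, the helper names are hypothetical, and this does not reproduce Co-Stat's output.

```python
import numpy as np
from scipy import stats


def one_way_anova(*groups):
    """One-way ANOVA across independent groups (e.g. four sprouts, n = 3 each).

    Returns the F statistic, its (between, within) degrees of freedom and the p-value.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    pooled = np.concatenate(groups)
    grand_mean = pooled.mean()
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = pooled.size - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    p_value = stats.f.sf(f_stat, df_between, df_within)   # upper-tail probability
    return f_stat, (df_between, df_within), p_value


def mean_sd(values):
    """Mean and sample standard deviation (ddof=1), as reported in mean ± SD."""
    v = np.asarray(values, dtype=float)
    return v.mean(), v.std(ddof=1)


# Hypothetical triplicate values (placeholders, not data from Table 2)
a, b, c, d = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [2.0, 3.0, 4.0]
f_stat, dfs, p = one_way_anova(a, b, c, d)
print(f"F{dfs} = {f_stat:.2f}, p = {p:.2g}, significant: {p <= 0.05}")
```

The result can be cross-checked against `scipy.stats.f_oneway`, which returns the same F statistic and p-value for the same groups.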

Conclusions
This research provided the first NMR-based metabolite fingerprinting of four major legume sprouts, i.e., Cicer, Lens, Trigonella, and Vicia. A total of 32 compounds belonging to various metabolite classes were identified and quantified. PCA and OPLS-DA were used to explore the variations and determine the main markers of each sprout, for use in sample authentication and future quality control. Trigonelline and 4-hydroxyisoleucine were enriched in Trigonella, whereas isoflavonoids and sucrose were more abundant in Cicer sprouts. Nevertheless, sucrose cannot be considered a useful marker, as it is a primary metabolite whose levels depend strongly on growth conditions. Vicia was characterized by the exclusive presence of L-Dopa and Lens by its abundance of acetate. The aromatic region data (δ 11.0–5.0 ppm) provided a better classification model than the full-range NMR data (δ 11.0–0.0 ppm), as legume sprout variations mainly originated from secondary metabolites, which can serve as chemotaxonomic markers.
Determination of metabolite patterns at different sprouting stages should follow, to better understand the role of these constituents in the sprouting process and to identify the optimum harvest time for a given effect or metabolite enrichment level. Moreover, future work should examine different varieties or seed origins of each legume to determine whether sprout composition differs and/or to identify accessions yielding the highest levels of targeted metabolites.