Metabolite annotations based on the integration of mass spectral information

A large number of metabolites are found in each plant, most of which have not yet been identified. Development of a methodology is required to deal systematically with unknown metabolites, and to elucidate their biological roles in an integrated ‘omics’ framework. Here we report the development of a ‘metabolite annotation’ procedure. The metabolite annotation is a process by which structures and functions are inferred for metabolites. Tomato (Solanum lycopersicum cv. Micro-Tom) was used as a model for this study using LC-FTICR-MS. Collected mass spectral features, together with predicted molecular formulae and putative structures, were provided as metabolite annotations for 869 metabolites. Comparison with public databases suggests that 494 metabolites are novel. A grading system was introduced to describe the evidence supporting the annotations. Based on the comprehensive characterization of tomato fruit metabolites, we identified chemical building blocks that are frequently found in tomato fruit tissues, and predicted novel metabolic pathways for flavonoids and glycoalkaloids. These results demonstrate that metabolite annotation facilitates the systematic analysis of unknown metabolites and biological interpretation of their relationships, which provide a basis for integrating metabolite information into the system-level study of plant biology.


Introduction
Large-scale biology studies supported by high-throughput data acquisition technologies require a method to bridge the gap between the data obtained and their biological interpretation. In genomics, without an analytical method to define genes, the nucleotide sequence of a whole genome is merely a series of letters (Ashburner, 2000). Using the process of annotation, by which information about the location and the number of genes and the functions of encoded proteins is inferred, researchers obtain biological meaning from the genome sequence (Stein, 2001). Metabolomics researchers are currently experiencing a similar situation to that which faced early genomics researchers. Recent progress in data acquisition technologies such as chromatography-coupled mass spectrometry has facilitated simultaneous detection and quantification of a large number of metabolite-derived peaks (Hall, 2006). However, the data obtained by high-throughput MS are merely a series of peaks without metabolite assignment. At this stage in metabolomics research, most of the peaks detected using MS cannot be assigned to identified metabolites. Such peaks are labeled as 'unknown' and usually are not characterized further. Thus the limited capability for metabolite identification has been one of the major obstacles in metabolomics (Kind and Fiehn, 2006;Wagner et al., 2003).
One approach to overcoming this obstacle is to quantify all detected peaks and compile them as un-annotated variables (Roessner et al., 2001; Schauer et al., 2005). This approach, non-targeted metabolic profiling, is frequently combined with statistical correlation analysis to hypothesize biological roles for the detected metabolites (Carrari et al., 2006; Schauer et al., 2006).
Another approach to overcoming the obstacle is to create a comprehensive dataset of plant metabolites by compiling various pieces of chemical information, as has been done for human metabolites (Smith et al., 2005), and to provide annotations for the metabolites. FTICR-MS is a promising candidate technology to achieve this goal. FTICR-MS measurement provides mass values with very high accuracy and resolution. This technology has been employed for non-targeted analyses of metabolites, and has demonstrated its advantage in detecting differentially expressed metabolites (Aharoni et al., 2002; Murch et al., 2004; Oikawa et al., 2006). However, despite many technical advantages, FTICR-MS alone cannot separate isomers that have the same elemental composition. It has been demonstrated recently that coupling of liquid chromatography to FTICR-MS facilitates the effective separation of isomers. However, a comprehensive metabolite dataset using chromatography-coupled FTICR-MS has not yet been produced.
In the present study, we propose a procedure for metabolite annotation using the data obtained by high-performance LC-FTICR-MS. Tomato (Solanum lycopersicum cv. Micro-Tom) fruit was analyzed as a model plant for two reasons. First, tomato contains a number of secondary metabolites that are not present in other model plants such as Arabidopsis and rice. Second, a tomato genome sequencing project is currently underway that will allow interpretation of metabolite data in conjunction with annotated gene functions.
Tomato metabolite data were collected in a non-targeted manner. We then compiled a dataset comprising mass spectral features including retention time, UV/visible absorption spectrum, m/z value, m/z value of the MS/MS fragment, and relative intensity of the MS/MS fragment. These mass spectral features were attached as annotations to individual metabolites. This information allowed us to provide annotations of predicted molecular formulae for 869 metabolites. Comparison with public databases suggests that 494 of the metabolites are novel. Additionally, MS/MS fragmentation profile data allowed provision of annotations for a number of secondary metabolites with known chemical structures. We constructed a web-based database compiling the metabolite annotations (http://webs2.kazusa.or.jp/komics/). Based on comprehensive characterization of tomato fruit metabolites, we identified chemical building blocks that appear frequently in the tomato fruit tissues. We also assigned several unknown flavonoids and glycoalkaloids to novel metabolic pathways based on the annotations of putative structures. These results demonstrate that metabolite annotation allows us to systematically analyze unknown metabolites and facilitates biological interpretation of their roles in metabolic processes.

Procedure of metabolite annotation
We developed a procedure to organize MS data in a metabolite-oriented manner, which hereafter is referred to as a metabolite annotation procedure. The procedure comprises eight sequential steps. First, the whole raw data set comprising data from successive mass scans was exported as a text file (Figure 1a). Second, the observed m/z values of mass signals were calibrated with those of internal standards detected in the same scan (Oikawa et al., 2006) (Figure 1b). After internal standard calibration, errors in m/z values decreased to less than 1 ppm (Table S1). Third, we grouped mass signals if the same m/z value was detected in consecutive scans, hereafter referred to as a 'peak group' (Figure 1c). An accurate m/z value for each peak group was calculated as the mean of the m/z values for the mass signals with the highest intensities (for details, see Experimental procedures). Fourth, we searched for pairs of peak groups that had m/z intervals (Δ) of 1.0033 and 1.9958 to identify ¹²C/¹³C₁ isotopic peak pairs and ³²S/³⁴S₁ isotopic peak pairs, respectively (Figure 1d). A peak group for the quasi-molecular ion accompanied by isotopic peaks was regarded as an individual 'metabolite'. Fifth, molecular formulae were predicted from the accurate m/z values of the metabolites (Figure 1e). To avoid obtaining obviously unnatural formulae, we surveyed elemental compositions in the DNP database (Dictionary of Natural Products). Although the results for such a survey have been reported previously (Kind and Fiehn, 2007), we checked the maximum element numbers within our mass scan range (50-1500 Da). Our survey demonstrated that 95.65% of the DNP compounds (186 788 compounds in a range 50-1500 Da) consist of C, H, N, O, P and S within the ranges C 1-95, H 1-182, N 0-10, O 1-45, P 0-6 and S 0-5. Thus, we set these as upper limits for elemental compositions in the molecular formula calculations.
Sixth, we narrowed down the number of candidate formulae using the relative intensity of the ¹³C₁ and ³⁴S₁ isotopic ions (Figure 1f). A particular advantage of LC-FTICR-MS is that the resolution is high enough to separate the ³⁴S₁ isotopic ion from the ¹³C₂ isotopic ion. Thus, we could use the relative intensity of the ³⁴S₁ isotopic ion as a constraint for the number of sulfur atoms. Seventh, we manually performed the isotopic peak group assignment and in-source fragment peak group assignment (Figure 1g). Assignment of the peak groups composed of adduct ions was also performed manually in this step. After these manual curation processes, metabolites were finally designated as 'annotated metabolites'. In the eighth step, the mass spectral features (including retention time, m/z value, m/z value of the MS/MS fragment, relative intensity of the MS/MS fragment and UV/visible absorption spectrum) and database search results were attached to each metabolite as annotations (Figure 1h). All of the steps, except the manual curation process, are computerized. The annotated metabolites were classified using an annotation grading system (Figure 2, see Experimental procedures).
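The isotopic pair search in the fourth step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name, the tolerance of 0.001 m/z units and the example values are assumptions made for illustration, and the sketch ignores retention time, whereas a real search would also require the paired peak groups to co-elute.

```python
from bisect import bisect_left, bisect_right

# m/z intervals quoted in the text for 12C/13C1 and 32S/34S1 peak pairs.
DELTA_13C = 1.0033
DELTA_34S = 1.9958


def find_isotope_pairs(mzs, delta, tol=0.001):
    """Return (i, j) index pairs with mzs[j] - mzs[i] within tol of delta.

    mzs must be sorted in ascending order. tol (in m/z units) is an
    assumed value, not one taken from the paper.
    """
    pairs = []
    for i, mz in enumerate(mzs):
        # Binary search for the window [mz + delta - tol, mz + delta + tol].
        lo = bisect_left(mzs, mz + delta - tol)
        hi = bisect_right(mzs, mz + delta + tol)
        pairs.extend((i, j) for j in range(lo, hi))
    return pairs


# Hypothetical accurate m/z values of four peak groups.
peak_groups = [272.0685, 273.0718, 274.0643, 300.0]
carbon_pairs = find_isotope_pairs(peak_groups, DELTA_13C)
sulfur_pairs = find_isotope_pairs(peak_groups, DELTA_34S)
```

Sorting the accurate m/z values once and using binary search keeps the search close to linear in the number of peak groups, which matters when tens of thousands of mass signals are processed per sample.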

Number of annotated metabolites in tomato fruit
We applied the metabolite annotation procedure to the MS data obtained from eight different tomato fruit tissues, comprising peel and flesh at the mature green, breaker, turning and red stages. The number of detected mass signals ranged from 12 498 to 70 278 (Table 1). On average, 14.0 ± 3.6 mass signals were combined into one peak group. In both positive- and negative-ionization modes, 21 ± 1.7% of the peak groups were consistently assigned with the isotopic ions and recognized as metabolites. After manual curation, 57 ± 7.9% of the metabolites were provided with molecular formula annotations and designated as annotated metabolites. After removing the redundancy across samples, the total number of annotated metabolites was 869 (Table S2).
[Figure 1 caption, partial] (h) Provision of metabolite annotations. This procedure aims to identify a putative 'metabolite', which is defined as a group of mass signals that are detected in consecutive scans to form a peak group, accompanied by isotopic ions.
Only 3.6% of the metabolites were identified by comparison with authentic compounds (grade A, Table 1). Database searches in the DNP, KNApSAcK (Oikawa et al., 2006), Kyoto Encyclopedia of Genes and Genomes (KEGG) (Goto et al., 2002) and MotoDB (Moco et al., 2006) revealed that 494 of the annotated metabolites were not present in the databases, suggesting that they are novel metabolites.

Qualitative analysis of metabolite composition
Based on the metabolite annotations (Table S2), we investigated the distribution of mass differences between metabolites. Given that a metabolite is generated from a preexisting metabolite by substitution of chemical building blocks, mass differences may provide insights into the types of reactions that have occurred between two metabolites. The distribution of Δ[m/z] values showed 'spikes', demonstrating that certain Δ[m/z] values occurred more frequently than others (Figure 3; the threshold probability to identify Δ[m/z] spikes was determined as described in Figure S1). The Δ[m/z] spike profiles seen in tomato fruit samples were different from those of 10 743 compounds containing C, H and O listed in KEGG (Goto et al., 2002) (Figure 3c; for a complete list of the compounds, see Table S3). This demonstrates that the Δ[m/z] spikes have a sample-specific profile. The Δ[m/z] spikes that occurred in the tomato samples are listed in Table S4.
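The idea of counting pairwise mass differences can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the bin width (rounding to two decimal places) and the example masses are hypothetical, and the probability threshold used in the paper to call a spike (Figure S1) is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def mass_difference_counts(masses, decimals=2):
    """Count pairwise absolute mass differences, binned by rounding.

    decimals sets the bin width and is an assumed value.
    """
    counts = Counter()
    for a, b in combinations(masses, 2):
        counts[round(abs(a - b), decimals)] += 1
    return counts


# Hypothetical masses: two aglycones plus hexose conjugates (+162.053).
masses = [272.068, 434.121, 596.174, 288.063, 450.116]
counts = mass_difference_counts(masses)
most_frequent_difference = counts.most_common(1)[0][0]
```

In this toy input the difference corresponding to one hexose unit occurs three times and therefore dominates the distribution, which is the kind of 'spike' the analysis looks for. Signal intensities play no part in the count.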
We then checked whether Δ[m/z] spikes were generated from biologically relevant metabolite pairs, i.e. that Δ[m/z] values were produced in combinations that reflect reaction relationships. This was achieved by inspecting the MS/MS fragmentation data (available at http://webs2.kazusa.or.jp/komics/). Biologically relevant metabolite pairs were [...] .053 between the fragments. In addition, several common fragments were detected in the MS/MS spectra of these two metabolites. Thus, the pair is regarded as biologically relevant. We manually inspected the MS/MS spectra of [...] Table 2). The Δ[m/z] spike profiles show tissue- and ripening stage-dependent differences (Figure S2). To confirm the ripening stage-dependent changes, Δ[m/z] values between metabolites in two consecutive stages were analyzed (for details, see Experimental procedures). The analysis indicated that addition of chemical building blocks such as an amino group, caffeic acid, a C₃H₇NO₂S moiety or hexose occurred frequently during ripening. According to the annotations of putative structure and database hits, these chemical building blocks are frequently associated with secondary metabolism.

Secondary metabolites in tomato
In addition to the frequently occurring mass differences, the tomato fruit metabolites analyzed using LC-FTICR-MS include diverse flavonoids and glycoalkaloids. Of the 869 annotated metabolites, 70 and 93 were assigned to the flavonoid and glycoalkaloid groups, respectively. The number of flavonoids increased during ripening (Table S5). In addition, peel tissues contained a larger number of flavonoids than flesh. Four chalcone and flavanone aglycones [naringenin chalcone (NGC), naringenin (NG), eriodictyol (ED) and eriodictyol chalcone (EDC)] and two flavonol aglycones [kaempferol (Kae) and quercetin (Que)] were identified by MS/MS and MS³ fragmentation patterns combined with UV/visible absorption spectra, as reported previously (Iijima et al., 2008). Dehydrokaempferol glycosides, previously identified in other cultivars of tomato (Le Gall et al., 2003; Moco et al., 2006), were not detected in the Micro-Tom samples. MS/MS fragmentation patterns of the flavonoids demonstrated the occurrence of various glycosylations and acylations. Flavonoids in the chalcone/flavanone and flavonol groups showed different conjugation patterns. Conjugate moieties of NH₃ (m/z 17.027) and C₃H₇NO₂S (m/z 121.020) were associated exclusively with chalcones and flavanones. On the other hand, deoxyhexose, p-coumaroyl hexose and feruloyl hexose were associated exclusively with Kae and Que.
Possible pathway relationships for the flavonoids are illustrated based on the putative structures ( Figure 5a). The modification pattern observed in the NGC pathway is quite similar to that in the EDC pathway. Likewise, the modification patterns observed in pathways starting from Kae and Que are similar to each other. The apparent similarities suggest that regulation of modification reactions may be similar between the NGC and EDC pathways and between the Kae and Que pathways. To test this, we investigated flavonoid levels in fruits of transgenic Micro-Tom lines over-expressing PAP1, an Arabidopsis transcription factor that up-regulates flavonoid pathway genes (Borevitz et al., 2000). We focused on comparison of the pairs of NGC and EDC derivatives and the pairs of Kae and Que derivatives, each of which has an identical conjugate moiety (numbered metabolites in Figure 5a). The accumulation levels of three pairs of metabolites in the NGC and EDC pathways changed in a highly correlated manner (correlation coefficient >0.6) in PAP1 over-expressing lines (Figure 5b), as did those of six pairs of metabolites in the Kae and Que pathways (Figure 5c). This suggests that pairs of genes responsible for the same modification reactions are coordinately regulated by the over-expression of PAP1. Alternatively, each pair of modifications may be catalyzed by an identical enzyme.
Most of the glycoalkaloids annotated in this study (Table S6) appear to be novel, as they were not found in the literature or public databases. The composition of glycoalkaloids showed tissue-dependent differences. Peel contained a larger number of glycoalkaloids than flesh. The composition of glycoalkaloids also appeared to change with ripening. The intensity of the mass peak of tomatine (m/z 1034.55303 [M+H]⁺) was high in fruits at the mature green and breaker stages, but very weak at the red stage, suggesting that levels of tomatine decreased during ripening. On the other hand, a number of glycoalkaloids that are larger than tomatine were detected at the red stage. According to MSⁿ data, some of these were assigned as putative intermediate metabolites in the metabolic pathway between tomatine and esculeoside A, the major glycoalkaloid at the red stage (Figure 6). To test whether this pathway is regulated by ripening, we investigated the accumulation levels of the intermediates in fruit tissues (containing both peel and flesh) of non-ripening (nor) and ripening-inhibitor (rin) mutants that do not exhibit ripening-associated ethylene production. The levels of metabolites upstream of C₅₂H₈₅NO₂₄ increased in nor and rin fruits in comparison with wild-type Rutgers, but the level of esculeoside A decreased remarkably (Figure 6). This indicates that the final step of esculeoside A biosynthesis is associated with developmentally regulated ripening events.

Concept of metabolite annotation
We established a metabolite annotation procedure and constructed a comprehensive metabolite annotation database to organize experimental information obtained by LC-FTICR-MS, using tomato as a model plant species. The term 'metabolite annotation' has been proposed previously to describe the process of labeling experiments with biological metadata (such as a description of actual experimental conditions) in order to help unravel the biological role of metabolites based on changes in their levels in response to genetic and environmental perturbation (Fiehn et al., 2005;Scholz and Fiehn, 2007). Their concept of 'metabolite annotation' comprises (i) mass spectral annotation and (ii) biological metadata annotation. In this study, we used the term 'metabolite annotation' to describe a procedure by which mass spectral information is provided to individual metabolites, thus our annotation procedure can be classified as mass spectral annotation.
The metabolite annotation procedure reported in this study is based on four novel concepts. First, we provided annotations to individual 'metabolites'. We identified metabolite-representing peaks systematically based on the following criteria: (i) that mass signals were detected in consecutive scans to form a peak group, and (ii) that quasi-molecular ions were accompanied by isotopic ions. Second, we aimed to establish a data-driven annotation protocol for LC-MS-derived data as only a few metabolic profiling methods for LC-MS-derived data have been reported (De Vos et al., 2007; Smith et al., 2006). This is in contrast to the well-established metabolic profiling methods for GC-MS-derived data (Duran et al., 2003; Fiehn et al., 2005; Tikunov et al., 2005). Third, we provided annotations for non-volatile secondary metabolites that are difficult to detect by GC-MS, which allowed us to explore a diverse range of secondary metabolites. Fourth, we introduced a grading system to describe the experimental evidence by which the annotation was supported. It should be mentioned that the metabolite annotations provided in this study are open to future curation. For example, heuristic rules for filtering molecular formulae have been proposed recently (Kind and Fiehn, 2007). In the current study, we implemented procedures equivalent to element number filtering, LEWIS and SENIOR checks, and isotopic pattern filtering, but did not implement element ratio checks or element probability checks. Thus, curation of molecular formula annotations will be feasible by applying these rules.

Limitation in complete coverage and quantification of metabolites
In this study, tomato fruit tissues were extracted using 75% w/v methanol. This method was suitable for extracting a wide range of secondary metabolites, amino acids, sugars, nucleotides and organic acids, but did not extract non-polar metabolites such as lycopene. This demonstrates that the metabolite composition detected is inevitably biased by the choice of extraction method. Thus, an appropriate combination of multiple extraction methods is needed for complete coverage of metabolites. For comprehensive profiling of the annotated metabolites, quantification depends on the measurement of mass signal intensity. However, differences in the mass signal intensity may be caused by a different degree of ion suppression, a phenomenon by which the intensity of a certain ion is suppressed by the presence of other ions. Even with LC separation prior to MS, several peaks co-eluted in single m/z scans. We performed semi-quantitative analyses of flavonoids and glycoalkaloids based on comparison of the relative mass signal intensities of an identical metabolite across samples (Figures 5b,c and 6). To minimize the possibility that mass signal intensity was affected by different degrees of ion suppression, we checked (i) whether the mass signal intensity is proportional to the UV/visible absorbance, (ii) whether the profile of ions co-eluted with the target ion is similar, and (iii) whether ion suppression is observed in the intensity of co-injected internal calibration standards. Further study is needed to estimate the extent to which ion suppression affects the quantification.

Novel metabolites in tomato fruit
Comparison of 869 annotated metabolites with compounds registered in public databases revealed that 494 of the annotated metabolites appear to be novel. Putative structures for the novel metabolites can be predicted from the annotations of MS/MS fragmentation data. This was particularly effective in predicting putative structures of novel flavonoids and glycoalkaloids. In the flavonoid group, an unknown moiety, C₃H₇NO₂S (m/z 121.020), was found as conjugates with NGC, NG and ED. Its predicted molecular formula matched that of cysteine. It has been reported that cysteine forms a conjugate with epicatechin when procyanidins depolymerize in the presence of cysteine (Torres et al., 2002). However, cysteine conjugates of chalcones and flavanones have not been reported. Structural identification of the moiety will be required to understand the biosynthesis of C₃H₇NO₂S conjugates. Modification of flavonoids has been attracting attention as the biological effects of flavonoid conjugates depend on the nature of the conjugate moieties. The tomato flavonoids found in the present study provide an experimental basis to search for novel functional flavonoids, and to elucidate unknown mechanisms of flavonoid modification. In the glycoalkaloid group, our results indicated the presence of novel glycoalkaloids with m/z values larger than the maximum molecular mass (1271 Da) of tomato glycoalkaloid reported so far (Ono et al., 2006) (Table S6). Most of these novel glycoalkaloids appeared after the onset of ripening. This suggests that glycoalkaloid metabolism is active during fruit ripening, and that glycoalkaloids play unidentified physiological roles in the ripening fruit.
Carotenoids, another major secondary metabolite group in tomato, were not detected under our experimental conditions. Development of a metabolite annotation method for MS data obtained in atmospheric pressure photoionization mode, which efficiently ionizes non-polar metabolites including carotenoids, is currently underway.

Reaction and pathway relationships
Metabolite annotations aid our understanding of mechanisms controlling metabolism from chemical and biological points of view. From a chemical point of view, metabolite annotations provide detailed chemical information for each metabolite, which will serve as a basis for identifying unknown metabolites. From a biological point of view, metabolite annotations provide a basis for elucidating biological relationships between metabolites, such as reaction and pathway relationships.
To obtain insights into reaction relationships between metabolites, we performed mass difference analysis. Several Δ[m/z] values occur frequently in metabolites from tomato fruit, suggesting that chemical building blocks corresponding to those Δ[m/z] values appear frequently in tomato fruit metabolites. It should be emphasized that signal intensities were not taken into consideration in this analysis. Thus, when we state that certain Δ[m/z] values occur frequently, this does not mean that the accumulation levels of these metabolites are high. Nevertheless, mass difference analysis combined with inspection of MS/MS spectra annotations provides an efficient way to study metabolites relating to a reaction of interest.
To understand the metabolic pathway relationships between annotated metabolites, we arranged flavonoids detected in this study into metabolic diagrams (Figure 5a). These demonstrate that the modification patterns between the NGC and EDC pathways and between the Kae and Que pathways, respectively, are similar to each other. When the flavonoid pathway was up-regulated by over-expression of PAP1, changes in the relative accumulation levels of several pairs of metabolites with identical conjugation patterns were highly correlated (Figure 5b,c). This result demonstrates that genes responsible for each pair of modification reactions are coordinately regulated by PAP1. Alternatively, identical enzymes may use both Kae and Que derivatives as substrates, as reported previously for flavonol glycosyltransferases (Jones et al., 2003; Yonekura-Sakakibara et al., 2007). For glycoalkaloids, a biosynthetic pathway from tomatine to esculeoside A was illustrated (Figure 6). By analyzing fruits of nor and rin mutants, we have demonstrated that the reaction step between C₅₂H₈₅NO₂₄ and esculeoside A is regulated by the occurrence of ripening, which is developmentally controlled by NOR and LeMADS-RIN (Giovannoni, 2004). These results demonstrate that the metabolite annotation procedure is a powerful approach for producing hypotheses with respect to unknown metabolic pathways.

Possible link between metabolite annotations and integrated 'omics' study
Further insights into the regulation of metabolite biosynthesis will be obtained by the integration of metabolomics data with other 'omics' data. A parallel analysis of metabolites and transcripts is a promising approach to achieve this goal (Hirai et al., 2004;Nikiforova et al., 2005;Tohge et al., 2005;Urbanczyk-Wochniak et al., 2003). Another promising approach involves combination of metabolite analysis with genetic analysis such as quantitative trait loci (QTL) analysis (Keurentjes et al., 2006;Morreel et al., 2006;Schauer et al., 2006). In such approaches, the metabolite annotation plays a complementary role to the metabolic profiling in linking metabolite information to other 'omics' information. By contrast to quantitative metabolic profiling, annotations of mass spectral features facilitate qualitative characterization with respect to identity, structural similarity and biochemical relationships between the metabolites. This assists in inference of biological meanings from metabolic profiling combined with other 'omics' data. Additionally, new metabolites predicted by the metabolite annotations will be included in multi-'omics' pathway tools (Thimm et al., 2004;Tokimatsu et al., 2005;Zhang et al., 2005), and expand our knowledge about unknown metabolic pathways. Metabolite annotations provide firm foundations for integrating chemical information regarding metabolites into a system-level study of plant metabolism.

Plant materials
Seeds of cultivated tomato (S. lycopersicum cv. Micro-Tom) were sown in pots (500 ml) filled with a mixture of vermiculite and Powersoil (mix ratio 1:1, Kureha Chemical Industries, http://www.kureha.co.jp/ and Kanto Hiryou Industries, http://www.okumurashoji.co.jp/). Until germination, seeds were covered with plastic film and kept in the dark at 25°C. After 4 days in the dark, they were grown with a photoperiod of 16 h light (80 µmol m⁻² s⁻¹)/8 h dark at 25°C. Hyponex® (Hyponex Ltd, http://www.scotts.com/) at 1000-fold dilution was applied to plants once a week. Fruits at the mature green (G, approximately 30 days after anthesis), breaker (B, approximately 35 days after anthesis), turning (T, approximately 38-40 days after anthesis) and red (R, approximately 45-48 days after anthesis) stages were harvested. A vector construct expressing Arabidopsis PAP1 under the control of the CaMV 35S promoter (Tohge et al., 2005) was provided by K. Saito (Chiba University, Japan). Transformation of Micro-Tom was performed according to the protocol reported previously (Sun et al., 2006). Seeds of wild-type Rutgers (LA1090) and the nor (LA3013) and rin (LA3012) mutants were obtained from the C.M. Rick Tomato Genetic Resource Center (University of California, Davis, CA, USA).

Metabolite extraction
The peel and the flesh of tomato fruit were separated using a razor blade. Each sample was sliced, immediately frozen in liquid nitrogen and ground to powder using a Shake Master homogenizer (Biomedical Science, http://www.bmsci.com). Powdered samples (50-70 mg) were extracted with three volumes of methanol containing formononetin (20 µg ml⁻¹) as an internal standard. After homogenization using a Mixer Mill MM 300 (Qiagen, http://www.qiagen.com/) at 27 Hz for 2 min twice, homogenates were centrifuged (12 000 g, 10 min, 4°C). The supernatant was filtered through 0.2 µm PVDF membrane (Whatman, http://www.whatman.com), and the filtrate was used for LC-FTICR-MS analysis.
To monitor HPLC elution, a photodiode array detector was used in the wavelength range 200-650 nm. The ESI setting was as follows: spray voltage 4.0 kV and capillary temperature 300°C for both positive- and negative-ionization modes. Nitrogen sheath gas and auxiliary gas were set at 40 and 15 arbitrary units, respectively. A full MS scan with internal standards was performed in the m/z range 100-1500 at a resolution of 100 000 (at m/z 400).
A mixture of internal calibration standards dissolved in 50% v/v acetonitrile was introduced by a post-column method at a flow rate of 20 µl min⁻¹. The concentration of each standard in the mixture was as follows: for positive mode, [...]). MS/MS and MS³ fragmentation were carried out at a normalized collision energy of 35.0% and an isolation width of 4.0 (m/z), and the spectra were acquired in ion trap mode. Relative accumulation levels of flavonoids and glycoalkaloids were estimated by dividing the peak area of the metabolite by that of the internal standard (formononetin).

Metabolite annotation procedure
A program written in Microsoft VC++ was used to export the raw data (XRAW) file of each single run as a text file. The output file includes retention time, scan number, m/z values and their intensities. To discriminate mass signals from baseline noise, mass signals whose intensities were more than three times the baseline level of each scan were selected. Next, m/z values of all ions in each scan were bulk-calibrated with observed m/z values of internal calibration compounds in the same scan using the computational tool DrDMASS (http://kanaya.naist.ac.jp/DrDMASS/, Oikawa et al., 2006). Using the internally calibrated m/z values, any m/z value that was observed in more than 30% of the total mass scans was regarded as artificial noise, and the corresponding mass signals were excluded from further analyses. After removing noise, all data were collected as a Microsoft Excel file. The quasi-molecular ions detected with a ¹³C isotopic ion in the scan at an m/z value that was 1.003 greater were selected. After sorting mass signals by scan number, those detected in more than three consecutive scans were selected and grouped. If a peak group consisted of three or four mass signals, an accurate m/z value for the group was obtained as the mean m/z value for the three or four mass signals. If a peak group consisted of five or more mass signals, an accurate m/z value was obtained as the mean m/z value for the five most intense signals. For a peak group whose intensity was more than 1 000 000, m/z values for the highest intensity signals were not used for the mean value calculation. Instead, a mean value was calculated using the m/z values of mass peaks whose intensities were just below 1 000 000. Molecular formulae that matched a given accurate m/z value were determined as follows. A library of molecular formulae with all possible elemental combinations whose theoretical m/z matched the input m/z with 1 ppm tolerance was generated using elements C, H, N, O, P and S.
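The rule for deriving an accurate m/z value from a peak group can be sketched as follows. This is an illustrative Python sketch rather than the authors' program: the function name and data layout are hypothetical, and the handling of intense peak groups reflects one assumed reading of the text, namely that signals above 1 000 000 are set aside and the most intense remaining signals are averaged.

```python
def accurate_mz(signals, saturation=1_000_000, top_n=5):
    """Mean m/z of a peak group given as (mz, intensity) tuples.

    Groups of three or four signals are averaged in full; larger groups
    use the top_n most intense signals. Signals above `saturation` are
    skipped (an assumed interpretation of the rule in the text).
    """
    if len(signals) < 3:
        raise ValueError("a peak group needs at least three mass signals")
    # Drop signals above the intensity limit; if that removes everything,
    # fall back to the full group rather than returning nothing.
    usable = [s for s in signals if s[1] <= saturation] or list(signals)
    strongest = sorted(usable, key=lambda s: s[1], reverse=True)[:top_n]
    return sum(mz for mz, _ in strongest) / len(strongest)


# Hypothetical peak group spanning four consecutive scans.
group = [(273.0757, 4.0e5), (273.0759, 9.0e5), (273.0758, 7.5e5), (273.0760, 2.0e5)]
mz_value = accurate_mz(group)
```

Restricting the mean to the most intense unsaturated signals is a simple way to favour the scans with the best ion statistics while avoiding the mass shifts that can accompany very high ion populations.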
To screen the library for chemically possible molecular formulae, all formulae were tested for whether they met the following criteria (Senior, 1951): (i) the sum of valences is an even number, and (ii) the sum of valences is greater than or equal to twice the number of atoms minus 1, that is, at least 2(N − 1) for a formula with N atoms. The accurate m/z was used for molecular formula calculation. Upper limits of 95 for C, 182 for H, 10 for N, 45 for O, 6 for P and 5 for S were used for calculation of formulae. In addition, the relative intensity of the ¹³C₁ isotopic ion was calculated. The number of carbons in the molecular formula was estimated using the following equation:
$$n = \frac{{}^{13}\mathrm{C}_1\ \text{isotopic ion intensity}}{{}^{12}\mathrm{C}\ \text{isotopic ion intensity}} \times \frac{0.9893}{0.0107}$$
where n represents the number of carbons. The tolerance for relative intensity was set at 5%. Chemically possible molecular formulae and the relative intensities of the isotope ions were calculated by programs written in Java. The library of molecular formulae was constructed using MySQL. A Java program was developed to search the molecular formula library for molecular formulae matching the criteria described above. Any peak group selected on the basis of these criteria was defined as a metabolite. The analysis was repeated three times for each tomato fruit tissue. When a metabolite was detected in two or more repeats, it was regarded as 'present' in that tissue.
Computational assignment of peak groups of isotopic ions to the parental metabolite was rechecked manually. Assignments of fragment ions and adduct ions to the parental metabolite were performed manually. Peak groups composed of adduct ions produced during ionization were assigned using two criteria as follows. (Svatos et al., 2004) and [2M+H]⁺). Second, retention time was checked to determine whether the adduct ions co-eluted with the proton adduct ion.
In negative-ionization ESI mode, formic acid adduct ions ([M+HCOO]⁻) were frequently produced together with [M−H]⁻ ions, and were assigned using the same criteria. Metabolite annotations were provided for the adduct ion species with the highest intensity, i.e. [M+H]⁺ and [M−H]⁻ in positive- and negative-ionization ESI modes, respectively, for the majority of the metabolites detected in the present study (Table S2). After these manual curation processes, metabolites were designated as 'annotated metabolites'.
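The formula screening steps in this section (the two Senior criteria and the carbon-number estimate from the ¹³C₁ isotopic ion) can be illustrated with a minimal Python sketch. It is not the authors' Java implementation. The function names and the VALENCE table are hypothetical. Several points are assumptions: each element is given a single common valence (P = 3, S = 2), the check is applied to the neutral formula, criterion (ii) is read as a sum of valences of at least 2(N − 1) for N atoms, and the 5% tolerance is treated as an absolute difference in relative intensity.

```python
# Assumed valences; P and S can take higher valence states in real molecules.
VALENCE = {"C": 4, "H": 1, "N": 3, "O": 2, "P": 3, "S": 2}

ABUNDANCE_12C = 0.9893  # natural abundances used in the equation in the text
ABUNDANCE_13C = 0.0107


def passes_senior(formula):
    """Check criteria (i) and (ii) for a formula given as {element: count}."""
    total_valence = sum(VALENCE[el] * n for el, n in formula.items())
    atoms = sum(formula.values())
    is_even = total_valence % 2 == 0              # criterion (i)
    is_connected = total_valence >= 2 * (atoms - 1)  # criterion (ii)
    return is_even and is_connected


def estimate_carbons(intensity_13c1, intensity_12c):
    """Estimate the carbon number n from the two isotopic ion intensities."""
    return (intensity_13c1 / intensity_12c) * (ABUNDANCE_12C / ABUNDANCE_13C)


def carbon_count_consistent(n_carbons, intensity_13c1, intensity_12c, tol=0.05):
    """Compare observed and expected 13C1 relative intensity.

    Assumption: tol is an absolute difference in relative intensity
    (0.05 corresponds to 5% of the 12C ion intensity).
    """
    expected = n_carbons * ABUNDANCE_13C / ABUNDANCE_12C
    observed = intensity_13c1 / intensity_12c
    return abs(observed - expected) <= tol
```

For example, passes_senior({"C": 6, "H": 12, "O": 6}) accepts a hexose formula, whereas {"C": 1, "H": 5} fails criterion (i) and {"C": 1, "H": 6} fails criterion (ii).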

Annotation grading system
An annotation grade was assigned to each metabolite to describe the evidence supporting its annotations (Figure 2). First, annotations were classified into two grades (A/B versus C) according to whether or not a single molecular formula was obtained. Grades A and B were then distinguished according to whether or not the mass spectral attributes of the metabolite matched those of standard chemicals. In grade A, annotations were verified by comparison with standard chemicals. In grade B, annotations were assigned single molecular formulae but lacked verification by standard chemicals. Annotations in grade B were classified into eight sub-grades according to the availability of MS/MS, λmax and reference information. In grade C, multiple molecular formulae were assigned to each metabolite. Annotations in grade C were

Supplementary Material
The following supplementary material is available for this article online:
Figure S1. Comparison of threshold probability values to detect Δ[m/z] spikes.
Figure S2. Tissue-dependent occurrence of Δ[m/z] spikes.
Table S1. m/z values for five major metabolites in tomato fruits before and after the internal standard calibration.
Table S2. Mass spectral data and metabolite annotations of the metabolites in Micro-Tom fruit tissues.
Table S3. Molecular formulae and molecular weights of KEGG CHO compounds.
Table S4. Δ[m/z] spikes at the threshold probability of 40-fold standard-deviation level.
Table S5. Flavonoids in Micro-Tom fruit tissues.
Table S6. Glycoalkaloids in Micro-Tom fruit tissues.
This material is available as part of the online article from http://www.blackwell-synergy.com. Please note: Blackwell Publishing are not responsible for the content or functionality of any supplementary materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.