Safe composition levels of transgenic crops assessed via a clinical medicine model

Re-use of this article is permitted in accordance with the Creative Commons Deed, Attribution 2.5, which does not permit commercial exploitation. Substantial equivalence has become established as a foundation concept in the safety evaluation of transgenic crops. In the case of a food and feed crop, no single variety is considered the standard for safety or nutrition, so the substantial equivalence of transgenic crops is investigated relative to the array of commercial crop varieties with a history of safe consumption. Although used extensively in clinical medicine to compare new generic drugs with brand-name drugs, equivalence limits are shown to be a poor model for comparing transgenic crops with an array of reference crop varieties. We suggest an alternate model, also analogous to that used in clinical medicine, where reference intervals are constructed for a healthy heterogeneous population. Specifically, we advocate the use of distribution-free tolerance intervals calculated from the large body of publicly available compositional data, such as that found in the International Life Sciences Institute Crop Composition Database.


Introduction
Substantial equivalence has become established as a foundation concept in the safety evaluation of transgenic crops. If the composition (nutrients, antinutrients, etc.) of a transgenic crop is found to be equivalent to that of non-transgenic varieties of the same crop, and those crop varieties are considered safe, then further safety assessment of the transgenic crop can focus solely on the intended modification, usually the expression of a transgenic protein that is novel in that crop [1]. A number of statistical approaches have been used to compare the composition of transgenic crops with their conventional counterparts, and new methods have recently been suggested [29,30]. However, the concept of substantial equivalence has been applied in clinical medicine for much longer than it has been applied to transgenic crops, so it seems wise to learn from that experience. Here we discuss how the issue of equivalence has been dealt with in clinical medicine, and suggest an analogous approach for evaluating the substantial equivalence of transgenic crops. Specifically, we suggest how reference intervals should be calculated for evaluating the substantial equivalence of new transgenic crops relative to existing crop varieties that have a history of safe use. We use the term crop variety here to encompass both inbred lines and hybrids.
Bioequivalence is a common concept in clinical medicine, typically applied to the evaluation of new generic drugs. The intent of bioequivalence studies is to compare the performance and bioavailability of a new generic drug with those of a commercially available brand-name drug. Equivalence limits are constructed either from arbitrarily set deviations (e.g., ±20% of the performance of the brand-name drug) or from the variability in the response observed when the brand-name drug is administered (statistical equivalence limits). The performance of the candidate generic drug is then examined to see if it falls within these equivalence limits [31]. These limits are centered on the average performance of the brand-name drug. This is an appropriate approach because the generic and brand-name drugs are expected to have the same average performance.
Although this approach has been suggested for evaluating the substantial equivalence of transgenic crops [29,30], the aforementioned pharmaceutical situation is fundamentally different from that of transgenic-crop composition comparisons. Unlike the pharmaceutical situation, no single variety of a crop is considered the benchmark for safety or nutrition. Rather, a large number of crop varieties are considered safe and nutritious. Furthermore, different crop varieties often have distinct compositional profiles, so that there is an expectation that any single variety, whether transgenic or not, would have a composition that differs from the average composition across all varieties. Therefore, constructing equivalence limits around the average composition across a number of different crop varieties that are each considered safe is not useful for understanding the safety of an individual variety. In fact, if many crop varieties are used to construct statistical equivalence intervals, then many of the individual varieties used to construct the interval will fall outside of the interval. This clearly illustrates the inappropriateness of this approach for evaluating the safety of transgenic crops.
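The claim that many of the varieties used to build statistical equivalence limits fall outside those limits can be checked with a small simulation. The sketch below is hypothetical: it assumes each variety's true analyte level is drawn from a normal distribution with an arbitrary mean of 100 and a between-variety standard deviation of 10 (illustrative units), and it uses a normal-approximation 95% confidence interval for the average across varieties as the equivalence limits.

```python
import random
import statistics
from statistics import NormalDist


def fraction_outside_equivalence(n_varieties, grand_mean=100.0,
                                 between_sd=10.0, seed=1):
    """Hypothetical simulation: fraction of varieties lying outside a 95%
    confidence interval for the average level across those same varieties."""
    rng = random.Random(seed)
    # Assumption: each variety has its own true mean, normally distributed.
    variety_means = [rng.gauss(grand_mean, between_sd) for _ in range(n_varieties)]
    center = statistics.fmean(variety_means)
    sem = statistics.stdev(variety_means) / n_varieties ** 0.5
    z = NormalDist().inv_cdf(0.975)  # normal approximation to the t quantile
    lower, upper = center - z * sem, center + z * sem
    outside = sum(1 for m in variety_means if m < lower or m > upper)
    return outside / n_varieties


for n in (10, 50, 200, 1000):
    print(n, round(fraction_outside_equivalence(n), 2))
```

As the number of varieties grows, the confidence interval shrinks toward the grand mean while the spread of the varieties themselves does not, so the fraction of (by assumption safe) varieties falling outside the limits climbs toward 1.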
Another concept widely applied in clinical medicine provides a better model for the safety analysis of transgenic crops. It is common in the medical field to test individual patients for analytes (e.g., disease markers or blood chemistry) and to assess whether the results are normal. As in the previous case, intervals are constructed as a frame of reference for judging individual patient results [32]. Such intervals may be based on previous results with diseased patients or, more commonly, on responses from a population of healthy individuals. We generally do not have crop varieties that are considered unsafe, but for a small number of crops and analytes, such varieties exist. For example, unsafe levels of glycoalkaloids in a non-transgenic potato variety led to intoxication upon consumption, as did cyanogenic compound levels in a non-transgenic lima bean variety [6,33]. For this reason, new potato and lima bean varieties, whether transgenic or non-transgenic, are routinely tested for these compounds before commercial release. However, for most compositional constituents, unsafe levels are not known to exist in food crops. Thus, in the vast majority of cases, every crop variety is considered safe. This is analogous to clinical medicine, in which a population of healthy individuals may be used to construct intervals for evaluating the test results of an individual patient [32]. We can therefore look to this example for insight into how intervals describing a safe/normal population should be constructed.
There are three common types of statistical intervals: confidence, prediction, and tolerance [34].
1. Confidence limits describe the interval of certainty around a mean value (or the mean difference between two groups). In the bioequivalence example above, if a new generic drug produced results within the confidence limits for a standard brand-name drug, one would not conclude that the two differ. These limits are sometimes treated as equivalence limits, and falling within them is sometimes asserted to demonstrate equivalence [31]. This approach has merit where two treatments are being tested for equivalent mean responses (or a zero mean difference between treatments) and/or variability.
2. Prediction intervals describe the range within which a new observation from the same population is expected to fall with a specified probability. This measure is rarely used for evaluating equivalence.
3. Tolerance limits describe the interval that is expected to contain a specified proportion of the population with a specified level of certainty. For example, one can calculate a tolerance interval that is expected to contain at least 99% of the population with 95% certainty. Whereas a confidence interval approaches zero width as the sample size increases to infinity (converging on the true population mean), tolerance limits converge on the values that bound the specified proportion of the population. Tolerance limits are used in a number of fields, including clinical medicine, to evaluate whether a new response is normal for a healthy individual. For example, if a patient is tested for a cancer marker, the result might be compared with tolerance intervals generated from a population of healthy individuals [35]. Results outside the specified tolerance interval indicate that further diagnostics should be conducted.
Of these, tolerance intervals are the most appropriate for evaluating whether or not a transgenic variety is within the normal range for commercial varieties of the same crop.
One concern with calculating reliable intervals, including tolerance intervals, is obtaining a sufficiently large sample size. If the sample size used to generate a tolerance interval is too small, the interval will be much wider than the specified coverage requires. This is a consequence of the desired certainty level, and is similar to confidence intervals being very wide for small sample sizes. The problem is especially acute for intervals designed to cover a large proportion of the population with a high degree of certainty: if the sample size is too low, the tolerance interval will be too wide to be of practical value [36]. The appropriate sample size for constructing tolerance intervals also depends on the assumed underlying distribution of the data. For a normal distribution, a minimum of 120 points is recommended for a 95%-coverage, 90%-certainty tolerance interval [32]. More points would be needed for a useful 99%-coverage, 95%-certainty tolerance interval. When estimating high-coverage tolerance intervals, accurately defining the underlying data distribution is especially important, because the intervals are very sensitive to deviations from the assumed distribution in its tails, where few (if any) points are available [37]. An alternative approach is to calculate distribution-free tolerance intervals. Such intervals make no assumption about the shape of the distribution, but they require large sample sizes. For example, a minimum of 473 data points is needed to calculate a 99%-coverage, 95%-certainty, distribution-free tolerance interval [34,38].
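The figure of 473 follows from a classical distribution-free result (due to Wilks) for the interval spanned by the smallest and largest of n independent observations from a continuous distribution: the certainty that this range covers at least a proportion p of the population is 1 - n p^(n-1) + (n-1) p^n. A minimal sketch, assuming independent observations, searches for the smallest n that meets a requested coverage and certainty:

```python
def range_certainty(n: int, coverage: float) -> float:
    """Certainty that [min, max] of n independent observations from a
    continuous distribution contains at least `coverage` of the population."""
    p = coverage
    return 1.0 - n * p ** (n - 1) + (n - 1) * p ** n


def min_sample_size(coverage: float, certainty: float) -> int:
    """Smallest n whose sample range is a distribution-free tolerance interval
    with the requested coverage and certainty (certainty rises with n)."""
    n = 2
    while range_certainty(n, coverage) < certainty:
        n += 1
    return n


print(min_sample_size(0.99, 0.95))  # 473, the minimum quoted in the text
```

Run the other way, `range_certainty(N, 0.99)` gives the certainty attached to the observed range when fewer than 473 observations are available.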
The International Life Sciences Institute (ILSI) has compiled a large database of compositional results for many non-transgenic varieties of a few widely planted crops [39]. This resource provides an opportunity to calculate valid high-coverage tolerance intervals for many compositional components found in these crops. Here we present these tolerance intervals for corn, cotton, and soybean, and discuss the merits of using these intervals to evaluate the substantial equivalence of transgenic crops compared with conventional crop varieties.

Tolerance intervals
Distribution-free tolerance intervals (99%-coverage, 95%-certainty) for various compositional components of corn seed that are available in the ILSI crop-composition database are compiled in Table 1, along with the sample sizes used to construct them. Sample sizes for some components of corn (Table 1) and all components of cotton (Table 2) and soybean (Table 3) were less than the minimum needed to calculate 99%-coverage, 95%-certainty tolerance intervals (N < 473). For these components, distribution-free methods were instead used to calculate the certainty with which the range of the corn, cotton, and soybean data captures 99% of the population. The range of the data is then the 99%-coverage, distribution-free tolerance interval at that calculated certainty. For example, the 99%-coverage, 83.4%-certainty, distribution-free tolerance interval for ash in soybean seed is 3.89-6.99% dry weight (Table 3, row 1). These intervals should conservatively capture the safe levels of these compositional components in the seeds of these crops, since 100% of commercial corn, cotton, and soybean varieties are considered compositionally safe. The certainty with which each tolerance interval captures 99% of the population provides a measure of the robustness of the interval. As described earlier, the approach of using tolerance intervals to describe the range of response variables expected when testing a healthy population has precedent in clinical medicine [32,35,43]. This approach has also been used to compare the compositional and nutritional equivalence of transgenic crops with populations of non-transgenic crops [9,10,12,19,21,24,25,35,39,44,45]. However, large sample sizes are required to calculate tolerance intervals that are not so wide as to be of little practical value [36].
Furthermore, the construction of tolerance intervals is very sensitive to deviations from the assumed distribution, especially for high-coverage, high-certainty intervals like those typically constructed [37]. For this reason, we used the publicly available data in the ILSI crop-composition database to construct useful 99%-coverage, 95%-certainty, distribution-free tolerance intervals where possible (N ≥ 473). Where the sample size was insufficient to calculate 99%-coverage tolerance intervals with 95% certainty, we calculated the certainty with which the range of the data covers at least 99% of the population.
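Both cases can be handled by one routine. A standard order-statistic result states that the interval from the r-th smallest to the s-th largest of n observations covers at least a proportion p of a continuous population with certainty P(Binomial(n, p) ≤ n - r - s), which reduces to the range formula when r = s = 1. The sketch below uses that result to trim as many extreme observations as the target certainty allows, or, when N is too small, to return the range with the certainty it actually achieves. Whether Tables 1-3 trimmed order statistics in this way is not stated in this section, and splitting the trimmed points evenly between the two tails is an assumption of the sketch.

```python
from math import exp, lgamma, log


def order_stat_certainty(n: int, m: int, coverage: float = 0.99) -> float:
    """Certainty that the interval from the r-th smallest to the s-th largest
    of n observations (r + s = m) covers at least `coverage` of a continuous
    population, i.e. P(Binomial(n, coverage) <= n - m)."""
    p, q = coverage, 1.0 - coverage
    upper_tail = 0.0
    # Sum P(Binomial = n - j) for j = 0..m-1 in log space to avoid overflow.
    for j in range(m):
        log_term = (lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
                    + (n - j) * log(p) + j * log(q))
        upper_tail += exp(log_term)
    return 1.0 - upper_tail


def distribution_free_interval(values, coverage=0.99, target=0.95):
    """Return (lower, upper, certainty). If even the full range misses
    `target`, the range is returned with the certainty it does achieve."""
    x = sorted(values)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    m = 2  # start from the full range: r = s = 1
    if order_stat_certainty(n, m, coverage) >= target:
        # Trim extreme observations while the target certainty is still met.
        while order_stat_certainty(n, m + 1, coverage) >= target:
            m += 1
    r = m // 2
    s = m - r  # assumption: split evenly, any odd point trimmed from the top
    return x[r - 1], x[n - s], order_stat_certainty(n, m, coverage)
```

With fewer than 473 observations this returns the sample range and its certainty below 95%; with many more, it returns a narrower interval whose certainty still meets the target.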
We used all data for each analyte in the ILSI crop-composition database and did not segregate the data by the analytical method used to determine the compositional component, or by the laboratory that analyzed the samples. Only validated, comparable methods should be used to determine the composition of crops if such data are to be used for a safety assessment. This validation process should include spike-recovery experiments demonstrating that the method recovers an adequate proportion of the analyte, and each laboratory should validate its ability to carry out the analyses successfully. The methods used to determine the compositional analytes compiled in the ILSI crop-composition database are accepted, validated, and independently developed [39]. Furthermore, the acceptance criteria required by ILSI for including data are stringent, and potential outliers are confirmed as valid before entry into the database [39]. As such, all data in the database, regardless of the analytical method, should be comparable.
However, some data in the ILSI crop-composition database may display a bimodal distribution correlated with the method of analysis (e.g., vitamin B1). In these cases, we recommend that single homogeneous subsamples of plant tissue be sent to the different laboratories conducting the analyses in question, along with additional subsamples that have been fortified with a well-characterized purified preparation of the analyte. The characterization of the purified standard should include an absolute purity estimate based on the best methods available and, if possible, be verified by an additional analytical method. If the laboratories obtain equivalent results for these samples, then the distribution of values in the database may represent true differences in analyte concentrations between the germplasm sources sent to each laboratory. If the laboratories obtain different results, then subtracting the result for the unfortified subsample from that of the fortified subsample will reveal the accuracy of each laboratory or method. Finally, if both laboratories accurately recover the known amount of fortification but their results for the unfortified samples differ, the laboratory with the lower results for the unfortified samples may be using an inferior extraction method. Since the units associated with analytes in the database are absolute, substantially inaccurate results should be removed from the database, or the units changed to an index scale. It is also important to confirm that subtle differences in the actual analyte being measured are not causing the differences. If they are, then more explicit analyte names should replace the current names to segregate the data more clearly.
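The fortification check described above comes down to a recovery calculation: the fortified result minus the unfortified result, divided by the amount added. In the sketch below, the function names, the example values (arbitrary units), and the 80-120% acceptance window are hypothetical placeholders; appropriate limits would come from the validated method.

```python
def spike_recovery_percent(unfortified: float, fortified: float,
                           amount_added: float) -> float:
    """Percent of a known fortification recovered by a laboratory.
    All arguments must share the same units."""
    if amount_added <= 0:
        raise ValueError("amount_added must be positive")
    return 100.0 * (fortified - unfortified) / amount_added


def compare_labs(results, amount_added, window=(80.0, 120.0)):
    """results maps a laboratory name to (unfortified, fortified) values.
    Returns each lab's recovery, whether it lies in the (hypothetical)
    acceptance window, and its unfortified value for side-by-side review."""
    summary = {}
    for lab, (unfortified, fortified) in results.items():
        rec = spike_recovery_percent(unfortified, fortified, amount_added)
        summary[lab] = {
            "recovery_percent": rec,
            "accurate": window[0] <= rec <= window[1],
            "unfortified": unfortified,
        }
    return summary


# Hypothetical results from two laboratories on identical subsamples.
report = compare_labs({"lab_A": (0.30, 0.80), "lab_B": (0.20, 0.69)},
                      amount_added=0.50)
for lab, row in report.items():
    print(lab, round(row["recovery_percent"], 1), row["accurate"], row["unfortified"])
```

Here both hypothetical laboratories recover the spike well, yet lab_B reports less analyte in the unfortified tissue, which is the pattern the text associates with a possibly inferior extraction method.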
We also included all geographies and growing seasons in our datasets, since the compositional safety of corn, cotton, and soybean is not known to be compromised when these crops are grown in any geography or environment, whether consumed locally or in other regions. This differs from the model sometimes used in clinical medicine where tolerance limits may be generated regionally, because results for a healthy subpopulation in one region may indicate disease in another subpopulation in a different region. As indicated above, this does not apply to crop varieties that are safely grown and consumed worldwide.
In addition to the advantages previously described for distribution-free tolerance intervals, two further points deserve mention. Many of the data in the ILSI crop-composition database were collected from replicated field trials like those used to evaluate transgenic lines. Therefore, any bias in the sampling of such data should be roughly equivalent between these two groups of data, making the data distributions similar between groups. However, it must be acknowledged that, as in previous studies in the clinical field and previous evaluations of the substantial equivalence of transgenic crops [9,10,12,19,21,24,25,35,39,44,45], sample results may not represent truly random independent samples, and correlations within the samples likely exist. In theory, such correlations narrow the calculated tolerance intervals so that they do not span the designated coverage. As such, the tolerance intervals reported here may be less conservative than those generated from truly random samples. Furthermore, when comparing samples collected from specific field trials with the tolerance intervals reported here, it is important to examine their distribution to check the assumption that both datasets are distributed similarly.
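One way to examine whether field-trial samples are distributed similarly to the reference data is a two-sample Kolmogorov-Smirnov comparison; this is a swapped-in illustration, not a procedure prescribed by the text. The sketch computes the largest gap between the two empirical cumulative distribution functions and compares it with the usual large-sample critical value c(alpha) * sqrt((n + m) / (n * m)), where c(alpha) = sqrt(-ln(alpha / 2) / 2). Like the tolerance intervals themselves, the test assumes independent observations, which, as noted above, may not hold.

```python
from bisect import bisect_right
from math import log, sqrt


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = 0.0
    # Both ECDFs are step functions that jump only at observed values,
    # so the supremum is attained at one of the pooled data points.
    for x in a + b:
        d = max(d, abs(bisect_right(a, x) / n - bisect_right(b, x) / m))
    return d


def distributions_differ(a, b, alpha=0.05):
    """Large-sample KS check: True if the statistic exceeds the asymptotic
    critical value at significance level alpha."""
    n, m = len(a), len(b)
    c_alpha = sqrt(-log(alpha / 2) / 2)  # about 1.36 for alpha = 0.05
    return ks_statistic(a, b) > c_alpha * sqrt((n + m) / (n * m))
```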
An additional advantage of tolerance intervals is simplicity of interpretation. The coverage of a tolerance interval, and the degree of certainty attached to it, are easy to understand. Working in the natural units of analyte concentration, as opposed to transformed units that might be applied to normalize datasets for a parametric analysis, also simplifies interpretation. Analyte concentrations can be compared directly with literature pertinent to their safety or nutrition without back-calculating transformed values. Finally, results below the limit of detection or quantification do not need to be censored or assigned "dummy" values for reporting, because distribution-free tolerance intervals are based on the ranks of the responses, not on the responses themselves. This further simplifies analysis and interpretation of results.
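Because a distribution-free interval depends only on ranks, a result reported as below the limit of detection (LOD) can stay in the dataset as a value known only to be smaller than every quantified result. The sketch below assumes a single LOD shared by all results, so that every below-LOD result ranks beneath every quantified one; mixed LODs across laboratories would break that ordering. Representing below-LOD results as `None` is a hypothetical choice.

```python
def rank_endpoints(values, lod, r=1, s=1):
    """Return the r-th smallest and s-th largest results for reporting.

    values: measured concentrations, with None marking results below a
    single common limit of detection `lod`. Below-LOD results are never
    assigned numbers; an endpoint that falls on one is reported as "<lod".
    """
    n_below = sum(1 for v in values if v is None)
    quantified = sorted(v for v in values if v is not None)
    n = n_below + len(quantified)
    if not 1 <= r <= n + 1 - s <= n:
        raise ValueError("ranks r and s do not fit the sample size")

    def at_rank(k):  # k is a 1-based rank in the pooled ordering
        return f"<{lod}" if k <= n_below else quantified[k - n_below - 1]

    return at_rank(r), at_rank(n + 1 - s)


# Hypothetical data: two results below an LOD of 0.5.
print(rank_endpoints([None, 1.2, None, 3.4, 2.0], lod=0.5))  # ('<0.5', 3.4)
```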
Here we have applied techniques analogous to those used in clinical medicine to estimate the normal range of analytes in several non-transgenic crops. The concept of substantial equivalence has been used in clinical medicine for much longer than it has been applied to transgenic crops, so it seems natural to make use of the progress made there. However, it is noteworthy that tolerance intervals are used in the medical field because it is not possible to ensure that the reference population contains 100% healthy individuals. A limited-coverage tolerance interval excludes a small proportion of subjects, so potentially diseased individuals are excluded from what is considered normal. For many crops, such as corn, cotton, and soybean, no unsafe crop varieties are known. As such, the use of 99%-coverage tolerance intervals to assess safe levels of crop components may be unnecessarily conservative. For example, if the US produces 10 billion bushels of corn in a given year and 1% of this crop were considered to be of questionable safety based on composition, that would imply the production of 100 million bushels of potentially unsafe corn in the US every year. Unless invalid data are present in the reference database, all determined levels should be safe, and the range of the data is an appropriately conservative interval to use for assessing safety.
In reality, the range of the data in the ILSI database may also be too conservative, because many varieties are not represented in this database, and studies with some non-transgenic varieties indicate that their composition frequently falls outside this range [23]. In addition, the sample sizes for many crop-analyte combinations are not large enough to capture an adequate cross-section of the expected variability across all varieties. For these reasons, compositional equivalence studies typically include a concurrently grown non-transgenic, near-isogenic line, and sometimes various other commercial reference lines. Such lines can be used to supplement the range of responses found in the ILSI database. To evaluate substantial equivalence, the composition of samples collected from transgenic varieties can be compared with intervals constructed from the values tabulated in the ILSI crop-composition database and from these concurrently grown reference lines. Traditional analysis of variance approaches comparing concurrently grown controls with transgenic lines may also be useful in assessing whether varietal differences are statistically significant. Finally, the safety consequences of any differences will need to be assessed in the context of biological impact. It is important to understand that compositional equivalence studies with transgenic crops are typically conducted to inform the safety assessment, not to detect minor changes from the non-transgenic isogenic line. While such changes may be of academic interest, they do not suggest a safety risk if compositional components are within the normal range for a crop that is safely consumed regardless of variety.
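One simple way to supplement a database interval with concurrently grown reference lines, as described above, is to widen the interval to include the reference-line results and then flag any transgenic-line values that fall outside it. This is a hypothetical sketch: the function name and the test and reference values are illustrative, and only the 3.89-6.99% dry-weight soybean-ash interval is taken from Table 3.

```python
def combined_reference_check(test_values, database_interval, reference_values):
    """Widen a database-derived interval to include concurrently grown
    reference-line results, then list test values outside the widened range.

    database_interval: (lower, upper) in the same units as the other inputs.
    """
    lower = min([database_interval[0], *reference_values])
    upper = max([database_interval[1], *reference_values])
    outside = [v for v in test_values if not lower <= v <= upper]
    return (lower, upper), outside


# Soybean ash interval from Table 3 (% dry weight); other values are hypothetical.
interval, flagged = combined_reference_check(
    test_values=[4.6, 5.1, 7.4],
    database_interval=(3.89, 6.99),
    reference_values=[5.0, 7.2],
)
print(interval, flagged)  # (3.89, 7.2) [7.4]
```

A flagged value is a prompt for assessment in the context of biological impact, not a safety finding in itself.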
It is also noteworthy that perfect isogenic lines are never actually available because native genes closely linked to the transgenic traits will always be present in the transgenic line at higher frequencies than in the non-transgenic isogenic control, and these genes will likely result in some compositional differences between the transgenic line and the non-transgenic isogenic line. However, this phenomenon is likely more dramatic when polygenic traits are selected in traditional breeding programs with non-transgenic crops.

Value of compositional analyses in safety assessment
It is unclear how the insertion of a novel gene would disrupt the genome and cause an unsafe perturbation of composition in a manner fundamentally different from that experienced during traditional breeding or mutagenesis, and the current literature supports the concept that agronomically acceptable varieties containing transgenic insect-resistance and herbicide-tolerance genes are not particularly prone to such changes. A long history of crop improvement, with very few adverse health effects, suggests that our current food crops are not generally prone to upregulation of detrimental constituents. In fact, this attribute of these plant species likely contributed to their selection and persistence as food crops. Furthermore, agronomic "off-types" are culled from any breeding program, whether traditional or transgenic. Compositional analysis is nonetheless required by most governments for approval of transgenic plants, but not for non-transgenic crop varieties.
Regulation of non-transgenic crop composition was attempted in the early 1970s, when the FDA enacted similar but much less aggressive regulation in the area of traditional crop breeding [46]. However, the regulations were impractical and unenforceable, and today they have been largely forgotten. In addition, novel food regulations are in place in several geographies, but these do not generally extend to new non-transgenic varieties of food crops unless they are expressly bred to have an altered composition [47].
Compositional equivalence studies have added little to the safety assessment of currently available transgenic crops, since unsafe levels of compositional components have not been identified. However, new transgenic crops that are expressly intended to have modified composition are in development. While the likelihood of altering the safety of transgenic crops through DNA-insertional effects may be lower than for traditional breeding [16,18,20,22,39], the safety assessment of new transgenic crops bearing traits intended to alter endogenous metabolic pathways may be aided by hypothesis-driven compositional analyses.

Concluding remarks
Tolerance intervals constructed from appropriate sample sizes, and covering many varieties and environments, represent a valid statistical approach for assessing the composition of transgenic crops in relation to their conventional counterparts. For crops that are not known to contain unsafe levels of compositional components, the range of compositional data for commercially available varieties is an adequately conservative safety interval. This approach has the most value for assessing the safety of traits intended to alter endogenous metabolic pathways in plants; compositional analysis for input traits is generally not warranted. To support these methods, the continued submission of quality data to the ILSI crop-composition database is strongly encouraged, especially where the sample sizes are insufficient to calculate distribution-free 99%-coverage, 95%-certainty tolerance intervals (N < 473).
Statistical approaches to data analysis are almost universally required when reporting data to regulatory agencies or in peer-reviewed journals. Here we describe the application of a statistical approach used in clinical medicine to the evaluation of substantial equivalence between transgenic and non-transgenic crops, and suggest that the greater experience in clinical medicine should make this model the standard against which other approaches are compared. We describe the methods used to construct 99%-coverage, 95%-certainty tolerance intervals, and also how to determine the certainty that the range of data for non-transgenic crops covers 99% of the population. Both types of tolerance intervals should be useful in meeting the need to present statistical measures of compositional equivalence to support the safety assessment of transgenic crops, and Tables 1-3 should be a handy resource for comparing the composition of new transgenic corn, soybean, and cotton varieties with conventional comparators. While beyond the scope of this publication, the methods and intervals reported here can be compared with those reported elsewhere using alternative methods.