Genomic regions under selection in crop-wild hybrids of lettuce: implications for crop breeding and environmental risk assessment

Genomic selection patterns and hybrid performance influence the chance that crop (trans)genes can spread to wild relatives, which may have implications for the methodology of Environmental Risk Assessment (ERA). We performed QTL analyses on fitness(-related) traits in two different field environments and estimated the fitness distribution of early- and late-generation hybrids relative to the wild parent. We used two different lettuce crossing populations: Backcross (BC1) lines from a cross between a Dutch Lactuca serriola and the cultivar L. sativa cv. Dynamite, and a Recombinant Inbred Line (RIL) population from a cross between the cultivar L. sativa cv. Salinas and a Californian L. serriola. We detected consistent results across field sites and crosses for a fitness QTL at linkage group 7, where the wild allele conferred a selective advantage through early flowering. Other fitness QTL detected across field sites were located on linkage group 6 for the BC1 lines, with the wild allele conferring a selective advantage, and on linkage group 5 for the RILs, with the crop allele conferring a selective advantage. The average fitness of the hybrid offspring was lower than the fitness of the wild parent, but several individual BC1 lines and RILs outperformed the wild parent, especially at the site with a clay soil, which is not a common habitat for L. serriola. For the BC1 lines, this may partly be due to heterosis effects, whereas in the homozygous RILs transgressive segregation played a major role. These results show that the study of genomic selection patterns can identify crop genomic regions that are under negative selection in multiple environments and crop-wild crosses, and such regions might be applicable in transgene mitigation strategies. At the same time, results were cultivar specific, so that implementation in ERA will need to be on a case-by-case basis, which decreases its general applicability.
Importantly, it is more informative to identify specific genomic regions under selection than average hybrid fitness, because there is a high chance that some transgressive phenotypes will outperform the wild parent, even if the average fitness of the hybrid offspring is lower.


Introduction
The chance that crop alleles introgress into wild relatives depends strongly on genetic and environmental selection patterns (Barton 2001; Stewart et al. 2003). For crop alleles to become permanently established in the wild population after single hybridization events, hybrid genotypes should confer a selective advantage in a particular environment (Burke and Arnold 2001; Rieseberg et al. 2007). Such patterns that affect the outcome of hybridization are not only interesting from a theoretical point of view (Rieseberg et al. 2000; Burke and Arnold 2001; Burger et al. 2008), but are also of high interest to Environmental Risk Assessment (ERA) of transgenic crop species, given the potential of crop-wild hybrids to outperform the wild parent (Schierenbeck and Ellstrand 2009).
Introgression of crop genes into a recipient population starts with F1 hybrids, with equal contributions of crop and wild genomes, genome-wide heterozygosity, and strong linkage disequilibrium (LD). In subsequent generations, a range of new genotypes is formed as a result of recombination and segregation in meiosis and the creation of new individuals by outcrossing or selfing (Stewart et al. 2003; Kwit et al. 2011). However, since the genetic background changes rapidly in the first phases of the introgression process, selection patterns may differ between early- and late-generation hybrids, as well as among individual plants within a certain category of hybrids (Barton 2001). For ERA, it is of specific interest to what extent genomic selection patterns can be generalized across different cultivars and whether the performance of hybrids differs between early and late generations and between environments (EFSA 2011).
The performance of crop-wild hybrids can differ depending on the cultivar and wild parental lines used to produce specific crosses. In experiments employing crop-wild hybrids from several crosses with different parental lines, variation was found in life history and fitness traits, such as germination, seed production, and survival, between different crossing populations in oilseed rape (Hauser et al. 1998a, b), sunflower (Mercer et al. 2006; Snow et al. 1998), and sorghum (Muraya et al. 2012). These differences in fitness response might also imply that selection acts on different regions in the genome. Recently, Quantitative Trait Loci (QTL) analysis on fitness characteristics measured in field trials has been used to identify genomic regions under selection in crop-wild hybrids (Baack et al. 2008; Dechaine et al. 2009; Hartman et al. 2012), but little is known of how differences in life history and fitness traits between different cultivar-wild type crosses translate to differences in genomic selection patterns. With the production of high-density integrated and consensus maps, it becomes possible to compare QTL results between different cultivar-wild type crosses (Danan et al. 2011; Hund et al. 2011; Swamy and Sarla 2011).
After a single hybridization event, several processes play a role: hitchhiking effects because of linkage drag, heterosis, epistasis, and transgressive segregation interact to determine hybrid fitness (Stewart et al. 2003; Johansen-Morris and Latta 2006) and so influence the introgression chances of crop alleles. Epistasis is thought to contribute mainly to hybrid breakdown through the disruption of co-adapted gene complexes (Rieseberg et al. 2000), while heterosis and transgressive segregation can contribute to an increase in the performance of some hybrid lines relative to the wild parent (Burke and Arnold 2001). Hence, we focus on the latter processes in this study (but see Uwimana et al. (2012b) for a study on epistasis in lettuce) and we use two distinct hybrid generations: early-generation backcross (BC) lines, in which heterosis and transgression effects can occur, and Recombinant Inbred Lines (RILs), with only transgressive effects.
Heterosis is most pronounced in early-generation hybrids, especially after hybridization between closely related species or inbred lines (Rieseberg et al. 2000), because of high levels of heterozygosity. Heterosis may be due to dominance (masking of deleterious alleles), overdominance (single-locus heterosis), and epistasis (enhanced performance of traits derived from different lineages due to non-additive interactions of QTL) effects (Rieseberg et al. 2000). It has been found many times in plants (Rhode and Cruzan 2005; Thiemann et al. 2009; Krieger et al. 2010; Muraya et al. 2011), animals (Hedgecock et al. 1995), and insects (Bijlsma et al. 2010).
Transgressive phenotypes include hybrid plants that exceed the parental phenotype in a negative or a positive direction (Rieseberg et al. 2000). Transgressive phenotypes arise if parental species contain alleles with opposing effects, where some lines derive the positively contributing alleles from both parents and others derive the negatively contributing alleles, leading to hybrid genotypes that are more extreme than the parental lines (Lynch and Walsh 1998). In a review of 171 studies on segregating plant and animal hybrids, Rieseberg et al. (1999) showed that in 155 studies at least one transgressive trait was reported and that 44% of 1229 traits examined were transgressive. These studies show that both heterosis and transgressive segregation are widespread phenomena in hybridizing species (Rieseberg et al. 1999, 2003), suggesting that there is a high likelihood that at least some crop-wild hybrids have an increased fitness compared to the wild relative in a given environment (Johansen-Morris and Latta 2006; Latta et al. 2007). Therefore, rather than estimating average hybrid fitness, it is necessary to view the entire fitness distribution of the hybrid lineages and identify how many individual hybrid lineages outperform the wild relative and when.
In addition to the potentially different response of hybrids from different parental lines, or from early and late generations, hybrid performance is also subject to Genotype × Environment (G × E) interactions (Barton 2001; Hails and Morley 2005). For example, several QTL studies that compared hybrid performance between greenhouse and field environments have shown that different traits and loci were favored because of different selection pressures (Weinig et al. 2002; Martin et al. 2006; Latta et al. 2007; Hartman et al. 2012). Similarly, hybrid fitness selection patterns differ across different natural environments (Weinig et al. 2003) and as a consequence of varying stresses, such as competition (Mercer et al. 2007). This suggests that hybrid fitness might be weakly correlated across divergent environments (Latta et al. 2007) and that, as a result of these G × E interactions, different hybrid lineages, and consequently alleles, might be selected for in different environments (Mercer et al. 2007). Moreover, hybridization between two wild parental species can lead to the colonization of new habitats previously unavailable to either of the parental species (Lexer et al. 2003; Rieseberg et al. 2007). Therefore, the hybrid fitness distributions of different types of crosses and generations should also be considered in different environments, including the original wild habitat and novel environments, as we have done in this study.
We use the crop lettuce (Lactuca sativa L.), a leafy vegetable, and its wild relative prickly lettuce (L. serriola L.) as a crop-wild model system. These species are fully cross-compatible and interfertile without any crossing barriers (Kesseli et al. 1991; Koopman et al. 2001). A recent study suggested that a substantial part of wild L. serriola plants in Europe (7%) show evidence of previous introgression of alleles from L. sativa (Uwimana et al. 2012a). In addition, in a series of field experiments, it was demonstrated that at least four generations of hybrids on average had higher germination and survival rates than the wild parent (Hooftman et al. 2005, 2007, 2009), and that part of the crop genome was selectively advantageous, leading to skewed crop-wild allele distributions (Hooftman et al. 2011). Although it is often assumed that crop alleles confer negative fitness effects in the wild habitat (Stewart et al. 2003), this suggests that in lettuce parts of the crop genomic background contribute to higher hybrid fitness and, therefore, potentially to the transfer of crop alleles to the wild population.
We used two hybrid generations originating from different parental lines: early-generation Backcross (BC) lines and late-generation Recombinant Inbred Lines (RILs). We employed these hybrid lineages and their parents in a location with sandy soil, which is similar to the natural habitat in which L. serriola occurs, and one with clay soil, which can be considered a novel habitat given the current distribution of L. serriola (Hooftman et al. 2006). For RILs, we already identified two genomic regions under selection, one where the crop genomic background was selectively beneficial and one where the wild genomic background was selectively beneficial (Hartman et al. 2012). In this study, we extend this analysis to BC lines and, in addition, study the performance of individual hybrid lineages for both crossing types. This design allowed us to study differences in genomic selection patterns between different lettuce cultivar-wild crosses, hybrid performance in early- and late-generation hybrids, and environmental influence on hybrid fitness distributions. Specifically, we address the following questions: (i) Which crop genomic regions are under positive or negative selection, and are these similar or different between the BC and RIL crossing populations? (ii) Do the crop-wild hybrid populations differ in their fitness distribution and do they include hybrid lineages that perform better than the wild parent? (iii) Are there environment-specific effects on the fitness distributions? In particular, is there an indication that introgression is more likely to occur in a novel habitat compared to the original habitat of the wild relative? Finally, we discuss the likelihood of crop gene transfer to the wild relative and the implications for environmental risk assessment procedures.

Plant material
We used two different lettuce crop-wild crosses. The first comprised 98 lines of an existing Recombinant Inbred Line (RIL) population (selfed for nine generations) derived from a cross between the cultivar Lactuca sativa cv. Salinas (Crisphead) and Californian L. serriola (UC96US23; Johnson et al. 2000; Argyris et al. 2005; Zhang et al. 2007). The second comprised 98 Backcross lines selfed for one generation (BC1S1) from a cross between the cultivar L. sativa cv. Dynamite (Butterhead) and a L. serriola collected near the town of Eys, The Netherlands (a common genotype in NW Europe, designated cont83 in van de Wiel et al. 2010; further referred to as L. serriola (Eys)).
Lactuca sativa was used as the pollen donor to mimic a hybridization event due to pollen flow from the crop to a neighbouring wild population. The F1 hybrid plant was subsequently backcrossed to the wild-type, creating a BC1 generation, and each BC1 was then selfed to create a BC1S1 population. Crossing followed the protocols by Nagata (1992) and Ryder (1999) and is described in detail in Hooftman et al. (2005). Note that BC1 individuals were genotyped, whereas the BC1S1 were used in the experiments (see below).
Both wild parents used in the crosses, L. serriola, have long serrate leaves that contain white, bitter latex. Plants have up to 2 mm long spines on the stem base and on downside leaf midribs. The wild-type produces a rosette instead of the head formed by several crop-types; furthermore, it bolts and flowers early and can develop many basal and cauline reproductive shoots. Capitula (flower heads) produce approximately 15-20 florets that develop into brown single-seeded achenes (for brevity further referred to as seeds). When seeds are ripe, the involucral bracts become reflexed (van der Meijden 1996). Lactuca serriola mainly occurs in ruderal habitats, such as roadsides, railways, and construction sites. It is an annual species that flowers in July-August and survives winter as seed, but sometimes as small rosettes (Y. Hartman, personal observation). Lettuce is a predominantly selfing species, but up to 5% outcrossing rates via insect pollination have been reported (D'Andrea et al. 2008; Giannino et al. 2008). In contrast, the crop-types of L. sativa used in this study have broad, almost circular leaves, without any spines or latex content, and develop a head without any basal side shoots. The cultivar group of Crisphead typically develops a very dense head (de Vries 1997) and develops brown seeds, whereas the Butterheads develop a relatively loose head and white seeds. Both cultivars have erect involucral bracts when seeds are ripe, most likely selected for to prevent seed shattering (de Vries 1997).

Field design and traits measured
We selected two field sites with contrasting environments. The first site, Sijbekarspel (SB), the Netherlands (N52°42', E04°58'), had a nutrient-rich clay soil with high water retention, mimicking agricultural conditions. The second site, Wageningen (WG), the Netherlands (N51°59', E05°39'), had a nutrient-poor, dry, sandy soil, more similar to the natural habitat of L. serriola. In SB, environmental data were obtained with a data logger measuring temperature and humidity levels. In WG, daily temperature and rainfall were obtained from the Haarweg weather station approximately 1 km from the field (www.met.wau.nl).
For a detailed description of the field design, we refer to Hartman et al. (2012). In short, ninety-eight RILs, ninety-eight BC1S1 families, and all parent lines were grown in a randomized block design at the two sites. To follow the entire life cycle, the experiment lasted until the end of October. During the life cycle, we measured several fitness-related traits (Table 1).
Germination and initial establishment were measured 4 weeks after sowing. We collected two individuals per square for biomass measurements 7 weeks after sowing. One week later, we did a thinning round so that one individual was left per square for measurements in the adult stage. We recorded the flowering date and, at seed set, we counted the number of basal reproductive side shoots, the number of branches of the main stem, and the total number of seeds in ten capitula to calculate the average number of seeds per capitulum. Subsequently, we estimated the total number of capitula from the number of branches and shoots following Hooftman et al. (2005, see Appendix 1), and the seed output of a reproductive plant as the product of the number of capitula and the average number of seeds per capitulum. Survival was scored as a binary trait with 1 for survival until seed production and 0 for individuals that either died before seed set or did not complete their life cycle before the end of the growing season. Survival rate was subsequently calculated as the proportion of seed-producing plants per line. Finally, seeds produced per seed sown (SPSS) was used as the 'main fitness trait', because it is the trait most directly associated with the life-cycle fitness of the different lines, and was calculated as:
SPSS = Germination rate × Survival × Estimated seed output per reproductive plant (1)
Note that the calculation of SPSS differs slightly from that in Hartman et al. (2012), where we used the average survival rate per line to calculate SPSS for each square, whereas here we used survival per square (i.e., either 0 or 1).
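Equation (1) and the seed output estimate can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis script: the function names and the example numbers are hypothetical, and the capitula count is taken as given because its estimation from branches and shoots (Hooftman et al. 2005) is not reproduced here.

```python
def seed_output(n_capitula, seeds_per_capitulum):
    """Seed output of one reproductive plant: capitula x mean seeds per capitulum."""
    return n_capitula * seeds_per_capitulum


def spss(germination_rate, survived, output):
    """Seeds produced per seed sown for one square, following equation (1).

    germination_rate: proportion in [0, 1]
    survived: 1 if the plant survived until seed production, else 0
    output: estimated seed output of the reproductive plant
    """
    if not 0.0 <= germination_rate <= 1.0:
        raise ValueError("germination_rate must be a proportion")
    if survived not in (0, 1):
        raise ValueError("survived must be 0 or 1")
    return germination_rate * survived * output


# Hypothetical square: 80% germination, plant survived, 500 capitula, 18 seeds each.
example = spss(0.8, 1, seed_output(500, 18))
```

Because survival is binary per square, any square whose plant failed to reproduce contributes an SPSS of zero regardless of its germination rate.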

Statistical analysis
All statistical analyses were performed in PASW Statistics 17.0 (SPSS Inc. 2009). To improve normality, all traits were transformed, with the exception of number of seeds per capitulum, as it was already normally distributed. Germination and survival rates were expressed as proportional data and arcsine-square-root-transformed. Biomass, number of reproductive basal shoots, number of branches, and total number of capitula were log-transformed. Seed output and SPSS were square-root-transformed. We estimated the mean, standard deviation, broad-sense heritability, and selection differential for each trait separately. Selection differentials were calculated as the covariance between the main fitness trait, SPSS, and the separate trait values, using the 12 data points per RIL or BC line (one per square) as replicates. Broad-sense heritability values (H^2) were estimated as the proportion of the total variance accounted for by the genetic variance using the formula:
H^2 = Vg / (Vg + Ve)
where Vg is the genetic variance and Ve is the environmental variance. Vg and Ve were inferred from between- and within-line variance components extracted with procedure VARCOMP (SPSS Inc. 2009). Heritability values of family means (Hf^2) were estimated using the following formula (Chahal and Gosal 2002):
Hf^2 = Vg / (Vg + Ve / n)
where n is the average number of replications for a certain trait (Table 2). The latter value indicates how well the family mean estimate resembles the true genetic value, given the number of replicates used, and is therefore important for the power of the QTL analyses.
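The transformation and heritability steps can be illustrated with a short Python sketch. It assumes the standard definitions H^2 = Vg / (Vg + Ve) and Hf^2 = Vg / (Vg + Ve / n), which match the verbal description above, and it takes the variance components as inputs rather than reproducing the VARCOMP estimation. Function names and numbers are hypothetical.

```python
import math


def arcsine_sqrt(proportion):
    """Arcsine-square-root transform for proportional data (e.g. germination rate)."""
    return math.asin(math.sqrt(proportion))


def broad_sense_h2(vg, ve):
    """Proportion of total variance accounted for by genetic variance."""
    return vg / (vg + ve)


def family_mean_h2(vg, ve, n):
    """Heritability of family means with n replicates per line.

    Averaging over n replicates shrinks the environmental term to ve / n.
    """
    return vg / (vg + ve / n)


# Hypothetical variance components: Vg = 1, Ve = 3, 12 replicates per line.
h2 = broad_sense_h2(1.0, 3.0)
hf2 = family_mean_h2(1.0, 3.0, 12)
```

With these made-up components, a trait with a modest single-plot heritability still yields a much more reliable family mean, which is why replication matters for QTL detection power.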

Quantitative trait loci analysis
Genetic map and marker data used for the RILs in the QTL analysis were obtained from The Compositae Genome Project website (http://compgenomics.ucdavis.edu). The genetic map employed consisted of 1513 markers distributed over nine linkage groups (http://cgpdb.ucdavis.edu/GeneticMapViewer/display/; map version: RIL_MAR_2007_ratio; Johnson et al. 2000; Argyris et al. 2005; Zhang et al. 2007). Genetic map and marker data used for the BC lines are described in detail in Uwimana et al. (2012b); the genetic map consisted of 347 SNPs polymorphic between the parent lines, also distributed over nine linkage groups. Note that BC1 plants were genotyped and that the offspring (BC1S1 families) were used in the experiments. All QTL analyses were performed with Composite Interval Mapping (CIM) in QTL Cartographer version 2.5.008 (Wang et al. 2010). The RIL and BC1S1 data were analyzed separately. Tests for the presence of a QTL were performed at 2 cM intervals using a 10 cM window and five background cofactors, which were selected via a forward and backward stepwise regression method. Statistical significance threshold values (α = 0.05) for declaring the presence of a QTL were estimated from 1000 permutations (Churchill and Doerge 1994; Doerge and Churchill 1996). One-LOD support intervals and additive effects were calculated from the CIM results. The linkage map and QTL were drawn with MapChart 2.2 (Voorrips 2002). The marker order of LG1, 3, 4, 7, and 8 of the BC map was reversed to allow comparison of RIL and BC QTL.
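The idea behind a permutation-based significance threshold can be sketched as follows. This is a deliberately simplified illustration and not what QTL Cartographer does: instead of a CIM LOD score it uses the largest squared genotype-phenotype correlation over markers as the genome-wide statistic, and all names and data are hypothetical. What it shares with the procedure described above is the logic: shuffle phenotypes relative to genotypes, record the genome-wide maximum each time, and take the (1 - α) quantile of those maxima.

```python
import random


def max_marker_stat(genotypes, phenotypes):
    """Largest squared correlation between any marker (0/1 codes) and the phenotype."""
    n = len(phenotypes)
    mean_y = sum(phenotypes) / n
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    best = 0.0
    for marker in genotypes:  # one list of 0/1 genotype codes per marker
        mean_x = sum(marker) / n
        sxx = sum((x - mean_x) ** 2 for x in marker)
        if sxx == 0 or syy == 0:
            continue  # monomorphic marker or constant phenotype carries no signal
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(marker, phenotypes))
        best = max(best, sxy * sxy / (sxx * syy))
    return best


def permutation_threshold(genotypes, phenotypes, n_perm=1000, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the genome-wide maximum under permutation."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)  # copy so the caller's data stay untouched
    null_maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_maxima.append(max_marker_stat(genotypes, shuffled))
    null_maxima.sort()
    index = min(n_perm - 1, round((1 - alpha) * n_perm))
    return null_maxima[index]


# Hypothetical toy data: two markers scored on four lines.
toy_genotypes = [[0, 0, 1, 1], [0, 1, 0, 1]]
toy_phenotypes = [1.0, 1.0, 3.0, 3.0]
observed = max_marker_stat(toy_genotypes, toy_phenotypes)
threshold = permutation_threshold(toy_genotypes, toy_phenotypes, n_perm=200)
```

Taking the maximum over the whole genome in each permutation is what makes the resulting threshold account for the many positions tested.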

Fitness distributions
To visualize variation in fitness, we ranked BC and RIL lines based on the estimated average SPSS and plotted the estimated average SPSS of lines against their rank. This was performed for both sites and crossing types separately and included all 98 RIL or BC lines and all parental lines. In addition, we visualized the influence of major fitness QTL on the fitness distributions.
We focused specifically on the genomic locations where fitness QTL clustered for both field locations. We color-coded lines for which we could unambiguously determine the genotype for those specific genomic locations, i.e., no missing data and all present markers of one parental background, further referred to as 'fitness QTL genotypes'. Color codes indicated whether fitness QTL contained crop or wild alleles at these locations, or combinations thereof. We also estimated the average rank per fitness QTL genotype, indicating whether a certain fitness QTL genotype had a high or low rank on average.
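The ranking and the average rank per fitness QTL genotype can be sketched as below. This is an illustrative reconstruction under stated assumptions: rank 1 is assigned to the line with the highest mean SPSS, ties are broken by input order, and lines with an ambiguous genotype are marked `None` and skipped. The line names, genotype labels, and values are hypothetical.

```python
from collections import defaultdict


def rank_lines(mean_spss):
    """Map each line to its rank, with rank 1 for the highest mean SPSS."""
    ordered = sorted(mean_spss, key=mean_spss.get, reverse=True)
    return {line: position + 1 for position, line in enumerate(ordered)}


def average_rank_by_genotype(ranks, qtl_genotype):
    """Average rank per fitness QTL genotype; ambiguous lines (None) are left out."""
    groups = defaultdict(list)
    for line, rank in ranks.items():
        genotype = qtl_genotype.get(line)
        if genotype is not None:
            groups[genotype].append(rank)
    return {genotype: sum(r) / len(r) for genotype, r in groups.items()}


# Hypothetical mean SPSS values and genotype calls for four lines.
mean_spss = {"A": 5.0, "B": 9.0, "C": 1.0, "D": 7.0}
calls = {"A": "6W-7W", "B": "6W-7W", "C": "6H-7H", "D": None}
ranks = rank_lines(mean_spss)
avg = average_rank_by_genotype(ranks, calls)
```

Under this convention a low average rank marks a genotype class that sits at the high-fitness end of the distribution.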

Influence of crop genome
To visualize the influence of the amount of crop genome on fitness, we plotted the estimated average SPSS of BC1S1 families and RILs against an estimate of the percentage of crop genome. This estimate was based on counting markers as coming from the crop or the wild relative (missing data were excluded). The analysis was done for both sites and crossing types separately and included all 98 RIL or BC lines and all parental lines. First, we used a univariate linear regression to estimate the overall relationship between SPSS and the percentage of crop genome in R (version 2.14.0, R Development Core Team 2011). Second, we repeated this analysis while excluding the effect of the two major fitness QTL by adding these as covariates (based on the genotype data that were also used for the fitness distributions), thereby estimating the relationship between the residual variation in SPSS and the percentage of crop genome. In this second analysis, we omitted genotypes for which the presence of the fitness QTL was ambiguous, either due to missing markers or a recombination event in the QTL interval. Similar to the average rank per fitness QTL genotype, we estimated the average amount of crop genome per fitness QTL genotype.
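The marker-counting estimate and the univariate regression can be sketched in Python as below. The original analysis was run in R; this is only an illustrative equivalent of an ordinary least-squares fit. It handles homozygous crop ('C') and wild ('W') calls only, since how heterozygous BC1 markers were weighted is not stated here, and it does not include the second analysis with QTL genotypes as covariates. All names and numbers are hypothetical.

```python
def percent_crop(markers):
    """Percentage of scored markers carrying the crop allele; None marks missing data."""
    scored = [m for m in markers if m is not None]
    return 100.0 * sum(m == "C" for m in scored) / len(scored)


def linear_fit(x, y):
    """Ordinary least squares of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical line with one missing marker, and three made-up (crop %, SPSS) points.
crop_pct = percent_crop(["C", "W", None, "C", "W"])
slope, intercept, r2 = linear_fit([10.0, 20.0, 30.0], [5.0, 3.0, 1.0])
```

A negative slope in such a fit corresponds to lower seed production with a larger share of crop genome; the R-squared shows how much of the spread in SPSS that share accounts for.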

Environmental data
During the period of the experiment, from May until the end of October, weather conditions were comparable in Sijbekarspel (SB) and Wageningen (WG). The average temperatures were 15.5°C and 14.8°C and relative humidity was 85.2% and 79.5%, respectively. The highest average maximum daily temperature reached 27.4°C in July in SB and 27.9°C in July in WG. The minimum average daily temperature was 5.0°C in October in SB and -4.3°C in October in WG.
The number of plants that survived until reproduction was also comparable between sites, with 56.9% of RIL individuals surviving in SB and 57.1% in WG. A higher percentage of BC individuals survived until reproduction at both sites: 72.4% in SB and 80.1% in WG.

Parental lines
The main difference between the two crop cultivars and the two wild parental lines is that most crop individuals died before seed production, whereas the majority of wild-type individuals survived and produced seeds. Other trends that are similar across sites are that crop cultivars had higher germination rates, higher biomass production, and flowered later compared to the wild parental lines of the same cross (Table 2). Of the four parental lines, the Californian wild plants (L. serriola UC96US23) flowered first, followed by the Dutch wild plants (L. serriola (Eys)) and the two Crisphead plants that had similar flowering times, whereas the few Butterhead plants that flowered were last. Another trend was that plants developed faster in WG than in SB; all parental lines flowered earlier in WG compared to SB.

Heritability values and selection differentials
For BC lines, broad-sense heritability values ranged from 6.2% to 30.2% and family-mean heritability values ranged from 41.8% to 83.9% (Table 2). Heritability patterns were more variable among BC lines than among the RILs, consistent with the larger genetic variation within and among these lines. In SB, biomass, number of reproductive basal shoots, and seed output had the lowest heritability values, whereas in WG, number of reproductive basal shoots and branch number had the lowest heritability values. At both sites, germination showed the highest broad-sense and family-mean heritability.
For RILs, heritability patterns were very similar between SB and WG, with germination rate, biomass, and branch number showing lower broad-sense and family-mean heritability values than the other traits. Broad-sense heritability values ranged from 14.1% to 89.5% and heritabilities of the family means based on approximately 10 replicates ranged from 62.7% to 98.9% (Table 2), indicating that the replication level was adequate, given the environmental variation under field conditions. At both sites, days until first flower showed the highest broad-sense and family-mean heritability.
Almost all traits had significant selection differentials (Table 2), the only exceptions being BC1S1 biomass in SB and WG. Across sites and crosses, selection differentials showed the same trends. In all cases, selection differentials favored higher values for all traits, except for days to first flower, where up to 6-7 days (RILs) and 5-9 days (BC1S1) earlier flowering was favored. In addition, selection differentials were highest for total seed output and survival until reproduction, favoring a higher seed output (6 to 18 thousand) and up to 40% higher survival rates for RILs and around 20% higher survival rates for BC1S1 at both sites.

Quantitative trait loci analysis
QTL results of the RIL population are summarized in Figure 1 and are described in more detail in Hartman et al. (2012, see Appendix 2). For the BC1S1, we detected a total of 43 QTL for 10 fitness and fitness-related traits distributed over all nine linkage groups (Table 3; Figure 1). The Phenotypic Variation Explained (PVE) per QTL varied from 6.4% to 42.8%. For each trait, one to three QTL were detected (mean 2.2). The 1-LOD support intervals ranged from 4.2 cM to 34.7 cM (mean 13.7 cM).
Combining the two field sites for all 10 traits, we found that a majority of BC QTL (25) were unique to either SB or WG; the remaining nine QTL were found for both sites. Only two regions show a clustering of QTL that include the main fitness trait, seeds produced per seed sown, namely at the bottom of LG6 and at the top of LG7. The same QTL are found for SB and WG at these genomic locations and in both cases the wild allele conferred the selective advantage for all QTL, as indicated by the selection differentials. At LG6 and LG7, the wild allele reduced days until first flower and increased survival rate and seeds produced per seed sown. At LG7, additional QTL were detected for biomass and again the wild allele conferred a selective advantage, increasing biomass.
The comparison between RIL and BC QTL fitness clusters shows similarities but also differences (Fig. 1). The BC cluster at the bottom of LG6 does not coincide with any RIL QTL, making this a unique genomic region for BC lines. However, the BC cluster at LG7 is situated in the same genomic region as a main cluster found for the RIL population. Similar to BC results, the wild allele conferred the selective advantage for QTL found for days to first flower, survival rate, and seeds produced per seed sown. Additional RIL QTL were found for total capitula, shoot number, and biomass, but for these traits the crop allele conferred a selective advantage.
For the RIL population, one other fitness cluster was found across both field sites at the bottom of LG5, where QTL for seeds per capitulum, seed output, and seeds per seed sown were detected (Fig. 1). Here, it was the crop allele that conferred a selective advantage, as opposed to the QTL found at LG7. There were also BC QTL found for seed output, total capitula, and shoot number but, in contrast to the RIL QTL, no seeds per seed sown QTL was found and for BC QTL it was the wild allele that conferred a selective advantage.

Fitness distributions
Fitness distributions differed considerably, especially when comparing RIL and BC fitness distributions for the same site. All BC lines had at least some seed output, whereas approximately 30% of RILs produced no seeds both in SB and WG (Fig. 2). They either died before seed set or did not complete their life cycle before the end of the growing season. For RILs, the proportion of lines that performed better than the wild parent (L. serriola UC96US23) was comparable across sites, with 27% in SB and 23% in WG. However, for BC lines there was a considerable difference, with 79% of lines performing better than the wild parent (L. serriola Eys) in SB, whereas only 5% performed better in WG.
QTL fitness regions (LG5 and 7 for RILs and LG6 and 7 for BC lines) and the parental allele effects were described earlier. Given the QTL results, BC lines with a wild genomic background for both LG6 and 7, denoted as 6W-7W (green bars), were expected to have the highest seed yield, whereas the opposite fitness QTL genotype 6H-7H (crop genomic background for LG6 and 7, red bars, with the letter H denoting that the BC1 genotypes were heterozygous for these loci) should have the lowest seed yields. For both SB and WG, the 6W-7W (green) lines are indeed situated at the high end of the fitness distributions, whereas the 6H-7H (red) lines are situated at the low end (Fig. 2). This is reflected in the average ranks of 24.0 out of 100 in SB and 30.5 out of 100 in WG for 6W-7W lines, and 78.6 in SB and 77.9 in WG for 6H-7H lines (Table 4).
Given the RIL QTL results, lines with the crop genomic background for LG5 (denoted as 5C) and the wild parental background for LG7 (denoted as 7W) were expected to have the highest fitness. Most lines with this 5C-7W fitness QTL genotype (blue bars) are indeed located at the high end of the fitness distribution (Fig. 2) and 5C-7W fitness QTL genotypes had the highest average rank at both sites (27.6 out of 100 in SB and 28.9 out of 100 in WG, Table 4). RILs with the opposite combination, 5W-7C (orange bars), mainly situated at the low end of the fitness distribution, had the lowest average rank of 76.5 in SB and 73.1 in WG; none performed better than the wild parental line. At both sites, only one 5C-7W (blue) line gave no seed output, whereas eight to nine 5W-7C (orange) lines produced no seeds.
These QTL fitness regions do not explain all variation of the fitness distributions, as seen by the mixed distribution of the colored bars (Fig. 2). For example, the best performing RIL was not a 5C-7W fitness QTL genotype (blue bars), but a RIL with a crop genomic background for both LG5 and 7 (5C-7C genotype, red bars). The Phenotypic Variation Explained (PVE) of the QTL for seed production (SPSS) reflects the unexplained variation. The combined PVE for BC fitness QTL was approximately 27% (WG) to 37% (SB), and for RIL fitness QTL it was approximately 30% for both sites, implying that part of the variation could be due to minor QTL below the detection threshold of the current experiments.

Impact of the proportion of crop genome
The average amount of crop genome was 23.7% for the BC 1 derived lines, ranging from a minimum of 10.5% to a maximum of 39.5% (Fig. 3). For RILs, the average was 50.9%, ranging from 29.1% to 76.9%. There was a large spread in SPSS for both BC 1 S 1 families and RILs for lines with approximately the same amount of crop genome (Fig. 3a and b). Consequently, only a small part of the variation in SPSS was explained by the univariate linear regressions.
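As a rough check on these averages, the expected proportion of crop genome follows from simple Mendelian halving. The sketch below is illustrative only. It assumes neutral segregation, a single backcross of the F1 to the wild parent for the BC 1 lines, and selfing from the F1 for the RILs (selfing by itself does not change the expected proportion). The function name is hypothetical.

```python
def expected_crop_fraction(backcrosses_to_wild: int) -> float:
    """Expected crop-genome fraction of an F1 (0.5) after n backcrosses to the wild parent."""
    return 0.5 ** (backcrosses_to_wild + 1)


# Averages reported in the text
observed = {"BC1": 0.237, "RIL": 0.509}
# Assumption: BC1 = one backcross to the wild parent, RIL = no backcross
expected = {"BC1": expected_crop_fraction(1), "RIL": expected_crop_fraction(0)}

for population in observed:
    diff = observed[population] - expected[population]
    print(f"{population}: expected {expected[population]:.3f}, "
          f"observed {observed[population]:.3f}, difference {diff:+.3f}")
```

Under these assumptions the expectations are 25% and 50%, close to the observed averages of 23.7% and 50.9%.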
For BC 1 S 1 families approximately 3% to 7% was explained by the linear regression, for SB and WG respectively, and P-values were significant (SB: R² = 0.03, P < 0.05, df = 96; WG: R² = 0.07, P < 0.01, df = 96). The estimated slopes of the linear regression, however, were quite steep, with an increase in crop genome from 20% to 30% predicted to result in a reduction of seed production from 11.449 to 9.178 for SB and from 19.656 to 14.957 for WG (based on the regression equations). For RILs, the explained variance was very low with 1.0% in SB and 0.4% in WG, and P-values were not significant (SB: R² = 0.01, P = 0.62, df = 96; WG: R² = 0.004, P = 0.45, df = 96).
The results of the regression analysis changed considerably for BC 1 S 1 families when the variation in SPSS due to the two major fitness QTL was removed (Fig. 3c and d). The variation in SPSS explained by the linear regressions was lower and P-values were no longer significant (SB: R² = 0.02, P = 0.14, df = 74; WG: R² = 0.01, P = 0.96, df = 74). For RILs, the explained variance was even lower and non-significant.
The average amount of crop genome per fitness QTL genotype (same categories as used in the fitness distributions) was approximately the same for all fitness QTL genotypes in RILs (Table 4: 49.8%-52.1%). The most advantageous BC 1 fitness QTL genotype (6W-7W) had the lowest amount of crop genome (21.0%), whereas the least advantageous BC 1 fitness QTL genotype (6H-7H) had the highest (31.0%), indicating that selection in this BC 1 population might lead to a considerable purging of crop genes at these genomic locations.
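The steepness of the BC 1 S 1 slopes can be made concrete by expressing the predicted change as a relative reduction. The sketch below only re-uses the four predicted values quoted above. The helper name is illustrative, and the result does not depend on the units in which the predictions are expressed.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional decrease when a value drops from `before` to `after`."""
    return (before - after) / before


# Predicted seed production at 20% and 30% crop genome, as quoted in the text
predicted = {"SB": (11.449, 9.178), "WG": (19.656, 14.957)}

for site, (at_20, at_30) in predicted.items():
    print(f"{site}: {relative_reduction(at_20, at_30):.1%} lower predicted seed production")
```

This corresponds to a predicted reduction of roughly 20% in SB and 24% in WG for a ten-percentage-point increase in crop genome, even though the regressions explain only a few percent of the variation.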

Discussion
Overlapping and separate genomic regions are under selection
Our results indicate that introgression chances of crop alleles extrapolated from the genetic location might differ between crosses, because of the different genetic makeup of the parental lines (Mercer et al. 2006; Muraya et al. 2012). In general, we detected few regions with co-localization between BC and RIL QTL, even though selection differentials indicated that selection pressures were similar between the two crossing types and the two sites. In our case, the crop cultivar, as well as the wild parent, differed between the BC and RIL crossing populations. Both the BC and RIL populations had two genomic regions with fitness QTL that were consistent across field sites. Fitness distributions and the average rank of fitness QTL genotypes confirmed that these genomic regions indeed had a substantial impact on the fitness of BC and RIL hybrid lineages. The majority of lines with the most selectively advantageous fitness QTL genotype displayed relatively high seed yields and, on average, these groups showed the highest rank compared to other combinations of parental alleles. This pattern with few genomic regions of major impact is similar to QTL selection patterns found in sunflower (Baack et al. 2008; Dechaine et al. 2009) and slender wild oat (Latta et al. 2010).
BC and RIL QTL for seeds produced per seed sown (SPSS) co-localized at the top of linkage group (LG) 7. The wild allele conferred the selective advantage, as indicated by the selection differentials, by favoring a higher SPSS, early flowering, and higher survival rates. In previous work, we hypothesized that this QTL region is probably the result of the presence of a major gene for flowering, in which the crop allele confers a selective disadvantage by delaying flowering (Hartman et al. 2012). The second genomic region under selection was specific for each cross, with BC fitness QTL on the bottom of LG6 and RIL fitness QTL on the bottom of LG5. For BC QTL at LG6, it was again the wild allele that gave the selective advantage, favoring earlier flowering, higher survival rates, and higher SPSS. These did not co-localize with any RIL QTL. In contrast, for RIL QTL at the bottom of LG5, it was the crop allele that favored seeds per capitulum, seed output, and SPSS (Hartman et al. 2012).

Genetic basis of better performing lines
At both field sites, and for the BC as well as the RIL crossing populations, a substantial number of hybrid lines outperformed their respective wild parent, although hybrids on average produced fewer seeds per seed sown than the wild parent, with the exception of BC hybrids on clay soil, which performed better than the wild parent (see below). This observed hybrid vigor concurs with the transgressive segregation observed in greenhouse experiments employing the same BC and RIL hybrid lineages, in which individual lines had an increased vigor under drought, nutrient limitation, and salt stress (Uwimana et al. 2012b; Chapter 4).
Heterosis, the increased vigor of early-generation hybrids (Rieseberg et al. 2000; Johansen-Morris and Latta 2006), probably largely explains why all BC 1 S 1 families produced at least some seeds, even though these hybrids were backcrossed once to one of the parents. In contrast, approximately 30% of RILs produced no seed output. With each subsequent generation, heterozygosity rapidly decreases in a selfing species such as lettuce. Hence, in a RIL population selfed for nine generations, lines are virtually entirely homozygous and heterosis effects cannot account for the better performing lines in later generations (Burke and Arnold 2001). However, the higher fitness of early-generation lettuce hybrids may favor survival of hybrids with novel genotypes, thereby increasing the chances for these beneficial novel genotypes to be fixed in later generations (Johansen-Morris and Latta 2006; Latta et al. 2007).
The steep decline in fitness of BC 1 S 1 families with a higher amount of crop genome indicates that strong selection against, and hence rapid elimination of, crop genome is expected in the first hybrid generations. This could be due to hitchhiking effects, since in early-generation hybrids many crop genes are in linkage disequilibrium (LD) with genes under selection, as indicated by the lower amount of crop genome of the most advantageous BC 1 fitness QTL genotype. In contrast, LD is greatly reduced in 9th-generation RILs (Flint-Garcia et al. 2003; Stewart et al. 2003). Moreover, a positively selected crop gene was also segregating in the RIL population. In RILs, all genotypes have approximately the same amount of crop genome. This suggests that in later generations particular combinations of genes became important, independent of linkage drag, giving rise to transgressive segregation (Rieseberg et al. 1999, 2003).
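The loss of heterozygosity under selfing can be quantified with the standard expectation that each generation of selfing halves the heterozygous fraction. The following is a minimal sketch, assuming a fully heterozygous starting point and no selection. Whether the nine generations are counted from the F1 or from a later generation is not specified here, so the loop prints every generation. The function name is illustrative.

```python
def expected_heterozygosity(selfing_generations: int, initial: float = 1.0) -> float:
    """Expected heterozygous fraction after n generations of selfing (halves each time)."""
    return initial * 0.5 ** selfing_generations


for n in range(10):
    print(f"after {n} generations of selfing: {expected_heterozygosity(n):.3%}")
```

After nine rounds of selfing the expected heterozygosity is about 0.2%, which illustrates why heterosis is not a plausible explanation for well-performing RILs.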
QTL studies in particular have consistently pointed to the additive effects of complementary genes of the two parental species as the most likely underlying genetic basis for transgressive segregation (Rieseberg et al. 1999, 2000; Burke and Arnold 2001). Indeed, six to seven (BC and RIL results, respectively) out of the ten traits measured in this study show QTL with opposing effects, where in some genomic locations the crop parental allele is selectively advantageous and in other locations it is the wild parental allele. After hybridization, QTL with effects in opposing directions within each parent may recombine in the hybrids, resulting in some lettuce hybrids having most or all QTL with effects in the positive direction, leading to a high fitness, or with effects in the negative direction, leading to a low fitness (Lynch and Walsh 1998; Rieseberg et al. 2007), a pattern also observed in tomato (deVicente and Tanksley 1993).
It should be noted that heterosis, linkage, and transgressive segregation are not the only genetic processes underlying hybrid fitness. For example, Uwimana et al. (2012b) found epistasis effects in BC 1 and BC 2 generation lettuce hybrids when subjecting these to several stress treatments in greenhouse conditions. In later generations, these epistasis effects are more likely to contribute to the breakdown of co-adapted gene complexes (Rieseberg et al. 2000; Burke and Arnold 2001) and therefore lower hybrid fitness. This may also partly explain the 30% of RILs without any seed output.
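How complementary additive QTL can generate transgressive segregation is illustrated by the toy enumeration below. It is a sketch under stated assumptions, not a model fitted to the lettuce data: the four loci and their effect sizes are hypothetical, effects are purely additive, and only fully homozygous genotypes (as in a RIL population) are considered.

```python
from itertools import product

# Hypothetical additive effects of the crop allele at four QTL, relative to the
# wild allele (effect 0). The mixed signs mean that each parent carries both
# favorable and unfavorable alleles.
CROP_EFFECTS = (2.0, -3.0, 1.0, -1.5)


def trait_value(genotype):
    """Sum the crop-allele effects over loci that are homozygous crop ('C')."""
    return sum(effect for allele, effect in zip(genotype, CROP_EFFECTS) if allele == "C")


crop_parent = ("C",) * len(CROP_EFFECTS)
wild_parent = ("W",) * len(CROP_EFFECTS)
low, high = sorted((trait_value(crop_parent), trait_value(wild_parent)))

# All homozygous recombinant genotypes
genotypes = list(product("CW", repeat=len(CROP_EFFECTS)))
above = [g for g in genotypes if trait_value(g) > high]
below = [g for g in genotypes if trait_value(g) < low]
best = max(genotypes, key=trait_value)

print("crop parent:", trait_value(crop_parent), "wild parent:", trait_value(wild_parent))
print("best recombinant:", "".join(best), trait_value(best))
print(f"{len(above)} genotypes exceed the better parent, "
      f"{len(below)} fall below the poorer parent")
```

In this toy case the best genotype combines the crop allele at the loci where it is favorable with the wild allele elsewhere, so it exceeds both parents without any non-additive gene action.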

Higher chance of introgression in novel habitats
Fitness distributions indicated that introgression of crop alleles through hybridization might be more likely to occur in novel habitats than in the natural habitat of the wild parent. More hybrid lineages performed better than L. serriola in the novel clay soil habitat than in the original sandy soil habitat (habitat requirement as described in Hooftman et al. 2006), especially BC hybrid lineages. In spite of the fact that the wild allele gave the selective advantage for the two BC fitness QTL, 79% of BC 1 S 1 families performed better than the wild parent (L. serriola Eys) in clay soil, whereas only 5% of BC 1 S 1 families performed better in sandy soil. The lower performance of the wild parent in the clay site was caused by a lower survival until reproduction, as well as a lower than average seed yield of reproducing plants. In addition, the Percentage Variation Explained (PVE) by fitness QTL (in total 36.9% in clay soil and 26.9% in sandy soil) indicates that not all fitness variation was explained by these fitness QTL and that the increased fitness of BC 1 S 1 hybrids in clay soil could be due to their mixed crop-wild genomic background and heterosis effects.
Similar patterns have been found in other species. In slender wild oat, more hybrid genotypes were able to outperform the parental lines in a greenhouse environment, representing a novel habitat, than in the original wild habitat (Johansen-Morris and Latta 2008). Similarly, radish crop-wild hybrids exhibited a higher survival rate and produced more seeds per plant relative to the wild parent in a new environment, whereas they had comparable survival rates but produced fewer seeds in the original habitat (Campbell et al. 2006). Our results also concur with those found by Hooftman et al. (2005, 2007, 2009) in crossings of the same parents as the BC lines of the current study. They found a strong heterosis effect in the clay soil when averaging over all lines, but also a clear breakdown of hybrid vigor over multiple generations, potentially through further segregation or epistasis effects. Since our experiments only included one location of each habitat type, albeit with replicated plots per site, these conclusions should be further verified in experiments including multiple sites for each habitat.

Implications for crop breeding and risk assessment
The genetic processes underlying hybrid fitness have important consequences for the chances of crop (trans)gene transfer to wild populations and, therefore, for the methods of Environmental Risk Assessment (ERA). Many studies on crop-wild hybrid fitness use the average fitness of hybrid classes (Halfhill et al. 2005; Hooftman et al. 2005; Mercer et al. 2006; Campbell and Snow 2007; Cao et al. 2009; Huangfu et al. 2011); when hybrid fitness is low compared to the wild parent, this is taken to suggest that chances for crop allele transfer are low as well. However, our results and those of others indicate that particular hybrid genotypes may outperform the parental lines under certain environmental conditions (Burke and Arnold 2001; Johansen-Morris and Latta 2008; Hooftman et al. 2009). Also, although it appears that a larger amount of crop genome decreased hybrid fitness, there was considerable spread in fitness among hybrid lines with a similar crop-wild genomic ratio. Therefore, even if hybrids on average have a lower fitness, particular hybrid lines with a large amount of crop genome may exist that have a higher fitness. Thus, a lower average fitness of hybrids does not preclude gene transfer between crops and their wild relatives.
In addition, we have found that results can be cultivar-specific, i.e., the fitness of hybrids depends on the specific combination of crop and wild parent; hence, fitness studies for risk assessment should include a range of wild parents (Muraya et al. 2012). Similarly, selection pressures differ across time and place, so ideally risk assessment should be performed at several locations and in multiple years (Hails and Morley 2005). An ERA that includes hybrids of several parental lines, locations, and years involves field experiments that require a huge amount of time and labor. However, measuring life history traits can already lead to robust conclusions, because most genomic selection patterns can be identified through QTL analysis (Hartman et al. 2012).

Conclusion and way forward
Our results show that there is a high likelihood in lettuce for novel crop-wild hybrids to arise that have a higher fitness than the wild parent through combinations of heterosis, linkage, and transgressive segregation. This may be more likely to occur in novel habitats (Barton 2001). Consequently, this provides an avenue for introgression of crop alleles into the wild population. We did identify a genomic region on LG7 where the crop allele induced delayed flowering that was under negative selection. In this region, effects were stable across cultivars and the environments of our field experiments, and it could therefore be used in transgene mitigation strategies. In such a strategy, the transgene is placed in close linkage to a gene or region that has a strong negative selection effect in the wild habitat (Gressel 1999; Stewart et al. 2003). Whether or not the detrimental effect of delayed flowering is strong enough to prevent crop (trans)gene escape will be explored further in simulation models (Gosh et al. in prep.; Meirmans et al. in prep.) using these empirical field data.

Figure 1. Genomic locations of quantitative trait loci detected in composite interval mapping for a Lactuca sativa cv. Salinas × L. serriola (UC96US23) recombinant inbred lines (RILs) population and a Lactuca sativa cv. Dynamite × L. serriola (Eys) Backcross (BC 1 S 1 ) population. The same linkage groups of the RIL and BC maps are shown next to each other. Linkage group names are shown at the top and dotted lines between linkage group bars indicate similar markers. Markers are indicated by horizontal lines on the linkage group bars and map distances (cM) are shown on the left side. Bars to the right represent one-LOD confidence intervals of QTL. For abbreviations, we refer to Table 1. An open bar indicates that the crop allele (L. sativa cv. Salinas) gives a selective advantage, whereas a filled bar indicates that the wild allele (L. serriola) gives a selective advantage. Selective advantage is inferred from the selection differentials (Table 2). Bar colors indicate the location: Grey = Sijbekarspel and Black = Wageningen.

Figure 2. Fitness distributions across lines for a) Backcross (BC 1 S 1 ) families in Sijbekarspel, b) recombinant inbred lines (RILs) in ...

Black squares indicate parent lines and gray squares indicate lines for which the genotype is unknown.

Table 1. Traits examined in a Lactuca sativa cv. Salinas × Lactuca serriola (UC96US23) recombinant inbred lines (RILs) population and in a Lactuca sativa cv. Dynamite × Lactuca serriola (Eys) Backcross (BC 1 S 1 ) population.
... of seeds per capitulum based on 10 collected capitula
Total no. capitula (TC): Total no. of capitula developed, calculation following Hooftman et al. (2005); values log-transformed
Seed output (SDO): Total no. of seeds produced, calculation following Hooftman et al. (2005); values square-root-transformed
Survival rate (SUR): No. of plants per RIL that produced seed divided by 12; values arcsine-square-root-transformed
Seeds produced per seed sown (SPSS): No. of seeds per seed sown, calculated by multiplying germination rate with survival and seed output; values square-root-transformed

... Table 2). Only one individual of the Crisphead cultivar (Lactuca sativa cv. Salinas) survived until flower production in both SB and WG, but it died before reproductive characters, such as shoot and branch number, could be recorded. Similarly, only one Butterhead (L. sativa cv. Dynamite) individual survived until flower production in SB; in WG, four individuals survived until flowering but only one of them produced seeds in four capitula.

Table 2. The mean, standard deviation, broad-sense (H²) and family-mean (H f ²) heritability values, and selection differentials for the parent lines, recombinant inbred lines (RILs) and Backcross (BC ...
For abbreviations, we refer to Table 1. * Significant at 0.05 level, ** Significant at 0.01 level.

Table 3. Quantitative trait loci (QTL) positions using composite interval mapping in a Lactuca sativa cv. Dynamite × Lactuca serriola (Eys) Backcross (BC 1 S 1 ) population.
QTL positions of the Lactuca sativa cv. Salinas × Lactuca serriola (UC96US23) recombinant inbred lines population are shown in Chapter 3 (but see Appendix 2). For abbreviations, we refer to Table 1. Positive additive effects indicate that the crop-type (L. sativa) allele increases trait values, whereas negative values indicate that the wild-type (L. serriola) allele increases trait values. PVE = Percentage Variation Explained. QTL with peak values within 5 cM are shown on the same line.

Table 4. Average rank and amount of crop genome of genotypes across 98 recombinant inbred lines (RILs) or Backcross (BC 1 S 1 ) families. C = homozygous crop allele, W = homozygous wild allele, H = heterozygous crop and wild allele, n = number of lines. For RILs, letters indicate genomic fitness regions on LG5 and 7 and for BC lines, letters indicate genomic fitness regions on LG6 and 7. For example, 5C-7C indicates crop genotype for the identified QTL on both LG5 and LG7; lines without sufficient information are joined into 'No genotype'.