From Genotype × Environment Interaction to Gene × Environment Interaction

Historically in plant breeding a large number of statistical models has been developed and used for studying genotype × environment interaction. These models have helped plant breeders to assess the stability of economically important traits and to predict the performance of newly developed genotypes evaluated under varying environmental conditions. In the last decade, the use of relatively low numbers of markers has facilitated the mapping of chromosome regions associated with phenotypic variability (e.g., QTL mapping) and, to a lesser extent, revealed the differetial response of these chromosome regions across environments (i.e., QTL × environment interaction). QTL technology has been useful for marker-assisted selection of simple traits; however, it has not been efficient for predicting complex traits affected by a large number of loci. Recently the appearance of cheap, abundant markers has made it possible to saturate the genome with high density markers and use marker information to predict genomic breeding values, thus increasing the precision of genetic value prediction over that achieved with the traditional use of pedigree information. Genomic data also allow assessing chromosome regions through marker effects and studying the pattern of covariablity of marker effects across differential environmental conditions. In this review, we outline the most important models for assessing genotype × environment interaction, QTL × environment interaction, and marker effect (gene) × environment interaction. Since analyzing genetic and genomic data is one of the most challenging statistical problems researchers currently face, different models from different areas of statistical research must be attempted in order to make significant progress in understanding genetic effects and their interaction with environment.


INTRODUCTION
The presence of genotype × environment interaction (GE) in plant breeding multi-environment trials (MET) is expressed either as inconsistent responses of some genotypes relative to others due to genotypic rank change or as changes in the absolute differences between genotypes without rank change (i.e., heterogeneity of within-site variance). Several models are commonly used for describing the mean response of genotypes across environments and for studying and interpreting GE in agricultural experiments: linear models, bilinear models, and linear-bilinear models. Fixed-effect linear-bilinear models, such as the Sites Regression (SREG) [1,2] and the Additive Main effect and Multiplicative Interaction (AMMI) models [3,4] are used for studying genotypic response patterns across environments. In these models, the response patterns of genotypes and environments can be visualized graphically using biplots [5,6] that allow the breeder to observe the high performing genotype(s) in a region(s) and/or sub-region(s). Recently, several review articles pointed out the merits and demerits of the fixedeffect linear-bilinear models [7][8][9][10][11]. One class of fixed-effect linear models, namely factorial regression (FR) models, and *Address correspondence to this author at the Biometrics and Statistics Unit, International Maize and Wheat ImprovementCenter (CIMMYT), Apdo. Post 6-641, 06600 Mexico, D.F., Mexico; Tel: 58042004; Fax: 58047559; E-mail: j.crossa@cgiar.org one class of bilinear models, namely partial least squares (PLS) regression, allow incorporating external environmental and genotypic covariables directly into the model and are useful for finding the climatic causes of GE or the genetic factors (molecular markers) influencing GE.
Linear mixed models have become widely accepted and used for analyzing MET in plant breeding [12][13][14][15][16][17][18][19]. The models naturally lead to a factor analytic (FA) [20,21] form of the genetic variance-covariance for environments that is more parsimonious and flexible than other variancecovariance structures. Since these are linear-bilinear mixed models, they also have the usual advantages, i.e., that error variance modeling can be accommodated (in particular, heterogeneity of block and error variance between environments and within-environment spatial correlation) and that incomplete data are handled with ease. Furthermore, when genotypes are considered as random effects, coefficients of parentage can be incorporated into the FA model for GE, thereby obtaining more precise estimates of the breeding values of genotypes [16,22,18]. Furthermore, Burgueño et al. [19] showed how to use the FA model and its biplot for clustering sites and genotypes with statistically negligible crossover interaction (COI). The method proposed by Burgueño et al. [19] has two main advantages: (1) the descriptive biplot is the starting point for testing successive hypotheses about the suitability of combining sites and genotypes into subsets that decrease the amount of 1 -/12 $58.00+.00 ©2012 Bentham Science Publishers significant COI, and (2) the process of delineating megaenvironments and (3) answering the question "who wins where?" can be done within the framework of linear mixed models.
As for comparing the predictive ability of linear-bilinear fixed-effect models, the study of Cornelius and Crossa [23] used a cross-validation scheme that splits the data of a MET into a training set and a validation set, and computes the root mean squared predictive error between the predictive value and the observed value of the model. Those authors evaluated the within-trial predictive assessment of several fixed-effect linear-bilinear models, and developed efficient shrinkage estimates of fixed linear-bilinear models that showed the same or better predictive ability than the traditional Best Linear Unbiased Prediction (BLUP) of the cell mean. However, Cornelius and Crossa [23] only studied the predictive ability of models within specific environments (within-trial prediction) but did not assess the prediction of performance of unobserved genotypes in other environments (between-trial prediction). Also, Piepho [21] performed cross-validation on linear-bilinear models but in a slightly different form. Fixed linear-bilinear models have been used for describing GE or the combination of genotype plus GE interaction (GGE) based on a biplot graph. Recently, Burgueño et al. [24] compared the predictive ability of various linear-bilinear models and mixed effects models using the factor analytic model; results show that for data sets with complex GE or GGE, modeling GE and GGE using the FA model improved the predictability of the model up to 5-7%. When GE and GGE were not complex, most models gave high predictability (FA versus no FA) and FA did not seem to lose much predictability (only 2%). Therefore, it was concluded that modeling GE and GGE is a good thing.
Selection in plant breeding is usually based on estimates of breeding values obtained with pedigree-based mixed models. In their multivariate formulation, these models can also accommodate genotype × environment (GE) interaction. These models have been used successfully for predicting breeding values in plants and animals. However, pedigreebased models cannot account for Mendelian segregation, a term that, under an infinitesimal additive model (e.g., Fisher [25]) and in the absence of inbreeding, explains one half of the genetic variability. Molecular markers allow tracing Mendelian segregation at several positions of the genome, which gives them enormous potential in terms of increasing the accuracy of estimates of genetic values and the genetic progress attainable when these predictions are used for selection purposes.
The models mentioned above assess GE from an overall perspective, that is, they do not attempt to decompose or partition the total GE into chromosome regions or even further into specific genes so that the physical genetic causes of GE can be identified in the chromosome. Although Quantitative Trait Loci (QTL) mapping has been routinely used in plant breeding, approaches that fully exploit data from MET to assess and study QTL × environment interaction (QEI) are very limited. The modeling of genetic (co)variances between environments, in combination with modeling of heterogeneous residuals, is an important condition for reaching reliable conclusions about the main effects of QTLs as well as the QEI that is caused by the different expressions of QTLs in different environments; linear mixed models are the natural framework for analyzing such complex data. Multi-environment QTL mapping approaches have been presented in the literature [26]. In these examples, single-trait QTL mapping is studied so that the problem is reduced to either a multi-trait or a multienvironment dimension, but not both. The QTL linear mixed model can be extended to cover both multi-trait and multienvironment cases by first identifying the genetic correlation between traits and/or environments and by imposing some structure on the (co)variance matrix, and then incorporating molecular marker information to extend the phenotypic model into the QTL model [27]. Marker-assisted selection (MAS) and the identification of molecular markers closely linked to QTLs [28] have been widely used in plant breeding to improve a few traits controlled by major genes. However, adoption of the technology has been limited because the biparental populations used for mapping QTLs are not easily used in breeding applications. Also, since MAS uses only partial information (few markers), it presents limitations for improving traits controlled by many loci with small effects because the few markers linked to significant QTLs explain only a small percentage of the total genetic variability (the problem of missing heritability).
On the other hand, genomic selection (GS) (or genomewide selection) is an approach for improving quantitative traits [29] that uses all available molecular markers (MMs) across the genome to estimate genetic values. Reports on the use of GS in plants are few and refer mainly to computer simulation studies such as the research of Bernardo and Yu [30], who concluded that GS is superior to marker-assisted selection in maize. In recent articles, de los Campos et al. [31], Crossa et al. [32,33], and Perez et al. [34] used Bayesian estimates from genomic parametric and semiparametric regression and showed that models using MMs produced more accurate predictions of grain yield and other traits in maize and wheat than those based only on pedigree. Genomic selection has been validated in animal breeding for predicting breeding values [35,36,31].
In a usual genetic model, the phenotypic response of the i th individual ( i y ) is described as the sum of a genetic value, i g , and a model residual, i ε , such that the linear model for the genotypes (i=1,2,..,n) is represented as (where µ is the general mean). One method for incorporating markers in models for GS is to define i g as a parametric regression on marker covariates ij x (which can take values of 1, 0, and -1 for a biallelic marker of a segregating population or values of 1 and -1 for inbred lines) of the form where j β is the regression of i y on the j th marker covariate.
In matrix notation, the model is expressed as ε Xβ y + = .
Usually, the number of markers exceeds the number of individuals, and estimation of marker effects via ordinary least squares (OLS) is not feasible. In OLS, estimates are obtained to maximize model goodness-of-fit to the training set, and model complexity is not considered. When the number of MMs is large, this typically yields high meansquared error of estimates of marker effects and poor predictive ability.
In the rest of the chapter I will show the basic fixed and mixed linear-bilinear models underlying GE, the models employed to develop QTL mapping and QTL×E interaction assessment, and the basic models for genomic-enable prediction that allow assessing gene × environment interaction..

The Basic Model
The basic two-way fixed effects linear model for GE analyses considers that the empirical mean response, ij y , of the i th genotype (i=1,2,…,g) in the j th environment (j=1,2,…,s) with r replications in each of the g×s cells is expressed as where μ is the grand mean over all genotypes and environments, i τ is the main effect of the i th genotype, j δ is the main effect of the j th environment, ij δ) (τ is the effect of the interaction (GE) of the i th genotype in the j th environment, and ij ε is the average error, assumed to be NID ( /r) 2 ε σ 0, (where 2 ε σ is the within-environment error variance, assumed to be constant, and r is the number of observations per cell). For a complete random model, it is assumed that i τ , j δ , and ( ij ) δ τ are normally and independently distributed, with variances 2 τ σ , 2 δ σ , and 2 τδ σ , respectively. Adding the design effects to (1) with randomized complete blocks, or any type of incomplete block design, does not pose a problema. Furthermore, modeling the residulas by means of spatial analyses does not present further difficulty and is a practice that must be routinely used in any field experiment.
Yates and Cochran [37] introduced a model in which the GE term is linearly related to the environmental main effect .j , such that the stability parameter i ϑ is the regression of genotype performance on the environment mean .j y . This was later more formally presented by Finlay and Wilkinson [38] and extended by Eberhart and Russell [39] to include the deviation from regression as another statility parameter (although this is in fact a lack or fit of the model).

Fixed Effect Linear-Bilinear Models
Williams [40] considered the model , where λ is the largest singular value of ZZ′ and Z′Z (for Z= i. y ij y ) and i α and j γ are the corresponding eigenvectors. Gollob [41] and Mandel [42] extended Williams' [40] work by considering the bilinear GE term as Thus, the general formulation of the linear-bilinear model is where the constant k λ is the singular value of the k th bilinear the ik α are elements of the k th left singular vector of the true interaction and represent genotypic sensitivity to hypothetical environmental factors represented by the k th right singular vector with elements jk γ . The ik α and jk γ satisfy the constraints i = 1 In matrix notation, these linear-bilinear models can be expressed as Y= ∑ = m 1 k k β X k + AΛG′+E [44], where or partially into each), and the deviation modeled as multiplicative components, provided that t>1.
There are several statistical as well as biological reasons to prefer SREG over AMMI for assessing COI and non-COI under the common situation of complex GE: (1) for the same number of bilinear terms, SREG is a more parsimonious model than AMMI; (2) SREG incorporates the main effect of genotypes directly into the statistical analysis of GE, that is, both effects genotypes and GE (GGE) are combined and estimated jointly; this is important for reaching breeders' objectives, which requires including the main performance of genotypes in the model; (3) sometimes the mixed SREG model can be fitted much more easily than the mixed AMMI model; and (4) the mixed SREG model, as proved by Burgueño et al. [19], is useful for delineating megaenvironments using a formal statistical approach based on the factor analytic model.

What if Genotypes or Environments, or both, are Random Effects?
The basic linear mixed model used for fitting data from g genotypes, s sites, and r replicates when searching for subsets of environments and/or genotypes with non-COI is where X is the incidence matrix of 0s and 1s for the fixed effects of environments, and r Z and g Z are the incidence matrices of 0s and 1s for the random effects of replicates within environments and genotypes within environments, respectively. The random effect of genotypes within environments combines the main effects of genotypes and GE (GGE). Vector b denotes the fixed effects of environments and/or the effect of the design (i.e., replicates, incomplete blocks, etc.); vectors r , g , and e contain random effects of replicates within environments, genotypes within environments, and residuals within environments, respectively, and are assumed to be random and normally distributed with zero mean vectors and variance-covariance matrices R, G, and E, respectively. are identity matrices of order r and r×g×s, respectively, 2 j r σ , 2 e σ are the replicates within the j th environment and residual variances, respectively, and ⊗ is the Kronecker (or direct) product of the two matrices. The structure of E assumes that the residuals of the field plots at each environment (i.e., elements of vector e) are not spatially correlated; however, when field information is available, the spatial model approach using models such as the two-dimensional auto-regressive procedure in the direction of rows and columns in the field can be incorporated into the analyses. The solution ( b ) for the vector of fixed environment means and the vectors of random effects ( r and ĝ ) are obtained from the mixed model equations.
The variance-covariance matrix G is indexed by two factors, environments and genotypes, and can therefore be written as the Kronecker product of two matrices indexing those factors, G = ∑ g ⊗I g , where the j th diagonal element of the s× s matrix g Σ is the genetic variance 2 j g σ within the j th environment, and the jj' th off-diagonal element is the genetic covariance j' g σ j g σ jj ρ between environments j and j'; thus jj' ρ is the correlation of genetic effects between environments j and j'. As for the genotype factor, the identity matrix g I (of order g) is used when it is assumed that the genotypes are not related, and the breeding value of each genotype will be predicted only by the value of the empirical responses of the genotype itself. The environmental component of G, g Σ , can be modeled by the FA, whereas the genotypic component of G is modeled by the identity matrix g I , which assumes no relationship among genotypes.
However, a relationship matrix A using the coefficient of parentage among genotypes can be used instead.

The Factor Analytic model
The environmental component g Σ of the variance of random effects, G, can be modeled by the FA model, which expresses the random effect of the i th genotype in the j th environment as a linear function of latent variables ik x with coefficients jk δ for k = (1, 2, … t), plus a residual ij η that is where jk δ is the factor loading of the j th environment in the k th latent factor, ik x is the score of the i th genotype in the k th latent factor, and ij η is the residual term. In matrix form the previous equation is expressed as is of order gs × 1, and vector δ is of order gs × 1; then it can be written as is a matrix of order s×k with the k th column containing the environment loadings for the k th latent factor. Since it is assumed that the genotypes are unrelated, random effects x and δ are independent and have a joint normal distribution with a mean vector of zero and variances V(x)= kg g k 1 η (σ of order s×s. Therefore, the variance of the random effects g, which separates the environmental and genotypic components, is given by are estimates of the genetic variance within the j th environment (i.e., diagonal elements of g Σ , 2 j g σ ) and estimates of the genetic covariances between the j th and the j' th environments (i.e., off-diagonal elements of ).
Thus, FA can be interpreted as the linear regression of genotype and GE on latent environmental covariates (environmental loadings, jk δ ), with each genotype having a separate slope (genotypic scores, ik x ) but a common intercept (if main effects of genotypes are not distinguished from GE). The slopes of genotypes measure the sensitivity of the genotypes to hypothetical environmental factors represented by the loadings of each environment. A mixed-model analogue of AMMI or SREG has been developed using the factor analytic (FA) model for approximating the variance-covariance GE structure [12,17,14,15]. Research conducted by Crossa et al. [16] and Burgueño et al. [19] described how to model variancecovariance GE and GGE using the FA model and how to incorporate the additive (relationship A) matrix and the additive × additive covariance matrix into the FA model based on pedigree information. Burgueño et al. [19] also described the equivalence between SREG2 and FA(2) for finding subsets of genotypes and environments without COI.

The Relationship Between the Factor Analytic and Sites Regression Models for Assessing Crossover genotype × Environment Interaction
In the FA model, the random effect of the i th genotype in the j th environment ( ij g ) is expressed as a linear function of latent variables ik x with coefficients jk δ for k=(1,2,…t), plus a residual, ij η , i.e., that the ij th cell mean can be written as ij With only the first two latent factors being retained, ij g is Therefore, there is a clear connection between the SREG2 and FA(2) models. A similar connection between the AMMI2 and FA(2) models was established by Smith et al. [14].
Under principal component rotation, the directions and projections of the vectors of FA(2) and SREG2 in the biplot are the same. Therefore, the property of the SREG by which the first principal component of SREG2 accounts for noncrossover interaction (non-COI) and the second principal component of SREG2 is due to COI variability should hold for FA(2) as well. It should be pointed out that the absolute values of genotypic and environmental scores under the FA(2) and SREG2 models may not necessarily be the same; the estimates of the random effects in the FA(2) model are BLUPs (Best Linear Unbiased Predictions), whereas the estimates in the fixed effects SREG2 model are least squares estimates, that is, Best Linear Unbiased Estimates (BLUEs). Furthermore, the standard errors of the estimable functions of fixed effects under SREG differ from those of predictable functions of a mixture of fixed and random effects under FA, and FA models are more flexible in handling unbalanced data (the SREG model does not handle missing data).

Prediction of Unobserved Individuals Based on Phenotypic Data While Modeling GE
Burgueño et al. [24] compared the predictive ability of linear mixed models when the GE is modeled by the FA model with that of simple linear mixed models when the GE is not modeled. A cross-validation scheme is used that randomly deletes some genotypes from environments for a 10-fold partition; the values for these genotypes are then predicted by the different models and correlated with their observed values in order to assess model accuracy. A total of six multi-environment trials (one potato trial, three maize trials, and two wheat trials) with GE of varying complexity were used in the evaluation. Although Burgueño et al. [24] analyzed six MET, here we present results from only three maize METs (M1-MET, M2-MET, and M3-MET). The models used for prediction are linear mixed models 1 and 2, which are two simple mixed models that had the random GE term not modeled, and two linear models, 3FA and 4FA, which had the GE modeled by the FA model ( Table 1).
For M1-MET, the best predictive model was model 3FA, which had a correlation of 0.878 that represents a 6% increase in accuracy with respect to simple linear mixed model 1, whereas model 4FA, with a correlation of 0.867, had a 4.7% increase in accuracy with respect to the simple linear mixed model 1 ( Table 2). Prediction of the overall genotypic effects of the simple linear mixed models in a complex GE setting was poor as compared with that of models that considered covariances between environments. The correlations between the observed and predicted values of the four models fitted to M2-MET and M3-MET were higher in the M2-MET than in the M3-MET because the GE in M2-MET was simpler than the GE in M3-MET. For M2-MET, the two FA models (3FA and FA) did show slightly better predictability (0.938) than their non-FA counterparts (models 1 and 2) (0.916) ( Table 2), and there was a 2.4% increase in the predictive ability of these models with respect to simple linear model 1. For M3-MET, the best predictive   models were 3FA and 4FA, with correlations of 0.848 and 0.852, respectively. However, the percentage increases in correlations of these models over model 1 were 2.9% and 3.4% for models 3FA and 4FA, respectively, indicating that for more complex GE (M3-MET), the FA models increase the predictability more than the simple linear mixed model for less complex GE (M2-MET) ( Table 2).

Incorporating Pedigree Information into Linear-Bilinear Mixed Models While Modeling GE
As has been shown, the linear mixed version of SREG and AMMI naturally leads to a FA form for the genetic variance-covariance for environments that is more parsimonious and flexible than other variance-covariance structures. Since the above mentioned models are linear mixed models, they also have the usual advantages when compared with ordinary fixed effects linear-bilinear AMMI and SREG models. That is, error variance modeling can be accommodated, in particular, heterogeneity of block and error variance between environments and withinenvironment spatial correlation, and incomplete data are handled with ease. Furthermore, when genotypes are considered as random effects, the coefficients of parentage can be incorporated into the FA for modeling GE or GGE of the mixed versions of AMMI and SREG, respectively, hence obtaining more precise estimates of the breeding values of genotypes. The genetic covariance between any pair of related individuals (i and i'), due to their additive genetic effects, is equal to two times the coefficient of parentage (COP=f ii' ), also known as the coefficient of coancestry, times the additive genetic variance, i.e., 2f ii' 2 a σ =A 2 a σ , where A is the additive relationship matrix. In self-pollinated species, A 2 a σ is the variance-covariance matrix of the breeding values (additive genetic effects). Closely related individuals contribute more to the prediction of breeding values of their relatives than do less closely related genotypes. Moreover, when one genotype is missing (either partially or totally), its breeding value can still be predicted from its relatives, albeit less efficiently than if the data were complete. A linear mixed model used for fitting the data from g genotypes, s sites and r replicates, assuming the relationship of the genotypes is measured by the matrix COP=f ii' (of order g), is variances within the j th site, respectively; and ⊗ is the Kronecker (or direct) product of the two matrices. In covariance pattern models, it is assumed that residuals have a multivariate normal distribution with zero means and covariance matrix E. In this study, the structure of E assumes that the residuals of the field plots at each site (i.e., elements of vector e) are not spatially correlated, that is, However, when the field location of the plots is recorded, the matrix E could be modeled as a structure that is less restrictive than rg e I Σ ⊗ . Commonly, these spatial correlations among field plots are modeled using the two-dimensional auto-regressive procedure in the direction of the rows and columns in the field. The design effects given by the replicates and the incomplete blocks within replicates in environments can be easily incorporated into the above linear mixed model as random or fixed effects.
Vectors r , g , and e contain random effects of replicates within sites, genotypes within sites, and residuals within sites, respectively, and are assumed to be random and normally distributed with zero mean vectors and variancecovariance matrices R, G, E, respectively, such that The variance-covariance matrix G, which combines the main effect of genotypes (breeding values) and GE, can be represented as where the j th diagonal element of the s ×s matrix g Σ is the additive genetic variance 2 j a σ within the j th site, and the jj' th element is the additive genetic covariance j' a σ j a σ jj' ρ between sites j and j'; thus jj' ρ is the correlation of additive genetic effects between sites j and j'. The variance-covariance matrix G can be modeled using the FA structure. The matrix A=2f ii' is of order g×g and measures the relationship or covariance between relatives due to additive genetic effects. When genotypes are not related, A is replaced by g I (identity matrix of order g) [14,45] and the breeding value of each genotype will be predicted only by the value of the empirical response of the genotype itself.
Standard errors (SE) of BLUPs of breeding values of the lines for grain yield were smaller for models using information on relatives and when the main effect of genotypes and GE were modeled using FA structure of the G variance-covariance matrix (Fig. 1). The standard fixed linear-bilinear model 1 where all the effects are considered as fixed effects had the largest SEs, followed by a simple mixed linear-bilinear model 2 (with the GE considered as random effect) without A (without modeling GE) and a simple mixed linear-bilinear model 3 with A (without modeling GE). In contrast, the mixed linear-bilinear model 4 that models the GE using the FA and includes information on Only the SE of the BLUP of genotype 26 in NZL from model 4 was larger than those from models 2 and 3 (Fig. 1). The benefits, in terms of precision, of including information on related lines are evident when comparing Fig. (1). Standard errors of BLUPs of 29 wheat genotypes (1-29) within 16 sites for grain yield for four linear-bilinear models (models 1, 2, 3, and 4). Sister lines are in bold (5,6), (4,19,20), (14,15), (21,23,24), (28,29).  Model 4 models 2 and 3; model 3 is similar to model 2, but includes information on relatives. It is clear that the SEs of sister lines were always smaller than the SEs of genotypes that had no relatives in the trial in all cases except for models 1 and 2, which do not incorporate this information. When modeling the main effects of genotype and GE in conjunction with the information on relatives, the improvement in the precision of the BLUP of the sisters, as well as of the other lines, is evident. It should be pointed out that no model is selected based on the SE of the means because SEs are modeldependent and one might be dealing with an inappropriate model despite its small SE. Fig. (1) only shows the natural impact of performing a more realistic statistical analysis instead of ignoring the relationship between genotypes and environments.
The standard biplots of a fixed effect linear-bilinear model (model 1 that excludes matrix A) and model 4 (FA including matrix A) are depicted in Figs. (2 and 3), respectively. The biplot of model 1 (Fig. 2) is the usual biplot of a fixed SREG model that shows genotypes 19 and 20 as having a positive response in terms of genotype main effect and GE for most of the sites because they are in the same direction. Genotypes 16, 18, and 25, located on the opposite side of the biplot, have a negative response in all sites. Sites located farther away from the center, such as NZL, USA1, ZBW, ISRL, and SPN2, are the ones that discriminate the genotypes the most. Pairs of sister lines 19 and 20, and 14 and 15 are distinct from the others. Sister lines are scattered throughout the two dimensions of the biplot. Fig. (2). Biplot of the standard fixed effect linear-bilinear model 1 for grain yield. Lines are 1-29. Sister lines are in bold (5,6), (4,19,20), (14,15), (21,23,24), (28,29). The 16 sites are MEX, USA1, USA2, TKY, ISR, BGD, IND, PKT, SYR, SPN1, SPN2, NPL, KNY, ZBW, NZL, and CHL (adapted from Crossa et al. [16]).
When the main effect of genotypes and GE are modeled together using the factor analytic variance-covariance model 4 (FA that includes information between relatives (Fig. 3)), sister lines with a strong genetic association are held together, but others, such as genotype 4, tended to be farther away from their sister lines (19 and 20), because their COP was 0.792; a similar situation occurred with sister lines 28 and 29, with a COP of 0.774. Although the general patterns of response in terms of directions and projections of genotypes and sites in the biplot were not altered by using different models, the inclusion of information between relatives in conjunction with modeling the main effects of genotypes and GE by means of a factor analytic structure offers the best option for predicting breeding values of genotypes across different environments and studying the main effects of genotypes and GE. Models with FA for GE and pedigree information gave a more realistic view of the relationship between the genotypes themselves, between sites, and between their interactions. Fig. (3). Biplot of the mixed linear-bilinear model 4 with FA for GE and pedigree information (A) for grain yield. Lines are 1-29. Sister lines are in bold (5,6), (4,19,20), (14,15), (21,23,24), (28,29). The 16 sites are MEX, USA1, USA2, TKY, ISR, BGD, IND, PKT, SYR, SPN1, SPN2, NPL, KNY, ZBW, NZL, and CHL (adapted from Crossa et al. [16].

STATISTICAL MODELS FOR MAPPING QTLS AND STUDYING QTL × ENVIRONMENT INTERACTION (QEI)
This section is related to the issue of understanding the nature and causes of interaction. The factorial regression (FR) model (e.g., Vargas et al. [46] and van Eeuwijk et al. [47]) is useful for studying the effects of both genetic and environmental covariables and for developing functional relationships and predictability with explanatory covariables. In plant breeding, much research is directed at locating regions of the chromosomes that are involved in the physiological processes underlying phenotypical traits. These regions are called quantitative trait loci (QTLs). When the relative contributions of these regions to the explanation of the phenotype differ between genotypes, then QTL × environment interaction (QEI) occurs. The statistical problem can be interpreted as a multivariate multiple regression of phenotypic traits as observed over a set of environments on a set of genetic predictors. FR provides a suitable framework for QEI analysis. Crossa et al. [48] give examples of how FR can be used for assessing the chromosomal location of QTLs and QEI and the importance of their effects.
There are approaches in which the GE is modelled directly using regression on environmental (and/or genotypic) variables, rather than regression on the environmental mean, as originally proposed by Yates and Cochran [37]. A useful linear model for incorporating external environmental (or genotypic) variables is the FR model [49,50]. The FR models are ordinary linear models that approximate the GE effects of Eq. 1 by the products of one or more of the following: (1) genotypic covariables (observed) × environmental potentialities (estimated); (2) genotypic sensitivities (estimated) × environmental covariables (observed); (3) scale factor (estimated) × genotypic covariables (observed) × environmental covariables (observed). The aim of FR is to replace, in the GE subspace, genotypic and environmental factors with a small number of genotypic and environmental covariables. Vargas et al. [51] further developed the statistical approaches described by Crossa et al. [48] and van Eeuwijk et al. [52] for modeling QTLs and QEI. The main objectives of their research were to demonstrate the use of: (1) FR for estimating effects and locations of QTL and QEI, and (2) FR for modeling and interpreting QEI in terms of products of genetic predictors and environmental variables.
In FR, genotypic covariables x a (a =1…A), with values x ia for the i th genotype can be introduced for the genotypic main effect, residual a ρ ia is the regression coefficient for the regression of G i on the genotypic covariables, x ia . For more than one genotypic covariable, this becomes residual In the context of molecular markers and when attempting to map QTLs, the genotypic covariables, x a , are replaced by the genetic predictors x q , and the FR framework can also be used to do a genome scan for QTL effects. Analogous to the genotypic main effect, in FR, the environmental main effect, E j , can also be regressed on environmental covariables z b with values z jb for the j th environment.
. F-tests can be constructed from ratios of regression mean squares to independent error term.
An example including 211 F2-derived F3 maize families from a biparental cross evaluated across eight environments differing in the level of drought stress and soil nitrogen content is employed to illustrate the use of the fixed effect FR linear model for mapping QTLs and for studying the influence of external covariables on QEI [51]. Table 3 shows parts of the analysis of variance table at position 63 cM on chromosome 10 [51]. The first part shows the usual analysis of variance for a two-way table of grain yield measured in 211 genotypes with partitioning of the joint effect of G+GE into G and GE effects. The middle part of Table 3 shows the variability due to QTL+QE effects in parts of the genome other than chromosome 10 (i.e., due to QTLs on chromosomes 1-9), the variability due to G+GE after correction for QTLs on the other chromosomes, and the corresponding partitioning into G and GE components. The last part shows the partitioning of G+GE adjusted for the QTLs on chromosomes 1-9 into variation due to QTL+QE at position 63 cM on chromosome 10 and deviations from the QTL model. When the QEI effects were regressed on the set of environmental covariables, maximum temperature at flowering showed the closest relationship with the QEI effects and explained 23.8% of the QEI. The effect of this environmental covariable was highly significant by an F-test for the regression mean square over the deviations from the regression.
For grain yield, Fig. (4) depicts the profile of R 2 QTL , R 2 QEI , and R 2 QTL+QEI and the corresponding critical values for α=0.01 based on 1000 randomizations. There is good reason to believe that there are environment-specific QTLs around 63 cM of chromosome 10 (QTL+QEI and QEI effects were both significant). In contrast, only main effect QTLs were observed in other chromosomes.

Linear Mixed Models for Multi-Trait Multi-Environment QTL Analysis
Plant breeders are interested in evaluating genotypes for multiple traits. A general formulation of a linear mixed model for the multi-trait multi-environment (MTME) model is presented by Malosetti et al. [53]. The initial model is e Zu Xβ y where the response variable y is modelled by fixed and random effects factors ß and u, respectively; X and Z are design matrices assigning fixed and random effects to the observations, respectively. Random genetic effects are assumed to be normally distributed, u~N(0,G), with G the genetic (co)variance matrix. Finally, e is a vector of nongenetic residuals associated with each observation and normally distributed, e~N(0, R). The phenotypic (co)variance is given by V(y) = ZGZ' + R. From a breeder's point of view, G is of special interest, as it reflects the magnitude and pattern of relationships between genetic effects of traits. Exploiting the genetic correlations between traits is useful because it may be a primary trait with low heritability or a difficult-to-measure primary trait that is correlated with an easy-to-measure secondary trait with high heritability.
A QTL model arises by including the effect of a putative QTL as follows: The extra term in the model is composed of a design matrix X QTL , which is derived from molecular marker information and a vector of fixed QTL effects (α). In an MTME model, vector α has dimensions JK×1 and contains the additive genetic QTL effects for all the traits in each of the environments. The random genetic effects, now collected in a vector u*, result from the effects of QTLs outside the test region, that is, the genetic background. Genetic background effects are assumed to be normally distributed: u ~ N(0, G*). Note that G* represents the part of the genetic (co)variance that is not explained by the QTL. The extension from a single-QTL model to a multi-QTL model is straightforward and given by An example of the application of the MTME is presented in a wheat trial (S. Singh and M. Vargas, personal communication), where four diseases (Karnal bunt, tan spot, yellow rust, and leaf rust) were simultaneously measured in a biparental progeny. A multi-trait QTL scan was performed and the various significant QTLs detected are depicted in Figs. (5 and 6). Fig. 5 shows the joint profile and the two significant peaks for the same trait, Karnal bunt, one between 20 and 20 cM and the other one at about 95 cM chromosome 3A. On the other hand, Fig. 6 shows two significant peaks for two different traits, one for yellow rust at the beginning of chromosome 5B and another peak for Karnal bunt at about 20 cM. These findings suggest that some diseases are determined in similar regions of the chromosomes; they should be studied jointly rather than by single-trait QTL mapping.  Fig. (6). Profile of LOD of chromosome 5B from the multi-trait analysis. The LOD profile of the joint analysis of all four traits (red) and the LOD of the marginal analyses for Karnal bunt (green), leaf rust (blue), tan spot (light blue) and yellow rust (black). The LOD threshold for the joint profile is the red horizontal line and the LOD threshold for the traits is the blue horizontal line.

STATISTICAL MODELS FOR GENOMIC SELEC-TION AND PREDICTION
Selection in plant breeding is based on estimates of breeding values obtained with pedigree-based mixed models [16][17][18][19][20]. These models have been used successfully for predicting breeding values in plants and animals. However, pedigree-based models cannot account for Mendelian segregation, a term that under an infinitesimal additive model [23] and in the absence of inbreeding, explains one half of the genetic variability. Molecular markers (MM) Genomic selection (GS) (or genome-wide selection) is an approach for improving quantitative traits [29,35] that uses all available MM across the genome to estimate genetic values. Genomic selection has been validated in several species and populations in animal breeding [35,36]. However, reports on the use of GS in plants are few and refer mainly to computer simulation studies such as the research of Bernardo and Yu [30], who concluded that GS was superior to marker assisted selection in maize. In recent articles, de los Campos et al. [31], Pérez et al. [34], and Crossa et al. [32,33] validated GS in plant breeding using genomic regression and showed that models using MM were more accurate in predicting grain yield in wheat and maize than those based on pedigree only.
One method for incorporating markers into models for GS is to define genetic values This approach was first proposed by Meuwissen et al. [29] and is the most commonly used in GS. With high-density markers, the number of markers exceeds the number of individuals, and estimation of marker effects via ordinary least squares (OLS) is not feasible. Instead, penalized or Bayesian estimation methods are commonly used. Several penalized and Bayesian shrinkage estimation methods are available. Examples of the first group are Ridge Regression (RR) and the Least Absolute Shrinkage and Selection Operator (LASSO) of Tibshirani [54] and Elastic Net (EN). The list of Bayesian models is extensive and includes Bayesian counterparts of RR (BRR) and LASSO [55] (BL) and the Elasctic Net (BEN).
We considered models for GS that differ in the information used (pedigree, molecular markers or both) and in the way molecular markers are incorporated into the model (parametric regression or semi-parametric regression). In all cases, the response variable is the average standardized performance of each line within each environment, that is are vectors of average phenotypes and genetic values, respectively, and 2 ε σ is a residual variance.
A standard additive infinitesimal model (e.g., Fisher [25] and Henderson [57]) postulates that genetic values are multivariate normal, centered at zero, and with a co-variance matrix proportional to the numerator relationship matrix (A) computed from the pedigree, that is, , and 2 a σ is the additive variance. The collection of unknowns in this model and in a Bayesian setting, a prior density is assigned to these unknowns. Following standard assumptions, independent scaled inverse Chi-square distributions for the residual and the additive variance are chosen. The joint prior of this pedigree-based model becomes: − is a scaled-inverse Chi-square density, and df and S are prior degree-of-freedom and scale parameters, respectively. The joint posterior distribution for this model is obtained combining the likelihood and prior, such that: Inferences are based on samples obtained using a Gibbs sampler.
An alternative is to replace A with kinship matrix (U) estimated using marker genotypes. A kinship-based infinitesimal model (K) is obtained using

Penalized Parametric Regression on Marker Effects
In ridge regression, estimates are obtained by minimizing the residual sum of squares, x β subject to the following constraint: t j 2 j β ≤ ∑ or, equivalently, The solution to this optimization problem can be shown to be , where ( ) 0 t λ > is a regularization parameter that induces shrinkage of estimates of effects towards zero and controls the trade-off between goodness of fit amd model complexity. Even though these estimates are biased, the sampling variance is reduced, yielding smaller mean-squared error and better predictive ability.
An alternative to ridge regression is to use LASSO [54]. Estimates in LASSO are obtained by minimizing the residual sum of squares, subject to the following constraint: t j j β ≤ ∑ or, equivalently, simultaneously as some predictor effects may take the value of zero. When predictors are highly co-linear empirical evidences show that RR has better prediction accuracy than LASSO; this situation is common in GS with dense molecular markers.
One variant of the traditional LASSO is the elastic net LASSO, which differs from LASSO in that it uses two penalties the L 2 and the L 1 norms such that The advantage of the EN is that, by adding another penalty, it stabilizes the LASSO solution when some predictors are highly correlated; such is the case of MMs used in GS. Two other variants of LASSO are the group LASSO and the fused LASSO. Group LASSO selects variables at a group level such that some groups of predictors are selected together; this may be useful when the researcher, instead of examining the effect of individual molecular markers wishes to examine the effects of haplotypes (genes) comprising several molecular markers in high linkage disequilibrium. Group LASSO could also be useful when there are more than two alleles for each molecular marker and the breeder wishes to keep all the alleles of the same MM active in the model. The fused LASSO focuses on adjacent predictors such that the value effects tend to be the same for adjacent predictors. This can be useful when there is a natural ordering of predictors, for example, when markers have been ordered on a common map.

Bayesian Shrinkage Regression Methods
Estimates of regression coefficients derived from penalized optimization problems such RR, LASSO or EN are equivalent to posterior modes in certain class of Bayesian Models. In the Bayesian approach, inferences are based on the posterior distribution of the unknowns (hyperparameters, H) given the data (y), p(H|y). Following Bayes' rule, this density is proportional to the product of the conditional distribution of the data given the unknowns, p(y|H), or the Bayesian likelihood times the prior density assigned to model unknowns p(H). In the models we are interest, the conditional distribution of the data given the parameters p(y|H) can be represented as the the product of independent normal densities centered at the regression function, j β and with common residual variance 2 ε σ , that is Marker effects are assigned identical and independent normal prior densities, where ω represents hyper-parameters indexing the prior density of marker effects. Following Bayes' rule, the posterior distribution is: The choice of prior density, ) p( ω | β , determines whether this posterior mode is equivalent to estimates obtained with the BRR, BL, BEN. In BRR the prior density of marker effects is Gaussian, centered at zero and with common variance, that is, prior-variance of marker effects. In BL the prior assigned to marker effects is Double-Exponential (DE) centered at zero and with inverse-scale parameter, 2 ε λ/σ , that is, The prior density of the Bayesian LASSO represents a compromise between the normal and DE densities. Relative to the Gaussian density, the DE places higher mass at zero and have thicker tails, inducing a different type of shrinkage. In particular, relative to BRR, the prior used in the BL induces stronger shrinkage towards zero of estimates of effects of predictors having weak association with the response, and not as much shrinkage of estimates of effects of predictors having strong association with the response.

Penalized Semi-Parametric Regression on Marker Effects
An alternative to parametric regressions is to use semiparametric methods such as reproducing kernel Hilbert spaces (RKHS) regression [56]. A Bayesian RKHS regression for molecular markers regards genetic values as random variables coming from a Gaussian process with a (co)variance structure that is proportional to a kernel matrix K, that is, vectors of marker genotypes for the i th and j th individuals, respectively, and ( ) .,. K is a positive definite function evaluated in marker genotypes. One of the advantages of RKHS regression is that it can be used with almost any information set (e.g., covariates, strings, images, graphs). An advantage of RKHS is that the model is represented in terms of n unknowns, which gives RKHS a great computational advantage relative to parametric methods, when p>>n. A disadvantage of the RKHS is that no individual marker effect can be estimated.
The geostatistical models of Piepho [58] that uses the ridge regression BLUP has also the advantage of having only n instead of p unknowns when p >> n. The RKHS applied to marker data is related to the spatial models because the ridge regression BLUP corresponds to a spatial model with a quadratic covariance model, while RKHS just replaces this quadratic model with a Gaussian model.

Do Some (or all) Markers (Genes) have Different (or the same) Response Patterns in Environments?
Genomic selection is designed to improve complex traits; it focuses mainly on prediction and, as the number of markers increases, prediction of the genomic estimated breeding values (GEBV) is expected to become more accurate, whereas the marker effect is expected to decrease in absolute magnitude. However, parametric regression models also provide the opportunity to examine marker effects and study the possible differential response of markers in environments, that is, the gene × environment interaction effect. In general, the previously examined Bayesian shrinkage methods do not have an associated test for detecting chromosome regions; however, they can be routinely used for examining marker effects in certain chromosome regions.

Multivariate Analysis of Estimated Marker Effects in Environments
Parametric models such as those described in the previous section yield estimates of marker effects that are environment-specific. In the previous sections, we described how biplots [5,43,6] from singular-value decomposition can be used to assess GE. We used these techniques to study GE at the level of estimated marker effects.
are ortho-normal matrices that span the row (marker) and column (environment) spaces of B , respectively, and q q× D is a diagonal matrix whose nonnull entries are the singular values of B , that is, The biplot is constructed using the first and second components, that is, 1 α , 2 α , 1 γ , and 2 γ . Points in the biplot are the marker effects projected in the first two components, and are displayed using the coordinates provided by 1 α and 2 α . The "environmental effects" are displayed as vectors whose coordinates are given by 1 γ and 2 γ . The length of the vectors approximates the variance accounted for by the specific molecular marker and "environmental effect." Molecular markers represented in the same direction as the environments had positive effects on those environments, whereas molecular markers located in the opposite direction of the environmental vectors had negative effects on those environments. The cosine of the angle between two environments (or molecular marker effect) approximates the correlation of the two environments (or molecular marker), with an angle of zero indicating a correlation of +1, an angle of 90° (or -90°) a correlation of 0, and an angle of 180° a correlation of -1.
Biplots of marker effects for several maize datasets comparing maize lines genotyped with several hundred markers and phenotyped in different environments for male and female flowering and some diseases are shown in Crossa et al. [32,33]  Traits included here were female flowering (FFL) (or days to silking), male flowering (MFL) (or days to anthesis), as well as the anthesis-silking interval (ASI) evaluated in 300 lines under severe drought stress (SS) and in wellwatered (WW) environments [32]. The display of the first two component axes on estimated effects of the SNP markers in the six trait-environment combinations (MFL-SS, MFL-WW, FFL-SS, FFL-WW, ASI-SS, and ASI-WW) obtained from the M-BL model is depicted in Fig. (7). The correlation between trait-environment combinations using marker effects and phenotypic data, and the effects of the    Tables 4 and 5, respectively. Table 4 shows the correlations between phenotypic data (upper triangular) and between estimates of marker effects (lower triangular) from the analysis of six trait-environment combinations (ASI-SS, ASI-WW, FFL-SS, FFL-WW, MFL-SS, and MFL-WW) of the maize flowering trial data. Some trait-environment combinations are highly correlated (both phenotypically and genetically). The pattern of correlations between estimated SNP effects reflects the patterns of observed phenotypic correlations [32].
Clearly the two groups of trait-environment combinations are dominated more by the trait (ASI vs FFL and MFL) and less by the environment (SS and WW). Phenotypic outcomes and estimates of marker effects for ASI showed relatively small correlations with those of FFL and MFL; this is because ASI is defined as the difference between FFL and MFL, and these two traits are positively correlated.
Markers with relatively large (in absolute value) estimated effects are identified by name in Fig. (7), and their effects are shown in Table 5. The marker effect on these traits should be interpreted differently than their effect on grain yield, since the favorable marker allele decreases both female and male flowering times, whereas for ASI, the optimal marker should give an ASI of 0. The alleles coded as 1 of SNPs whose estimated effects are located in the left and upper left corner of the biplot (i.e., PZA03551.1, PZA03578.1, PZA03222.1, PZA03385.1, PZB01201.1, and PZB00118.2) increase FFL, MFL, and ASI (they all have positive effects on all trait-environments combinations), whereas those SNPs located on the opposite side of the biplot (lower right corner) (i.e., PZA02587.16, PZA00236.7, Despite the high heritability (between 0.74 and 0.87) found for flowering time and ASI in this maize trial, results show substantial interaction between molecular marker effects and environment. The biplot in Fig. (7) shows SNPs that had very contrasting effects across environments. For example, the minor alleles of SNPs whose estimated effects are located in the upper right corner of the biplot (PZA03592.3, PZB01077.3, and PZB02076.1) increase the anthesis-silking interval under drought and well-watered conditions ( Table 5), but decrease days to male and female flowering. In contrast, the minor alleles of SNPs whose estimated effects are located in the opposite quadrant of the biplot (lower left corner) (PZB00592.1, PHM13183.12, and PZB01964.5) showed a complete rank reversal with respect to the effects of SNPs PZA03592.3, PZB01077.3, and PZB01077.3 on those trait-environment combinations, i.e., a decrease in ASI under SS and WW, and an increase in male and female flowering times. These results are suggestive of important molecular marker effect × environment interaction, which in turn causes genotype × environment interaction.

Patterns of Co-Variability of Estimated Marker Effects Across Environments for Exserohilum Turcicum (NCLB) in Maize Genomic Data
This section shows marker effects from northern corn leaf blight (NCLB) disease caused by the fungus Exserohilum turcicum and evaluated in three international environments: El Batán (Mexico), Harare (Zimbabwe), and Mpongwe (Zambia) [31]. The patterns of marker effects in three environments for Exserohilum turcicum (NCLB) in the maize data are depicted in Fig. (8). The first two principal components explained 83.58% of the total variability in estimated SNP effects. The correlation among the three environments (based on phenotypic data or on estimates of marker effects) was 0.50 between Mpongwe and Harare and 0.26 between El Batán and Mpongwe and between El Batán and Harare. This pattern of correlations (depicted in Fig. 8) may indicate the presence of different races of NCLB in these environments, i.e., a pathogen population in El Batán (Mexico) that may be different from populations in Harare and Mpongwe (Southern Africa).
The 16 SNPs with the largest effects (positive or negative) for NCLB are located farthest on the biplot in Fig.  (8). The presence of the allele coded with 1 of the SNPs in group 1 should confer some degree of resistance to NCLB across environments. The opposite is true for SNPs in groups 2 and 3 (Fig. 8); here the presence of the allele coded with 1 is expected to favor the disease in El Batán (group 2) and Harare and Mpongwe (group 3). Selection should therefore aim at decreasing this allele. It should be noted that caution must be exercised when interpreting the above results because they are based on one-year data. The prevalence of NCLB races can change from year to year, depending on changes in environmental conditions (e.g., temperature, rainfall, and relative humidity). The variance explained by these 16 SNPs with the largest positive or negative effects ranged between 0.3-1% of the total variance explained by all SNPs.

CONCLUDING REMARKS ON THE PLANT GENOMIC STUDIES
The results of the studies by de los Campos et al. [31] and Crossa et al. [32,33] are encouraging; they indicate that models for GS can attain high predictive ability for genetic values of traits of economic interest under contrasting environmental conditions. The results indicate that genomic selection can be used effectively for selecting individuals whose phenotypes for various traits and in various environments have yet to be observed. As the number of available markers increases (as is expected in the near future), larger gains in predictive ability may be attained.
An advantage of models that include a parametric regression on marker covariates, such as M-BL, is that, in addition to estimating genetic values, they also provide information on (estimates of) "marker effects." This information can be used to attain a better understanding of the genetic architecture of the traits under study and examine the patterns of response of marker effects across environments. In these studies, separate models were fitted to each trait-environment combination. An alternative to these single-environment models for genomic selection is to use multi-environment (or, equivalently, multi-trait) models where genetic values and marker effects on several traits/environments are jointly estimated. Multi-environment models allow borrowing information between correlated environments; thus it can be speculated that multienvironment genomic models can yield similar or even better predictions for individual environments. The literature on genomic selection has focused on single-trait models only; therefore the development of multi-environment and multitrait models for genomic selection appears to be a natural next step.
Interestlingly, the results of these studies can also be used to generate a better design for field evaluations. For example, they show that prediction of unobserved lines in any of the correlated environments should be relatively accurate, and the scheme for testing these lines in any of those environments should be planned accordingly. It can be speculated that only one of the sets of correlated environments should be included in the trial, because any information lacking for the other environments can be borrowed from the one in use. However, unobserved lines in low correlated environments are expected to be poorly predicted. For example, concerning the analysis of NCLB in the maize data, results showed that estimated marker effects in the two African sites (Mpongwe and Harare) were positively correlated; however, they were negatively correlated with estimates of marker effects obtained for the Only the effects of 24 SNPs located far from the center of the biplot were identified with the names of their corresponding SNPs. Three groups of environments and molecular markers are delineated as groups 1, 2, and 3 (from Crossa et al. [33]). same trait in Mexico (El Batán), which is suggestive of marker effect × environment interaction.
The analysis of genetic and genomic data is one of the most challenging and interesting mathematical and statistical problems researchers currently face. Therefore, different models from different areas of mathematical statistical research must be attempted in order to achieve a significant advance in understanding genetic effects. Models that consider complicated systems in biology and are able to study complex gene × gene interactions as well as gene × environment and gene × gene × environment are needed not only for improving prediction of the genetic values of individuals but also for understanding the chromosome regions related to important traits and the main climatic factors affecting these genomic regions. In a recent comprehensive study, Burgueño et al. [59] present multienvironment (multi-trait) models for GS and compare the predictive accuracy of these models with: (a) multienvironment analysis without pedigree and marker information, and (b) multi-environment pedigree or/and marker-based models. The authors described a statistical framework for incorporating pedigree and molecular marker information in models for multi-environment data and applied it to data that originated from wheat multienvironment trials. Two prediction problems relevant to plant breeders are considered: predicting the performance of untested genotypes ("newly" developed lines), and predicting the performance of genotypes that have been evaluated in some environments, but not in others. Results confirmed the superiority of models using both marker and pedigree information over those based on pedigree information only. Models with pedigree and/or markers had better predictive accuracy than simple linear mixed models that do not include either of these two sources of information. Burgueño et al. [59] concluded that the evaluation of such trials can benefit greatly from using multi-environment GS models.