Strategies to assure optimal trade-offs among competing objectives for genetic improvement of soybean

Plant breeding is a decision making discipline based on understanding project objectives. Genetic improvement projects can have two competing objectives: maximize rate of genetic improvement and minimize loss of useful genetic variance. For commercial plant breeders competition in the marketplace forces greater emphasis on maximizing immediate genetic improvements. In contrast public plant breeders have an opportunity, perhaps an obligation, to place greater emphasis on minimizing loss of useful genetic variance while realizing genetic improvements. Considerable research indicates that short term genetic gains from Genomic Selection (GS) are much greater than Phenotypic Selection (PS), while PS provides better long term genetic gains because PS retains useful genetic diversity during the early cycles of selection. With limited resources must a soybean breeder choose between the two extreme responses provided by GS or PS? Or is it possible to develop novel breeding strategies that will provide a desirable compromise between the competing objectives? To address these questions, we decomposed breeding strategies into decisions about selection methods, mating designs and whether the breeding population should be organized as family islands. For breeding populations organized into islands decisions about possible migration rules among family islands were included. From among 60 possible strategies, genetic improvement is maximized for the first five to ten cycles using GS, a hub network mating design in breeding populations organized as fully connected family islands and migration rules allowing exchange of two lines among islands every other cycle of selection. If the objectives are to maximize both short-term and long-term gains, then the best compromise strategy is similar except a genomic mating design, instead of a hub networked mating design, is used. This strategy also resulted in realizing the greatest proportion of genetic potential of the founder populations. Weighted genomic selection applied to both non-isolated and island populations also resulted in realization of the greatest proportion of genetic potential of the founders, but required more cycles than the best compromise strategy.


Background
Historical responses to selection of commodity crops has been enabled by decreasing the number of years between cycles of recurrent selection, by increasing the number of replicable genotypes (selection intensity) and by increasing the number of field trials (heritability on an entry mean basis). In other words, genotypic improvements from responses to selection in commodity crops over the last 50 years (Specht et al. 2014) required monetary investments that became part of the exponential rise in seed costs during the same time (Byrum et al. 2017;USDA-ERS 2020). Since the emergence and adoption of Genomic Selection (GS), it has been possible to increase the numbers of genotypes that are evaluated, i.e., selection intensity, without significant increases in numbers of field plots (Bernardo 2007(Bernardo , 2008Asoro et al. 2011;Heslot et al. 2012;Nakaya and Isobe 2012;Emily and Bernardo 2013;Crossa et al. 2014;Beyene et al. 2015;Bassi et al. 2016;Marulanda et al. 2016;Jonas and de Koning 2016;Hickey et al. 2017;Goiffon et al. 2017).
In a companion manuscript we reported an investigation of various factors on response metrics to recurrent selection of soybean lines derived from founders of the SoyNAM population (Ramasubramanian and Beavis 2020). The combinatorial set of factors consisted of phenotypic selection (PS) and four commonly used GS methods, training sets (TS), selection intensity (SI), number of QTL (nQTL), and heritability (H) on an entry mean basis. While interactions among all factors affected all response metrics, only the impacts of GS methods, SI and TS are factors that plant breeders can control. All GS methods provided greater responses than PS for at least five cycles, but PS provided better responses to selection as response from GS methods reached a limit. These results are consistent with reports by Goddard (2009), Jannink (2010 and Liu et al. (2015) demonstrated that the full genotypic potential of the founders is eliminated more quickly with GS than PS. In terms of factors that a soybean breeder can control, we found that SI's of 1.75 and use of Ridge Regression Genomic Prediction (RRGP) models updated every cycle with training data from all prior cycles of selection provided rapid response in the early cycles of selection and retention of genetic diversity for continued response to selection in later cycles, but we noted that further improvements might be made if the populations were organized into islands and mating designs other than the hub network were employed (Ramasubramanian and Beavis 2020). Herein we investigate strategies that soybean breeders can employ to find optimal tradeoffs between maximizing genetic gain from selection and retaining useful genetic diversity.
The challenge of realizing genetic gains from selection and retaining useful genetic diversity in closed populations has been of interest since it was demonstrated that there are theoretical limits for response to selection in closed populations (Hill and Robertson 1968;Bulmer 1971). Tradeoffs among objectives don't prohibit finding optima as long as optimality is defined as a compromise among competing objective functions (Deb 2003;Konak et al. 2006;Shoval et al. 2012;Sheftel et al. 2013;Saeki et al. 2014).
Before the development of GS, quantitative geneticists working on domestic animal systems utilized mathematical programming modeling and operations research (OR) approaches to find near-optimal solutions to the challenge of assuring genetic gain and minimizing inbreeding per cycle of selection (Wray and Goddard 1994). The first publication using OR approaches to address multiple objectives in plant breeding was applied to selection of multiple traits (Johnson et al. 1988). Generally OR approaches involve three activities: 1) define objectives using measurable metrics, 2) translate the objectives into mathematical programming models consisting of objective functions, decision variables and constraints, 3) find an algorithm that will provide values for the decision variables resulting in optimal solutions to the mathematical programming model (Rardin 2017).
If the genetic improvement project wants to assure genetic gain and retain useful genetic diversity then there are two competing objectives for which a trade-off needs to be optimized.
This represents an example of a multi-objective optimization (MOO) problem (Deb 2003(Deb , 2011Rardin 2017). After translating each of the objectives into an objective function, there are several strategies for finding the optimal solution (Deb 2003). The two most commonly used strategies are known as the ε -constraint and the weighted sum. The ε -constraint method consists of identifying one of the objectives, e.g., maximize genetic gain, and translate other objectives, such as minimize inbreeding, into decision variables that can be constrained in a linear, integer or quadratic mathematical programming model (Haimes 1971). In other words, translate the MOO mathematical model into a Single Objective Optimization (SOO) model for which there exist computational algorithms capable of finding the optimum solution (Frank and Wolfe 1956;McCarl et al. 1977;Lazimy 1982). Framing the ε -constraint method requires definition of metrics for genetic diversity or inbreeding. In animal breeding this method became known as Optimum Contribution Selection (OCS: Wray and Goddard 1994;Brisbane and Gibson 1995;Meuwissen 1997;Grundy et al. 1998;Meuwissen et al. 2001). Subsequent to development of GS, OCS was modified to maximize Genomic Estimated Breeding Values (GEBVs) and the realized relationship matrix was used to constrain inbreeding in what became known as Genomic OCS (GOCS) (Sonesson et al. 2010;Schierenbeck et al. 2011;Woolliams et al. 2015).
The second well-established approach to a MOO challenge is known as the weighted sum method. The weighted sum method assigns weights, i  [0, 1] and  i =1, to each of the 'i' objective functions and an algorithm is employed to find the values for the decision variables that minimize all objective functions simultaneously (Zadeh 1963). Breeders will recognize the weighted sum method as a selection index composed of weighted parameters for genetic gain and inbreeding, or equivalently genetic diversity. If genomic information is available, GEBV's can be used to maximize genetic gain and the realized relationship matrix can be used to minimize inbreeding resulting in a genomic selection index (GSI) that can be calculated for all genotypes. (Carvalheiro et al. 2010;Clark et al. 2013).
Both ε-constraint and weighted sum methods are referred to as preference methods (Deb 2003) where the constraints or relative weights have been predetermined. For defined preferences there exist exact optimization algorithms if Karush-Kuhn-Tucker (KKT) conditions are met (Karush 1939;Kuhn and Tucker 1951). An exact optimization solution guarantees that no other feasible solution will be a better solution for the specified set of constraints or weights. Unfortunately, it is difficult to predetermine these values because they require forecasting the relative economic values of genetic gains and retention of useful genetic diversity. For commercial plant breeding projects competition in the marketplace will force much greater emphasis on maximizing genetic gains than retaining genetic diversity. In contrast public soybean breeders have an opportunity, perhaps an obligation, to retain useful genetic diversity while realizing genetic gains for quantitative traits of agronomic importance. Because each plant breeding project has unique relative trade-offs, evolutionary algorithms (EAs) have been adopted to provide multiple solutions on an efficient (Pareto) frontier of solutions to competing objectives (Deb 2003(Deb , 2011Konak et al. 2006). Decision makers then decide which of the solutions have the appropriate relative emphasis on the competing objectives.
Genetic algorithms (GAs) are a class of EAs that are based on recurrent selection of breeding populations and were developed to find computational solutions to large combinatorial problems (Goldberg 1989;Luque 2011). In a canonical GA, selected solutions are pooled together into a set of solutions. Subsequently the individual solutions are randomly sampled for pairwise "matings" to create a new set of solutions for evaluation and selection. The algorithm is iterated until there are no improvements in the sets of solutions. Computational analogs of mutation or recombination, are utilized to move from local optima to global optima. A subclass of GAs, known as parallel GAs maintain structure among subsets of individual solutions and enable the subsets to independently find different solutions for different domains (Luque 2011). The parallel GA system is analogous to the concept of genetic subpopulations (Falconer and Mackay Figure 3.2, 1996). Island Model GAs allow for exchange of solutions among subpopulations.
Island model GAs (IMGAs) are distinct from canonical GAs in terms of properties and behavior because evolution happens locally, within island, and globally, among islands. Island model parameters consist of number of islands, island size, selection pressure within each islands, numbers of migrants, migration frequency, connectedness or topology of islands and emigration and immigration policies among islands (Whitley 1999;Skolicki 2007 a, b).
Rather than investigate the trade-off between objective functions, Jannink (2010) proposed that it would be possible to retain useful genetic diversity in GS by weighting low frequency alleles with favorable estimated genetic effects. Simulations with Weighted Genomic Selection (WGS) resulted in greater responses across 24 selection cycles of recurrent selection than unweighted GS, using RRBLUP values, for both low and high heritability traits. However, the initial rates of response using WGS were less than responses from application of PS and less than GS. The response using WGS was better than response from PS after twenty cycles of selection, but the responses relative to GS depended on the number of simulated QTL and heritability. Decay of LD between marker and QTL is one of the factors that can slow responses using GS relative to PS Xavier et al. 2016), although decay of LD did not contribute to responses in the initial cycles using WGS. The rate of inbreeding per cycle is also greater with GS than with PS, whereas it is similar to PS when WGS is applied. The rate of fixation of favorable alleles is lower for WGS than GS resulting in larger numbers of cycles of genetic improvement before response to selection reaches a limit (Jannink 2010). Efforts to balance the response in early cycles and later cycles have included addition of parameters to WGS (Sun and Van Raden, 2014) and dynamic weighting of rare alleles depending on the time horizon for the breeding program (Liu et al., 2015). Low frequency favorable alleles are given greater weights, drawn from a Beta distribution, in initial cycles, and the weights tend towards unity as the number of cycles of selection approaches a predefined time horizon. This shifts the balance towards retaining greater genetic variance in earlier cycles.
The applications of GS, GOCS, GSI and WGS assume that selected individuals will be randomly mated. Typically, plant breeders do not randomly mate selected genotypes, rather most use selected genotypes that exhibit the most desirable selection metrics, e.g., GEBVs, to serve as "hub" parents in networked crossing designs (Guo et al. 2013;Guo et al. 2014). Such Hub-Network (HN) mating designs (MD's) apply greater weights to genetic contributions from hub genotypes resulting in amplified loss of genetic diversity relative to random mating by reducing the effective population size.
As soybean breeders have become aware of the potential impacts due to loss of genetic diversity from use of GS, they have used various ad hoc methods to avoid crosses between related genotypes (Diers, Graef, Lorenz, Cianzio, Singh, personal communications). After quantitative geneticists working on animal breeding systems demonstrated that it is possible to use the GSI strategy with an EA to identify optimal pairs of mates (Kinghorn 2011;Pryce et al. 2012;Woolliams et al. 2015), plant quantitative geneticists developed and investigated various versions of GSI and GOCS for plant breeding (Akdemir and Sanchez 2016;Lin et al. 2017;Cowling et al. 2017;Beukelaer et al. 2017;Gorjanc et al. 2018;. Notice that the computational demand to find the optimum on the non-decreasing efficiency frontier created by all possible constraint values or relative weights in all 2 NC mating pairs is particularly well suited for application of EA's. Also, it should be noted that Akdemir and Sanchez (2016) referred to their implementation of GOCS as efficient GS. Last, we note that optimal mate selection has been referred to as optimal cross selection in plant breeding applications (Gorjanc et al. 2018;, unfortunately with the same acronym as OCS. To distinguish optimal cross selection from optimal contribution selection, we do not use an acronym for optimal cross selection. In addition to evaluating traditional PS, GS, and GOCS, Akdemir and Sanchez (2016) proposed and evaluated a novel mathematical programming model, referred to as genomic mating (GM).
They formulated the problem as minimizing a linear function of inbreeding plus a negative risk function for the realized relationship matrix of Np possible parents. Inbreeding is a function of the expected genetic diversity among Nc progeny from the Np parents and is weighted by a parameter that controls allelic diversity among all Np parents. Risk is determined for each cross as the sum of the expected breeding values of the progeny plus the expected standard deviations of marker loci weighted by a parameter that controls allelic heterozygosity of the relative contributions of the marker loci to the GEBVs. Thus, risk is similar to the usefulness criterion defined by Schnell (1983 as cited in Melchinger et al. 1988) of a selected proportion of the population and the weighting parameter reflects the breeders' emphasis on its importance. They demonstrated that their GM formulation is equivalent to an optimization problem of minimizing inbreeding subject to defined level of risk, denoted . The solution needs to calculate risk and inbreeding for the range of acceptable  values for Nc progeny from Np parents, i.e., breeders of crops that are easily self-pollinated to routinely evaluate, select and recurrently cross lines derived from one or two specific bi-parental crosses. In the vernacular of commercial soybean and maize breeders this is known as "working a population". Yabe et al (2016) demonstrated GS on populations organized as islands of families provided greater response to selection than GS after the seventh of 20 cycles of RGS. Their founder population consisted of lines derived from in silico crosses of six homozygous rice lines with an elite rice variety. They isolated the six families of RIL's for recurrent selection using GS with no or occasional exchange of selected lines among the family islands. While their results appeared to be similar to WGS, they did not compare their results with WGS. They also suggested that the trade-off between genetic gain and retention of useful genetic variance could be improved by adjusting the number and frequencies of migrants among sub-populations.
Inspired by Akdemir and Sanchez (2016) and Yabe et al (2016), we hypothesized that a breeding strategy that organized the breeding populations as island families and utilized a genomic mating MD would provide small soybean genetic improvement projects with the ability to minimize the trade-offs between maximizing genetic gain and minimizing loss of useful genetic variability.
Within the IM organized populations we evaluated three migration policies among the families.
For both non-isolated and island models we applied three selection methods and four mating designs. To evaluate the potential of these combinations of methods to realize genetic gains while retaining useful genetic diversity, we compare outcomes from simulated recurrent selection applied to contemporary soybean germplasm adapted to MZ II and III using a set of metrics (Ramasubramanian and Beavis 2020). The metrics include the standardized genotypic value (Rs), the most positive genotypic value (Mgv) among RIL's selected in cycle c, the standardized genotypic variance (SVg), the average expected heterozygosity (Hs), the lost genetic potential of populations based on the number of favorable alleles that are lost.

2.1
Simulations. Initial sets of soybean lines were generated by simulating crosses of 20 contemporary homozygous lines representing diversity of soybean germplasm adapted to MZ's II and III with IA3023 to generate in silico F1 progeny (Ramasubramanian and Beavis 2020).
Individual F1's from each of the 20 crosses were self-pollinated in silico for four generations to generate 100 lines per family forming populations of 2000 lines organized into 20 families with genotypic information at 4289 genetic loci . Thus the genetic structure of the initial simulated populations is similar to that used in the experimental SoyNAM investigation (Guo et al. 2010;Takuno et al. 2012;Song et al. 2015;Song et al. 2017;Xavier A et al. 2017;Diers B et al. 2018).
As reported previously (  Subsequent to selection, four mating designs were applied to create the next cycle of lines (Table   1). To simulate theoretical truncation selection, selected lines were randomly mated (RM). The chain rule (CR), a.k.a., a single round-robin mating design (Yabe et al. 2016), is an alternative to RM that assures all selected lines contribute to the subsequent cycle of evaluation and selection.
In contrast to the attempt to assure equal representation of selected lines through RM and CR, most soybean breeders use a mating design that assures most progeny will be derived from crosses of a few lines that exhibit the most desirable performance (Guo et al. 2013;Guo et al. 2014). Because the metaphor of hubs with spokes represents the preference for crossing most selected lines to a few "hub" lines, we refer to this mating design as a hub network (HN) and is the mating design used in our previous investigation (Ramasubramanian and Beavis 2020). The fourth mating design, genomic mating (GM), uses mathematical objective functions to assure that defined breeding objectives are used to identify pairs of crosses from among the selected lines. Genomic Mating was implemented with the 'Genomic Mating' R package (Akdemir et al. 2018). As originally described GM combines selection and mating in a single step, but we decomposed the steps to provide comparable outcomes from all other combinations of selection methods, mating designs and organized populations.

Migration Rules among family islands.
In addition to applying selection methods and mating designs to both population structures, there are many possible rules that affect migration among islands. A preliminary investigation of migration rules that was implemented included: 1) Frequency of migration -never, once every two cycles and every cycle of recurrent selection. 2) The proportion (10% and 20%) of immigrants that will be included in crosses responsible for y specifies the average genotypic value of the lines derived from the founders,  c represents the difference in response between the current cycle and the previous cycle, y' represents the asymptotic limit to selection and c consists of a sequence of integers from 0 to 39 representing each cycle of recurrent selection (Goldberg 1958;Ramasubramanian and Beavis 2020). The parameters yo, α, and β were estimated with a non-linear mixed effects method implemented in 'nlme' functions in the 'nlme' and 'nlshelper' packages (Pinheiro and Bates 2000;Baty et al. 2015;Pinheiro et al. 2019).
For asymptotic limits of responses, the number of cycles required to reach half of the limits before there is no longer response to selection is referred to as the half-life of the recurrent selection process (Robertson 1960;Dempfle 1974;Kang 1979;Cockerham & Burrows 1980;Kang and Namkoong 1980;Kang 1987;Kang and Nienstaedt 1987). From the first order recurrence equation, the half-life is estimated as ln (0.5)/ln (α), when y0 is '0' and the asymptotic limit is estimated as y' (Ramasubramanian and Beavis 2020). In the first phase of model fitting, we fit a random intercept model for estimating both alpha and beta in the recurrence equation using the 'nlme' R package. Estimates of modeled parameters from nlsList models were retained as starting values for fixed effects. Multiple ANOVA of 'nlme' objects representing the models were used to identify combinations of factors with significant effects on the non-linear response. The model with the lowest AIC score was selected as the best model. The best random intercept model in the first phase of model fitting process M15 and models with combinations of three factors (M11-M14) showed auto-correlation of residuals. Since auto-correlation violates the independence assumption, the correlation among residuals was modeled using AR-1 correlation structure. Since the genotypic values across cycles in recurrent selection are correlated, fitting AR-1 correlation doesn't remove the correlation unless cycles are used as co-variates. However, using cycles as a co-variate makes the model fitting very time consuming and often has larger AIC scores than models without cycles as covariates. The Model M15 with AR-1 correlation structure was further refined by modeling variance components using 'varIdent' structure in 'nlme'. The process of fitting, selecting and refining mixed effects models is similar to our previous study (Ramasubramanian and Beavis 2020) and is described in the vignettes in R package 'SoyNAMSelectionMethods'.

Analyses and Data Availability
Simulated data and software codes are available as part of the R package  As noted above, the best responses to selection in the first 10 to 20 cycles of NI populations were obtained using GS followed with a HN or GM mating design (respectively designated NI-GS-HN and NI-GS-GM in Figure 4). The greatest short-term responses to selection in FI populations were obtained using either GS or WGS followed by the HN mating design coupled to a FC migration policy (IM-GS-HN-FC and IM-WGS-HN-FC in Figure 4 and Supplementary Figure   4). Only slightly slower rates in the first 10 to 20 cycles were obtained using GS and WGS followed by the GM design coupled to a FC migration policy.  Table 2). If the objectives are to maximize both shortterm and long-term gains then the best solution was obtained by selecting with RRBLUP values followed by a genomic mating in FI populations and applying a fully connected migration policy (designated IM-GS-GM-FC in Table 2). Among the combinations applied on NI populations, weighted genomic selection followed by the CR mating design or RM resulted in largest longterm gains, while selection using RRBLUP values followed by a HN mating design provided the greatest short-term gains. Indeed, both GS and WGS demonstrated greater long-term responses than phenotypic selection in both NI and FI populations. Including diversity measures in a set offered advantage over truncation selection, as selected mate pairs retained rare favorable alleles better than WGS coupled with random mating. Allier et al (2019 a, b) included the impact of within-family selection to maximize genetic gain while minimizing loss of genetic variance, but they did not consider migration among families. And (Ramasubramanian and Beavis 2020) investigated GS methods for genetic improvement of soybean, but only considered the HN mating design applied to populations without regard to family affiliation. Herein we approached the challenge by disentangling breeding decisions into four distinct groups: 1) organization of the breeding population, 2) selection methods, 3) mating designs and 4) migration policies. Each of these were divided into possible alternatives within each group and treated as independent factors in orthogonal treatment combinations.

Discussion
As with our previous investigation we found that the fastest rates of genetic improvement resulted when GS followed by the HN mating design is applied among all lines regardless of their family affiliation. When combined, these three decisions have reinforcing effects on responses to selection. At the other extreme, when WGS is applied to populations organized as family islands followed by either CR or RM the tendency of all three to retain genetic diversity reinforce each other resulting in the largest genotypic values, but is not achieved until the later cycles of selection. Because the slopes of the curves resulting from WGS and PS at 40 cycles are still positive, it is possible that both selection methods could continue to produce greater genetic potential with more cycles of selection. In previous comparative studies, WGS produced longterm responses that are similar to methods such as Optimal Contribution Selection (OCS) and Expected Maximum -Haploid Value (EM-HPV) (Daetwyler et al. 2015;Muller et al. 2018).
Herein when we applied WGS to lines regardless of family affiliation and followed selection by identifying optimal mate pairs using GM the genotypic values at the limits to response were greater than the genotypic values obtained with PS or GS followed by GM. This combination also retained the largest values for heterozygosity and favorable alleles across more cycles.
However, the differences between responses to GS and WGS followed by GM were not significant when applied to lines organized into family islands.
Between the extreme response curves it was also possible to find many response curves with In our study, responses in FIs were larger than selection responses in NI only when migration events happened every cycle or once in two cycles. When migration event happened only once in 3 cycles of selection, the rates of responses in the early cycles were very low resulting in much lower genotypic values as responses to selection approached limits. Migration size and direction didn't have any significant impact on response within the small range of parameter values we tested for migration size and direction.
Also, we retained the best line within island during migration events and replaced the second best line in the ranked list of selected lines with the immigrant for the BI and RB policies.
Whereas, for the FC policy, lines ranked from 2-6 are replaced. This replacement policy allows crossing between lines that are best within islands and immigrant lines from islands with high genotypic value resulting in high rates of response within islands. However, other policies that replace lines with low genotypic value with high genotypic values from immigrant islands will reduce genetic diversity within islands and result in different outcomes compared to the policy we've implemented.
None-the-less, we found a very good tradeoff among the competing objectives. If a GS was applied to lines on FC islands and the selected lines were mated according to the pareto-optimal crosses identified using GM, then the combination preserved genetic variance for long-term gain with little penalty relative to the realized rates of improvement in early cycles by GS and the HN mating design. In summary, motivated by Akdemir and Sanchez (2016) and Yabe et al (2016), we demonstrate that it is possible to design breeding strategies to produce desired outcomes between the extremes of maximizing the rate of genetic improvement and maximizing the genetic potential of the population.

5.2
Interpretations. The results can be interpreted from other perspectives to provide alternative insights. First genetic improvement can be viewed as single or multiple connected search strategies in genotypic space (Podlich and Cooper 1999;Cooper et al. 2002;Cooper et al. 2014).
The single search strategy, a.k.a. global, corresponds to selection of lines in NI populations. The multiple connected search strategy, a.k.a. local, corresponds to selection of lines in multiple domains with infrequent exchange of lines. In addition to the perspective of global or local search strategies, selection can be viewed as cooperation vs. competition and exploitation vs.
exploration. Thus, by tuning parameters that control relative levels of cooperation or exploration in global or local search strategies, it is possible to adjust the adaptive landscape.
Genotypes co-operate when they contribute to other genotypes' fitness values, whereas they compete when they reduce the fitness values of other genotypes thereby reducing their contribution to future generations. Intermediate levels of cooperation often accelerated shifts in adaptive peaks for bi-locus genetic models (Whitley 1999;Skolicki 2007;Obolski et al. 2017).
For global selection, selection methods and mating designs control contributions of genotypes within populations thereby controlling the level of cooperation. From this perspective, GS promotes competition, while PS and WGS emphasize cooperation on the adaptive landscape. By weighting rare favorable alleles, WGS promotes cooperation and effectively retains more of the genotypic potential of the founder's fitness landscape. Among mating designs, the HN used by most plant breeders, promotes competition over cooperation, whereas the CR and RM designs promote co-operation. The GM design provides a balance between cooperation and competition.
For island selection methods, a FC migration policy provides the best balance between cooperation and competition. Further work is needed to identify optimal migration rules.
The concepts of exploitation and exploration are commonly used in EAs. In general, exploitation refers to processes such as selection that result in beneficial solutions, whereas exploration allows searches for solutions in new domains. In the breeding context, exploration maintains diversity (Goldberg 1989;Goldberg and Deb 1992;Whitley 1999;Skolicki 2007 a, b;Crepinsek et al. 2013). Because GS provides faster rates of genetic improvement than PS and WGS it is reasonable to interpret GS results as rapid exploitation, whereas PS and WGS allow exploration of new solutions, primarily through additional recombination opportunities. The HN mating design drives the populations to exploit resources for immediate gains, whereas CR and RM mating designs provide opportunities for the population to explore more of the fitness landscape.
The GM mating design provides an opportunity to choose relative importance of exploitation and exploration. By treating population organization, selection methods and mating designs as orthogonal factors we were able to blur the boundary between exploitation and exploration with combinations of factors that mixed exploitation and exploration activities in distinct phases and simultaneously.
From the perspective of tradeoffs between exploration and exploitation, selection among and within islands enables exploration of diverse domains resulting in greater probabilities of finding solutions with greater genotypic values (higher peaks or limits at convergence), whereas global selection across a NI population of lines tends to get trapped in sub-optimal peaks or local optima (Cantu-Paz 2000;Skolicki 2007 a, b;Luque 2011;Crepinsek et al. 2013). In our study, this occurred with the GM design applied to the NI populations, where the crossing process within islands are optimized at a local level. In some implementations of island model selection, both the global and local states of islands are assessed every cycle of selection and a centralized global agent makes decisions to reach optimization objectives. We do not know whether such a bi-level optimization method will result in greater genotypic values before approaching limits to responses from selection.
5.3 Future research. By framing breeding strategies as combinations of population structure, selection methods, mating designs and migration policies we illustrated the potential of the approach for a small arbitrary soybean genetic improvement project. We did not consider the relative emphasis of objectives and constraints for any specific genetic improvement project.
Consider first the structure of breeding populations. We compared a NI structure of lines with FI's created by individual crosses among the founders and then we selected within and among islands according to the same criteria. This might make sense within a single soybean genetic improvement project for lines adapted to MZ's II and III.
There are six public soybean genetic improvement projects adapted MZ's II and III. There are likewise about the same number of commercial soybean genetic improvement projects in the same MZs. All of these projects began at different times and were initiated with unique, albeit overlapping, germplasm resources (Mikel et al. 2010). While all of the projects select lines with greater genotypic values for yield, the yield values are obtained from different, albeit overlapping environments.
From the perspective of soybean genetic improvement across regions within MZ's II and III, each genetic improvement project can be represented as an island where genotypes are exchanged among project islands based on annual evaluations in uniform regional trials and according to legal licensing rules. In practice project islands exchange only the best performing lines adapted to similar environmental conditions. None-the-less, soybean breeders will maintain useful genetic variability by exchanging lines among island projects. An advantage of island selection is that diversity among islands increases with selection, even when within island diversity decreases. Eventually, beyond 40 cycles of recurrent selection, genetic variability among islands will decrease as genetic variability among islands is lost to selection.
Future investigations of breeding strategies to optimize tradeoffs between rates of genetic gains and retention of useful genetic variance in soybean adapted to MZ's II and III should consider population structures within island projects that more accurately reflect those that currently exist.
Also, future investigations should simulate genetic architectures with genotype x environment effects. It is well known that a line adapted to one environment may not perform well in other environments, and it is possible to define fitness values so that they include environmental effects. Third, future investigations should consider a broader set of migration rules and policies.
The FC migration policy is considered the upper bound of island models as all islands are connected to every other island with maximum migration rates among islands. While our results indicate that this policy provided the results we don't think it will provide the best results for genetic architectures with genotype by environment interaction effects.
Fourth, we need to recognize islands in time because every cycle of selection discards useful genetic variability. A soybean germplasm resource project was set up (Mikel et al. 2010) to recover useful genetic variability lost during domestication of soybean (Nelson 2011). Rather than trying to build long bridges to islands located in the distant past, our results suggest that there should be a large amount of useful genetic variability that was discarded in the first few cycles of modern soybean breeding. For that matter, until response to selection reaches the halflife for the population, large amounts of useful genetic variability can probably be recovered from islands represented by recent cycles of discarded lines. These conjectures should be preceded by simulations to determine the potential benefit and costs associated with sampling lines in recently discarded islands.
Fifth, it should be clear that a predefined mating design does not take advantage of opportunities created by each cycle of progeny to optimize outcomes according to most project objectives.
Thus, there continues to be a need for algorithms that efficiently and effectively identify crosses from among genotypes produced by each cycle of selection. It is tempting to adopt and investigate all EA strategies. However, only a subset are relevant to the practice of plant breeding (Hagan et al. 2012). For example, mutation and recombination rates can be controlled in a computational EA, whereas plant breeders cannot regulate these with current practices.
None-the-less there are many opportunities for cross-disciplinary research between EAs and plant breeding. There is large body of literature concerning the properties of EAs and factors and strategies that affect convergence rates and quality of solutions (Goldberg 1989;Goldberg and Deb 1992;Whitley 1999;Skolicki 2007 a, b;Crepinsek et al. 2013;Obolski et al. 2017) and working with computational scientists should reveal novel methods to maximize the genetic potential of a breeding population in a minimum number of cycles.
Akdemir and Sanchez (2016) proposed only one of many possible GA's to identify paretooptimal solution pairs. An approach introduced by Gaur and Deb (2016) and Mittal et al. (2020) would provide pareto-optimal solutions using statistical methods such as clustering and machine learning. The statistical knowledge can be used to improve the search for optimal solutions and establish several cycles of optimization. Conceptually, unveiling any relationship among paretooptimal pairs in a genotypic space is likely to provide new knowledge regarding the characteristics of such complementary pairs. Also, by modeling responses with a first order recurrence equation or a non-linear mixed effects model to predict the half-life and asymptotic limits of selection have potential to improve the efficiency of genetic algorithms by providing repair operators to alter the trajectory of population evolution towards the desired optimal tradeoffs.
Last, consider the challenge of stating explicit relative emphasis on objectives and definition of constraints for any specific genetic improvement project. As noted previously, this challenge exists because it requires assigning economic and agronomic value of short term genetic gains vs. the forecasted value of useful genetic variants that may be discarded each cycle of selection.
As a thought experiment note that the trade-off objectives can be reduced to a single 'grand' objective of creating a genotype (line) with the genotypic value equal to the full genetic potential of the founders in a single cycle. For a genetic architecture consisting of two alleles at a single locus, achieving the single grand objective is trivial. Also it is possible to imagine that the grand objective can be achieved for a complex genetic architecture with infinite resources. Clearly, given genetic architectures of complex traits and resource constraints there are no feasible solutions to the grand objective, but it is a useful reference to serve as the goal.   considered an island population. The schematic depicts the in silico steps used to generate the base population of 2000 F5 RILs derived from 20 NAM founder lines crossed to IA3023. 100 F5 RILs generated from each of the crosses form a distinct island. The depiction includes the model training step and the recurrent steps of prediction, sorting, truncation selection within islands, migration, crossing, and generation of 200 F5 RILs per island for each cycle as well as the decision steps to check if the training set should be updated and if the recurrent process should be continued for another cycle. The blue shaded circles represent lines that are descendants of the founder populations in the islands and red shaded circles represent lines that are replaced by immigrants from the island with the largest genotypic value for the 'Best Island' policy. Figure 3 Overview of the Recurrent Selection Process: Representation of entities such as genomes, associated RILs and processes such as the estimation of marker effects, selection, migration and crossing. Levels correspond to layers of information with level 1 comprised of genomic information, level 2 comprised of phenotypes of lines within and across families and level 3 comprised of higher level information including responses across cycles of selection. The factors include nQTL and H at the genome level, SM (selection method) including PS, GS and WGS. The factors at level 2 includes SI (top 10% selected fraction), MD (Mating design, which includes Hub Network, Chain Rule, Random Mating, and Genomic mating levels) and MP (Migration Policy, which includes "Best Island", "Random Best" and "Fully Connected" policies). Among the BD levels, GM method involves application of evolutionary multiobjective optimization to minimize inbreeding and maximizing gain. Level 3 is characterized using evaluation metrics such as half-life and asymptotic limits derived from recurrence equation models and metrics such as standardized responses (Rs), Standardized genetic variance (Sgv), Maximal genotypic values (Mgv) and efficiency of converting loss in genetic variance into gain (RsVar) derived from simulated outcomes. Other metrics include prediction accuracy and MSE for GP models (RR-REML) and expected heterozygosity (Hs).