Functional mutation allele mining of plant architecture and yield-related agronomic traits and characterization of their effects in wheat

Background Wheat mutant resources with phenotypic variation have been developed in recent years. These mutants might carry favorable mutation alleles, which have the potential to be utilized in the breeding process. Plant architecture and yield-related features are important agronomic traits for wheat breeders and mining favorable alleles of these traits will improve wheat characteristics. Results Here we used 190 wheat phenotypic mutants as material and by analyzing their SNP variation and phenotypic data, mutation alleles for plant architecture and yield-related traits were identified, and the genetic effects of these alleles were evaluated. In total, 32 mutation alleles, including three pleiotropic alleles, significantly associated with agronomic traits were identified from the 190 wheat mutant lines. The SNPs were distributed on 12 chromosomes and were associated with plant height (PH), tiller number, flag leaf angle (FLA), thousand grain weight (TGW), and other yield-related traits. Further phenotypic analysis of multiple lines carrying the same mutant allele was performed to determine the effect of the allele on the traits of interest. PH-associated SNPs on chromosomes 2BL, 3BS, 3DL, and 5DL might show additive effects, reducing PH by 10.0 cm to 31.3 cm compared with wild type, which means that these alleles may be favorable for wheat improvement. Only unfavorable mutation alleles that reduced TGW and tiller number were identified. A region on chromosome 5DL with mutation alleles for PH and TGW contained several long ncRNAs, and their sequences shared more than 90% identity with cytokinin oxidase/dehydrogenase genes. Some of the mutation alleles we mined were colocalized with previously reported QTLs or genes while others were novel; these novel alleles could also result in phenotypic variation. Conclusion Our results demonstrate that favorable mutation alleles are present in mutant resources, and the region between 409.5 to 419.8 Mb on chromosome 5DL affects wheat plant height and thousand grain weight.


Background
In the past few decades, mutation induction has been used to improve crop varieties. Using this procedure, more than 3200 new mutant plant varieties have been bred from over 200 species worldwide (https://mvd.iaea. org/). The development of genomics and biological techniques has facilitated the use of mutation induction to elucidate the nature of mutations and for mining novel alleles and genes affecting target traits. Chemical mutagens can induce a high rate of single nucleotide polymorphism (SNP) variation, up to more than 5000 mutations on average in each M 2 line of hexaploid wheat [1], while physical mutagens like fast neutrons and heavy ion beams induce more substitutions than small insertion-deletions or large deletions [2,3]. Using an M 2 population and reverse-genetic methodologies, such as TILLING (Targeting Induced Local Lesions in Genomes), novel alleles of known genes have been identified and functionally characterized in several plant species including wheat [4][5][6][7][8].
Wheat mutant resources have also been developed in the past decades from several different wild types [4,[9][10][11][12]. Various mutations affecting plant architecture, spike morphology, and yield-related traits have been identified, and there are many novel mutant genes/alleles available. In order to mine the mutated genes in these mutant lines, bi-parental genetic segregation populations, such as recombinant inbred lines (RILs) and near isogenic lines (NILs), need to be generated using a mutant as one of the parents. The process of developing genetic populations is very time-consuming, requiring at least two more years after the mutant is identified, and thus the speed with which mutants can be used is limited. A new methodology that could identify mutated genes/QTLs directly from the mutants should be developed which would be helpful in accelerating the gene mining process.
Plant architecture and yield-related traits are the main targets in wheat breeding and many genes/QTLs have been mapped that have dominant or/and additive effects. Over 20 wheat dwarfing genes (Rht) [13] are available, and more than half were produced by mutation induction. Among these Rht genes, Rht1, Rht2, Rht8, and Rht9 have been fully utilized in modern cultivars [14,15], and different mutant alleles of Rht1 and Rht2 resulted in variance in plant height [16,17], thus enriching genetic diversity. Moreover, there were many other potential loci distributed on multiple chromosomes that regulated plant height [18]. Yield and yieldrelated traits like thousand grain weight (TGW) and grain number per spike (GNS) are thought to be quantitative, and their genetic loci are distributed on chromosomes throughout the whole genome [19]. However, because of allelic selection during evolution and breeding processes, genetic diversity has remarkably declined [20,21]. All of these genes/loci could be mutated to produce novel alleles, and the discovery and mining of favorable alleles to improve genetic diversity would greatly benefit wheat breeding.
In the current study, we detected mutated alleles affecting plant architecture and yield-related traits in a wheat mutant resource using the GWAS method combined with t-testing, and the genetic effects of mutated alleles were further evaluated. These results will improve mutant allele identification and the mining of novel mutation alleles affecting agronomic traits. The identification of favorable mutated alleles will in turn aid efforts to increase genetic diversity and improve wheat.

Correlation between yield-and plant architecture-related traits
Correlation analysis (Table 1) showed that PH was significantly and positively correlated with the other four plant architecture-related traits, including pre-winter, maximum, and effective tiller numbers (PWT, MT, and ET, respectively), and flag leaf angle (FLA); ET was more highly correlated with MT than with PWT. TGW was significantly and positively correlated with all traits except FLA and spikelet density (SD), but the correlation coefficients were not very high.

SNPs potentially associated with plant architecture-and yield-related traits under different environments
The principal component analysis (PCA) indicated that the population was divided into five subpopulations (Additional file 1: Figure S1). High linkage disequilibrium (LD) was observed in the mutant population, with an average r 2 of 0.91 at p ≤ 0.001.
A total of 150 potential significant SNPs were detected at the given P value threshold (0.001) in at least one environment by genome-wide association analysis (Additional file 2: Table S1; Additional file 3: Figure S2; Additional file 4: Figure S3). The SNPs were distributed on all chromosomes except chr. 1D, 2D, and 4D, with 35 located on chr. 3B. Among these significant SNPs, 62 were potentially associated with plant architecturerelated traits, including eight SNPs associated with PH, five with PWT, 15 with MT, 27 with ET, and seven with FLA; 88 were potentially associated with yieldrelated traits, including 23 associated with TGW, 21 with spike length (SL), 13 with GNS, 15 with SD, and 16 with spikelet number per spike (NSL).

Candidate SNPs related to plant architecture-and yieldrelated traits
Based on t-tests, there was a significant difference in the phenotypes of plants with the wild type (WT) and mutation alleles in more than 50% of the environments for 32 out of the 150 SNPs (21.34%) ( Table 2), and these SNPs were distributed in clusters on 12 chromosomes (Fig. 1). Among the plant architecture-related traits, all eight SNPs distributed on chr. 2B, 3B, 3D, and 5D were significantly associated with PH, and the percent variance explained (PVE) ranged from 7.68-9.40%. Six out of 15 SNPs (40.00%) distributed on chr. 3B, 3D, 4A, 6B, and 7D were significantly associated with MT, and PVE ranged from 6.19 to 10.23%. Six out of 27 SNPs (22.23%) distributed on chr. 3B, 5A, 7B, and 7D were significantly associated with ET, and PVE ranged from 8.21 to 11.39%. One out of seven SNPs (14.29%) on chr. 2B was significantly associated with FLA, and the PVE was 6.18%.
For the yield-related traits, two out of 23 SNPs (8.70%) distributed on chr. 3B and 5D were significantly associated with TGW based on t-tests, and PVE ranged from 8.23 to 11.59%. Two out of 13 SNPs (15.39%) distributed on chr. 2A and 3B were significantly associated with GNS, and PVE ranged from 10.45 to 14.87%. One out of 16 SNPs (6.25%) on chr. 2A was significantly associated with NSL and explained 7.40% of the phenotypic variation. Six out of 15 SNPs (40.00%) distributed on chr.
There were three candidate SNPs showing pleiotropic effects. SNP AX-109900989 was significantly associated with both PH and FLA, AX-89425861 was associated with both MT and ET, and AX-109655447 was associated with GNS and NSL.
Effects of candidate mutation alleles on plant architecture and yield-related traits Plant height Among the 190 mutant lines, 158 lines carried only the mutant allele of SNP AX-109900989; five lines carried the mutant alleles of both SNP AX-110409382 and SNP AX-111563435, which are located on chr. 3B and 3D, respectively; and 11 lines carried the mutant alleles of all eight significant SNPs located on chr. 2B, 3B, 3D, and 5D (Additional file 5: Table S2 and Table 3). The physical distance between the five SNPs on chr. 5D is about 4.97 Mb ( Table 3).
The presence of single mutation alleles and pyramided mutation alleles resulted in a significant reduction of PH compared with WT (Fig. 2a). The average PH of the mutants only carrying the mutation allele of AX-109900989 was 11.8 cm to 18.6 cm lower than that of WT in different environments. PH of lines carrying the mutant alleles of both AX-110409382 and AX-111563435 was reduced by 10.0 cm to 19.8 cm, and the PH of lines carrying all eight mutation alleles was reduced by 20.2 cm to 31.3 cm, with the pyramided alleles implying additive effects.

Thousand grain weight
There were three and seven mutant lines carrying single mutation alleles of AX-109326075 (chr. 3B) and AX-109947280 (chr. 5D), respectively. The TGW of MT maximum tiller numbers, ET effective tiller numbers, PWT pre-winter tiller numbers, FLA flag leaf angle, PH plant height, SL spike length, GNS grain numbers per spike, NSL spikelet number per spike, SD spikelet density, TGW thousand grain weight *significant at P < 0.05 level; ** significant at P < 0.01 level; *** significant at P < 0.001 level these plants were lower than that of WT, but these differences were not significant (Additional file 6: Table S3 and Table 4). Six mutant lines carried both mutation alleles, and the TGW of these lines was significantly lower than that of WT (by 3.48 to 16.84 g) in different environments (Additional file 6: Table S3,  Table 4, and Fig. 2b). The effects of these two alleles were additive. The physical positions of both SNPs are located far away from the other SNPs on chr. 3B and 5D that affect PH.

Spikelet density
The average SD of the mutants carrying the mutation allele of AX-110371706 was higher than that of the WT ( Table 5, Fig. 3d, and Additional file 10: Table S7), with the increase ranging from 0.07 to 0.52 in different environments.

Pleiotropic SNPs
There were 67 mutant lines that only carried the mutation allele of the candidate SNP AX-89425861, and the MT and ET of these lines, especially ET, were significantly lower than those of WT in some environments ( Table 5, Fig. 3a, b, Additional file 7: Table S4 and Additional file 8: Table S5). The reduction in MT and ET ranged from 1.3 to 5.6 and 1.3 to 4.5, respectively. AX-109900989 is another pleiotropic SNP. Compared with WT, lines carrying this mutation allele not only had lower PH, but also a lower FLA, which was reduced by 4.7 to 22.8°compared with WT (Table 5, Fig. 3c, Additional file 9: Table S6).

Genes flanking the mutation alleles
There were no candidate SNPs located in genic sequence, so the flanking genes of each SNP were further searched ( Table 6). The functions of these genes are related to imidazoleglycerol-phosphate dehydratase, protein kinase domain, and Myb-like DNA-binding domain, for example. Because of the very short physical distance of the five SNPs on chr. 5DL, the genes between SNPs AX-109968486 and AX-108907798, 42 in total, were further blasted (Table 7). Among them, gene TraesCS5D02G323500 was known as an Auxin-Indole-3-Acetic Acid (Aux-IAA) family transcription factor.  Fig. 4). The sequences of these ncRNAs shared more than 90% identity (data not shown) with cytokinin oxidase/dehydrogenase genes (CKX2.3 and CKX2.4).

Discussion
The relationship between pleiotropy and phenotypic correlation Three pleiotropic SNPs were identified in the study, which regulated two or three traits. Pleiotropy is a common phenomenon, and related traits are usually significantly correlated [22,25]. For example, in barley, mutation alleles have pleiotropic effects on tillering and TGW traits [34]. In wheat, flag leaf length is correlated with yield-related traits [35]. It is well known that wheat yield is closely correlated with GNS, TGW, and number of effective spikes, and regions of chr. 5A and 6A were previously found to have pleiotropic effects on PH, grain yield, grain number, and TGW [28]. In our mutated population, GNS and NSL were correlated, which is consistent with previous results [36]. Previously, SNP markers BS00022896_51 (located at 612 Mb on chr. 2A) and wsnp_Ex_c40280_47375312 (located at 676 Mb on chr. 2A) were linked with GNS and NSL, respectively [36,37], and in our study, another pleiotropic SNP, AX-109585477, was found to closly linked with these SNPs (695 Mb on chr. 2A) that also affects both GNS and NSL. Consistent with previous observations in rice [22], we found that PH was significantly correlated with FLA. Furthermore, SNP AX-109900989  was associated with both of these traits. MT and ET, which are two different indexes of tiller number, were also strongly correlated and shared the same candidate SNP, AX-89425861, on chr. 7D. These pleiotropic mutation alleles can be transferred into Kompetitive Allele Specific PCR (KASP) markers, which could be used to identify pyramided lines rapidly in earlier generations of segregating progenies, and will finally accelerate wheat breeding practice.

Known and potentially novel mutation alleles associated with the investigated traits
The identified mutation alleles were classified into two types, one was known alleles, such as mutation alleles located at the ends of chr. 2BL and 5DL and on chr. 3BS and 3DL which reduced PH, and some of them were located close to those known regions. The dwarf gene Rht4 was mapped to the terminus of 2BL, bin 2BL6-0.89-1.00 [38], which is from 444 to 739 Mb in the IWGSC Reference genome, and SNP AX-109900989 is located in this region at 679.5 Mb. SNP AX-111563435 is located close to a reported PH QTL on chr. 3B, QTL_ height_3B_1 [39,40]. QTLs for TGW were previously mapped to chr. 3BS, 4D, 5B, 5D, 6A, and 7B [20,[41][42][43], and two more mutation alleles affecting TGW were identified and they were mapped close to the known QTLs on chr. 3BS and 5DL [20,42]. QTLs for tiller number were mapped to the terminus of 5AL [44], and a QTL on 5A affects tiller number [45]. The identified mutation allele on 5AL was also located at the 5AL terminus and might be the same as the tiller number QTL, but further phenotypic analysis of mutation lines and fine mapping data are needed to confirm this. The other type was mutation alleles which did not colocalize with reported QTLs and might be novel alleles affecting PH-related traits. The mutation alleles on chr. 3DL have not yet been reported and might correspond to a novel gene. Genes and QTLs for tiller number have been mapped to chr. 1A, 1B, 1D, 2A, 2B, 2D, 3A, 4D, 5D, 6A, 6D, and 7A [43,[45][46][47][48], while our mutation alleles affecting MT and ET were located on chr. 4A, 6B, 7B, and 7D and did not colocalize with these previously mapped QTLs.

Favorable mutation alleles and their potential application in future breeding programs
The candidate mutation alleles especially those of favorable alleles identified in the paper could enrich wheat genetic diversity. As different alleles and haplotypes of the target gene may have different effects on the phenotype [20,49], breeders usually pyramid favorable alleles to improve target traits [30], such as plant architecturerelated traits [33], so the identified favorable mutation alleles could be used in future wheat breeding programs. One potentially favorable allele is the G allele of AX-109900989, which reduced PH and FLA. Due to its effect on photosynthesis, changing FLA results in changes in yield [50]. For example, a leaf angle less than 25°enhances yield in wheat [51]. Thus, the G allele of AX-109900989 could potentially be used to decrease FLA and increase yield. Similarly, the T allele of AX-110371706, which increases SD, could also be used to increase yield. At the same time, unfavorable mutation alleles affecting TGW (G of AX-109326075 and T of AX-109947280) and tiller number (G of AX-89425861) should be avoided.

The relationship between the terminal ends of chr. 3B and 5DL with plant height and thousand grain weight
Chromosome 3B is the largest chromosome in wheat, and multiple QTLs affecting plant architecture and yieldrelated traits have been identified in this chromosome [19,39,52] that are distributed on the entire chromosome. The 11 SNPs identified on chr. 3B (Fig. 1) [19]. Physical positions of both TGW and PH related SNPs identified on chr. 5DL in the study were colocalized with previously reported genes and QTLs. The TGWassociated SNP AX-109947280 identified in our study located at 409 Mb/566 Mb, while the cell wall invertase gene TaCWI, which is significantly associated with TGW, is located in this region [20]. This gene has been under strong selection during modern wheat breeding, and its genetic diversity has declined dramatically [20]. A QTL and a favorable allele affecting TGW are also located in this region (155.4 cM/170.7 cM and 421 Mb/ 566 Mb respectively) [41,54]. Five SNPs associated with PH are located downstream of this region (414-419 Mb). The 42 genes located in this region included the Aux-IAA transcription factor, and a mutation in the auxin response gene resulted in decreased PH [55]; the PH of lines carrying these mutation SNPs were significantly lower compared with that of WT (Fig. 2a). MT maximum tiller numbers, ET effective tiller numbers, SD spikelet density, FLA flag leaf angle   Although the mutant SNPs did not occur in the genes, the five SNPs were very closely linked and the phenotypic variation might be caused by the genes in this region. The six lncRNAs located near the four SNPs from AX-108930866 to AX-108907798 (about 6.6 Mb) (Fig. 4), are expressed in stems, leaves, or roots and share sequence similarity with CKX genes. CKX genes play important roles during plant development, and the level of CKX activity in barley, Arabidopsis, and tobacco results in variation in PH [56][57][58][59]. The dwarf gene Rht23 is also located at the terminus of chr. 5DL [60]. So, it deduce that the terminus of chr. 5DL might play a critical role in wheat PH and TGW, but the relationship between our mutation alleles and Rht23, the CKX genes, and TaCWI needs to be further studied.
Combining GWAS with t-tests could effectively reduce the false negative rate Each advanced phenotypic mutants used in the study carried numerous mutations, it should think about how to rapidly identify linkage markers and mutation alleles affecting target traits. It is very difficult to use only a single mutant to identify linkage markers and mutation alleles affecting target traits, an alternative strategy is to use multiple mutants, as the probability of three individual mutants sharing a mutation in the same gene is less than 1E-05, and this probability is even lower when including more lines [61]. More than 67,000 qualified SNPs were used in the current study, and most of them should have no relationship with the target traits. It is very time consuming and unnecessary to analyze the association between each SNP and trait one by one, so TASSEL software was used to filter out most of the insignificant markers and to reserve as many potentially significant markers as possible by reducing the P value threshold from 7.4E-07 to 0.001. All potential markers were then further analyzed by t-tests to eliminate false positive markers.
Through the analysis of 10 plant architecture-and yield-related traits in 190 individual mutant lines, genome-wide association analysis and t-tests only gave similar results for PH, while for other traits, fewer than 40% of the markers were verified by t-test. This indicates that the set P threshold was appropriate, with fewer positive markers for PH or other traits omitted. This also means that if it only used genome-wide association analysis, the false negative rate would be very high, even with a relatively higher threshold. At the same time, the genetic effects analysis is based on the use of multiple mutant lines. Because mutation alleles present in only one or two mutant lines would be very difficult to identify, this would result in the identification of a certain number of false negatives. Some of the mutation alleles significantly associated with traits were identical with those identified in previous studies, and the favorable alleles could be used for wheat improvement.

Conclusion
One hundred ninety advanced phenotypic wheat mutants were genotyped by 660 K SNP assay, and 10 agronomic traits were investigated under 5 environments. By using GWAS and t-tests methods, 32 SNPs distributed on 12 chromosomes were identified as mutation alleles associated with plant architecture and yield related traits, and chromosome 5DL clustered more alleles on PH and TGW. Among them, G allele of AX-109900989 could reduce PH and FLA, and T allele of AX-110371706 increased SD, those were favorable alleles. Five SNPs, AX-110409382, AX-109041501, AX-109446470, AX-111556361 and AX-89425861, were novel alleles associated with PH and tiller abilities. The mutants carrying favorable alleles could be further used in future breeding practice as diverse germplasm, and the SNPs could be converted into KASP markers and used to assist breeding selection.

Plant materials
Up to 20,000 seeds of an elite winter wheat (Triticum aestivum L.) cultivar Jing 411 were treated with 1.0 or 1.5% ethyl methanesulfonate solution according to the reported protocol [11] and 200 Gy or 250 Gy ɣ-rays every year. The germination percentage in the field ranged from about 60-70% by 1.0% EMS, 45-55% by 1.5% EMS, 50-60% by 200 Gy ɣ-rays, and 30-40% by 250 Gy ɣ-rays batch by batch. Each M 1 plant was bagged at the heading and flowering stages to avoid hybridization and to strictly maintain self-crossing. The M 2 populations were developed using both single-seed Phenotyping and data analysis Tiller number: PWT, MT, and ET were counted for all plants in each row at the end of November before the elongation stage and after the heading stage, respectively, and average PWT, MT, and ET per plant were calculated by dividing the total tiller number in a row by the seedling number of that row. The average PWT, MT, and ET values were used for subsequent analysis. Each genotype and environment had two replications. FLA between the flag leaf and its stem was measured 15 days after flowering using a protractor. Each genotype had 10 replications. NSL and GNS of the main spike for each genotype were counted in the field 20 days after flowering with five replications. PH, SL, and TGW were measured or calculated after harvest with five replications. SD was calculated by dividing NSL by SL. Fig. 4 Diagram of a portion of chromosome 5DL. The long black rectangle represents the chromosome. The SNPs significantly associated with phenotype (green) and ncRNAs (blue) are indicated to the right of the chromosome. The physical intervals (kb) are shown on the left side of the chromosome Phenotypic data (Additional file 11: Table S8) were analyzed with QTL IciMapping v4.1 (http://www.isbreeding.net/) using the default settings to estimate the best linear unbiased estimate (BLUE) values for each environment and across environments. For calculating BLUE values, error variance of each environment is firstly calculated using model y = genotype+block+error, and then the phenotypic values of the individual in each environment are weighted and averaged by using the reciprocal of the error variance in each environment as the weight.

Statistical analysis
Analysis of correlations between the investigated traits was performed using the BLUE values from QTL Ici-Mapping v4.1. Calculation of standard deviations, t-tests, and average values of agronomic traits were performed using Microsoft Excel 2010.

Genotyping
The 660 K SNP microarray was used to genotype the WT and 190 individual mutants. Microarray analysis was performed by China Golden Marker (Beijing) Biotech Co. Ltd. (CGMB, http://www.cgmb.com.cn), and the quality of the genotyping data was assessed. SNPs with a minor allele frequency (MAF) < 5% and a failed missing test (call rate < 97%) were excluded. A total of 67,402 SNPs were included in subsequent genome-wide association analysis.

Linkage disequilibrium (LD) and population structure analysis
The LD across the chromosomes of the WT and 190 mutant lines was estimated using TASSEL 5.2 (https:// tassel.bitbucket.io/) with 67,402 SNP markers (Additional file 12: Table S9). The squared allele frequency correlation (r 2 ) was used for evaluation of LD [62] and the average r 2 of 559,830 pairs of SNP markers at P ≤ 0.001 was calculated. PCA was conducted using TASSEL 5.2 for assessment of the population structure.

Screening of candidate mutation SNPs
First, filtered SNPs and BLUE values for each trait were used in genome-wide association analysis performed using the mixed linear model (PCA + K) model of TASSEL v5.2 software [63] with the default settings. In order to reduce the risk of omitting positive candidate association markers, a less strict uniform threshold of P ≤ 0.001 was used to estimate additional potential significantly associated SNP markers. Manhattan plots were constructed using R 3.4.1 software (http://www.r-project.org/). Secondly, mutant lines were classified into two groups based on the allele of a significant SNP identified by genome-wide association analysis, and t-tests were performed to test for significant differences (P ≤ 0.05) in the phenotypes of the two groups. When significant differences in phenotype were observed in more than 50% of environments, the SNPs were designated as candidate SNPs.

Analysis of mutation allele effects
To identify target SNPs and mutation alleles leading to significant phenotypic variation, mutants carrying the mutant allele of a specific candidate SNP were identified, and variance between the phenotypes of WT and these mutants was further analyzed using t-tests.

Sequence BLAST searches and gene annotation
The flanking sequences of the candidate alleles and reported QTLs, markers, and alleles were used as queries in BLAST searches against the IWGSC RefSeq V1.0 database (http://www.wheatgenome.org/) to determine their physical positions on chromosomes. Each candidate gene sequence was used as a query in a further BLAST search against the NCBI database (https://www.ncbi.nlm.nih.gov/ ) to annotate its function. Chromosome diagrams were drawn using mapdrawer [64].