Inference of population effect and progeny selection via a multi-trait index in soybean breeding

The selection of superior genotypes of soybean entails a simultaneous evaluation of a number of favorable traits that provide a comparatively superior yield. Disregarding the population effect in the statistical model may compromise the estimate of variance components and the prediction of genetic values. The present study was undertaken to investigate the importance of including population effect in the statistical model and to determine the effectiveness of the index based on factor analysis and ideotype design via best linear unbiased prediction (FAI-BLUP) in the selection of erect, early, and high-yielding soybean progenies. To attain these objectives, 204 soybean progenies originating from three populations were examined for various traits of agronomic interest. The inclusion of the population effect in the statistical model was relevant in the genetic evaluation of soybean progenies. To quantify the effectiveness of the FAI-BLUP index, genetic gains were predicted and compared with those obtained by the Smith-Hazel and Additive Genetic indices. The FAI-BLUP index was effective in the selection of progenies with balanced, desirable genetic gains for all traits simultaneously. Therefore, the FAI-BLUP index is an adequate tool for the simultaneous selection of important traits in soybean breeding.


Introduction
Similar to most autogamous species, soybean (Glycine max (L.) Merrill) breeding is aimed at the development and identification of superior genotypes and the release of new cultivars. To produce better cultivars, breeding programs of autogamous plants have intensified the production of segregating populations. To this end, a large number of progenies from several crosses are typically generated each year, resulting in several populations (Bernardo, 2003). In traditional methods, progenies are selected without considering the merits of the populations to which they belong. However, it is important to use statistical models that include not only the effects of progenies but also the effects of populations. Including the population effect in the statistical model to select the best progenies within the best populations is a measure aimed at increased selection accuracy. Duarte and Vencovsky (2001), Piepho and Williams (2006), and Pereira et al. (2017) highlighted the benefits of including population effect in the model.
Disregarding population effect in the statistical model may compromise the prediction of genetic values and the estimation of variance components. In this way, selection gains tend to decrease due to selection bias (Rocha & Vello, 1999;Duarte & Vencovsky, 2001;Pereira et al., 2017).
In the selection process, in addition to choosing the best statistical model to predict genetic values, plant breeders usually handle multiple traits simultaneously (Akhter & Sneller, 1996; Malek, Rafii, Shahida Sharmin Afroz, Nath, & Mondal, 2014). However, choosing high-performance soybean genotypes for multiple traits simultaneously can be a difficult task. Some selection indices have been proposed for a simultaneous selection of traits, e.g., the Smith-Hazel classical index (Smith, 1936; Hazel, 1943) and the Additive Genetic index. However, several limitations exist regarding the determination of economic weights of traits. Moreover, the Smith-Hazel index may have multicollinearity problems, undermining the effectiveness of best progeny selection.
In an effort to address the questions above, Rocha, Machado, and Carneiro (2018) proposed a multi-trait index based on factor analysis and ideotype design (FAI-BLUP index). This index considers the genetic correlation structure obtained from the data (via exploratory factor analysis according to the method description), which enables the selection of genotypes closer to those hypothesized by the breeder using the ideotype combination of desirable and undesirable factors for the selection objective.
In this scenario, the present study proposes i) to examine the importance of including population effect in the statistical model for the prediction of genetic values and ii) to evaluate the effectiveness of the FAI-BLUP index in the selection of erect, early, and high-yielding progenies of soybean.

Genetic material and experimental settings
Three populations belonging to the Soybean Breeding Program of the Federal University of Viçosa (UFV) were obtained from crosses between divergent inbred lines (TMG 123 RR/M7211 RR, UFVS Citrino RR/UFVS Turqueza RR, and M7908 RR/M7211 RR) for relative maturity groups in accordance with Carpentieri-Pípolo, De Almeida, De Souza Kiihl, and Rosolem (2000); we aimed to select earlier-maturing and high-yielding progenies. The plants were separately bulk-harvested and threshed to obtain a bulk sampling of seeds produced to form 204 progenies, which were evaluated in the field along with nine controls. Two trials were conducted in the 2015 crop year until mid-March 2016. The first trial took place in the municipality of Viçosa, Minas Gerais State, Brazil (20°45'45" S, 42°49'27" W, and 647 m altitude), and the second was carried out in São José do Triunfo, Minas Gerais State, Brazil (20°45'14" S, 42°52'55" W, and 667 m altitude). The experiments were set up as randomized complete block designs with three replicates. Plots consisted of two 2.0-m rows spaced 0.5 m apart with a planting stand of 13 seeds per meter, totaling a density of 256,000 plants ha⁻¹. All management and training operations were undertaken according to the crop requirements, following recommendations of Sediyama, Felipe, and Borem (2015).
The following traits were evaluated: number of days to flowering (FL, days); number of days to maturity (MT, days); seed-filling period (FP, days); hypocotyl diameter, measured just above the hypocotyl node (HD, mm); lodging angle (LA, degrees); 100-seed weight (SW, g); average seed yield per plant (SYPL, g/plant), corrected 'for the stand' (plant survival rate); and average seed yield per plot (SY, g). The FL and MT variables correspond to the number of days from seedling emergence until more than 50% of the plants in the plot reached the R2 stage and the number of days before 95% of the pods were mature, as indicated by their color, respectively (Fehr & Caviness, 1977). The seed-filling period was determined as the difference between FL and MT (Panthee et al., 2004). Lodging angle was measured at maturity, ranging from 81° (all plants erect) to 9° (all plants prostrate). To determine the SW, SYPL, and SY traits, the seeds were dried until reaching 13% moisture before analysis.

Statistical analyses
The Restricted Maximum Likelihood/Best Linear Unbiased Prediction (REML/BLUP) procedure was adopted for statistical analyses, following Patterson and Thompson (1971) and Henderson (1975). The statistical model for the evaluation of genotypes in a randomized complete block design, with one observation per plot, more than one environment, and population data, was $y = Xr + Zf + Wi + Sp + Tj + e$, where: y is the vector of phenotypic data; r is the vector of fixed effects (controls, replicate, and location) added to the overall mean; f is the vector of among-progeny within-population effects (assumed random), in which $f \sim N(0, I\sigma^2_f)$; i is the vector of progeny × location interaction effects (random), in which $i \sim N(0, I\sigma^2_i)$; p is the vector of among-population effects (assumed random), in which $p \sim N(0, I\sigma^2_p)$; j is the vector of population × location interaction effects (random), in which $j \sim N(0, I\sigma^2_j)$; and e is the error vector (random), in which $e \sim N(0, I\sigma^2_e)$. The capital letters (X, Z, W, S, and T) represent the incidence matrices for the r, f, i, p, and j effects, respectively. For the model without among-population effects, we removed the among-population and population × location interaction effects from the above model.
Acta Scientiarum. Agronomy, v. 43, e44623, 2021
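As a sketch of how predictions follow from this model, the fixed effects and the BLUPs of the random effects can be written as the solution of Henderson's mixed model equations. The stacked vector $u$, the combined incidence matrix $Z_*$, the matrix $G$, and the variance symbols $\sigma^2_f$, $\sigma^2_i$, $\sigma^2_p$, $\sigma^2_j$, and $\sigma^2_e$ are notation assumed here for illustration, with independent, homoscedastic errors; in practice the variance components would be replaced by their REML estimates.

```latex
% Assumed notation: u stacks the random effects of the model and Z_* stacks
% their incidence matrices; G is block-diagonal with one variance per effect.
u = \begin{bmatrix} f \\ i \\ p \\ j \end{bmatrix}, \qquad
Z_* = \begin{bmatrix} Z & W & S & T \end{bmatrix}, \qquad
G = \operatorname{diag}\left(I\sigma^2_f,\; I\sigma^2_i,\; I\sigma^2_p,\; I\sigma^2_j\right)

% Mixed model equations with residual covariance I\sigma^2_e
\begin{bmatrix}
  X'X    & X'Z_* \\
  Z_*'X  & Z_*'Z_* + \sigma^2_e\, G^{-1}
\end{bmatrix}
\begin{bmatrix} \hat{r} \\ \hat{u} \end{bmatrix}
=
\begin{bmatrix} X'y \\ Z_*'y \end{bmatrix}
```

The term $\sigma^2_e G^{-1}$ is what shrinks each predicted effect toward zero: the smaller a variance component is relative to the error variance, the stronger the shrinkage of the corresponding effects.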

Evaluation of population effect and selection of progenies for multiple traits
The fit of the models with and without the among-population effects was compared by the Akaike information criterion (AIC) (Akaike, 1974) and the likelihood ratio test (LRT), following Wilks (1938) and using chi-square statistics with one degree of freedom.
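The likelihood ratio test described above can be sketched as follows for a single random effect dropped from the model. This is a minimal illustration, not the authors' script: the function name and the deviance values are hypothetical, and the p-value uses the closed form of the chi-square survival function with one degree of freedom.

```python
import math


def likelihood_ratio_test(deviance_reduced, deviance_full):
    """LRT for one extra variance component (chi-square, 1 df).

    Deviance is -2 * log-likelihood, so the test statistic is the drop in
    deviance obtained when the extra effect is added to the model.
    """
    statistic = max(deviance_reduced - deviance_full, 0.0)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical deviances for a model without and with one random effect
statistic, p_value = likelihood_ratio_test(deviance_reduced=1003.84,
                                           deviance_full=1000.00)
print(f"LRT = {statistic:.2f}, p = {p_value:.4f}")
```

A drop in deviance of about 3.84 corresponds to the conventional 5% threshold for one degree of freedom, which is why that value is used in the hypothetical call.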
After the best-fitting model was chosen, it was used to predict the genetic values (BLUPs) of the progenies; these were calculated as the sum of the among-progeny within-population and among-population effects when both were significant. Otherwise, the values were calculated only from the effect that was significant. These genetic values were used in three different selection indices: i) the classical Smith-Hazel index (Smith, 1936; Hazel, 1943); ii) the Additive Genetic index (AGI); and, lastly, iii) the FAI-BLUP index proposed by Rocha et al. (2018). For all indices, selection was aimed at reducing FL and MT and increasing the other traits. For i), the multicollinearity diagnosis was carried out in the phenotypic correlation matrix, as recommended by Montgomery and Peck (1992), and the variables that provided a condition number higher than 100 were discarded to solve multicollinearity problems. The genetic coefficients of variation of the progenies were used as the relative economic weight for indices i) and ii) (Bhering et al., 2012). For iii), after ideotypes were determined, the distances from each genotype to the ideotypes (genotype-ideotype distance) were estimated and converted into spatial probability, enabling the genotype ranking (Rocha et al., 2018). An oblique rotation criterion (Coan, 1959) was used for analytic rotation, and the factor scores were calculated using the weighted least squares method of Bartlett (1938).
In oblique rotations, the assumption of independent factors is relaxed: the new axes are not required to be orthogonal and are free to take any position in the factor space, which simplifies interpretation (Bernaards & Jennrich, 2005). However, the degree of correlation allowed among factors is generally small because two highly correlated factors are better interpreted as a single factor.
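The conversion of genotype-ideotype distances into spatial probabilities can be sketched as below. This is a simplified illustration under stated assumptions, not the protocol of Rocha et al. (2018): it assumes Euclidean distance on the factor scores and an inverse-distance probability normalized over all genotype-ideotype pairs, and the genotype names, factor scores, and ideotype coordinates are hypothetical.

```python
import math


def spatial_probabilities(factor_scores, ideotypes, eps=1e-12):
    """Inverse-distance probabilities over all (genotype, ideotype) pairs.

    Assumed form: P_ij = (1 / d_ij) / sum over all pairs of (1 / d_ij),
    where d_ij is the Euclidean distance between genotype i and ideotype j.
    """
    inverse = {
        (genotype, name): 1.0 / max(math.dist(scores, target), eps)
        for genotype, scores in factor_scores.items()
        for name, target in ideotypes.items()
    }
    total = sum(inverse.values())
    return {pair: value / total for pair, value in inverse.items()}


# Hypothetical scores on two factors for three genotypes
factor_scores = {"G1": (1.0, 1.0), "G2": (0.0, 0.0), "G3": (-1.0, 0.5)}
# Hypothetical ideotypes: desirable on both factors vs. undesirable on both
ideotypes = {"target": (1.5, 1.5), "opposite": (-1.5, -1.5)}

probabilities = spatial_probabilities(factor_scores, ideotypes)
# Rank genotypes by their probability of resembling the target ideotype
ranking = sorted(factor_scores,
                 key=lambda g: probabilities[(g, "target")],
                 reverse=True)
print(ranking)
```

Because the probability is a decreasing function of distance, ranking by probability for the target ideotype is equivalent to ranking by closeness to it; the probability scale mainly makes genotypes comparable on a common 0 to 1 range.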

Index comparisons
Comparisons among the SH-BLUP, AGI, and FAI-BLUP indices were carried out based on the predicted genetic gains. For a more valid comparison, predicted genetic gains were calculated using the genotypes indicated by the classical Smith-Hazel index based on the genetic values (SH-BLUP) and using the genotypes indicated by the AGI and FAI-BLUP indices. Lastly, the 24 best progenies (12% selection intensity, approximately) were selected according to each index.
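A minimal sketch of this comparison is given below. It assumes that the predicted gain for a trait is the mean predicted genetic effect of the selected progenies expressed as a percentage of the trait mean, and that agreement between two indices is the share of selected progenies they have in common. All function names, progeny labels, and values are hypothetical.

```python
def top_k(scores, k):
    """Return the k entries with the highest score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def predicted_gain_percent(blups, selected, trait_mean):
    """Mean genetic effect of the selected set, as a percentage of the trait mean."""
    mean_effect = sum(blups[g] for g in selected) / len(selected)
    return 100.0 * mean_effect / trait_mean


def coincidence_percent(selected_a, selected_b):
    """Percentage of progenies selected by both indices."""
    return 100.0 * len(set(selected_a) & set(selected_b)) / len(selected_a)


# Hypothetical genetic effects (BLUPs) for seed yield and hypothetical index scores
blups_sy = {"P1": 12.0, "P2": -4.0, "P3": 7.5, "P4": -9.0, "P5": 3.0, "P6": -9.5}
index_score = {"P1": 0.9, "P2": 0.2, "P3": 0.4, "P4": 0.1, "P5": 0.8, "P6": 0.3}

selected_by_index = top_k(index_score, 2)   # selection on the multi-trait index
selected_direct = top_k(blups_sy, 2)        # direct selection on the trait itself

gain_index = predicted_gain_percent(blups_sy, selected_by_index, trait_mean=150.0)
gain_direct = predicted_gain_percent(blups_sy, selected_direct, trait_mean=150.0)
relative_to_direct = 100.0 * gain_index / gain_direct
overlap = coincidence_percent(selected_by_index, selected_direct)
print(gain_index, gain_direct, round(relative_to_direct, 2), overlap)
```

Expressing the index gain relative to the direct-selection gain gives a common scale across traits, since direct selection sets the upper bound for any single trait.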

Software
R software (R Core Team, 2017) was used for deviance analysis, prediction of genetic values using the lme4 package (Bates, Maechler, Bolker, & Walker, 2015), and adjustment of the FAI-BLUP index with a protocol provided by Rocha et al. (2018). Selegen-REML/BLUP software was used for genetic variance-covariance and genetic correlation analyses and to adjust the AGI index. The SH-BLUP index was run in the GENES software (Cruz, 2013).

Population effect in the models
According to AIC, the model including population effects (Pop(+)) showed the best fit (lowest AIC value) for all traits (Table 1); thus, it was used for the prediction of genetic values. Table 1 also presents the results of deviance analysis for the Pop(+) model and the model without the effect of populations (Pop(-)). In both analyzed models, a significant effect was observed for the estimates of variance associated with the effect of progeny within populations or among progenies for all traits. However, for the Pop(+) model, significant progeny × environment interaction effects (p ≤ 0.05) were only observed for the FL, FP, LA, and SW traits. The Pop(-) model showed a significant progeny × environment interaction for the same traits and, in addition, for SYPL and SY. For the among-population effects, variability was present (p ≤ 0.05) for the FL, MT, FP, HD, and SW traits. For the population × environment interaction effects, a significant effect was only detected for the LA, SYPL, and SY traits.

Exploratory factor analysis - FAI-BLUP index
The first four principal components had eigenvalues higher than one (Kaiser, 1958). Thus, the data may be condensed (dimensional reduction) into four factors that explain 90% of the total variability. After oblimin rotation (Table 2), high genetic correlation for the first factor was observed among the HD, FP, and SW traits; the first factor was thus named the 'accumulation factor'. For the second factor, high genetic correlation was observed between SYPL and SY; therefore, it was named the 'yield factor'. The third factor was named the 'time factor', since high genetic correlation was observed between MT and FL. The fourth and last factor was termed 'lodging' because LA was the only variable to have a high loading on this factor.
Table 3 describes the comparisons among the indices based on predicted genetic gains. According to the SH-BLUP index, the selection and recombination of the 24 best progenies would lead to undesirable gains for FP, whereas AGI indicated undesirable gains for MT. On the other hand, the gains predicted by the FAI-BLUP index were as desired for all traits.
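The eigenvalue-greater-than-one rule used in the factor analysis above can be sketched as follows. The eigenvalues are hypothetical and were chosen only to mimic the pattern reported (four retained factors); they are not the values estimated in this study. For a correlation matrix of eight traits, the eigenvalues sum to eight.

```python
def kaiser_retained(eigenvalues):
    """Return (number of eigenvalues > 1, share of total variance they explain)."""
    ordered = sorted(eigenvalues, reverse=True)
    kept = [value for value in ordered if value > 1.0]
    return len(kept), sum(kept) / sum(ordered)


# Hypothetical eigenvalues of an 8-trait correlation matrix (they sum to 8)
eigenvalues = [2.9, 1.8, 1.4, 1.1, 0.4, 0.2, 0.15, 0.05]
n_factors, explained = kaiser_retained(eigenvalues)
print(n_factors, round(100 * explained, 1))
```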

Index comparisons
Direct selection provides the maximum predicted gain when considering one trait at a time. Relative to the gains predicted through direct selection, the SH-BLUP, AGI, and FAI-BLUP indices achieved 6.32, 48.20, and 48.32%, respectively, for traits whose values are desired to be increased, and 58.95, 11.66, and 24.10%, respectively, for traits whose values are desired to be reduced (Table 3). These results indicate greater balance in the gains obtained via FAI-BLUP. The 24 best progenies were selected using the SH-BLUP, AGI, and FAI-BLUP indices. Coincidences between the progenies selected by the FAI-BLUP and AGI, FAI-BLUP and SH-BLUP, and AGI and SH-BLUP indices were 75, 25, and 16.67%, respectively.
Table 3 (selected genotypes and footnotes)
Genotypes selected by the …: 159, 359, 343, 155, 160, 157, 162, 229, 337, 364, 167, 370, 335, 147, 152, 161, 310, 150, 110, 136, 327, 163, 258, 332
Genotypes selected by the …: 355, 335, 138, 370, 368, 324, 33, 360, 233, 248, 328, 134, 113, 12, 316, 332, 371, 329, 167, 314, 39, 125, 169
Genotypes selected by the …: 355, 322, 370, 328, 33, 39, 324, 368, 316, 332, 317, 342, 326, 314, 371, 343, 125, 138, 364, 167, 360, 143, 134
# FL = days to flowering (days); MT = days to maturity (days); FP = seed-filling period (days); HD = hypocotyl diameter (mm); LA = lodging angle (degrees); SW = 100-seed weight (g); SYPL = average seed yield (g/plant); and SY = average seed yield (g/plot).
‡‡ 12% selection intensity (24 genotypes selected).
† GCV = genetic coefficient of variation (%).
‡ Traits excluded in SH-BLUP due to multicollinearity problems.
§ Proportion of gain predicted for traits whose values are desired to be increased (FP, HD, LA, SW, SYPL, and SY) in relation to the gain predicted through direct selection.
¶ Proportion of gain predicted for traits whose values are desired to be reduced (FL and MT) in relation to the gain predicted through direct selection.
Figure 1 shows the ranking of the 204 genotypes according to the FAI-BLUP index and its associated spatial probability (color gradient).
The results allowed for a simpler, easier, and more objective genotype selection process. The first 24 progenies selected according to the FAI-BLUP index present potential to generate lines of plants that are simultaneously erect, early, and high-yielding.
The AIC is an estimator of the relative quality of statistical models for a given set of data. In a given set of candidate models for the data, the preferred model is that with the minimum AIC value. Thus, AIC rewards goodness of fit but also includes a penalty that is an increasing function of the number of estimated parameters (parsimony criterion). The penalty discourages overfitting because increasing the number of parameters in the model almost always improves the goodness of fit (Akaike, 1974).
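For reference, the criterion described above has the following standard form. The symbols $k$ and $\hat{L}$ and the labels for the two models are notation assumed here; the parameter count in the last line assumes that the two models differ only by the among-population and population × environment variance components.

```latex
% k = number of estimated parameters, \hat{L} = maximized likelihood
\mathrm{AIC} = 2k - 2\ln\hat{L}

% Difference between the models without and with population effects,
% assuming Pop(+) estimates two additional variance components
\mathrm{AIC}_{\mathrm{Pop}(-)} - \mathrm{AIC}_{\mathrm{Pop}(+)}
  = \left(-2\ln\hat{L}_{\mathrm{Pop}(-)}\right) - \left(-2\ln\hat{L}_{\mathrm{Pop}(+)}\right) - 4
```

Under that assumption, the model with population effects is preferred only when adding the two components lowers the deviance ($-2\ln\hat{L}$) by more than 4, which is how the penalty guards against overfitting.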
The significance of among-progeny within-population effects and among-population effects indicated genetic variability and the possibility of obtaining gains with selection. The results found in this study reveal that the model fitted by including population effects is the most suitable for all evaluated traits (Table 1).
In soybean breeding, the use of population effect to estimate genetic and phenotypic components is often neglected. However, in forest breeding (Furtini, Ramalho, Abad, & Aguiar, 2012;Cappa et al., 2013) and animal breeding (Daetwyler, Kemper, Van Der Werf, & Hayes, 2012;Li et al., 2016), this information improves the accuracy of estimates and consequently contributes to the selection of superior genotypes. Duarte and Vencovsky (2001) and Pereira et al. (2017) pointed out that the inclusion of population information can change the ranking of progenies even under data balancing and orthogonality conditions and thus alter the genetic and phenotypic components.

Soybean ideotype
At present, in soybean breeding, the ideal plant would be erect and have a short production cycle and high grain yields, among other characteristics (Silva, Borém, Sediyama, & Ludke, 2017). On this basis, selecting genotypes with desirable earliness- and yield-related traits in different environments under biotic and abiotic stress conditions will be fundamental for obtaining future gains in soybean breeding programs (Kyei-Boahen & Zhang, 2006; Abrahão & Costa, 2018). Meeting this requirement allows farmers from different regions of the world to maximize their growing areas and shorten the soybean cycle. As a result, a second harvest (with other crops) may be implemented and production losses may be minimized due to the shorter time of exposure to stress factors in the field (Marcos-Filho, Chamma, Casagrande, Marcos, & Regitano-d'Arce, 1994; Diniz et al., 2013).
In the last few years, soybean breeding programs have reported an increase in yields (Van Roekel, Purcell, & Salmerón, 2015), which have depended on the complex understanding of the genotype × environment interaction and increases in selection accuracy (Kang & Gauch, 1996; Gauch, 2013; Van Eeuwijk, Bustos-Korts, & Malosetti, 2016). According to Van Roekel et al. (2015), the low increases in soybean yield obtained from selection using the FP and SW traits can be explained by the high complexity and elevated number of alleles that contribute to these traits; the low increases in yield can also be explained by the evaluation of these traits under nonideal growing conditions, which result in a high genotype × environment interaction and low heritability values. Therefore, new methodologies to improve the strategy of selecting the best genotypes should be investigated to minimize these effects and enable the selection of genotypes that meet production demands. Panthee, Pantalone, Saxton, West, and Sams (2007) stated that the quantitative traits of seed-filling period and lodging are related to the seed yield, which has been corroborated by our results from the AGI and FAI-BLUP indices. Lodging tolerance is an important trait for high yields and combine-harvesting effectiveness in soybean (Yamaguchi et al., 2014). Numerous studies have investigated the effect of lodging on yield (Cooper, 1971; Mancuso et al., 1991; Yamaguchi et al., 2014). In the present study, the LA variable had the highest gains in the positive direction of selection, which led to a successful recombination of the selected progenies and to a lower lodging angle in the genotypes that will be obtained; these gains occurred mainly when the AGI and FAI-BLUP indices were used.
Maximizing the yield of soybean under environmental stress (abiotic) conditions is not an easy task (Egli, Orf, & Pfeiffer, 1984; Kantolic, Peralta, & Slafer, 2013). Extending the period from flowering to the date of maturity (i.e., increasing the FP) usually allows for increasing final yields of genotypes (Cooper, 2003; Kantolic et al., 2013). In this sense, selecting materials with shorter vegetative periods (Ve to Vn, according to Fehr & Caviness, 1977) and longer reproductive periods (R1 to R3) may provide longer seed-filling periods and consequently increase yield (Rowntree et al., 2013).

Selection of high-yielding, early soybean progenies
Several traits have been evaluated in soybean breeding, some of which have commercial relevance for the intrinsic characterization of each cultivar or line (Lersten & Carlson, 2004; Silva et al., 2017). The morphological characterization of the soybean plant is of paramount importance in tests of adaptability and stability of the crop (Lin & Binns, 1994; Chaves et al., 2017). Therefore, the selection of different soybean genotypes is based on a simultaneous evaluation of many traits of interest in different biotic and abiotic stress conditions. Effective methodologies are thus necessary for the selection of superior genotypes; in this regard, selection indices are a viable alternative.
In the SH-BLUP and AGI indices, the genetic coefficient of variation was considered an economic weight. Thus, the greatest weight was assigned to the LA, SY, and SYPL traits (Table 3). Higher genetic coefficients of variation for those traits are expected to result in higher gains, as predicted by the respective indices (Rocha et al., 2018); however, this was only true with AGI.
The FAI-BLUP index led to the selection of soybean progenies that, after selection and recombination, provided balanced and desirable gains for all traits; FAI-BLUP did not require assigning economic weights, unlike the SH-BLUP and AGI indices. Rocha et al. (2018) emphasized that FAI-BLUP is able to deal with several collinear traits (i.e., they need not be excluded), in addition to using those traits as auxiliary components.

Conclusion
To obtain truly superior genotypes, one must select those which contain a number of traits that provide comparatively higher yields and meet the consumer-market demands. Therefore, improving the phenotypic expression of several traits for which segregating populations present a continuous distribution depends on the environmental effect and on the presence of various genes involved in the genetic control of those traits. In this scenario, the FAI-BLUP index optimized genetic gains by more effectively ranking the soybean progenies that are earlier, more erect, and have a higher grain yield potential. As such, the FAI-BLUP index contributes to increasing the success in soybean breeding programs.