Estimates of the genetic parameters, optimum sample size and conversion of quantitative data in multiple categories for soybean genotypes

The objective of this study was to estimate the genetic parameters and optimal sample size for the lengths of the hypocotyl and epicotyls and to analyze the conversion of quantitative data in multiple categories for soybean genotypes. A total of 85 soybean genotypes were analyzed in four experiments in a completely randomized design under greenhouse conditions at the Federal University of Viçosa, Brazil. The magnitude of the genetic parameter estimates characterized the influence of the genetic components in the phenotypic expression of the length of the hypocotyls and epicotyls of different soybean genotypes. The optimal size of the sample varied between the genotypes and the conversion of the quantitative data in multiple categories provided estimates of correlation coefficients greater than 0.80 and coincidence greater than 81% between the converted data and the original data.


Introduction
Soybean (Glycine max [L.] Merrill) shows a wide adaptation to tropical and subtropical climates (SEDIYAMA et al., 2005). During the 2010/2011 season in Brazil, it was estimated that a cultivated area of 24,165,000 hectares would produce 72,227,800 tons, with an average yield of 2,989 kg ha⁻¹ (CONAB, 2011). In 2005, because of the wide acceptance of new technologies by soybean producers in Brazil and the associated efforts of soybean breeding programs, it was estimated that the average productivity in some areas of the central-western region could exceed 3,000 kg ha⁻¹ (SEDIYAMA et al., 2005). Six years later, Conab (2011) estimated that the average productivity in the central-western states (Mato Grosso, Mato Grosso do Sul, Goiás and Distrito Federal) for the 2010/2011 harvest would be 3,031 kg ha⁻¹.
The many uses of soybeans, such as in the oil and bran industries, agribusiness and human consumption, are incentives for breeders to develop superior cultivars (SEDIYAMA et al., 2005, 2009). In addition, there has been an intense development of new cultivars in Brazil, especially after 1997, when the Law of Plant Varieties Protection number 9,456 of April 25, 1997, regulated by the Act number 2,366 of November 5, 1997, was enacted (NETO et al., 2005).
For a particular cultivar to be granted protection, it must meet three basic requirements: it must be distinct, uniform and stable. A cultivar is considered distinct when it is clearly distinguishable from any other cultivar in existence on the date that the application for protection is recognized (FERRAZ; CAMPOS et al., 2009). Currently, approximately 38 descriptors, both obligatory and additional, are used to differentiate soybean cultivars. However, these descriptors are insufficient to distinguish the cultivars, which clearly shows the need to expand the list of descriptors (NOGUEIRA et al., 2008). According to Nogueira et al. (2008), the lengths of the hypocotyl and epicotyl are useful characters for distinguishing soybean genotypes.
Knowledge of the genetic parameters that control certain characters, such as the genotypic determination coefficient, the component of genotypic variability and the ratio between the coefficients of genetic and environmental variation, is of great importance for the breeder because it allows the most suitable breeding method for the crop to be determined (CRUZ et al., 2004). Several studies have been performed to estimate genetic parameters in soybean (COSTA et al., 2008; FARIAS NETO; VELLO, 2001; GOMES et al., 2004; MAURO et al., 1995; NOGUEIRA et al., 2008; REIS et al., 2002) and in other crops (ANDRADE et al., 2010; ARNHOLD; MILANI, 2011; BORGES et al., 2010; DAHER et al., 2004; ROCHA et al., 2010).
The estimate of the optimal sample size, with regard to the number of individuals, is based on simulation, which evaluates the variation of the average, variance and coefficient of variation in relation to the maximum size of the test, that is, the original population (CRUZ, 2006b). Furthermore, to determine the optimal sample size, one can take into account the minimum size capable of representing the average, variance and coefficient of variation of the original population for all characteristics or for a set of the most important ones (CRUZ, 2006b). Cargnelutti Filho et al. (2009) reported that the sample size is directly proportional to the variability of the population data. Studies related to the determination of the sample size have been conducted in various crops, with different objectives (ADAMI et al., 2010; CARGNELUTTI FILHO et al., 2011; ESTEFANEL et al., 1984; MARTIN et al., 2005; PEREIRA et al., 2009; SILVA et al., 2011).
Given these considerations, and the fact that few studies in the literature address the estimation of genetic parameters, the determination of the optimal sample size and the conversion of quantitative data into multiple categories for the lengths of the hypocotyl and epicotyl in soybean genotypes, the need to study these topics is evident. Therefore, this study aimed to estimate the genetic parameters and the optimal sample size for the lengths of the hypocotyl and epicotyl and to convert the quantitative data into multiple categories for soybean genotypes.

Material and methods
The experiments were conducted under greenhouse conditions at the Soybean Genetic Improvement Program of the Federal University of Viçosa in Viçosa, Minas Gerais State (20°45'14''S, 42°52'54''W, 649 m of altitude). A total of 85 soybean genotypes were evaluated in four experiments. For planting, a random sample of seeds of different sizes was used and the sowing depth was set at 3.0 cm. After germination, the plants were managed according to the recommendations for the crop.
Initially, the Lilliefors and Kolmogorov-Smirnov tests were performed to assess whether the data followed a normal distribution. Subsequently, the data were analyzed for variability in the lengths of the hypocotyl and epicotyl in each experiment, adopting a fixed statistical model. Additionally, the following genetic parameters were estimated: the genotypic quadratic component ($\hat{\phi}_g$), the experimental coefficient of variation (CVe%), the coefficient of genotypic variation (CVg%), the CVg%/CVe% ratio and the coefficient of genotypic determination (H²). The equations, in their usual form for a fixed model, are described below:

$$\hat{\phi}_g = \frac{QMG - QMR}{r}$$

where: QMG = mean square of the genotypes; QMR = mean square of the residue; r = number of experimental replications.

$$CV_{e\%} = \frac{100\sqrt{QMR}}{\bar{x}}, \qquad CV_{g\%} = \frac{100\sqrt{\hat{\phi}_g}}{\bar{x}}, \qquad H^2 = \frac{\hat{\phi}_g}{QMG/r}$$

where: $\bar{x}$ = overall mean of the trait.
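As an illustration of how these parameters relate to the mean squares of the analysis of variance, the following minimal sketch computes them from QMG, QMR, r and the overall mean. It assumes the usual fixed-model estimators (genotypic quadratic component as (QMG - QMR)/r, and H² as that component divided by QMG/r); the function name and the example values are hypothetical and are not taken from the experiments.

```python
import math


def genetic_parameters(qmg, qmr, r, mean):
    """Estimate genetic parameters from a one-way fixed-model ANOVA.

    qmg:  mean square of the genotypes
    qmr:  mean square of the residue
    r:    number of experimental replications
    mean: overall mean of the trait
    """
    # Genotypic quadratic component (assumed estimator: (QMG - QMR) / r)
    phi_g = (qmg - qmr) / r
    # Experimental and genotypic coefficients of variation, in percent
    cv_e = 100.0 * math.sqrt(qmr) / mean
    # A negative component estimate is clipped at zero before the square root
    cv_g = 100.0 * math.sqrt(max(phi_g, 0.0)) / mean
    # Coefficient of genotypic determination, on a genotype-mean basis
    h2 = phi_g / (qmg / r)
    return {
        "phi_g": phi_g,
        "CVe%": cv_e,
        "CVg%": cv_g,
        "CVg/CVe": cv_g / cv_e,
        "H2": h2,
    }


# Hypothetical mean squares, for illustration only
params = genetic_parameters(qmg=50.0, qmr=2.0, r=4, mean=10.0)
for name, value in params.items():
    print(f"{name}: {value:.4f}")
```

A CVg/CVe ratio above one in this sketch corresponds to the situation discussed in the results, in which genotypic variation exceeds environmental variation.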
Based on the data from Experiment 2, we determined, for each genotype, the optimal sample size that represented the average and the variance of the original character, on the basis of simulation, graphical analysis and confidence intervals for the average and the variance at the 95% probability level.
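The simulation criterion can be sketched as follows. This is not the routine implemented in the Genes program; it is a simplified illustration that assumes subsamples drawn without replacement, a normal-approximation confidence interval (1.96 standard errors) for the mean of the full data set, and a hypothetical number of simulations per size. Only the criterion for the average is shown; the variance would follow the same logic with its own confidence interval.

```python
import math
import random
import statistics


def mean_ci95(data):
    """Normal-approximation 95% confidence interval for the mean of the full data."""
    m = statistics.mean(data)
    half = 1.96 * statistics.stdev(data) / math.sqrt(len(data))
    return m - half, m + half


def optimal_size_for_mean(data, n_sim=100, seed=0):
    """Smallest sample size from which every simulated mean stays inside the CI.

    Sizes are scanned from the full data set downwards; the scan stops at the
    first size for which at least one of the n_sim subsample means falls
    outside the interval, and the last size that passed is returned.
    """
    rng = random.Random(seed)
    low, high = mean_ci95(data)
    best = len(data)
    for size in range(len(data), 1, -1):
        means = [statistics.mean(rng.sample(data, size)) for _ in range(n_sim)]
        if all(low <= m <= high for m in means):
            best = size
        else:
            break
    return best


# Hypothetical hypocotyl lengths (cm) for one genotype, for illustration only
rng = random.Random(42)
lengths = [rng.gauss(6.0, 1.0) for _ in range(100)]
print(optimal_size_for_mean(lengths))
```

Because the result depends on the random draws, the number of simulations and the seed should be reported alongside any size obtained in this way.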
In addition, the quantitative data (lengths of the hypocotyl and epicotyl) of the four experiments were converted into multiple categories (3 and 5 classes), with the predefined number of classes obtained by dividing the amplitude of the data into equal intervals. Although it would be interesting to evaluate the plants directly in each class, this work can be considered an initial effort in categorizing the lengths of the hypocotyl and epicotyl into 3 and 5 classes, so it was necessary to first measure the plants and then categorize them. The converted data were analyzed by Spearman's correlation and the coefficient of coincidence, considering a sample of 50%. According to Cruz (2006a), the coefficient of coincidence is calculated from the concordance of occurrence of accessions in the set formed by the upper or lower observations for each pair of characters, for a predetermined sample size n1, with n1 < n. The analyses were performed using the Genes Program: Experimental Statistics and Matrices (CRUZ, 2006a) and Genes Program: Multivariate Analysis and Simulation (CRUZ, 2006b).
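The conversion and the rank correlation can be illustrated with the sketch below. It assumes that the classes are equal-width intervals over the observed amplitude and that tied ranks receive their average position; the function names and the example lengths are hypothetical, and the code is not the Genes implementation.

```python
def to_classes(values, k):
    """Assign each value to one of k equal-width classes (1..k) over the observed range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [1] * len(values)
    # The maximum value would fall in class k + 1, so it is capped at k
    return [min(int((v - lo) / width) + 1, k) for v in values]


def average_ranks(values):
    """Ranks starting at 1; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical genotype means (cm), for illustration only
lengths = [2.0, 3.5, 5.0, 6.5, 8.0, 9.5, 11.0]
classes3 = to_classes(lengths, 3)
print(classes3)
print(spearman(lengths, classes3))
```

Converting to classes introduces ties, so the correlation with the original data is below one even when the ordering of the genotypes is fully preserved.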

Results and discussion
The normality tests showed that the data could be analyzed under the assumption of a normal distribution. Significant differences were observed (p < 0.01) by the F test for the lengths of the hypocotyl and epicotyl in the four experiments (Table 1), indicating the existence of genetic variability among the genotypes, which favors success when selecting promising materials. Similar results for the hypocotyl length were obtained by Nóbrega and Vieira (1995) and Costa et al. (1999) and for the lengths of the hypocotyl and epicotyl by Nogueira et al. (2008). On average over the four experiments, the estimates of the genotypic quadratic component for the length of the epicotyl were 7.1 times greater than those for the hypocotyl (Table 1).
For the experimental coefficient of variation (CVe%), the estimates ranged from 8.0413 to 18.2839% for the hypocotyl length and from 11.1836 to 13.7596% for the length of the epicotyl in the four experiments. The magnitudes of the CVe% were consistent with those reported by Bays et al. (2007) and Nogueira et al. (2008). The coefficient of genetic variation (CVg%) ranged from 7.8205 to 25.1840% and from 10.8847 to 36.4231% for the lengths of the hypocotyl and epicotyl, respectively, in the four experiments. The CVg%/CVe% ratios were greater than one in experiments 1 and 3 (Table 1). In these two experiments, superior genotypes could be identified successfully because the genotypic variation exceeded the environmental variation (VENCOVSKY, 1987); the ratio between the coefficient of genetic variation (CVg%) and the experimental coefficient of variation (CVe%) indicates how large the genotypic variation is relative to the experimental variation.
The estimates of the coefficient of genotypic determination (H²) for both characters were above 82%, and values above 98% were observed for the length of the hypocotyl in experiment 2 and for the length of the epicotyl in experiments 1 and 2 (Table 1). Similar results were obtained by Nogueira et al. (2008). These results indicate that the genotypes had a higher genetic variability for the length of the epicotyl and that genetic effects predominated over environmental effects on the lengths of the hypocotyl and epicotyl. This is because, according to Cruz (2005), H² is a measure analogous to heritability and expresses the proportion of the phenotypic variance that is due to the genetic variability among the treatment averages. Thus, high estimates of H² indicate that most of the variation among the averages of the genotypes is genetic in nature (VENCOVSKY, 1987). In addition, the main importance of heritability in genetic studies of metric characters is its predictive role, which expresses the reliability of the phenotypic value as a guide to the genetic value, or the degree of correspondence between the phenotypic value and the value of a population or a set of genotypes (FALCONER, 1987). Moreover, Andrade et al. (2010) suggested that, for characters with a high genetic component in their phenotypic expression, gains by selection can be achieved via visual or mass selection.
In general, the estimates of genetic parameters for the lengths of the hypocotyl and epicotyl in the four experiments showed a situation that is favorable for the identification of superior genotypes. These characters may be useful for selecting genotypes with different behaviors and thus for contributing to the distinctiveness requirement of the Plant Variety Protection.
The minimum sample size is the one from which all of the r simulations provide values of average and variance within the confidence interval at 95% probability (CRUZ, 2006b). Thus, the minimum optimal number of plants to be measured to represent the average hypocotyl length was 45, 53, 52 and 45 for BRS Valiosa RR, Água-Marinha RR, UFVS 2010 and NK 7059 RR, respectively, whereas for the average length of the epicotyl it was 49, 43, 51 and 49 for BRS Valiosa RR, Água-Marinha RR, UFVS 2010 and NK 7059 RR, respectively (Figure 1). For the variance of the length of the hypocotyl, the optimal size of the sample was 60, 64, 58 and 73 plants for BRS Valiosa RR, Água-Marinha RR, UFVS 2010 and NK 7059 RR, respectively; for the variance of the length of the epicotyl, the optimal size was 70, 29, 64 and 70 for BRS Valiosa RR, Água-Marinha RR, UFVS 2010 and NK 7059 RR, respectively (Figure 2). The optimal size for the average showed small variations for the two variables and the four genotypes: the samples for the lengths of the hypocotyl and epicotyl each varied by eight plants among the genotypes. However, for the minimum optimal number representing the variance, greater variations among the genotypes were observed: 15 plants for the length of the hypocotyl and 41 for the length of the epicotyl. The sample size for the average was lower than that for the variance in all cases except the epicotyl length of Água-Marinha RR (43 versus 29 plants). Using other evaluation methodologies, variability in the estimated sample size among genotypes has also been identified for plant height, height of first pod, stem diameter, number of nodes on the stem, number of pods on the stem, number of grains on the stem and grain weight on the stem (ESTEFANEL et al., 1984) and for the number of nodes per plant (CARGNELUTTI FILHO et al., 2009).
We used Spearman's correlation (Table 2) and the coefficient of coincidence (Table 3) to compare the variation of the averages of the three data sets (original, 3 classes and 5 classes) after the data conversion. For the correlations among the three data sets in the four experiments, the estimates were greater than or equal to 0.8000 and 0.8549 for the lengths of the hypocotyl and epicotyl, respectively. The coefficients of coincidence ranged from 81.81 to 100.00% for the hypocotyl length and from 85.71 to 100.00% for the epicotyl length in the four experiments. The correlation reflects the degree of association between the characters, and this knowledge is important because it shows how the selection of one character influences the expression of another (CRUZ et al., 2004). The coefficient of coincidence is calculated from the concordance of occurrence of genotypes in the group formed by the upper or lower observations for each pair of characters, over a sample of predefined size (CRUZ, 2006a). In this study, the data sets showed variation of proportional magnitude, and the concordance for both the upper and lower observations was satisfactory when the data were converted into multiple categories with 3 and 5 classes.
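The coefficient of coincidence described above can be sketched as follows. The sketch assumes that n1 is the rounded fraction of the number of genotypes and that ties are broken by input order, which may differ from the rule applied by the Genes program; the function name and the data are hypothetical.

```python
def coincidence(x, y, fraction=0.5, upper=True):
    """Count and percentage of genotypes selected on both x and y.

    On each variable, the n1 = round(fraction * n) genotypes with the highest
    values (upper=True) or the lowest values (upper=False) are selected.
    Ties are broken by input order, which is an assumption of this sketch.
    """
    n1 = max(1, round(fraction * len(x)))

    def selected(values):
        order = sorted(range(len(values)), key=lambda i: values[i], reverse=upper)
        return set(order[:n1])

    common = len(selected(x) & selected(y))
    return common, 100.0 * common / n1


# Hypothetical genotype means: original data and a converted data set
original = [1, 2, 3, 4, 5, 6, 7, 8]
converted = [1, 2, 3, 5, 4, 6, 7, 8]
print(coincidence(original, converted, fraction=0.5, upper=True))
print(coincidence(original, converted, fraction=0.5, upper=False))
```

With class data, many genotypes share the same value, so the tie-breaking rule can change which genotypes enter the selected group; this should be kept in mind when comparing the output of such a sketch with published coefficients.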
Thus, the conversion of the data presented in this study may be considered useful in plant improvement, as the lengths of the hypocotyl and epicotyl showed high estimates of the correlation and coincidence coefficients and allow the selection of desirable genotypes. Moreover, selection based on information from multiple categories (3 or 5 classes) can facilitate the process of evaluating and identifying superior genotypes. However, studies to identify standard genotypes are necessary to standardize the methodology for analyzing the hypocotyl and epicotyl lengths in soybean genotypes.

Conclusion
The magnitude of the genetic parameter estimates characterized the influence of the genetic components in the phenotypic expression of the length of the hypocotyls and epicotyls of different soybean genotypes. The optimal size of the sample, with regard to the length of the hypocotyl, varied among the genotypes and ranged from 45 to 53 for the average and from 58 to 73 for the variance. Regarding the length of the epicotyl, the optimal sample size was between 43 and 51 for the average and between 29 and 70 for the variance.
The conversion of quantitative data into multiple categories provided estimates of correlation coefficients greater than 0.80 and coincidence greater than 81% between the converted data and the original data.
**Significant at 1% probability by the F test.

Table 2. Spearman correlation coefficients among the three data sets (original, 3 classes and 5 classes) for two additional descriptors (hypocotyl, above the diagonal, and epicotyl, below the diagonal) measured in four experiments conducted under greenhouse conditions in Viçosa, Minas Gerais State, 2006-2011¹.
¹ Estimation of correlation based on the average of the genotypes.

Table 3. Coefficients of coincidence in number and percentage (in parentheses) among the three data sets (original, 3 classes and 5 classes) for two additional descriptors of soybean measured in four experiments conducted under greenhouse conditions in Viçosa, Minas Gerais State, 2006-2011¹. Coefficients above the diagonal refer to the upper averages and coefficients below the diagonal refer to the lower averages.