Partial diallel and genetic divergence analyses in maize inbred lines

In this study, we aimed to estimate general and specific combining abilities (GCA and SCA, respectively) and to verify genetic divergence (Rogers distance; unweighted pair-group method using arithmetic average, UPGMA) using microsatellite markers in maize inbred lines. Using a partial diallel scheme, 19 inbred lines were crossed (9 x 10); the two parental groups were derived from the single hybrids SG6015 and P30F53, respectively. The 90 hybrids were evaluated in an incomplete randomized block design with common checks and three replications during the 2017-2018 growing season. Flowering time, average plant height, ear insertion height, average ear diameter, ear length, number of lodged and broken plants, mass of 100 grains, and grain yield were measured. According to the analysis of variance, GCA and SCA were significant (p < 0.05) in all the measured traits; inbred line B as well as 1 and 8, derived from the single hybrids SG6015 and P30F53, respectively, were selected for their higher GCA values in grain yield to be used as testers in crosses, while the single cross hybrid (B x 1) was selected for its higher SCA value in grain yield to be used in future breeding programs. The molecular marker analysis divided the inbred lines into two groups, and the highest dissimilarity (0.74) was observed between lines A and 9. However, this divergence did not result in a high SCA value, so the hybrid obtained from this cross was not selected for grain yield.


Introduction
Maize (Zea mays L.) is a diploid species with 10 pairs of chromosomes. It has great social and economic importance for humans and animals, and in industry (Grigulo, Azevedo, Krause, & Azevedo, 2011). In allogamous plant breeding, generating base populations for inbred lines is essential to obtain superior hybrids (Hallauer, Carena, & Miranda, 2010). In addition, the development of single cross maize hybrids depends on heterosis, which is related to genetic distance and the gene complementation effect (Lippman & Zamir, 2007; Schnable & Springer, 2013). Thus, it is necessary to select inbred lines based on genetic effects and heterotic groups to achieve superior single cross hybrids. Diallel analysis is a widely used tool in breeding programs to obtain genetic information. This controlled mating system enables the estimation of the general and specific combining ability (GCA and SCA, respectively), where GCA reflects the additive effects related to the parents and SCA reflects the non-additive genetic effects, expressed as the deviation of a specific cross from the performance expected from the GCA of its parents (Cruz, Regazzi, & Carneiro, 2012). The classic diallel analysis performed in the field is advantageous because several phenotypes of the parental plants and their hybrids can be observed under field conditions, which would be impractical in a molecular marker analysis. However, field diallel analysis is limited in the number of parents by the labor-intensive nature of obtaining hybrids through multiple manual crosses between parental lines. Moreover, diallels are hindered by the low availability of hybrid seeds, since single cross hybrid seeds are produced in small ears by plants subject to inbreeding depression. In addition, manual crosses between early and late inbred lines can be difficult to perform when flowering times do not coincide.
All these difficulties limit field experiment replications, while increasing the residual mean square in the analysis of variance and lowering the probability of identifying significant differences between treatments. The loss of plant crossings often occurs, which causes data imbalances and complicates the statistical analysis. Finally, the occurrence of genotype x environment interactions is very frequent in the field, as genotypes may have different GCA and SCA values, according to the environmental conditions.
Using markers to analyze genetic divergence can minimize the problems of field diallel analysis, such as the genotype x environment interaction. However, their use is indicated only when certain conditions are satisfied, including an adequate laboratory and qualified personnel for handling the markers. When choosing markers, the genome coverage, the capacity to distinguish heterozygous from homozygous genotypes (dominant vs. codominant markers), the need for probe development, the amount of DNA per sample, the genetic information at each locus, and reproducibility need to be considered. These factors can influence the applicability of each type of marker during plant selection. In this study, we examined the results of both the field diallel analysis and the molecular analysis, along with their potential convergence, to verify whether one or both should be taken into consideration. Convergence tends to be greater when marker coverage is broad and inclusive of the maize chromosomes. In the case of microsatellite (simple sequence repeat, SSR) markers, results are optimized when the primers are associated with genes encoding traits of agricultural importance. If the primers used are not linked to such genes, the prediction of hybrid performance from the molecular analysis of the parents is hindered. This limitation is even more severe when important agronomic traits are controlled by additive genes.
Recently, different molecular markers have been used to detect heterosis and polymorphisms related to gene similarity in parents (Munhoz, Prioli, Amaral Junior, Scapim, & Simon, 2009), which is a main factor that affects heterosis (Hallauer et al., 2010). In this context, SSR are sequences of two to six base pairs that are repeated in tandem and are broadly used for their codominant inheritance and multiallelic nature, which provides valuable information regarding polymorphisms (Souza et al., 2008). Molecular markers have been very useful in breeding programs for clustering genotypes into different heterotic groups (Reif et al., 2003). Bertan et al. (2007) reported that the analysis of genetic variability using genetic and morphological distances is fundamental for efficient breeding programs. In this work, the diallel methodology was employed to quantify combining ability and facilitate the identification of superior genotypes. We examined the correspondence between the combining ability estimates obtained from the diallel analysis and the genetic distances of the parents estimated from pedigree information and SSR. We aimed to i) obtain the groups of inbred lines using the unweighted pair-group method using arithmetic average (UPGMA); ii) estimate the dissimilarity matrix using Rogers distance; iii) estimate the GCA and SCA values in the two divergent groups of maize inbred lines to compare with the molecular results; and iv) evaluate whether the use of molecular markers can support diallel analyses in the field.

Diallel analysis
A total of 19 maize inbred lines were selected from the core collection of the State University of Maringá maize breeding program to be used as parents in a partial diallel scheme. Parents were divided according to the source population of each inbred line: nine inbred lines were derived from the commercial single hybrid SG6015 (group I) and coded A to I, whereas the remaining ten inbred lines were derived from the commercial single hybrid P30F53 (group II) and coded 1 to 10. Pollinations were performed in the 2016/2017 growing season at Iguatemi Experimental Farm (latitude 23°25' S, longitude 51°57' W, and altitude 550 m asl), located at Maringá, Paraná State, Brazil. The 19 inbred lines were sown in pairs, in all possible combinations of the partial diallel, in 10 m rows with 0.90 m between rows and 0.20 m between plants. During flowering, pollinations were performed manually. The field trial was performed during the 2017/2018 growing season at the same experimental farm. The region's climate is classified as Cfa according to the Köppen (1918) classification, with an annual average temperature of 19°C and an annual rainfall of 1,500 mm. The field trial was arranged in an incomplete randomized block design with common treatments, as proposed by Pimentel Gomes and Guimarães (1958). The 90 regular treatments were divided into groups, with three commercial checks used as the common treatments among the groups, and three replications. The hybrids P30F53, DKB 290, and 2B688 were used as commercial checks. Each plot consisted of two 5 m rows spaced 0.90 m apart, resulting in a usable area of 9 m². Each plot was thinned after 30 days to a density of 5 plants m⁻¹, yielding a population of approximately 55,500 plants ha⁻¹ at harvest.
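The plot area and stand density quoted above follow from the row geometry. A minimal arithmetic sketch in Python, using only the figures given in the text (the variable names are illustrative):

```python
# Plot geometry and stand density, using the figures stated in the text.
ROWS_PER_PLOT = 2
ROW_LENGTH_M = 5.0
ROW_SPACING_M = 0.90
PLANTS_PER_M = 5  # plants per metre of row after thinning

# Usable plot area: number of rows x row length x row spacing.
plot_area_m2 = ROWS_PER_PLOT * ROW_LENGTH_M * ROW_SPACING_M

# One hectare (10,000 m^2) holds 10,000 / spacing metres of row.
row_metres_per_ha = 10_000 / ROW_SPACING_M
plants_per_ha = PLANTS_PER_M * row_metres_per_ha

print(f"plot area: {plot_area_m2:.1f} m^2")
print(f"stand density: {plants_per_ha:,.0f} plants per ha")
```

The computed density is slightly above 55,500 plants ha⁻¹, which agrees with the approximate value reported.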
Acta Scientiarum. Agronomy, v. 43, e53540, 2021
The following traits were evaluated: female (FF, days) and male (MF, days) flowering time; average plant height (PH, m) and ear insertion height (EH, m) from six competitive plants; average ear diameter (ED, m) and ear length (EL, m) from ten ears; number of lodged (NL) and broken (NB) plants per plot; and mass of 100 grains (MG, kg) and grain yield (GY, kg ha⁻¹), both corrected to 13% moisture. To obtain the analysis of variance and the adjusted means of each treatment evaluated in the field trial, the following statistical model was used:
$$Y_{ijk} = \mu + \tau_i + \gamma_j + \rho_k + \delta_i(\tau\gamma)_{ij} + \varepsilon_{ijk}$$
In the model above, $Y_{ijk}$ is the value for the i-th treatment, in the k-th replication, and in the j-th experimental group; $\mu$ is the overall mean; $\tau_i$ is the fixed effect of treatment i; $\gamma_j$ is the random effect of group j; $\rho_k$ is the random effect of replication k; $\delta_i(\tau\gamma)_{ij}$ is the random effect of the interaction between groups and treatments, where $\delta_i = 1$ for a common treatment (commercial checks) and $\delta_i = 0$ for a regular treatment; and $\varepsilon_{ijk}$ is the error. Least square means were estimated through this model and then used as phenotypic inputs for the diallel analysis according to model IV proposed by Griffing (1956) and adapted for partial diallel schemes by Geraldi and Miranda Filho (1988):
$$Y_{ij} = \mu + \frac{1}{2}(d_1 + d_2) + g_i + g'_j + s_{ij} + \bar{\varepsilon}_{ij}$$
In the model above, $Y_{ij}$ is the average value of the hybrid combination involving the i-th parent of group 1 and the j-th parent of group 2; $Y_{i0}$ is the average of the i-th parent of group 1; $Y_{0j}$ is the average of the j-th parent of group 2; $\mu$ is the general average of the diallel; $d_1$ and $d_2$ are contrasts involving the means of groups 1 and 2 and the general average of the diallel; $g_i$ is the GCA effect of the i-th parent of group 1; $g'_j$ is the GCA effect of the j-th parent of group 2; $s_{ij}$ is the effect of specific combining ability; and $\bar{\varepsilon}_{ij}$ is the mean experimental error.
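To make the roles of the GCA and SCA terms concrete, the sketch below decomposes a complete table of hybrid means into a constant, two sets of GCA effects, and SCA deviations. It is a simplified illustration and not the procedure implemented in Genes: it assumes a balanced table containing hybrids only, so the constant (the hybrid grand mean) absorbs the parental contrasts, and the function name and yield values are hypothetical.

```python
def combining_abilities(means):
    """Decompose a group I x group II table of hybrid means.

    means[i][j] is the adjusted mean of the cross between the i-th
    parent of group I and the j-th parent of group II.
    Assumes a complete (balanced) table with hybrids only.
    """
    n_rows, n_cols = len(means), len(means[0])
    grand = sum(map(sum, means)) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in means]
    col_means = [sum(row[j] for row in means) / n_rows for j in range(n_cols)]

    # GCA: deviation of each parent's hybrid-array mean from the grand mean.
    gca_1 = [r - grand for r in row_means]
    gca_2 = [c - grand for c in col_means]

    # SCA: what is left of each cross after removing both GCA effects.
    sca = [[means[i][j] - row_means[i] - col_means[j] + grand
            for j in range(n_cols)] for i in range(n_rows)]
    return grand, gca_1, gca_2, sca


# Hypothetical grain-yield means (kg/ha) for a 2 x 3 partial diallel.
gy = [[8000.0, 8500.0, 7800.0],
      [7600.0, 9000.0, 8200.0]]
grand, gca_1, gca_2, sca = combining_abilities(gy)
print("grand mean:", round(grand, 1))
print("GCA group I:", [round(g, 1) for g in gca_1])
print("GCA group II:", [round(g, 1) for g in gca_2])
print("SCA:", [[round(s, 1) for s in row] for row in sca])
```

Under these assumptions, the GCA effects of each group and every row and column of the SCA table sum to zero, and each hybrid mean is recovered as the constant plus its two GCA effects plus its SCA.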
All analyses were performed using the statistical software SAS v9.4 (SAS, 2013; SAS Institute, USA) and Genes (Cruz, 2013), adopting a significance level of 5%.

Genetic divergence using SSR markers
The youngest leaves of five plants were sampled from each inbred line approximately 30 days after germination, immediately frozen in liquid nitrogen, and transferred to -80 °C. The DNA was extracted using the protocol described by Hoisington, Khairallah, and González (1994) with minor changes. DNA quality was evaluated on a 1% agarose gel and quantified using a Picodrop microliter UV/Vis spectrophotometer, and the DNA concentration was adjusted to 10 ng µL⁻¹ for amplification. DNA amplification was performed in a thermal cycler using the Touchdown PCR methodology (Don, Cox, Wainwright, Baker, & Mattick, 1991), and the products were separated on a 4% agarose gel (50% agarose and 50% MetaPhor agarose, CAMBREX) in 0.5X TBE buffer (44.5 mM Tris, 44.5 mM boric acid, and 1 mM EDTA). The gels were exposed to an electric field of 60 V for about 4 hours, stained with a 0.5 µg mL⁻¹ ethidium bromide solution, and photographed under UV light. The amplified alleles were sized using a 100 bp DNA ladder (Invitrogen, Thermo Fisher Scientific Corporation, USA). Each amplified DNA fragment identified in the gel was scored as a distinct allele of its marker, with each marker treated as a single locus. SSR marker profiles of each inbred line were determined by numerical codes related to each allele, where presence or absence was scored as 1 or 0, respectively, according to the multiallelism of each SSR marker (Cruz et al., 2012). Heterozygosity, the number of polymorphic loci, and the total number of alleles were assessed using GenAlEx software v6.5 (Peakall & Smouse, 2012), while the polymorphism information content (PIC) of each primer was evaluated using PowerMarker software (Liu & Muse, 2005).
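As a rough illustration of what the PIC statistic measures, the sketch below applies the formula of Botstein et al. (1980), PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², to the allele frequencies of one locus. It is not the PowerMarker implementation. The function names and allele frequencies are hypothetical, and the label used for the intermediate class is an assumption, since only the 0.5 and 0.25 thresholds are quoted in the text.

```python
def gene_diversity(freqs):
    """Expected heterozygosity at one locus: 1 - sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content following Botstein et al. (1980)."""
    n = len(freqs)
    cross = sum(2.0 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(n) for j in range(i + 1, n))
    return gene_diversity(freqs) - cross


def informativeness(value):
    """Classify a marker using the 0.5 and 0.25 thresholds."""
    if value > 0.5:
        return "highly informative"
    if value < 0.25:
        return "marginally informative"
    return "moderately informative"  # intermediate label is an assumption


# Hypothetical allele frequencies for a four-allele and a two-allele locus.
for freqs in ([0.25, 0.25, 0.25, 0.25], [0.9, 0.1]):
    value = pic(freqs)
    print(freqs, round(value, 3), informativeness(value))
```

With equally frequent alleles, PIC reaches its maximum for a given number of alleles, so a two-allele marker cannot exceed 0.375 and a four-allele marker cannot exceed about 0.70.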
Genetic distance among the genotypes was estimated using the Rogers (1972) distance, with the following model:
$$d_R = \frac{1}{m}\sum_{l=1}^{m}\sqrt{\frac{1}{2}\sum_{u=1}^{n_l}\left(p_{1lu} - p_{2lu}\right)^2}$$
In the model above, $m$ is the number of evaluated loci; $n_l$ is the number of alleles at locus $l$; $p_{1lu}$ is the frequency of allele $u$ at locus $l$ in population $p_1$; and $p_{2lu}$ is the frequency of allele $u$ at locus $l$ in population $p_2$. Dendrogram clustering was performed using UPGMA, with the Mojena (1977) methodology used to define the dendrogram cut. Cophenetic correlations were also estimated using the Genes software (Cruz, 2013).
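The distance and clustering steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Genes implementation: the three lines X, Y, and Z and their allele frequencies are hypothetical, the lines are assumed fully homozygous (frequencies of 0 or 1), and the Mojena cut and the cophenetic correlation are not included.

```python
from math import sqrt


def rogers_distance(freqs_1, freqs_2):
    """Rogers (1972) distance; each argument lists allele frequencies per locus."""
    m = len(freqs_1)
    per_locus = (
        sqrt(0.5 * sum((p1 - p2) ** 2 for p1, p2 in zip(locus_1, locus_2)))
        for locus_1, locus_2 in zip(freqs_1, freqs_2)
    )
    return sum(per_locus) / m


def upgma(names, dist):
    """Average-linkage clustering; returns merges as (left, right, height)."""
    clusters = {i: (name,) for i, name in enumerate(names)}
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    merges, next_id = [], len(names)
    while len(clusters) > 1:
        # Join the two closest clusters.
        (a, b), height = min(d.items(), key=lambda item: item[1])
        left, right = clusters.pop(a), clusters.pop(b)
        merges.append((left, right, height))
        # Keep distances not involving a or b, then add the new cluster,
        # weighting the old distances by cluster size (arithmetic average).
        updated = {pair: v for pair, v in d.items()
                   if a not in pair and b not in pair}
        for k in clusters:
            d_ak = d[(min(a, k), max(a, k))]
            d_bk = d[(min(b, k), max(b, k))]
            updated[(k, next_id)] = (
                (len(left) * d_ak + len(right) * d_bk) / (len(left) + len(right))
            )
        clusters[next_id] = left + right
        d, next_id = updated, next_id + 1
    return merges


# Hypothetical homozygous lines scored at two loci (2 and 3 alleles).
profiles = {
    "X": [[1, 0], [1, 0, 0]],
    "Y": [[1, 0], [0, 1, 0]],
    "Z": [[0, 1], [0, 0, 1]],
}
names = list(profiles)
dist = [[rogers_distance(profiles[a], profiles[b]) for b in names] for a in names]
merges = upgma(names, dist)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

In this toy example, X and Y differ at one of the two loci and are joined first, after which Z joins the pair at the average of its distances to X and Y.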

Results and discussion
Treatment effects were significant (p < 0.05) for almost all evaluated traits (Table 1), with the exception of PH, indicating differences among the genotype means. These responses are a key element for breeding programs and justify partitioning the treatment variance into the groups of interest in the diallel analysis of variance; this partition was not performed for PH, as its treatment effects were not significant. The coefficients of experimental variation ranged from low to moderate for almost all traits, except for the number of lodged and broken plants, when compared to reports of diallel crosses using inbred lines (Durães et al., 2002; Silva et al., 2010; Conrado, Scapim, Bignotto, & Pinheiro, 2014; Werle et al., 2014) and to the reference values proposed for maize by Fritsche-Neto, Vieira, Scapim, Vieira, and Rezende (2012), indicating acceptable experimental precision. The coefficients of experimental variation depend on the evaluated trait, the unit of evaluation, and the genetic structure of the evaluated populations. The summary results of the diallel analysis for FF, MF, EH, EL, ED, NL, NB, MG, and GY are shown in Table 2, which indicates significant differences (p < 0.05) in GCA I, GCA II, and SCA effects for all of these traits. This indicates that the inbred lines differed in their genetic contribution in terms of additive effects, and that the single cross hybrid combinations performed differently from what was expected from the GCA of their parents. In practical terms, it was possible to first select the best parental inbred lines based on GCA and then select hybrids with high SCA among the crosses sharing a parent previously highlighted by its GCA. The GCA estimates are summarized in Table 3. Within group II (P30F53-derived), the inbred lines 1, 4, 7, 8, 9, and 10 had positive ĝi values for grain yield, indicating a likely superiority in the quality of their gametes.
Within group I (SG6015-derived), the inbred lines A, B, F, G, and H also had positive ĝi values for grain yield, likewise indicating a certain superiority of their gametes. Maize breeding programs usually seek hybrids that combine high grain yield and earliness (a short period, in days, from sowing to silking) with low plant and ear height. Therefore, GCA enabled the best parents to be selected based on additive genetic effects to form superior single cross hybrids with higher frequencies of favorable alleles (Cruz et al., 2012). In this sense, selecting inbred lines with the most negative ĝi values for male flowering, female flowering, plant height, and ear height can result in progenies with shorter sowing-to-flowering cycles and low to moderate plant and ear height.
Considering the GCA significance of the EH trait (Table 2), inbred lines D (ĝi = -0.08), F (ĝi = -0.06), and H (ĝi = -0.07) from group I (SG6015-derived) and inbred lines 3 (ĝi = -0.11) and 5 (ĝi = -0.08) from group II (P30F53-derived) were selected for their negative GCA values, since these could result in hybrids with lower ear height. In general terms, it is also desirable to select the most promising genotypes according to their negative ĝi estimates to reduce the values of traits such as EH, FF, and MF in superior hybrids from future crosses. For FF and MF, inbred lines C (ĝi = -0.7 and ĝi = -0.57, respectively) and G (ĝi = -0.46 and ĝi = -0.18, respectively) from group I and inbred line 2 (ĝi = -1.54 and ĝi = -1.75, respectively) from group II had lower values and can therefore be used in future crosses to obtain early progenies. With regard to the number of lodged plants, inbred lines A and D as well as 3, 5, and 6 from groups I and II, respectively, were identified as superior parents given their negative GCA values (ĝi = -0.30, ĝi = -0.30, ĝi = -0.33, ĝi = -0.25, and ĝi = -0.25, respectively), while for the number of broken plants, inbred lines B, D, and F as well as 3, 5, and 8 from groups I and II, respectively, were selected for their negative GCA values (ĝi = -1.67, ĝi = -2.28, ĝi = -1.63, ĝi = -1.77, ĝi = -1.43, and ĝi = -2.09, respectively). In addition, the GCA results for ear traits revealed that inbred lines E (ĝi = 0.007) and F (ĝi = 0.009) as well as 1 (ĝi = 0.012) and 8 (ĝi = 0.010) from groups I and II, respectively, were selected based on higher values of EL, while in terms of ED, inbred lines B (ĝi = 0.019), C (ĝi = 0.022), and I (ĝi = 0.011) from group I were selected for their higher GCA values.
Moreover, in terms of MG, inbred lines B (ĝi = 0.002) and F (ĝi = 0.003) from group I and 10 (ĝi = 0.003) from group II could be used in future crosses given their highly positive estimates. Finally, in terms of GY, inbred lines B (ĝi = 617.5) as well as 1 (ĝi = 705.7) and 8 (ĝi = 864.3) from groups I and II, respectively, had the higher GCA values. Since most of the evaluated traits are quantitatively inherited and influenced by many genes, it was nearly impossible to select a single genotype that performed best in all traits, which highlights the challenges of plant breeding. Even so, inbred lines B (NB, ED, MG, and GY), C (ED, MF, FF, and MG), 1 (EL, ED, and GY), and 8 (NB, EL, and GY) were selected for their favorable GCA values in several traits, a direct reflection of a higher frequency of favorable alleles with additive effects. These lines have great potential for obtaining superior genotypes in maize breeding.
SCA reflects parent specificity within crosses with regard to the complementation effect between alleles derived from each parent (dominance effect) and the interaction effect between alleles of different loci involved in the expression of the trait (epistatic effect). Higher SCA estimates, regardless of sign, indicate that the performance of the cross differed from that expected based on the GCA of the parents (Vencovsky & Barriga, 1992). Furthermore, SCA is also related to the genetic distance between parents and reveals the importance of non-additive interactions in hybrid combinations (Lippman & Zamir, 2007). According to Cruz et al. (2012), the most promising hybrid combinations must be selected based on the SCA estimates that most favor the trait in question. In this sense, the best hybrids would be those with at least one parent selected based on its ĝi estimate, thereby presenting a higher frequency of favorable alleles relative to the average frequency in the parents involved in the crosses (Vencovsky & Barriga, 1992; Cruz et al., 2012). The SCA estimates (ŝij) for grain yield are summarized in Table 4. With regard to male and female flowering, one hybrid (E x 2) had a large negative SCA value and one parent selected based on ĝi estimates, and could be selected for a reduced sowing-flowering cycle. In terms of EL, two crosses (A x 8 and F x 9) were selected with higher SCA values and at least one parent with higher GCA values, while in terms of ED, only one cross (I x 10) had a higher SCA value and one parent selected based on GCA values. However, despite the high correlations already reported between these two traits (El-Shouny, Olfat, Ibrahim, & Al-Ahmad, 2005; Suhaisini, Ravikesavan, & Yuvaraja, 2016), no cross could be selected for both simultaneously. Regarding the MG trait, one hybrid (H x 10) was selected with a higher SCA value and a parent with a higher GCA value.
Finally, in terms of GY, two crosses (B x 1 and B x 3) showed the highest levels of genetic complementation in enhancing grain yield.
The genetic divergence analysis using SSR markers revealed that 75 out of 195 primers were polymorphic in all 19 inbred lines, representing 34.4% of the total, and 32 primers were selected based on their visual allelic amplification on the agarose gel. In addition, the number of alleles per locus ranged from two to five, totaling 93 different alleles. Primers Mcm0181 and Umc 2408 showed the highest number of alleles (five each). The proportion of polymorphic markers was higher than that described by Dandolini et al. (2008), who reported 27.4% polymorphic markers with a number of alleles that also ranged from two to five. Shah et al. (2009) reported an average of only 1.56 alleles per locus when using 10 SSR markers in 17 maize inbred lines. However, higher numbers of alleles per locus are expected when more inbred lines and SSR markers are used: Van Inghelandt, Melchinger, Lebreton, and Stich (2010) reported an average of 14.5 alleles per locus when using 359 SSR markers in 1,537 inbred lines. In addition, Yang et al. (2011) found an average of 8.2 alleles per locus in 154 inbred lines and 82 SSR markers, while Malik, Kumar, and Babu (2020) obtained an average of 4.9 alleles per locus when using 46 SSR markers in 47 genotypes.
According to Legesse, Myburg, Pixley, Twumasi-Afriyie, and Botha (2008), lower genetic distances among genotypes could be a limiting factor when identifying polymorphisms, as they reduce the number of alleles per locus. A possible reason for the lower number of alleles per locus observed in this study is that groups I and II consisted of inbred lines selected from single cross hybrids, which were expected to have narrow genetic bases with fewer alleles per locus. According to Nepolean, Sing, Hossain, Pandey, and Gupta (2013), since maize is an allogamous species, residual heterozygosity could be expected at rates of 5 to 10% even in advanced selfing generations. Pollen and seed contamination, microsatellite-specific mutations, and the amplification of two similar but distinct SSR regions could also explain the presence of heterozygotes in advanced selfing generations (Liu et al., 2003; Labora, Oliveira, Garcia, Paterniani, & Souza, 2005; Legesse et al., 2008). Furthermore, inbred line 2 had two specific alleles, at loci Umc 1137 and Umc 2408, while inbred line H had a specific allele at locus Umc 2410 with a frequency of 1.0. Among all the genotyped inbred lines, the average proportion of polymorphic markers was 18.2%, which is much lower than the values estimated for popcorn inbred lines by Liu et al. (2003) and Dandolini et al. (2008). According to Vigourox et al. (2002) and Hamblin, Warburton, and Buckler (2007), lower allele frequencies are expected in microsatellites, mainly because these genomic regions are highly mutable, with a mutation rate per generation estimated to range from 7.7 × 10⁻⁴ to 1.1 × 10⁻⁷ (Vigourox et al., 2002).
Furthermore, PIC values can be used to differentiate markers by their polymorphism, since PIC is estimated from the number of alleles at a locus and their relative frequencies (Cruz, Ferreira, & Pessoni, 2011). In this study, PIC values ranged from 0.11 (Umc 1169, two alleles) to 0.69 (Bnlg 1297, four alleles), with an average value of 0.45. Similar results were found by Lopes, Scapim, Mangolin, and Machado (2014) in a genetic divergence study of 15 sweet corn inbred lines, in which 15 out of 100 SSR markers were polymorphic and had an average PIC of 0.41. Almeida, Amorim, Neto, Filho, and Sereno (2011) obtained PIC values ranging from 0.26 to 0.76 in populations of field corn and teosinte, while Nikolić et al. (2019), in a genetic divergence study using 24 maize genotypes, found PIC values ranging from 0.57 to 0.89 with an average value of 0.73. In addition, Cruz et al. (2011) highlighted that PIC values reflect how informative a marker is for genetic divergence analysis. According to Botstein, White, Skolnick, and Davis (1980), markers with values greater than 0.5 are regarded as highly informative and those with values less than 0.25 as marginally informative. Moreover, the dissimilarity matrix estimated from Rogers distance showed that the highest genetic distance was obtained between inbred lines A and 9 (0.74), whereas the lowest was between inbred lines C and I (0.248). In addition, the UPGMA dendrogram clustered the 19 inbred lines into two different groups (Figure 1), where group 1 encompassed 11 inbred lines (C, I, G, A, D, B, F, H, 4, 8, and 2), while group 2 included eight inbred lines (1, 3, 5, 6, 7, 9, 10, and E). The estimated cophenetic correlation coefficient (r) was 0.73, which was higher than the values observed by Guimarães et al. (2007) (r = 0.57) and Xia et al. (2004) (r = 0.63).
Ferreira (2008) suggested a value close to 1 for a better adjustment of distances, whereas Patto, Satovic, Pêgo, and Fevereiro (2004) recommended a value higher than 0.56 for maize inbred lines. Considering the UPGMA clustering and the five highest SCA values for GY, four hybrids (80%) had parental inbred lines classified in distinct UPGMA groups. The exception was the hybrid with the highest SCA estimate (H x 4), whose parents clustered in the same group. In addition, the largest genetic divergence observed among the inbred lines (A x 9) did not necessarily result in a higher SCA value. In fact, hybrid A x 9 performed poorly in terms of SCA (672.9853) for grain yield. Sharma and Pankaj (2018) reported a concordance index of 47% when comparing SSR-based clustering with heterosis. Despite reports of good concordance between SCA values and genetic distances estimated using SSR markers, Munhoz et al. (2009) and Fernandes, Schuster, Scapim, Vieira, and Coan (2015) found low or almost no concordance with genetic divergence for grain yield and other complex quantitative traits. In this work, we found a partial convergence between the molecular genetic distances and the genealogy of the diallel parents, both of which are reflected in the arrangement of the parents in the two groups. However, for grain yield, the concordance between the SCA values and the genetic distances estimated by SSR markers was not optimal. For example, the hybrid A x 9 was generated from the parents with the greatest genetic distance, but it was not selected for grain yield. This indicates that both phenotyping and genotyping approaches are important for achieving better selections among hybrids and their parents, so that molecular markers can supplement the diallel analysis made in the field.
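The comparison between UPGMA clustering and hybrid selection amounts to checking, for each hybrid, whether its two parents fall in different clusters. A minimal sketch follows, using the group membership reported in the text. The hybrid list contains only crosses named in the text and is not the top-five SCA set from Table 4, so the resulting share is illustrative and does not reproduce the reported 80%. The function names are hypothetical.

```python
# UPGMA groups as reported in the text (Figure 1).
GROUP_1 = {"C", "I", "G", "A", "D", "B", "F", "H", "4", "8", "2"}
GROUP_2 = {"1", "3", "5", "6", "7", "9", "10", "E"}


def cluster_of(line):
    """Return the UPGMA group (1 or 2) of an inbred line."""
    if line in GROUP_1:
        return 1
    if line in GROUP_2:
        return 2
    raise ValueError(f"unknown inbred line: {line}")


def share_between_groups(hybrids):
    """Fraction of hybrids whose parents belong to different UPGMA groups."""
    between = [h for h in hybrids if cluster_of(h[0]) != cluster_of(h[1])]
    return len(between) / len(hybrids), between


# Crosses named in the text; NOT the full top-five SCA set of Table 4.
named_hybrids = [("H", "4"), ("B", "1"), ("B", "3"), ("A", "9")]
share, between = share_between_groups(named_hybrids)
print(f"{share:.0%} of the listed hybrids combine parents from different groups")
print("between-group hybrids:", between)
```

Applied to the five hybrids with the highest SCA estimates, the same check would give the concordance figure discussed above.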

Conclusion
i) The molecular marker study divided the inbred lines into two groups; ii) the dissimilarity matrix showed that the largest genetic distance was between inbred lines A and 9; iii) the largest genetic distance did not correspond to a hybrid with a favorable SCA estimate, although, according to the UPGMA clustering, 80% of the five hybrids with the highest ŝij values for grain yield had parents from different groups; and iv) both the molecular and diallel analyses were useful, and molecular marker analyses can support diallel crosses in the field.