Genetic diversity and population structure of sugarcane (Saccharum spp.) accessions by means of microsatellite markers

The success of sugarcane (Saccharum spp.) breeding programs depends on the choice of productive parent lines that have a high industrial yield and are genetically divergent. This study assessed the genetic divergence and population structure of sugarcane accessions that are the parents of the RB05 Series of the Sugarcane Breeding Program of Brazil. The DNA of 82 accessions was evaluated using 36 simple sequence repeat markers. The Jaccard similarity coefficient and the Unweighted Pair Group Method with Arithmetic Mean clustering method were used to generate a dendrogram, and the accessions were divided into 17 distinct groups derived from probabilistic models. The similarity coefficient showed that the degree of similarity varied from 0.4716 (RB971551 x RB965586) to 0.9526 (RB936001 x SP89-1115), with a mean of 0.8536. This result demonstrates a high similarity between the 82 accessions and is consistent with Wright's F statistic (0.125), which indicates moderate genetic variability. The least similar pairs suggest that breeders should make more crosses using cultivar RB965586, particularly the RB971551 x RB965586 and RB965586 x RB855511 crosses. The results demonstrate that crosses such as RB936001 x SP89-1115 and RB945954 x RB896342 should be avoided because of their high genetic similarity.


Introduction
Brazil is the world leader in the production of sugar and ethanol derived from sugarcane (Saccharum spp.) (FAO, 2015). The consumption of sugar and ethanol within the Brazilian market rose 22.65 and 73.76%, respectively, over four years (CONAB, 2007; 2011), confirming that sugarcane is among the most economically important crops in Brazil. The projected increases in the domestic and global consumption of ethanol and sugar have encouraged the creation of genetic improvement programs to develop new cultivars with desirable agronomic and industrial characteristics, such as improved yield and resistance to biotic and abiotic stresses, in an attempt to meet the potential demand (James, 2004).
The success of a sugarcane breeding program depends on selecting the richest and most genetically divergent parents. The search for such diversity among parents can be based on geographical origin, agronomic characteristics, pedigree and molecular marker data (Melchinger, 1999). Genetic diversity assessment based on morphological traits is limited and influenced by environmental effects; therefore, current breeding programs require techniques that measure genetic relationships without the influence of environmental factors and phenotypic properties (Singh, Mishra, Singh, Mishra, & Sharma, 2010). The analysis of molecular markers provides an effective measurement of genetic relationships based on genetic characteristics.
The sugarcane breeding programs in Brazil, including the Inter-University Network for the Development of the Sugarcane Sector (RIDESA), are studying the genetic diversity of the accessions in their germplasm banks using molecular techniques. After the divergence of a certain group of accessions is quantified, breeders prefer crosses between highly divergent accessions that have desirable characteristics so that elite progenies can be obtained in the field and the genetic improvement of this crop can be furthered. Additionally, researchers agree that field experiments are still needed to validate germplasm groupings based on molecular marker data and that more information is needed for the sugarcane breeding programs of Brazil.
The aim of this study was to evaluate the genetic diversity and population structure of sugarcane accessions that were the parents of the RB05 Series of the Sugarcane Breeding Program of RIDESA and Universidade Federal do Paraná (UFPR) by using SSR markers.

Germplasm
Of the total number of sugarcane accessions, two originated from the USA, two originated from India and one originated from Central America. The others are Brazilian accessions and are widely used in the Sugarcane Breeding Program (PMGCA), which is part of RIDESA. The accessions used in this study and their countries of origin are shown in Table 1.
Stem segments of each accession were grown in a greenhouse at the Center for Applied Agricultural Research (Núcleo de Pesquisa Aplicada à Agricultura -NUPAGRI, 23°26'15'' S and 51°53'70" W, 535 m) of the State University of Maringá, Maringá, Paraná State, Brazil. These accessions were grown in trays with substrate containing five sets of each cultivar under conditions of controlled temperature, humidity and irrigation until the leaves reached approximately 10 cm in length.

Extraction and quantification of genomic DNA
For each accession, the leaves were harvested approximately 20 days after the sets had sprouted, placed in plastic Ziploc bags and properly identified. The leaves were then stored at -80°C. The genomic DNA was extracted from young and immature leaves free of diseases and any stains.
The DNA derived from leaf tissue was extracted using the protocol described by Aljanabi, Forget, and Dookun (1999) with minor modifications. Initially, 500 mg of the leaves of each cultivar were macerated in a crucible in liquid nitrogen until they turned to powder. After maceration, the material was transferred to 2.0 mL Eppendorf tubes containing 700 µL of cetyltrimethylammonium bromide (CTAB) buffer (composition: 2% (w/v) CTAB, 20 mM ethylenediaminetetraacetic acid (EDTA), 1.4 M NaCl, 100 mM Tris-HCl (pH 8.0) and 2% (v/v) β-mercaptoethanol).
The genomic DNA was quantified using a Qubit™ fluorometer according to the following procedure: 10 µL of each DNA sample was diluted in 190 µL of the working solution, which was prepared from 199 µL of DNA broad-range (BR) buffer and 1 µL of fluorophore (covered with foil). Before the samples were read, the fluorometer was calibrated with two standards, each consisting of 190 µL of the working solution and 10 µL of the standard.
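The dilution described above (10 µL of sample in 190 µL of working solution) corresponds to a 20-fold dilution in the assay tube. The following minimal sketch shows how a reading taken in the tube could be converted back to the concentration of the original extract. The function names, the ng/mL reading unit and the example value are assumptions made for illustration; they are not part of the protocol or data reported here.

```python
def dilution_factor(sample_ul, diluent_ul):
    """Return the fold dilution of a sample mixed with a diluent."""
    if sample_ul <= 0 or diluent_ul < 0:
        raise ValueError("volumes must be positive")
    return (sample_ul + diluent_ul) / sample_ul


def stock_concentration_ng_per_ul(tube_reading_ng_per_ml,
                                  sample_ul=10.0, diluent_ul=190.0):
    """Back-calculate the extract concentration (ng/µL) from a tube reading.

    Assumes the reading refers to the diluted assay tube and is in ng/mL.
    """
    factor = dilution_factor(sample_ul, diluent_ul)
    return tube_reading_ng_per_ml * factor / 1000.0  # ng/mL -> ng/µL


# Hypothetical reading, not a value from this study.
example = stock_concentration_ng_per_ul(500.0)
```

With the volumes used in this protocol the factor is 20, so a hypothetical tube reading of 500 ng/mL would correspond to 10 ng/µL in the extract.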

SSR markers
For this study, thirty-six SSR markers were used (Table 2), and they were selected based on a high degree of polymorphism and number of alleles discriminated in previous studies (Cordeiro, Taylor, & Henry, 2000; Pan, 2006; Oliveira et al., 2009; Parida et al., 2009). The primers were synthesized by Invitrogen, USA.

Estimation of genetic diversity and population structure
The data obtained by LabIMAGE were evaluated with PowerMarker software version 3.25 (Liu & Muse, 2005). The genetic parameters analyzed were the allele frequency per locus, number of alleles per locus, mean number of alleles per locus, mean of the highest allele frequencies and polymorphism information content (PIC), as proposed by Anderson, Churchill, Autrique, Tanksley, and Sorrells (1993), according to the following equation:
$$PIC_i = 1 - \sum_{j=1}^{n} P_{ij}^{2}$$
where $P_{ij}$ is the frequency of the j-th allele for the i-th locus and $n$ is the number of alleles at that locus.
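As an illustration of the PIC calculation, the sketch below assumes the usual form attributed to Anderson et al. (1993), in which the PIC of a locus is one minus the sum of the squared allele frequencies at that locus. The function name and the example frequencies are hypothetical and are not taken from the data of this study.

```python
def pic(allele_freqs):
    """PIC of one locus, assumed here to be 1 - sum(P_ij ** 2).

    allele_freqs: frequencies of all alleles at the locus (must sum to 1).
    """
    if any(p < 0 for p in allele_freqs):
        raise ValueError("allele frequencies cannot be negative")
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Hypothetical loci: two equally frequent alleles versus one dominant allele.
balanced_locus = pic([0.5, 0.5])
skewed_locus = pic([0.9, 0.1])
```

Under this form, a locus dominated by a single allele has a PIC close to zero, whereas a locus with many alleles at similar frequencies approaches one, which is why markers with more evenly distributed alleles are considered more informative.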
The genetic similarity was calculated among all possible pairs of genotypes using the Jaccard coefficient (Jaccard, 1901). Similarity was calculated as the number of bands common to the pair of genotypes divided by the total number of bands scored for that same pair of genotypes. The Jaccard coefficient has the advantage of disregarding the shared absence of bands in the pairwise comparison, thereby reducing the risk of overestimating the similarity (Clifford & Stephenson, 1975). The clustering technique used in this study was the Unweighted Pair Group Method with Arithmetic Mean (UPGMA). A dendrogram was constructed using FREETREE (Hampl, Pavlícek, & Flegr, 2001) with 5,000 bootstrap iterations, and it was subsequently viewed in the TreeView software (Page, 1996).
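The Jaccard calculation described above can be sketched for presence/absence (1/0) band profiles as follows. The accession labels and band profiles are hypothetical and serve only to show that bands absent from both genotypes do not contribute to the coefficient; this is not the FREETREE implementation.

```python
from itertools import combinations


def jaccard(profile_a, profile_b):
    """Jaccard similarity between two 0/1 band profiles of equal length."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must score the same bands")
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 and b == 1)
    present = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 or b == 1)
    # Bands absent from both genotypes are ignored in numerator and denominator.
    return shared / present if present else 0.0


# Hypothetical band profiles (1 = band present, 0 = band absent).
profiles = {
    "acc_A": [1, 1, 0, 1, 0, 0],
    "acc_B": [1, 0, 0, 1, 1, 0],
    "acc_C": [0, 1, 0, 0, 0, 0],
}

pairwise = {
    (m, n): jaccard(profiles[m], profiles[n])
    for m, n in combinations(profiles, 2)
}

# Pair with the lowest similarity, i.e. the most divergent candidate cross.
least_similar = min(pairwise, key=pairwise.get)
```

Sorting such a pairwise table in ascending order gives a simple ranking of candidate crosses from most to least divergent.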
The cophenetic correlation coefficient (CCC; Rohlf & Fisher, 1968) and the stress and distortion between the similarity and cophenetic matrices (EST and DIS, respectively; Kruskal, 1964) were also obtained using the NTSYSpc software version 2.1 (Rohlf, 1998).
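Because the CCC is a Pearson correlation between the pairwise values of the original similarity matrix and those implied by the dendrogram, it can be sketched as below. The three-accession example is hypothetical: the cophenetic values assume that accessions A and B join first and that C joins them at the average of its similarities to A and B, as UPGMA would do. This is an illustration, not the NTSYSpc routine.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical upper-triangle values for the pairs (A,B), (A,C), (B,C).
original_similarity = [0.90, 0.60, 0.50]
# Cophenetic values: A and B join at 0.90; C joins at (0.60 + 0.50) / 2 = 0.55.
cophenetic_similarity = [0.90, 0.55, 0.55]

ccc = pearson(original_similarity, cophenetic_similarity)
```

A value close to 1 indicates that the dendrogram distorts the original pairwise similarities very little.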
Population structure was analyzed with the Bayesian clustering method, based on the Markov Chain Monte Carlo (MCMC) algorithm, implemented in the Structure software (Pritchard, Stephens, & Donnelly, 2000). To infer the number of groups, ten independent runs of MCMC sampling were performed for each number of groups (K parameter) from 3 to 27, using no prior population information, the admixture model and non-correlated allele frequencies between populations. The MCMC was implemented with a burn-in of 10,000 iterations and a run length of 100,000 iterations. The number of K groups was then estimated using both the mean likelihood L(K) over the 10 runs for each K and the ΔK criterion proposed by Evanno, Regnaut, and Goudet (2005). This analysis was conducted in the Structure Harvester software (Earl & VonHoldt, 2012).
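The ΔK criterion can be illustrated with a short sketch. It assumes one common reading of Evanno et al. (2005), in which ΔK is the absolute second-order difference of the mean L(K) across runs divided by the standard deviation of L(K) over the runs at that K. The function name and the log-likelihood values are hypothetical; they are not output from Structure or Structure Harvester.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps each K to the list of L(K) values from independent runs.
    """
    means = {k: mean(runs) for k, runs in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # the second-order difference is undefined at the edges
        spread = stdev(lnp_by_k[k])
        if spread == 0:
            continue  # avoid division by zero when all runs are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / spread
    return result


# Hypothetical L(K) values from three runs per K (not data from this study).
runs = {
    3: [-1000.0, -1010.0, -990.0],
    4: [-900.0, -902.0, -898.0],
    5: [-890.0, -880.0, -900.0],
    6: [-885.0, -895.0, -875.0],
}

dk = delta_k(runs)
best_k = max(dk, key=dk.get)
```

In this toy example the largest ΔK occurs where the mean likelihood stops improving sharply and the runs agree closely, which is the behaviour the criterion is designed to detect. The smallest and largest K tested cannot receive a ΔK value because they lack one neighbour.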
According to the criteria proposed by Evanno et al. (2005), the 82 accessions of sugarcane were divided into seventeen different groups (clusters) (Figure 1). The accessions that compose each group, the number of accessions in each group and the relative quantity of accessions in the groups are shown in Table 1. The AMOVA showed that most of the molecular variation was found within the groups rather than between the groups. Wright's F value (0.125) suggests the existence of moderate genetic variability (Wright, 1951), because values of genetic variability among populations or their progeny of 0.0 to 0.05, 0.05 to 0.15, 0.15 to 0.25 and over 0.25 indicate low, moderate, high and very high genetic divergence, respectively (Table 2).
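The interpretation ranges for Wright's F given above can be written as a small lookup. The text does not state to which class a value falling exactly on a boundary (0.05, 0.15 or 0.25) belongs, so assigning boundary values to the lower class is an assumption of this sketch, as is the function name.

```python
def classify_wright_f(f_value):
    """Map Wright's F to the divergence classes quoted in the text.

    Boundary values are assigned to the lower class (an assumption).
    """
    if not 0.0 <= f_value <= 1.0:
        raise ValueError("F is expected to lie between 0 and 1")
    if f_value <= 0.05:
        return "low"
    if f_value <= 0.15:
        return "moderate"
    if f_value <= 0.25:
        return "high"
    return "very high"


# The value reported in this study falls in the moderate class.
reported_class = classify_wright_f(0.125)
```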

Results and discussion
Molecular studies in sugarcane are relatively limited because of the complex genetic structure and long life cycle of the crop (Singh et al., 2008). In addition, as sugarcane is an interspecific hybrid with a high degree of polyploidy in its genome, the detection of homozygous and heterozygous accessions is difficult because the markers can detect multiple alleles at a single locus. However, considerable advances have been made by using microsatellite markers, which have become an improved resource for managing sugarcane germplasm, trait mapping and marker-assisted breeding strategies (Singh et al., 2010). In the present study, all thirty-six SSR markers were polymorphic and yielded 319 alleles that were used in the statistical analyses. The number of alleles per locus varied from 2 (mSSCIR44, UGSM29, SEGMS47, and SEGMS1069 loci) to 19 (ESTB130 locus), with a mean of 8.86 among the markers, and the means of the highest allele frequencies ranged from 0.1220 (ESTB60 locus) to 0.9878 (UGSM29 locus) (Table 3). These results are similar to findings reported for other RB (República Brasil) sugarcane varieties (Oliveira et al., 2009; Creste et al., 2010). Specifically, Creste et al. (2010), who used 10 SSR primers, obtained between 5 and 15 loci, with a mean of 10.30 per primer. Chen et al. (2010), using 20 SSR primers, reported a total of 251 alleles with a range of 4 to 17 loci. These results suggest the great potential of microsatellite loci for investigating genetic divergence in sugarcane germplasm. Consequently, using a higher number of SSR primers, other sugarcane breeding programs have reported a correspondingly higher number of alleles at SSR loci (Singh et al., 2008; Singh, Singh, Singh, & Sharma, 2011; Oliveira et al., 2009).
The PIC values for all SSR markers indicated that most of the markers used had high discriminatory power and are useful for genetic diversity studies (varying from UGSM59 = 0.15 to ESTB60 = 0.93, with a mean of 0.57) (Table 3). These results are considered normal according to Pinto, Oliveira, Ulian, Garcia, and Souza (2004), who found a range of 0.28 to 0.90 with a mean of 0.66 by using 30 SSR markers in 18 cultivars. Singh et al. (2008) evaluated the UGSM29 locus in sugarcane and found a PIC of 0.80 with 13 described alleles. The study conducted by Oliveira et al. (2009) also reported a high PIC value (0.92) and a high number of alleles (19 in total) in sugarcane when the ESTB60 locus was used. Duarte Filho et al. (2010) found a range of 0.34 to 0.78 with a mean of 0.57 per locus by using 18 SSR primers, whereas Singh et al. (2008), using 168 SSR primers, reported a PIC ranging from 0.25 to 0.84 with a mean value of 0.55 per locus. This large PIC variability, with both high and low values, occurs because some accessions have divergent geographical distributions, which has been reported by other authors (Cordeiro et al., 2000; Pan, 2006; Singh et al., 2008). When comparing these studies, the following differences should be considered: the number of primers and accessions (which were higher and lower, respectively), the markers used to detect the alleles and the objectives. The PIC values for SSR markers are not constant; they merely serve as a reference for the relative ability of a marker to detect genetic variability (Singh et al., 2008). In fact, molecular markers have provided a more reliable differentiation of genotypes than phenotypic data and allow cultivars to be assigned to distinct clusters of genotypes based on the discriminatory power of the markers and genetic distance.
This is valid for the present study because, while more than 90% of the cultivars come from the same country of origin, the cultivars used in a particular region of the country may not be the same as those used in another region.
The CCC measures the degree of fit between the original similarity matrix (S) and the matrix obtained after using the chosen clustering technique (C), that is, the matrix that produces the dendrogram. Because this coefficient is equivalent to the Pearson correlation, its value (0.9524) and the DIS value of only 1.42% indicate that there is a close fit between the two matrices (S and C). The EST represents the sum of squares of the standardized residuals produced by the chosen clustering technique, and it is used to estimate the accuracy of the fit of the graphical projection. For the projection of the similarity matrix onto the dendrogram in this study, the EST was found to be 11.9021%. According to Kruskal's classification (Kruskal, 1964), this result indicates good precision in the fit of the graph. Seventeen groups were observed with bootstrap values above 50% (threshold line), which demonstrates a high differentiation among the resulting clusters (Figure 1); the unrooted cladogram showed a distribution of accessions consistent with the dendrogram (Figure 2). Breeders can use this threshold to select the crosses that will compose the future RB Series. The bootstrap values in the UPGMA analysis were consistent with the cluster separation, and higher values indicate a greater likelihood that the accessions are at that similarity distance (Davison & Hinkley, 1997).
Therefore, the distance between the RB945954 and RB896342 accessions has an 88% chance of being the distance represented by the cladogram (Figure 2). Some bootstrap values indicate a low probability for the distance between accessions. An example is the distance between the RB945954 and L60-14 accessions, which has an 11% probability of being the distance shown by the dendrogram. This result can be explained by the restricted number of primers used to characterize the distance between these two accessions or by the lack of a certain nucleotide sequence in the genome of these two accessions, which did not allow the 36 primers to anneal (Koskinen, Hirvonen, Landry, & Primmer, 2004). A low probability can also be found between groups, as shown by the accessions that make up Group 9 (RB92508 and SP70-1143) and Group 6 (represented by the RB945956 accession); there is a 33% likelihood that the distances are those represented by the dendrogram. This information is important because understanding and classifying different individuals into homogeneous groups based on their genetic relationships may help breeders to select parents and increase the efficiency of planning crosses for breeding programs.
Acta Scientiarum. Agronomy, v. 42, e45088, 2020
Compared to other crops, it is extremely difficult to make high-quality sugarcane crosses because sugarcane has irregular meiosis and unusual inheritance of quantitative traits and is difficult to hybridize because of the tiny size of its flowers (Singh et al., 2008). The study of these 36 SSR markers showed results that were consistent with those observed in the field by breeders; for example, breeders expected that certain crosses belonged to the same group based on a visual description of phenotypic traits. A further example is shown by the accessions RB966920 and RB855156, which are in the same group (Group 7).
RB966920 is the daughter of RB855156 (Figure 3A); therefore, it was expected that they would remain in the same group. Another example is the case of RB915124 and RB896342, which are also within Group 7; these two cultivars have the same mother, cultivar TUC71-7 (Figure 3B). Breeders also expected to find cultivars RB855322 (Figure 3C) and RB855127 (Figure 3D) in the same group because both originated from the same cross (TUC71-7 x RB72454); in this case, cultivar TUC71-7 reappeared as the female parent of two additional cultivars within the same group.
The analysis of the SSR data using the Jaccard similarity coefficient showed that the degree of similarity ranged from 0.4716 (RB965586 x RB971551) to 0.9526 (RB936001 x SP89-1115), with a mean of 0.8536 among the 82 accessions of this sugarcane population (Table 4). A low degree of similarity was observed for RB855511 x RB965586 (0.4890), RB863129 x RB965586 (0.4893) and RB915124 x RB965586 (0.4982) (Table 4). These crosses could be prioritized by breeders because they tend to generate descendants with little similarity; thus, they could most likely contribute to the next stages of the breeding program. The opposite result was observed with the crosses RB936001 x SP89-1115, RB945954 x RB896342 and RB896342 x RB956911 (Table 4), which showed high similarity (0.9526, 0.9505, and 0.9505, respectively). These crosses should be avoided by breeders because they would tend to produce progeny with low variability that may not contribute to significant gains, depending on the desired characteristics. The cultivar RB965586 appeared in all of the crosses suggested as the least similar (Table 4).
Other crosses that had a low similarity are consistent with the opinions of RIDESA breeders (Table 4), such as the cross that uses cultivar RB855511, which has rapid vegetative development and no pilosity, is resistant to sugarcane smut (Sporisorium scitamineum) and leaf scald (Xanthomonas albilineans Dowson) and is tolerant to sugarcane rust (Puccinia melanocephala H. & P. Syd) and sugarcane mosaic virus (SCMV).
Cultivar RB863129 is also highlighted in the next crosses because it presents high agricultural productivity, rare tipping, early maturation and a long period of industrial use (RIDESA, 2010).

Conclusion
The use of thirty-six SSR markers showed that among the 82 sugarcane accessions there was a high degree of genetic similarity, with a mean of 0.8536. The population of the 82 accessions that composed the RB05 Series of PMGCA/RIDESA/UFPR was divided into 17 distinct groups, and this population was characterized by a moderate degree of genetic variability (highly significant Wright's F value, 0.125**). The results showed that the RB971551 x RB965586, RB965586 x RB855511, RB965586 x RB863129, RB965586 x SP83-2847, RB965586 x RB915124, RB965586 x RB931604, RB965586 x RB945956, RB72454 x RB965586, RB965586 x SP85-3877 and RB965586 x SP70-1143 crosses should be exploited by sugarcane breeders to compose future series of PMGCA/RIDESA/UFPR.