Semivariogram models for rice stem bug population densities estimated by ordinary kriging

Tibraca limbativentris is considered one of the main insect pests of irrigated rice. This species can be found on plants in the vegetative and reproductive stages. This study aimed to select semivariogram models to estimate rice stem bug population densities by ordinary kriging. Two fields were used to survey the T. limbativentris population in Oryza sativa. A grid of 30 × 30 m was drawn, which generated 143 and 385 sampling units for the first and second fields, respectively. Seven evaluations of two hundred plants per sampling unit were performed during cultivation. The insect counts were fitted with circular, spherical, pentaspherical, exponential, Gaussian, rational quadratic, cardinal sine, K-Bessel, J-Bessel, and stable semivariogram models, interpolated by ordinary kriging, and the best model was selected by cross-validation. Each assessment had a particular spatial structure and a semivariogram model that best fit the experimental data.


Introduction
Tibraca limbativentris (Hemiptera: Pentatomidae) is considered one of the main insect pests of irrigated rice (Pazini, Botta, & Silva, 2012; Rampelotti et al., 2008; Silva, Lima, & Oliveira, 2010; Sociedade Sul-Brasileira de Arroz Irrigado [SOSBAI], 2014). This species can be found in the vegetative and reproductive stages of the crop and directly affects yield components. It causes dead heart symptoms when it attacks the stems, as well as white panicles or partial spikelet sterility, which is the component that contributes most to the reduction in rice grain yield (Costa & Link, 1992; Souza et al., 2009).
According to Lasmar, Zanetti, Santos, and Fernandes (2012), the spatial distribution of insects can vary with time, which interferes with pest management measures. Insect populations in growing areas can be estimated by interpolation procedures, which generate continuous surfaces from point sampling units (Webster & Oliver, 2007). Ordinary kriging is one such interpolation method, as reported by Bottega, Queiroz, Pinto, and Souza (2013) and Silva et al. (2010).
Kriging uses the spatial dependence between neighboring samples, expressed in a semivariogram, to estimate values at any position within the experimental area without bias and with minimum variance (Webster & Oliver, 2007; Coelho, Souza, Uribe-Opazo, & Pinheiro Neto, 2009; Silva et al., 2010; Souza, Lima, Xavier, & Rocha, 2010; Dinardo-Miranda, Fracasso, & Perecin, 2011). The semivariogram is the central tool of geostatistics: it describes the spatial dependence structure both qualitatively and quantitatively, and it is the key element in determining the interpolator. According to Webster and Oliver (2007), selecting a model that adequately represents the semivariances is highly desirable in the kriging process and influences the prediction of unknown values.
According to Gundogdu and Guney (2007) and Pasini, Lúcio, and Cargnelutti Filho (2014), every dataset has a different spatial structure; therefore, it is necessary to identify the semivariogram model that best fits the data, so that the results are reliable and the estimation errors are reduced.
For this purpose, Gundogdu and Guney (2007) studied groundwater levels, tested circular, spherical, pentaspherical, exponential, Gaussian, rational quadratic, cardinal sine, K-Bessel, J-Bessel, and stable models, and obtained the best fit with the rational quadratic model. Farias et al. (2008) studied the spatial distribution of Spodoptera frugiperda, and the best fit was obtained with the spherical model. Lasmar, Zanetti, Santos, and Fernandes (2012) determined the spatial distribution of leafcutter ants in eucalyptus plantations using spherical, exponential, and Gaussian semivariograms, achieving the best fit with the exponential model. Using geostatistics to describe the distribution of T. limbativentris, Pazini et al. (2015) achieved the best results with Gaussian and exponential models.
Thus, the study aimed to select semivariogram models to estimate the population density of the rice stem bug by ordinary kriging.

Material and methods
The study was carried out in Santa Maria, Rio Grande do Sul State, Brazil (UTM, E 785672 m, N 6720053 m, 21 J), in an area subdivided into two fields of 4.91 and 14.1 ha. According to the Köppen climate classification, the local climate is Cfa: humid subtropical, with hot summers and no dry season (Heldwein, Buriol, & Streck, 2009). Pesticides were applied during the research period.
For each field, a grid of 30 × 30 m was defined, which led to 143 sampling units for Field 1 (F1) and 385 sampling units for Field 2 (F2). Each sampling unit covered 1 m², which contained 200 rice plants. T. limbativentris individuals were counted directly on each plant, and the number of insects per m² (200 plants) was used for data analysis.
Seven assessments were carried out in both fields during cultivation. The first assessment (E1) was made at the V3 growth stage, the second (E2) at V6, the third (E3) at V9, the fourth (E4) at R0, the fifth (E5) at R4, the sixth (E6) at R6, and the seventh (E7) at R9 (Counce, Keisling, & Mitchell, 2000).
The data were then plotted as box plots and analyzed geostatistically to verify the existence of spatial dependence and, if present, to quantify its degree for the attribute under study. The analysis was based on fitting theoretical models to the isotropic experimental semivariograms estimated by the expression $\hat{\gamma}(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} [Z(x_i) - Z(x_i + h)]^2$, where $\hat{\gamma}(h)$ is the semivariance, $N(h)$ is the number of measured pairs, and $Z(x_i)$ and $Z(x_i + h)$ are the values separated by a vector $h$ (Webster & Oliver, 2007). Eleven theoretical semivariogram models were fitted: circular, spherical, tetraspherical, pentaspherical, exponential, Gaussian, rational quadratic, hole effect, K-Bessel, J-Bessel, and stable; the models were estimated according to the methodology proposed by Johnston, Ver Hoef, Krivoruchko, and Lucas (2001) and used by Pasini et al. (2014). Once the presence of spatial dependence was confirmed, inferences were made by ordinary kriging (OK), following the method of Johnston et al. (2001), and values at locations that were not measured were estimated.
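The experimental semivariogram described above can be sketched in a few lines of NumPy. This is only an illustration of the classical estimator, not the software routine used in the study; the function name `empirical_semivariogram` and the parameters `lag_width` and `n_lags` are hypothetical, and the binning rule (lag class k covers distances from k × lag_width up to, but not including, (k + 1) × lag_width) is an assumption.

```python
import numpy as np

def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Isotropic experimental semivariogram (classical estimator).

    For each lag class: gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)),
    where N(h) is the number of pairs falling in that class.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)

    # All unordered pairs (i < j) of sampling units.
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2

    # Assumed binning: class k holds distances in [k*w, (k+1)*w).
    bin_idx = np.floor(dist / lag_width).astype(int)

    lags, gammas, counts = [], [], []
    for k in range(n_lags):
        mask = bin_idx == k
        n_pairs = int(mask.sum())
        if n_pairs == 0:
            continue  # skip empty lag classes
        lags.append(dist[mask].mean())                        # mean pair distance
        gammas.append(sq_diff[mask].sum() / (2.0 * n_pairs))  # semivariance
        counts.append(n_pairs)                                # N(h)
    return np.array(lags), np.array(gammas), np.array(counts)
```

With the 30 × 30 m grid, a call such as `empirical_semivariogram(coords, counts_per_m2, lag_width=30.0, n_lags=10)` would return one semivariance per distance class, to which the theoretical models can then be fitted.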
To verify the existence of spatial dependence, the spatial dependence index (SDI) was applied, which is a ratio representing the percentage of data variability explained by spatial dependence. The SDI is estimated with the expression $SDI = \frac{C_1}{C_0 + C_1} \times 100$, where $C_0$ is the nugget effect and $C_1$ is the structured variance (partial sill) of the fitted semivariogram. The spatial dependence can be classified as strong (SDI > 75%), medium (25% < SDI ≤ 75%), or low (SDI ≤ 25%).
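The classification above is easy to express in code. The sketch below assumes that the SDI is the structured variance (partial sill) divided by the sill (nugget plus partial sill), which is one common definition consistent with "the percentage of data variability explained by spatial dependence"; the function names are hypothetical.

```python
def spatial_dependence_index(nugget, partial_sill):
    """SDI (%) under the assumed definition: partial sill / (nugget + partial sill)."""
    sill = nugget + partial_sill
    if sill <= 0:
        raise ValueError("nugget + partial_sill must be positive")
    return 100.0 * partial_sill / sill

def classify_sdi(sdi):
    """Classes used in the text: strong > 75%, medium in (25%, 75%], low <= 25%."""
    if sdi > 75.0:
        return "strong"
    if sdi > 25.0:
        return "medium"
    return "low"
```

For example, a fitted model with a nugget of 0.5 and a partial sill of 9.5 gives an SDI of 95%, which falls in the strong class.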
For the selection of the semivariogram model, the cross-validation technique of Pasini, Lúcio, and Cargnelutti Filho (2014) was used. According to Webster and Oliver (2007), cross-validation allows the effect of different interpolators on the estimated values to be compared, and the model with the most accurate predictions is chosen.
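A minimal sketch of leave-one-out cross-validation with ordinary kriging is given below to make the procedure concrete. It is not the implementation used in the study: it assumes a global neighborhood (all remaining points are used for each prediction), an exponential model with a practical range, and hypothetical function names and parameter values.

```python
import numpy as np

def exponential_model(h, nugget, partial_sill, practical_range):
    """Exponential semivariogram; gamma(0) is 0 by definition."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + partial_sill * (1.0 - np.exp(-3.0 * h / practical_range))
    return np.where(h > 0, gamma, 0.0)

def ordinary_kriging(coords, values, target, model):
    """Return the OK prediction and kriging variance at one target location."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)

    # Kriging system in semivariogram form; the extra row and column hold
    # the Lagrange multiplier that forces the weights to sum to 1.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = model(d)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = model(np.linalg.norm(coords - np.asarray(target, dtype=float), axis=1))

    sol = np.linalg.solve(A, b)
    weights, mu = sol[:n], sol[n]
    prediction = weights @ values
    variance = weights @ b[:n] + mu
    return prediction, variance

def leave_one_out(coords, values, model):
    """Predict each sampling unit from all the others."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    preds, variances = np.empty(n), np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # drop the unit being predicted
        preds[i], variances[i] = ordinary_kriging(
            coords[keep], values[keep], coords[i], model
        )
    return preds, variances
```

The predictions returned by `leave_one_out` can be compared with the sampled values, and the square roots of the kriging variances give the standard errors needed for standardized indicators.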
The cross-validation indicators were based on the methodology of Pasini et al. (2014). As a first indicator, linear regression was used, in which the estimated values (dependent variable) were regressed on the sampled values (independent variable). From the regression, the intercept "a", the slope "b", and the coefficient of determination "R²" were obtained. The best fit between sampled and estimated values is obtained when the estimate of "a" approaches zero and "b" and "R²" approach 1.
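As an illustration of this first indicator, the sketch below regresses estimated values on sampled values with an ordinary least-squares line. The function name is hypothetical, and the code assumes that the estimated values are not all identical (otherwise R² is undefined).

```python
import numpy as np

def regression_indicators(sampled, estimated):
    """Intercept a, slope b and R^2 of estimated = a + b * sampled."""
    sampled = np.asarray(sampled, dtype=float)
    estimated = np.asarray(estimated, dtype=float)

    # np.polyfit returns coefficients from highest degree down: slope, intercept.
    b, a = np.polyfit(sampled, estimated, deg=1)

    fitted = a + b * sampled
    ss_res = np.sum((estimated - fitted) ** 2)
    ss_tot = np.sum((estimated - estimated.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return a, b, r2
```

A model whose cross-validation yields a near 0 and b and R² near 1 reproduces the sampled values closely.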
The following metrics were also used as indicators: the mean prediction error (Ē), the standard deviation of the prediction errors (SD), the coefficient of variation (VC), and the mean absolute error (ĒĀ). The closer these values are to zero, the better the model. In addition, the root-mean-square prediction error (RMS) and the root-mean-square standardized prediction error (RMSS) were calculated. The best-fitting model is indicated when Ē, SD, VC, ĒĀ, and RMS are close to zero and RMSS is close to 1.
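These error indicators can be computed directly from the cross-validation output, as in the hedged sketch below. It assumes that the prediction error is the estimated value minus the sampled value and that RMSS standardizes each error by its kriging standard error. The coefficient of variation is left out because its exact definition is not given in the text. The function name and dictionary keys are hypothetical.

```python
import numpy as np

def prediction_error_indicators(observed, predicted, kriging_std):
    """Cross-validation error indicators (VC omitted: definition not stated)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    kriging_std = np.asarray(kriging_std, dtype=float)

    errors = predicted - observed  # assumed sign convention
    return {
        "mean_error": errors.mean(),                  # ideally close to 0
        "sd_error": errors.std(ddof=1),               # sample standard deviation
        "mean_abs_error": np.abs(errors).mean(),      # ideally close to 0
        "rms": np.sqrt(np.mean(errors ** 2)),         # ideally close to 0
        "rmss": np.sqrt(np.mean((errors / kriging_std) ** 2)),  # ideally close to 1
    }
```

An RMSS above 1 suggests that the kriging standard errors understate the actual prediction variability, and an RMSS below 1 suggests that they overstate it.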
From the estimated indicators, cross-validation grades ranging from 1 to 10 were assigned according to the criterion of each indicator: for b, R², and RMSS, the value closest to 1 was assigned a grade of 10, and the value farthest from 1 was assigned a grade of 1. For Ē, SD, VC, ĒĀ, and RMS, the value closest or equal to zero was assigned a grade of 10, and the value farthest from zero was assigned a grade of 1. After grading, the grades were summed for each model, and the model with the highest sum of grades was chosen.
Acta Scientiarum. Agronomy, v. 43, e48310, 2021
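The grading step can be sketched as follows. The text fixes only the two end points (grade 10 for the value closest to the target, grade 1 for the farthest), so the linear scaling of intermediate grades used here is an assumption, as are the function names and the indicator keys.

```python
import numpy as np

# Target value of each graded indicator, as stated in the text.
TARGETS = {"b": 1.0, "R2": 1.0, "RMSS": 1.0,
           "ME": 0.0, "SD": 0.0, "VC": 0.0, "MAE": 0.0, "RMS": 0.0}

def grade_indicator(values, target):
    """Grade 10 for the value closest to target, 1 for the farthest.

    Intermediate grades are scaled linearly with the distance to the
    target (an assumption; the text does not specify this step).
    """
    dist = np.abs(np.asarray(values, dtype=float) - target)
    spread = dist.max() - dist.min()
    if spread == 0:
        return np.full(dist.shape, 10.0)  # all models tie on this indicator
    return 10.0 - 9.0 * (dist - dist.min()) / spread

def select_model(indicators):
    """indicators: {model_name: {indicator_name: value}} -> (best, totals)."""
    names = list(indicators)
    totals = np.zeros(len(names))
    for key, target in TARGETS.items():
        totals += grade_indicator([indicators[m][key] for m in names], target)
    best = names[int(np.argmax(totals))]
    return best, dict(zip(names, totals))
```

Summing over several indicators means that a model can be selected without being the best on every single one, which is why a larger set of indicators gives a more balanced decision.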

Results and discussion
A total of 13,806 T. limbativentris adults were recorded during monitoring, corresponding to an average of 3.61 adults m⁻² field⁻¹ evaluation⁻¹. The highest number of adults was recorded in F2 (10,454), where the average number of adults per sample was 3.87 adults m⁻² evaluation⁻¹, which was greater than the average found in F1 (3.35 adults m⁻² evaluation⁻¹). The observed data distribution is presented in a box plot (Figure 1). This behavior of the data distribution is linked to the spread of T. limbativentris in tilled fields and its concentration in areas near the borders of crops, mainly in larger fields. According to Yamamoto and Landim (2013), when the distribution is positively skewed, it is necessary to transform the data during processing to avoid the influence of a few high outliers on the estimates in areas characterized by low values. However, for normally distributed or symmetrical data, there is no need for data transformation. Given the above consideration, data transformation was needed.
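The effect of transforming positively skewed counts can be illustrated with a short sketch. The text does not state which transformation was applied, so the log(x + 1) transformation used here is only an assumption chosen because it handles zero counts; the function name and the example counts are hypothetical.

```python
import numpy as np

def sample_skewness(x):
    """Moment-based skewness: third central moment / (second central moment)^1.5."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)
    m3 = np.mean(centered ** 3)
    return m3 / m2 ** 1.5

# Hypothetical counts per sampling unit: mostly low, with one high outlier.
counts = np.array([0, 0, 1, 1, 2, 2, 3, 4, 6, 25])

skew_raw = sample_skewness(counts)
skew_log = sample_skewness(np.log1p(counts))  # assumed log(x + 1) transformation
```

For these example counts the skewness is positive before the transformation and smaller after it, which reduces the weight of the single high value in the semivariogram. Estimates made on transformed data must be back-transformed before they are interpreted as densities.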
Tables 1 and 2 present the cross-validation estimates from OK and their significance. Based on the selection criterion, 42 semivariogram models were selected. Most of the models with the highest sum of grades did not achieve the highest grade for every indicator, revealing discrepancies among the estimated values and underscoring the importance of using a larger number of indicators for decision making (Pasini et al., 2014). This makes possible a better fit of the theoretical models to the experimental semivariogram, a better representation of spatial variability, and estimates with smaller errors (Webster & Oliver, 2007).
The range is an important parameter for semivariogram interpretation, indicating the distance for which the sample points are correlated. The values obtained were 36.5 m and 487.3 m, indicating that the sampling grid used was adequate and sufficient to express the spatial variability of T. limbativentris.
According to Webster and Oliver (2007), the range is the maximum distance of spatial autocorrelation: points located within an area whose radius is the range are more similar to each other than points separated by greater distances. According to Yamamoto and Landim (2013), the range can influence the quality of the estimates, since it determines the number of values used in the interpolation; estimates obtained by ordinary kriging with larger ranges therefore tend to be more reliable.

Conclusion
Each dataset has a different spatial structure, and it is necessary to define the semivariogram model with the best fit to the experimental data.
Testing many semivariogram models is recommended, because a model that adequately represents the semivariances is highly desirable in the kriging process: it influences the prediction of unknown values and significantly reduces estimation errors.