A new local stochastic method for predicting data with spatial heterogeneity

Spatial data (e.g., phytopathogenic data) do not always meet assumptions such as stationarity, isotropy and Gaussian distribution, thereby requiring complex spatial methods and models. Some deterministic assumption-free methods such as the inverse distance weighting can also be applied to predict spatial data, but their output is limited for graphical solutions (mapping). We adapted a computer-based prediction method called Circular Variable Radius Moving Window (CVRMW) that is based on two others: moving window kriging (MWK) and inverse squared-distance weighting (ISDW). The algorithm is developed to meet an objective function that minimizes the index of variation of the spatial observations inside the moving window. A code in R language is presented and thoroughly described. The outputs include the range of the spatial dependence as the radius calculated at every target location and the standard error of the predicted values, mapped to provide a useful tool for spatial exploratory analysis. The method does not make any assumptions about the spatial process, and it is an alternative for dealing with spatial heterogeneity.


Introduction
The knowledge about the within-field patterns of the spatial distributions of crop diseases, pests and their natural enemies is critical to planning control tactics, developing efficient sampling plans and predicting damage (Martins et al., 2018). The spatial distributions of many species of plant pathogens still remain understudied. Moreover, the spatial variation of the soil properties within fields offers various conditions for the diverse development of crops and many soil-borne pests (Patzold et al., 2008;Hbirkou et al., 2011). For instance, heterogeneous variables such as the occurrence of cyst nematodes require additional samples at shorter distances to ensure that the spatial variation is adequately addressed (Hbirkou et al., 2011). According to Freitas et al. (2017), soil chemical variables can also affect the spatial distribution of nematodes.
Studies of the spatial distributions of agricultural pests have been carried out using aggregation indices (Holguin, Mueller, Khalilian, & Agudelo, 2015) such as the Morisita index, the variance/average ratio, Taylor's Power law (Taylor et al., 2017) and others. However, these statistics are nonspatial, that is, they do not consider the spatial dependence of the data. Geostatistical methods based on variogram analyses have also been widely recommended and applied (Pimentel, Lopes, Mexia, & Mumford, 2017;Pulakkatu-Thodi, Reisig, Greene, Reay-Jones, & Toews, 2014;Rijal, Brewster, & Bergh, 2014;Masetti, Butturini, Lanzoni, De luigi, & Burgio, 2015;Martins et al., 2017;Yawson et al., 2017;Freitas et al., 2017) and should be preferred. In these methods, a common assumption is stationarity of first and/or second degree, when the mean and variance are constant over the area. Of course, each of the parameters may be independently stationary (Terrien, Germain, Marque, & Karlsson, 2013;Ching & Lin, 2014). Nonetheless, this assumption is rather an exception since the values of a spatial process over the landscape are conditioned by different factors, which causes them to lose their intrinsic nature (Siqueira et al., 2015).
Some spatial data such as the population density of phytopathogenic nematodes in a field have a local character. In this case of nonstationarity, a heterogeneous variogram can be obtained using a moving window centered on the location to be predicted, which is an approach known as Moving Window Kriging (MWK), as described by Hass (1990). Applications of MWK in soil data were carried out by Walter, McBratney, Douaoui, and Minasny (2001) and Cattle, McBratney, and Minasny (2002). A local variogram is determined at every target location, in contrast with the conventional forms of kriging where there is only one global variogram (isotropy) produced for the entire dataset (Demšar & Harris, 2011).
The spatial and temporal heterogeneity of crops can also make it difficult to map pest or weed infestations. According to Baillod, Tscharntke, Clough, and Batáry (2017), the spatial heterogeneity of crops can be described both by their composition (e.g., crop diversity) and spatial arrangement (e.g., average field size). Temporal crop heterogeneity describes the changes in crop patterns due to the annual succession of crops. For instance, Bertrand, Burel, and Baudry (2016) found that some beetle species with high dispersal power such as Trechus quadristriatus were more abundant in landscapes with high spatial heterogeneity, whereas the abundance of less mobile species such as Poecilus cupreus was only positively influenced by the temporal crop dynamics.
Infestation maps obtained by kriging are often based on semivariance models for continuous variables that follow a known distribution. Ordinary kriging implicitly assumes a Gaussian-like distribution. It is not rare to find works which data are firstly transformed to approximate normality or to remove variance trends. According to Yamamoto and Landin (2013), a semivariogram is quite sensitive to the statistical distribution. Models that incorporate information from discrete variables (Diggle, Tawn, & Moyeed, 1998) are complex and have little appeal. On the other hand, simpler models are, in general, limited by spatial assumptions. Moreover, some variograms can show a common feature known as a "pure nugget effect", suggesting to avoid kriging.
Deterministic interpolation approaches, such as the inverse squared-distance weighting method (ISDW), can also be used to build infestation maps. This method does not make any assumptions about the data and is not model-based. However, unlike the geostatistical approach, the essential parameters in crop pest studies such as the degree of spatial dependence and the practical range are not estimated, thus making it impracticable to carry out studies on sampling for pest monitoring purposes or studies on within-field insect dispersion/mobility.
In this paper we present a heuristic method based on the MWK approach, called the Circular Variable Radius Moving Window method (CVRMW), to perform interpolation/prediction of spatial phytopathogenic data as an alternative to kriging and other interpolation methods that can deal with spatial heterogeneity. A code in R language is also presented.

Material and methods
The dataset used to illustrate the method consists of 20 spatial observations of nematode density (individuals per dm 3 of soil). A 50 × 50 prediction grid was used for the predictions. Three methods were applied: ordinary kriging, ISDW and CVRMW. In order to evaluate and compare the prediction accuracies obtained with ISDW and CVRMW, we have calculated the percentual absolute mean error (PAME) through the difference between the observed and the predicted values. All the statistical analyses and computations were performed in R (http://www.R-project.org/). Additionally, a video illustrating the CVRMW method is available from: https://youtu.be/mmg5jRti8gs.

Calculation
The proposed method is basically a mixture of MWK and ISDW. In the latter, spatial predictions are performed through the weighted mean of all spatial observations via the inverse of the squared distances to the predicted point. In addition, the CVRMW method consists of predicting spatial points using only the data within a circular window. This is done through the mean weighted by the inverse of the squareddistances between a prediction location and the spatial points within a radius h that varies according to the location and is determined to meet the objective function of minimizing the index of variation (the ratio between the coefficient of variation and the square root of the sample size). The window is then moved to the next prediction location, and the procedure is repeated for all prediction points. Thus, both of the base methods of CVRMW meet Tobler's first law.
A mathematical formulation is given as follows. Take Z(s i ) as the spatial observation at the coordinates s i (i = 1, 2, …, N) and as the k-th point to be predicted. Then, where: is the number of obsevations limited by the radius and is the squared-distance between and .

Algorithm
The first step of the algorithm (Figure 1) for implementing CVRMW in R is to define the inputs: i) the sampling data, ii) the geographical coordinates of the sampling points, and iii) the prediction grid. In Step 2, the loop process is properly given by defining an index variable 'i', a sequence ending at 'N', and the number of prediction points. In Step 2.1, the Euclidian distances between every sampling point (coords) and every point of the prediction grid are calculated and stored in the object 'd', which is a vector of dimension n (the number of sampling points). The prediction weights, w, are defined as the inverse squareddistances, and a restriction given in case 'd' is zero. In this case, w is assumed to be the maximum calculated value. In Step 2.2, an auxiliary function is written in order to calculate the index of variation. This makes the code faster and easier to debug. In Step 2.3, a sequence of length 200 is created to determine the radius to be used for predicting the i-th spatial point in 'grid'. The index of variation of the spatial observations within each radius is calculated and stored in 'vari'. The radius value is a value in 'seq.d' corresponding to the minimum value of 'vari'. In Step 2.4, those sampling points within the i-th radius or less are computed and stored in the object 'id', which is a list of characters indicating the sampling points. The i-th predicted value is the weighted mean of the spatial observations in 'id' considering their weights stored in 'w'. The prediction is stored as a column of the data frame 'pred'. The number of sampling points used for the i-th prediction and the standard error are stored as columns in 'pred' as well. The loop is broken when the last location in 'grid' is taken. In Step 3, if the user passes the same input objects to 'coords' and 'grid', the percentual absolute mean error (PAME) is calculated and printed (a side effect).

Features
The function accepts a numeric vector containing the sampling data as the input for the argument 'data'. The argument 'coords' must be a numeric matrix or data.frame containing the geographical coordinates of the sampling points. The argument 'grid' must be a numeric matrix or data.frame with the same number of columns as 'coord'.
The output is an object of the 'data.frame' class with the same number of rows as 'grid'. The first 2 or 3 columns (in general) correspond to the input grid. Then, the following columns are used: 'pred' -the spatial predictions, 'SE' -the standard error of the predicted value, 'radius' -the length of the radius used for each prediction, and 'n' -the number of sampling points used for each prediction.

Results and discussion
The graphical solution of the spatial interpolation is given in Figure 2. With the three methods, an East-West gradient is observed, although the level of detail is lower and a smoother solution occurs with kriging. Using CVRMW, the predicted values do not represent unbiased predictions as in kriging methods. However, the PAME in map (C) for ISDW is 35.08% compared with 30.04% in map (D) for CVRMW. In addition, an analysis of the spatial dependence can be done through the frequency distribution of the calculated radii ( Figure 3A). In this case, there is a pattern of spatial dependence within 0.0004 degrees. This is not as easily observed using a classical omnidirectional variogram ( Figure 3B), where a wave model was fit. Nonetheless, we found the estimate ϕ = 0.0004 for the range parameter to be quite close to the upper limit of the most frequent radius interval (0.0005 degrees) observed with CVRMW. The mean of the prediction standard errors using the CVRMW method was 365 (±150) nematodes dm -3 compared with 1,070 (±24) nematodes dm -3 using ordinary kriging. Yawson et al. (2017) did not find substantial differences in mapping the spatial distribution of the nematode population in organic and conventional fields using ISDW and ordinary kriging. Souza, Bazzi, Khosla, Uribe-Opazo, and Reich (2016) observed that ISDW performed slightly better at predicting crop yields than ordinary kriging. Rhodes, Liburd, and Grunwald (2011) obtained flower thrip infestation maps with similar accuracy using inverse distance weighting and ordinary kriging. On the other hand, Pimentel et al. (2017) stated that inverse distance weighting and ordinary kriging did not perform so well at predicting Ceratitis capitata infestations due to the spatial heterogeneity and topographical conditions. These authors recommended using a geographic weighted regression (GWR) instead. Spatial dependence is commonly studied through the analysis of variograms, such as that presented in Figure 3B. Nevertheless, using CVRMW allows one to actually see "where" the spatial dependence is larger in the field by building a map of the calculated radii.
Since predictions are performed using a loop, large datasets may increase the processing time. The timings for several analyses can be seen in Table 1. The larger the size of the prediction grid is, the more time used by the process. Nevertheless, note that the number of spatial observations (sampling points) only slightly affects the processing time.

Conclusion
An alternative computer-based heuristic method to perform spatial predictions was developed and implemented in R. The method consists of a local interpolator with stochastic features. It allows one to build effective detailed maps and to estimate the spatial dependence without any assumptions on the spatial process.