Data interpolation in the definition of management zones

Precision agriculture (PA) makes use of management zones (MZs), and sample data are usually interpolated before MZs are defined. This study examines whether data interpolation is needed, evaluating the quality of MZs with five indices: variance reduction (VR), fuzziness performance index (FPI), modified partition entropy index (MPE), Kappa index and the cluster validation index (CVI), the last of which is the focus of this study. Soil texture, soil penetration resistance, elevation and slope in an experimental area of 15.5 ha were used as attributes to generate MZs and were correlated with soybean yield data from the 2011-2012 and 2012-2013 harvests. Data interpolation prior to MZ generation is important to obtain MZs with smoother contours and a greater reduction in data variance. Kriging was the best-performing interpolator. The CVI proved efficient for choosing MZs, allowing a less subjective decision on the best interpolator or number of MZs.


Introduction
The modernization of agriculture through precision agriculture (PA) technology has resulted in the emergence of machines equipped with sensors and other technological equipment that make agricultural activity increasingly competitive. Despite the advantages of PA, the cost of obtaining soil samples to characterize field variability restricts this technology to large producers (Yan, Zhou, Feng, & Hong-Yi, 2007).
To balance the economic costs against the benefits obtained with PA, PA concepts have been adapted to identify regions within the field with similar characteristics. These regions are called management zones (MZs).
Defining MZs makes PA techniques easier to apply, since the same systems used in conventional agriculture may be employed in crop management. According to Yan et al. (2007), information on MZs reduces the number of soil analyses necessary to create application maps for farming operations and makes PA technology more attractive.
Among the software packages used to run clustering algorithms for MZ definition, the highlights are FuzMe (Minasny & Mcbratney, 2002), used by Molin and Castro (2008); the Management Zone Analyst (MENZA) (Fridgen et al., 2004); and the Software for Definition of Management Zones (SDMZ), developed and used by Bazzi et al. (2013). The clustering algorithms take as input only the values of the attributes to be clustered, and the datasets must have the same number of sample elements, so data interpolation is often needed. Interpolation also becomes necessary when MZs must be generated from a small number of sample points, since it yields maps with well-defined sub-regions in the area (Bazzi et al., 2013). When data interpolation is used, the most common interpolators are inverse distance (ID), inverse square distance (ISD) and kriging (KRI). Kriging is a robust interpolator and has usually provided better results than ID and ISD, though it requires in-depth knowledge of geostatistics (Mazzini & Schettini, 2009; Guastaferro et al., 2010).
In this context, the present study evaluates whether the data interpolation used in the generation of MZs is justified. The quality of MZs was also evaluated by five indices, one of which, the cluster validation index, is the focus of this analysis.

Material and methods
Sampling points were determined and an experimental area of approximately 15.5 ha in Céu Azul, Paraná State, Brazil, was delineated with a Trimble Geo Explorer XT 2005 topographic GPS and PathFinder software. The central geographical location lies at 25º06'32''S and 53º49'55''W, at an average altitude of 752 m. The area has been cultivated under a tilling system for more than 10 years, with a crop sequence of soybean, wheat, oats and corn, for commercial purposes.
Stable attributes were used in the definition of MZs, excluding chemical soil attributes, in line with the general recommendations in the literature (Doerge, 2000). An irregular sampling grid was used, with sampling points located on an imaginary line between the intermediate contour lines. Forty sampling points (2.58 points ha⁻¹) were defined, at which data on altitude, slope, texture (clay, silt and sand), soil penetration resistance (SPR), density, macro- and micro-porosity and total porosity were collected.
Soil sampling was carried out with an auger, at a depth of 0-0.2 m; eight sub-samples were collected for each sampling point within a radius of 3 m from the point determined by the sampling grid (adapted from Wollenhaupt, Wolkonski, & Clayton, 1994).
SPR was determined with a Falker PenetroLOG electronic penetrometer: four measurements were taken around the point defined in the sampling grid (at a maximum distance of 3 m) and subsequently averaged to represent the point, at depths of 0-0.1, 0.1-0.2 and 0.2-0.3 m. Undisturbed soil samples were collected with a volumetric ring to determine water content, soil density, macro-porosity, micro-porosity and total porosity, following the methodology of Embrapa (1997). Altitude was obtained with a Topcon GPT-7505 electronic total station.
Soybean yield during the 2011-2012 and 2012-2013 agricultural years was measured with an AFS PRO 600 harvest monitor attached to a CASE IH 2388 combine, and the data were filtered as suggested by Michelan, Souza, and Uribe-Opazo (2007).
Moran's bivariate spatial autocorrelation statistic (Czaplewski & Reich, 1993) was used to assess the spatial correlation between the analyzed attributes and to build the spatial correlation matrix, which shows which attributes influence yield positively or negatively and whether a sample is spatially correlated with itself (spatial autocorrelation). The attributes used in the generation of MZs were selected by the variable selection method proposed by Bazzi et al. (2013).
To obtain greater detail on the attributes under analysis, data were interpolated to a 5 x 5 m grid by the following methods: nearest neighbor (NN), inverse distance (ID), inverse square distance (ISD) and ordinary kriging (KRI). ID, ISD and KRI were chosen because they are the most used methods in PA (Robinson & Metternicht, 2006; Ortega & Santibáñez, 2007 & Herreno, 2010), whereas NN is a deterministic interpolation method in which the estimated value is always equal to that of the nearest sample, without considering any other. NN is therefore exact and preserves the sampled values. ArcGIS software was used for map plotting and geostatistical analysis, with Matheron's classical estimator. Theoretical models (spherical, exponential, Gaussian) were fitted to the semivariogram, and the best model was selected on the basis of cross-validation statistics.
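The deterministic interpolators named above can be illustrated with a minimal Python sketch. It assumes that ID and ISD weight every sample by 1/d and 1/d² respectively, with no search radius or neighbor limit; the study itself used ArcGIS, whose settings are not described here. The function names (`idw`, `nearest_neighbor`, `grid_centers`) and the sample coordinates are hypothetical.

```python
import math


def idw(samples, x, y, power=1.0):
    """Inverse distance estimate at (x, y).

    samples: list of (xi, yi, value) tuples.
    power=1 corresponds to ID and power=2 to ISD (assumed weighting).
    """
    num = 0.0
    den = 0.0
    for xi, yi, vi in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return vi  # the estimate equals the sample at a sampled location
        w = 1.0 / d ** power
        num += w * vi
        den += w
    return num / den


def nearest_neighbor(samples, x, y):
    """NN estimate: the value of the closest sample, ignoring all others."""
    closest = min(samples, key=lambda s: math.hypot(x - s[0], y - s[1]))
    return closest[2]


def grid_centers(xmin, ymin, xmax, ymax, cell=5.0):
    """Yield the centres of cell x cell squares covering a bounding box."""
    nx = int((xmax - xmin) // cell)
    ny = int((ymax - ymin) // cell)
    for r in range(ny):
        for q in range(nx):
            yield (xmin + (q + 0.5) * cell, ymin + (r + 0.5) * cell)


# Hypothetical altitude samples (x in m, y in m, altitude in m).
samples = [(0.0, 0.0, 750.0), (10.0, 0.0, 760.0)]
surface = {pt: idw(samples, pt[0], pt[1], power=2.0)
           for pt in grid_centers(0.0, 0.0, 10.0, 10.0)}
```

A higher power makes the estimate follow the nearest sample more closely, which is why ISD sits between ID and NN in smoothness.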
MZs were generated from the interpolated values of the selected soil attributes by the Fuzzy C-Means clustering method. To evaluate the MZs, the variance reduction (VR; Ping & Dobermann, 2003; Xiang, Yu-Chun, Zhong-Qiang, & Chun-Jiang, 2007) was calculated for soybean yield (Equation 1), with the expectation that the sum of the variances of the data within MZs would be lower than the total variance. A comparison of means (ANOVA) among MZs was also carried out to identify whether they differed significantly in average soybean yield, assuming no spatial dependence inside each MZ.
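The clustering step can be sketched as a textbook Fuzzy C-Means loop. This is only an illustration under stated assumptions: a fuzziness exponent m = 2, Euclidean distance, and a deterministic choice of starting centers are all assumed here and are not taken from the study. The names `fuzzy_c_means` and `hard_zones` are hypothetical.

```python
import math


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6):
    """Return (centers, u), where u[j][i] is the membership of point j in cluster i.

    points: list of equal-length tuples (one attribute vector per grid cell).
    m: fuzziness exponent (assumed value, must be > 1).
    """
    n = len(points)
    dim = len(points[0])
    # Assumed deterministic start: c points evenly spaced through the sorted data.
    ordered = sorted(points)
    centers = [list(ordered[(2 * k + 1) * n // (2 * c)]) for k in range(c)]
    u = [[0.0] * c for _ in range(n)]
    for _ in range(iters):
        # Membership update: u_ji proportional to d_ji^(-2/(m-1)).
        for j, p in enumerate(points):
            d = [math.dist(p, ctr) for ctr in centers]
            if min(d) == 0.0:
                hits = [1.0 if dk == 0.0 else 0.0 for dk in d]
                u[j] = [h / sum(hits) for h in hits]
            else:
                inv = [dk ** (-2.0 / (m - 1.0)) for dk in d]
                total = sum(inv)
                u[j] = [v / total for v in inv]
        # Center update: mean of the points weighted by u^m.
        new_centers = []
        for i in range(c):
            w = [u[j][i] ** m for j in range(n)]
            sw = sum(w)
            new_centers.append(
                [sum(w[j] * points[j][k] for j in range(n)) / sw for k in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # memberships and centers agree to within tol
            break
    return centers, u


def hard_zones(u):
    """Assign each point to the cluster with the highest membership."""
    return [row.index(max(row)) for row in u]


# Hypothetical interpolated altitudes (m) for six grid cells.
altitudes = [(750.0,), (751.0,), (752.0,), (760.0,), (761.0,), (762.0,)]
centers, u = fuzzy_c_means(altitudes, c=2)
zones = hard_zones(u)
```

The membership matrix `u` is the quantity that the fuzzy validation indices operate on, while `zones` is the crisp map used for yield comparisons.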

$$VR = \left(1 - \frac{\sum_{i=1}^{c} W_i \, V_{MZ_i}}{V_{Field}}\right) \times 100 \qquad (1)$$

where $c$ is the number of management zones; $W_i$ is the proportion of area in each management zone; $V_{MZ_i}$ is the variance of data in each management zone; and $V_{Field}$ is the sample variance of data for the whole field. Further, the fuzziness performance index (FPI; Equation 2) was employed to evaluate the MZs; it measures the degree of separation (or confusion) among the fuzzy c-clusters of a data set X. The modified partition entropy index (MPE; Equation 3) was used to estimate the disorganization produced by a specified number of clusters.
where $c$ is the number of clusters; $n$ is the number of observations; and $u_{ij}$ is element $ij$ of the fuzzy membership matrix $U$.
For FPI, values close to 0 indicate distinct classes, with little sharing of members (data), whereas values close to 1 indicate no distinct classes, with a high degree of sharing of members between the classes (Fridgen et al., 2004). For MPE, values close to 1 indicate predominant disorganization, whereas values close to 0 indicate better organization (Boydell & Mcbratney, 2002). The best number of clusters for a data set is the one that minimizes FPI and MPE. To avoid situations in which the two indices point to different choices, the cluster validation index (CVI; Equation 4) is proposed in this study; for the selection among j clusters, it takes a value closer to zero as FPI and MPE do. Thus, when choosing between groupings, the best clustering is the one with the lowest CVI.
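The behaviour of FPI and MPE at their limits can be checked with a short Python sketch. It assumes the forms commonly given in the fuzzy clustering literature: FPI = 1 - (cF - 1)/(c - 1), where F is the mean of the squared memberships, and MPE as the mean partition entropy divided by log c. Whether these match Equations 2 and 3 term by term is an assumption. The membership matrix is taken here as one row per observation and one column per cluster, and the function names are hypothetical.

```python
import math


def fpi(u):
    """Fuzziness performance index: 0 for a crisp partition, 1 for a fully shared one."""
    n, c = len(u), len(u[0])
    f = sum(v * v for row in u for v in row) / n  # partition coefficient F
    return 1.0 - (c * f - 1.0) / (c - 1.0)


def mpe(u):
    """Modified partition entropy: 0 for a crisp partition, 1 for a fully shared one."""
    n, c = len(u), len(u[0])
    # Terms with zero membership contribute nothing (limit of v*log(v) as v -> 0).
    h = -sum(v * math.log(v) for row in u for v in row if v > 0.0) / n
    return h / math.log(c)


crisp = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]   # every cell belongs to one zone
shared = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]  # every cell split equally
mixed = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]   # typical fuzzy output

for name, matrix in (("crisp", crisp), ("shared", shared), ("mixed", mixed)):
    print(name, round(fpi(matrix), 3), round(mpe(matrix), 3))
```

Both indices fall as memberships become more decisive, which is the sense in which the lowest values indicate the best number of clusters.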

Results and discussion
The descriptive analysis of the data (Table 1) revealed that the highest coefficients of variation (CV) occurred for the 2012 (39%) and 2013 (20%) yields and for slope (147%). Although soil compaction increased between 2011 and 2012, SPR values were classified between "low" and "some limitations", following Canarache (1990). Total porosity averaged 0.45 cm³ cm⁻³, of which 0.11 cm³ cm⁻³ corresponded to macro-porosity and 0.34 cm³ cm⁻³ to micro-porosity. Bulk density ranged between 1.24 and 1.47 g cm⁻³, with a mean of 1.35 g cm⁻³. Cavalcante, Alves, Souza, and Perreira (2011) reported very similar results.
The attributes for MZ definition were selected from the spatial correlation matrix (Moran index) and by eliminating redundant variables (Bazzi et al., 2013). This resulted in the attribute altitude for the 2011-2012 harvest, and in altitude and SPR 0-0.1 m for the 2012-2013 harvest, for which the geostatistical analyses were prepared (Table 2).
Two, three and four MZs were generated by the Fuzzy C-Means clustering method from data interpolated by the ID, ISD and KRI methods on a 5 x 5 m grid. For the 2011-2012 soybean harvest, MZs were generated with the attribute altitude (Figure 1). For the 2012-2013 soybean harvest, MZs were generated within the same area with the attributes altitude and SPR 0-0.1 m (Figure 2). The importance of interpolating the sampling data (Figures 1a and 2a) before MZ generation can be seen in the NN results (Figures 1b and 2b), which give more irregular MZs than the other interpolation methods.
In the evaluation of MZs for the two harvests (2011-2012 and 2012-2013), ANOVA showed a significant difference in soybean yield for all divisions (2, 3 and 4 zones), regardless of the interpolation method used (Tables 3 and 4). However, relative efficiency (RE) was greater for the first harvest, perhaps because in the first year the bottom section of the area (corresponding to yields between 0.5 and 4.1 t ha⁻¹, Figure 3a) had to be replanted and thus gave lower yields than the top section. Figures 3 and 4 show that the data separated naturally according to the replanting. Consequently, VR (which is higher the greater the reduction in data variability, or variance) ranged between 59 and 74% for the 2011-2012 harvest and between zero and 2% for the 2012-2013 harvest (Figures 5c and 5d).
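The contrast in VR between the two harvests can be illustrated with a small Python sketch of the variance reduction computed from yield values grouped by zone. It assumes equal-area grid cells, so that the area proportion of each zone is its share of cells, and it uses the sample variance throughout. The yield figures are invented for illustration only, and the function names are hypothetical.

```python
def sample_variance(values):
    """Unbiased sample variance (requires at least two values)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def variance_reduction(yield_by_zone):
    """VR in percent: 100 * (1 - area-weighted within-zone variance / field variance).

    yield_by_zone: list of lists, one list of yield values per management zone.
    Zone weights are cell counts over total cells (assumes equal-area cells).
    """
    all_values = [v for zone in yield_by_zone for v in zone]
    total = len(all_values)
    field_var = sample_variance(all_values)
    within = sum(len(zone) / total * sample_variance(zone) for zone in yield_by_zone)
    return (1.0 - within / field_var) * 100.0


# Invented yields (t/ha): zones that follow a real separation in yield...
separated = [[1.0, 1.2, 0.8], [3.0, 3.2, 2.8]]
# ...versus zones that cut across it.
mixed = [[1.0, 3.0, 1.2], [3.2, 0.8, 2.8]]

vr_separated = variance_reduction(separated)
vr_mixed = variance_reduction(mixed)
```

When the zones coincide with a natural split in yield, as with the replanted section in the first harvest, almost all of the field variance lies between zones and VR is high. When they do not, VR stays near zero and can even turn negative with the sample variance.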
The FPI, MPE and CVI indices (Figure 5c-h) behaved similarly; that is, the choice of the best interpolation method was the same regardless of the index used. Results show that interpolation by ID was the worst for the 2011-2012 harvest. Data from the two harvests did not reveal any gain from the CVI, owing to the similar performances of FPI and MPE. However, the literature (Li, Shi, Wu, Li, & Li, 2008; Valente et al., 2012) reports instances in which FPI and MPE did not agree; in such cases the CVI would provide a less subjective decision on the best interpolation method or on the number of management zones.
Analyses of CVI and VR for the two harvests showed that kriging provided the best results. The best numbers of zones for the 2011-2012 and 2012-2013 harvests were two and four, respectively. It should be stressed that generating MZs for each harvest separately is not recommended and was done here only to show the efficiency of the MZ selection indices. Ideally, MZs should be generated from stable attributes selected according to the average normalized yield over several years.
The influence of the interpolation method on the area assigned to each MZ (Table 5) ranged between 0 and 30%. The greatest differences occurred between ISD and ID and between ID and KRI for the division into 3 MZs. A maximum difference of 3% occurred for the division into 2 MZs. For the division into 4 MZs, the greatest differences were found between MZs generated by ID and ISD (15%).
Overall, ANOVA showed that kriging was the method that yielded the most significant differences among the MZs generated (Tables 3 and 4), whilst ID yielded the fewest.
The comparisons ID x ISD, ID x KRI and ISD x KRI (Table 6) for 2, 3 and 4 MZs were used to evaluate map concordance according to the type of interpolator.
For the 2011-2012 soybean harvest, concordance between the maps decreased as the number of MZs increased. Concordance was classified as very strong only for 2 MZs, and ranged between average and very strong for the divisions into 3 and 4 MZs. For the 2012-2013 harvest, there was little difference among the maps for the division into two MZs, and the maps showed very strong concordance for all comparisons.
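The map comparison can be sketched with Cohen's Kappa computed cell by cell between two zone maps. This is a minimal illustration that assumes both maps cover the same grid cells in the same order and that zone labels have already been matched between maps, since cluster numbering from Fuzzy C-Means is arbitrary. The function name `kappa_index` and the example maps are hypothetical.

```python
def kappa_index(map_a, map_b):
    """Cohen's Kappa between two zone maps given as equal-length label lists."""
    if len(map_a) != len(map_b):
        raise ValueError("maps must cover the same grid cells")
    n = len(map_a)
    labels = set(map_a) | set(map_b)
    # Observed agreement: share of cells with the same zone in both maps.
    p_obs = sum(1 for a, b in zip(map_a, map_b) if a == b) / n
    # Agreement expected by chance from the zone proportions of each map.
    p_exp = sum((map_a.count(z) / n) * (map_b.count(z) / n) for z in labels)
    if p_exp == 1.0:
        return 1.0  # both maps consist of the same single zone
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical zone maps from two interpolators over eight grid cells.
zones_id = [1, 1, 1, 1, 2, 2, 2, 2]
zones_kri = [1, 1, 1, 2, 2, 2, 2, 2]
agreement = kappa_index(zones_id, zones_kri)
```

A value of 1 means the two interpolators produced identical zone maps, and a value near 0 means they agree no more often than the zone proportions alone would produce.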

Conclusion
Data interpolation prior to the generation of management zones (MZs) is highly important to obtain MZs with smooth contours and a greater reduction in data variance. The cluster validation index (CVI) proposed in this study was efficient in the choice of MZs and provided a less subjective decision on the best interpolator or on the number of MZs. The best interpolation method was kriging, which justifies selecting the most robust interpolator for MZ generation.

Table 1. Descriptive statistics of the soil's physical attributes in 2011 and 2012.

Table 2. Semivariogram parameters for the choice of the best model of selected attributes to generate MZs.
C₀ - Nugget effect; C₁ - Contribution; IDE% - Space Dependency Index; VR - reducing measurement error; SER - standard deviation of reducing measurement error; ICE - Error comparison index. Underlined items indicate the best model.

Table 4. Descriptive statistics of soybean yield data in the 2012-2013 harvest, separated by MZ and cluster validation indices.

Table 5. Highest percentage difference in area among the ID, ISD and KRI interpolation methods as a function of the number of management zones (MZs).
# - Difference in area between the interpolation methods.

Table 6. Comparison between management zone (MZ) maps generated by different interpolators using the Kappa Index.