Estimating soybean yields with artificial neural networks

The complexity of the statistical models used to estimate the productivity of many crops, including soybeans, restricts the use of this practice, but artificial neural networks (ANNs) offer an alternative. This study aimed to estimate soybean productivity from growth habit, sowing density and agronomic characteristics using a multilayer perceptron (MLP) ANN. Agronomic data from experiments conducted during the 2013/2014 soybean harvest in Anápolis, Goiás State, Brazil, were normalized to an ANN-compatible range and used to train several networks, from which the best-performing one was chosen. After training, a performance analysis was conducted to select the ANN most appropriate for the problem; the selected network achieved a 98% success rate on the training data and 72% accuracy on the validation data. The application of the MLP to the experimental data shows that it is possible to estimate soybean productivity from agronomic characteristics, growth habit and population density by means of an ANN.


Introduction
Soybeans are currently the main agricultural commodity of Brazil, which is the second largest producer of this oilseed. During the 2015/2016 harvest, approximately 95.4 million tons of this grain were produced (Conab, 2017), but there are productivity gaps among regions caused by several factors that act during the development of the cultivar in the field. The productivity potential of soybean is genetically determined (Homrich, Wiebke-Strohm, Weber, & Bodanese-Zanettini, 2012), but whether this potential is achieved depends on limiting factors that act at different stages of the production cycle (Kron, Souza, & Ribeiro, 2008).
The factors that influence soybean productivity include row spacing and plant density (Akond et al., 2013; Souza, Teixeira, Reis, & Silva, 2016). Monteiro and Sentelhas (2014) successfully estimated soybean productivity using an agrometeorological model. The main agronomic characteristics influenced by the behavior of each cultivar include the number of branches per plant, the number of pods and seeds per plant, the number of internodes, the insertion height of the first pod, the stem diameter, the plant height and, of course, grain production (Liu et al., 2010; Passos et al., 2011).
Using agronomic traits measured at the R6 stage or later makes it possible to estimate soybean productivity without data from previous stages (Lee & Herbek, 2005). At these later stages, the grains have completely filled the pod cavity and are similar to the pods collected at harvest (Oliveira, Silva, Mielezrski, Lima, & Edvan, 2016).
This work aimed to evaluate the possibility of using an artificial neural network to estimate productivity from the main agronomic traits of soybean cultivars with different growth habits subjected to different sowing densities.

Material and methods
The development of a multilayer perceptron (MLP) artificial neural network (ANN) requires supervised training to adjust the weights of the synapses. Thus, the MLP used in this study to estimate soybean productivity was created using data from an experiment conducted during the 2013/2014 harvest at an experimental site belonging to Emater/Goiás, Anápolis, Goiás State, Brazil (48°18'23'' W, 16°19'44'' S). According to the Köppen classification, the climate is Aw humid tropical, characterized by a dry winter and a rainy summer. The soil of the area is classified as a Rhodic Hapludox (dystrophic Red Latosol).
The experiment employed a completely randomized 3 × 3 factorial design with eight replications. The treatments consisted of three soybean cultivars with different growth habits and types (BRS Valiosa RR, BMX Potencia RR and NA 7337 RR) and three plant densities (D1: 245,000 plants ha⁻¹; D2: 350,000 plants ha⁻¹; D3: 455,000 plants ha⁻¹).
At harvest, the following agronomic characteristics (variables) were evaluated in ten plants from each plot: plant height, number of branches per plant, number of pods per plant, number of grains per pod, weight of 1,000 seeds (WTS) and grain yield (productivity).
To conduct ANN training, independent variables (cultivar growth habit and population density) and dependent variables, namely the agronomic characteristics of plant height (PH), number of branches per plant (B), number of pods per plant (P), number of seeds per pod (S), weight of 1,000 seeds (WTS) and grain yield (Prod., kg ha⁻¹), were selected. These variables were normalized to equalize the ANN input data (Leal, Miguel, Baio, Neves, & Leal, 2015) so that the initial weights of the variables were equivalent at the beginning of training, thus avoiding the difficulties posed by variables with different scales, which can prevent the ANN from converging.
In addition to the variables having different magnitudes, variables that are not numerical, such as growth habit and population density, must be treated as categories. For these variables, it is recommended to follow the same dummy treatment applied to categorical variables in multiple regression analysis (Sharma, Sharma, & Kasana, 2007; Bohl, Diesteldorf, Salm, & Wilfling, 2016), as sketched below.
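For illustration (the paper provides no code), a minimal Python sketch of this dummy treatment using pandas; the column names are hypothetical:

```python
import pandas as pd

# Hypothetical rows: cultivar and sowing density are categorical inputs.
df = pd.DataFrame({
    "cultivar": ["BRS Valiosa RR", "BMX Potencia RR", "NA 7337 RR"],
    "density": ["D1", "D2", "D3"],
})

# The dummy treatment of multiple regression: one binary column
# per category level.
dummies = pd.get_dummies(df, columns=["cultivar", "density"])
print(dummies)
```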
For the other input variables, whose values are real numbers, a linear transformation was used. The maximum and minimum values of the variables used in this transformation are shown in Table 1. The input values and the expected results were normalized to values between minus one and one; after the networks were trained and validated, the network output was transformed back to its original quantity. To perform this transformation, Equation 1 was used, considering the minimum and maximum values of each variable (Table 1):

$$X_{scaled} = d_1 + (d_2 - d_1)\,\frac{X - X_{min}}{X_{max} - X_{min}} \qquad (1)$$
where: X = the value of the original quantity; X_scaled = the transformed value; X_max = the maximum possible value of the variable; X_min = the minimum possible value of the variable; d_1 = the lower limit of the converted range (-1 in this study); and d_2 = the upper limit of the converted range (1 in this study).

The input layer was set up to begin the development of the multilayer perceptron (MLP). One neuron was used for each input variable (Table 2), and the output layer contained one neuron representing productivity. To conduct the training and validation processes, a program was developed using the Levenberg-Marquardt training algorithm (Schiavo, Prinari, Gronski, & Serio, 2015) and the mean squared error (MSE) performance function (Equation 2), enabling various ANN architecture settings to be explored:
$$MSE = \frac{1}{N}\sum_{i=1}^{N} e_i^{2}, \qquad e_i = \alpha_i - t_i \qquad (2)$$

where: N = the number of data points presented for training; e = the difference between the expected value and the value estimated by the network; t = the value estimated by the network; and α = the expected value.

After the program was written, 20,000 MLP networks were trained: one thousand networks for each architecture, varying the number of neurons in the hidden layer between one and twenty, following the assumption that 2i + 1 neurons in the hidden layer are sufficient to map any continuous function with i inputs.
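The paper gives no code, but Equations 1 and 2 follow the standard min-max scaling and mean squared error forms; a minimal Python sketch under that assumption, with illustrative limit values rather than those of Table 1:

```python
import numpy as np

def scale(x, x_min, x_max, d1=-1.0, d2=1.0):
    """Equation 1: linearly map x from [x_min, x_max] to [d1, d2]."""
    return d1 + (d2 - d1) * (x - x_min) / (x_max - x_min)

def unscale(x_scaled, x_min, x_max, d1=-1.0, d2=1.0):
    """Inverse of Equation 1: recover the original quantity."""
    return x_min + (x_scaled - d1) * (x_max - x_min) / (d2 - d1)

def mse(estimated, expected):
    """Equation 2: mean squared error, with e_i = expected_i - estimated_i."""
    e = np.asarray(expected) - np.asarray(estimated)
    return np.mean(e ** 2)

# Illustrative limits only; the real ones come from Table 1.
prod = np.array([2500.0, 3100.0, 2800.0])             # yield, kg ha^-1
prod_scaled = scale(prod, x_min=2000.0, x_max=4000.0)
assert np.allclose(unscale(prod_scaled, 2000.0, 4000.0), prod)
```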
After the neural network training was completed, a file was generated for each training containing the training data (the parameters used in the training and the contents of the training, test, validation and performance sets). Another file was generated containing the consolidated data for all the trainings.
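The training program itself is not given in the paper, and the Levenberg-Marquardt algorithm it used is not available in scikit-learn; the rough Python sketch below therefore substitutes the L-BFGS solver of MLPRegressor just to show the structure of the loop (20 architectures × 1,000 repetitions = 20,000 networks), with placeholder data and hypothetical record fields:

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(65, 8))   # placeholder for the 65 normalized samples
y = rng.uniform(-1, 1, size=65)        # placeholder for normalized productivity

results = []  # one record per trained network, like the consolidated file
for n_hidden in range(1, 21):          # architectures: 1 to 20 hidden neurons
    for rep in range(1000):            # 1,000 trainings per architecture (slow!)
        idx = rng.permutation(len(X))  # new random draw of the three subsets
        tra, val, tst = idx[:42], idx[42:58], idx[58:]
        net = MLPRegressor(hidden_layer_sizes=(n_hidden,),
                           solver="lbfgs", max_iter=500)
        net.fit(X[tra], y[tra])
        results.append({
            "neurons": n_hidden,
            "repetition": rep,
            "mse_val": mean_squared_error(y[val], net.predict(X[val])),
            "r2_tra": net.score(X[tra], y[tra]),
            "r2_val": net.score(X[val], y[val]),
        })
```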
The resulting file contained one line for each trained network (20,000 lines). Columns one to sixty-five held the values estimated by the network, and additional columns, computed by formula, were appended (Table 3).
To determine the network with the best performance, networks were selected using the following criteria: the general performance of the experiment; the performance of the training, validation and test sets; and the training, validation, test and general R² values. The Pearson's correlations (training, validation and test) were ranked in decreasing order because the higher the Pearson's correlation, the closer the estimated values are to the observed values.
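Assuming consolidated records like those sketched above (field names are hypothetical), the selection step can be expressed as a simple ranking; the numbers below are the values reported in Table 5, used here for illustration:

```python
import pandas as pd

# Three of the 20,000 records, with the R^2 values reported in the text;
# the training R^2 of the one-neuron network is not reported.
records = [
    {"neurons": 19, "repetition": 806, "r2_tra": 0.999, "r2_val": 0.145},
    {"neurons": 9,  "repetition": 963, "r2_tra": 0.987, "r2_val": 0.727},
    {"neurons": 1,  "repetition": 340, "r2_tra": float("nan"), "r2_val": 0.041},
]
table = pd.DataFrame(records)

# Rank by validation quality first, so overfitted networks (high training
# R^2 but low validation R^2) fall to the bottom of the list.
ranked = table.sort_values(["r2_val", "r2_tra"], ascending=False)
print(ranked)
```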

Results and discussion
The descriptive statistics of the variables used in the testing, training and validation of the MLP network are presented in Table 4. Only 65 samples were available as inputs for training, validation and testing; considering that MLPs learn from examples, this low number hindered the training and validation of the network.
The coefficient of variation of all the characteristics was equal to or greater than 10%, which contributed to the training, since variation in the values allows a better adjustment of the synapse connection weights; however, a high coefficient of variation may also indicate outliers in the data. The MLP networks coped with these values using cross-validation, which limited their influence on the adjusted synapse weights and ensured that the network did not model the noise in the samples. The 65 samples were randomly divided into three subsets: training (42 samples, 65%), validation (16 samples, 25%) and test (7 samples, 10%). A new draw was performed for each new training, and this division strategy made the training difficult because the random division, made without considering the treatments used to obtain the samples, produced sets with poor sample representation. This resulted in multiple networks with high training performance but low validation performance; for example, the network with 19 neurons from training 806 (Table 5, line 3) reached a training R² of 0.999 but a validation R² of only 0.145.
The development of the program was essential for determining an architecture that could perform adequately, because the scaling problem involves adjusting the complexity of the neural model to the complexity of the problem. The program made it possible to vary the complexity of the architecture, thereby enabling performance evaluation across different architectures and configurations.
Another device was used to repeat the training a thousand times in every architecture. In this approach, each new training drew new training, validation and test sets and reinitialized the synapse weights, which allowed the common problem of backpropagation-trained MLP networks getting stuck in local minima to be overcome (Zweiri, Seneviratne, & Althoefer, 2005). The importance of the repetitions was evident in the network with the best performance (R² = 0.987 in training and 0.727 in validation), which was found after 963 repetitions of the architecture with nine neurons in the hidden layer (Table 5, line 2).
The strategy of drawing new training, validation and test sets for each training led to problems with the convergence and generalization of the networks because the treatments were not considered: samples with little representation were grouped in the training, test and validation sets, and the network began to specialize in the training set. This is illustrated by the high R² of 0.999 obtained during the training of the network with 19 neurons after 806 repetitions (Figure 1): the training-set line fell dramatically near epoch 48, indicating that the network had memorized the training set, while the lines for the validation and test sets did not follow this pattern and showed low R² values (0.145 and 0.045, respectively). The network with two neurons and 212 repetitions (Figure 2) had the same problem with the drawing of the sets; the only difference was that its best set reached a Pearson's correlation of 99.55%, while the validation set showed a correlation of only 13.45%.
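One way to avoid such unrepresentative draws, sketched below for illustration only (the study itself used fully random draws), is to stratify the division on the treatment labels; scikit-learn's train_test_split supports this directly:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical treatment labels: 9 combinations (3 cultivars x 3 densities)
# spread over the 65 samples.
treatments = np.arange(65) % 9

# Stratifying on treatment keeps every cultivar/density combination
# represented in the training set.
idx = np.arange(65)
tra_idx, rest_idx = train_test_split(
    idx, train_size=42, stratify=treatments, random_state=0)

# The remaining 23 samples would still need to be divided into validation
# and test sets; with only 7 test samples and 9 treatments, that second
# split cannot be fully stratified, so some compromise remains.
```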
Various criteria were used to select the network that best generalized the problem. Networks with fewer neurons in the hidden layer generalize better, but a network with too few neurons may be unable to solve a problem with a high degree of complexity and may result in underfitting (Patterson, 1996). This was observed in the estimated and observed values of the network with one neuron and 340 repetitions (Figure 3), which had an MSE of 0.0315, better than the 0.0334 of the network with nine neurons and 963 repetitions (the network chosen as the most appropriate). Despite the low MSE, its R² and Pearson's correlation were not satisfactory (0.041 and -0.203, respectively) (Table 5), and the network failed to model the relationship between the input variables and the expected result.

The validation of the network with nine neurons and 963 repetitions showed the observed values tending to follow the estimated values (Figure 4), which was confirmed by the R² of 0.726 and the Pearson's correlation of 85%. If samples 7 and 11 (indexes 3 and 4 in Figure 4) were removed from this validation, the R² would rise to 0.811 and the Pearson's correlation to 90%. This simulation was performed upon observing that samples 11 and 2 have agronomic characteristics with very similar values but very different productivities (a difference of nearly 800 kg), which raises the possibility that factors not recorded in the experiment influenced the productivity of the samples. Among all the trained networks, the one with nine neurons and 963 repetitions, selected as the best solution to the problem, did not present the best training R²; rather, it was selected because it presented the best validation, with an R² of 0.727 and a Pearson's correlation of 85%. The estimated and observed values were very close during training, which is evidence of the learning capacity of the MLP (Figures 5 and 6).
Despite its considerable validation performance (R² = 0.727 and Pearson's correlation = 85%), the selected network produced some values distant from the regression line. This can be attributed to the influence of the productivity of the samples drawn into the validation set, which was not considered during training. Furthermore, the performance of the test set was very low (R² = 0.033 and Pearson's correlation = -18%), reinforcing the point that the selection of the sets was entirely random and clustered samples with low representation. The influence of unregistered factors should also be considered.
The mean absolute difference between the observed and estimated grain productivity values (error) of the network on the training set was 30.42 kg (Table 6), consistent with the high Pearson's correlation of 99%. The mean absolute error was evaluated instead of the mean error (17.1 kg) because errors may be negative (when the value estimated by the network is greater than the observed value), and negative errors offset positive ones, decreasing the mean error and masking the distance between the observed and estimated values.
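A minimal sketch of the two error measures discussed here, with placeholder values rather than the study's data:

```python
import numpy as np

# Placeholder yields in kg ha^-1, not the study's data.
observed = np.array([2900.0, 3150.0, 2750.0, 3010.0])
estimated = np.array([2870.0, 3205.0, 2765.0, 2990.0])

errors = observed - estimated
mean_error = errors.mean()              # signed errors partly cancel out
mean_abs_error = np.abs(errors).mean()  # so the absolute mean is reported,
                                        # keeping over- and underestimates
                                        # from masking each other
print(mean_error, mean_abs_error)
```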
The performance of the MLP was considered good relative to other studies of soybean productivity estimation, such as that of Fontana, Berlato, Lauschner, and Mello (2001), who obtained a correlation of 0.85 between observed and estimated values using a modified Jensen model to estimate soybean crop yield in the State of Rio Grande do Sul. Using an agrometeorological model, Monteiro and Sentelhas (2014) obtained an R² of 0.64 when estimating soybean productivity in different regions.

Figure 1. Graphical representation of network training with 19 neurons and 806 repetitions.

Figure 2. Graphical representation of network training with 2 neurons and 212 repetitions.

Figure 3. Comparative graphical representation of the estimated and observed values of the validation set for the network with one neuron and 340 repetitions.

Figure 4. Comparative graphical representation of the estimated and observed values of the validation set for the network with 9 neurons and 963 repetitions.

Figure 5. Graphical representation of the linear regressions for training, validation, testing and the overall network with 9 neurons and 963 repetitions.

Figure 6. Comparative graphical representation of the observed and estimated training values for the network with 9 neurons and 963 repetitions.

Table 1. Values used for X_min and X_max by variable.

Table 2. Variables presented to the input neuron layer.

Table 3. Description of the columns in the file containing the network training information.
nn: the line number.

Table 4. Descriptive statistics of the variables used to train the ANN.
N: number of samples; Min: minimum value; Max: maximum value.

Table 5. ANN training data.
C: cycle (epoch) at which the architecture training was finalized; NN: number of neurons in the hidden layer; NT: training number within the NN architecture; Tra: training; Val: validation; Performance: mean squared error (MSE), Equation 2.

Table 6. Analysis of network errors with nine neurons and 963 repetitions.
Min: minimum; Max: maximum; Mean Absolute Error: average of the absolute error values. Negative values indicate that the network's estimate exceeded the observed value. All values are in kg.