The use of multilayer perceptron artificial neural networks for the classification of ethanol samples by commercialization region

Samples of automotive ethanol marketed in the northern and eastern regions of the state of Paraná, Brazil, underwent physical and chemical tests. The results were assessed by a Multilayer Perceptron (MLP) neural network for classification. For network training, two hundred epochs, a 0.05 learning rate and a random subdivision of samples into three groups, with 70% for training, 15% for test and 15% for validation, were employed. Sixty networks were trained from three different initializations. Three networks, one from each initialization, were highlighted, and the one with the best performance had 8 neurons in the hidden layer, with 95% accuracy in training, 96% in test and 96% in validation. The most important variables for classification, as identified by that network, were, in decreasing order, alcohol content, density, pH and electrical conductivity. The MLP segmented the ethanol samples and identified their commercialization regions.


Introduction
Advances in research on the production of new fuels from renewable sources to reduce toxic gas emission levels into the atmosphere have underscored ethanol as the main biofuel in Brazil. Ethanol is environmentally important since it reduces carbon dioxide emissions. In fact, part of the gas resulting from ethanol combustion is absorbed by large plantations (Silva, Damasceno, Silva, Madruga, & Santana, 2008; Spacino et al., 2013).
Although hydrated ethanol is mainly produced from sugar cane as raw material, many other sources, such as corn and beet, may be used. Depending on the region or even the municipality of production, ethanol may differ in physical and chemical parameters such as conductivity and pH, among others (Spacino et al., 2013). Various aspects of these characteristics may be studied, depending on the interest at hand, including distillation results, municipality of origin, separation of different sugar cane crops, comparison of yields and production year for productivity (Silva et al., 2008; Spacino et al., 2013).
According to Bona, Silva, Borsato, and Bassoli (2012), Artificial Neural Networks (ANNs) are among the study tools that have gained great importance and have been successful in classifying samples. There are several types of neural networks, such as Multilayer Perceptron (MLP), radial basis networks and Self-Organizing Maps (SOM), among others (Haykin, 2001; Bishop, 2007).
Supervised training of an MLP is carried out through the backpropagation algorithm, which is based on learning by error correction (Haykin, 2001; Bishop, 2007).
Basically, the learning of an MLP by backpropagation occurs when a set of input-output pairs passes through the different layers of the network in two steps: a forward step called propagation and a backward step called backpropagation. Specifically, the actual response of the network is subtracted from the expected response to produce an error signal. The error signal is back-propagated through the network, adjusting the synaptic weights so that the actual response of the network moves closer to the expected response, thereby minimizing the error (Bishop, 2007). When the network response does not match the expected one, the procedure is repeated until the required accuracy is achieved for the input-output set (Haykin, 2001; Borsato et al., 2009).
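The two steps described above can be sketched for a single input-output pair as follows. This is only a minimal illustration in plain Python, not the procedure run by the software used in the study: it assumes logistic activations in both layers, a sum-of-squares error and plain online gradient descent, and the function names, the input vector and the initial weight range are hypothetical.

```python
import math
import random


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    # One weight row per neuron; the last entry of each row is the bias.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def forward(layer, inputs):
    # Propagation: weighted sum plus bias, then logistic activation.
    return [logistic(sum(w * x for w, x in zip(row, inputs)) + row[-1])
            for row in layer]


def train_step(hidden, output, x, target, lr=0.05):
    # Forward step (propagation).
    h = forward(hidden, x)
    y = forward(output, h)
    # Error signal: expected response minus actual response.
    err = [t - o for t, o in zip(target, y)]
    # Backward step: local gradients of the output layer ...
    d_out = [e * o * (1.0 - o) for e, o in zip(err, y)]
    # ... propagated back to the hidden layer through the output weights.
    d_hid = [hj * (1.0 - hj) * sum(d_out[k] * output[k][j]
                                   for k in range(len(output)))
             for j, hj in enumerate(h)]
    # Adjust the synaptic weights (and biases) of both layers.
    for k, row in enumerate(output):
        for j in range(len(h)):
            row[j] += lr * d_out[k] * h[j]
        row[-1] += lr * d_out[k]
    for j, row in enumerate(hidden):
        for i in range(len(x)):
            row[i] += lr * d_hid[j] * x[i]
        row[-1] += lr * d_hid[j]
    # Sum-of-squares error measured before this update.
    return 0.5 * sum(e * e for e in err)


rng = random.Random(0)
hidden = init_layer(4, 8, rng)   # 4 inputs -> 8 hidden neurons
output = init_layer(8, 2, rng)   # 8 hidden neurons -> 2 classes
x = [0.2, 0.7, 0.5, 0.1]         # hypothetical scaled input values
target = [1.0, 0.0]              # hypothetical class coding
errors = [train_step(hidden, output, x, target) for _ in range(200)]
```

Repeating the step drives the error on this single pair down, which is the error-correction behavior described in the text; a real training run would cycle over all training samples in each epoch.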
Considering the commercial importance of ethanol and the effectiveness of the ANN technique, this study proposes the application of an MLP-type ANN to classify ethanol samples commercialized in two regions of the state of Paraná, Brazil.

Samples of hydrated ethanol
The two hundred and four samples of commercial hydrated ethanol (112 from the eastern region and 92 from the northern region of the state of Paraná) used for the application of MLP underwent alcohol content, density, pH and electrical conductivity tests.

Electrical conductivity
Electrical conductivity was determined by Digimed DM-31 equipment, according to ASTM D1125-11 (American Society for Testing and Materials [ASTM], 2014).

ANN
The classification module of Statistica 9.0 was employed for MLP-type neural networks and for automatic segmentation. Networks were trained with 70% of the samples in the training group and 15% each in the test and validation groups. Samples were assigned to each group at random and the learning rate was kept at 0.05. Region was chosen as the categorical variable, whilst alcohol content, density, electrical conductivity and pH were selected as continuous variables.
The number of hidden layers was 1; the number of neurons in it ranged from 1 to 10; the maximum number of epochs was 200. The selected error functions were SOS (Sum of Squares) and Cross Entropy; the activation functions of the hidden layer were Identity, Logistic, Hyperbolic Tangent and Exponential; the activation functions of the output layer were Identity, Logistic, Hyperbolic Tangent, Exponential and Softmax. The training algorithm employed was BFGS; 20 networks were trained per run and the 5 best-performing ones were selected by the software.
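For reference, the standard textbook forms of most of the functions named above can be written as a short sketch. The function names are illustrative, the sum-of-squares error is written here without any scaling factor (an assumption), and the Exponential activation is left out because its exact form in the software is not given in the text.

```python
import math


def identity(z):
    return z


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def hyperbolic_tangent(z):
    return math.tanh(z)


def softmax(zs):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def sum_of_squares(targets, outputs):
    # SOS error: squared differences between expected and actual responses.
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))


def cross_entropy(targets, outputs):
    # Cross entropy for one-of-n coded targets; terms with t == 0 vanish.
    return -sum(t * math.log(o) for t, o in zip(targets, outputs) if t > 0)


probabilities = softmax([1.0, 1.0])
```

Softmax outputs sum to one, which is why it is usually paired with the cross-entropy error in classification networks.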

Results and discussion
Figure 1 shows the values of density (kg m⁻³), alcohol content (g 100 g⁻¹), pH and electrical conductivity (μS m⁻¹) of the 204 samples analyzed, separated by commercialization region. Horizontal lines indicate the limits of the compliance parameters. All samples were within the limits set for marketing.
Table 1 shows the maximum, minimum, mean and standard deviation of each parameter used for training, test and validation of the network with the best performance (MLP 4-8-2). Electrical conductivity and alcohol content were the compliance parameters with the highest and lowest standard deviation, respectively. Figure 2 plots each compliance parameter against the other input variables. When density was related to alcohol content, the behavior was close to linear, albeit with R² = 0.73. The remaining pairs of parameters showed typical nonlinear behavior, with a few scattered samples. The nonlinear relationships among the parameters showed that ANNs may be applied to the case in question, since they are inherently nonlinear (Ritter, 1995).
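As a reminder of what the coefficient of determination measures, the sketch below fits a least-squares straight line and computes R² as one minus the ratio of the residual to the total sum of squares. The function name and the numbers are made up for illustration; they are not measurements from this study.

```python
def r_squared(x, y):
    # Least-squares straight line y = intercept + slope * x.
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    ss_res = sum((yi - (intercept + slope * xi)) ** 2
                 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


x = [1.0, 2.0, 3.0, 4.0, 5.0]
exact = [10.0 - 2.0 * xi for xi in x]   # points lying exactly on a line
scattered = [8.0, 6.5, 3.0, 3.5, 0.0]   # same trend with scatter

r2_exact = r_squared(x, exact)
r2_scattered = r_squared(x, scattered)
```

Points lying exactly on a line give R² = 1, and scatter around the line lowers the value, so an R² below 1 indicates that a straight line explains only part of the variation.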
The 204 samples tested were classified by an MLP-type neural network. The chosen neural networks comprised an input layer with one neuron for each variable, a hidden layer responsible for the separation of patterns and an output layer that delivers the decision based on the hidden neurons. In training, the automatic module of Statistica 9.0 used identity, logistic, hyperbolic tangent and exponential activation functions for the neurons of the hidden and output layers. Twenty networks were tested and the 5 best-performing ones were highlighted at every start-up of the program.
Since they act characteristically as detectors, the hidden neurons play an important role in the operation of a Perceptron network that learns by backpropagation. As the learning process advances, the hidden neurons gradually start discovering the peculiarities that characterize the training data (Haykin, 2001; Bishop, 2007).
The number of epochs and of neurons in the hidden layer cannot be too high because, when a neural network learns too many input-output examples, it may end up memorizing the training data. This phenomenon is known as overfitting or overtraining and makes the network lose its generalization capacity (Haykin, 2001; Bishop, 2007). Furthermore, according to Haykin (2001), the lower the learning rate parameter, the smaller the changes in the synaptic weights of the network from one iteration to the next, and the smoother the trajectory in weight space.
On the other hand, if the learning rate is very high, the resulting large modifications in the synaptic weights may render the network unstable. Therefore, in this case, a learning rate of 0.05, a maximum of 200 epochs and between 1 and 10 neurons in the hidden layer were chosen for training the MLP.
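The effect of the learning rate can be seen on a deliberately simple case. The sketch below applies gradient descent to the one-weight error function E(w) = w², which is an assumption made purely for illustration and not the error surface of the networks in this study; the function name and the value 1.1 for the "very high" rate are likewise hypothetical.

```python
def descend(learning_rate, w0=1.0, steps=20):
    # Gradient descent on E(w) = w**2, whose gradient is 2 * w.
    w = w0
    path = [w]
    for _ in range(steps):
        w -= learning_rate * 2.0 * w
        path.append(w)
    return path


small = descend(0.05)   # each step multiplies w by 0.9: smooth decay
large = descend(1.1)    # each step multiplies w by -1.2: growing oscillation
```

With the low rate the weight approaches the minimum at w = 0 without changing sign, whereas with the high rate it overshoots the minimum at every step and moves farther away, which is the instability mentioned in the text.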
The samples and analyzed parameters presented to the network were subdivided randomly into three groups: the first, the training set, comprised 70% of the samples; the second, called test, comprised 15%; and the third, called validation, also comprised 15%. The second and third groups, which were not presented during training, served to validate and verify the generalization capacity of the trained network (Haykin, 2001; Borsato et al., 2009; Link et al., 2014).
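A random 70/15/15 subdivision of this kind can be sketched as follows. The function name, the seed and the rounding rule are assumptions, since the text does not state the exact size of each group or how the software draws the samples.

```python
import random


def split_indices(n_samples, train_frac=0.70, test_frac=0.15, seed=0):
    # Shuffle the sample indices, then cut the shuffled list into three parts.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(train_frac * n_samples)
    n_test = round(test_frac * n_samples)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    # Whatever remains after training and test forms the validation group.
    validation = indices[n_train + n_test:]
    return train, test, validation


train, test, validation = split_indices(204)
```

Under this rounding assumption, the 204 samples are divided into groups of 143, 31 and 30, and no sample appears in more than one group.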
The error was determined at each epoch of training and this information was used to adjust the weights so as to reduce the error until stabilization. Figure 3 shows the error as a function of the number of training epochs for the network with the highest performance, revealing that the network needed only 60 epochs to achieve stability. Since the two main sources of variability of the network were initialization and sampling, three initializations of 20 networks each were used, totaling 60 networks. The 15 networks with the best performance, chosen by the computer program, presented accuracy ranging between 88 and 95% for training, between 83 and 96% for test, and between 93 and 100% for validation, which was higher than that obtained by Anderson and Smith (2002), who applied ANNs to the differentiation of coffee samples by geographical origin.
In each initialization, the network with the best accuracy rates relative to the others was highlighted. Table 2 presents the three selected networks with their respective percentages of correct answers. Two of the selected networks had 8 neurons in the hidden layer. Since each initialization involved a random choice, the different training algorithms, error functions and activation functions of the hidden and output layers altered the hit rates. Table 2 lists the accuracy rates of the chosen networks for training (Tr), test (T) and validation (Val), as well as the characteristics of each network: the algorithm (Al) used, the error function (EF) and the activation functions of the hidden layer (HA) and the output layer (OA). According to Table 2, the best-performing networks had 8 neurons in the hidden layer, the best being 2MLP 4-8-2, with hit rates of 95% for training, 96% for test and 96% for validation. The 1MLP 4-8-2 presented 94% for training, 90% for test and 100% for validation. The third selected network had 7 neurons in the hidden layer, with hit rates of 92% for training, 90% for test and 96% for validation. Tukey's test applied to the average training, test and validation percentages of the 15 selected networks revealed a significant difference (p max. = 0.009) when compared to the rates of the selected network (1MLP 4-8-2). The percentage of correct responses of the MLP network in classifying samples according to commercialization region was significantly higher (Anderson & Smith, 2002).
In the case of 1MLP 4-8-2, only 11 out of the 204 samples used, including training, test and validation, were not classified correctly. Since there were only 9 and 15 errors for 2MLP 4-8-2 and 3MLP 4-7-2, respectively, network 2MLP 4-8-2 had the highest number of correct responses. In fact, it was the network with the best performance in the classification of ethanol samples.
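The error counts quoted above translate directly into overall accuracy over all 204 samples. The short sketch below does this arithmetic; the function and variable names are illustrative.

```python
def overall_accuracy(n_errors, n_samples=204):
    # Percentage of samples classified correctly across all three groups.
    return 100.0 * (n_samples - n_errors) / n_samples


# Misclassified samples reported for each selected network.
misclassified = {"1MLP 4-8-2": 11, "2MLP 4-8-2": 9, "3MLP 4-7-2": 15}
accuracy = {name: overall_accuracy(n) for name, n in misclassified.items()}
best = max(accuracy, key=accuracy.get)

for name, value in accuracy.items():
    print(f"{name}: {value:.1f}%")
```

This gives about 94.6%, 95.6% and 92.6% for the three networks, in agreement with the ranking stated in the text.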
Figure 4 shows the representation of the network 2MLP 4-8-2, with 4 input parameters, 8 neurons in the hidden layer and 2 regions analyzed, whereas 3MLP 4-7-2 had 4 input parameters, 7 neurons in the hidden layer and 2 regions analyzed. In the hidden layer, neurons with more intense tones were more activated by the network in search of the target response.
From the selected networks, an order of importance could be established for the input parameters. In the case of network 1MLP 4-8-2, the alcohol content compliance parameter was the most important, followed by pH, density and electrical conductivity, in this order. For network 2MLP 4-8-2, the most important variable was alcohol content, followed by density, pH and electrical conductivity. This order of importance was given by the network's sensitivity analysis, in which the program calculates the sum of squared residuals, or the classification error rate, of each network when one of the compliance parameters is removed from the model. Ratios are then established between the error of the model without the parameter and that of the full model, which includes it. The order of relevance of the parameters may be established from these data: those with the higher ratios are the most important for the network classification. Table 3 shows the sensitivity analysis of the selected networks (Statistica, 2009).
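The idea behind this kind of sensitivity analysis can be sketched as below. The sketch assumes that removing a variable is emulated by replacing it with its mean value and that importance is expressed as the ratio between the resulting error and the full-model error; both are assumptions about a procedure that the text describes only in general terms. The toy model, data and function names are hypothetical.

```python
def sum_of_squares(predict, rows, targets):
    return sum((t - predict(r)) ** 2 for r, t in zip(rows, targets))


def sensitivity_ratios(predict, rows, targets):
    # Error of the full model, with every input variable available.
    full_error = sum_of_squares(predict, rows, targets)
    ratios = []
    for j in range(len(rows[0])):
        # Emulate removal of variable j by replacing it with its mean.
        mean_j = sum(r[j] for r in rows) / len(rows)
        masked = [r[:j] + [mean_j] + r[j + 1:] for r in rows]
        ratios.append(sum_of_squares(predict, masked, targets) / full_error)
    return ratios


def toy_model(row):
    # Hypothetical model that relies strongly on variable 0, weakly on 1.
    return 2.0 * row[0] + 0.1 * row[1]


rows = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [3.0, 1.0]]
residuals = [0.1, -0.1, 0.1, -0.1]
targets = [toy_model(r) + e for r, e in zip(rows, residuals)]

ratios = sensitivity_ratios(toy_model, rows, targets)
ranking = sorted(range(len(ratios)), key=lambda j: ratios[j], reverse=True)
```

The variable whose removal inflates the error the most receives the highest ratio and therefore ranks first, which mirrors how the order of importance is read from the analysis.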

Conclusion
The MLP-type ANN proved to be a suitable tool for the classification of ethanol samples according to their commercialization region. The sensitivity analysis of the input variables revealed that alcohol content, followed by pH and density, was instrumental for the identification of samples, indicating the importance of the compliance parameters for segmentation. The settings used for training the network, namely the learning rate, number of epochs, training algorithm and number of hidden neurons, were also effective for differentiating the samples.

Figure 1. Data of (a) density, (b) pH, (c) alcohol content and (d) electrical conductivity for ethanol samples.

Figure 2. Matrix of graphs with parameters and their behavior.

Figure 3. Representation of the error obtained in relation to the number of training epochs of the network MLP 4-8-2.

Table 1. Statistics of parameters used for training, test and validation of the network employed.

Table 2. Accuracy percentages of selected MLPs and their characteristics.
Finally, in network 3MLP 4-7-2, the order of importance was density, alcohol content, electrical conductivity and pH, with the lowest percentage of correct responses (Table 2).

Table 3. Sensitivity analysis of selected MLP.