A bootstrapped neural network model applied to prediction of the biodegradation rate of reactive Black 5 dye

This study proposes a biodegradation model for a dye used in the textile industry, based on a neural network supported by bootstrap resampling. The bootstrapped neural network is designed to generate estimates close to the results obtained in an actual experiment in which a chemical process is applied. Pseudomonas oleovorans was used in the biodegradation of Reactive Black 5. The estimates of the proposed approach are briefly compared with the experimental data, and the correlation coefficient between measured and predicted biodegradation rates is above 0.99. Dye concentration and solution pH did not interfere with the biodegradation index. Within the experimental conditions analyzed, more than 90% dye biodegradation was achieved at microorganism concentrations between 1.000 and 1.841 mL 10 mL⁻¹ and glucose concentrations between 1.000 and 2.000 g 100 mL⁻¹.


Introduction
The treatment of effluents in textile industries has become a very important issue. There are over 100,000 commercially available dyes, and more than 7 × 10⁷ tons of dyestuff are produced annually worldwide (AKHTAR et al., 2005; ROBINSON et al., 2001). These dyes are widely used in a number of industries, such as textiles, food, cosmetics and printing, although the textile industry is the greatest consumer of dyes (PANDEY et al., 2007).
In Brazil, there are 5,000 textile industries, distributed into large (11%), small (21%) and very small (68%) companies. The Brazilian textile sector ranks 5th in direct jobs and 6th in turnover, and dye production in Brazil reaches 26,500 tons per year (SILVEIRA et al., 2009b; ULSON DE SOUZA et al., 2007). Moreover, it is estimated that at least 20% of the textile dyes used in the dyeing process are discharged into effluents because of losses during the fixation of color to the fibers.
The removal of these compounds from industrial waste is one of the major environmental problems faced by the textile sector. In fact, leaving these effluents untreated may cause serious risks to the environment and, consequently, to the whole productive chain. Therefore, the development of effluent treatment technologies is currently of great relevance because of increasing ecological awareness and strict environmental law. Further, when companies implement new and efficient systems of effluent treatment, they show a proactive and committed stance towards environmental problems by assessing and eliminating their negative externalities. In other words, they seek to avoid the impacts of their production process and, by becoming sustainable, to minimize water consumption and the discharge of polluting effluent.
However, the treatment of effluents by textile industries is further complicated by the common use of several other chemicals of different composition, such as moisturizers, colorants, electrolytes, dispersers, pH controllers, stabilizers and others used during the coloring process.
The main techniques in the literature on effluent treatment are adsorption, precipitation, biological and chemical degradation, electrochemistry and photochemistry. Currently, the most popular methods of color removal from wastewater involve physical and chemical processes that, besides being expensive, usually form a concentrated sludge that creates a secondary and highly significant disposal problem (CHANG; KUO, 2000).
Although integrated chemical methods seem to be feasible for the treatment of such wastewater, biological methods should preferably be used when costs and technical advantages are taken into account (YU et al., 2010). Environmental biotechnology is constantly seeking new solutions for the biological treatment of dye-contaminated wastewater. Although numerous microorganisms are capable of decolorizing dyes, only a few are able to mineralize these compounds into CO₂ and H₂O (JUNGHANNS et al., 2008). Azo dyes are not easily metabolized by bacteria under aerobic conditions (ROBINSON et al., 2001), but several bacterial strains, including Pseudomonas oleovorans, may enzymatically reduce the azo bonds in the dye molecule under anaerobic conditions to produce colorless by-products (SILVEIRA et al., 2009b).
Because of the toxicity and low biodegradability of azo dyes, more effective treatment methods, such as advanced oxidation processes (AOPs), have been suggested for the destruction of these compounds in wastewater (KIM et al., 2004; MOHAJERANI et al., 2011). However, the main costs of the Fenton reaction process with H₂O₂ are a disadvantage. Further, the addition of Fe²⁺ produces a brown turbidity that causes the combination of hydroxyl radicals. Fe²⁺ also reacts with hydroxyl radicals as a scavenger and, since this contaminant must be removed from the effluent, the process becomes more expensive still (KIM et al., 2004).
Because of these issues, several studies have been carried out to find an efficient approach to simulate different processes for the degradation of azo dyes (ALEBOYEH et al., 2008; BALAN et al., 1999; CEYLAN, 2008; DANESHVAR et al., 2006; GUIMARÃES; SILVA, 2007; GUIMARÃES et al., 2008; MOHAJERANI et al., 2011; SOLEYMANIA et al., 2011; ZAREI et al., 2010). Consequently, the use of artificial neural network models is widespread and shows good results in the prediction of degradation rates under different conditions.
However, neural networks are limited by the size of the experimental data set. Neural network experiments are conducted by dividing the available data into training, selection and test sets (PANCHAL et al., 2011). Unfortunately, the determination of these values by experimental procedures has many limitations: the initial data set collected is often very small, the magnitude of variable effects is sometimes ambiguous, and data collection costs are high. Dividing the data into multiple subsets is therefore very unproductive, since each subset ends up with fewer data than would be desirable even for a single set.
An alternative method to overcome these problems is the use of a resampling approach such as the bootstrap method, which is based on an imitation of a probabilistic process and on the information supplied by a given small set of random samples. Research has shown the feasibility of the bootstrapping technique for estimating out-of-sample quantities by redrawing small subsets (EFRON, 1979; EFRON; GONG, 1983). However, the use of this approach to estimate dye biodegradation rates is practically nonexistent.
Coupling the bootstrap and artificial neural networks has produced improved prediction models in different areas (FRANKE; NEUMANN, 2000; LAJBCYGIER; CONNOR, 1997). In this approach, the bootstrap method is used to create several designed data sets to train different neural networks. The approach identifies the distribution of a statistical estimate, which allows confidence intervals to be constructed for the values being predicted (in this case, the dye biodegradation rate). Decisions based upon a number of experiments with different subsets may produce much more reliable results and may at least mitigate the sampling bias. The latter is an important problem in neural network approaches and an inevitable consequence of the limited number of samples that can be recorded in a real experiment (FRANKE; NEUMANN, 2000; LAJBCYGIER; CONNOR, 1997).
In the current study, the bootstrap method was applied within a neural network context to create an improved dye biodegradation prediction model. Pseudomonas oleovorans was used in the biodegradation of Reactive Black 5, and the predictions of the designed model were compared with the experimental data to validate the proposed numerical approach.

Material
The microorganism Pseudomonas oleovorans (CMAI 703) was obtained from the Brazilian Collection of Industrial and Environmental Microorganisms of the State University of Campinas, Campinas, São Paulo State, Brazil. The Reactive Black 5 was obtained from the Department of Chemical Engineering of the Federal University of Sergipe, Brazil. Glucose and mono- and di-basic potassium phosphate were provided by Merck (Darmstadt, Germany).

Design and Experimental System
Table 1 shows the 2⁴⁻¹ experimental design drawn up to study the effect of the initial dye (C_dye, mg L⁻¹), glucose (C_glucose, g 100 mL⁻¹) and microorganism (C_mo, mL L⁻¹) concentrations and of pH on dye biodegradation (%Biod). Experimental studies were carried out in 250-mL conical flasks containing mineral medium and inoculated with Pseudomonas oleovorans, in an orbital shaker at 180 rpm, at room temperature (27°C) and pressure (1 atm), for seven days. The samples were centrifuged and dye biodegradation was estimated every 4 hours with a spectrophotometer at 600 nm (SILVEIRA et al., 2009b).
For the chemical oxygen demand (COD) determination, the oxidizable material in the digested sample was oxidized by the dichromate ion, which changed chromium from the hexavalent (VI) to the trivalent (III) state. Both chromium species are colored and absorb light in the visible region of the spectrum. The dichromate ion (Cr₂O₇²⁻) absorbs strongly in the 400 nm region, where the chromic ion (Cr³⁺) absorbs much less; in the 600 nm region, the chromic ion absorbs strongly and the dichromate ion has near-zero absorption (Standard Method 5220 D). The iodometric method (Winkler method) was used to determine the dissolved oxygen (DO) concentration at day zero and day five for the measurement of BOD₅. The titrimetric method (Walkley-Black method) was employed to determine total organic carbon (TOC) content. All methods are described in APHA (1995).
The biodegradation index was given by [equation not recovered from the source].

The proposed bootstrapped neural network prediction model
The bootstrap method is based on the imitation of the probabilistic process using the information from a given small set of random samples. Given an original (small) dataset, the method estimates the standard error of some parameter of interest using the samples as an approximation of the population. Specifically, it takes samples with replacement from the original dataset to approximate samples from the population. A simple algorithm to illustrate the bootstrap method may be defined by the three steps below (JOHNSON, 2001):
(1) Sample values with replacement from the original data and compute the parameter of interest p.
(2) Repeat step (1) a moderate to large number of times, B, to obtain the bootstrap estimates p₁, p₂, ..., p_B.
(3) Use the standard deviation of the B estimates in step (2) to estimate the probability distributions describing the prediction process, the standard error, and the confidence interval.
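The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the number of replicates and the sample values are hypothetical, and the percentile interval is one common choice among several ways of building a bootstrap confidence interval.

```python
import random
import statistics

def bootstrap_estimates(data, statistic, n_boot=1000, seed=0):
    """Return n_boot bootstrap replicates of `statistic` on resamples of `data`."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):          # Step (2): repeat B = n_boot times
        # Step (1): draw n values with replacement, compute the parameter
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    return estimates

def percentile_interval(estimates, level=0.95):
    """Percentile confidence interval taken from the sorted replicates."""
    ordered = sorted(estimates)
    lo_idx = int((1 - level) / 2 * len(ordered))
    hi_idx = int((1 + level) / 2 * len(ordered)) - 1
    return ordered[lo_idx], ordered[hi_idx]

# Hypothetical biodegradation percentages, for illustration only
sample = [91.0, 94.5, 88.2, 96.0, 93.1, 90.4, 95.2, 89.9]
reps = bootstrap_estimates(sample, statistics.mean, n_boot=2000)
std_error = statistics.stdev(reps)   # Step (3): spread of the B estimates
ci_low, ci_high = percentile_interval(reps)
print(std_error, ci_low, ci_high)
```

Here the parameter of interest p is the sample mean; any other statistic can be passed in its place.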
In this case, the original data set obtained from the experimental design was used to generate thirty bootstrapped data sets for training feedforward neural networks. An artificial neural network may be defined as a set of mathematical methods and computational algorithms designed to simulate information processing and knowledge acquisition in the human brain.
Since it is inspired by human brain biology, the neural network has basic elements such as artificial neurons, synapses, neural weights, transfer functions and others (HAYKIN, 1999). The artificial neurons are grouped into layers and connected in parallel. The first layer of the neural network is called the input layer, and it has a number of neurons equal to the number of independent variables. Similarly, the last is the output layer, and its neurons return the corresponding values of the dependent variables. Depending on the problem, one or more hidden layers may be located between the input and output layers. The number of neurons in these hidden layers is also defined according to the problem, and the computations performed by an artificial neuron are determined by the transfer function it applies. Therefore, the number of layers, the number of nodes in each layer and the transfer function define the configuration of a neural network. In the neural network training process, the input variables (concentrations of dye, microorganism and glucose, and pH) are normalized to the 0-1 range and are connected to the hidden neurons by weights and biases. The neurons of the network receive this input signal and transform it into an output signal that is transmitted to the next neuron in the processing direction, according to the input-output behavior defined by the transfer function. This process is repeated iteratively in order to find a set of connection weights and biases that minimizes the mean square error between the observed and the predicted output in the output layer.
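The normalization and forward propagation described above can be illustrated with a minimal sketch. It assumes four inputs, two hidden layers of five neurons with a hyperbolic tangent transfer function, as reported in the text, and a single linear output neuron, which is an assumption. The weights are random, and the raw inputs and their ranges are hypothetical, so the printed output has no physical meaning.

```python
import math
import random

def min_max_normalize(value, low, high):
    """Scale a raw value to the 0-1 range given its minimum and maximum."""
    return (value - low) / (high - low)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (an arbitrary initialization)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(x, layers):
    """Propagate x through tanh hidden layers and a linear output neuron."""
    for weights, biases in layers[:-1]:
        x = dense(x, weights, biases, math.tanh)
    weights, biases = layers[-1]
    return dense(x, weights, biases, lambda z: z)[0]

rng = random.Random(0)
# 4 inputs -> 5 hidden -> 5 hidden -> 1 output
layers = [init_layer(5, 4, rng), init_layer(5, 5, rng), init_layer(1, 5, rng)]

# Hypothetical raw inputs and ranges: dye, pH, microorganism, glucose
raw = [100.0, 6.0, 1.5, 1.0]
ranges = [(50.0, 150.0), (4.0, 8.0), (1.0, 2.0), (0.5, 2.0)]
x = [min_max_normalize(v, lo, hi) for v, (lo, hi) in zip(raw, ranges)]
print(x, forward(x, layers))
```

Training then consists of adjusting the entries of `layers` so that the output matches the measured biodegradation index as closely as possible.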
In the current study, the bootstrap method was used to generate thirty data sets for neural network training. In this phase, the neural networks had two hidden layers with 5 neurons each, and the bootstrapped training sets had 100 samples drawn with replacement. This configuration was reached experimentally by testing values in the ranges 1-5, 2-15 and 20-100 for the number of layers, the number of neurons in the hidden layers and the bootstrap sample size, respectively. Figure 1 shows the neural network configuration.
The number of neural networks was chosen to ensure a statistical estimation precision of about 8% at a 95% confidence level. Further, the tangent sigmoid transfer function and gradient descent back-propagation training with momentum, with a learning rate of 0.2, were employed (HAYKIN, 1999).
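The momentum update used in this kind of training can be illustrated on the smallest possible case, a single tanh neuron fitted by batch gradient descent on the mean square error. The learning rate of 0.2 comes from the text; the momentum coefficient of 0.9, the number of epochs and the synthetic data are assumptions made only for this sketch.

```python
import math

def train_neuron(samples, learning_rate=0.2, momentum=0.9, epochs=500):
    """Fit y ~ tanh(w*x + b) by batch gradient descent with momentum."""
    w, b = 0.0, 0.0
    vw, vb = 0.0, 0.0            # velocity terms carried between steps
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in samples:
            out = math.tanh(w * x + b)
            # d(MSE)/d(pre-activation) = 2*(out - y)*(1 - out^2) / n
            delta = 2.0 * (out - y) * (1.0 - out * out) / n
            grad_w += delta * x
            grad_b += delta
        # Momentum update: blend the previous step with the new gradient step
        vw = momentum * vw - learning_rate * grad_w
        vb = momentum * vb - learning_rate * grad_b
        w += vw
        b += vb
    return w, b

def mse(samples, w, b):
    """Mean square error of the one-neuron model on the samples."""
    return sum((math.tanh(w * x + b) - y) ** 2 for x, y in samples) / len(samples)

# Synthetic targets generated from known parameters (w = 1.5, b = -0.5)
data = [(x / 10.0, math.tanh(1.5 * x / 10.0 - 0.5)) for x in range(11)]
w_fit, b_fit = train_neuron(data)
print(w_fit, b_fit, mse(data, w_fit, b_fit))
```

In a multilayer network the same update is applied to every weight and bias, with the gradients obtained by back-propagation.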
A sensitivity analysis was also performed to quantify the effect of each input variable on the network output. The sensitivity analysis approach proposed by Fish and Blodgett (2003) and Delen et al. (2006) was adopted in the current investigation. The basic idea of this method is to perturb the value of one input variable within a reasonable interval while keeping all the other variables unchanged. The corresponding variation of the network output is recorded, which gives the effect of changing a single explanatory variable on the network output.
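A one-at-a-time perturbation of this kind might look as follows. The `toy_model` function is a hypothetical stand-in with made-up coefficients, not the fitted network of this study, and the output range over the sweep is only one possible sensitivity measure.

```python
def sensitivity(model, baseline, index, low, high, steps=11):
    """Vary one input over [low, high] while holding the others at baseline.

    Returns the range (max - min) of the model output over the sweep.
    """
    outputs = []
    for k in range(steps):
        x = list(baseline)
        x[index] = low + (high - low) * k / (steps - 1)
        outputs.append(model(x))
    return max(outputs) - min(outputs)

# Stand-in model for illustration: NOT the fitted network from the study
def toy_model(x):
    dye, ph, microorganism, glucose = x
    return 60.0 + 25.0 * microorganism + 10.0 * glucose + 0.5 * dye + 0.2 * ph

names = ["dye", "pH", "microorganism", "glucose"]
baseline = [0.5, 0.5, 0.5, 0.5]          # normalized inputs
effects = {name: sensitivity(toy_model, baseline, i, 0.0, 1.0)
           for i, name in enumerate(names)}
print(effects)
```

Ranking the entries of `effects` gives the relative influence of each input on the output of whatever model is passed in.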
The model selection was also guided by the simple strategy of directly focusing on the increase of the coefficient of correlation between the outcomes and their predicted values, taking all the difficulties of this task into consideration (CURRY; MORGAN, 2006).
After the training, the neural networks were used to generate new data sets to improve the original set. This second step determined all the output values for dye biodegradation. The mean value of the output from the 30 bootstrapped neural networks was used as the predicted value of the biodegradation rate. The resulting prediction, based on a number of experiments with different subsets, may produce much more reliable results and may at least mitigate the sampling bias, which is an important problem in neural network approaches and an inevitable consequence of the limited number of samples that can be recorded in a real experiment. Figure 2 illustrates the whole prediction process.
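The overall procedure (resample, train, predict, average) can be sketched as below. To keep the example short, a closed-form least-squares line stands in for the neural network training step; that substitution, the function names and the synthetic 19-point data set are all assumptions. The 30 models and the 100 samples per bootstrapped set follow the values given in the text.

```python
import random
import statistics

def fit_line(points):
    """Closed-form least squares for y = a*x + b (stand-in for training)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in points) / sxx
    return a, my - a * mx

def bagged_predict(points, x_new, n_models=30, sample_size=100, seed=0):
    """Train one model per bootstrap resample and average the predictions."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_models):
        resample = [rng.choice(points) for _ in range(sample_size)]
        a, b = fit_line(resample)
        outputs.append(a * x_new + b)
    # Mean is the prediction; min and max bound the ensemble spread
    return statistics.mean(outputs), min(outputs), max(outputs)

# Synthetic data set of 19 points, for illustration only
rng_data = random.Random(1)
points = [(i / 18.0, 80.0 + 15.0 * i / 18.0 + rng_data.gauss(0.0, 1.0))
          for i in range(19)]
mean_pred, low, high = bagged_predict(points, 0.5)
print(mean_pred, low, high)
```

Replacing `fit_line` with a network training routine leaves the surrounding loop unchanged.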

Experimental results
Table 1 shows the experimental plan and the results for each assay. Most trials showed a biodegradation rate of Reactive Black 5 above 90%. Apparently, the microorganism content had a greater influence than the other factors. However, glucose concentration cannot be excluded from the important factors, because a low biodegradation index was observed at its lowest concentration. The best result occurred at 0.15 g L⁻¹ of dye, pH 6, 1.5 mL L⁻¹ of microorganism and 1 g 100 mL⁻¹ of glucose, with a biodegradation index of 96%. Table 2 shows measurements of the chemical parameters. A high removal percentage may be observed for all parameters. According to Silveira et al. (2009a, 2009b), a COD/BOD ratio between 1.5 and 2.5 indicates that an effluent is biodegradable. Consequently, as shown in Table 2, the dye effluent used in the current experiment was highly biodegradable.
Golob et al. (2005) achieved a significant result for the coagulation and precipitation of the dye, with more than 90% removal of Reactive Black 5, using Al₂(SO₄)₃ as the precipitating agent. However, high concentrations of dissolved solids and chemicals were observed after the process. Consequently, the process may be discarded, because it increases environmental contamination by lowering the quality of ecological parameters. A lab-scale activated sludge reactor combined with a membrane separation process has been used for dye removal from denim textile wastewater, with 75 and 90% efficiency (SAHINKAYA et al., 2008). Balan et al. (1999), in a biological treatment with the bacterium Pseudomonas pictorum at 30°C, observed 98% degradation of phenol at high glucose concentration. The cell content was not significant, which demonstrated the importance of glucose for Pseudomonas in biodegradation processes.
The above demonstrates that P. oleovorans is very efficient in dye removal, with results similar to those of photo-Fenton (ALEBOYEH et al., 2008; GUIMARÃES; SILVA, 2007; KIM et al., 2004; ZAREI et al., 2010) and P. pictorum treatments (BALAN et al., 1999). In fact, they were higher than those obtained by precipitation (KIM et al., 2004) and by activated sludge and membrane processes (SAHINKAYA et al., 2008). According to Balan et al. (1999) and Silveira et al. (2009a), Pseudomonas bacteria, when adapted to the medium, use the dyes as a substrate to survive once the nutrients are exhausted. This occurs because they are able to release several enzymes (catalases) into the reaction medium and thus catalyze the decomposition of the carbon and azo bonds.
Since photo-Fenton processes involve high chemical costs and increase the turbidity and the metal concentration in the effluent, the biodegradation process with P. oleovorans remains one of the cheapest and most efficient options for effluent treatment by textile industries (KIM et al., 2004).

Bootstrapped neural network prediction results
A large range of assays is required for a simple simulation by the traditional neural network method. A full 2⁴ factorial design has 27 assays (for a full experiment). It represents the maximum reduction in the number of assays from a common 3⁴ experiment, with 243 total assays (with triplicates), without statistically significant losses. A 2⁴⁻¹ factorial design is a fraction of the full factorial design, resulting in 19 assays, as shown in Table 1. Therefore, the original small data set obtained from the experimental design (Table 1) was used to generate thirty bootstrapped data sets for training feed-forward neural networks.
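The text does not state how the counts of 27 and 19 assays arise. One reading that reproduces them, offered here only as an assumption, is a composite layout in which the two-level factorial (or fractional factorial) points are complemented by two axial points per factor and three center-point replicates. The helper names below are hypothetical.

```python
def factorial_runs(levels, factors, replicates=1):
    """Number of runs of a full factorial design with replicates."""
    return replicates * levels ** factors

def composite_runs(factors, fraction=0, center_points=3):
    """Assumed layout: 2^(k-p) factorial + 2 axial points per factor + centers."""
    return 2 ** (factors - fraction) + 2 * factors + center_points

print(factorial_runs(3, 4, replicates=3))   # 243
print(composite_runs(4))                    # 27
print(composite_runs(4, fraction=1))        # 19
```

Under this reading, moving from the three-level design in triplicate to the fractional design reduces the experimental effort from 243 to 19 assays.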
After the training, the neural networks evaluated the biodegradation rate from the original input values. This second step determined all the output values for dye biodegradation in order to generate new data sets to improve the original one. The mean value of the output from the 30 bootstrapped neural networks was employed as the predicted value of the biodegradation rate. Figure 3 presents the results numerically. The bootstrapped neural network approach also calculated the confidence interval for all predicted values. The 95% confidence intervals for this case are presented in Figure 4. The prediction of the last sample was omitted for readability. The minimum and maximum values for each point had an almost negligible deviation, which demonstrated that the algorithm reproduced the experimental data with high accuracy. Moreover, the designed data covered the combinations associated with all low- and high-order interactions and identified all the effects without the aliasing that may normally occur in fractional designs (MONTGOMERY, 2005). In fact, the main effects could be assessed more clearly and with greater statistical accuracy. Additionally, the results of the proposed approach were compared with the experimental data. The results of a linear regression between the network response and the corresponding target are presented in Figure 5. The proposed model showed a remarkable correlation coefficient, above 0.99, between real and predicted values of the biodegradation rate.
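The correlation coefficient used to compare real and predicted values can be computed as in the sketch below. The observed and predicted values are hypothetical numbers chosen for illustration; they are not the data of Table 1 or Figure 5.

```python
import math

def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

# Hypothetical biodegradation percentages, for illustration only
observed = [62.0, 75.5, 88.0, 91.2, 94.0, 96.0]
predicted = [62.8, 74.9, 87.5, 91.9, 93.6, 95.7]
r = pearson_r(observed, predicted)
print(r)
```

A value of r close to 1 indicates that the predictions follow the measurements almost linearly, which is the criterion quoted in the text.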

Conclusion
The prediction model presented was highly efficient for the experimental data and showed a remarkable correlation coefficient, above 0.99, between real and predicted values of the biodegradation rate. The resulting prediction, based upon a number of experiments with different datasets, may produce much more reliable results, since it identifies the distribution of statistical estimates, calculates the standard error and constructs confidence intervals for the values being predicted. Moreover, this approach may at least mitigate the sampling bias, which is an important problem in neural network approaches and an inevitable consequence of the limited number of samples that can be recorded in a real experiment.

Figure 2. The proposed bootstrapped neural network approach.
Aleboyeh et al. (2008) showed that the initial concentration of the dye and the initial pH have strong effects on decolorization efficiency. Further, none of the variables studied in their work could be neglected in the photochemical decolorization of C.I. Acid Orange 7 solution, with 90% decolorization under the best conditions. Similar efficiency was also observed by Zarei et al. (2010) in their study on the degradation of C.I. Basic Red 46 (BR46) by photoelectro-Fenton (PEF) combined with a photocatalytic process. Guimarães and Silva (2007) also reported over 90% degradation efficiency for Acid Orange 52 and Acid Orange 10 and less than 80% for Acid Brown 75 and Direct Red 28 dyes. Kim et al. (2004) verified less than 60% removal efficiency of Reactive Yellow 84 and Reactive Blue 49 by FeCl₂ precipitation and an efficiency of 95% by the photo-Fenton process.

Figure 5. Linear regression results for the bootstrapped neural network approach.

Table 1. Experimental design for dye biodegradation. C_dye, C_mo and C_glucose are, respectively, the concentrations of dye, microorganism and glucose; %Biod is the biodegradation index.

Table 2. Measured parameters for the best experimental conditions.