Investigation of NIR spectra pre-processing methods combined with multivariate regression for determination of moisture in powdered industrial egg

The high productivity and, at the same time, high perishability of in natura eggs make powdered egg attractive for the patisserie and pasta industries. A water reduction of 65% extends shelf life from 1 to 12 months and also prevents Salmonella contamination. The maximum powdered egg moisture allowed by Brazilian law is 6.0% (w w⁻¹). However, its determination by the reference technique (oven at 105°C for 8 hours) is too lengthy for the processing industry. Therefore, the purpose of this study was to investigate the influence of several spectral pre-processing techniques on the application of near-infrared (NIR) spectroscopy associated with chemometric models for the determination of moisture content in powdered egg, in 0.5 min and without the need for sample preparation or destruction. Several pre-treatment techniques were evaluated to ensure spectral data reliability: standard normal variate, multiplicative scatter correction, smoothing and detrend. Principal component regression (PCR) and partial least squares (PLS) were evaluated with and without pre-treatment. The best results were observed for the NIR/PLS model (49 samples), which provided an adequate correlation (r) of 0.96 for cross-validation. Using 21 samples as the prediction set, NIR/PLS showed a relative error (RE) < 2.0% compared to the primary methods, oven and thermobalance, indicating that it is suitable for industrial quality control.


Introduction
Egg is composed of countless nutrients. It is rich in proteins and has long been the subject of nutritional studies involving amino acids (Tesedo, Barrado, Sanz, Tesedo, & De La Rose, 2006). However, water is the major component of chicken egg, making up about 70% of its composition (Roe, Pinchen, Church, & Finglas, 2013). For this reason, if egg is not stored in appropriate conditions, osmotic changes may occur through its porous shell, causing contamination by microorganisms such as Salmonella (Barancelli, Martin, & Porto, 2012).
According to the Instituto Brasileiro de Geografia e Estatística (IBGE), Brazil produced 760885 dozen of chicken eggs in the 2nd quarter of 2016 and 816103 in the 2nd quarter of 2017, an increase of 7.3% (Instituto Brasileiro de Geografia e Estatística [IBGE], 2017). Because of the high productivity and, at the same time, high perishability of chicken eggs, the industry seeks processing methods that support growth in production and exports; one advantage is that powdered egg has a longer shelf life than fresh egg.
Processed egg is the final product of whole fresh egg that undergoes some physicochemical transformations, without changes in its protein source. In particular, powdered egg has been gaining importance in the bakery and pasta processing industries due to its convenience of use and also for food safety (Baron et al., 2004). The high market price of powdered egg is offset by reduced transport losses and increased shelf life. Powdered egg shelf life ranges from 6 to 12 months, while fresh egg shelf life is about 30 days if properly stored (Santos et al., 2009). According to Article 753 of the Rules of Industrial and Sanitary Inspection of Products of Animal Origin (RIISPOA), the maximum amount of moisture allowed in dehydrated egg is 6.0% by weight, which reduces the probability of microbial growth and improves product quality (Brasil, 1952).
Moisture content of powdered egg can be determined by gravimetric difference after oven-drying at 105°C (Association of Official Analytical Chemists [AOAC], 2016), or by a moisture analyzer (thermobalance). The latter is not considered an analytical reference method, but it is faster and more convenient than oven-drying. Spectroscopic methods in the near infrared (NIR) range have traditionally been used as a fast, practical and nondestructive alternative for the analysis of moisture and other compounds in the pharmaceutical and food processing industries (Pasquini, 2003, Nagarajan, Singh, & Mehrotra, 2006). This technique is growing as a process analytical technology (PAT) because it brings several advantages: spectral data can be acquired for solid and liquid samples (with minimal or no sample pre-treatment); it provides physicochemical information about the sample (such as viscosity, water content and polymorphism); and it predicts and determines multiple parameters from a single spectrum (Buckton, Yonemochi, Hammond, & Moffat, 1998, Blanco & Alacalá, 2006, Reh, Gerber, Prodolliet, & Vuataz, 2006, Nagarajan et al., 2006). NIR spectroscopy is associated with chemometrics, which is based on mathematical and statistical techniques to extract fundamental analytical data from the analyzed samples (Pasquini, 2003).
This study aimed to assess the feasibility of using NIR spectroscopy in industrial quality control of powdered egg moisture, by evaluating several spectral pre-processing techniques and fitting a PLS calibration model built from primary data determined by the moisture analyzer.

Sample
Powdered egg samples were provided by an egg processing industry from the Londrina region (Paraná state, Brazil), totaling 70 samples originating from the same group, with different levels of moisture.

Reference method
The official reference method for the determination of moisture content in food products is oven-drying at 105°C until constant weight (AOAC, 2016). Results for moisture content by the oven-drying method were supplied by the egg processing industry. In addition, moisture was also determined using a thermobalance (Ohaus MB45, Greifensee, Switzerland), with the temperature set to 105°C for 5 min. These analyses were carried out in triplicate in the Laboratory of Food Science and Technology of the State University of Londrina (Departamento de Ciências de Alimentos - DCTA/UEL).

NIR spectroscopy
Sample spectra were collected using an NIR spectrometer (FOSS XDS™ Rapid Content Analyser, Hillerød, Denmark) from 400-2948 nm, at 2 nm intervals. Each spectrum was collected in reflectance mode and converted to absorbance [log (1/R)], as the average of 32 scans for each sample in 0.5 min. Forty-nine (49) samples of whole powdered egg were analyzed to build the calibration models and twenty-one (21) for validation. Samples (2.50 g) were placed inside the 'spinning', a quartz sample holder, and pressed with a support to minimize the effect of sample particle size. These analyses were performed at the Central of Multiuser Research Laboratories (Central Multiusuária de Laboratórios de Pesquisa da UEL - CMLP), at the Laboratory to Support Agricultural Research of the State University of Londrina (Laboratório de Apoio à Pesquisa Agropecuária - LAPA/UEL).

Spectra pre-processing methods
In order to improve the efficiency of the multivariate models, spectral pre-processing techniques based on smoothing (moving average), detrend, multiplicative scatter correction (MSC) and standard normal variate (SNV) can be applied.
Smoothing techniques are mathematical tools that reduce noise and thereby improve the signal-to-noise ratio. Their main application can be represented by least squares (Dardenne, Sinnaeve, & Baeten, 2000, Ramsay, Hooker, Campbell, & Cao, 2007). Detrend is applied to eliminate or minimize baseline shift and curvilinearity: a polynomial is fitted to each spectrum as a function of wavelength and subtracted from it, so that each spectrum is corrected independently (Luypaert, Heuerding, De Jong, & Massart, 2002). MSC can be applied when variations in optical path length or light scattering in the sample cause interference. Each spectrum is corrected against a reference spectrum, obtained as the average spectrum of the samples (Naes, Isaksson, Fearn, & Davies, 2017). SNV is similar to MSC, but each spectrum is corrected independently, without a reference spectrum. The additive correction for each sample spectrum is simply the mean of its values over all variables, and the multiplicative correction is the standard deviation of its values over all variables. SNV improves the accuracy of prediction, but does not simplify the model or reduce systematic noise (Naes et al., 2017).
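The four pre-treatments described above can be sketched in plain Python. This is only a minimal illustration, not the implementation used by the WinISI software: the function names, the default window of 5 points and the first-order detrend polynomial are assumptions made for brevity (a second-order polynomial is often used in practice).

```python
from statistics import mean, stdev

def moving_average(spectrum, window=5):
    """Smooth one spectrum with a centered moving average (odd window assumed)."""
    half = window // 2
    smoothed = []
    for i in range(len(spectrum)):
        lo, hi = max(0, i - half), min(len(spectrum), i + half + 1)
        smoothed.append(mean(spectrum[lo:hi]))  # the window shrinks at the edges
    return smoothed

def fit_line(x, y):
    """Least-squares slope and intercept of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def detrend(spectrum, wavelengths):
    """Subtract a baseline fitted against wavelength (first order, by assumption)."""
    slope, intercept = fit_line(wavelengths, spectrum)
    return [a - (slope * w + intercept) for a, w in zip(spectrum, wavelengths)]

def snv(spectrum):
    """Standard normal variate: center and scale each spectrum on its own."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(a - m) / s for a in spectrum]

def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum of the set."""
    reference = [mean(column) for column in zip(*spectra)]
    corrected = []
    for spectrum in spectra:
        # Regress the spectrum on the reference, then undo offset and slope.
        slope, intercept = fit_line(reference, spectrum)
        corrected.append([(a - intercept) / slope for a in spectrum])
    return corrected
```

Note the structural difference between the last two functions: `snv` needs only the spectrum being corrected, whereas `msc` needs the whole set in order to build its reference.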

Multivariate analysis methods
Sample spectra were recorded using FOSS ISIscan software (version 3.5; Hillerød, Denmark). Quantitative NIR analysis models were developed using partial least squares regression (PLS) and principal component regression (PCR) in FOSS WinISI II software (version 1.5; Hillerød, Denmark), and the data were compiled in Statistica software (Statsoft version 8.0, Tulsa, USA).
PLS modeling is a common technique because it relates spectral information to reference data to obtain factors (or latent variables - LV). It is robust, since the regression coefficients hardly change when new samples are added to the calibration set. This adds to its ability to reduce the influence of experimental noise and to cope with linear and non-linear relationships, enabling the construction of models with more variables than samples. Typically, PLS provides suitable models with fewer components than principal component regression (PCR) (Pasquini, 2003, Escandar, Damiani, Goicoechea, & Olivieri, 2006, Maluf, Pontarolo, Cordeiro, Nagata, & Peralta-Zamora, 2010). PLS modeling extracts information from the spectral matrix (X matrix - independent values) and correlates it with the information taken from the reference set (Y matrix - dependent values). From linear combinations of the spectral data and reference data, the number of latent variables necessary to correlate the spectra and concentrations is calculated. These variables are used to construct the calibration model that gives the smallest differences between the reference values and the predicted values (Morgano, Faria, Ferrão, Bragagnolo, & Ferreira, 2008).
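For orientation, the relation between the X and Y matrices can be written in the usual textbook form for a single response (here, moisture). The symbols T, P, W, q, E, f and b are standard notation and do not appear in the original text; X and y are assumed to be mean-centered.

```latex
\begin{aligned}
X &= T P^{\mathsf{T}} + E \\
y &= T q + f \\
\hat{y} &= X b, \qquad b = W \left(P^{\mathsf{T}} W\right)^{-1} q
\end{aligned}
```

Here T holds the scores of the samples on the latent variables, P the spectral loadings, W the weights used to compute the scores, q the regression coefficients of y on the scores, and E and f the residuals not captured by the model. The number of columns of T is the number of latent variables retained.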

Validation parameters of the models
In order to create a calibration model, the spectra of 49 randomly selected samples were used together with their respective moisture contents, determined by the reference technique. The remaining 21 samples were used for external validation. In internal validation, specifically cross-validation, part of the calibration samples is randomly selected to build the model, and the remaining samples are used for prediction. In external validation, by contrast, the moisture content of the 21 held-out samples was predicted using the calibration model obtained (Ferreira, Antunes, Melgo, & Volpe, 1999).
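The sample partitioning described above can be sketched as follows. The 49/21 split comes from the text; the number of cross-validation folds, the random seed and the function names are assumptions for illustration, since the original does not state how the cross-validation groups were formed.

```python
import random

def split_calibration_validation(sample_ids, n_calibration=49, seed=0):
    """Randomly assign samples to the calibration and external validation sets."""
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_calibration], shuffled[n_calibration:]

def cross_validation_folds(calibration_ids, n_folds=7, seed=0):
    """Yield (training, held_out) pairs for internal cross-validation.

    n_folds=7 is a hypothetical choice that divides 49 samples evenly.
    """
    rng = random.Random(seed)
    shuffled = list(calibration_ids)
    rng.shuffle(shuffled)
    folds = [shuffled[k::n_folds] for k in range(n_folds)]
    for k, held_out in enumerate(folds):
        training = [s for j, fold in enumerate(folds) if j != k for s in fold]
        yield training, held_out

# 70 samples in total, numbered 1..70 for the purpose of this sketch.
calibration, validation = split_calibration_validation(range(1, 71))
```

The external validation samples never enter the cross-validation loop, which is what makes them an independent check on the calibration model.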
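Acta Scientiarum. Technology, v. 40, e30133, 2018
With the purpose of assessing the efficiency of the models and the calibration error, several parameters were calculated (Burns & Ciurczak, 2007):
a) Sum of the squares of the prediction errors (PRESS), Equation 1:
$$\mathrm{PRESS} = \sum_{i=1}^{n} \left(y_{exp,i} - y_{prev,i}\right)^{2} \qquad (1)$$
where: $y_{exp}$ is the reference value and $y_{prev}$ the predicted value.
b) Root mean square error (RMSE), also called the root mean squared error of prediction (RMSEP), Equation 2:
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} \left(y_{exp,i} - y_{prev,i}\right)^{2}}{n}} \qquad (2)$$
where: $n$ is the number of samples analyzed.
c) Relative error percentage (RE) between the reference method and the spectral method (Skoog, West, Holler, & Crouch, 2014), Equation 3:
$$\mathrm{RE}\,(\%) = \frac{\left|y_{exp,i} - y_{prev,i}\right|}{y_{exp,i}} \times 100 \qquad (3)$$
d) Correlation coefficient (r) between the estimated and experimental values of the reference method, Equation 4:
$$r = \sqrt{1 - \frac{\sum_{i=1}^{n} \left(y_{exp,i} - y_{prev,i}\right)^{2}}{\sum_{i=1}^{n} \left(y_{exp,i} - y_{med}\right)^{2}}} \qquad (4)$$
where: $y_{med}$ is the mean of the experimental data.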
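A minimal sketch of these validation parameters in Python is given here, under stated assumptions: PRESS is taken as the sum of squared differences between reference and predicted values, RMSE as the square root of PRESS divided by the number of samples, RE as the absolute difference relative to the reference value (averaged over samples), and r as the square root of one minus the ratio of PRESS to the total sum of squares about the mean of the reference values. The function names are illustrative only.

```python
from math import sqrt

def press(y_exp, y_prev):
    """Sum of squared differences between reference and predicted values."""
    return sum((e - p) ** 2 for e, p in zip(y_exp, y_prev))

def rmse(y_exp, y_prev):
    """Root mean square error over the n samples."""
    return sqrt(press(y_exp, y_prev) / len(y_exp))

def mean_relative_error(y_exp, y_prev):
    """Average of the per-sample relative errors, in percent."""
    errors = [abs(e - p) / e for e, p in zip(y_exp, y_prev)]
    return 100.0 * sum(errors) / len(errors)

def correlation_coefficient(y_exp, y_prev):
    """Square root of 1 - PRESS / total sum of squares (clipped at zero)."""
    y_med = sum(y_exp) / len(y_exp)
    total = sum((e - y_med) ** 2 for e in y_exp)
    return sqrt(max(0.0, 1.0 - press(y_exp, y_prev) / total))
```

All four functions take the reference values first and the predicted values second, so the same pair of lists can be passed to each of them for the calibration, cross-validation or external validation sets.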

Statistical comparison among the different methods for moisture content determination
Initially, the moisture content was measured by the reference method (oven), and then by the thermobalance and the NIR/PLS model. ANOVA was applied to compare the three methods, followed by the Tukey test for comparison of means, both performed in Statistica software (Statsoft version 8.0, Tulsa, USA) at the 5.0% level of significance.
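To make the comparison concrete, the F statistic of a plain one-way ANOVA can be computed as sketched here. This is an illustration of the general technique, not a reproduction of the Statistica analysis; the function name is an assumption and the moisture values in the example are hypothetical, not data from this study. The resulting F value would still have to be compared with the tabulated critical value for the given degrees of freedom, and the Tukey test is not shown.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the individual values around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical moisture contents (% w/w), one list per method.
oven = [4.10, 4.32, 4.55]
thermobalance = [4.15, 4.30, 4.60]
nir_pls = [4.12, 4.35, 4.52]
f_value, df_between, df_within = one_way_anova_f([oven, thermobalance, nir_pls])
```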

Results and discussion
First, the results obtained by the oven and thermobalance methodologies were compared. The samples were analyzed in triplicate (n = 3) by both methods, and the average moisture contents and their standard deviations are shown in Table 1. Results obtained by the oven (AOAC, 2016), thermobalance and NIR/PLS methods were compared statistically using Tukey's and Student's t tests at the 5.0% level of significance.
No significant difference among the three methods was observed at the 5.0% level of significance (p > 0.05). Hence, moisture content measured by the thermobalance method was used as the primary control/reference method for building the calibration model using NIR spectral data.

Pre-processing of NIR spectra
NIR spectra exhibited some noise, baseline distortion and/or some variation in optical path length caused by particle size (Figure 1a). Thus, it was necessary to investigate the application of the detrend and smoothing (moving average) pre-processing methods. In addition, in order to correct the effect of light scattering on the particle surfaces of the samples, standard normal variate (SNV) and multiplicative scatter correction (MSC) were applied, as represented in Figure 1b and 1c, respectively.

Table 1. Average of powdered egg moisture content (n = 3) and its standard deviation analyzed by oven, thermobalance and NIR/PLS methods.

PCR models
PCR was performed in the range from 800-2000 nm, to cover the whole spectral variation of powdered egg. Raw and pre-treated data were used to develop PCR regression models. The results are shown in Table 2: PCR models developed on MSC-treated spectra showed better statistics than those based on raw spectra and on the other pre-treatments. Smoothing did not improve the results compared to raw spectra. PCR models developed with first- and second-derivative pre-treatment led to low correlation coefficients. Thus, the best PCR model for predicting powdered egg moisture was obtained with MSC treatment, with r = 0.9239, RMSEC = 0.1012 and RMSEP = 0.1044 (Luypaert, Heuerding, De Jong, & Massart, 2002).

PLS models
PLS models for measuring powdered egg moisture, built in the 1000-2000 nm wavelength range, are shown in Table 3. PLS models appear feasible for moisture determination given the high values of r, which ranged from 0.89 to 0.93. The correlation coefficient did not vary greatly among the first-derivative, second-derivative and MSC pre-treatments, but was slightly higher for the raw-spectra model, which also showed slightly lower RMSEC and RMSEP. Thus, the best values were obtained using PLS without any pre-treatment, with r, RMSEC and RMSEP of 0.93, 0.08 and 0.10, respectively.

PLS calibration and internal validation
PLS regression was applied to the spectra of the 49 powdered egg samples in the calibration set, whose moisture content had been determined by the reference method. Four latent variables (LV) were selected by cross-validation of the calibration model, as a function of the coefficient of determination (R²), as shown in Figure 3a, and as a function of the sum of the squares of the prediction errors (PRESS - Figure 3b) (Burns & Ciurczak, 2007).
Parameters calculated for the PLS regression (Model 1 - Table 3) for calibration and internal validation (cross-validation) were compared with the moisture content given by the primary control method (thermobalance), taken as the reference method (Figure 4). The parameters for the model using all samples were: PRESS = 0.55; RMSE = 0.10; average relative error = 1.78%; R² = 0.93 and r = 0.96.
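As a quick consistency check on these figures, the reported values can be related to one another, assuming that RMSE is the square root of PRESS divided by the number of samples and that r is the square root of R². Both relations and the function name are assumptions of this sketch.

```python
from math import sqrt

def rmse_from_press(press, n_samples):
    """RMSE implied by a PRESS value, assuming RMSE = sqrt(PRESS / n)."""
    return sqrt(press / n_samples)

# Values reported for the calibration / cross-validation model (49 samples).
calibration_rmse = rmse_from_press(0.55, 49)
calibration_r = sqrt(0.93)  # assumes r is the square root of R²

print(f"RMSE = {calibration_rmse:.3f}, r = {calibration_r:.3f}")
```

Under these assumptions the script gives an RMSE of about 0.106 and an r of about 0.964, close to the reported 0.10 and 0.96.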

External validation of calibration model
For external validation of the NIR calibration model, spectra of 21 randomly selected powdered egg samples were acquired in the spectrometer in triplicate (n = 3). Average sample moisture contents with their respective standard deviations are shown in Table 1. Student's t and Tukey's tests at the 5.0% level of significance were used to compare the results obtained by the primary control method (thermobalance) with those of the secondary NIR/PLS method.
The p value greater than 0.05 (Table 1) indicates no significant difference between the moisture contents measured by the two methods, thermobalance and NIR/PLS, at the 5.0% level of significance. Furthermore, the external validation parameters were PRESS = 0.22, RMSE = 0.10 and r = 0.96, and the mean RE of approximately 1.20% is lower than the analytical error of 5.0%. Figure 5 shows the moisture content predicted by the NIR/PLS spectral method plotted against the control content determined by thermobalance as the reference method, with a coefficient of determination of 92.0% (R² = 0.92). The similar moisture levels obtained for egg powder by the secondary NIR/PLS technique and by the two primary methods, oven and thermobalance, demonstrate its potential for industrial use, with some advantages. First, NIR (0.5 min for spectral data acquisition) associated with multivariate analysis substantially decreases the analytical time, saving 4.5 min and over 8 hours compared with the thermobalance and AOAC methods, respectively, without the need for sample preparation; it is also environmentally friendly and does not generate residues for disposal. Second, it is feasible because it lowers operating costs and preserves samples, and it is a useful tool for introducing real-time corrections directly in on-line industrial processes, avoiding rework caused by out-of-specification moisture.

Conclusion
After proper internal cross-validation and external validation, using an adequate spectral pre-processing method, it was concluded that NIR/PLS is suitable and efficient for determining egg powder moisture content and can be applied directly to the industrial process, with the same precision as the reference method but in a simple, fast, sustainable and reliable way.