Multi-product multivariate calibration: determination of quality parameters in soybean industrialized juices

Total acidity and vitamin C were determined by ultraviolet spectroscopy and multi-product multivariate calibration as an alternative to the reference methods, potentiometry and the Tillmans method, respectively. The multi-product models included different products (industrialized soy-based juices of different flavors, including light versions). The linear partial least squares (PLS) method was used to build the models, and outlier samples were evaluated. The accuracy, represented by the root mean square errors of calibration (RMSEC) and prediction (RMSEP), was confirmed at the 99% confidence level through the confidence ellipse, and the residuals showed random behavior, which indicates that the data fit a linear model. Sensitivity and analytical sensitivity were adequate for the determination of vitamin C and acidity, considering the concentration ranges used: 0.83-16.83 mg 100 mL-1 for vitamin C and 0.17-0.34 g 100 mL-1 for total acidity. The inverse of the analytical sensitivity shows that it is possible to distinguish samples that differ in vitamin C concentration by about 0.73 mg 100 mL-1, and samples that differ in total acidity by about 6.1 x 10-3 g 100 mL-1. The multi-product PLS model presents limits of detection and quantification for vitamin C of 2.43 and 7.36 mg 100 mL-1, respectively. For total acidity, the limits of detection and quantification were 0.02 and 0.06 mg 100 mL-1, respectively. The residual prediction deviation (RPD) values fell within the range that classifies the models as satisfactory. In addition, the multi-product calibration is fast, does not require reagents/solvents and does not generate toxic waste, making it an alternative to the conventional methods that agrees with the requirements of green chemistry.


Introduction
Partial Least Squares (PLS) is a linear multivariate regression method developed in the 1960s by H. Wold for economics. It was only in the early 1980s that his son, S. Wold, together with H. Martens, started applying it in chemistry (Sanchez, 2017). Currently, multivariate calibration by the PLS method is well established for first-order data, i.e. when a vector of instrumental responses is available for each sample.
PLS regression is considered to have the fewest mathematical disadvantages compared with other multivariate regression methods such as Classical Least Squares (CLS), Multiple Linear Regression (MLR) or Principal Component Regression (PCR). For instance: 1) CLS requires knowing the concentration of all species that contribute to the instrumental signal, which is most of the time impossible when working with complex matrices like food. 2) MLR circumvents the problem described for CLS; however, it requires the number of samples to be larger than the number of variables. This is difficult to achieve when working with spectroscopy, where many variables are considered in the development of the multivariate model. 3) PCR circumvents the problems presented by CLS and MLR. However, in PCR no information from the reference method is used in the dimensionality reduction of the instrumental matrix (Ferreira, Antunes, Melgo, & Volpe, 1999; Ferreira, 2015).
Multi-product multivariate calibration had its first scientific report in 1992 (Naes & Isaksson, 1992), and the second in 1994 (Wang, Isaksson, & Kowalski, 1994). These works reported the possibility of developing multivariate calibration models that include different types of products in the same model. These studies evaluated sets of different products that presented homogeneous responses. Furthermore, their main goal was to evaluate new algorithms for multivariate regression. A data set that did not present homogeneous responses was evaluated in 2000 (Berzaghi, Shenk, & Westerhaus, 2000). However, the main objective of that work was to evaluate the performance of the algorithm named Local. Micklander, Kjeldahl, Egebo, and Nørgaard (2006) introduced the term multi-product calibration. The authors investigated the use of the PLS regression method, nonlinear regression using neural networks, and three variations of the Local algorithm in the development of multi-product multivariate calibration models. The PLS method presented larger prediction errors, which might be explained by unrepresentative sampling or by the lack of outlier evaluation in the developed model.
The successful use of the PLS regression method in the development of multi-product multivariate calibration models is recent (Rambo, Amorim, & Ferreira, 2013; Santos, Março, & Valderrama, 2013; Santos, Lima, Março, & Valderrama, 2015; 2016). The use of the PLS method by the industrial sector has been growing steadily. In this sense, applying this multivariate regression method to different products at once becomes an interesting alternative in terms of time and practicality.
Maintaining multivariate models can be laborious. Thus, the multi-product multivariate model offers time savings, robustness and practicality in terms of maintenance. Moreover, a further disadvantage of single-product models is the large number of steps that the quality control analyst needs to perform. For example, in each routine laboratory analysis, a specific model is used for a small population of samples (a single product), so each sample must be carefully identified and the correct, specific model for that product must be properly chosen (Santos et al., 2013).
Acidity and vitamin C are quality parameters responsible for aroma, flavor, sensory and nutritional characteristics, as well as for the state of conservation of food (Venâncio & Martins, 2012). These parameters are used by the juice industry in the quality control of the final product.
Industrialized juices have been gaining consumer preference because of their practicality. In this sense, fruit nectar is defined as an unfermented, ready-to-drink beverage obtained from the edible part of the fruit diluted in potable water, to which sugars, acids (Santos et al., 2015) or soy may be added. Soy juices preserve the desirable sensory characteristics of fruits along with the functional properties of soybeans, such as the presence of bioactive compounds like isoflavones. Isoflavones have beneficial effects on human health, such as estrogenic and antiestrogenic activity (especially on the symptoms of the climacteric syndrome and osteoporosis), as well as hypocholesterolemic and anticarcinogenic activities (Lui, Aguiar, Alencar, Scamparini, & Park, 2003; Torrezan et al., 2004; Abreu, Pinheiro, Maia, Carvalho, & Sousa, 2007).
Although these quality parameters have already been evaluated by multi-product multivariate calibration for fruit nectar, soybean industrialized juices present very different physicochemical aspects (opacity, for example), which justifies an investigation into the determination of these parameters for this type of food sample. Therefore, the objective of this study was to develop multivariate calibration models based on ultraviolet (UV) spectroscopy for the determination of total acidity and vitamin C in soybean industrialized juices of different flavors, also including the light type.

Samples and reagents
One hundred and twenty-six samples were acquired in marketplaces of Campo Mourão-PR: pineapple (21 samples), grape (18 samples), orange (9 samples), peach and apple (15 samples for each flavor), strawberry, passion fruit, light apple, light grape and light peach (6 samples for each flavor), tangerine, pomegranate, mango, papaya, lemon and light orange (3 samples for each flavor).

Methods
The total acidity (mg 100 mL-1) was determined, in triplicate, according to the Federation International des Producteurs de Jus de Fruit (2005) and to the methodology described by Santos et al. (2015).
The vitamin C content (mg 100 mL-1) was determined, in triplicate, according to the Association of Official Analytical Chemists (AOAC) and the Tillmans method (Latimer, 1990). Santos et al. (2016) describe this methodology in detail. The samples were previously diluted (300 μL sample: 10 mL distilled water) and UV spectra (200-350 nm, steps of 1 nm, Ocean Optics, model USB-650-UV-VIS) were obtained using a 1 mm quartz cuvette.
The multi-product multivariate calibration was performed using Matlab R2007b and PLS-Toolbox 5.2 (Eigenvector Research Inc.). The regression method used was PLS. In PLS, the X matrix contains the instrumental responses (UV spectra in this case) and the y vector contains the results for acidity or vitamin C obtained by the reference methods. Both are decomposed into the product of two matrices, a scores matrix and a loadings matrix. A least squares regression is then obtained from the scores and loadings of the X matrix against the scores of the y vector. More detailed information on the PLS regression method, including a step-by-step mathematical description, can be found in Ferreira (2015).
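To illustrate the decomposition described above, the following is a minimal sketch of PLS regression for a single response (PLS1) using the NIPALS algorithm in plain Python. It is not the PLS-Toolbox implementation used in this work; the function names `pls1_fit` and `pls1_predict` are hypothetical, and mean centering of X and y is assumed.

```python
from math import sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pls1_fit(X, y, n_lv):
    """Sketch of PLS1 (NIPALS). X: list of spectra (rows), y: reference values."""
    n = len(X)
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    # Mean-centred residual matrices, deflated after each latent variable
    E = [[xij - mj for xij, mj in zip(row, x_mean)] for row in X]
    f = [yi - y_mean for yi in y]
    W, P, Q = [], [], []
    for _ in range(n_lv):
        # Weight vector: direction in X with maximum covariance with y
        w = [dot(col, f) for col in zip(*E)]
        norm = sqrt(dot(w, w))
        if norm == 0.0:  # nothing left in y to model
            break
        w = [wj / norm for wj in w]
        t = [dot(row, w) for row in E]                # scores
        tt = dot(t, t)
        p = [dot(col, t) / tt for col in zip(*E)]     # X loadings
        q = dot(f, t) / tt                            # y loading
        # Deflation: remove the modelled part from X and y
        E = [[eij - ti * pj for eij, pj in zip(row, p)] for row, ti in zip(E, t)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        W.append(w)
        P.append(p)
        Q.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q}

def pls1_predict(model, x):
    """Predict one sample by projecting it sequentially onto each latent variable."""
    e = [xj - mj for xj, mj in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in zip(model["W"], model["P"], model["Q"]):
        t = dot(e, w)
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat
```

In this sketch, prediction deflates the new spectrum latent variable by latent variable, which avoids computing the regression vector explicitly; a production implementation would normally rely on an established chemometrics library.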
Outliers were evaluated during model development according to ASTM E1655-05 (American Society for Testing and Materials [ASTM], 2005). They were identified based on leverage, unmodeled residuals in the spectra and unmodeled residuals in the dependent variable (residuals in y).
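As a rough illustration of the leverage criterion, the sketch below computes sample leverages from a scores matrix and compares them with a limit. Two assumptions are made here and are not taken from this work: the score columns are mutually orthogonal (as NIPALS PLS scores are), and the limit is three times the number of model parameters divided by the number of calibration samples, a rule of thumb commonly associated with ASTM E1655. The function names are hypothetical.

```python
def leverages(scores, mean_centered=True):
    """Leverage of each sample from its scores (rows = samples, columns = LVs).

    Assumes the score columns are mutually orthogonal.
    """
    n = len(scores)
    col_ss = [sum(t * t for t in col) for col in zip(*scores)]
    base = 1.0 / n if mean_centered else 0.0  # contribution of the mean
    return [base + sum(t * t / ss for t, ss in zip(row, col_ss)) for row in scores]

def leverage_limit(n_samples, n_lv, mean_centered=True):
    """Assumed rule of thumb: 3 * (number of model parameters) / n."""
    k = n_lv + (1 if mean_centered else 0)
    return 3.0 * k / n_samples

def flag_high_leverage(scores, mean_centered=True):
    """Indices of samples whose leverage exceeds the assumed limit."""
    limit = leverage_limit(len(scores), len(scores[0]), mean_centered)
    return [i for i, h in enumerate(leverages(scores, mean_centered)) if h > limit]
```

Samples flagged this way would still need to be checked against the spectral and y residual criteria before being removed.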
Multi-product models were validated by calculating the parameters of merit: accuracy, Residual Prediction Deviation (RPD), sensitivity, inverse of analytical sensitivity (analytical sensitivity-1), limits of detection and quantification, according to the equations shown in Table 1 (Valderrama, Braga, & Poppi, 2009; Santos et al., 2016).

Results and discussion
Multi-product models were developed based on the PLS regression method. For this, the UV spectra of the soybean juice samples were organized into a matrix. Figure 1 shows the spectra in the UV region for all analyzed samples. First-derivative preprocessing of the spectra proved necessary, probably because of the opacity of the soybean juice samples, even after dilution.
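The effect of first-derivative preprocessing can be sketched with simple finite differences. The derivative algorithm actually used in this work is not stated, so the central-difference scheme below is only an assumption (smoothing derivatives such as Savitzky-Golay are a common alternative), and the function name is hypothetical.

```python
def first_derivative(absorbance, step_nm=1.0):
    """First derivative of a spectrum sampled at a constant wavelength step.

    Central differences are used for interior points and one-sided
    differences at the two ends.
    """
    n = len(absorbance)
    if n < 2:
        raise ValueError("at least two points are required")
    deriv = [0.0] * n
    deriv[0] = (absorbance[1] - absorbance[0]) / step_nm
    deriv[-1] = (absorbance[-1] - absorbance[-2]) / step_nm
    for i in range(1, n - 1):
        deriv[i] = (absorbance[i + 1] - absorbance[i - 1]) / (2.0 * step_nm)
    return deriv
```

Because a constant added to every absorbance value vanishes on differentiation, a baseline offset of the kind that turbid samples may produce does not affect the derivative spectrum.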
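Table 1. Equations for the parameters of merit.

| Parameter of merit | Equation |
| Accuracy | $\mathrm{RMSEC} = \sqrt{\dfrac{\sum_{i=1}^{n_c} (y_i - \hat{y}_i)^2}{n_c - (n_{VL} + 1)}}$; $\mathrm{RMSEP} = \sqrt{\dfrac{\sum_{i=1}^{n_v} (y_i - \hat{y}_i)^2}{n_v}}$ |
| Residual prediction deviation | $\mathrm{RPD}_{cal} = DP_{cal} / \mathrm{RMSECV}$; $\mathrm{RPD}_{val} = DP_{val} / \mathrm{RMSEP}$ |
| Sensitivity | $\mathrm{SEN} = 1 / \lVert \mathbf{b} \rVert$ |
| Analytical sensitivity | $\gamma = \mathrm{SEN} / \delta x$ |
| Limit of detection | $\mathrm{LOD} = 3.3\, \delta x\, \lVert \mathbf{b} \rVert$ |
| Limit of quantification | $\mathrm{LOQ} = 10\, \delta x\, \lVert \mathbf{b} \rVert$ |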
nv is the number of samples in the validation set, yi is the reference value for sample i and ŷi is the value predicted by the model for sample i, nc is the number of samples in the calibration set, nVL is the number of latent variables, DPcal is the standard deviation of the reference values in the calibration set, DPval is the standard deviation of the reference values in the validation set, RMSECV is the Root Mean Square Error of Cross-Validation, RMSEC is the Root Mean Square Error of Calibration, RMSEP is the Root Mean Square Error of Prediction, b is the regression coefficient vector obtained from the model, δx is the instrumental noise estimate. In the RMSEC equation, the '+1' is added when the pre-processing is mean centering.

The calibration and validation data sets were composed of 94 and 32 samples, respectively, selected by the Kennard-Stone algorithm (Kennard & Stone, 1969). The next step in model development was outlier detection, in order to improve the quality of the models. Outliers were identified as samples with extreme leverage, unmodeled residuals in the spectral data or unmodeled residuals in the response obtained by the reference method. This procedure resulted in 80 and 76 calibration samples and in 21 and 23 validation samples for the total acidity and vitamin C models, respectively. A detailed description of the samples identified as outliers, as well as the acidity and vitamin C values obtained through the reference methods, can be seen in Appendices 1 and 2.
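The sample selection step can be sketched as follows. This is a minimal plain-Python version of the Kennard-Stone procedure, assuming Euclidean distances between spectra; the function name is hypothetical and this is not the routine used in this work.

```python
from math import dist  # Python 3.8+

def kennard_stone(X, n_select):
    """Return indices of n_select samples chosen to cover the data space evenly."""
    n = len(X)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Start with the two samples that are farthest apart
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    first, second = max(pairs, key=lambda ij: dist(X[ij[0]], X[ij[1]]))
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    # Distance from each remaining sample to its closest selected sample
    min_d = {i: min(dist(X[i], X[s]) for s in selected) for i in remaining}
    while len(selected) < n_select:
        # Pick the sample that is farthest from everything selected so far
        nxt = max(remaining, key=lambda i: min_d[i])
        selected.append(nxt)
        remaining.remove(nxt)
        for i in remaining:
            min_d[i] = min(min_d[i], dist(X[i], X[nxt]))
    return selected
```

The indices returned would form the calibration set and the remaining samples the validation set, so that the calibration samples span the variability of all products.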
Models were developed with mean-centering pre-processing and 10 latent variables (LVs), a number determined through the Root Mean Square Error of Cross-Validation (RMSECV) using contiguous blocks of nine samples. The accuracy of the models was evaluated by the Root Mean Square Error of Calibration (RMSEC) and of Prediction (RMSEP), as shown in Table 2.
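The two accuracy measures can be sketched as below. The degrees-of-freedom correction in `rmsec` (subtracting the number of latent variables, plus one when the data are mean centered) is an interpretation of the remark in the Table 1 legend, and the function names are hypothetical.

```python
from math import sqrt

def rmsep(y_ref, y_pred):
    """Root mean square error of prediction over the validation samples."""
    sse = sum((r - p) ** 2 for r, p in zip(y_ref, y_pred))
    return sqrt(sse / len(y_ref))

def rmsec(y_ref, y_pred, n_lv, mean_centered=True):
    """Root mean square error of calibration, corrected for degrees of freedom."""
    dof = len(y_ref) - n_lv - (1 if mean_centered else 0)
    if dof <= 0:
        raise ValueError("not enough calibration samples for this many LVs")
    sse = sum((r - p) ** 2 for r, p in zip(y_ref, y_pred))
    return sqrt(sse / dof)
```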
The RMSEC and RMSEP values are close, which suggests that the number of LVs was properly chosen, i.e. the models show neither overfitting nor lack of fit. RMSEC values decrease as the number of LVs increases, because errors in the spectra and concentrations are incorporated into the model fit. In contrast, RMSECV and RMSEP occasionally increase when more LVs are included in the model, because new samples that were not present in the calibration step have a different pattern of random errors. Therefore, the calibration model does not 'fit' these errors to the same degree as the errors in the samples employed in the calibration. In practice, obtaining the same values for these parameters is not easy, and it is preferable that the RMSEC be slightly higher than the RMSEP, which suggests that the model is suitable for the random errors present in samples that were not part of the calibration step (Santos et al., 2013).
The RMSEC and RMSEP are global parameters that incorporate both random and systematic (bias) errors. Therefore, it is useful to evaluate these results along with other accuracy indicators, such as the fit of the reference values against the predicted ones (correlation coefficient, Table 1) and the elliptical joint confidence regions (Valderrama et al., 2009) shown in Figure 2. The ellipse contains the ideal point (1, 0) for slope and intercept, respectively, which shows that the reference values and the PLS model predictions are not significantly different at the 99% confidence level. It is therefore also possible to conclude that the values determined by titration (potentiometric or oxidation-reduction) and the values of total acidity and vitamin C determined by the multi-product PLS model do not differ significantly at 99% confidence.
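The ellipse test can be sketched as follows, assuming an ordinary least squares fit of predicted against reference values and the standard joint confidence region for intercept and slope. The critical F value uses the closed form that exists when the numerator has two degrees of freedom; the function names are hypothetical.

```python
def f_crit_2(alpha, dof2):
    """Upper-tail critical value of the F distribution with (2, dof2) degrees of freedom."""
    return (dof2 / 2.0) * (alpha ** (-2.0 / dof2) - 1.0)

def ejcr_contains_ideal(y_ref, y_pred, alpha=0.01):
    """True if the point (slope = 1, intercept = 0) lies inside the joint region."""
    n = len(y_ref)
    sx = sum(y_ref)
    sxx = sum(x * x for x in y_ref)
    sy = sum(y_pred)
    sxy = sum(x * y for x, y in zip(y_ref, y_pred))
    # Ordinary least squares fit of predicted versus reference values
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    # Residual variance with n - 2 degrees of freedom
    s2 = sum((y - intercept - slope * x) ** 2
             for x, y in zip(y_ref, y_pred)) / (n - 2)
    d0 = intercept - 0.0
    d1 = slope - 1.0
    # Quadratic form that defines the ellipse around the fitted coefficients
    lhs = n * d0 ** 2 + 2.0 * sx * d0 * d1 + sxx * d1 ** 2
    return lhs <= 2.0 * s2 * f_crit_2(alpha, n - 2)
```

With `alpha=0.01` the function corresponds to the 99% confidence level used here; a result of `True` means that no significant constant or proportional bias is detected.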
The correlation coefficient for the fit of the multi-product model, obtained by plotting the reference values against the estimated values, was 0.7188 for vitamin C and 0.7435 for total acidity. These values were considered satisfactory, since previous research reported coefficients around 0.7 when the reference method was titration (Valderrama, Braga, & Poppi, 2007a; 2007b; Ferreira, Pallone, & Poppi, 2013; Santos et al., 2015; 2016). The results presented in Table 2 and Figure 2 show that the multi-product PLS model gives more 'dispersed' results for vitamin C, which may be explained by the fact that vitamin C oxidizes quickly in the presence of oxygen. In addition, the titration method relies on a color turning point that may be difficult to identify, especially in colored and cloudy samples such as soybean juices.
Figure 3 shows the residuals plot for the calibration and validation samples. The residuals appear to be randomly distributed, which reinforces that the data fit a linear model.
The RPD of the calibration model for vitamin C was close to the value considered satisfactory, and it may be considered adequate given the RPD obtained for the validation of this parameter. For the acidity model, the RPD can be considered satisfactory for both calibration and validation. According to the literature (Botelho, Mendes, & Sena, 2013), multivariate models are considered good when their RPD values are above 2.4, and models with RPD values between 1.5 and 2.4 are also considered satisfactory.
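The classification above can be expressed as a short sketch. It assumes that RPD is the sample standard deviation of the reference values divided by the corresponding error (RMSECV for calibration, RMSEP for validation); the function names and the label used below 1.5 are hypothetical.

```python
from statistics import stdev

def rpd(y_ref, rmse):
    """Residual prediction deviation: spread of the reference values over the error."""
    return stdev(y_ref) / rmse

def classify_rpd(value):
    """Classify a model using the thresholds cited from Botelho et al. (2013)."""
    if value > 2.4:
        return "good"
    if value >= 1.5:
        return "satisfactory"
    return "below the satisfactory range"  # label assumed, not from the source
```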
The sensitivity and analytical sensitivity showed satisfactory results, taking into account the analytical range of the models, 0.83-16.83 mg 100 mL-1 for vitamin C and 0.17-0.34 mg 100 mL-1 for total acidity. The inverse of the analytical sensitivity establishes the minimum concentration difference that is discernible by the multi-product model. Thus, it is possible to distinguish samples that differ in vitamin C concentration by about 0.73 mg 100 mL-1 and samples that differ in total acidity by about 6.1 x 10-3 mg 100 mL-1.
The limit of detection is the lowest concentration of vitamin C or total acidity that can be detected but not necessarily quantified accurately. The limit of quantification, on the other hand, is the lowest concentration of vitamin C or total acidity that can be quantified accurately. For vitamin C, the results indicate that the proposed multi-product model cannot reliably detect samples with concentrations below 2.43 mg 100 mL-1 or accurately quantify samples with concentrations below 7.36 mg 100 mL-1.
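The reported limits are numerically consistent with multiplying the inverse of the analytical sensitivity by the factors 3.3 and 10, as the short sketch below shows. This relation is an assumption inferred from the reported values and from the factors in Table 1; the function name is hypothetical, and the vitamin C input 0.736 is back-calculated from the reported limit of quantification because the published value (0.73) is rounded.

```python
def detection_limits(inv_analytical_sensitivity):
    """Return (LOD, LOQ) as 3.3 and 10 times the inverse analytical sensitivity."""
    lod = 3.3 * inv_analytical_sensitivity
    loq = 10.0 * inv_analytical_sensitivity
    return lod, loq

# Vitamin C: 0.736 mg 100 mL-1 (assumed unrounded value)
vit_c_lod, vit_c_loq = detection_limits(0.736)
# Total acidity: 6.1 x 10-3, in the units reported for this parameter
acid_lod, acid_loq = detection_limits(6.1e-3)
```

Rounded to two decimal places, these products reproduce the limits reported for both parameters.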

Conclusion
The results show that a PLS model can be used to evaluate the total acidity and vitamin C in different products (soybean juices with different flavors, and the light type of juice) simultaneously. Therefore, UV spectroscopy coupled with the PLS regression method allows the construction of multi-product calibration models. In addition, the multi-product models allow rapid quantification of the total acidity and vitamin C content and do not require reagents/solvents. Thus, they do not generate toxic residues, which makes them an alternative to the conventional titration-based methods, in accordance with the requirements of green chemistry. However, we point out that the methodology could be improved (perhaps by evaluating other spectral pre-processing types or different sample dilutions) in order to obtain lower prediction errors.

Figure 1. UV spectra of soybean juice samples. (A) Raw spectra. (B) Spectra after the first derivative.

Figure 2. Elliptical joint confidence regions at 99% for the slope and intercept of the regression of predicted concentrations versus reference experimental values using ordinary least squares. (A) Total acidity. (B) Vitamin C. (•) Point where the intercept is zero and the slope is one.

Table 2. Multi-product model's parameters of merit.
Appendix 1. Outlier identification and total acidity values obtained through the reference method.