Indicators for evaluation of model performance: irrigation hydraulics applications

ABSTRACT. Several mathematical models have been developed for applications in the hydraulics of irrigation systems, and the literature uses and suggests several indicators of model performance. The objective of this work was to investigate the performance of statistical indicators for the evaluation of models in irrigation hydraulics. Three case studies that represent typical irrigation hydraulics modeling were used to assess the indicators. The following set of indicators was analyzed: a) difference-based: mean absolute error, mean square error, root mean square error, scaled root mean square error, and percent mean relative absolute error; b) efficiency-based: Nash-Sutcliffe and Legates-McCabe; c) correlation coefficient (r); d) coefficient of determination (R²); e) index of agreement (d); f) Camargo and Sentelhas index (c); and g) graphical methods: regression error characteristic curve based on relative absolute error and 1:1 scatter plot. For the evaluated cases, which are physical phenomena, difference-based indicators are similar measures and it is appropriate to report one or more of them. The assessment of models must also be supported by graphical analysis, which shows the real scenario of errors in the model evaluation process. Efficiency-based indicators, r, R², c, and d are not recommended and should be avoided in the modeling of irrigation hydraulics.


Introduction
In irrigation engineering, several mathematical models have been developed to assist in the sizing and decision support of the hydraulic design of irrigation systems. The evaluation or assessment of model performance is an important step when developing mathematical models. The evaluation of a model aims to quantify the deviation between observed and predicted values within the validation limits of the model. A model is suitable when the accuracy of its predictions complies with the application requirements.
Model calibration and validation are fundamental processes for establishing the credibility of models and simulations (Chatterjee & Simonoff, 2013). Quantitative and graphical methods are useful for the correct parameterization and validation of models.
Several statistical indicators and methods have been suggested to assess model performance (Nash & Sutcliffe, 1970; Fox, 1981; Willmott, 1981; Ali & Abustan, 2014). Among the statistical indicators, Fox (1981) recommended that the difference-based measures mean absolute error (MAE), mean square error (MSE), and root mean square error (RMSE) should be calculated and reported. Ali and Abustan (2014) proposed a new difference-based indicator to evaluate model performance, the percent mean relative absolute error (PMARE). Willmott (1981) demonstrated that Pearson's correlation coefficient (r) and the coefficient of determination (R²) can be misleading and proposed an index of agreement (d). Regarding efficiency-based indicators, the Nash-Sutcliffe (NS_E) and Legates-McCabe (LM_E; Legates & McCabe, 1999) indices are widely used to evaluate the performance of models of irrigation hydraulics problems, such as orifice discharge (Zhang, Chai, Li, Xu, & Li, 2019), friction losses in polyethylene pipes (Provenzano, Alagna, Autovino, Juarez, & Rallo, 2016), perforation geometry of drainage pipes (Gaj & Madramootoo, 2020), and channel stability hydraulics (Thompson, Hathaway, & Schwartz, 2018). Moriasi et al. (2007) pointed out that a model can be assessed as suitable based on one statistic but may present poor performance when evaluated according to another statistic. Furthermore, Alexandrov et al. (2011) emphasized the need for standardized evaluation tools for specific fields. For example, Bellocchi, Acutis, Fila, and Donatelli (2002) suggested the use of Pearson's correlation coefficient, the relative root mean square error, efficiency-based indicators, and Student's t-test probability for solar radiation modeling.
Modeling has been widely used in irrigation hydraulics. However, the literature does not specify the most appropriate statistical indicators to evaluate models capable of representing the physical phenomena in this area. In this paper, statistical indicators were investigated to evaluate the performance of models in irrigation hydraulics, as well as to verify their limitations, in order to support decisions about the accuracy of the models.

Indicators for model performance evaluation
Correlation coefficient (r): range -1 to 1; -1: perfectly linearly related with a negative slope; 0: no linear dependence; +1: perfectly linearly related with a positive slope.
Coefficient of determination (R²): range 0 to 1; the higher, the better.
Index of agreement (d): range 0 to 1; the higher, the better.
Camargo and Sentelhas index (c): c = d·r; range 0 to 1; the higher, the better.

The first set of indicators shown in Table 1 comprises difference-based indicators, which measure the deviation between observed and predicted values in a data set: MAE, MSE, RMSE, SRMSE, and PMARE. The values of all these indicators range from 0 to +∞, and the lower the values, the better. The smallest value corresponds to the hypothetical situation of no deviation between the predicted and observed data. All these indicators compute absolute or squared deviations between observed and predicted data and therefore do not consider the sign of the deviation.
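As a minimal sketch of the difference-based indicators described above, the Python function below computes MAE, MSE, RMSE, and PMARE from paired observed (O_i) and predicted (P_i) values. The PMARE expression used here (mean of |P_i - O_i|/|O_i|, in percent) is an assumption, SRMSE is omitted because its scaling is not stated in the text, and the function name and sample values are hypothetical.

```python
import math


def difference_indicators(observed, predicted):
    """Return MAE, MSE, RMSE and PMARE for paired observed/predicted values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # Assumed PMARE form: mean relative absolute error, expressed in percent.
    # Observed values must be non-zero.
    pmare = 100.0 / n * sum(abs(e) / abs(o) for e, o in zip(errors, observed))
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "PMARE": pmare}


# Illustrative friction-factor values (not taken from the case studies).
observed = [0.020, 0.025, 0.030, 0.040]
predicted = [0.021, 0.024, 0.033, 0.038]
print(difference_indicators(observed, predicted))
```

All four values are zero only when every prediction equals its observation, which matches the "lower is better" reading given in the text.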
The relative absolute error (δ) is applied to analyze deviations between pairs of observed and predicted values. This indicator is used to draw the regression error characteristic (REC) curve, a graphical tool that relates prediction errors to their cumulative frequency of occurrence (Sobenko, Bombardelli, Camargo, Frizzone, & Duarte, 2020). δ can be expressed in decimal or percentage units. For any cumulative frequency, the smaller the value of δ, the better. In addition, a scatter plot of the pairs of observed and predicted values with a 1:1 straight line is useful to identify data dispersion, bias, and outliers in the evaluated dataset.
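The REC curve described above can be sketched in a few lines of Python. The helper names below are hypothetical, δ is taken as |P_i - O_i|/|O_i| in decimal units, and the empirical cumulative frequency is assumed to be the simple rank-based fraction i/n; the original study may have used a different plotting convention.

```python
import math


def relative_absolute_error(observed, predicted):
    """delta_i = |P_i - O_i| / |O_i| for each pair (decimal units)."""
    return [abs(p - o) / abs(o) for o, p in zip(observed, predicted)]


def rec_curve(deltas):
    """Points (delta, cumulative frequency) of the empirical REC curve."""
    ordered = sorted(deltas)
    n = len(ordered)
    return [(d, (i + 1) / n) for i, d in enumerate(ordered)]


def delta_at_frequency(deltas, frequency):
    """Smallest delta that covers at least `frequency` of the predictions."""
    ordered = sorted(deltas)
    k = max(math.ceil(frequency * len(ordered)), 1)
    return ordered[k - 1]


def share_within(deltas, tolerance):
    """Fraction of predictions whose relative error is <= tolerance."""
    return sum(d <= tolerance for d in deltas) / len(deltas)


# Illustrative values only.
observed = [10.0, 10.0, 10.0, 10.0]
predicted = [11.0, 9.0, 12.0, 10.0]
deltas = relative_absolute_error(observed, predicted)
print(rec_curve(deltas))
print(delta_at_frequency(deltas, 0.95), share_within(deltas, 0.10))
```

`delta_at_frequency(deltas, 0.95)` corresponds to a δ value read at 95% cumulative frequency, and `share_within` answers the complementary question of how many predictions fall within a chosen error tolerance.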
Table 1 also has efficiency-based indicators (NS_E and LM_E), which measure how well a model fits the observed values. Efficiency indicators have a numerator that represents the deviation between observed and predicted values and a denominator that represents the variation of the observed values around their average. The numerator refers to the variation not explained by the model, while the denominator expresses the total variability in the observed data. If the predictions of a linear model are unbiased, then the results of NS_E will lie in the interval from 0 to 1, but it may provide negative values for biased models. For nonlinear models, negative values can be obtained even when the model is unbiased (McCuen, Knight, & Cutter, 2006).
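The numerator/denominator structure described above can be illustrated with the following sketch. It assumes the commonly used forms NS_E = 1 - Σ(O_i - P_i)² / Σ(O_i - Ō)² and LM_E = 1 - Σ|O_i - P_i| / Σ|O_i - Ō|; the function names and sample data are hypothetical.

```python
def nash_sutcliffe(observed, predicted):
    """NS_E: one minus the ratio of squared errors to squared deviations from the mean."""
    mean_o = sum(observed) / len(observed)
    unexplained = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    total = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - unexplained / total


def legates_mccabe(observed, predicted):
    """LM_E: same structure as NS_E, with absolute values instead of squares."""
    mean_o = sum(observed) / len(observed)
    unexplained = sum(abs(o - p) for o, p in zip(observed, predicted))
    total = sum(abs(o - mean_o) for o in observed)
    return 1.0 - unexplained / total


observed = [1.0, 2.0, 3.0]
print(nash_sutcliffe(observed, [1.0, 2.0, 3.0]))  # perfect predictions
print(nash_sutcliffe(observed, [2.0, 2.0, 2.0]))  # predicting the observed mean
print(nash_sutcliffe(observed, [3.0, 2.0, 1.0]))  # worse than the observed mean
```

Both functions divide by the spread of the observed values, so they are undefined when all observations are equal.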
The correlation coefficient (r) is a dimensionless measure of the linear dependence between two data sets. If the two variables are perfectly linearly related, r is 1 (positive slope) or -1 (negative slope). If no linear relationship between the two variables exists, then r is zero.
The coefficient of determination (R²) represents the amount of variability in the data explained by the regression model. R² values near unity do not necessarily imply that the regression model will provide accurate predictions of future observations (Montgomery & Runger, 2013). In general, R² increases when more variables are added to a model, but this does not necessarily imply that increasing the number of variables improves the model performance.
The index of agreement (d) is a measure of the degree to which a model's predictions are error-free, and it ranges from 0 to 1. Values near 1 indicate better agreement between the observed and estimated variables.
The c index is the product of d and r (Camargo & Sentelhas, 1997). Pimenta et al. (2018) proposed a criterion of interpretation and classification of d, r, and c, which will be used in this study.
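To make the relationship between r, R², d, and c concrete, the sketch below computes them for a deliberately biased prediction set. It assumes Pearson's r, R² taken as r² (the simple linear regression case), and the original Willmott (1981) form d = 1 - Σ(P_i - O_i)² / Σ(|P_i - Ō| + |O_i - Ō|)²; function names and data are hypothetical.

```python
import math


def pearson_r(observed, predicted):
    """Pearson's correlation coefficient between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def index_of_agreement(observed, predicted):
    """Willmott's index of agreement d (assumed 1981 squared-difference form)."""
    mean_o = sum(observed) / len(observed)
    num = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    den = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2
              for o, p in zip(observed, predicted))
    return 1.0 - num / den


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]   # every prediction is twice the observation

r = pearson_r(observed, predicted)
d = index_of_agreement(observed, predicted)
print("r =", r, "R2 =", r ** 2, "d =", d, "c =", d * r)
```

In this example the predictions are perfectly correlated with the observations although each one is off by 100%, which is the kind of situation in which r and R² alone can be misleading.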

Data for comparison of indicators
Datasets from three typical problems of irrigation hydraulics were used as case studies to assess the indicators. The methodologies and particularities of each case study are fully described in Pimenta et al. (2018), Katsurayama et al. (2020), and Cano et al. (2021). Pimenta et al. (2018) used the Colebrook and White (1937) equation (Equation 1) to obtain reference values of the friction factor (f) for pressurized conduits and compared these reference values with values predicted by the equations of Swamee and Jain (1976; Equation 2) and Shaikh, Massan, and Wagan (2015; Equation 3) for turbulent flow conditions (4000 ≤ Re ≤ 10⁸). In Equations 1 to 3, f is the coefficient of head loss of the Darcy-Weisbach formulation (dimensionless), ε/D is the relative roughness of the pipe (dimensionless), and Re is the Reynolds number (dimensionless).
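The comparison in the first case study can be illustrated with the sketch below, which assumes the standard textbook forms of two of the equations: Colebrook-White, 1/√f = -2 log10[(ε/D)/3.7 + 2.51/(Re √f)], solved here by fixed-point iteration, and Swamee-Jain, f = 0.25 / {log10[(ε/D)/3.7 + 5.74/Re^0.9]}². The equation of Shaikh et al. (2015) is not included. Function names, the iteration scheme, and the sample inputs are hypothetical and not taken from Pimenta et al. (2018).

```python
import math


def swamee_jain(rel_roughness, reynolds):
    """Explicit Swamee-Jain approximation of the Darcy-Weisbach friction factor."""
    arg = rel_roughness / 3.7 + 5.74 / reynolds ** 0.9
    return 0.25 / math.log10(arg) ** 2


def colebrook_white(rel_roughness, reynolds, tol=1e-12, max_iter=100):
    """Implicit Colebrook-White friction factor via fixed-point iteration on 1/sqrt(f)."""
    x = 1.0 / math.sqrt(swamee_jain(rel_roughness, reynolds))  # initial guess
    for _ in range(max_iter):
        x_new = -2.0 * math.log10(rel_roughness / 3.7 + 2.51 * x / reynolds)
        if abs(x_new - x) < tol:
            x = x_new
            break
        x = x_new
    return 1.0 / x ** 2


# Illustrative turbulent-flow condition.
rel_roughness, reynolds = 1e-4, 1e5
f_ref = colebrook_white(rel_roughness, reynolds)
f_sj = swamee_jain(rel_roughness, reynolds)
print(f_ref, f_sj, 100.0 * (f_sj - f_ref) / f_ref)  # last value: deviation in %
```

Repeating this calculation over a grid of Re and ε/D values would produce paired reference and predicted f values to which the indicators can be applied.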
For the second case study, Katsurayama et al. (2020) modeled flow characteristics in microtube emitters using experimental data and dimensional analysis. They proposed the model shown in Equation 4 and compared the results with a theoretical model proposed by Souza and Botrel (2004) (Equation 5).
In Equations 4 and 5, H is the pressure head (m), ρ is the water density (kg m⁻³), μ is the water dynamic viscosity (Pa s), L_m is the microtube length (m), Q_m is the microtube flow rate (m³ s⁻¹), D_in is the microtube internal diameter (m), υ is the water kinematic viscosity (m² s⁻¹), g is the gravitational acceleration (m s⁻²), and α_K and β_K are the empirical coefficients which represent the minor loss coefficients as a function of Re (dimensionless).
The last case evaluated was based on the study carried out by Cano et al. (2021) on the modeling of corner-tap orifice plates to determine the flow rate in pipes. Based on experimental data, the authors fitted an empirical equation for orifice plates with an internal diameter of 150 mm (Equation 6). The results were compared with the theoretical equation (Equation 7), with the discharge coefficient (C_d) obtained by the Reader-Harris/Gallagher equation for corner-tap orifice plates (ISO 5167-2, 2003) (Equation 8).

$$Q_{op} = 37.903\,\Delta h^{0.606} \tag{6}$$

$$C_d = 0.5961 + 0.0261\,\beta^{2} - 0.216\,\beta^{8} + 0.000521\left(\frac{10^{6}\beta}{Re}\right)^{0.7} + (0.0188 + 0.0063A)\,\beta^{3.5}\left(\frac{10^{6}}{Re}\right)^{0.3} \tag{8}$$

where Q_op is the flow through the orifice plate (m³ s⁻¹), Δh is the differential pressure head on the orifice plate (m), C_d is the orifice plate discharge coefficient (dimensionless), d is the orifice plate internal diameter (m), β is the ratio between the orifice plate and pipe diameters (dimensionless), and A is the coefficient depending on the Reynolds number (dimensionless).
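A hedged sketch of the two quantities that can be evaluated directly from the text is given below: the empirical Equation 6 and the Reader-Harris/Gallagher discharge coefficient for corner taps. The theoretical flow equation (Equation 7) is not reproduced. The expression A = (19000 β / Re)^0.8 and the final factor (10⁶/Re)^0.3 follow the ISO 5167-2 formulation as recalled here and should be treated as assumptions; function names and inputs are hypothetical.

```python
def empirical_flow(delta_h):
    """Equation 6: empirical flow through the orifice plate for a head difference delta_h (m)."""
    return 37.903 * delta_h ** 0.606


def discharge_coefficient_corner_taps(beta, reynolds):
    """Reader-Harris/Gallagher C_d for corner taps (tap-position terms vanish)."""
    # Assumed ISO 5167-2 definition of the Reynolds-dependent coefficient A.
    a = (19000.0 * beta / reynolds) ** 0.8
    return (0.5961
            + 0.0261 * beta ** 2
            - 0.216 * beta ** 8
            + 0.000521 * (1e6 * beta / reynolds) ** 0.7
            + (0.0188 + 0.0063 * a) * beta ** 3.5 * (1e6 / reynolds) ** 0.3)


# Illustrative inputs only.
print(empirical_flow(0.5))
print(discharge_coefficient_corner_taps(beta=0.5, reynolds=1e5))
```

As Re grows, the Reynolds-dependent terms shrink and C_d tends to the constant part 0.5961 + 0.0261 β² - 0.216 β⁸.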
In this way, datasets of 480, 615, and 2,000 records from Pimenta et al. (2018), Katsurayama et al. (2020), and Cano et al. (2021), respectively, were used to test the statistical indicators described above. All indicators were calculated using an electronic spreadsheet following the equations of the respective indices.

Results and discussion
Table 2 shows the values obtained from the indicators for the predictions of the coefficient of head loss (f, case study 1), microtube length (L_m, case study 2), and flow rate through the orifice plate (Q_op, case study 3). The data points of each case study are graphically illustrated in Figures 1, 2, and 3, respectively, along with 1:1 lines and REC curves.

Case study 1 - Coefficient of head loss (f)
For f predictions, Table 2 shows the following: (a) according to the difference-based indicators (MAE, MSE, RMSE, SRMSE, and PMARE), the equation of Swamee and Jain (1976) performed better in its predictions, with values closer to zero; (b) the efficiency indicators (NS_E and LM_E) also showed better predictive performance for the same equation, with values closer to unity; (c) the composite indicators (d, r, and c) classified both f predictions as "excellent", according to the criterion proposed by Pimenta et al. (2018); (d) through the coefficient of determination (R²), it is possible to observe strong correlations between the data of each equation (i.e., values higher than 0.95).
These results can be explained by the scatter plot, also called the "1:1" plot, shown in Figure 1a, which illustrates the relationship between observed and predicted values. It also allows the predictions to be interpreted as over- or underestimates of the observed values. It can be observed that Swamee and Jain's equation overestimated the standard values by only 1.0 ± 1% on average, while the equation of Shaikh et al. (2015) underestimated 67.5% of the observed f data, with an average of 21.2 ± 22% of the observed values. The graph shown in Figure 1b illustrates the relative error (δ) associated with its frequency of occurrence (i.e., the regression error characteristic curve). This type of graph can be interpreted in several ways and provides significant information regarding the predictions (Sobenko et al., 2020). Taking as an example the predictions made by the equation of Shaikh et al. (2015), it can be seen that 95% of its predictions had a relative error of up to 36.7% (δ_95% in Table 2).
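The over- and underestimation statements above (share of the data and mean ± deviation, as a percentage of the observed values) can be obtained along the following lines. This is only a sketch: the function name is hypothetical, observed values are assumed to be positive, and the use of the population standard deviation is an assumption, since the text does not say how the ± terms were computed.

```python
import statistics


def estimation_bias(observed, predicted):
    """Summarize over- and underestimation as percentages of the observed values."""
    deviations = [100.0 * (p - o) / o for o, p in zip(observed, predicted)]
    over = [d for d in deviations if d > 0]
    under = [-d for d in deviations if d < 0]
    n = len(deviations)

    def summary(values):
        if not values:
            return {"share": 0.0, "mean": 0.0, "sd": 0.0}
        return {
            "share": 100.0 * len(values) / n,   # % of the data set
            "mean": statistics.mean(values),    # mean deviation, %
            "sd": statistics.pstdev(values),    # population SD (assumption)
        }

    return {"over": summary(over), "under": summary(under)}


# Illustrative values only.
print(estimation_bias([10.0, 10.0, 10.0, 10.0], [11.0, 12.0, 9.0, 10.0]))
```

Unlike the difference-based indicators, this summary keeps the sign of the deviations, which is the information the 1:1 plot conveys visually.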

Case study 2 - Modeling of microtube emitters
In the estimation of L_m, the equation proposed by Katsurayama et al. (2020) performed better according to the difference-based and efficiency-based indicators. The composite indicators classified both models evaluated in this case study as "excellent", and the R² also showed strong correlations between the predicted and observed data (R² > 0.95) (Table 2). However, the equation proposed by Souza and Botrel (2004) underestimated 65.4% of the observed values, with underestimations ranging from 0.3 to 341%, and a δ_95% of 38.4% (Figure 2). Also, from the regression error characteristic curve presented in Figure 2b, it can be observed that 99.2% of the predictions made by the equation of Katsurayama et al. (2020) had a relative error of up to 10%.

Case study 3 - Orifice plates
In this case, the composite indicators (d, r, and c) and R² suggested that the evaluated equations had the same performance in Q_op predictions (Table 2). The 1:1 graph showed that the methodology of ISO 5167-2 (2003) overestimates 47.5% of the data by 8.0 ± 0.2% and underestimates 52.5% of the data by 3.1 ± 0.1% in relation to the values observed experimentally (Figure 3a). Even more precisely, 95.2 and 73.0% of the predictions made by the equations of Cano et al. (2021) and ISO 5167-2 (2003), respectively, had a relative error of up to 5% (Figure 3b). Thus, this graph shows to what extent the points outside the 1:1 line in Figure 3a can be accepted.

In essence, MAE presented a large magnitude in the evaluations because it describes the true mean of the deviations but can vary with different data patterns/sets, and MSE is similar to MAE but more sensitive to large errors, as it squares individual differences (Hallak & Pereira Filho, 2011; Ali & Abustan, 2014). According to Willmott (1981), MAE and RMSE are similar measures which provide estimates of the average error, but neither measure provides information about the relative size of the average difference or the nature (type) of the differences comprising MAE or RMSE. The author also pointed out that MSE and RMSE are generally amenable to more in-depth mathematical or statistical analyses than MAE. Moreover, Ali and Abustan (2014) proposed the PMARE indicator, pointing out that it is capable of directly indicating the accuracy or the pitfalls of the prediction in any field of observation, regardless of the units and ranges of values.
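The greater sensitivity of squared-error measures to large deviations can be seen in a small worked example. The two hypothetical error sets below have the same MAE, but one concentrates the whole deviation in a single point; the helper names are illustrative.

```python
import math


def mae(errors):
    """Mean absolute error of a list of deviations (predicted minus observed)."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root mean square error of a list of deviations."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


uniform_errors = [1.0, 1.0, 1.0, 1.0]   # same deviation everywhere
single_outlier = [0.0, 0.0, 0.0, 4.0]   # one large deviation

print(mae(uniform_errors), rmse(uniform_errors))
print(mae(single_outlier), rmse(single_outlier))
```

Both sets share the same MAE, while RMSE is twice as large for the set with the outlier, which is why reporting the two together says something about how the errors are distributed.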
The Nash-Sutcliffe index is based on the squares of differences, while the Legates-McCabe index is based on the absolute values of differences. From the equations of NS_E and LM_E, it can be observed that they depend more on the range of the observations (O_i and Ō) than on the difference between the observed and predicted values; that is, they are more sensitive to the range or fluctuation of the observed values (Willmott, Robeson, & Matsuura, 2011). Thus, in irrigation hydraulics studies, which involve physical phenomena that often do not show dispersion in the observed values, the use of these indicators for model calibration, validation, or testing is not recommended.
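The dependence of NS_E on the observed range can be checked with a short numerical experiment. In the sketch below, the same absolute errors are applied to two hypothetical sets of observations, one narrow and one wide; NS_E is assumed to have the usual form 1 - Σ(O_i - P_i)² / Σ(O_i - Ō)².

```python
def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def nash_sutcliffe(observed, predicted):
    """NS_E in its usual squared-difference form."""
    mean_o = sum(observed) / len(observed)
    num = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    den = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - num / den


errors = [0.5, -0.5, 0.5, -0.5]          # identical errors in both cases
narrow = [10.0, 10.5, 11.0, 11.5]        # observations with little spread
wide = [10.0, 20.0, 30.0, 40.0]          # observations with large spread

for observed in (narrow, wide):
    predicted = [o + e for o, e in zip(observed, errors)]
    print(mae(observed, predicted), nash_sutcliffe(observed, predicted))
```

With identical errors, MAE is the same for both sets, whereas NS_E moves from a low value for the narrow set to a value close to 1 for the wide set, purely because the denominator changes.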
The c index combines accuracy, given by the index d, and precision, given by the coefficient r. From Table 2, it can be observed that all predictions made by the evaluated models showed "excellent" classifications for the composite indicators and a strong correlation with the observed values, that is, high precision (R² > 0.95). However, the R² does not indicate whether a model provides an adequate fit to the observed data, because it only evaluates the scatter of the data points around the fitted regression line. Due to the ambiguity of these indicators, they are also not recommended for use individually in irrigation hydraulics problems (i.e., in physical processes) and should be avoided. These indicators, as well as the efficiency indicators, are widely used in other areas of irrigation, such as irrigation management, which involves studies of evapotranspiration, hydrological modeling, and soil water storage (Shiri & Kisi, 2011; Bachou, Walker, Ticlavilca, & McKee, 2014; Hatiye, Prasad, & Ojha, 2018).
While the indicators give quantitative measures, graphical methods show the overall and real scenarios and can also be regarded as exploratory tools. Sets of indicators combined with 1:1 graphs and/or regression error characteristic curves have been used to calibrate, validate, and test several empirical, semiempirical, or deterministic models in different situations in irrigation engineering: dimensional analysis and artificial neural network (ANN) approaches applied to estimate minor losses due to start connectors in micro-irrigation laterals were assessed with RMSE, MAE, 1:1 graphs, and regression error characteristic curves (Sobenko et al., 2020); linear modeling and ANN techniques used to estimate losses by wind drift and evaporation in sprinkler systems were evaluated by MAE, MSE, RMSE, r, R², and 1:1 graphs (Al-Ghobari, El-Marazky, Dewidar, & Mattar, 2018; Sarwar, Peters, & Mohamed, 2019); RMSE, r, R², and 1:1 graphs were used to evaluate predictions of head loss in micro-irrigation sand filters by ANN techniques (García Nieto et al., 2017); R² and 1:1 graphs were used to assess an empirical local losses estimation model for lay-flat drip laterals (Elbana, Ramírez de Cartagena, & Puig-Bargués, 2013; Provenzano, Di Dio, & Leone, 2014); RMSE, NS_E, and 1:1 graphs were used to evaluate and examine appropriate equations for continuous head loss calculation in real field operating center-pivot systems (Alazba, Mattar, ElNesr, & Amin, 2012); and δ, RMSE, d, r, c, and 1:1 graphs were used to update or analyze the performance of equations that estimate the coefficient of head loss (Oke, Ojo, & Adeosun, 2015; Najafzadeh, Shiri, Sadeghi, & Ghaemi, 2018; Pimenta et al., 2018).
In essence, for irrigation hydraulics problems, difference-based indicators are similar measures and it is appropriate, in many cases, to report one or more of them. The diagnosis of a model's performance must be supported by both quantitative measures and graphical analysis. For this, the statistical indicators should be consistent in their results, as was reported in this study. Otherwise, the particular indicator is not suitable for model comparison and should be avoided as a model performance measure.

Conclusion
In the evaluation of mathematical models in irrigation hydraulics, the difference-based indicators assessed are similar measures and can be used individually or as a set. Graphical analyses are essential to identify the magnitude of the error and to perform more accurate assessments of the models. Efficiency-based indicators, the correlation coefficient, the coefficient of determination, the index of agreement, and the Camargo and Sentelhas index are not recommended for consideration individually and must be supported by graphical analysis and difference-based indicators.

Figure 1. Comparison between equations in predicting the coefficient of head loss (f): (a) standard versus estimated values of f using the models of Swamee and Jain (1976) and Shaikh et al. (2015); and (b) regression error characteristic curve presenting relative errors (δ) versus frequency of error occurrence.

Figure 2. Comparison between equations in predicting the microtube length (L_m): (a) observed versus estimated values of L_m using the models of Katsurayama et al. (2020) and Souza and Botrel (2004); and (b) regression error characteristic curve presenting relative errors (δ) versus frequency of error occurrence.

Figure 3. Comparison between equations in predicting the flow rate through the orifice plate (Q_op): (a) observed versus estimated values of Q_op using the models of Cano et al. (2021) and ISO 5167-2 (2003); and (b) regression error characteristic curve presenting relative errors (δ) versus frequency of error occurrence.

Table 1 lists the indicators evaluated, as well as their corresponding formulas, range of output values, and basic interpretation. The equations use the following notation: observed values (O_i), predicted values (P_i), number of observations (n), average of observed values (Ō), and average of predicted values (P̄).

Table 1. Indicators for model performance evaluation.

Table 2. Indicators to assess the performance of equations in case studies 1, 2, and 3.