Simultaneous estimation as an alternative to young eucalyptus aboveground biomass modeling in ecophysiological experiments

Accurate forest biomass estimates require the selection of appropriate models of individual trees. Thus, two properties are required in tree biomass modeling: (1) additivity of biomass components and (2) estimator efficiency. This study aimed to develop a system of equations to estimate young eucalyptus aboveground biomass and guarantee additivity and estimator efficiency. Aboveground eucalyptus biomass models were calibrated using four methods: generalized least squares (GLS), weighted least squares (WLS), seemingly unrelated regression (SUR), and weighted seemingly unrelated regression (WSUR). The approaches were compared with regard to performance, additivity, and estimator efficiency. The methods did not differ with regard to the mean biomass estimation; therefore, their performance was similar. The GLS and WLS approaches did not satisfy the additivity principle, as the sum of the biomass components was not equal to total biomass. However, this was not observed with the SUR and WSUR approaches. With regard to estimator efficiency, the WSUR approach resulted in narrow confidence intervals and an efficiency gain of over 20%. The WSUR approach should be used in forest biomass modeling as it resulted in effective estimators while ensuring equation additivity, thus providing an easy and accurate alternative to estimate the initial biomass of eucalyptus stands in ecophysiological models.


Introduction
Biomass is the most important variable when evaluating carbon dynamics, carbon sequestration, and forest productivity (Fu, Zeng, & Tang, 2017). Direct biomass measurements require destructive sampling, which is both complicated and expensive. However, alternative variables to biomass, such as root collar diameter, individual tree diameter at 10 cm height, diameter at breast height, total height, commercial height, live crown length, and age are compatible with individual or stand biomass models (Picard, Santi-André, & Henry, 2012;Wang, Zhao, Liu, Yang, & Teskey, 2018). Over the years, forest science has increasingly incorporated statistical models into individual and stand biomass estimations (Picard et al., 2012;Zhao, Kane, Markewitz, Teskey, & Clutter, 2015). Most studies have applied different models approaches; however, these models have considered fitted biomass components independently and have not taken into account inherent correlations among them (Parresol, 1999).
When developing allometric models, the sampled trees must be carefully selected in an efficiency and consistent manner to obtain accurate estimates of individual tree biomass. These models should meet expectations regarding estimation performance, biological consistency, and estimator efficiency. Estimation performance refers to the quality of an estimate. Biological consistency refers to the additivity of components with regard to total biomass. Estimator efficiency refers to the consistency of selecting models that result in narrower confidence intervals (Parresol, 2001;Behling et al., 2018).
To meet the aforementioned expectations, models must be fit to estimate individual and total tree biomass components considering additive systems and restrictions (Parresol, 2001;Vonderach, Kändler, & Dormann, 2018). Zellner (1962) proposed seemingly unrelated regression (SUR), which considers simultaneous fitting and a correlated system of either linear or nonlinear equations. Parresol (2001) provided a methodology for using seemingly unrelated regression, and applications of this approach have been carried out in conifer stands in China (Dong, Zhang, & Li, 2014;Wang et al., 2018), the United States (Zhao et al., 2015), and Brazil (Sanquetta et al., 2015) as well as in eucalyptus stands in Portugal (António, Tomé, Tomé, Soares, & Fontes, 2007) and Spain (Vega-Nieva, Valero, Picos, & Jiménez, 2015). These studies have taken into account stands of middle to advanced age. Even so, considering the importance of forest production in Latin America, particularly that of eucalyptus stands, environmental management for ensure high productivity (Payn et al., 2015), ecophysiological models requires that guarantee consistency and efficiency with regard to biomass estimations, especially during the first years of forest development.
For this reason, Behling et al. (2018) indicated that the best model should consider all previously identified characteristics simultaneously. In addition, both evaluation and fitting compatible biomass equations are important for predicting stand biomass at different stand ages, scaling biomass or carbon stocks, and assessing root: shoot or biomass component ratios. For example, in eucalyptus forest management plans, nutrient budgets and the potential of forests to offset anthropogenic carbon emissions are important considerations (Nizami et al., 2017). In addition, accurate individual biomass estimates may allow us to understand the difference productivity responses of different eucalyptus genotypes to management and climate effects.
This study aimed to analyze the statistical estimates, biological consistency, and efficiency of independent and additive systems models for biomass estimations in young eucalyptus stands. In particular, this study focused on an additive system for modeling individual tree biomass in young eucalyptus stands, considering that total aboveground biomass model needed to be compatible with the sum of biomass components. We hypothesized that (i) systems models would provide better biomass estimates than independent models and (ii) biological consistency would always be related to parameter efficiency in biomass models.

Material and methods
The data of this study were collected in the experimental project 'EUCAHYDRO II: Predictive model of water use, water use efficiency and drought resistance of Eucalyptus plantations and genotypes' which aimed to evaluate physiological traits and environmental factors that drive initial growth behavior in eucalyptus stands. The project was established on October 31, 2017 in the Yumbel, Bio-Bio region in southern Central Chile (37 o 8′0.01″ S, 72 o 27′34.70′′ W).
The soils in the study site are volcanic black sands with low water-holding capacity. A warm-summer climate that is classified as Mediterranean (Csb) is present in the region. The annual rainfall (mean of 1299 mm) occurs predominantly during winter (~ 60%). The mean annual temperature is 14.7 o C , with maximum and minimum temperatures of 40 o C in summer (February) and -5 o C in winter (June).
The experiment was carried out with three repetitions and two borderlines. The plots consisted of sixteen trees planted with 1.5 × 1.0 m spacing. The data consisted of the aboveground biomass measurements of 136 sample trees divided into 34 genotypes. The selected eucalyptus genotypes were divided into three species or crosses, which are the most commonly planted species by Chilean forest companies. The species or crosses consisted of sixteen Eucalyptus globulus Labill. genotypes, three Eucalyptus nitens (H.Deane & Maiden) genotypes, and fifteen Eucalyptus nitens x Eucalyptus globulus crosses.
The soil was prepared without a subsoiler. Weed control was carried out with glyphos ate (2.5 L ha -1 ) until crown closure. Fertilization included the application of 30 g of controlled -release fertilizer (Basacote ® Plus 12M) per plant at planting. The trees were irrigated at planting and until March 2018 and from November 2018 to February 2019. Both irrigation periods took place during the dry summer months and consisted of the application of 1.6 L of water m -2 h -1 . The total amount of water applied during the spring and summer months was 664.5 mm during November 2017 to March 2018 and 157.8 mm during November 2018 to February 2019.
The aboveground biomass of individual trees was measured using a destructive method (Picard et al., 2012) when the trees reached 18.5 months in age. The trees were selected according to a distribution based on the diameter 10 cm above the groundline (d), and four trees were selected per genotype. The selected trees consisted of (1) a tree with the greatest d value, (2) a tree with the d mean value plus one standard deviation, (3) a tree with the mean d value, and (4) a tree with the mean d value minus one standard deviation. For each tree that was harvested, we separated the aboveground biomass into the stem (wood + bark), branch, and leaf components. First, the individual trees were felled and removed from the experimental site. Second, d was measured with a caliper (mm), total height (h) was determined with a measuring tape (m), and the branch diameters (bd) at 2 cm from the stem insertion point were measured with a caliper (mm).
After all branches were measured, we determined the bd range per genotype. We randomly selected 15 branches per genotype, and the branches were cut 2 cm from the stem insertion point. For each cut branch, we separated the living foliage from the wood of the branch. The foliage was dried them to a constant weight at 105 o C. The biomass of individual tree branches (w b ) and leaves (w l ) was estimated by fitting exponential models as a function of bd per genotype, as follow: After removing the branches, the stem was sectioned into three or four sections. Individual tree stem biomass was determined with a balance with a precision of 0.05 kg. A sample disc 8-cm thick was cut from each tree stem 10 cm from the groundline to determine the moisture content with an analytical balance with a precision of 0.01 g. The dry weights of the discs were measured after the samples were dried to a constant weight at 105 o C. Then, individual stem biomass was calculated based on the ratio of dry biomass to fresh biomass.
The estimated biomass component models were based on a nonlinear equation with additive error terms. These are the basic model forms that have been used in other forest stand studies (António et al., 2007). The independent variables in the models were d, dh, and d 2 h.
where: is the dry biomass component, (stem, branches, leaves, or total) and are model coefficients, are the independent variables, and is the random error of the th observation.
In the first step, each biomass component was independently fitted using a generalized least squares (GLS) approach with the MODEL procedure in SAS ® software (Statistical Analysis System [SAS], 2009).
The model coefficients were equal to zero as the initial parameters in the interaction process. When selecting the best model for each biomass component, the significance of the parameters (α = 0.05) was considered, and the following statistics were calculated: where; is the mean residuals (Kg), is the root mean square error (Kg), 2 is the fitted coefficient of determination (%), is the coefficient of variation (%), w c-est is the estimated dry biomass of component , is the number of data used in the fitting, p is the number of parameters in the model, and ̅ is the mean value of the observed dependent variable for component c.
In the second step, the models were fitted using SUR to guarantee additivity (Vega-Nieva et al., 2015). The system model was composed of stem, branch, leaf, and total biomass according to the specification that total biomass was a function of the independent variables in the equations of the biomass components. SUR was fitted considering the best models selected for the independent fitting. The initial parameters in the interaction process were equal to zero. Iterative convergence to minimize the residual sum of squares was performed with the Gauss-Newton method. The analysis was performed in SAS ® using the ITSUR option in the MODEL procedure.
In the third step, residual heteroscedasticity of the GLS was determined with graphical analyses. The estimated residues (ê i ) of the GLS model were used as the dependent variable in the error variance model, taking the natural logarithm (ln) of the square residues [(ê i ) 2 ].
Acta Scientiarum. Agronomy, v. 43, e52126, 2021 The natural logarithm of independent variables fitted by GLS was used to define a matrix, Δ, which was an element-wise square root based on the inverse of the weight matrix for observation, , in component .
The fourth step employed the same approach as the third step; however, the estimated SUR errors were used to model the error variance and weight calculations resulting in weighted seemingly unrelated regression (WSUR). After fixing the weighting functions in steps three and four, we refitted the equations in steps one and two using the MODEL procedure in SAS ® .
According to Parresol (1999Parresol ( , 2001, the β estimation is obtained by minimizing the residual sum of squares: is the partial derivative matrix, T is the number of observations, K is the number of parameters in the model, and ̂(̂) is the variance of mean estimated observation i.
The weighted least square (WLS) estimator was defined as: where: (̂) is a diagonal matrix of weights dependent on a fixed number of parameters denoted by vector . For more details, see Parresol (1999). The system matrix of weights was written in block-diagonal form: Finally, the confidence interval of the mean estimated value was defined as: where: Cl is the confidence interval, t is the value of t distribution at 95% probability. The estimate of vector β employing SUR, which minimizes the residual sum of squares, was given by a restriction of the parameter β: where: ∑ is the covariance matrix, ⊗ is the Kronecker product, I is identity matrix, ( ) is partial derivatives for the SUR system, ∑ is the estimated covariance matrix of parameter, ̂ is the variance of SUR, M is the number of equations used, and 2̂ is estimated variance from the equation j th of SUR of the i th observation (̂).
The estimated covariance matrix of the parameter and variance of weighted-nonlinear seemingly unrelated regressions were calculated as follows: where: ∑ is the covariance matrix of parameter in WSUR, ̂ is the variance of WSUR equations, ̂ is the matrix of weights.
The confidence interval of the mean estimated value at 95% probability for SUR and WSUR was defined as follows: For more details, see Parresol (2001). Biological consistency was assessed in relation to the additivity of the biomass components with regard to total biomass and was only tested for the equations fitted by independent models in steps one and three. If the condition (w leaves + w brances + w stem ) -w total ≠ 0 was not satisfied, then the biomass estimates were considered to lack biological consistency.
The efficiency of the estimators of the biomass equations fitted by all steps was assessed in relation to the precision of the calculated confidence intervals as follows (Parresol, 2001) where: is the precision of the confidence interval for observation , is the confidence interval for observation i, is the observed value for observation i, and is the estimated values for observation i. The effect of the lack of additivity and efficiency on the estimates of biomass components and total biomass at the species level were evaluated with the forest inventory data. The biomass components and total biomass were estimated considering the last forest inventory assessment and extrapolated to hectare scale. The estimates were compared using a Chi-square test, in which the null hypothesis assumes that the values estimated by each step are equal at 95% probability (Behling et al., 2018).

Results
Mean bd values ranged between 7.5 and 9.0 mm among eucalyptus genotypes. The minimum and maximum bd values were 1.06 and 29.32 mm, respectively, with fewer branches present with either small or large bd. The relationship between bd and crown biomass followed similar patterns among eucalyptus genotypes, and the largest branches presented the largest biomass values (Figure 1). The R² adj and RMSE of the fitted exponential models were 0.92 ± 0.07 and 9.45 g ± 6.58, respectively, for branch biomass and 0.88 ± 0.08 and 12.96 g ± 8.67, respectively, for leaf biomass. The summary statistics for d, h, and the biomass components of all sample trees are shown in Table 1. The coefficients of variation (CV) were higher for biomass components (i.e., 34.62, 36.36, 33.32, and 31.17 for the stem, branch, leaf, and total biomass, respectively) compared those of d (17.33) or h (10.53). The Pearson correlations between the biomass components and d or h were positive (0.68 -0.92) and significant at 1%. However, the correlation coefficients for biomass and d were higher than those for biomass and h. With regard to the relationships among biomass components, values greater than 0.90 were found, especially between stem and total biomass and branch and leaf biomass (data not shown). Biomass partitioning was similar among eucalyptus species. Individual stem biomass represented 50% of the total biomass, ranging between 31 -68%, 38 -60%, and 34 -62%, for E. globulus, E. nitens, and E. nitens × E. globulus, respectively. Individual branch biomass represented 20% of total biomass and ranging between 12 -27%, 14 -28%, and 14 -26% for E. globulus, E. nitens, and E. nitens × E. globulus, respectively. Individual leaf biomass represented 30% of the total biomass, and the largest variation among species ranged from 19 -42%, 23 -33%, and 22 -39%, for E. globulus, E. nitens, and E. nitens × E. globulus, respectively. Leaf biomass represented approximately 50% of crown biomass.
For the equations fitted by GLS, we observed that most equations underestimated individual biomass (Table 2). Branch and leaf biomass presented the lowest worst values of R² adj , the highest RMSE, and the highest CV because these variables presented the weakest correlations with d or h when compared to those of stem and total biomass.
The selection of the best model considered the statistical analysis in Table 2 and model parsimony for SUR fitting. The best model was determined by considering the variables d and h for stem biomass, d²h for branch and total biomass and d for leaf biomass.
A pattern was detected in the increase of d and residual values, indicating that the residues fitted by SUR were not distributed in the same manner between tree sizes ( Figure 2). As such, variability was present in the error model structure. To deal with this variation, the weight function was considered. The same behavior was observed when we compared GLS to WLS (data not shown). The weight functions used to correct the error variability used by WLS and WSUR are presented in Table 3 and show an improvement in the pattern of the residuals (Figure 3).  The first hypothesis outlined in this paper was not supported, which suggests that systems models provide better biomass estimates than those of independent models for young eucalyptus stands. The estimated coefficients and their standard deviations indicated that similar variation was present among the four methods used in this study (Table 4). Large differences were found with regard to the interceptions of the equations (β 0 ). Considering the performance of equations, we can observe that there were similar RMSE, R² adj , and CV values for the methods employed. However, there were differences in MRES using WSUR, which overestimated the biomass values for all biomass components and standard deviations of the parameters. It was possible to observe a decrease in standard deviation (SD) when a weight function was used for the variability parameter estimation.
The biomass estimates at the hectare level obtained by the GLS, WLS, SUR, and WSUR methods not significantly different when compared with a Chi-square test (p > 0.05). The mean values that were estimated according to the aforementioned methods were similar among biomass components, indicating that there was no difference with regard to the performance of the estimations (Figure 4). Biological consistency, according to the restriction that the sum of the biomass components must be equal to the total biomass, was not observed with the equations fitted by the GLS and WLS methods since ( + ℎ + ) − ≠ 0. The non-additivity discrepancy of the equations fitted by GLS and WLS at the tree level ranged from -1.68 -8.28% and -1.76 -7.05%. Biological inconsistency was also observed at the hectare level. The nonadditivity discrepancy of the equations ranged between -0.87 -9.98% and -0.80 -8.46% for GLS and WLS, respectively, resulting in mean differences of 0.004 t ha -1 and 0.005 t ha -1 , respectively. Regardless, these differences were not very large, the GLS and WLS equations were considered to be biological inconsistent.
Using the equations fitted by GLS, the precision of the estimates ranged from 0.56 -24.56%, 0.38 -34.08%, 0.02 -26.07%, and 0.22 -21.87% for stem, branch, leaf, and total biomass, respectively. For the equations fitted by WLS, the values for stem, branch, leaf, and total biomass ranged from 0.38 -20.72%, 0.20 -27.95%, 0.25 -17.55%, and 0.13 -17.49%, respectively. The biomass estimates of the components and total biomass using the equations fitted by WLS resulted in narrow confidence intervals and were more efficient than GLS.
The equations fitted by SUR resulted in estimate precisions ranging from 0.09 -8.13% (stems), 0.13 -29.99% (branches), 0.67 -23.67% (leaves), and 0.16 -13.51% (total biomass). As expected, SUR was more efficient when estimating total biomass; however, SUR did not perform as well when estimating branch and leaf biomass when compared to those of WLS, indicating that biological consistency did not guarantee the most efficient estimators. Therefore, the second hypothesis outlined in this paper was also not supported. Biological consistency is not always related to parameter efficiency in biomass models for young eucalyptus stands since SUR showed biological consistency; however, it does not show higher estimation efficiency for crown biomass components.
For WSUR, precision ranged from 0.02 -9.12%, 1.53 -11.43%, 0.11 -13.06%, and 0.55 -4.55% for stem, branch, leaf, and total biomass, respectively. The WUSR provided the most efficient estimators for all biomass components, resulting in a better model approaches when compared to those of GLS, WLS, and SUR. As shown in Figure 5, narrower confidence intervals were present with WSUR when compared to those of GLS, especially for branch and leaf biomass. Figure 5. Differences in confidence intervals (red lines) between branch, leaf, stem, and total biomass estimates using GLS (a, c, e, g) and The mean gain efficiency in the confidence intervals of the equations fitted by WSUR was calculated by , and the values were 33. 05, 47.34, 34.64, and 64.71% for stem, branch, leaf, and total biomass, respectively. The efficiencies using WSUR were positive and sometimes greater than 50% when compared to those of the GLS, WLS, and SUR methods ( Figure 6). Figure 6. Efficiency of the confidence intervals for the component and total biomass estimates fitted by WUSR compared to those of GLS, WLS, and SUR.

Discussion
The worst statistical evaluation of the power functions was found for the crown biomass components, while the best was observed for stem and total biomass. Similar results have also been found in other studies, such as those by António et al. (2007) with E. globulus in Portugal and Viera et al. (2017) with Eucalyptus saligna Sm. and Eucalyptus grandis Hill ex Maiden × Eucalyptus urophylla S. T. Blake crosses in Brazil. The reasons for this are two-fold. Firstly, the reason is that the crown biomass components, leaves, and branches were correlated to a lesser degree with d and h. Secondly, leaf and branch biomass were more variable and depend on age, temperature, relative humidity, soil water content, and the species or genotype, as highlighted by António et al. (2007) and Zhang et al. (2015). The relationships between crown biomass and d or h improves as forest stands age (Behling et al., 2018).
The performance of the estimators in biomass models using the GLS, WLS, SUR , and WSUR methods were similar with regard to the quality of the statistical fitting. However, some statistics obtained by the equation fitted using SUR were better than those obtained with GLS while others were not. For WSUR, all statistics obtained by the fitted models were worse when compared to those of the other approaches. This is an effect of the flexibility of the estimator to meet the requirement of the additivity of biomass components in the SUR and WSUR methods, and it was also observed by Behling et al. (2018). Apart from this, the performance of the estimators did not affect the predictions, since the estimates of the biomass components and total biomass did not differ at either tree or hectare levels among the methods according the results of the Chi-square test. In conclusion, the GLS, WLS, SUR, and WSUR methods are equivalent with regard to performance.
Modeling biomass components and total biomass with GLS has been widely used in forest science; however, this approach does not guarantee additivity and most often results in larger errors when expanded to larger forest inventory areas (Sanquetta et al., 2015). The condition of biological consistency was not satisfied for the equations fitted through GLS and WLS, and non-additivity was observed at the tree and hectare levels, resulting in positive and negative results. The biological inconsistency percentage was higher for GLS than for WLS, mainly due to the weight function for the corrected error variance structure. In addition, biological inconsistency was also higher at tree level compared to that at hectare level given that at tree level, positive and negative results do not cancel each other out, which results in greater inconsistency when compared to that at either the plot or hectare levels, as highlighted by Behling et al. (2018).
Most of the models used in biomass estimates for planted forests were not fitted considering additivity. Given that these equations did not achieve biological consistency, it is important to take precaution when using them, especially when considering the variance of total biomass, which is a function of the variances and covariances of the estimates of the other biomass components. To deal with the lack of additivity, other studies have used system equations with restrictions on model coefficients according to total biomass (Zhao et al., 2019). In this study, the SUR method guaranteed additivity and resulted in a reduction in the differences with regard to observed and estimated total biomass due to the correlations between biomass components, coefficient restrictions, and allometric relationships, which ensured biological consistency (Wang et al., 2018). However, GLS and SUR presented residual heteroscedasticity, as has been found in other studies (Zhao et al., 2015;Vonderach et al., 2018). As the d and d²h values increased, the absolute value of the residues increased for all biomass components (leaf, branch, stem) and total biomass. If this is not corrected, the variance among larger trees would be underestimated and that among smaller trees would be overestimated (Parresol, 2001).
A lack of consideration with regard to heteroscedasticity could lead to non-reliable parameter estimates of the biomass components, with increased variance and wide confidence intervals, which decreases estimator efficiency, as was found in this study. The correction of heteroscedasticity is accomplished by assigning a weight factor assigned to each tree (Zhang, Peng, Huang, & Zeng, 2016). Estimator efficiency results in a process with minimum variance, as was found for the WLS and WSUR methods, and results in narrow confidence intervals, especially for the crown biomass components. The WSUR improved the parameter efficiency and considered all the benefits of the simultaneous system of equations, although for some components, statistical evaluations were slightly worse than those for GLS and SUR. Nonetheless, the confidence intervals were better for WSUR compared to those of the other methods, showing that the variance in the biomass estimates was lower.
Estimator efficiency was not always related to biological consistency since WLS showed biological inconsistency; however, WLS presented greater efficiency estimators than that of SUR. Although the crown biomass components are difficult to model, it is necessary to use an estimator that results more accurate estimates, primarily because the leaves are raw material that ensure energy production and are important components for ecophysiological modeling (Binkley et al., 2004). Zhao et al. (2015) highlighted that accurate modeling of young tree growth is necessary to understand forest development, schedule silvicultural treatments, and identify environmental variables that could influence initial biomass production. The biological consistency ensured by the additive equations is a desirable property in modeling biomass and affects the estimations of forest biomass. In addition to biomass additivity, a correction of heteroscedastic residues is established by the weight function of GLS methods and ensures that a reliable estimate of biomass with statistical efficiency is produced. Sanquetta et al. (2015) observed that the use of SUR had positive effects on confidence intervals. This was effect associated with the efficiency of the parameters obtained by applying WSUR in the present study, which made this approach a powerful tool for eucalyptus biomass models.
The precision of the confidence intervals of the equations used with the WSUR approach generated estimators that are highly efficient, constituting an important alternative to forest ecophysiological models that estimate biomass. These approaches can be easily implemented in SAS using the MODEL procedure, or in other programs, despite the mathematical complexity associated with WSUR with regard to the estimated biomass values. Future studies should focus on the implementation of these approaches with eucalyptus forest stands of different ages and in different to evaluated the validity of the ecophysiology models, considering the relationships among dendrometric variables, biomass, and environmental management strategies.

Conclusion
The fitted procedures of this study do not differ with regard to the performance evaluation since the system of equations do not improve the predictions of the biomass components. The estimated total biomass from the GLS and WLS methods are not equal to the sum of estimated biomass components, resulting in biological inconsistencies. The models in WLS and WSUR produces present smaller variance and narrow confidence intervals, resulting in a more efficient estimators when compared to GLS and SUR. It is recommended that the WSUR approach be used with independent equations in forest ecophysiology models, for accurate biomass estimates.