Nonlinear regression models for estimating linseed growth, with proposals for data collection

Nonlinear regression models represent an alternative way to describe plant growth. In this study, we aimed to model the growth of linseed using four methods for data collection (longitudinal, mean, random, and cross-sectional) and fitting the logistic and Von Bertalanffy nonlinear regression models. The data came from experiments conducted between 2014 and 2020 in the municipality of Curitibanos, Santa Catarina, Brazil. The study had a randomized block design, with experimental units consisting of six lines, 5.0 m long and 3.0 m wide, containing the varieties and cultivars of linseed with four replicates. We performed weekly assessments of the number of secondary stems and plant height and measured total dry mass fortnightly. After tabulation, the data were analyzed using the four methods, and the logistic and Von Bertalanffy models were fitted. The logistic model for the plant height variable exhibited the best performance using the longitudinal, mean, and cross-sectional methods. The cross-sectional method was an alternative approach that reduced the time and labor required to conduct the experiment.


Introduction
Linseed (Linum usitatissimum L.) is important in the economic, nutritional, and social context but has received little scientific attention in regions with productive potential in South America. Thus, studies aligned with basic science, such as those assessing the growth and development patterns of linseed plants, have been limited and not performed recently; the research must be renewed. Growth can be described using nonlinear models. However, for the crop under study, analyses were conducted using only linear regression models for density characteristics (Tomassoni et al., 2013; Rossi et al., 2014).
Growth curve adjustments are usually performed by adopting the longitudinal method of variable observation and measurement. In this method, measurements of the same individuals are obtained over time, which improves the data's accuracy and, consequently, the results obtained. However, this methodology requires a large amount of time for analysis since many samples are evaluated concurrently. There are also cross-sectional and random methods, in which only one individual is evaluated over time or a different individual is evaluated at each time, respectively. Cross-sectional and random methodologies were used in the fields of health, to evaluate child growth curves, and animal husbandry, to study Mangalarga Marchador horses (Souza et al., 2017; 2019).
The use of cross-sectional and random data collection methods has not been explored for the study of plant growth. Despite the previously highlighted benefits, the use of these methods may lead to a reduction in the quality of the experiment and loss of self-correction due to sampling of a smaller number of individuals. Thus, these methods should be used only due to limitations of time, financial resources, and labor and/or when experimental plots are lost. In addition, the sampled plant must be representative of the other plants in the experimental unit (Steel, Torrie, & Dickey, 1997).
Knowledge of the growth variables of agricultural crops is fundamental since it generates information that assists in planning, management, adaptability, product quality, and final productivity (Stanck, Becker, & Bosco, 2017). The growth of linseed, like that of other agricultural species, is characterized by a slow initial growth followed by an acceleration until reaching a maximum point, later tending to stabilize, thus exhibiting a sigmoidal growth response (Mischan & Pinho, 2014). Thus, the adjustment of nonlinear regression models can be essential for describing plant growth, given that they are parsimonious and include parameters with biological and practical interpretation (Sousa et al., 2014; Archontoulis & Miguez, 2015). Estimates of the parameters and critical points of the function also allow us to explain the similarity between the methods of data collection in both models and to make inferences about the species under study (Carini et al., 2020; Silva et al., 2021).
Several nonlinear regression models are used to describe the growth curves of agricultural crops: Brody, Gompertz, logistic, Richards, and Von Bertalanffy (Archontoulis & Miguez, 2015). Among these, the logistic model is widely used to represent the growth of living organisms due to the ease in adjusting the parameters and interpreting their estimates (Seber & Wild, 1989); the Von Bertalanffy model (Von Bertalanffy, 1957) has been used to describe animal growth and, more recently, in studies of plants (Lúcio et al., 2015a).
Based on the hypothesis that both nonlinear regression models can estimate the growth of linseed and that the cross-sectional and random data collection methods lead to responses similar to those observed with the longitudinal method, in this study, we aimed to model the growth of linseed using four methods of data collection (longitudinal, mean, random, and cross-sectional), making adjustments using the logistic and Von Bertalanffy nonlinear regression models. This modeling approach has the potential to assist in the interpretation of data generated with current linseed genotypes from coherently fitted methods.

Material and methods
The data come from experiments with linseed conducted in seven agricultural years, from 2014 to 2020, at the Federal University of Santa Catarina (27°16'25" S, 50°30'12" W, and 993 m altitude) in the municipality of Curitibanos, state of Santa Catarina, Brazil. The region's climate type is humid subtropical Cfb, with rainfall well distributed throughout the year, and subtropical from the thermal perspective, with an average annual rainfall of approximately 1,480 mm and an average maximum and minimum temperature of 22.0 and 12.4°C, respectively (Wrege, Steinmetz, Reisser, & Almeida, 2012). The soil is classified as Clayey Humic Cambisol, according to Embrapa (2013), originating from basalt, with a very clayey texture.
The study had a randomized block design, with treatments varying over the years, composed of local varieties (brown and golden) and Argentine cultivars of brown color (Aguará INTA and Caburé INTA) with four replicates (Table 1). The experimental units consisted of six lines, 5.0 m long and 3.0 m wide, considering an 8 m² central useful area. Seeding was performed manually, with a spacing of 2 cm between plants and 50 cm between lines in 2014; a spacing of 2 cm between plants and 35 cm between lines was adopted in later years. The management of linseed was conducted according to guidelines for agroecological cultivation of plants. The meteorological data were obtained from INMET's automatic meteorological station located at Curitibanos airport, 5 km away from the experimental area. The daily thermal sum was calculated according to Equation 1:

$$STd = T_{med} - T_b \qquad (1)$$

where STd is the daily thermal sum (°C day), Tmed is the average daily temperature calculated as the arithmetic mean of the maximum and minimum temperatures, and Tb is the base cardinal temperature. The base temperature adopted was Tb = 5.3°C (Bert, 2013). The accumulated thermal sum (STa) was calculated from emergence (Equation 2):

$$STa = \sum_{i=1}^{n} STd_i \qquad (2)$$

where n is the duration in days of the development phase.
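The thermal sum calculation can be illustrated with a minimal sketch. It assumes that the daily thermal sum is the mean of the daily maximum and minimum temperatures minus the base temperature, accumulated from emergence; the function names and the temperature values are hypothetical and are not taken from the experiments.

```python
def daily_thermal_sum(t_max, t_min, t_base=5.3):
    """Daily thermal sum (degree-days): mean daily temperature minus base."""
    t_med = (t_max + t_min) / 2.0  # arithmetic mean of max and min
    return t_med - t_base


def accumulated_thermal_sum(daily_max, daily_min, t_base=5.3):
    """Running total of the daily thermal sums, one value per day."""
    total = 0.0
    running = []
    for t_max, t_min in zip(daily_max, daily_min):
        # Assumption: negative daily values are not truncated at zero,
        # because the text does not describe any truncation rule.
        total += daily_thermal_sum(t_max, t_min, t_base)
        running.append(total)
    return running


# Hypothetical temperatures (degrees Celsius) for three days after emergence.
sta = accumulated_thermal_sum([22.0, 20.0, 18.0], [12.0, 10.0, 8.0])
print(sta)
```

The last element of the returned list is the accumulated thermal sum that serves as the time axis for the growth models.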
The growth of the linseed plants was determined using five plants per replicate marked with colored wire in the useful area, totaling 20 plants per variety or cultivar, and evaluated weekly by counting the number of secondary stems and measuring plant height (cm) using a ruler. Total dry mass (g) was determined by collecting three plants per replicate, 12 plants per treatment, fortnightly and randomly, in the useful area, excluding the marked plants, and subsequently drying them in an air oven at 65°C until a constant weight was reached.
The data obtained were organized to allow for four data collection methods: longitudinal, mean, random, and cross-sectional. For the longitudinal method, the values of the 20 marked plants of each treatment, evaluated throughout the experiment, were used. For the other methods, three other possibilities for data collection were simulated. The mean method referred to the mean of each treatment derived from the longitudinal method. The random method simulated randomness in each evaluation, using only one plant selected at random in each evaluation. For the cross-sectional method, a single marked plant was evaluated throughout the experiment (at each measurement time).
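The way the three simulated datasets can be derived from the longitudinal records is sketched below. This is an illustration under assumptions, not the procedure actually coded by the authors: the function name, the data layout (one list of heights per marked plant), the seed, and the example values are all hypothetical.

```python
import random


def build_datasets(heights, seed=0):
    """heights[p][t] is the value of marked plant p at evaluation t."""
    rng = random.Random(seed)
    n_plants = len(heights)
    n_times = len(heights[0])

    # Longitudinal: every marked plant at every evaluation.
    longitudinal = [(t, heights[p][t])
                    for p in range(n_plants) for t in range(n_times)]

    # Mean: one value per evaluation, averaged over the marked plants.
    mean = [(t, sum(heights[p][t] for p in range(n_plants)) / n_plants)
            for t in range(n_times)]

    # Random: a newly drawn plant at each evaluation.
    rand = [(t, heights[rng.randrange(n_plants)][t])
            for t in range(n_times)]

    # Cross-sectional (as defined in the text): one plant drawn once
    # and followed at every evaluation.
    chosen = rng.randrange(n_plants)
    cross = [(t, heights[chosen][t]) for t in range(n_times)]

    return {"longitudinal": longitudinal, "mean": mean,
            "random": rand, "cross_sectional": cross}


# Hypothetical heights (cm) of two plants at three evaluations.
datasets = build_datasets([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]])
print({name: len(rows) for name, rows in datasets.items()})
```

The sketch makes the difference in sample size explicit: the longitudinal dataset has one row per plant and evaluation, whereas the other three have one row per evaluation.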
For each treatment, the collected variables were fitted to the logistic (Seber & Wild, 1989) and Von Bertalanffy (Von Bertalanffy, 1957) nonlinear models (Equations 3 and 4), where Y is the measured variable; x is time (STa, after emergence); β1 is the horizontal asymptote; β2 reflects the distance between the initial value (observation) and the asymptote; β3 is associated with the growth rate; and ε is the experimental error. The adopted allometric coefficient for the Von Bertalanffy model, which is directly linked to the development pattern, was 3/4. Parameter estimates were obtained by the least squares method using the Gauss-Newton iterative method (Bard, 1974). This procedure was performed using the nls function in R software. After the models were fitted, the assumptions were tested by applying the Shapiro-Wilk, Breusch-Pagan, and Durbin-Watson tests to verify residual normality, homoscedasticity, and independence, respectively. The lmtest and car packages in R software were used to test the homoscedasticity of variances and residual independence, respectively. Because the assumptions were violated, bootstrap resampling estimation was implemented using the nlsBoot function from the nlstools package in R software.
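A minimal sketch of Gauss-Newton least squares for a three-parameter logistic curve is given below. Several points are assumptions rather than the authors' implementation: the analysis was run with nls in R, whereas this sketch uses plain Python; the logistic form β1 / (1 + exp(β2 − β3·x)) and the generalized Von Bertalanffy form β1·[1 − β2·exp(−β3·x)]^(1/(1 − m)) with m = 3/4 are common parameterizations consistent with the parameter descriptions, but the exact equations used in the study may differ; the step-halving safeguard, the function names, and the noise-free synthetic data are illustrative.

```python
import math


def logistic(x, b1, b2, b3):
    """Assumed logistic form: b1 / (1 + exp(b2 - b3 * x))."""
    return b1 / (1.0 + math.exp(min(b2 - b3 * x, 700.0)))


def von_bertalanffy(x, b1, b2, b3, m=0.75):
    """Assumed generalized form with allometric coefficient m."""
    return b1 * (1.0 - b2 * math.exp(-b3 * x)) ** (1.0 / (1.0 - m))


def solve3(a, b):
    """Solve a 3x3 linear system by elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (m[r][3] - s) / m[r][r]
    return x


def sse(xs, ys, beta):
    """Sum of squared residuals of the logistic curve."""
    return sum((y - logistic(x, *beta)) ** 2 for x, y in zip(xs, ys))


def gauss_newton_logistic(xs, ys, start, max_iter=100, tol=1e-10):
    beta = list(start)
    current = sse(xs, ys, beta)
    for _ in range(max_iter):
        b1, b2, b3 = beta
        jtj = [[0.0] * 3 for _ in range(3)]
        jtr = [0.0] * 3
        for x, y in zip(xs, ys):
            e = math.exp(min(b2 - b3 * x, 700.0))
            d = 1.0 + e
            # Partial derivatives of the curve with respect to b1, b2, b3.
            grad = [1.0 / d, -b1 * e / (d * d), b1 * e * x / (d * d)]
            r = y - b1 / d
            for i in range(3):
                jtr[i] += grad[i] * r
                for j in range(3):
                    jtj[i][j] += grad[i] * grad[j]
        step = solve3(jtj, jtr)  # normal equations (J'J) step = J'r
        factor, improved = 1.0, False
        while factor > 1e-8:  # halve the step until the SSE decreases
            trial = [b + factor * s for b, s in zip(beta, step)]
            trial_sse = sse(xs, ys, trial)
            if trial_sse < current:
                beta, current, improved = trial, trial_sse, True
                break
            factor /= 2.0
        if not improved or max(abs(factor * s) for s in step) < tol:
            break
    return beta, current


# Synthetic, noise-free heights generated from known parameters.
xs = [100.0 * i for i in range(16)]
ys = [logistic(x, 80.0, 4.0, 0.006) for x in xs]
estimate, final_sse = gauss_newton_logistic(xs, ys, start=(70.0, 3.5, 0.005))
print([round(b, 4) for b in estimate])
```

Starting values matter for Gauss-Newton; here they are placed near the generating parameters, which is an idealization compared with real field data.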
Five adjustment quality evaluators were used: the Akaike information criterion (AIC) (Akaike, 1974), Bayesian information criterion (BIC) (Schwarz, 1978), adjusted coefficient of determination (R²aj), fitted standard error (ASE), and residual standard deviation (RSD). The closest linear approximation of the model was obtained with values below 0.3 for intrinsic nonlinearity (cI) and below 1.0 for parametric nonlinearity (cθ) (Fernandes, Muniz, Pereira, Muniz, & Muianga, 2015). The statistical significance of cI and cθ was evaluated by comparing their values with 1/(2√F), where F is the critical value. We used the rms.curv function in the MASS package of R software to perform this test.
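Some of these fit-quality indices can be computed from observed and fitted values as sketched below. The definitions are assumptions, since the text does not give formulas: AIC and BIC use the Gaussian log-likelihood with the error variance counted as one extra parameter (the convention of R's AIC for nls fits), the adjusted R² uses the (n − 1)/(n − p) correction, and the residual standard deviation divides by n − p. The function name and the example values are hypothetical.

```python
import math


def fit_quality(observed, fitted, n_params):
    """Adjusted R2, residual standard deviation, AIC, and BIC."""
    n = len(observed)
    rss = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    mean_obs = sum(observed) / n
    tss = sum((o - mean_obs) ** 2 for o in observed)

    r2 = 1.0 - rss / tss
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_params)
    rsd = math.sqrt(rss / (n - n_params))

    # Gaussian log-likelihood at the maximum-likelihood variance rss / n.
    log_lik = -0.5 * n * (math.log(2.0 * math.pi) + math.log(rss / n) + 1.0)
    k = n_params + 1  # model parameters plus the error variance
    aic = 2.0 * k - 2.0 * log_lik
    bic = k * math.log(n) - 2.0 * log_lik

    return {"r2_adj": r2_adj, "rsd": rsd, "aic": aic, "bic": bic}


# Hypothetical observed and fitted values for a three-parameter model.
quality = fit_quality([1.0, 2.0, 3.0, 4.0, 5.0],
                      [1.1, 1.9, 3.2, 3.9, 4.9], n_params=3)
print(quality)
```

Because AIC and BIC depend on the number of observations, values obtained from datasets of different sizes are not directly comparable.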

Results and discussion
When adjusting the logistic and Von Bertalanffy nonlinear regression models, considering the four data collection methods, the same response pattern was observed for the variables plant height, number of secondary stems, and dry mass, regardless of the planned treatments (cultivars) and growing years (2014 to 2020). Thus, only a part of the results will be presented since the interpretations must be carried out similarly for the other variables.
The data collection methods resulted in different numbers of samples. Therefore, some results were directly influenced, such as the adjustment quality criteria AIC, BIC, and R²aj (Akaike, 1974; Schwarz, 1978). In the longitudinal and cross-sectional methods, evaluations over time were found to be dependent on previous evaluations. Thus, the assumptions of homoscedasticity and residual independence were violated (Table 2). Normality was not met in 75% of the methods when working with the logistic model and in 50% with the Von Bertalanffy model. Regardless of the adjusted model, the longitudinal and random methods exhibited nonnormality, and the cross-sectional method met this assumption. The reliability of the tests was affected because the assumptions of the model were not met for most of the evaluated situations, regardless of the model and method used, since such violations induce estimates with low accuracy and a lower degree of adjustment of the model (Muniz, Nascimento, & Fernandes, 2017). Data transformation did not efficiently overcome the problems related to noncompliance with assumptions; therefore, the confidence intervals of the parameters were estimated by bootstrap resampling by the empirical methodology (Diel et al., 2019). The reliability of the statistical model and the practicability of its use hold only when the assumptions are met (Souza et al., 2017).
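The dependence between successive evaluations can be illustrated with the Durbin-Watson statistic, sketched here in plain Python. Only the statistic is computed; the p values reported in Table 2 come from the R implementation, which this sketch does not reproduce. The function name and the residual sequences are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 suggests no first-order autocorrelation,
    near 0 positive autocorrelation, near 4 negative autocorrelation."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(r ** 2 for r in residuals)
    return numerator / denominator


# Hypothetical residuals ordered by evaluation time.
alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])  # sign flips every step
persistent = durbin_watson([1.0, 1.0, 1.0, 1.0])     # same sign throughout
print(alternating, persistent)
```

Residuals from repeated measurements of the same plant tend to keep the same sign across consecutive evaluations, which pushes the statistic toward zero.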
The quality of the evaluator adjustments should be considered when choosing the most appropriate model (Muianga, Muniz, Nascimento, Fernandes, & Savian, 2016; Sari et al., 2018; Sari et al., 2019a). In general, the values of R²aj were high, close to 1, indicating that the data provided good adjustment quality for both nonlinear models (Table 3). Higher values of R²aj and lower values of RSE, RSD, AIC, and BIC indicate better adjustments (Morais, Ribeiro, Veloso, & Veloso, 2020). These results were similar for both the logistic and Von Bertalanffy nonlinear regression models.
The low variability in the adjustment quality indices (Table 3) may cause doubt when choosing the model that best represents the effect and response obtained. This result was also observed by Lúcio, Nunes, and Rego (2016) when estimating the production of pod beans. Using the same models as the present study, the authors obtained similar adjustment quality. Sari et al. (2019a) concluded that these criteria alone cannot assess parameter bias and may select incorrect models to describe biological growth. A model represents plant growth adequately when it is close to linear, as indicated by intrinsic (cI) and parametric (cθ) nonlinearity values below 0.3 and 1.0, respectively (Fernandes et al., 2015). Sari, Lúcio, Santana, and Savian (2019b) described the importance of nonlinearity measures to evaluate model adjustment quality to describe tomato growth. Different adjustments were implemented based on the nonlinearity measures. The percentages of acceptance are presented in Table 4. The logistic model exhibited the highest percentages of adjustment for the variables plant height, number of secondary stems, and total dry mass in the different treatments, indicating that the model has a good linear approximation and that its parameters are reliable.
The logistic nonlinear regression model showed adequate adjustment for the variable plant height in all data collection methods, years, varieties, and cultivars, with some exceptions only for the random and cross-sectional methods (Table 4). The cross-sectional method obtained similar results to the longitudinal and mean methods; that is, the use of the same plant evaluated throughout the experiment can be an alternative since it presents similar parameter estimates and adjustment quality indices.
The variable total dry mass for the Caburé INTA and golden varieties presented adjustments for the longitudinal and mean methods, while the Aguará INTA cultivar presented adjustment in all methods, and the brown variety did not obtain adjustment (Table 4). For the variable number of secondary stems, there were good adjustments for the longitudinal and mean methods in the cultivars Aguará INTA (50%) and Caburé INTA (50%) and golden variety (100%), while the brown variety showed 100% adjustment in the four data collection methods. This divergence between the adjusted methods for each variety is associated with differences in data variability. For a model to be fitted, it is essential that the mathematical expression faithfully represents the reality of the dataset. In the brown variety, variability between observations is high, a feature that makes adjusting models difficult, especially for conditions that enhance variability in the collection of observations, which is the case with the random method.
When applied to the variable plant height, the Von Bertalanffy nonlinear regression model obtained variable adjustments between the data collection methods and fitted 100% in the longitudinal method for the cultivars Aguará INTA and Caburé INTA and golden variety and in the mean method for the cultivar Aguará INTA (Table 4). There was no adjustment in any of the data collection methods for the total dry mass, and for the number of secondary stems, there was adjustment only in the cultivar Caburé INTA in the mean collection method (33.33%). Thus, this model is not suitable for describing the growth of oilseed flax since its parameter estimates are far from a linear approximation. Similar results were obtained by Diel et al. (2019), who used the Von Bertalanffy model and obtained high nonlinearity values, indicating low efficiency and accuracy in the description of strawberry production data. There was a difference between the longitudinal and mean methods and the cross-sectional method in the logistic nonlinear regression model in the Aguará INTA cultivar for plant height (Figure 1) since the confidence intervals did not overlap for parameter β1 and critical points IP, MDP, and ADP. In addition, the random method was not reliable due to greater variability in estimates and wider confidence intervals. However, the cross-sectional method can be applied to model the growth curves of the variables plant height and total dry mass of linseed using the nonlinear logistic model.
Acta Scientiarum. Agronomy, v. 46, e65771, 2024
In general, the confidence intervals of the logistic model were tighter than those of Von Bertalanffy in all conditions evaluated (exemplified in Figures 1 and 2). Thus, the Von Bertalanffy model was inadequate for describing the growth of the variables plant height, number of secondary stems, and total dry mass, corroborating studies already conducted with other crops with multiple harvests (Sari et al., 2018; 2019a; 2019b).
Confidence intervals were constructed to compare data collection methods using parameter estimates and critical points obtained by bootstrap resampling, which were used to verify the equivalence of random and cross-sectional methods with longitudinal ones. Figures 1 and 2 show these intervals for only one condition, representing the other results obtained in the different measured variables, years of experimentation, varieties, and cultivars evaluated.
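The comparison of methods through interval overlap can be sketched as follows. The percentile rule used to extract the interval from the bootstrap replicates is an assumption, as are the function names and the replicate values; the actual intervals in the study were produced with nlsBoot in R.

```python
import math


def percentile_interval(replicates, level=0.95):
    """Percentile confidence interval from bootstrap replicates."""
    ordered = sorted(replicates)
    n = len(ordered)
    alpha = (1.0 - level) / 2.0
    # Conservative rule: round the lower index down and the upper index up.
    lower = ordered[int(math.floor(alpha * (n - 1)))]
    upper = ordered[int(math.ceil((1.0 - alpha) * (n - 1)))]
    return lower, upper


def intervals_overlap(a, b):
    """True when two closed intervals (low, high) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical bootstrap replicates of one parameter under one method.
interval = percentile_interval(list(range(1, 101)))
print(interval)
print(intervals_overlap(interval, (90, 120)))
print(intervals_overlap(interval, (99, 120)))
```

Under this reading, two data collection methods are treated as equivalent for a parameter or critical point when their intervals overlap, and as different when they do not.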
The results of this study indicated that the height description curves of linseed plants were very similar for the logistic and Von Bertalanffy nonlinear regression models (Figure 3). Therefore, the quality of adjustment of the models should be considered when using or recommending a model that represents the real response of the variables over time. Thus, the logistic nonlinear regression model had high descriptive capacity for the evaluated variables and works in several crops, such as Italian zucchini, peppers, cherry tomatoes (Lúcio et al., 2015a; Lúcio, Nunes, & Rego, 2015b), strawberry (Diel et al., 2019; 2020b), tomato (Sari et al., 2019a; 2019b), and biquinho pepper (Diel et al., 2020a). There was an increase in asymptotic growth in both models in the estimates of the parameter β1. However, the logistic model presented values closer to reality (Figure 3). Low values were found regarding the growth rate estimates (β3), especially when fitted to the Von Bertalanffy model due to the model reaching the inflection point slightly earlier than the logistic model. Thus, the estimates of the parameter β2 were always lower in this model.
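The critical points of a logistic curve follow from its parameters, as sketched below. The sketch assumes the parameterization β1 / (1 + exp(β2 − β3·x)) and reads IP, MDP, and ADP as the inflection, maximum deceleration, and asymptotic deceleration points, obtained where the second, third, and fourth derivatives vanish; both the parameterization and the expansion of the abbreviations are assumptions, and the parameter values are hypothetical.

```python
import math


def logistic_critical_points(b1, b2, b3):
    """Return (x, y) for IP, MDP, and ADP of b1 / (1 + exp(b2 - b3 * x))."""
    def height(x):
        return b1 / (1.0 + math.exp(b2 - b3 * x))

    k_mdp = math.log(2.0 + math.sqrt(3.0))        # about 1.3170
    k_adp = math.log(5.0 + 2.0 * math.sqrt(6.0))  # about 2.2924

    x_ip = b2 / b3             # second derivative equals zero
    x_mdp = (b2 + k_mdp) / b3  # third derivative equals zero (upper root)
    x_adp = (b2 + k_adp) / b3  # fourth derivative equals zero (upper root)

    return {"IP": (x_ip, height(x_ip)),
            "MDP": (x_mdp, height(x_mdp)),
            "ADP": (x_adp, height(x_adp))}


# Hypothetical estimates: asymptote 80 cm, time axis in degree-days.
points = logistic_critical_points(80.0, 4.0, 0.006)
for name, (x, y) in points.items():
    print(name, round(x, 1), round(y, 2))
```

In this parameterization the curve reaches half of the asymptote at the inflection point, so differences in β1 between data collection methods carry over directly to the heights at the critical points.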
The longitudinal data collection method is characterized by a large sample size, which tends to lead to high experimental accuracy. However, it requires more time and resources for variable measurements. This method is considered a standard. Any method that is equivalent to this would be an advantage in terms of resource savings, precision, and quality of adjustment, without losing the ability to interpret and discuss practical estimates of the parameters of the models and their critical points. According to Fernandes et al. (2015), these benefits are the same for describing the growth of crossbred rabbits and can be recommended when rapid measurement is needed.
For the longitudinal, mean, and cross-sectional data collection methods, the logistic nonlinear regression model presented the highest values of R²aj and the lowest values of RSE, RSD, AIC, BIC, and intrinsic and parametric nonlinearity; the closer a nonlinear model is to the linear model, the greater the accuracy of its estimates.

Conclusion
For oilseed flax, the nonlinear logistic regression model showed better fit-quality indices compared to Von Bertalanffy for the variables plant height, number of secondary stems, and total dry mass. The quality of fit of the models by the cross-sectional data collection method was similar to that of the longitudinal method, thus standing out as an applicable alternative for tests with loss of experimental units and reduced availability of manpower, space, time, and/or financial resources.

Figure 3. Growth curves for the variable plant height (cm) for linseed in the cultivar Aguará INTA: logistic nonlinear regression model (A) and its critical points (C), and Von Bertalanffy nonlinear regression model (B) and its critical points (D), 2016 (April 26th, 2016).

Table 2. P values for the tests of normality (Shapiro-Wilk, SW), homoscedasticity (Breusch-Pagan, BP), and error independence (Durbin-Watson, DW) for the nonlinear logistic and Von Bertalanffy models for the four data collection methods.

Table 3. Adjustment quality indices: fitted coefficient of determination (R²aj), random standard error (RSE), adjustment standard deviation (ASD), Akaike information criterion (AIC), and Bayesian information criterion (BIC) for the nonlinear logistic and Von Bertalanffy models for the four data collection methods for linseed plant height of cultivar Aguará INTA, 2016 (April 26th, 2016).

Table 4. Percentage of adjustments for intrinsic (cI) and parametric (cθ) nonlinearity in the nonlinear logistic and Von Bertalanffy models in the cultivars Aguará INTA and Caburé INTA and golden and brown varieties for the four data collection methods concerning the variables plant height, number of secondary stems, and total dry mass for linseed.