Nonlinear models applied to seed germination of Rhipsalis cereuscula

The objective of this analysis was to fit germination data of Rhipsalis cereuscula Haw seeds to the Weibull model with three parameters using Frequentist and Bayesian methods. In the Bayesian analysis, five prior distributions were compared. The parameter estimates from the Frequentist method were similar to the Bayesian estimates under the following non-informative prior distributions for the parameter vector: gamma (10³, 10³) in the model M1, normal (0, 10⁶) in the model M2, uniform (0, Lsup) in the model M3, exp (μ) in the model M4 and lognormal (μ, 10⁶) in the model M5. However, to achieve convergence in the models M4 and M5, we set μ to the estimates from the Frequentist approach. The best models fitted by the Bayesian method were M1 and M3. The adequacy of these models was based on their advantages over the Frequentist method, such as the reduced computational effort and the possibility of comparing models.


Introduction
Nonlinear regression is a statistical technique in which a nonlinear mathematical model describes the relationship of response variables to predictor variables. In general, a nonlinear model is y = η(t, β) + e, where η(t, β) is a function that is nonlinear in at least one parameter, β is a vector of p unknown parameters, t is the predictor variable and e is a random error with normal distribution, zero mean and variance σ² (e ~ N(0, σ²)).
In seed germination studies, the function η(t, β) represents the number or the proportion of germinated seeds following a growth curve. This component is deterministic, and the equations usually adopted for it are the asymptotic exponential, logistic, Gompertz and Weibull. Although such equations are empirical representations of the biological mechanism, they permit a biological interpretation of the parameters. However, the Frequentist approach has very often been applied to this type of data, and this application has biased the responses, resulting in inefficient estimates and therefore inappropriate conclusions.
Ratkowsky (1983, 1990) pointed out that the parameter estimators of nonlinear models do not have the same properties as those of linear models. However, close-to-linear models have similar properties asymptotically, and a model can therefore be reparameterized to behave like a linear one. The spread of nonlinear models has led some authors to investigate tests to evaluate the degree of nonlinearity of a model. O'Brien (2008) reported some tests to detect spurious nonlinearity, and Peña and Rodriguez (2005) reported a method to verify both the presence of nonlinearity and the power of the nonlinear regression.
In terms of the cumulative number of germinated seeds over chronological time, the responses can be represented by asymptotic or sigmoid curves. Currently, the parameters are usually estimated using Frequentist methods such as least squares, and conclusions are drawn without reporting nonlinearity measures or goodness-of-fit tests.
Time to radicle protrusion can be treated as a random variable when the objective is to describe the germination curve over an evaluation period. Thus, the time to seed germination can be described by a probability distribution, while the curve over time represents its cumulative distribution. Hunter et al. (1984) used the normal distribution to represent the seed germination curve. More recently, seed technologists have noticed asymmetry in the time to seed germination and proposed other probability distributions. O'Neill et al. (2004) investigated the germination of perennial ryegrass seeds and suggested the inverse normal as an alternative model to the lognormal, log-logistic, and Weibull distributions. The parameters were estimated by the maximum likelihood method, as suggested by Hunter et al. (1984) and Brain and Butler (1988), and the deviation was lowest for the inverse normal distribution, which therefore gave the best fit. Soliman et al. (2006) also estimated the parameters of the Weibull model using Frequentist and Bayesian methods to investigate the time to failure of industrial machines, and reported more accurate estimates with the Bayesian than with the maximum likelihood method.
The majority of authors consider the error, e, a continuous random variable with normal distribution, zero mean and variance σ², e ~ N(0, σ²). In such cases, and also when the errors do not follow the normal distribution, Bayesian methods have been applied to fit nonlinear models. De la Cruz-Mesia and Marshall (2003) applied a procedure for nonlinear models with errors following a continuous autoregressive process. They argued that the advantage of the procedure is the use of additional information from previous experiments, which makes it suitable for small samples because it does not rely on asymptotic theory.
The Bayesian method has also been suggested to estimate growth curves from repeated measurements, where every individual is measured several times over time. Thus, Blasco et al. (2003) used Bayesian analysis to study the effect of selection on rabbit growth curves, and Martins Filho et al. (2008) fitted a logistic model to growth data of two cultivars of common beans. We sought to investigate the Frequentist and the Bayesian approaches to fit data on the time necessary for seed germination using the Weibull distribution with three parameters.

Material and methods
Models for the responses of seed germination
A seed is considered germinated just after radicle protrusion, which indicates the presence of a normal seedling capable of developing into a normal plant under field conditions. Time to radicle protrusion t follows the Weibull distribution (WEIBULL, 1951):

$$f(t; \theta) = \frac{c}{b}\left(\frac{t}{b}\right)^{c-1}\exp\left[-\left(\frac{t}{b}\right)^{c}\right], \qquad F(t; \theta) = 1 - \exp\left[-\left(\frac{t}{b}\right)^{c}\right], \qquad t > 0, \tag{1}$$

in which f denotes the probability density function, F is the cumulative distribution function of the random variable T and θ = (b, c) is the vector of parameters.
To model the percentage of seed germination over time t, Carneiro et al. (2000) and Carneiro (1994) suggested the following Weibull curve with three parameters:

$$F(t; \theta) = M\left\{1 - \exp\left[-\left(\frac{t}{b}\right)^{c}\right]\right\}, \tag{2}$$

where M is the third parameter. The parameters of nonlinear models represent quantitative experimental responses and permit direct interpretation. For example, in model [2], M is the maximum of seed germination (BROWN, 1987; BROWN; MAYER, 1988a, b; CARNEIRO, 1994, 1996; CARNEIRO; GUEDES, 1995), b is the time to 63.21% of M and c is the spread over the time t (CARNEIRO, 1994, 1996; CARNEIRO; GUEDES, 1995).
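The interpretation of b can be checked numerically. The following is a minimal Python sketch, assuming the three-parameter curve has the form M{1 − exp[−(t/b)^c]}, which is the form consistent with b being the time to 63.21% of M. The function name and the parameter values (rounded from the frequentist estimates reported in the Application section) are used here only for illustration.

```python
import math

def weibull_germination(t, M, b, c):
    """Cumulative germination at time t for the three-parameter Weibull curve."""
    if t <= 0:
        return 0.0
    return M * (1.0 - math.exp(-((t / b) ** c)))

# Illustrative values only: M (maximum), b (time to 63.21% of M), c (spread).
M, b, c = 31.0, 429.0, 4.64

# At t = b the curve reaches 1 - exp(-1) of M, whatever the value of c.
fraction_at_b = weibull_germination(b, M, b, c) / M
print(f"fraction of M reached at t = b: {fraction_at_b:.4f}")
```

Because (t/b)^c equals 1 at t = b for any c, the fraction 1 − exp(−1) ≈ 0.6321 does not depend on the spread parameter.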

Frequentist approach
In this context, the nonlinear regression model for the total seed germination at time t_i is

$$y_i = F(t_i, \theta) + e_i, \qquad i = 1, \ldots, n. \tag{3}$$

The errors e_i are uncorrelated and y_i | θ, σ_e² ~ N(F(t_i, θ), σ_e²), whose likelihood function is:

$$L(\theta, \sigma_e^2 \mid y) = (2\pi\sigma_e^2)^{-n/2} \exp\left\{-\frac{1}{2\sigma_e^2}\sum_{i=1}^{n}\left[y_i - F(t_i, \theta)\right]^2\right\}. \tag{4}$$

The application of the frequentist analysis to nonlinear models requires prior knowledge of the model and data to suggest initial values of the parameters, from which the estimates are obtained by an iterative process. Computer routines use these initial values to search the parametric space for the values that maximize the logarithm of the likelihood function or, equivalently under normal errors, minimize the sum of squared errors.
The value of the parameter vector that minimizes the residual sum of squares has a sampling distribution close to normal with covariance matrix σ_e²(W'W)⁻¹, where W is the n × p matrix of first derivatives of F(t_i, θ) with respect to θ. The concern is the right choice of numerical algorithm to obtain the estimates. In this context, the R software has Gauss-Newton routines, and SAS Institute Inc. (2008) provides several options in proc nlin, such as Gauss, Marquardt, Newton, Gradient, and DUD, which is the default method that uses numerical estimates of the derivatives.
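The iterative process described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the proc nlin implementation: it uses a plain Gauss-Newton update with step halving, the curve M{1 − exp[−(t/b)^c]}, analytical derivatives for the rows of W, and synthetic noise-free data on an 8 h grid. All function names and the starting values are hypothetical.

```python
import math

def weibull(t, M, b, c):
    """Three-parameter Weibull germination curve F(t, theta)."""
    return M * (1.0 - math.exp(-((t / b) ** c)))

def jacobian_row(t, M, b, c):
    """One row of W: derivatives of F with respect to M, b and c."""
    u = (t / b) ** c
    e = math.exp(-u)
    return [1.0 - e, -M * e * c * u / b, M * e * u * math.log(t / b)]

def sse(theta, times, y):
    """Residual sum of squares for a parameter vector theta = (M, b, c)."""
    return sum((yi - weibull(ti, *theta)) ** 2 for ti, yi in zip(times, y))

def solve3(a, rhs):
    """Solve a 3 x 3 linear system by Gaussian elimination with pivoting."""
    aug = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, 4):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, 3))
        x[i] = (aug[i][3] - tail) / aug[i][i]
    return x

def gauss_newton(times, y, theta0, max_iter=100, tol=1e-10):
    """Gauss-Newton iterations with step halving, started at theta0."""
    theta = list(theta0)
    for _ in range(max_iter):
        w = [jacobian_row(t, *theta) for t in times]
        resid = [yi - weibull(ti, *theta) for ti, yi in zip(times, y)]
        wtw = [[sum(r[i] * r[j] for r in w) for j in range(3)] for i in range(3)]
        wtr = [sum(r[i] * e for r, e in zip(w, resid)) for i in range(3)]
        step = solve3(wtw, wtr)  # solves (W'W) step = W'r
        current, scale, accepted = sse(theta, times, y), 1.0, None
        while scale > 1e-8 and accepted is None:
            trial = [p + scale * s for p, s in zip(theta, step)]
            # Only evaluate the SSE where all parameters are positive.
            if min(trial) > 0 and sse(trial, times, y) < current:
                accepted = trial
            else:
                scale /= 2.0
        if accepted is None:  # no improving step: treat as converged
            break
        change = max(abs(p - q) for p, q in zip(accepted, theta))
        theta = accepted
        if change < tol:
            break
    return theta

# Synthetic, noise-free data on an 8 h grid (not the experimental counts).
true_theta = (31.0, 429.0, 4.64)
times = [8.0 * k for k in range(1, 93)]
y = [weibull(t, *true_theta) for t in times]

estimate = gauss_newton(times, y, theta0=(30.0, 420.0, 4.5))
print([round(v, 4) for v in estimate])
```

The sketch makes the role of the initial values visible: the routine only refines theta0, so starting values far from the solution can lead to slow convergence or failure, which is why prior knowledge of the model and data is needed.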
The statistical properties of nonlinear models, the behavior of the estimation process and the quality of asymptotic inferences for finite samples all depend on the model curvature. Measuring the degree of nonlinearity through the curvature of a nonlinear function was introduced by Bates and Watts (1980, 1988) and discussed by Ratkowsky (1983, 1990) and other authors. This curvature has two components: the intrinsic curvature (IN) and the parameter-effects curvature (PE). The parameter-effects curvature indicates the nonlinearity due to the parameterization of the model. The intrinsic curvature measures the change in the nonlinear model when the parameter values are slightly modified (BATES; WATTS, 1980, 1988; RATKOWSKY, 1990). The lower the curvatures, the better is the validity of the asymptotic inference.
Bates and Watts (1980, 1988) suggested, at a significance level α, the limit

$$\frac{1}{2\sqrt{F}}$$

to test both curvatures, where F is the quantile of Snedecor's F distribution with p and n − p degrees of freedom, p is the number of parameters and n is the sample size. Another important measure to diagnose nonlinearity is the bias of Box, which helps to identify the parameter responsible for the excess curvature. Ratkowsky (1983) suggested a limit of 1% for the relative bias, i.e., the absolute value of the ratio of the bias to the parameter estimate. These measures can be computed with the algorithm written in proc iml of SAS Institute Inc. (2008) (SOUZA, 1998).
The parameters were estimated with proc nlin in SAS Institute Inc. (2008), and the normality of the errors was checked with proc univariate. The quality of the estimates was verified with a routine written in proc iml following the recommendation of Souza (1998).

Bayesian approach
In Bayesian inference, the researcher can incorporate prior information, which is expressed as a prior distribution. This information is obtained from previous studies carried out with the same type of experiment or from sampling data. Otherwise, it can be vague, but in both cases a probability density function must be specified for every parameter in the model. The information in the sample is expressed by the likelihood function, i.e., the joint density function of the observations conditioned on the parameters. Based on Bayes' theorem, the prior is combined with the sampling information by multiplying the prior density function by the likelihood, and the product, which is proportional to the posterior density, is a function on the parametric space. For the parameters θ = (M, b, c) of the Weibull model [2], we suppose that the joint prior density function is a product of two density functions, π(θ, σ²) = π(θ)π(σ²), θ ∈ ℝ³, σ² > 0.
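The product of prior and likelihood can be written on the log scale as a sum. The Python sketch below shows one way to code the unnormalized log-posterior, assuming the curve M{1 − exp[−(t/b)^c]} with normal errors. The specific priors are assumptions chosen for illustration: independent uniform priors on M, b and c with upper limits 100, 1,000 and 10, and an inverse gamma prior on σ² with hypothetical hyperparameters a1 = a2 = 0.001. The data are synthetic.

```python
import math

def weibull(t, M, b, c):
    """Three-parameter Weibull germination curve."""
    return M * (1.0 - math.exp(-((t / b) ** c)))

def log_likelihood(theta, sigma2, times, y):
    """Normal log-likelihood of the observations given theta and sigma2."""
    ss = sum((yi - weibull(ti, *theta)) ** 2 for ti, yi in zip(times, y))
    return -0.5 * len(y) * math.log(2.0 * math.pi * sigma2) - ss / (2.0 * sigma2)

def log_prior_theta(theta, upper=(100.0, 1000.0, 10.0)):
    """Independent uniform(0, L_sup) priors; -inf outside the support."""
    if any(p <= 0.0 or p >= u for p, u in zip(theta, upper)):
        return -math.inf
    return -sum(math.log(u) for u in upper)

def log_prior_sigma2(sigma2, a1=0.001, a2=0.001):
    """Inverse gamma(a1, a2) log-density (hypothetical hyperparameters)."""
    if sigma2 <= 0.0:
        return -math.inf
    return (a1 * math.log(a2) - math.lgamma(a1)
            - (a1 + 1.0) * math.log(sigma2) - a2 / sigma2)

def log_posterior(theta, sigma2, times, y):
    """Unnormalized log-posterior: log pi(theta) + log pi(sigma2) + log L."""
    log_prior = log_prior_theta(theta) + log_prior_sigma2(sigma2)
    if log_prior == -math.inf:
        return log_prior  # skip the likelihood outside the prior support
    return log_prior + log_likelihood(theta, sigma2, times, y)

# Synthetic, noise-free data on an 8 h grid (not the experimental counts).
times = [8.0 * k for k in range(1, 93)]
y = [weibull(t, 31.0, 429.0, 4.64) for t in times]

at_truth = log_posterior((31.0, 429.0, 4.64), 1.0, times, y)
off_truth = log_posterior((25.0, 400.0, 4.0), 1.0, times, y)
outside = log_posterior((150.0, 429.0, 4.64), 1.0, times, y)
print(at_truth > off_truth, outside)
```

An MCMC sampler only needs this function up to an additive constant, which is why the normalizing constant of the posterior never has to be computed.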
De la Cruz-Mesia and Marshall (2003) suggested the following prior distributions for the parameter vector and the error variance of nonlinear models:

$$\theta \sim N_3(\mu_0, \Sigma_0), \qquad \sigma^2 \sim IG(a_1, a_2),$$

where N_3 denotes a three-dimensional normal distribution and IG is the inverse gamma distribution. Although specifying the hyperparameters μ_0, Σ_0, a_1 and a_2 can be difficult, values that make the prior distributions non-informative can be used for them. In the current proposition we consider five different non-informative prior distributions for the vector of parameters θ = (M, b, c): gamma (10³, 10³), normal (0, 10⁶), uniform (0, L_sup), exp (μ) and lognormal (μ, 10⁶), where μ will be estimated by the sample mean and L_sup will be 100 for M, 1,000 for b and 10 for c. The models will be compared by their DIC values (Deviance Information Criterion) (SPIEGELHALTER et al., 2002). Thus the models considered were:
Model M1 - Assuming non-informative gamma distributions as priors for all the parameters: M, b, c ~ gamma (10³, 10³).
Model M2 - Assuming non-informative normal distributions as priors for all the parameters: M, b, c ~ normal (0, 10⁶).
Model M3 - Assuming non-informative uniform distributions as priors for all the parameters: M, b, c ~ uniform (0, L_sup), with L_sup = 100 for M, 1,000 for b, and 10 for c.
Model M4 - Assuming non-informative exponential distributions as priors for all the parameters: M, b, c ~ exp (μ), with the mean μ estimated by the frequentist method.
Model M5 - Assuming non-informative lognormal distributions as priors for all the parameters: M, b, c ~ lognormal (μ, 10⁶), with the mean μ estimated by the frequentist method.
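Since the models are compared by the DIC, a short sketch of how this criterion is obtained from posterior draws may help. It assumes the usual definition of Spiegelhalter et al. (2002): DIC = D̄ + pD, where D̄ is the posterior mean of the deviance D = −2 log L and pD = D̄ − D(θ̄) is the effective number of parameters, with θ̄ the posterior mean. The data and the draws below are synthetic stand-ins for MCMC output, not results of this study, and the function names are hypothetical.

```python
import math
import random

def weibull(t, M, b, c):
    """Three-parameter Weibull germination curve."""
    return M * (1.0 - math.exp(-((t / b) ** c)))

def deviance(draw, times, y):
    """D = -2 log L for one draw (M, b, c, sigma2) under normal errors."""
    M, b, c, sigma2 = draw
    ss = sum((yi - weibull(ti, M, b, c)) ** 2 for ti, yi in zip(times, y))
    return len(y) * math.log(2.0 * math.pi * sigma2) + ss / sigma2

def dic(draws, times, y):
    """Return (DIC, pD, mean deviance) from a list of posterior draws."""
    d_bar = sum(deviance(d, times, y) for d in draws) / len(draws)
    post_mean = [sum(d[k] for d in draws) / len(draws) for k in range(4)]
    p_d = d_bar - deviance(post_mean, times, y)
    return d_bar + p_d, p_d, d_bar

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Synthetic data on an 8 h grid with normal noise (not the experimental counts).
times = [8.0 * k for k in range(1, 93)]
y = [weibull(t, 31.0, 429.0, 4.64) + rng.gauss(0.0, 0.5) for t in times]

# Hypothetical stand-in for MCMC output: small perturbations of fixed values.
draws = [(31.0 + rng.gauss(0.0, 0.1), 429.0 + rng.gauss(0.0, 1.0),
          4.64 + rng.gauss(0.0, 0.05), 0.25 + rng.uniform(0.0, 0.1))
         for _ in range(500)]

dic_value, p_d, d_bar = dic(draws, times, y)
print(f"DIC = {dic_value:.2f}, pD = {p_d:.2f}")
```

When the same data are fitted under different priors, the model with the smallest DIC is preferred.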
Supposing that the Weibull model can describe the data, 200,000 values will be generated for each chain, with a burn-in period of 1,000. The final sample will be composed of values selected at intervals of 20, which means a sample size of 10,000. Chain convergence will be verified with the CODA software (BEST et al., 1995) and the Heidelberger and Welch (1983) criteria. The marginal posterior distributions of all the parameters will be obtained with the BRugs package (SPIEGELHALTER et al., 1994), available in the R software.
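The effect of the burn-in and of the thinning interval on the final sample size can be illustrated with a short Python sketch. The helper name is hypothetical, and the list of integers merely stands in for sampled values. Whether the burn-in is counted within the 200,000 iterations is an assumption: if it is, 9,950 values are retained; if the 1,000 burn-in iterations are run in addition, exactly 10,000 are retained.

```python
def thin_chain(chain, burn_in, step):
    """Discard the first burn_in draws, then keep every step-th remaining draw."""
    return chain[burn_in::step]

# Integers stand in for sampled parameter values.
chain_including_burn_in = list(range(200_000))
chain_plus_burn_in = list(range(201_000))

kept_a = thin_chain(chain_including_burn_in, burn_in=1_000, step=20)
kept_b = thin_chain(chain_plus_burn_in, burn_in=1_000, step=20)
print(len(kept_a), len(kept_b))
```

Thinning reduces the autocorrelation between successive retained values at the cost of discarding most of the generated ones.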

Application
The comparison of methods was illustrated with seed germination data of Rhipsalis cereuscula Haw (Cactaceae) growing attached to trees in the Ingá Yard Conservation Reserve at Maringá town, Paraná State, Brazil. The seeds were manually collected from various fruits, manually extracted, and dried in the shade under ambient light and temperature in the Seed Laboratory of the Universidade Estadual de Maringá, Experimental Research Farm, at Iguatemi County, Paraná State, Brazil. Dried seeds were stored in open plastic containers. One hundred seeds were germinated on three sheets of germitest paper in a plastic box (11 x 11 x 5 cm) placed in a Mangelsdorf seed germinator inside a germination room, both maintained at 20°C. The data were collected at 8 h intervals, and every seed showing protrusion of the hypocotyl-radicle was counted as germinated.
The model was fitted to the number of germinated seeds, and the analyses were based on the frequentist and Bayesian approaches, assuming the nonlinear Weibull model with three parameters to describe the seed germination curve. The logarithm of the likelihood function is described in [3] and [4].

Frequentist analysis
The hypothesis of normal errors was verified using the Kolmogorov-Smirnov, Cramér-von Mises and Anderson-Darling tests, whose p-values were higher than 10%.
After fitting the Weibull model, the maximum number of germinated seeds was about 31, of which 63.21% germinated within 429 h, with a spread of 4.64 (Table 1). The diagnosis of the fit quality was based on curvature measures, bias of Box, and relative bias (Table 1) (RATKOWSKY, 1983, 1990). Considering the relative bias of Box (Table 1), the values for M and c were higher than 1%. Therefore, they are the most nonlinear parameters in this model. The limit for the curvatures is 0.8226 with p = 3 and n = 92 at 5% probability. IN and PE were lower than 0.8226, validating the process of asymptotic inference.

Bayesian analysis
For every model, the estimates reported are the mean, the standard error and the 95% credible interval (ICr 95%) of the parameters of the Weibull model for the germination of Rhipsalis cereuscula Haw.
According to the responses in Table 2, the maximum of seed germination was about 31, and 63.21% of the seed protrusion required 429 h. Therefore, the estimates from the Frequentist method are similar to those from the Bayesian method when the vector of parameters has non-informative prior distributions. To fit the models M4 and M5 we made use of μ estimated by the frequentist method, which may be a disadvantage of both models relative to M1 and M3. In the same Table 2, the model M2 has the highest value of DIC, indicating the worst fit among the five models investigated.
The frequentist method required initial parameter values in proc nlin of SAS Institute Inc. (2008), and further analysis to verify the goodness of fit of the model. In contrast, the Bayesian method required only a prior distribution, which can be non-informative, for the vector of parameters. Except for the model with the normal prior distribution, the Bayesian estimates were similar to the frequentist ones, but were obtained with less computational effort. Another advantage of the Bayesian method is the possibility of comparing several prior distributions by means of the DIC.
The results of the Bayesian modeling corroborate the findings of De la Cruz-Mesia and Marshall (2003), who suggested that this is a worthwhile procedure because it allows adding prior information based on the experience of the researcher and is also suitable for small samples, since it is not based on asymptotic theory. We also observed that the results of this seed germination experiment agree with the findings of Soliman et al. (2006), who compared the parameters of the Weibull model estimated by Bayesian and frequentist methods. As Soliman et al. (2006) concluded, the Bayes estimates obtained from this model are more accurate than the corresponding maximum likelihood estimates.

Conclusion
Modeling seed germination over the time to radicle protrusion can be done using the frequentist and Bayesian approaches because both provide close estimates, but the Bayesian inference required less computational effort. Among the five models considered, the least appropriate was the one with the normal prior distribution with zero mean and variance 10⁶.

Table 1. Frequentist estimates of Weibull parameters, and qualitative measurements of goodness of fit of data describing the seed germination of Rhipsalis cereuscula Haw at 20 °C.

Table 2. Bayesian estimates of parameters from the Weibull model to describe the seed germination of Rhipsalis cereuscula Haw at 20 °C.