Stevia rebaudiana (Bert) Bertoni: regression models with mixed effects for investigating seed germination data

We investigated regression models with mixed effects, fitted as generalized linear mixed models, to evaluate germination data from Stevia rebaudiana (Bert) Bertoni. Statistical parameters were estimated and validated with the “gamlss” package in the R software. The generalized linear mixed models followed the binomial, beta-binomial and multinomial distributions with the logit link, and explained the data with the following explanatory variables: seed germinator, plastic tray position on the tiers of shelves, illuminance conditions (light and darkness) and seed lots. We found no differences in the proportional responses among seed germinators, but we did find differences among illuminance conditions, plastic tray positions on the tiers of shelves and seed lots. The generalized Akaike information criterion (GAIC), Akaike information criterion (AIC), global deviance (GD) and Bayesian information criterion of Schwarz (BIC) indicated similar goodness of fit for the binomial and beta-binomial models. All models indicated that the position of the germination tray on the tiers of shelves and the illuminance conditions affected the proportion of normal seedlings. Germination in plastic trays on the uppermost position under fluorescent daylight lamps increased the proportion of normal seedlings of Stevia.


Introduction
Stevia rebaudiana (Bert) Bertoni is the scientific name of the herbaceous Paraguayan plant Ka'a he'ẽ, from which sweetener companies extract steviol glycosides for use in food and beverages (Lemus-Mondaca, Vega-Gálvez, Zura-Bravo, & Ah-Hen, 2012). Although bedding plants can be produced from rooted stem cuttings, this method is fragile: certified plants lose the meristematic region where vegetative buds differentiate into stems during crop establishment, and these bedding plants are often infected by common growing-medium-borne diseases, so many plants die soon after the first crop period and growing season. Thus, seeds remain the best way to produce bedding plants for establishing sustainable Stevia rebaudiana crops under field conditions.
Improving our understanding of germination responses starts with analysing data from controlled conditions, and the seed germinator is the first step in the agricultural production line for Stevia rebaudiana crops. In seed germinators, we can control light, temperature and humidity, which are important for designing cost-effective crop production systems. In contrast, seed lots are the foremost external factor influencing bedding plant production, and the literature on Stevia rebaudiana still lacks an in-depth understanding of this random variable. The percentage of seed germination of Stevia rebaudiana has usually been investigated on germitest paper in acrylic germination boxes measuring 11 × 11 × 5 cm (Induslab, Brasil). However, seed-borne diseases (Verzignassi, Vida, & Homechin, 1997) usually contaminate the germination box environment, which makes it difficult for seed analysts to discriminate the components of the test when many treatments are involved. Recent results from our laboratory support the use of plastic trays with 100 cells of 4 cm³ each, filled with hydrophilic cotton fibres, as an alternative to the conventional method (ISTA, 2013); however, illuminance is reduced on the lowest tiers of shelves because every shelf supports exactly four trays. Experiments with many treatments therefore reduce the usable capacity of every seed germinator, because the lower shelves cannot provide trays with the same illuminance as in the germination box method.
Germinating seeds develop the botanical structures essential for establishing plant populations (Carneiro, 2007). The percentage of normal seedlings is usually affected by light, temperature, water and oxygen, as well as by intrinsic factors of the seed embryos. However, environmental heterogeneity prevails in glasshouses, growth chambers (Potvin, 1993) and seed germinators, despite thermal control in germination rooms under tropical conditions. Thus, controlling these factors during seed germination and bedding plant production has been the foremost challenge faced by agronomists, seed technologists and plant growers in establishing cost-effective Stevia rebaudiana crops.
Previous studies discriminating treatments based on the germination of Stevia rebaudiana seeds have used descriptive multivariate methods (Abdullateef & Osman, 2011), analysis of variance (Abdullateef, Osman, & Zainuddin, 2015), or analysis of variance followed by Duncan's multiple range test (Simlat et al., 2016; Özyğit, Uçar, & Turgut, 2015; Uçar, Özyğit, & Turgut, 2016). However, all these analyses in seed technology treated seed lots as fixed effects, which makes it difficult to extend inferences from the estimates beyond the lots studied. Alternatively, the current statistical literature suggests generalized linear mixed models, in which the response variable is qualitative and the explanatory variables have fixed and random effects. Mead, Curnow, and Hasted (1993) modelled proportions and made inferences about them using the binomial distribution. These authors also suggested the logit link function, under the assumption that every seed in the lot has a similar germination probability. In 1972, Nelder and Wedderburn showed that regression and analysis-of-variance methods could be applied to response variables whose distributions belong to the exponential family. This allowed greater flexibility in studying the functional relationship between the mean and its linear predictor (Nelder & Baker, 1972). As generalized linear models were initially limited to fixed effects (Mead et al., 1993), the search for more flexible and less restrictive models led to the method reported by Rigby and Stasinopoulos (2005): a new class of statistical models, the generalized additive models for location, scale and shape (GAMLSS), which allows fitting an extensive family of distributions for the response variable without requiring that distribution to belong to the exponential family.
Furthermore, this method accommodates not only fixed explanatory variables in its regression structure but also random effects, with flexibility and efficiency in the statistical analysis. Currently, GAMLSS models can be fitted with the "gamlss" package in the R software (R Core Team, 2016).
The aim of the current work is to model hierarchical data from the components of the seed germination test of Stevia rebaudiana using the binomial, beta-binomial and multinomial distributions to discriminate the effects of fixed and random explanatory variables. Here, the illuminance conditions and the position of the plastic germination trays on the tiers of shelves were treated as fixed effects because they can be replicated; in contrast, seed germinators were treated as random effects because they can be randomly chosen in the germination room, and so were seed lots, because it is impossible to replicate them in space, time, or both.

The data set
We harvested two seed lots from a field plot (Figure 1a …). Next, the quality of these seeds was evaluated in the Seed Science and Technology Laboratory, where we randomly chose four seed germinators in the germination room at a constant temperature of 20 ± 1 °C. We covered two of the germinators with black plastic film (darkness), and the other two were illuminated with fluorescent daylight lamps (… 1992; Carneiro, 1996). In this experiment, we evaluated 9,600 Stevia rebaudiana seeds randomly distributed on plastic trays with 100 cells per replication. Every cell was filled with hydrophilic cotton fibres as the germination medium, had a volume of 4 cm³ and held only one mature fruit-like seed, moistened with 2 mL of distilled and deionized water applied onto the germination medium. At the Iguatemi Research Farm, this germination medium has become important for Stevia rebaudiana and other small seeds because the hydrophilic cotton fibres prevent the spread of seed-borne diseases, so the analyst can safely discriminate the components of the germination test. We also decided to use fruit-like seeds to avoid mechanical seed damage during removal of the pappus, thereby reducing another source of experimental error. Every seed germinator held 24 plastic trays: 8 on the two uppermost, 8 on the two middle and 8 on the two lowest tiers of shelves (hereafter Pos1, Pos2 and Pos3). Within every tier, the positions of the plastic trays were randomized. The experimental design can be seen in Figure 2. Ten days later, we evaluated the components of the seed germination test by counting normal seedlings, abnormal seedlings and non-germinated seeds (ISTA, 2013) based on the traits shown in Figure 3. The length of the normal seedlings in Figure 3a varies because the first and last radicle protrusions occurred on the fourth and tenth days, respectively. These responses are usual in newly domesticated species.

Generalized linear mixed models
Generalized linear mixed models (GLMMs) are statistical models describing data based on explanatory variables with fixed and random effects. The response variables are associated with statistical distributions linking the events to their respective probabilities. In the current investigation, the aim is to understand the effects of the explanatory variables on the proportions of the components of the germination test of Stevia rebaudiana seeds, taking into account random effects from seed germinators and seed lots. We fitted the proportion of normal seedlings, against abnormal seedlings together with non-germinated seeds, to the binomial and beta-binomial distributions; next, we fitted the proportions of normal seedlings, abnormal seedlings and non-germinated seeds to the multinomial distribution, following the design in Figure 4.
Figure 3. The components of the seed germination test of Stevia rebaudiana: normal (a) and abnormal seedlings (b), and non-germinated seeds (c), with the carpodium and pappus touching the hydrophilic cotton fibres at left and an immature or empty achene (fruit-like seed) at right (Carneiro, 2007).

Mixed model of the binomial regression
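Consider the independent random sample $Y_1, Y_2, \ldots, Y_n$ from the random variable $Y$, where $Y_i \sim \mathrm{Bin}(n_i, \pi_i)$. The generalized linear model with mixed effects, using the logit link function, is as follows:
$$\operatorname{logit}(\pi_i) = \log\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + b_1 z_{1i} + b_2 z_{2i} \tag{1}$$
where $\pi_i$ is the proportion of normal seedlings in the $i$-th experimental unit; $x_{1i}$ and $x_{2i}$ indicate trays in Pos2 and Pos3, and $x_{3i}$ indicates darkness (Pos1 and light being the reference levels); $z_{1i}$ and $z_{2i}$ are the covariates associated with seed germinators and seed lots; the parameters $\beta_0$, $\beta_1$, $\beta_2$ and $\beta_3$ indicate the fixed effects; and the parameters $b_1$ and $b_2$ indicate the random effects.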
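To make model (1) concrete, the Python sketch below maps a linear predictor to the probability of a normal seedling through the inverse logit and evaluates the binomial log-probability of one tray. It assumes treatment coding with Pos1 and light as reference levels, one predicted random intercept for the germinator and one for the seed lot of the tray, and purely hypothetical coefficient values; none of the numbers are estimates from this study.

```python
import math

def inv_logit(eta: float) -> float:
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def linear_predictor(beta, pos2, pos3, dark, b_germinator, b_lot):
    """eta_i = beta0 + beta1*Pos2 + beta2*Pos3 + beta3*Dark + germinator and lot intercepts.

    pos2, pos3 and dark are 0/1 indicators (Pos1 and light are the reference levels).
    """
    beta0, beta1, beta2, beta3 = beta
    return beta0 + beta1 * pos2 + beta2 * pos3 + beta3 * dark + b_germinator + b_lot

def binomial_loglik(y: int, n: int, p: float) -> float:
    """Log-probability of y normal seedlings out of n seeds under Bin(n, p)."""
    return math.log(math.comb(n, y)) + y * math.log(p) + (n - y) * math.log(1.0 - p)

# Hypothetical values for illustration only.
beta = (0.5, 0.0, -0.26, -0.24)
eta = linear_predictor(beta, pos2=0, pos3=1, dark=1, b_germinator=0.0, b_lot=-0.1)
pi_i = inv_logit(eta)
print(round(pi_i, 3))
print(round(binomial_loglik(50, 100, pi_i), 3))
```

In gamlss the random intercepts are predicted jointly with the fixed effects; here they are simply plugged in to show how each term of (1) enters the linear predictor.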

Mixed model of the beta-binomial regression
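Given the independent random sample $Y_1, Y_2, \ldots, Y_n$ from $Y$, where $Y_i \sim \mathrm{BB}(n_i, \pi_i, \sigma_i)$, the regression model with mixed effects has the following link functions:
$$\log\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_{10} + \beta_{11} x_{1i} + \beta_{12} x_{2i} + \beta_{13} x_{3i} + b_{11} z_{1i} + b_{12} z_{2i} \tag{2}$$
$$\log(\sigma_i) = \beta_{20} \tag{3}$$
where $\pi_i$ is the proportion of normal seedlings in the $i$-th experimental unit; the parameters $\beta_{10}$, $\beta_{11}$, $\beta_{12}$ and $\beta_{13}$ indicate the fixed effects; the parameters $b_{11}$ and $b_{12}$ indicate the random effects; $\sigma_i$ is the dispersion (data variability) parameter; and $\beta_{20}$ is the intercept of the dispersion model.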
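A minimal sketch of the beta-binomial probability function, assuming the mean and dispersion parameterization with shape parameters a = π/σ and b = (1 − π)/σ. This parameterization is our assumption about how BB(n, π, σ) is defined; check the gamlss.dist documentation before reusing it. The sketch verifies numerically that the mean is nπ and that the variance is the binomial variance inflated by 1 + σ(n − 1)/(1 + σ), so σ → 0 recovers the binomial model. Tray size and parameter values are hypothetical.

```python
import math

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def bb_pmf(y, n, mu, sigma):
    """Beta-binomial P(Y = y), assuming shape parameters a = mu/sigma, b = (1 - mu)/sigma."""
    a = mu / sigma
    b = (1.0 - mu) / sigma
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return math.exp(log_choose + log_beta(y + a, n - y + b) - log_beta(a, b))

def bb_moments(n, mu, sigma):
    """Mean and variance of Y computed directly from the probability function."""
    probs = [bb_pmf(y, n, mu, sigma) for y in range(n + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

n, mu, sigma = 100, 0.56, 0.05  # hypothetical tray size, mean proportion and dispersion
mean, var = bb_moments(n, mu, sigma)
binom_var = n * mu * (1 - mu)
closed_form = binom_var * (1 + sigma * (n - 1) / (1 + sigma))  # over-dispersed variance
print(round(mean, 3), round(var, 3), round(closed_form, 3), round(binom_var, 3))
```

When the estimated σ is close to zero, as with data showing no over-dispersion, the beta-binomial and binomial fits become nearly indistinguishable.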

Mixed model of the multinomial regression
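Given an independent random sample $Y_1, Y_2, \ldots, Y_n$ from a variable $Y$, where $Y_i \sim \mathrm{Multi}(n_i, \pi_{1i}, \pi_{2i}, \pi_{3i})$, the regression model with mixed effects has the following link functions, with non-germinated seeds as the reference category:
$$\log\left(\frac{\pi_{1i}}{\pi_{3i}}\right) = \beta_{10} + \beta_{11} x_{1i} + \beta_{12} x_{2i} + \beta_{13} x_{3i} + b_{11} z_{1i} + b_{12} z_{2i} \tag{4}$$
$$\log\left(\frac{\pi_{2i}}{\pi_{3i}}\right) = \beta_{20} + \beta_{21} x_{1i} + \beta_{22} x_{2i} + \beta_{23} x_{3i} + b_{21} z_{1i} + b_{22} z_{2i} \tag{5}$$
where $\pi_{1i}$ is the proportion of normal seedlings in the $i$-th experimental unit, with the parameters $\beta_{10}$, $\beta_{11}$, $\beta_{12}$ and $\beta_{13}$ indicating the fixed effects and $b_{11}$ and $b_{12}$ the random effects; $\pi_{2i}$ is the proportion of abnormal seedlings in the $i$-th experimental unit, with the parameters $\beta_{20}$, $\beta_{21}$, $\beta_{22}$ and $\beta_{23}$ indicating the fixed effects and $b_{21}$ and $b_{22}$ the random effects; and $\pi_{3i} = 1 - \pi_{1i} - \pi_{2i}$ is the proportion of non-germinated seeds.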
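The two baseline-category logits in (4) and (5) determine all three proportions. A minimal Python sketch, with hypothetical linear predictors and counts, recovers the proportions from the logits and evaluates the multinomial log-probability of one tray; it does not fit the model.

```python
import math

def multinomial_logit_probs(eta1, eta2):
    """Return (pi1, pi2, pi3) for normal, abnormal and non-germinated seeds.

    eta1 = log(pi1/pi3) and eta2 = log(pi2/pi3); non-germinated seeds (pi3)
    form the reference category.
    """
    denom = 1.0 + math.exp(eta1) + math.exp(eta2)
    return math.exp(eta1) / denom, math.exp(eta2) / denom, 1.0 / denom

def multinomial_loglik(counts, probs):
    """Log-probability of the counts (normal, abnormal, non-germinated) of one tray."""
    n = sum(counts)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_coef + sum(c * math.log(p) for c, p in zip(counts, probs))

# Hypothetical linear predictors and counts for one tray of 100 seeds.
probs = multinomial_logit_probs(eta1=0.55, eta2=-0.9)
print([round(p, 3) for p in probs])
print(round(multinomial_loglik((56, 13, 31), probs), 3))
```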

Odds and odds ratio
The reasoning about regression with generalized linear mixed models is based on the odds ratio:
$$OR = \frac{\text{odds}_1}{\text{odds}_2} \tag{6}$$
where the odds are as follows:
$$\text{odds} = \frac{P(\text{success})}{P(\text{failure})} \tag{7}$$
and the probability of failure is as follows:
$$P(\text{failure}) = 1 - \pi_i \tag{8}$$
Thus, the odds are as follows:
$$\text{odds}_i = \frac{\pi_i}{1-\pi_i} = \exp\left(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + b_1 z_{1i} + b_2 z_{2i}\right) \tag{9}$$
in which we calculated the odds of success for Pos2 or Pos3 and for darkness. Thus, the odds ratio is the ratio of the odds of success between two levels of an explanatory variable. For example, the odds ratio of success in Position 2 compared with Position 1 (Pos2/Pos1), with all other covariates held equal, is as follows:
$$OR_{\text{Pos2/Pos1}} = \frac{\text{odds}(\text{Pos2})}{\text{odds}(\text{Pos1})} = \exp(\beta_1) \tag{10}$$
The odds ratio is interpreted as follows: $OR = 1$ when the two odds do not differ; $OR > 1$ when the odds of success in the numerator are higher than in the denominator; and $OR < 1$ when the odds in the denominator are higher than in the numerator. The odds ratio is an effect-size statistic (McHugh, 2009) that indicates which level of an explanatory variable gives the best odds of normal seedlings during the germination of Stevia rebaudiana. For more details on the odds ratio, see Hosmer, Lemeshow, and Sturdivant (2013).
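A short numerical check of these definitions, using hypothetical coefficients: the odds computed from the fitted probabilities for Pos2 and Pos1 give an odds ratio equal to exp(β1), and inverting the ratio reads the same contrast in the opposite direction.

```python
import math

def odds(p):
    """Odds of success: P(success) / P(failure) = p / (1 - p)."""
    return p / (1.0 - p)

def odds_ratio(p_num, p_den):
    """Odds ratio of the numerator level relative to the denominator level."""
    return odds(p_num) / odds(p_den)

def inv_logit(eta):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients: intercept (Pos1, light) and the Pos2 effect beta1.
beta0, beta1 = 0.4, -0.26
p_pos1 = inv_logit(beta0)           # Pos1, reference level
p_pos2 = inv_logit(beta0 + beta1)   # Pos2, all other covariates unchanged
or_pos2_pos1 = odds_ratio(p_pos2, p_pos1)
print(round(or_pos2_pos1, 4), round(math.exp(beta1), 4))  # the two values agree
print(round(1 / or_pos2_pos1, 4))  # Pos1/Pos2: the same contrast read the other way
```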

Generalized additive model for location, scale and shape
The GAMLSS model replaces the restrictive assumption of an exponential-family distribution with a general distribution, hereafter $\mathcal{D}$. $\mathcal{D}(y \mid \boldsymbol{\theta})$ has the parameter vector $\boldsymbol{\theta}$, and $y$ is a response variable with a continuous or discrete distribution. The "gamlss" package in the R software (R Core Team, 2016) allows fitting distributions with up to four parameters. The methodology is the same for distributions with more than four parameters, but it has not yet been implemented in the R package. The parameter vector $\boldsymbol{\theta} = (\mu, \sigma, \nu, \tau)$ contains the location ($\mu$), scale ($\sigma$), skewness ($\nu$) and kurtosis ($\tau$) parameters (Rigby & Stasinopoulos, 2005).
Thus, given a vector $\mathbf{Y} = (y_1, y_2, \ldots, y_n)^\top$ of $n$ independent observations whose distribution parameters are related to linear predictors through the link functions $g_k(\theta_k)$, GAMLSS assumes the following model:
$$g_k(\boldsymbol{\theta}_k) = \boldsymbol{\eta}_k = \mathbf{X}_k \boldsymbol{\beta}_k + \mathbf{Z}_k \mathbf{b}_k + \boldsymbol{\varepsilon}_k, \quad k = 1, 2, 3, 4 \tag{11}$$
in which $g_k(\cdot)$ is the link function relating the parameter $\theta_k$ to the linear predictor $\eta_k$; $\mathbf{X}_k$ is the $n \times (1+p)$ matrix of known values for the intercept and the $p$ covariates of fixed effects; $\mathbf{Z}_k$ is the $n \times q$ matrix of known values of the $q$ covariates of random effects; $\boldsymbol{\beta}_k$ is the $(1+p) \times 1$ vector of parameters related to the fixed effects; $\mathbf{b}_k$ is the $q \times 1$ vector of random effects, assumed to follow the multivariate distribution $N(\mathbf{0}, \mathbf{D})$, in which $\mathbf{D}$ is the $q \times q$ covariance matrix of the random effects; and $\boldsymbol{\varepsilon}_k$ is the $n \times 1$ vector of random errors associated with the model, with multivariate distribution $N(\mathbf{0}, \mathbf{R})$, in which $\mathbf{R}$ is the $n \times n$ covariance matrix.
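The systematic part of model (11) for the parameter π can be sketched in plain Python by building one row of X and one row of Z per tray and computing Xβ + Zb. The toy layout below is not the layout of Figure 2; the number of lots and all numeric values are hypothetical, and the error term ε is not simulated.

```python
# Toy layout for illustration only: (tier position, illuminance, germinator id, seed lot id).
trays = [
    ("Pos1", "light", 1, 1),
    ("Pos2", "light", 1, 2),
    ("Pos3", "dark", 3, 1),
    ("Pos3", "light", 4, 2),
]
N_GERMINATORS, N_LOTS = 4, 2  # hypothetical sizes

def x_row(pos, light):
    """Fixed-effects row: intercept, Pos2, Pos3 and darkness indicators (Pos1, light = reference)."""
    return [1.0, float(pos == "Pos2"), float(pos == "Pos3"), float(light == "dark")]

def z_row(germinator, lot):
    """Random-effects incidence row: one indicator per germinator followed by one per lot."""
    germ = [float(germinator == g) for g in range(1, N_GERMINATORS + 1)]
    lots = [float(lot == s) for s in range(1, N_LOTS + 1)]
    return germ + lots

def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

X = [x_row(pos, light) for pos, light, _, _ in trays]
Z = [z_row(germ, lot) for _, _, germ, lot in trays]
beta = [0.4, 0.0, -0.26, -0.24]              # hypothetical fixed effects
b = [0.05, -0.05, 0.02, -0.02, 0.1, -0.1]    # hypothetical predicted random effects
eta = [xb + zb for xb, zb in zip(matvec(X, beta), matvec(Z, b))]
print([round(e, 3) for e in eta])
```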

Estimation method
Estimation methods use sample information to estimate the true population values as accurately as possible. The parameters of model (11) were estimated by penalized maximum likelihood. Assuming that the observations of the response variable are independent given the random effects, the penalized log-likelihood of the GAMLSS model (11) is as follows:
$$\ell_p = \ell - \frac{1}{2} \sum_{k=1}^{4} \mathbf{b}_k^\top \mathbf{G}_k(\boldsymbol{\lambda}_k)\, \mathbf{b}_k \tag{12}$$
where $\ell = \sum_{i=1}^{n} \log f(y_i \mid \boldsymbol{\theta}^{i})$ is the log-likelihood function; $f(y_i \mid \boldsymbol{\theta}^{i})$ is the probability (density) function of the response variable; $\boldsymbol{\lambda}_k$ is the vector of fixed hyper-parameters; $\mathbf{b}_k$ are the predictions of the random effects; and $\mathbf{G}_k$ is a symmetric $q_k \times q_k$ matrix (the inverse of the random-effects covariance matrix) that can depend on the hyper-parameters $\boldsymbol{\lambda}_k$. Because analytical solutions for these parameters are difficult or impossible to find, Rigby and Stasinopoulos (2005) proposed maximizing (12) in R with the Cole and Green (CG) and Rigby and Stasinopoulos (RS) algorithms.
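The objective in (12) can be evaluated directly for a simple case. The sketch below assumes a binomial logit model with a single random intercept per seed lot and G = I/σ_b², i.e. b ~ N(0, σ_b² I), with σ_b treated as a fixed hyper-parameter. It only computes ℓ_p for given values; the CG and RS algorithms that maximize it in gamlss are not reproduced, and all data are hypothetical.

```python
import math

def binomial_loglik(y, n, p):
    """log P(Y = y) for Y ~ Bin(n, p)."""
    return math.log(math.comb(n, y)) + y * math.log(p) + (n - y) * math.log(1.0 - p)

def penalized_loglik(ys, ns, eta_fixed, lot_of_tray, b_lot, sigma_b):
    """Return (l_p, l, penalty) with l_p = l - 0.5 * b' G b and G = I / sigma_b**2.

    eta_fixed holds X*beta for each tray, lot_of_tray maps each tray to a seed lot
    and b_lot holds the random lot intercepts.
    """
    loglik = 0.0
    for y, n, ef, lot in zip(ys, ns, eta_fixed, lot_of_tray):
        p = 1.0 / (1.0 + math.exp(-(ef + b_lot[lot])))
        loglik += binomial_loglik(y, n, p)
    penalty = 0.5 * sum(b * b for b in b_lot) / sigma_b ** 2
    return loglik - penalty, loglik, penalty

# Hypothetical data: four trays of 100 seeds from two seed lots.
ys, ns = [60, 52, 48, 35], [100, 100, 100, 100]
eta_fixed = [0.4, 0.4, 0.14, -0.1]
lot_of_tray = [0, 1, 0, 1]
lp, l, pen = penalized_loglik(ys, ns, eta_fixed, lot_of_tray, b_lot=[0.1, -0.1], sigma_b=0.3)
print(round(lp, 3), round(l, 3), round(pen, 3))
```

The penalty shrinks the predicted lot effects toward zero, and it weakens as σ_b grows, which is how the random-effect variance controls the fit.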

Model selection
The choice of model is very important in statistical modelling because we want to explain the responses from the explanatory variables with the lowest possible error. We used the global deviance (GD) and the generalized Akaike information criterion (GAIC) to choose the model.
Global deviance: $GD = -2\hat{\ell}$, where $\hat{\ell}$ is the maximized log-likelihood.
GAIC: a fixed penalty $\kappa$ is added to the GD for every effective degree of freedom in the model, $GAIC(\kappa) = GD + \kappa \cdot df$, in which $df$ is the total effective degrees of freedom of the model. When $\kappa = 2$, we obtain the original Akaike information criterion (AIC), and when $\kappa = \log(n)$, the Bayesian information criterion of Schwarz (BIC). Both criteria allow the comparison of non-nested models and penalize models with more parameters. The selected model should have the lowest GD and GAIC($\kappa$) (Rigby & Stasinopoulos, 2005).
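A small sketch of these criteria applied to three hypothetical fits (labels A, B and C, with invented log-likelihoods and effective degrees of freedom). The number of observations assumes one binomial observation per tray, 4 germinators × 24 trays; in gamlss itself the criteria are reported by the fitted objects, so this only illustrates the arithmetic.

```python
import math

def global_deviance(loglik):
    """GD = -2 * maximized log-likelihood."""
    return -2.0 * loglik

def gaic(loglik, df, kappa):
    """Generalized AIC: GD plus a penalty kappa per effective degree of freedom."""
    return global_deviance(loglik) + kappa * df

# Hypothetical fits: (label, maximized log-likelihood, effective degrees of freedom).
fits = [("A", -1510.2, 6.0), ("B", -1498.7, 7.0), ("C", -1497.9, 8.5)]
n_obs = 96  # assumes one observation per tray: 4 germinators x 24 trays

for label, ll, df in fits:
    aic = gaic(ll, df, 2.0)
    bic = gaic(ll, df, math.log(n_obs))
    print(label, round(global_deviance(ll), 1), round(aic, 1), round(bic, 1))

best_aic = min(fits, key=lambda f: gaic(f[1], f[2], 2.0))[0]
best_bic = min(fits, key=lambda f: gaic(f[1], f[2], math.log(n_obs)))[0]
```

Here the lowest GD belongs to the largest model, C, while AIC and BIC both prefer B; when GD and GAIC disagree, the penalized criterion is the one that accounts for model size.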

Results and discussion
The responses to all the explanatory variables studied here are important for analysts, seed technologists and plant growers because they highlight the effects of ex situ factors on the seed quality of Stevia rebaudiana. The responses also reinforce the influence of seed lots, as a random effect, on the botanical traits of germinating seeds. Seed lots of Stevia rebaudiana produced under field conditions typically yield less than fifty percent normal seedlings, a figure still supported in the literature (Abdullateef & Osman, 2011; Abdullateef et al., 2015; Simlat et al., 2016; Özyğit et al., 2015), compared with seed lots harvested under protected conditions when the pappus is removed (Carneiro & Guedes, 1992; Carneiro & Guedes, 1995; Takahashi, Melges, & Carneiro, 1996; Carneiro, 1996). The alternative use of plastic trays with hydrophilic cotton fibres made it necessary to investigate germination responses because the trays on the upper shelves reduce the illuminance on the lowest shelves. The statistical approach in the current experiment allowed us to understand these effects better.

Exploratory data analysis (EDA)
First, the EDA indicated overall proportions of 0.56 normal seedlings, 0.13 abnormal seedlings and 0.32 non-germinated seeds (Figure 5a), or 0.44 for abnormal seedlings and non-germinated seeds combined (Figure 5b), across all treatments. Although the proportion of 0.44 seems to cast doubt on the quality of the seeds evaluated in the experiment, the proportion of abnormal seedlings was only 0.13 once all the components of the germination test were discriminated. The proportion of 0.32 non-germinated seeds suggests inconsistent quality among seed lots, which we attribute to the use of fruit-like seeds from seed lots harvested under field conditions. Next, Figure 6 quantifies normal and abnormal seedlings and non-germinated seeds for each factor influencing the components of the seed germination test. In general, seed germinators 3 and 4 seem to produce more normal seedlings than seed germinators 1 and 2. Seed lots 1 and 2 contrast with lots 3 and 4. Figure 6 also suggests better responses in the presence of light, and the lowest tray position in the seed germinator seems to reduce the proportion of normal seedlings.
However, these visual indications of responses from the data set are not sufficient to support precise conclusions about the quality of the Stevia seeds under germination, but they were useful for fitting the models.

Model fitting and considerations
The six regression models fitted with the binomial, beta-binomial and multinomial distributions related the proportion of normal seedlings of Stevia rebaudiana to explanatory variables such as tray position on the tiers of shelves, illuminance conditions and seed lots in different seed germinators. For all three distributions, the best fit in Table 1 was the M3 model, which had the lowest GAIC, AIC, GD and BIC. These results stem from the significance of the following explanatory variables for the prevalence of normal seedlings: tray position, illuminance conditions and seed lots. As expected, the seed germinators had no influence on the percentage of seed germination of Stevia rebaudiana.
The significant effect of illuminance (light or darkness) agrees with other experiments in which the authors (Abdullateef et al., 2015) used different light sources and seed lots. In the current experiment, the seed lots were chosen based on agricultural expertise. Seed lots produced under field conditions have a lower percentage of seed germination than seed lots harvested under glasshouse conditions (Carneiro & Guedes, 1995; Takahashi et al., 1996), where pollination was carried out by bees and recent estimates have frequently exceeded 80%. The current estimates were partly higher than those reported in the literature on the influence of light sources (Abdullateef & Osman, 2011; Abdullateef et al., 2015; Simlat et al., 2016; Özyğit et al., 2015; Uçar et al., 2016), but lower than the responses recorded in our laboratory soon after 1990, when the seeds had been produced in the protected environment of the glasshouse (Carneiro & Guedes, 1992; Carneiro, 1996). The germination of fruit-like seeds in the current experiment also explains our lower results; however, fruit-like seeds were necessary to exclude factors such as physical damage to the seed embryo when the pappus is mechanically removed, although this method does not filter out seeds with immature embryos, which also lowers the estimates for the seed lots. The current estimate of 37.14% for one seed lot harvested under field conditions was expected in the context of this experiment.
Figure 6. Quantity of normal (N) and abnormal seedlings (AB), and non-germinated seeds (NG) based on the explanatory variables: seed germinator, plastic tray position on tiers of shelves, illuminance conditions and seed lots.
In Table 2, the estimates of the M3 parameters and their 95% confidence intervals were similar for the binomial and beta-binomial distributions, but lower for both conditions when we fitted the multinomial distribution.
The choice of distribution for making inferences about seed quality will depend on the researchers' goals. In the case of Stevia rebaudiana, the multinomial distribution is much more informative because it considers all three components of the germination test. As the proportion of abnormal seedlings was only 0.13 (Figure 5), we can pay more attention to the causes of immature and empty fruit-like seeds in the seed lots or in the seed production system. Empty fruit-like seeds are eliminated when the pappus is mechanically removed. The validation of the above models, based on the residual assumptions checked with randomized quantile residuals, is shown in Figure 7.
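For a discrete response, a randomized quantile residual draws u uniformly between F(y − 1) and F(y), where F is the fitted cumulative distribution function, and maps u through the standard normal quantile function; if the model is correct, the residuals are approximately standard normal, which is what the worm plot checks. The sketch below simulates counts from a known binomial model, so the residuals should look N(0, 1). It does not draw the worm plot itself (a detrended normal Q-Q plot), and the tray size and probability are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

def binom_cdf(k, n, p):
    """P(Y <= k) for Y ~ Bin(n, p); 0 for k < 0."""
    if k < 0:
        return 0.0
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(min(k, n) + 1))

def rq_residual(y, n, p, rng):
    """Randomized quantile residual of a binomial count y."""
    lo, hi = binom_cdf(y - 1, n, p), binom_cdf(y, n, p)
    u = rng.uniform(lo, hi)
    u = min(max(u, 1e-12), 1 - 1e-12)  # keep u strictly inside (0, 1)
    return NormalDist().inv_cdf(u)

rng = random.Random(2016)
n, p = 100, 0.56  # hypothetical tray size and fitted probability
ys = [sum(rng.random() < p for _ in range(n)) for _ in range(2000)]
res = [rq_residual(y, n, p, rng) for y in ys]
outside = sum(abs(r) > 1.96 for r in res) / len(res)  # share beyond the 95% normal band
print(round(fmean(res), 3), round(pstdev(res), 3), round(outside, 3))
```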
These worm plots indicate that the binomial, beta-binomial and multinomial distributions are all suitable for describing the components of the seed germination test of Stevia rebaudiana, because their deviations were within the tolerance limit of 5% (Donoghoe & Marschner, 2015). The dots represent the model residuals, and the elliptical curves delimit the 95% confidence region. A model fits well when at least 95% of the dots fall between the two elliptical curves; when more than 5% fall outside, the model is inadequate for explaining the response variable. Table 3 shows similar results for the binomial and beta-binomial distributions because these data showed neither under-dispersion nor over-dispersion. Thus, both distributions lead to the same conclusion: the odds of normal seedlings are similar when the plastic trays are on the middle or uppermost tiers of shelves. The odds ratio of 0.77 indicates lower odds of normal seedlings when the trays are on the lowest tiers than on the uppermost tiers; equivalently, the odds are about 1.30 (1/0.77) times higher on the uppermost tiers than on the lowest two tiers. Illuminance conditions also influenced the proportion of normal seedlings, with an odds ratio of 0.79 for darkness versus light, that is, about 1.26 times higher odds of normal seedlings under light than under darkness. These responses support the use of light during the germination of Stevia rebaudiana seeds. However, our estimates of normal seedlings under darkness are higher than those reported in the literature, where the estimate of 19.4% (Simlat et al., 2016) contrasts with the value of 80.49% for the seed lot produced in a protected environment.
Two comparisons follow from the multinomial model (Table 3), both interpreted with equation (10) of the methodology. First, we compared the proportion of normal seedlings with the proportion of non-germinated seeds. The results indicated similar odds of normal seedlings on the middle and uppermost two tiers of shelves, where the odds are 1.59 times higher (1/0.63) than on the lowest two tiers. Under light, the odds of normal seedlings increased by a factor of 1.37 relative to darkness. Next, we compared the proportion of abnormal seedlings with that of non-germinated seeds. The odds of abnormal seedlings were 1.32 times higher on the uppermost tiers of shelves than on the middle tiers and 2.04 times higher than on the lowest tiers. Finally, the odds of abnormal seedlings under light were 1.30 times higher than under darkness. We attribute the increased odds of abnormal seedlings to the effect of light on chemical reactions in seeds damaged by rain water and insect attacks. Therefore, light is an essential fixed factor for investigating the germination of Stevia rebaudiana seeds under controlled conditions, which corroborates the findings reported in the literature (Abdullateef & Osman, 2011; Simlat et al., 2016; Uçar et al., 2016). The random effects of seed lots, in turn, are expected to remain important as long as future lots show variability estimates that fall, with high probability, within the confidence intervals in Table 2. In the analysis of seed germination of Stevia rebaudiana, the choice of distribution should be defined by the researcher. When the objective is to analyse only the proportion of normal seedlings, the researcher can use the binomial or beta-binomial distribution; the same applies when the records show low levels of both abnormal seedlings and non-germinated seeds.
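The odds ratios above are exponentiated logit coefficients; in the multinomial model each one compares a category with the reference category (non-germinated seeds), not with all remaining seeds. A minimal sketch of turning a coefficient and its standard error into an odds ratio with a Wald-type 95% interval, and of reversing the comparison as in 1/0.63. The coefficient log(0.63) mirrors the reported ratio, but the standard error is hypothetical, and the Wald interval is our assumption, not necessarily how the intervals in Table 2 were computed.

```python
import math

Z_975 = 1.959963984540054  # standard normal 97.5% quantile

def odds_ratio_ci(beta, se, z=Z_975):
    """Odds ratio exp(beta) with a Wald interval exp(beta -/+ z * se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def reverse(or_ci):
    """Same contrast with numerator and denominator swapped: 1/OR, bounds swapped."""
    est, lo, hi = or_ci
    return 1.0 / est, 1.0 / hi, 1.0 / lo

# Lowest vs uppermost tiers, normal seedlings vs non-germinated seeds (SE is hypothetical).
pos3 = odds_ratio_ci(beta=math.log(0.63), se=0.08)
print([round(v, 3) for v in pos3])
print([round(v, 3) for v in reverse(pos3)])  # uppermost vs lowest tiers
```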
In the future, studies investigating abnormal seedlings caused by thermal stress, for example in the ageing test, should fit the multinomial distribution to data from all components of the seed germination test. A fourth component could be added to the germination test by counting seeds with symptoms of seed-borne diseases, because the current germination methodology allows this component to be safely discriminated within the non-germinated seeds, as in Figure 3c, where the carpodium shows the initial symptoms of seed-borne disease. The responses from the current experiment led us to improve the illuminance of the seed germinators by installing red LEDs on all tiers of shelves.

Conclusion
Generalized linear mixed models using the binomial, beta-binomial and multinomial distributions are suitable for explaining the responses of the components of the germination test of Stevia rebaudiana seeds. The binomial and beta-binomial distributions are indicated when the aim of the experiment is to understand the effects of the explanatory variables only on the proportion of normal seedlings. All the models indicate that the position of the tray on the tiers of shelves and the light conditions affected the odds. The tier of shelves had its most favourable effect on the proportion of normal seedlings of Stevia rebaudiana in the uppermost position, where illuminance was highest. Studies investigating all three components of the germination test require the multinomial distribution.