Population parameters and selection of kale genotypes using Bayesian inference in a multi-trait linear model

Variance components must be obtained to estimate genetic parameters and predict breeding values. This information can be obtained through Bayesian inference. When multiple traits are evaluated, Bayesian inference can be used in multi-trait models. The objective of this study was to obtain estimates of genetic parameters, gains with selection, and genetic correlations among traits. We also aimed to predict the genetic values and select the best kale genotypes using the Bayesian approach in a multi-trait linear model. Five traits were evaluated in 22 kale genotypes: stem diameter, plant height, number of shoots, number of marketable leaves, and fresh weight of leaves. The experiment consisted of a randomized block design with three replications and four plants per plot. Genetic effects predominated over environmental effects. The highest correlation estimates were found between the fresh weight of leaves and stem diameter and between the plant height and number of marketable leaves. The following commercial cultivars and genotypes are recommended for cultivation and for integration into breeding programs: UFLA 11, UFLA 5, UFLA 6, UFVJM 3 and UFVJM 19. The estimates of the gain with selection indicate the potential for improvement of the studied population.


Introduction
Kale (Brassica oleracea L. var. acephala DC.) is an annual or biennial vegetable that belongs to the Brassicaceae family. Due to its new uses in culinary dishes and recent discoveries about its nutraceutical properties, kale consumption has gradually increased (Moreno, Carvajal, Lopez-Berenguer, & Garcia-Viguera, 2006; Vilar, Cartea, & Padilla, 2008; Soengas, Sotelo, Cartea, & Velasco, 2011). Kale-breeding programs aim to facilitate cultural practices and increase the yield per area. Thus, there is a growing interest in selecting plants with a lower height, fewer shoots, a larger stem diameter, and more leaves (Azevedo et al., 2012).
To define strategies for breeding programs, it is necessary to estimate variance components, predict breeding values, and obtain estimates of genetic parameters (Gonçalves-Vidigal, Mora, Bignotto, Munhoz, & Souza, 2008; Oliveira, Santana, Oliveira, & Santos, 2014). The variance components are unknown and are usually estimated by the method of moments, maximum likelihood (ML), or restricted maximum likelihood (REML). In studies with kale, two or more traits are generally evaluated simultaneously. In this case, multi-trait models can be applied, which improve the predictions (Viana, Sobreira, Resende, & Faria, 2010) and allow associations among traits to be determined. Bayesian inference is advantageous here because it yields the marginal posterior densities and the credibility intervals of the variance components, breeding values, and genetic parameters, such as heritability, coefficient of genotypic variation, coefficient of residual variation, relative variation index, and genotypic correlation (Waldmann & Ericsson, 2006).
Thus, the objective of this work was to use the Bayesian approach considering a multi-trait linear model to obtain estimates of the genetic parameters, assess the genetic correlation between traits, predict breeding values, and select the best kale genotypes available in the germplasm bank of the UFVJM (Federal University of the Valleys Jequitinhonha and Mucuri).

Material and methods
On June 7th, 2013, shoots were collected for seedling formation. These shoots were three to four centimeters in height and had two leaflets. After collection, the shoots were planted in 72-cell trays filled with a commercial substrate. The trays were kept in a greenhouse for 30 days for better rooting. On July 7th, 2013, the seedlings were transplanted into 2.50 m wide and 0.30 m high beds, spaced at 1 m between rows and 0.50 m between plants. Fertilization was carried out according to the recommendations available for the crop.
In each plant, the number of shoots (counted when they were removed), number of marketable leaves, and fresh weight of marketable leaves were evaluated. These assessments were made over 15 harvests from September 8th, 2013 to March 4th, 2014. Fully expanded leaves longer than 15 cm with no signs of senescence were considered marketable (Azevedo et al., 2012). Plant height (measured with a tape) and stem diameter (measured with a caliper at half the plant height) were evaluated on January 6th, 2014.
Statistical analyses were carried out using the plot means. We considered the multi-trait mixed model of Henderson and Quaas (1976). Let vector $y_1$ represent the $n_1$ observations for trait 1, $y_2$ the $n_2$ observations for trait 2, and $y_n$ the $n_n$ observations for trait $n$. Then, the multi-trait mixed linear model for $n$ traits can be written as follows:

$$y_i = X_i \beta_i + Z_i u_i + e_i, \quad i = 1, 2, \ldots, n,$$

where $X_i$ is the incidence matrix of the block effects associated with trait $i$; $\beta_i$ is the vector of fixed block effects summed to the overall mean associated with trait $i$; $Z_i$ is the incidence matrix of the genetic effect of each genotype for trait $i$; $u_i$ is the vector of random effects of genotypes associated with trait $i$; and $e_i$ is the vector of random residual effects associated with trait $i$. With $y$, $u$, and $e$ denoting the stacked trait-specific vectors, the assumptions on their distribution are described as:

$$\begin{bmatrix} y \\ u \\ e \end{bmatrix} \sim N\left( \begin{bmatrix} X\beta \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} ZGZ' + R & ZG & R \\ GZ' & G & 0 \\ R & 0 & R \end{bmatrix} \right)$$

Here, $R$ and $G$ are the covariance matrices associated with the vector $e$ of residuals and the vector $u$ of random effects. If $R_0$ (of order 5 × 5) is the residual covariance matrix for the five traits, then $R$ can be calculated as $R = R_0 \otimes I$, where $\otimes$ is the Kronecker product of two matrices and $I$ is the identity matrix. Similarly, the genetic covariance matrix $G$ can be calculated as $G = G_0 \otimes I$.

In the mixed model used, $\beta$ is the vector of solutions for the systematic (fixed) effects. From the Bayesian point of view, however, it is a vector of random effects with an uninformative prior, that is, a prior that provides little information about the parameter and therefore follows a uniform probability distribution (Everling et al., 2014). This type of distribution assigns the same probability to each of the possible values of the variable. Gaussian and inverted Wishart distributions were defined as prior distributions for the random effects and the (co)variance components, respectively. In all cases, vague priors with flat probability (uninformative prior distributions) were used.
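The Kronecker structure R = R0 ⊗ I described above can be illustrated with a minimal sketch. The numbers below are hypothetical: a 2 × 2 matrix stands in for the 5 × 5 R0 of the study, and three plots stand in for the real number of plots. The helper names `identity` and `kron` are illustrative and are not part of any package used in the study.

```python
def identity(n):
    """Return the n x n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def kron(a, b):
    """Kronecker product a ⊗ b of two matrices given as lists of rows.

    Block (i, j) of the result equals a[i][j] * b.
    """
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


# Hypothetical residual (co)variances for two traits (the study uses a 5 x 5 R0).
r0 = [[4.0, 1.5],
      [1.5, 9.0]]
n_plots = 3  # illustrative only; not the number of plots in the experiment

# R is 6 x 6, with records ordered trait by trait and plot within trait.
r = kron(r0, identity(n_plots))

for row in r:
    print(row)
```

Each diagonal block of the result carries one trait's residual variance, and each off-diagonal block carries the residual covariance between two traits measured on the same plot. Residuals from different plots remain uncorrelated.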
Bayesian inference was used to estimate the variance components and predict the breeding values of the accessions. The data were analyzed in R (R Development Core Team, 2012) with the MCMCglmm package (Hadfield, 2010), using the Gibbs sampling algorithm. A Markov chain with 2,000,000 cycles was generated, with a burn-in of 100,000 cycles and a thinning interval of 1,000 cycles, which ensured that the serial correlations were zero or very low. The p-value of the Geweke test was used as the convergence criterion.
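As a rough illustration of the chain settings above, the sketch below counts the samples retained after burn-in and thinning and computes a simplified Geweke-type z-score. It is not the diagnostic as implemented in R packages. The standard Geweke test estimates the variances from the spectral density at frequency zero, whereas this version uses plain sample variances, which is only reasonable if the thinned draws are nearly uncorrelated. The function names and the simulated chain are assumptions made for the example.

```python
import math
import random
import statistics


def retained_samples(n_iter, burn_in, thin):
    """Number of draws kept after discarding burn-in and thinning."""
    return (n_iter - burn_in) // thin


def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke z-score: early-segment mean vs late-segment mean.

    Assumption: draws are treated as uncorrelated, so naive variances are used.
    """
    n = len(chain)
    a = chain[: int(first * n)]
    b = chain[n - int(last * n):]
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def two_sided_p(z):
    """Two-sided p-value of a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


kept = retained_samples(2_000_000, 100_000, 1_000)
print(kept)  # 1900

# Simulated stationary chain standing in for one parameter's retained draws.
random.seed(42)
chain = [random.gauss(0.0, 1.0) for _ in range(kept)]
z = geweke_z(chain)
print(z, two_sided_p(z))
```

A z-score near zero means that the early and late segments of the chain have similar means.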
From the posterior distribution, the following parameters were calculated: heritability, the coefficients of genetic, environmental, and relative variation, the genetic correlations between traits, and the gain with selection. For the variance components, genetic parameters, and breeding values, we calculated the mean, median, mode, and highest posterior density (HPD) interval with the support of the boa package.
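The expressions used for these parameters are not reproduced in this text. As a hedged sketch, the commonly used definitions are given below. They are an assumption rather than the authors' exact formulas; in particular, heritability may have been defined on a different basis. Here $\sigma^2_g$ is the genotypic variance, $\sigma^2_e$ the residual variance, $\mu$ the trait mean, and $\sigma_{g(ij)}$ the genotypic covariance between traits $i$ and $j$. Evaluating each expression at every retained MCMC sample gives the posterior distribution of the corresponding parameter.

```latex
\[
\begin{aligned}
h^2 &= \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_e}, &
CV_g\,(\%) &= 100\,\frac{\sqrt{\sigma^2_g}}{\mu}, \\
CV_e\,(\%) &= 100\,\frac{\sqrt{\sigma^2_e}}{\mu}, &
CV_r &= \frac{CV_g}{CV_e}, \\
r_{g(ij)} &= \frac{\sigma_{g(ij)}}{\sqrt{\sigma^2_{g(i)}\,\sigma^2_{g(j)}}}
\end{aligned}
\]
```

Under these definitions, $CV_r > 1$ means that the genotypic standard deviation exceeds the residual one.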

Results
The p-value of the Geweke test was used as an indication of convergence of the Markov chains (Table 1). All the p-values obtained by the Geweke test were lower than 0.05. The mean, median, and mode of the marginal posterior distributions of the parameters were close to one another.
The highest heritability estimates were found for the number of leaves and fresh weight of leaves (Table 1). The plant height had the lowest heritability estimate. However, judging by the credibility intervals, these estimates did not differ significantly. The plant height also had the lowest estimate of the coefficient of genetic variation, which did not differ significantly from that found for the stem diameter. The number of leaves and fresh weight of leaves presented the highest coefficients of genetic variation.
The mode of the posterior distribution of the coefficient of environmental variation ranged from 8.659% for plant height to 14.448% for the fresh weight of leaves (Table 1). Only the plant height and fresh weight of leaves showed significant differences in the estimates of the coefficient of environmental variation. Moreover, only the plant height had a mode of the posterior distribution of the coefficient of relative variation lower than 1.00 (0.876). However, this estimate was not significantly lower than 1.00 according to the credibility interval (0.479-1.466). The number of leaves showed the highest coefficient of relative variation, which differed significantly from the estimates found for stem diameter and plant height.
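The interval-based comparisons in this section can be sketched as follows. The function `hpd_interval` is an illustrative empirical estimator, not the boa implementation. It returns the shortest interval containing a given share of the sorted draws, which is adequate for unimodal posteriors. The gamma draws are simulated purely to show an asymmetric interval. Only the interval 0.479-1.466 and the reference value 1.00 come from the text.

```python
import math
import random


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing a fraction `mass` of the sorted samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of draws inside the interval
    # Slide a window of k draws and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def contains(interval, value):
    """True if `value` lies inside the closed interval."""
    lower, upper = interval
    return lower <= value <= upper


# Interval reported for the coefficient of relative variation of plant height.
reported = (0.479, 1.466)
print(contains(reported, 1.00))  # True: not significantly different from 1.00

# Simulated right-skewed draws (hypothetical) to show an asymmetric interval.
random.seed(1)
draws = [random.gammavariate(2.0, 0.5) for _ in range(1900)]
lower, upper = hpd_interval(draws)
print(lower, upper)
```

Because the reference value 1.00 falls inside the reported interval, the estimate of 0.876 cannot be declared lower than 1.00 at that credibility level.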
The highest correlation estimates were found between the fresh weight of leaves and stem diameter (0.964), plant height (-0.904), or number of marketable leaves (0.930; Table 1). Lower correlation estimates, not significantly different from zero, were found between the number of shoots and stem diameter (0.243) or fresh weight of leaves (0.585).
The commercial cultivars COM 1, COM 2 and COM 3 (Figure 1) presented the highest stem diameters, which were significantly different only from the UFVJM 36 genotype, according to the credibility interval. Accession UFLA 5 had the lowest height, which was significantly different only from the UFVJM 26 and UFVJM 36 genotypes. The genotype with the lowest number of shoots was UFVJM 36, and it did not significantly differ from the UFLA 5, UFLA 12, UFVJM 7, UFVJM 13, UFVJM 16, UFVJM 24, and COM 2 genotypes. For the number of leaves and fresh weight of leaves, the best results were found for the commercial cultivars, which were significantly different only from the UFLA 12 and UFVJM 36 genotypes.
The plant height presented the lowest estimates of gain with selection (Figure 2). For this trait, the mode of the posterior distribution was -10%. The other traits showed higher estimates, in particular the fresh weight of leaves, number of leaves, and number of shoots.

Discussion
The asymmetric credible intervals obtained for the variance components, genetic parameters, and breeding values are a peculiarity of Bayesian inference. They make this approach notably informative (Mathew et al., 2012) and facilitate hypothesis testing. According to Apiolaza, Chauhan, and Walker (2011), asymmetric credibility intervals obtained from the posterior distribution make the conclusions more realistic than those based on the symmetric confidence intervals of frequentist statistics. Additionally, Bayesian inference allows the evaluation of unbalanced experiments and the study of more complex statistical models (Bink et al., 2007). Consequently, its use has become more common among breeders, not only for the analysis of molecular data but also for phenotypic data.
The results of the Geweke test demonstrated the reliability of the results presented for all the parameters. Therefore, it is reasonable to believe that the samples are truly representative of the underlying stationary distribution of the Markov chain (Cowles & Carlin, 1996).
The lowest heritability estimate, found for plant height, results from the low genotypic variation of this trait rather than from a large residual variation, since its residual variation was low compared with that of the other traits. This trait also presented the lowest estimate of genetic variation, which indicates a difficulty for the improvement of plant height in this population. However, the credibility intervals of the traits overlapped for heritability and for the coefficient of relative variation, which shows that these estimates were not significantly different. The estimates obtained for heritability and for the coefficient of relative variation indicate a higher possibility of success with selection than that observed by Azevedo et al. (2012), who evaluated 30 kale genotypes. According to these authors, in kale breeding it is important to increase the stem diameter and reduce plant height, which reduces the need for staking. Likewise, according to Chakwizira et al. (2009), the stem diameter is influenced not only by genetic factors but also by the weather conditions of the growing region. The reduction in the number of shoots is advantageous because it reduces the need for cultural practices, such as sprout thinning. From a commercial point of view, another advantage of a lower number of shoots is the reduced potential for vegetative propagation of the plant, which can increase the continuous sale of seeds (Azevedo et al., 2012).

Figure 1. Mode of breeding values and highest posterior density (HPD, 95%) intervals for stem diameter (SD), plant height (PH), shoot number (NS), number of leaves (NL) and fresh weight of leaves (FWL) in kale genotypes.