Subset selection of markers for the genome-enabled prediction of genetic values using radial basis function neural networks
Abstract
This paper evaluated the effectiveness of marker subset selection for genome-enabled prediction of genetic values using radial basis function neural networks (RBFNN). To this end, we simulated an F1 population of 500 individuals, derived from the hybridization of divergent parents and genotyped with 1000 SNP markers. Phenotypic traits were simulated under three gene action models (additive, additive-dominant, and epistatic) and two dominance situations (partial and complete), for quantitative traits with heritability (h2) of 30% and 60%; each trait was controlled by 50 loci with two alleles per locus. In total, twelve scenarios were simulated. Stepwise regression was applied to select markers before the prediction methods were fitted. Prediction performance was assessed by reliability and root mean square error under a fivefold cross-validation scheme. Overall, dimensionality reduction improved reliability in all scenarios. With h2 = 30% in the additive scenario, reliability rose from 0.03 to 0.59 using RBFNN and from 0.10 to 0.57 with RR-BLUP. In the additive-dominant scenario, reliability rose from 0.12 to 0.59 using RBFNN and from 0.12 to 0.58 with RR-BLUP, and in the epistatic scenarios it rose from 0.07 to 0.50 using RBFNN and from 0.06 to 0.47 with RR-BLUP. These results show that applying stepwise regression before either technique improved the accuracy of genetic value prediction and, above all, greatly reduced the root mean square error, while the lower dimensionality also shortened processing and analysis time.
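The workflow summarized above (select a marker subset, fit a predictor, score it by fivefold cross-validation) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the toy additive simulation with independent 0/1/2 genotypes (not a real F1 population), the greedy forward selection used as a simplified stand-in for stepwise regression, a fixed-penalty ridge regression standing in for RR-BLUP (the RBFNN is not shown), reliability taken as the squared correlation between predicted and true genetic values, and all function names and parameter values.

```python
import numpy as np

def simulate(n=200, p=300, n_qtl=10, h2=0.6, seed=0):
    """Toy additive simulation (assumption): genotypes coded 0/1/2, n_qtl causal markers."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 3, size=(n, p)).astype(float)
    beta = np.zeros(p)
    qtl = rng.choice(p, size=n_qtl, replace=False)
    beta[qtl] = rng.normal(size=n_qtl)
    g = X @ beta                              # true genetic values
    noise_var = g.var() * (1 - h2) / h2       # residual variance that yields heritability h2
    y = g + rng.normal(scale=np.sqrt(noise_var), size=n)
    return X, y, g

def forward_select(X, y, k):
    """Greedy forward selection: repeatedly add the marker most correlated
    with the current residual, then refit by least squares."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    resid = yc.copy()
    chosen = []
    for _ in range(k):
        score = np.abs(Xc.T @ resid) / (np.linalg.norm(Xc, axis=0) + 1e-12)
        score[np.array(chosen, dtype=int)] = -np.inf   # never pick a marker twice
        chosen.append(int(np.argmax(score)))
        coef, *_ = np.linalg.lstsq(Xc[:, chosen], yc, rcond=None)
        resid = yc - Xc[:, chosen] @ coef
    return chosen

def ridge_fit_predict(X_tr, y_tr, X_te, lam):
    """Ridge regression on centered training data; lam is a hypothetical fixed penalty."""
    mu_x, mu_y = X_tr.mean(axis=0), y_tr.mean()
    Xc = X_tr - mu_x
    A = Xc.T @ Xc + lam * np.eye(Xc.shape[1])
    beta = np.linalg.solve(A, Xc.T @ (y_tr - mu_y))
    return mu_y + (X_te - mu_x) @ beta

def cross_validate(X, y, g, k_markers=None, lam=1.0, n_folds=5, seed=1):
    """Fivefold CV; markers are selected inside each training fold to avoid leakage."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    pred = np.empty(len(y))
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        if k_markers is None:
            cols = np.arange(X.shape[1])      # use every marker
        else:
            cols = forward_select(X[train], y[train], k_markers)
        pred[test] = ridge_fit_predict(X[train][:, cols], y[train],
                                       X[test][:, cols], lam)
    reliability = np.corrcoef(pred, g)[0, 1] ** 2   # assumed definition
    rmse = np.sqrt(np.mean((pred - g) ** 2))        # error against true genetic values
    return reliability, rmse

X, y, g = simulate()
full = cross_validate(X, y, g)                  # all 300 toy markers
subset = cross_validate(X, y, g, k_markers=20)  # 20 selected markers per fold
print("all markers: reliability=%.2f rmse=%.2f" % full)
print("20 selected: reliability=%.2f rmse=%.2f" % subset)
```

Selecting markers within each training fold is a design choice of this sketch; the abstract states only that stepwise regression was applied before the prediction methods, so the authors' procedure may differ.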
DECLARATION OF ORIGINALITY AND COPYRIGHTS
I declare that the current article is original and has not been submitted for publication, in part or in whole, to any other national or international journal.
The copyrights belong exclusively to the authors. Published content is licensed under the Creative Commons Attribution 4.0 (CC BY 4.0) license, which allows sharing (copying and distributing the material in any medium or format) and adaptation (remixing, transforming, and building upon the material) for any purpose, even commercially, under the terms of attribution.