Alternative measures to evaluate the accuracy and bias of genomic predictions with censored records
Abstract
This study aimed to propose and compare metrics of the accuracy and bias of genomic predictions of breeding values for traits with censored data. Genotypic and censored phenotypic information was simulated for four traits with the following QTL and polygenic heritabilities, respectively: C1: 0.07 and 0.07; C2: 0.07 and 0.00; C3: 0.27 and 0.27; and C4: 0.27 and 0.00. Genomic breeding values were predicted using the Mixed Cox and Truncated Normal models. Model accuracy was estimated with the Pearson correlation (PC), the maximal correlation (MC), and the Pearson correlation for censored data (PCC), while genomic bias was calculated via simple linear regression (SLR) and the Tobit model (TB). For trait C3, MC and PCC were statistically superior to PC with 10 and 40% censored information; with 70% censoring, PCC yielded better results than both MC and PC. For the other traits, the proposed measures were superior or statistically equal to PC. The TB coefficients associated with the marginal effects were close to those obtained with SLR, while the coefficient related to the latent variable remained almost unchanged as censoring increased in most cases. From a statistical point of view, methodologies for censored data should be prioritized, even for low censoring percentages.
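To illustrate why censoring matters for these metrics, the following minimal sketch computes the Pearson correlation and the SLR slope on latent and on censored records, and writes out the log-likelihood of a Tobit-type regression. It is not the authors' implementation. Right-censoring at a fixed cut-off, the toy simulation in `demo`, and all function names are assumptions made for illustration only. MC and PCC are not covered here.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def slr(x, y):
    """Least-squares (intercept, slope) of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def tobit_loglik(b0, b1, sigma, x, y, censored):
    """Log-likelihood of a normal regression with right-censored records.

    Assumption: a censored record only tells us the latent value is >= y.
    """
    ll = 0.0
    for xi, yi, is_censored in zip(x, y, censored):
        z = (yi - b0 - b1 * xi) / sigma
        if is_censored:
            # P(latent >= yi) = 1 - Phi(z) = 0.5 * erfc(z / sqrt(2))
            ll += math.log(0.5 * math.erfc(z / math.sqrt(2)))
        else:
            # Normal log-density of an uncensored record
            ll += -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z
    return ll


def demo(n=2000, censor_fraction=0.4, seed=1):
    """Toy simulation (hypothetical): predictions vs. right-censored records."""
    rng = random.Random(seed)
    gebv = [rng.gauss(0, 1) for _ in range(n)]       # stand-in for predicted values
    latent = [g + rng.gauss(0, 1) for g in gebv]     # uncensored records
    cut = sorted(latent)[int(n * (1 - censor_fraction))]
    observed = [min(v, cut) for v in latent]         # right-censored records
    return {
        "r_latent": pearson(gebv, latent),
        "r_observed": pearson(gebv, observed),
        "slope_latent": slr(gebv, latent)[1],
        "slope_observed": slr(gebv, observed)[1],
    }


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: {value:.3f}")
```

In this toy setting the slope fitted to the censored records is attenuated relative to the slope fitted to the latent records, which is the kind of distortion a censoring-aware method is meant to address. The parameters of `tobit_loglik` could be estimated by passing its negative to any numerical optimizer.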
DECLARATION OF ORIGINALITY AND COPYRIGHT
I declare that this article is original and has not been submitted for publication, in whole or in part, to any other national or international journal.
Copyright belongs exclusively to the authors. The journal uses the Creative Commons Attribution 4.0 (CC BY 4.0) license, which permits sharing (copying and distributing the material in any medium or format) and adaptation (remixing, transforming, and building upon the licensed material for any purpose, including commercial purposes).
Reading this link is recommended for further information on the topic: how to provide credit and references correctly, among other details crucial for the proper use of the licensed material.