Alternative measures to evaluate the accuracy and bias of genomic predictions with censored records

Keywords: genomic selection; statistical modeling; simulation; mixed Cox model; truncated normal model.

Abstract

This study aimed to propose and compare metrics of the accuracy and bias of genomic predictions of breeding values for traits with censored data. Genotypic and censored phenotypic information was simulated for four traits with the following QTL and polygenic heritabilities, respectively: C1: 0.07-0.07, C2: 0.07-0.00, C3: 0.27-0.27, and C4: 0.27-0.00. Genomic breeding values were predicted using the Mixed Cox and Truncated Normal models. Model accuracy was estimated with the Pearson correlation (PC), the maximal correlation (MC), and the Pearson correlation for censored data (PCC), while the bias of the genomic predictions was calculated via simple linear regression (SLR) and the Tobit model (TB). For trait C3, MC and PCC were statistically superior to PC with 10 and 40% censored records; with 70% censoring, PCC yielded better results than both MC and PC. For the other traits, the proposed measures were superior or statistically equal to PC. The TB coefficients associated with the marginal effects were close to those obtained with SLR, while the coefficient related to the latent variable remained almost unchanged as censoring increased in most cases. From a statistical point of view, methodologies designed for censored data should be prioritized, even at low censoring percentages.
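
As a rough illustration of why censoring matters for the two conventional measures named above, the sketch below computes the Pearson correlation and the simple linear regression slope on toy data that are right-censored at 0, 10, 40, and 70%. It is not the study's simulation: the data-generating model, the sample size, the seed, the direction of censoring, and the helper names (`pearson_accuracy`, `slr_slope`, `right_censor`) are all assumptions made for illustration, and the censoring-aware measures (MC, PCC, TB) are not implemented here.

```python
import numpy as np


def pearson_accuracy(predicted, observed):
    """Pearson correlation between predictions and observed records."""
    return float(np.corrcoef(predicted, observed)[0, 1])


def slr_slope(predicted, observed):
    """Slope of the simple linear regression of observed on predicted."""
    centered = predicted - predicted.mean()
    return float(centered @ (observed - observed.mean()) / (centered @ centered))


def right_censor(values, fraction):
    """Right-censor the top `fraction` of records at the matching quantile."""
    limit = np.quantile(values, 1.0 - fraction)
    return np.minimum(values, limit), values > limit


# Hypothetical toy data (assumption): predictions plus independent noise.
rng = np.random.default_rng(42)
n = 5000
gebv = rng.normal(size=n)             # stand-in for predicted breeding values
latent = gebv + rng.normal(size=n)    # stand-in for uncensored records

results = {}
for fraction in (0.0, 0.1, 0.4, 0.7):
    observed, is_censored = right_censor(latent, fraction)
    results[fraction] = (
        pearson_accuracy(gebv, observed),
        slr_slope(gebv, observed),
    )
    print(
        f"censoring={fraction:.0%}  "
        f"censored records={int(is_censored.sum())}  "
        f"PC={results[fraction][0]:.3f}  SLR slope={results[fraction][1]:.3f}"
    )
```

In this toy setup the uncensored slope is close to 1 by construction, and treating censored records as if they were fully observed pulls the slope further below 1 as the censored fraction grows. A measure that ignores censoring would therefore report bias that comes from the data, not from the predictions.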

Published
2023-08-17
How to Cite
Pereira, G. M. da C., Martins Filho, S., Veroneze, R., Brito, L. F., Santos, V. S. dos, & Glória, L. S. (2023). Alternative measures to evaluate the accuracy and bias of genomic predictions with censored records. Acta Scientiarum. Animal Sciences, 45(1), e61509. https://doi.org/10.4025/actascianimsci.v45i1.61509
Section
Animal Production