A comparative evaluation of the minimum covariance determinant and isolation forest methods for robust multivariate outlier detection

  • Mohanad Ne'ma Abdul Sayed, Department of Computer Systems Techniques, Technical Institute / Qurna, Southern Technical University
  • Rana H. Shamkhi

Abstract

This paper studies and compares two of the most widely used statistical techniques for multivariate outlier detection: the Minimum Covariance Determinant (MCD) estimator and the M-estimator. The authors evaluate both estimators on simulated and real-world datasets, comparing performance along key statistical measures such as breakdown point, computational efficiency, and resistance to contamination. The results indicate that the MCD estimator is more robust under extreme contamination, whereas the M-estimator offers faster computation and greater sensitivity at typical contamination levels, revealing a clear trade-off between the two methods. The empirical study provides insights that help statisticians and data scientists make informed choices between the methods, enhancing the accuracy and reliability of multivariate statistical analysis.
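To make the kind of comparison described above concrete, the following is a minimal, hedged sketch in Python using scikit-learn's MinCovDet and IsolationForest (the two methods named in the title). It flags outliers in a small simulated contaminated sample via robust Mahalanobis distances with a chi-square cutoff and via Isolation Forest, then reports detection and false-positive rates. The sample size, contamination level, and cutoff are illustrative assumptions, not the study's actual experimental design.

# Minimal sketch (not the paper's actual experiment): compare MCD-based robust
# Mahalanobis distances with Isolation Forest on a simulated contaminated sample.
# Assumes Python with numpy, scipy, and scikit-learn; the dimensions, sample size,
# contamination level, and chi-square cutoff are illustrative choices only.
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
n_clean, n_out, p = 450, 50, 5          # 10% contamination (assumed level)

# Clean observations from a standard multivariate normal.
X_clean = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n_clean)
# Contaminating observations shifted away from the bulk of the data.
X_out = rng.multivariate_normal(np.full(p, 6.0), np.eye(p), size=n_out)
X = np.vstack([X_clean, X_out])
is_outlier = np.r_[np.zeros(n_clean, dtype=bool), np.ones(n_out, dtype=bool)]

# MCD: robust location/scatter, then squared Mahalanobis distances compared
# against a chi-square(0.975, p) cutoff, as is common in robust distance-based flagging.
mcd = MinCovDet(random_state=0).fit(X)
d2 = mcd.mahalanobis(X)
mcd_flag = d2 > chi2.ppf(0.975, df=p)

# Isolation Forest: tree-ensemble isolation depth; predict() returns -1 for anomalies.
iso = IsolationForest(contamination=0.10, random_state=0).fit(X)
iso_flag = iso.predict(X) == -1

for name, flag in [("MCD", mcd_flag), ("IsolationForest", iso_flag)]:
    tpr = flag[is_outlier].mean()        # detection rate on planted outliers
    fpr = flag[~is_outlier].mean()       # false alarms on clean points
    print(f"{name}: detection rate = {tpr:.2f}, false positive rate = {fpr:.2f}")

In this setting, higher detection at comparable false-positive rates indicates stronger resistance to the planted contamination; varying the contamination fraction and the outlier shift would probe the breakdown behaviour the abstract refers to.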



Published
2025-08-10
Section
Research Articles