MALWARE DETECTION USING MACHINE LEARNING

  • RAVI SANKAR YANNAM Department of Electrical and Electronics Engineering, QIS College of Engineering and Technology
  • Anumala Alekhya
  • E. Deena Dayalan
  • Kattiram Lakshmi Prasanna

Abstract

Malware attacks on companies and critical infrastructure are a significant problem in
modern cybersecurity. Malware is malicious software: programs designed to infiltrate, damage, or corrupt
a system such as a computer or a network. Its effects can include data theft, financial loss, and
system crashes. Signature-based detection is not effective against malware that keeps evolving, so
more adaptive detection methods are needed.
This work presents a supervised machine learning approach to detecting and classifying different types of
malware. The study uses a dataset covering several categories: adware, SMS malware, riskware, and banking
malware, along with benign software. The method comprises data preparation, feature selection using the
Chi-Square test and an Extra Trees Classifier, and classification with Support Vector Machine (SVM),
Decision Tree, and Naïve Bayes classifiers. The dataset is split into training and testing sets to evaluate
each model. Feature selection identifies the features that contribute most to classification accuracy, which
improves model performance. In the experiments, the Decision Tree classifier was more accurate than the SVM
and Naïve Bayes classifiers. Accuracy scores and confusion matrices are used to evaluate the trained models
and give a clear picture of their classification capability. Visualizations such as correlation heatmaps and
feature importance plots make the results easier for researchers to interpret. The trained model is serialized
with pickle and saved so that it can be used in real-time malware detection systems. SCOPE: The scope of the
study is limited to the decision tree, support vector machine, and Naïve Bayes classifiers, feature selection,
cybersecurity, and AI-based threat detection.
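
The workflow summarized in the abstract can be illustrated with a minimal sketch, assuming scikit-learn and a synthetic five-class dataset in place of the study's malware data, which is not reproduced here. The sample count, the number of selected features (k=15), the RBF kernel, the Gaussian variant of Naïve Bayes, and all variable names are illustrative assumptions, not the authors' settings. Results on synthetic data say nothing about which classifier performs best on real malware features.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the malware dataset (assumption): 5 classes, 30 features.
X, y = make_classification(
    n_samples=600, n_features=30, n_informative=10,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

# Hold out a test set; stratify so every class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The chi-square score requires non-negative inputs, so scale to [0, 1] first.
scaler = MinMaxScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Chi-square feature selection: keep the 15 highest-scoring features.
selector = SelectKBest(chi2, k=15).fit(X_train_s, y_train)
X_train_k = selector.transform(X_train_s)
X_test_k = selector.transform(X_test_s)

# Extra Trees importances give a second, model-based ranking of the features.
forest = ExtraTreesClassifier(n_estimators=100, random_state=0)
forest.fit(X_train_s, y_train)
top_by_importance = forest.feature_importances_.argsort()[::-1][:15]

# Train the three classifiers named in the abstract and evaluate each one.
models = {
    "svm": SVC(kernel="rbf"),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "naive_bayes": GaussianNB(),
}
results = {}
for name, model in models.items():
    model.fit(X_train_k, y_train)
    pred = model.predict(X_test_k)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "confusion": confusion_matrix(y_test, pred),
    }

# Serialize the preprocessing steps together with the most accurate model.
best_name = max(results, key=lambda n: results[n]["accuracy"])
blob = pickle.dumps(
    {"scaler": scaler, "selector": selector, "model": models[best_name]}
)
restored = pickle.loads(blob)
```

Pickling the scaler and selector alongside the classifier means a deployed detector applies the same preprocessing that was used in training. In practice the bytes would be written to a file with pickle.dump, and a pickle file should only be loaded from a trusted source, because unpickling can execute arbitrary code.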

Published
2026-01-21
Section
Special Issue: Recent Advances in Computational and Applied Mathematics: Mode...