Advanced Machine Learning Based Malware Detection Systems

Song Kyoo Kim, Xiaomei Feng, Hussam Al Hamadi, Ernesto Damiani, Chan Yeob Yeun, Sivaprasad Nandyala

Research output: Contribution to journal › Article › peer-review

Abstract

This paper focuses on optimizing machine learning (ML) training data by constructing compact data. It introduces the concept of compact data design, which aims to create an optimized dataset that maximizes benefits without the need to manage a vast amount of complex data. Improved methods for optimizing ML training have been incorporated into the development of artificial intelligence (AI) systems. The paper treats the understanding of ML training datasets as a facet of Explainable AI (XAI), that is, AI that is comprehensible to humans. Among XAI methods, evaluating input feature importance stands out as a way to enhance the accuracy of complex ML models. The paper proposes compact data design as a novel method for optimizing ML training through dataset reduction. The performance of an ML-based malware detection system and of its variant using compact data is assessed, and both maintain 99% accuracy. Applying an input dataset reduced by 76% could maximize the speed of ML training under the compact data design, and the results suggest that an ML system trained in this manner could achieve statistically equivalent accuracy with only 57% of the original data sample size.
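The general idea of importance-based dataset reduction can be illustrated with a minimal sketch. The abstract does not state which importance measure or classifier the authors use, so everything here is an assumption: the data are synthetic (not malware samples), the importance score is a simple univariate class-separation measure, the classifier is a nearest-centroid stand-in, and all function names are hypothetical.

```python
import random
import statistics


def make_synthetic(n_samples=400, n_features=8, n_informative=2, seed=0):
    """Hypothetical toy data: only the first n_informative features carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        row = [rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y


def importance_scores(X, y):
    """Assumed stand-in score: |class mean difference| / overall std, per feature."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        spread = statistics.pstdev(a + b) or 1.0
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / spread)
    return scores


def select_top(X, scores, k):
    """Keep only the k highest-scoring feature columns (the 'compact' dataset)."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return [[row[j] for j in keep] for row in X], keep


def fit_centroids(X, y):
    """Nearest-centroid 'training': one mean vector per class."""
    centroids = {}
    for lab in set(y):
        rows = [row for row, l in zip(X, y) if l == lab]
        centroids[lab] = [statistics.mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda lab: sum((v - c) ** 2 for v, c in zip(row, centroids[lab])))


def accuracy(centroids, X, y):
    return sum(predict(centroids, r) == l for r, l in zip(X, y)) / len(y)


X, y = make_synthetic()
split = 300
X_train, y_train, X_test, y_test = X[:split], y[:split], X[split:], y[split:]

# Score features on the training split only, then reduce both splits.
scores = importance_scores(X_train, y_train)
X_train_c, kept = select_top(X_train, scores, k=2)
X_test_c = [[row[j] for j in kept] for row in X_test]

full_acc = accuracy(fit_centroids(X_train, y_train), X_test, y_test)
compact_acc = accuracy(fit_centroids(X_train_c, y_train), X_test_c, y_test)
print(f"kept features: {kept}")
print(f"full accuracy: {full_acc:.2f}, compact accuracy: {compact_acc:.2f}")
```

On this toy data the reduced feature set keeps the informative columns and discards the noise columns, so the two accuracies can be compared directly. The numbers it prints say nothing about the paper's reported results; the sketch only shows the shape of a reduce-then-retrain comparison.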

Original language: English
Pages (from-to): 115296-115305
Number of pages: 10
Journal: IEEE Access
Volume: 12
Publication status: Published - 2024

Keywords

  • compact data
  • artificial intelligence
  • data complexity
  • data reduction
  • machine learning
  • malware
  • robust classification
  • security
  • supervised learning
