Improvement of DGA Long Tail Problem Based on Transfer Learning

Baoyu Fan, Yue Liu, Laurie Cuthbert

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

As the number of classes grows in traditional multi-class classification and recognition tasks, a long tail problem often appears: most of the sample data is concentrated in a few classes. In the detection of malware-generated domain names (DGA, domain generation algorithm), the variability of malware means that the number of DGA classes keeps increasing and also shows a long-tailed distribution. However, previous DGA detection research has focused on the classes with large amounts of data and so does not address the long tail characteristics. We propose an effective knowledge-transfer DGA detection model that transfers the knowledge learned in one stage of training to the next stage and reduces the impact of the long tail problem on the detection model. To preserve the continuity of the model, we propose a data balance review method, which alleviates the catastrophic forgetting problem of transfer learning and allows new classes to be detected without retraining the whole model. The macro-average F1 score of our model is 76.6%, 8.74% higher than ATT_BiLSTM and 6.34% higher than ATT_CNN_BiLSTM, indicating that our model mitigates the long tail problem and predicts all classes better.
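To illustrate why a macro-average F1 score is a natural metric for a long-tailed class distribution, the following is a minimal sketch, not the authors' evaluation code. The class names, sample counts, and the helper functions `macro_f1` and `accuracy` are all hypothetical and chosen only for illustration. It shows that a classifier which ignores the tail classes can still reach high accuracy while its macro-average F1 stays low, because every class contributes equally to the macro average.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p wrongly
            fn[t] += 1  # missed true class t
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical long-tailed label set: one head class, two small tail classes.
y_true = ["head"] * 96 + ["tail_a"] * 2 + ["tail_b"] * 2
# Hypothetical classifier that always predicts the head class.
y_pred = ["head"] * 100

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

In this toy setting the two tail classes each score an F1 of zero, which pulls the macro average down even though almost every sample is classified correctly.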

Original language: English
Title of host publication: Computer and Information Science
Editors: Roger Lee
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 139-152
Number of pages: 14
ISBN (Print): 9783031121265
Publication status: Published - 2023
Event: 22nd IEEE/ACIS International Conference on Computer and Information Science, ICIS 2022 - Zhuhai, China
Duration: 26 Jun 2022 to 28 Jun 2022

Publication series

Name: Studies in Computational Intelligence
Volume: 1055
ISSN (Print): 1860-949X
ISSN (Electronic): 1860-9503

Conference

Conference: 22nd IEEE/ACIS International Conference on Computer and Information Science, ICIS 2022
Country/Territory: China
City: Zhuhai
Period: 26/06/22 to 28/06/22

Keywords

  • DGA
  • Data balanced review
  • Deep learning
  • Long tail problem
  • Transfer learning
