
Loyalty-SMOTE: Data synthesis algorithm for effective imbalanced data classification

  • Shengquan Hu
  • Junfei Li
  • Zefeng Li
  • Zihao Zhang
  • Yan Feng
  • K. L. Eddie Law

Research output: Article, peer-reviewed

Abstract

Imbalanced datasets are a persistent problem in training machine learning models, and classifiers often struggle to achieve satisfactory performance on them. Numerous approaches have been developed to tackle imbalanced data problems. Among them, some data-level methods perform linear interpolation between neighboring minority-class samples to generate new data points, while others focus on oversampling boundary samples that are specific to certain classes. However, many methods fail to account for susceptibility to noise. In this paper, we propose a novel data-level method called the Loyalty-SMOTE algorithm. We introduce the concept of Loyalty to identify noise and boundaries within datasets. After potential noisy data points are identified, the SMOTE (Synthetic Minority Oversampling Technique) algorithm is applied to oversample the minority-class boundary data. Subsequently, a denoising process based on Loyalty is conducted to obtain a balanced dataset. To extend our design, the concept of Attraction is introduced to generalize the denoising technique to multiclass problems. In our study, the SVM (Support Vector Machine) classifier is used as the base learner, and extensive experiments are performed to evaluate and compare different algorithms. Our results demonstrate that Loyalty-SMOTE achieves superior performance across multiple metrics on both binary and multiclass UCI datasets. For 30 binary datasets, it achieved the highest scores in 26 datasets (87%) for F1-score, 29 datasets (97%) for AUROC, 26 datasets (87%) for recall, and 27 datasets (90%) for G-mean. For 5 multiclass datasets, our design achieved scores of 0.8317, 0.6153, 0.8537, and 0.6717 on the same four metrics, respectively.
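The linear interpolation between neighboring minority-class samples that the abstract mentions can be illustrated with a minimal sketch of the generic SMOTE-style step. This is not the authors' Loyalty-SMOTE: the abstract does not define the Loyalty or Attraction measures, so noise detection, boundary selection, and denoising are left out. The function name `smote_interpolate`, its parameters, and the toy data are hypothetical and chosen only for illustration.

```python
import math
import random


def smote_interpolate(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbors (generic SMOTE idea,
    not the Loyalty-SMOTE procedure)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbor = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        # The new point lies on the segment from base towards neighbor.
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Hypothetical toy minority class with two features.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_interpolate(minority, n_new=5)
```

Because every synthetic point lies on a segment between two existing minority samples, a noisy minority sample can pull new points into the majority region. That is the noise susceptibility the abstract says many methods fail to consider.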

Original language: English
Article number: 108677
Journal: Neural Networks
Volume: 199
Publication status: Published - July 2026
