TY - JOUR
T1 - Loyalty-SMOTE
T2 - Data synthesis algorithm for effective imbalanced data classification
AU - Hu, Shengquan
AU - Li, Junfei
AU - Li, Zefeng
AU - Zhang, Zihao
AU - Feng, Yan
AU - Law, K. L. Eddie
N1 - Publisher Copyright:
© 2026 Elsevier Ltd
PY - 2026/7
Y1 - 2026/7
N2 - Imbalanced datasets pose a persistent problem when training machine learning models, as classifiers often struggle to achieve satisfactory performance on them. Numerous approaches have been developed to tackle imbalanced data problems. Among them, some data-level methods perform linear interpolation between neighboring minority class samples to generate new data points, while others focus on oversampling boundary samples specific to certain classes. However, many methods fail to account for noise susceptibility. In this paper, we propose a novel data-level method, the Loyalty-SMOTE algorithm. We introduce the concept of Loyalty to identify noise and class boundaries within datasets. After potential noisy data points are identified, the SMOTE (Synthetic Minority Oversampling Technique) algorithm is applied to oversample the minority class boundary data. Subsequently, a Loyalty-based denoising process is conducted to obtain a balanced dataset. To extend our design, the concept of Attraction is introduced to generalize the denoising technique to multiclass problems. In our study, the SVM (Support Vector Machine) classifier is used as the base learner, and extensive experiments are performed to evaluate and compare different algorithms. Our results demonstrate that Loyalty-SMOTE achieves superior performance across multiple metrics on both binary and multiclass UCI datasets. Across 30 binary datasets, it achieved the highest scores on 26 datasets (87%) for F1-score, 29 (97%) for AUROC, 26 (87%) for recall, and 27 (90%) for G-mean. On 5 multiclass datasets, our design achieved scores of 0.8317, 0.6153, 0.8537, and 0.6717, respectively.
AB - Imbalanced datasets pose a persistent problem when training machine learning models, as classifiers often struggle to achieve satisfactory performance on them. Numerous approaches have been developed to tackle imbalanced data problems. Among them, some data-level methods perform linear interpolation between neighboring minority class samples to generate new data points, while others focus on oversampling boundary samples specific to certain classes. However, many methods fail to account for noise susceptibility. In this paper, we propose a novel data-level method, the Loyalty-SMOTE algorithm. We introduce the concept of Loyalty to identify noise and class boundaries within datasets. After potential noisy data points are identified, the SMOTE (Synthetic Minority Oversampling Technique) algorithm is applied to oversample the minority class boundary data. Subsequently, a Loyalty-based denoising process is conducted to obtain a balanced dataset. To extend our design, the concept of Attraction is introduced to generalize the denoising technique to multiclass problems. In our study, the SVM (Support Vector Machine) classifier is used as the base learner, and extensive experiments are performed to evaluate and compare different algorithms. Our results demonstrate that Loyalty-SMOTE achieves superior performance across multiple metrics on both binary and multiclass UCI datasets. Across 30 binary datasets, it achieved the highest scores on 26 datasets (87%) for F1-score, 29 (97%) for AUROC, 26 (87%) for recall, and 27 (90%) for G-mean. On 5 multiclass datasets, our design achieved scores of 0.8317, 0.6153, 0.8537, and 0.6717, respectively.
KW - Classification
KW - Imbalanced data
KW - Loyalty-SMOTE
KW - SMOTE
UR - https://www.scopus.com/pages/publications/105029593975
U2 - 10.1016/j.neunet.2026.108677
DO - 10.1016/j.neunet.2026.108677
M3 - Article
AN - SCOPUS:105029593975
SN - 0893-6080
VL - 199
JO - Neural Networks
JF - Neural Networks
M1 - 108677
ER -