TY - GEN
T1 - Virtual Sample Generation Approach for Imbalanced Classification
AU - Lu, Cao
AU - Shen, Hong
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
AB - Imbalanced classification is a hot topic in machine learning and data mining. Traditional classification algorithms assume that the class distribution is balanced, so they perform poorly on imbalanced datasets. This paper uses the support vector machine as the base classifier and, because a support vector machine relies heavily on its support vectors, proposes a virtual sample generation method based on support vectors to address imbalanced classification and improve the recognition rate of the minority class. First, a support vector machine is trained on the training set to obtain the support vectors of the minority class. Then, a certain number of virtual samples are generated around the minority-class support vectors, under the smoothness hypothesis, to balance the dataset. The generated samples conform to the statistical characteristics of the original training data, which supports the rationality of the generated virtual samples. Finally, a support vector machine is trained on the new dataset. Experimental results show that the method is effective on both artificial datasets and UCI standard datasets.
KW - Imbalanced-classification
KW - Oversampling
KW - Support-vector
KW - Support-vector-machine
UR - http://www.scopus.com/inward/record.url?scp=85065645408&partnerID=8YFLogxK
U2 - 10.1109/PAAP.2018.00038
DO - 10.1109/PAAP.2018.00038
M3 - Conference contribution
AN - SCOPUS:85065645408
T3 - Proceedings - International Symposium on Parallel Architectures, Algorithms and Programming, PAAP
SP - 177
EP - 182
BT - Proceedings - 2018 9th International Conference on Parallel Architectures, Algorithms and Programming, PAAP 2018
PB - IEEE Computer Society
T2 - 9th International Conference on Parallel Architectures, Algorithms and Programming, PAAP 2018
Y2 - 26 December 2018 through 28 December 2018
ER -