CSS: Handling imbalanced data by improved clustering with stratified sampling

Lu Cao, Hong Shen

Research output: Article (peer-reviewed)

8 Citations (Scopus)

Abstract

The traditional support vector machine (SVM) technique has drawbacks in dealing with imbalanced data. To address this issue, we propose an algorithm, improved clustering with stratified sampling (CSS), to improve the classification performance of SVMs on imbalanced datasets. Instead of applying a single type of sampling method, as is done in the literature, our algorithm treats different types of classes with different sampling methods. For minority classes, the algorithm oversamples by adding normally distributed noise around every support vector to generate new samples. For majority classes, an improved version of clustering by fast search and find of density peaks (CFSFDP) is first applied to divide the samples into clusters and obtain the latent structure information in each majority class; stratified sampling is then applied to extract samples from each subcluster of the majority class. Moreover, we extend this method into an ensemble classifier that uses multiple base SVM classifiers for prediction. Experimental results on several imbalanced classification datasets show that CSS is more effective than state-of-the-art sampling methods.
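The two sampling steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `sigma` noise scale, the fixed seeds, and the proportional allocation of draws across clusters are all assumptions made for illustration. The sketch also assumes that the anchor points (support vectors of the minority class) and the cluster labels (from a CFSFDP-style clustering of the majority class) have already been computed elsewhere.

```python
import random
from collections import defaultdict


def oversample_with_gaussian_noise(anchors, n_new, sigma=0.1, seed=0):
    """Create n_new synthetic points by adding N(0, sigma^2) noise to anchors.

    `anchors` is assumed to hold the minority-class support vectors of an
    already trained SVM; `sigma` is a hypothetical noise scale.
    """
    rng = random.Random(seed)
    synthetic = []
    for i in range(n_new):
        base = anchors[i % len(anchors)]  # cycle through the anchors in order
        synthetic.append([x + rng.gauss(0.0, sigma) for x in base])
    return synthetic


def stratified_undersample(samples, cluster_labels, n_keep, seed=0):
    """Keep roughly n_keep samples, drawing from every cluster.

    Assumption: each cluster's quota is proportional to its size, with at
    least one sample per cluster. The abstract does not state the
    allocation rule actually used.
    """
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for sample, label in zip(samples, cluster_labels):
        clusters[label].append(sample)

    total = len(samples)
    kept = []
    for label in sorted(clusters):
        members = clusters[label]
        quota = max(1, round(n_keep * len(members) / total))
        quota = min(quota, len(members))  # cannot draw more than exist
        kept.extend(rng.sample(members, quota))  # sampling without replacement
    return kept


if __name__ == "__main__":
    # Toy data: two minority anchors and a majority class with two clusters.
    minority_anchors = [[0.0, 0.0], [1.0, 1.0]]
    majority = [[float(i), 0.0] for i in range(100)]
    majority_clusters = [0] * 80 + [1] * 20

    new_minority = oversample_with_gaussian_noise(minority_anchors, 6, sigma=0.05)
    reduced_majority = stratified_undersample(majority, majority_clusters, 10)
    print(len(new_minority), len(reduced_majority))
```

In a full pipeline, the rebalanced set (original minority samples, synthetic minority samples, and the reduced majority samples) would then be used to retrain the SVM; for the ensemble variant, several base SVMs could be trained on differently sampled sets and their predictions combined.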

Original language: English
Article number: e6071
Journal: Concurrency Computation Practice and Experience
Volume: 34
Issue number: 2
DOIs
Publication status: Published - 25 Jan 2022
Externally published
