Imbalanced data classification using improved clustering algorithm and under-sampling method

Lu Cao, Hong Shen

Research output: Conference contribution, peer-reviewed

11 citations (Scopus)

Abstract

Imbalanced classification is an active research problem in data mining and machine learning. Traditional classification algorithms assume some form of symmetry in the class distribution and aim mainly to improve overall classification performance. As a result, it is difficult for them to obtain ideal results on imbalanced datasets. To improve classification performance on imbalanced datasets, this paper proposes a cluster-based under-sampling algorithm (CUS) that exploits a key property of support vector machines (SVM): the classifier depends only on its support vectors. First, the majority class is divided into clusters using an improved clustering by fast search and find of density peaks (CFSFDP) algorithm. The improved algorithm selects cluster centers automatically, which overcomes a limitation of the original algorithm. Next, the minority class is combined with each cluster of the majority class to form a training set, and an SVM is trained on it to obtain the support vectors of that cluster. The support vectors of each cluster are retained and the non-support vectors are deleted, which yields a new set of majority-class samples and a relatively balanced dataset. Finally, the new dataset is classified with an SVM and performance is evaluated by cross-validation. The experimental results show that the CUS algorithm is effective.
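The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not specify how the improved CFSFDP selects cluster centers automatically, so the sketch falls back on the basic density-peaks heuristic of taking the top-k points by rho * delta with a user-supplied k. The function names (`density_peak_clusters`, `cluster_undersample`), the cutoff percentile, the Gaussian density kernel, and the SVM settings are all assumptions made for the example. It relies on NumPy and scikit-learn's `SVC`.

```python
import numpy as np
from sklearn.svm import SVC


def density_peak_clusters(X, n_clusters, dc_percentile=2.0):
    """Simplified density-peaks clustering (hypothetical helper)."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Assumption: cutoff distance d_c is a low percentile of pairwise distances.
    dc = np.percentile(dist[np.triu_indices(n, k=1)], dc_percentile)
    # Gaussian-kernel local density rho_i, with the self term removed.
    rho = np.exp(-(dist / dc) ** 2).sum(axis=1) - 1.0
    order = np.argsort(-rho)  # point indices by decreasing density
    delta = np.zeros(n)       # distance to the nearest denser point
    nearest_denser = np.full(n, -1)
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i], nearest_denser[i] = dist[i, j], j
    # Assumption: centers are the top-k points by gamma = rho * delta
    # (the paper's automatic selection rule is not given in the abstract).
    gamma = rho * delta
    gamma[order[0]] = np.inf  # the densest point is always a center
    centers = np.argsort(-gamma)[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:  # each point inherits the label of its nearest denser point
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return labels


def cluster_undersample(X, y, n_clusters=3, majority=0, **svm_args):
    """Keep majority points that are support vectors of some per-cluster SVM."""
    X_maj, X_min = X[y == majority], X[y != majority]
    labels = density_peak_clusters(X_maj, n_clusters)
    keep = np.zeros(len(X_maj), dtype=bool)
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        # Training set: one majority cluster plus the whole minority class.
        X_c = np.vstack([X_maj[idx], X_min])
        y_c = np.r_[np.zeros(len(idx)), np.ones(len(X_min))]
        sv = SVC(**svm_args).fit(X_c, y_c).support_  # indices into X_c
        keep[idx[sv[sv < len(idx)]]] = True          # majority-side support vectors
    X_new = np.vstack([X_maj[keep], X_min])
    y_new = np.r_[np.full(keep.sum(), majority), y[y != majority]]
    return X_new, y_new


# Synthetic 10:1 imbalanced data, for illustration only.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (300, 2)), rng.normal(2.5, 1.0, (30, 2))])
y = np.r_[np.zeros(300, dtype=int), np.ones(30, dtype=int)]

X_bal, y_bal = cluster_undersample(X, y, n_clusters=3, kernel="rbf", C=1.0)
clf = SVC(kernel="rbf", C=1.0).fit(X_bal, y_bal)  # final classifier on reduced data
```

In this sketch the minority class is never reduced; only majority points that do not act as support vectors in any per-cluster SVM are dropped. The final classifier could then be scored with cross-validation, as the abstract describes, for example with scikit-learn's `cross_val_score`.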

Original language: English
Title of host publication: Proceedings - 2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019
Editors: Hui Tian, Hong Shen, Wee Lum Tan
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 358-363
Number of pages: 6
ISBN (Electronic): 9781728126166
Publication status: Published - Dec 2019
Externally published
Event: 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019 - Gold Coast, Australia
Duration: 5 Dec 2019 to 7 Dec 2019

Publication series

Name: Proceedings - 2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019

Conference

Conference: 20th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2019
Country/Territory: Australia
City: Gold Coast
Period: 5/12/19 to 7/12/19
