A selectively re-train approach based on clustering to classify concept-drifting data streams with skewed distribution

Dandan Zhang, Hong Shen, Tian Hui, Yidong Li, Jun Wu, Yingpeng Sang

Research output: Conference article, peer-reviewed

6 citations (Scopus)

Abstract

Classification is an important and practical tool that uses a model built on historical data to predict class labels for newly arriving data. In the last few years there have been many interesting studies on classification in data streams. However, most of these studies assume that the data streams are relatively balanced and stable. In practice, skewed data streams (e.g., few positive but many negative examples) are important and common, and they appear in many real-world applications. Concept drift and skewed class distributions are two common properties of data streams. Together they make learning from streams particularly difficult, to the point that traditional data mining algorithms no longer work. In this paper, we propose a method, the Selectively Re-train Approach Based on Clustering, which handles concept drift and skewed distributions simultaneously. We evaluate our algorithm on both synthetic and real data sets that simulate skewed data streams. Empirical results show that the proposed method performs better than previous work.

Original language: English
Pages (from-to): 413-424
Number of pages: 12
Journal: Lecture Notes in Computer Science
Volume: 8444 LNAI
Issue number: PART 2
Publication status: Published - 2014
Externally published
Event: 18th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, PAKDD 2014 - Tainan, Taiwan, Province of China
Duration: 13 May 2014 - 16 May 2014