
Improved data streams classification with fast unsupervised feature selection

  • Lulu Wang
  • Hong Shen

Research output: Conference contribution, peer-reviewed

7 citations (Scopus)

Abstract

Data stream classification poses three major challenges: infinite length, concept drift, and feature evolution. The first two have been widely studied, but most existing data stream classification techniques ignore the third. DXMiner [17] is the first model to address feature evolution; it uses past labeled instances to select the top-ranked features according to a score computed by a formula. This semi-supervised feature selection method depends on the quality of past classification and neglects possible correlations among features, so it cannot produce an optimal feature subset, which degrades classification accuracy. Multi-Cluster Feature Selection (MCFS) [5], proposed for static data classification and clustering, applies unsupervised feature selection to address the feature-evolution problem but suffers from a high computational cost in feature selection. In this paper, we apply MCFS within the DXMiner framework to handle each window of data in a data stream for dynamic data stream classification. With unsupervised feature selection, our method produces the optimal feature subset and hence improves on DXMiner's classification accuracy. We further reduce the time complexity of the feature selection process in MCFS by using the locality-sensitive hashing forest (LSH Forest) [4]. The empirical results indicate that our approach outperforms state-of-the-art stream classification techniques in classifying real-life data streams.
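The abstract does not spell out the individual steps, so the following is only a minimal sketch of MCFS-style unsupervised feature selection applied to one window at a time, following the usual description of MCFS: build a k-nearest-neighbour graph, take a spectral embedding of it, fit a sparse (L1) regression of each embedding vector on the features, and score each feature by its largest absolute coefficient. Several choices here are assumptions rather than the authors' implementation: the neighbour search is exact brute force (this is the step the paper speeds up with LSH Forest), the L1 regression uses plain coordinate descent, and all function names and parameters (`mcfs_select`, `select_per_window`, `k`, `alpha`) are hypothetical.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric 0/1 k-nearest-neighbour adjacency (exact, brute force).
    Assumption: the paper replaces this search with LSH Forest."""
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(sq, np.inf)              # a point is not its own neighbour
    idx = np.argsort(sq, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(W, W.T)                 # symmetrise

def spectral_embedding(W, n_vectors):
    """Eigenvectors of L y = lambda D y with the smallest eigenvalues, centred."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)           # eigenvalues in ascending order
    Y = d_inv_sqrt[:, None] * vecs[:, :n_vectors]
    return Y - Y.mean(axis=0)                 # centring zeroes the constant vector

def lasso_cd(X, y, alpha, n_iter=100):
    """Coordinate descent for 0.5 * ||y - Xw||^2 + alpha * ||w||_1."""
    w = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    r = y.copy()                              # residual for w = 0
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * w[j]               # remove feature j's contribution
            rho = X[:, j] @ r
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            r -= X[:, j] * w[j]
    return w

def mcfs_select(X, n_selected, n_clusters, k=5, alpha=0.01):
    """Return (indices of the top-scoring features, per-feature scores)."""
    Xc = X - X.mean(axis=0)
    Y = spectral_embedding(knn_graph(X, k), n_clusters)
    coefs = np.array([lasso_cd(Xc, Y[:, j], alpha) for j in range(Y.shape[1])])
    scores = np.abs(coefs).max(axis=0)        # score = max_k |a_kj|
    return np.argsort(-scores)[:n_selected], scores

def select_per_window(stream, window_size, **kwargs):
    """Run the selection independently on consecutive, non-overlapping windows."""
    for start in range(0, len(stream) - window_size + 1, window_size):
        window = stream[start:start + window_size]
        yield start, mcfs_select(window, **kwargs)[0]
```

Because the selection uses no labels, it can run on each incoming window before any instance in that window has been classified; a classifier for the window would then be trained on the selected columns only.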

Original language: English
Title of host publication: Proceedings - 17th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2016
Editors: Hong Shen, Hong Shen, Yingpeng Sang, Hui Tian
Publisher: IEEE Computer Society
Pages: 221-226
Number of pages: 6
ISBN (Electronic): 9781509050819
DOIs
Publication status: Published - 2 Jul 2016
Externally published
Event: 17th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2016 - Guangzhou, China
Duration: 16 Dec 2016 to 18 Dec 2016

Publication series

Name: Parallel and Distributed Computing, Applications and Technologies, PDCAT Proceedings

Conference

Conference: 17th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2016
Country/Territory: China
City: Guangzhou
Period: 16/12/16 to 18/12/16
