
Clustering high dimensional data streams with representative points

  • Xiujun Wang
  • Hong Shen

Research output: Conference contribution, peer-reviewed

3 citations (Scopus)

Abstract

In this paper, we propose a novel algorithm for clustering high dimensional data streams with representative data points. The fixed-size interval partitioning adopted in traditional grid-based clustering methods cannot capture clusters in each dimension well when applied to evolving high dimensional data streams, and it may generate unnecessary dense grids that misrepresent clusters in a subspace. To overcome these drawbacks, we quantify each dimension (attribute) of the data points separately and use the representative data points generated for each dimension instead of fixed-size intervals. These representative points are updated continuously with incoming data points, so they capture the cluster trends in each dimension more accurately than fixed-size intervals. Instead of discarding a historical data point as a whole, our algorithm confines data discarding to the attribute level, using the statistics stored in the representative data points. This lets us keep the useful parts of data points and discard the trivial parts. Experimental results on synthetic and real data sets demonstrate the effectiveness and accuracy of the proposed method.
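The abstract gives no pseudocode, so the following is only a minimal sketch of the general idea it describes: one set of representative points per dimension, updated online, with stale statistics discarded per attribute. It is not the authors' algorithm. The names `RepPoint`, `DimensionQuantizer` and `StreamQuantizer`, the parameters `radius`, `decay` and `min_weight`, and the nearest-point update rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RepPoint:
    """Hypothetical representative point for one attribute."""
    center: float        # current position on this dimension
    weight: float = 1.0  # decayed count of values absorbed so far


class DimensionQuantizer:
    """Maintains representative points for ONE dimension of the stream."""

    def __init__(self, radius: float, decay: float = 0.99, min_weight: float = 0.05):
        self.radius = radius          # assumed absorption distance
        self.decay = decay            # assumed fading factor per arrival
        self.min_weight = min_weight  # assumed pruning threshold
        self.points: List[RepPoint] = []

    def update(self, value: float) -> None:
        # Fade historical statistics, then drop representatives that
        # have become trivial (discarding happens per attribute).
        for p in self.points:
            p.weight *= self.decay
        self.points = [p for p in self.points if p.weight >= self.min_weight]

        nearest = min(self.points, key=lambda p: abs(p.center - value), default=None)
        if nearest is not None and abs(nearest.center - value) <= self.radius:
            # Absorb the value: weighted running mean of the center.
            nearest.weight += 1.0
            nearest.center += (value - nearest.center) / nearest.weight
        else:
            # No representative is close enough, so start a new one.
            self.points.append(RepPoint(center=value))


class StreamQuantizer:
    """One DimensionQuantizer per attribute of the incoming points."""

    def __init__(self, n_dims: int, radius: float, decay: float = 0.99,
                 min_weight: float = 0.05):
        self.dims = [DimensionQuantizer(radius, decay, min_weight)
                     for _ in range(n_dims)]

    def update(self, point: Sequence[float]) -> None:
        # Each attribute is handled independently of the others.
        for quantizer, value in zip(self.dims, point):
            quantizer.update(value)
```

Because each dimension keeps its own representatives, a value that is stale in one attribute can fade out there while the same data point's contribution to another attribute is retained, which is the attribute-level discarding the abstract refers to. How the paper combines the per-dimension representatives into subspace clusters is not stated in the abstract and is not modelled here.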

Original language: English
Title of host publication: 6th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2009
Pages: 449-453
Number of pages: 5
DOIs
Publication status: Published - 2009
Published externally
Event: 6th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2009 - Tianjin, China
Duration: 14 Aug 2009 to 16 Aug 2009

Publication series

Name: 6th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2009
1

Conference

Conference: 6th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2009
Country/Territory: China
City: Tianjin
Period: 14/08/09 to 16/08/09
