Clustering high dimensional data streams with representative points

Xiujun Wang, Hong Shen

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

3 Citations (Scopus)

Abstract

In this paper, we propose a novel algorithm for clustering high dimensional data streams with representative data points. When traditional grid-based clustering methods are applied to evolving high dimensional data streams, their fixed-size interval partitioning cannot capture the clusters in each dimension well. It may also generate unnecessary dense grid cells that misrepresent clusters in a subspace. To overcome these drawbacks, we quantize each dimension (attribute) of the data points separately and, for each dimension, use the generated representative data points instead of fixed-size intervals. These representative points are updated continuously as new data points arrive, so they capture the cluster trends in each dimension more accurately than fixed-size intervals. Instead of discarding historical data points as a whole, our algorithm confines data discarding to the attribute level, using the statistics stored in the representative data points. This lets us keep the useful parts of data points and discard the trivial parts. Experimental results on synthetic and real data sets demonstrate the high effectiveness and accuracy of the proposed method.
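The abstract does not give the update rules, so the following is only a minimal Python sketch of the general idea under assumptions that are not taken from the paper: each dimension keeps its own list of 1-D representatives; an incoming attribute value joins the nearest representative within a fixed `radius` (otherwise it seeds a new one); the joined representative moves to the weighted mean of its values; all weights fade by a factor `decay` per arrival; and representatives whose weight falls below `min_weight` are dropped. Grid cells are then tuples of per-dimension representative ids rather than fixed-size intervals. All names (`DimensionQuantizer`, `RepresentativeGrid`, `radius`, `decay`, `min_weight`) are hypothetical.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Representative:
    """A representative value on one axis (hypothetical structure)."""
    rep_id: int    # stable identifier, used as a cell coordinate
    center: float  # current position of the representative on this axis
    weight: float  # time-decayed number of attribute values absorbed


class DimensionQuantizer:
    """Hypothetical per-dimension quantizer: an adaptive set of 1-D
    representative points used in place of fixed-size grid intervals."""

    def __init__(self, radius: float, decay: float = 0.99, min_weight: float = 0.1):
        self.radius = radius          # max distance for a value to join a representative
        self.decay = decay            # fading factor applied once per arriving value
        self.min_weight = min_weight  # faded representatives below this are discarded
        self.reps: list[Representative] = []
        self._ids = count()

    def update(self, value: float) -> int:
        """Absorb one attribute value; return the id of its representative."""
        # Fade old statistics so that recent values dominate.
        for r in self.reps:
            r.weight *= self.decay
        # Attribute-level discarding: only this dimension's stale summaries go.
        self.reps = [r for r in self.reps if r.weight >= self.min_weight]
        nearest = min(self.reps, key=lambda r: abs(r.center - value), default=None)
        if nearest is not None and abs(nearest.center - value) <= self.radius:
            nearest.weight += 1.0
            # Incremental weighted mean: the representative drifts toward new data.
            nearest.center += (value - nearest.center) / nearest.weight
            return nearest.rep_id
        rep = Representative(next(self._ids), value, 1.0)
        self.reps.append(rep)
        return rep.rep_id


class RepresentativeGrid:
    """Hypothetical grid whose cells are tuples of representative ids."""

    def __init__(self, n_dims: int, radius: float, decay: float = 0.99):
        self.dims = [DimensionQuantizer(radius, decay) for _ in range(n_dims)]
        self.decay = decay
        self.cells: dict[tuple[int, ...], float] = {}

    def insert(self, point: list[float]) -> tuple[int, ...]:
        # Each coordinate is mapped independently by its own quantizer.
        key = tuple(q.update(x) for q, x in zip(self.dims, point))
        for k in self.cells:
            self.cells[k] *= self.decay  # cell densities fade like the representatives
        self.cells[key] = self.cells.get(key, 0.0) + 1.0
        return key

    def dense_cells(self, threshold: float) -> list[tuple[int, ...]]:
        return [k for k, w in self.cells.items() if w >= threshold]
```

For example, alternating points near (0, 0) and (10, 10) yield two representatives per axis and two dense cells, `(0, 0)` and `(1, 1)`. Because each axis keeps its own summaries, a stale value in one dimension can be forgotten without discarding what the same points contributed to other dimensions, which is one way to read "discarding at attribute level".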

Original language: English
Title of host publication: 6th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2009
Pages: 449–453
Number of pages: 5
Publication status: Published, 2009
Externally published: Yes
Event: 6th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2009, Tianjin, China
Duration: 14 Aug 2009 to 16 Aug 2009

Publication series

Name: 6th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2009
Volume: 1

Conference

Conference: 6th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2009
Country/Territory: China
City: Tianjin
Period: 14/08/09 to 16/08/09

Keywords

  • Clustering
  • High dimensional data streams
  • Probability density estimation
  • Quantification
  • Representative data points
