MR-DBSCAN: An efficient parallel density-based clustering algorithm using MapReduce

Yaobin He, Haoyu Tan, Wuman Luo, Huajian Mao, Di Ma, Shengzhong Feng, Jianping Fan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

207 Citations (Scopus)

Abstract

Data clustering is an important data mining technique that plays a crucial role in numerous scientific applications. However, clustering is challenging because real-world datasets have been growing rapidly to extra-large scales. Meanwhile, MapReduce is a desirable parallel programming platform that is widely applied across many data-processing fields. In this paper, we propose an efficient parallel density-based clustering algorithm and implement it as a 4-stage MapReduce paradigm. Furthermore, we adopt a quick partitioning strategy for large-scale, non-indexed data. We study the metric used to merge clusters among bordering partitions and optimize it. Finally, we evaluate our work on real large-scale datasets using the Hadoop platform. The results show that our approach achieves efficient speedup and scaleup.
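The abstract does not spell out the four MapReduce stages, the partitioning strategy, or the merge rule, so what follows is only a minimal single-process sketch under our own assumptions: equal-width strips along the x axis (a stand-in for the paper's quick partitioning), an eps-wide halo around each strip, brute-force local DBSCAN inside each strip, and a union-find merge of local clusters that share a point which is core in its home strip. All function names (`partition`, `dbscan`, `mr_dbscan_sketch`) are hypothetical, and the stage comments only loosely mirror a map/reduce pipeline; this is not the authors' implementation.

```python
import math
from collections import deque

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (brute force, includes i)."""
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points, eps, min_pts):
    """Sequential DBSCAN; returns (labels, is_core) for a list of (x, y) tuples."""
    neighbors = [region_query(points, i, eps) for i in range(len(points))]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = [NOISE] * len(points)
    cluster = 0
    for i in range(len(points)):
        if not is_core[i] or labels[i] != NOISE:
            continue
        labels[i] = cluster
        queue = deque([i])
        while queue:  # only core points are enqueued, so expansion stops at border points
            j = queue.popleft()
            for k in neighbors[j]:
                if labels[k] == NOISE:
                    labels[k] = cluster
                    if is_core[k]:
                        queue.append(k)
        cluster += 1
    return labels, is_core


def partition(points, eps, k):
    """Assumed scheme: k equal-width x strips, each extended by an eps-wide halo."""
    xs = [x for x, _ in points]
    lo = min(xs)
    width = (max(xs) - lo) / k or 1.0
    home = [min(int((x - lo) / width), k - 1) for x in xs]
    parts = [[] for _ in range(k)]
    for i, x in enumerate(xs):
        for p in range(k):
            left, right = lo + p * width, lo + (p + 1) * width
            if home[i] == p or left - eps <= x <= right + eps:
                parts[p].append(i)
    return parts, home


def mr_dbscan_sketch(points, eps, min_pts, k=4):
    n = len(points)
    # Stage 1 ("map"): route each point to its home strip and every strip whose halo covers it.
    parts, home = partition(points, eps, k)

    # Stage 2 ("reduce"): run DBSCAN independently inside each strip.
    true_core = [False] * n
    memberships = [[] for _ in range(n)]  # (strip, local cluster) keys seen per point
    parent = {}
    for p, ids in enumerate(parts):
        labels, is_core = dbscan([points[i] for i in ids], eps, min_pts)
        for li, gi in enumerate(ids):
            if home[gi] == p:  # the full eps-neighbourhood is present only in the home strip
                true_core[gi] = is_core[li]
            if labels[li] != NOISE:
                key = (p, labels[li])
                parent.setdefault(key, key)
                memberships[gi].append(key)

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # Stage 3: merge local clusters that share a genuine core point.
    for gi in range(n):
        if true_core[gi]:
            for key in memberships[gi][1:]:
                parent[find(key)] = find(memberships[gi][0])

    # Stage 4: relabel merged clusters with consecutive global ids.
    global_ids = {}
    result = []
    for keys in memberships:
        if not keys:
            result.append(NOISE)
        else:  # border points keep the first label seen, as in order-dependent DBSCAN
            result.append(global_ids.setdefault(find(keys[0]), len(global_ids)))
    return result


if __name__ == "__main__":
    points = ([(i * 0.5, 0.0) for i in range(20)]
              + [(i * 0.5, 10.0) for i in range(20)]
              + [(5.0, 50.0)])
    labels = mr_dbscan_sketch(points, eps=0.6, min_pts=2, k=4)
    print(sorted(set(labels)))  # [-1, 0, 1]
```

The halo is what makes the merge step safe in this sketch: outside its home strip a point's neighbour count can only be underestimated, so core status is read from the home strip, and local clusters that contain the same genuine core point can be merged without joining clusters that sequential DBSCAN would keep apart.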

Original language: English
Title of host publication: Proceedings - 2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011
Pages: 473-480
Number of pages: 8
Publication status: Published, 2011
Externally published: Yes
Event: 2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011, Tainan, Taiwan, Province of China
Duration: 7 Dec 2011 to 9 Dec 2011

Publication series

Name: Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
ISSN (Print): 1521-9097

Conference

Conference: 2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011
Country/Territory: Taiwan, Province of China
City: Tainan
Period: 7/12/11 to 9/12/11

Keywords

  • DBSCAN
  • Data mining
  • MapReduce
  • Parallel system

