MR-DBSCAN: An efficient parallel density-based clustering algorithm using MapReduce

Yaobin He, Haoyu Tan, Wuman Luo, Huajian Mao, Di Ma, Shengzhong Feng, Jianping Fan

Research output: Conference contribution, peer-reviewed

201 citations (Scopus)

Abstract

Data clustering is an important data mining technology that plays a crucial role in numerous scientific applications. However, it is challenging because real-world datasets have been growing rapidly to extra-large scale. Meanwhile, MapReduce is a desirable parallel programming platform that is widely applied across many data processing fields. In this paper, we propose an efficient parallel density-based clustering algorithm and implement it using a four-stage MapReduce paradigm. Furthermore, we adopt a quick partitioning strategy for large-scale non-indexed data. We study the metric for merging bordering partitions and optimize it. Finally, we evaluate our work on real large-scale datasets using the Hadoop platform. The results show that our approach achieves good speedup and scaleup.

Original language: English
Title of host publication: Proceedings - 2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011
Pages: 473-480
Number of pages: 8
Publication status: Published - 2011
Externally published
Event: 2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011 - Tainan, Taiwan, Province of China
Duration: 7 Dec 2011 to 9 Dec 2011

Publication series

Name: Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
ISSN (Print): 1521-9097

Conference

Conference: 2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011
Country/Territory: Taiwan, Province of China
City: Tainan
Period: 7/12/11 to 9/12/11
