MRTune: A simulator for performance tuning of MapReduce jobs with skewed data

Xibo Zhou, Wuman Luo, Haoyu Tan

Research output: Conference contribution (peer-reviewed)

2 Citations (Scopus)

Abstract

MapReduce is a programming model designed by Google that has been widely used for both high-performance computing and big data processing. Although the programming model is simple, performance tuning of a MapReduce job is very challenging because of the complexity of the configuration parameters and the tradeoffs between the performance gain of each optimization approach and the extra overhead it brings. One naive way to address this issue is to run the MapReduce job repeatedly using different combinations of configuration parameters and optimization methods, then select the combination with the shortest running time. However, real execution is impractical because there may be too many combinations and a single run of each combination may take too long. It is therefore desirable to estimate the runtime of a job efficiently, without real execution, using only the input data and the configuration parameter settings of the cluster. In this paper, we propose a novel MapReduce simulator called MRTune for runtime estimation of MapReduce jobs. MRTune takes the key distribution of the input data into consideration and works well even when that distribution is skewed. Moreover, MRTune can estimate the runtime of a job in the presence of unpredictable task failures. We evaluate MRTune by implementing MapReduce jobs with Zipfian-distributed input data. The results show that MRTune can estimate the runtime of MapReduce jobs with high accuracy and efficiency even when the key distribution of the input data is skewed. We also conduct two case studies to analyse the impact of data skew and task failures on a MapReduce job.
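
To illustrate why a skewed key distribution matters for runtime estimation, the following is a minimal sketch, not MRTune's method. It computes the expected number of records each reducer receives when key frequencies follow a Zipfian distribution. The function names, the exponent parameter `s`, and the use of `key % num_reducers` as a stand-in for hash partitioning are all assumptions made for this example.

```python
def zipf_weights(num_keys, s=1.0):
    """Normalized Zipfian frequencies: the key of rank r has weight ~ 1 / r**s."""
    raw = [1.0 / (rank ** s) for rank in range(1, num_keys + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def reducer_loads(num_keys, num_records, num_reducers, s=1.0):
    """Expected record count per reducer (hypothetical helper).

    Assumption: keys are the integers 0..num_keys-1 and are assigned to
    reducers by `key % num_reducers`, a simplified stand-in for a hash
    partitioner.
    """
    weights = zipf_weights(num_keys, s)
    loads = [0.0] * num_reducers
    for key, weight in enumerate(weights):
        loads[key % num_reducers] += weight * num_records
    return loads


def skew_ratio(loads):
    """Heaviest reducer load divided by the mean load (1.0 means balanced)."""
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    for s in (0.0, 0.5, 1.0):
        loads = reducer_loads(num_keys=1000, num_records=1_000_000,
                              num_reducers=10, s=s)
        print(f"s={s}: skew ratio = {skew_ratio(loads):.2f}")
```

With `s = 0` every key is equally frequent and the reducers are balanced. As `s` grows, the reducer holding the most frequent key receives a disproportionate share of the records, so the reduce phase is bounded by that straggler rather than by the average load.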

Original language: English
Title of host publication: 2014 20th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2014 - Proceedings
Publisher: IEEE Computer Society
Pages: 352-359
Number of pages: 8
ISBN (Electronic): 9781479976157
Publication status: Published - 2014
Externally published
Event: 20th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2014 - Hsinchu, Taiwan, Province of China
Duration: 16 Dec 2014 to 19 Dec 2014

Publication series

Name: Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
Volume: 2015-April
ISSN (Print): 1521-9097

Conference

Conference: 20th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2014
Country/Territory: Taiwan, Province of China
City: Hsinchu
Period: 16/12/14 to 19/12/14
