MRTune: A simulator for performance tuning of MapReduce jobs with skewed data

Xibo Zhou, Wuman Luo, Haoyu Tan

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

2 Citations (Scopus)

Abstract

MapReduce is a programming model designed by Google that has been widely used for both high-performance computing and big data processing. Although the programming model is simple, performance tuning of a MapReduce job is very challenging because of the complexity of the configuration parameters and the various tradeoffs between the performance gain of an optimization approach and the extra overhead it introduces. A naive way to address this issue is to run the MapReduce job repeatedly with different combinations of configuration parameters and optimization methods, and then select the combination with the shortest running time. However, such real execution is impractical because there may be too many combinations and a single run of each combination may take too long. It is therefore desirable to estimate the runtime of a job efficiently, without real execution, using only the input data and the configuration parameter settings of the cluster. In this paper, we propose a novel MapReduce simulator called MRTune for runtime estimation of MapReduce jobs. MRTune takes the key distribution of the input data into consideration and works well even when that distribution is skewed. Moreover, MRTune can estimate the runtime of a job in the presence of unpredictable task failures. We evaluate MRTune by running MapReduce jobs on Zipfian-distributed input data. The results show that MRTune estimates the runtime of MapReduce jobs with high accuracy and efficiency even when the key distribution of the input data is skewed. We also conduct two case studies to analyse the impact of data skew and task failures on a MapReduce job.
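To illustrate why the key distribution matters for runtime estimation, the following is a minimal sketch, not MRTune's model. It assumes one reducer per partition, all reducers running in parallel at the same constant processing rate, and a reduce phase that ends when the heaviest reducer finishes. Keys of rank 1..K receive records in proportion to a Zipf law with exponent `s`, and key rank modulo the number of reducers is used as a deterministic stand-in for a hash partitioner. The function names (`zipf_key_counts`, `partition_loads`, `estimate_reduce_phase_seconds`) and all numeric parameters are hypothetical.

```python
def zipf_key_counts(num_keys: int, num_records: int, s: float) -> list[float]:
    """Expected records per key of rank 1..num_keys under a Zipf law with exponent s.

    s = 0.0 gives a uniform key distribution; larger s gives heavier skew.
    """
    weights = [1.0 / rank ** s for rank in range(1, num_keys + 1)]
    total = sum(weights)
    return [num_records * w / total for w in weights]


def partition_loads(key_counts: list[float], num_reducers: int) -> list[float]:
    """Records assigned to each reducer.

    Hypothetical partitioner: key rank modulo num_reducers, used here as a
    deterministic stand-in for hash partitioning. All records of one key go
    to the same reducer, so a hot key cannot be split.
    """
    loads = [0.0] * num_reducers
    for rank, count in enumerate(key_counts, start=1):
        loads[rank % num_reducers] += count
    return loads


def estimate_reduce_phase_seconds(loads: list[float], records_per_second: float) -> float:
    """Reducers run in parallel, so the phase lasts as long as the heaviest one."""
    return max(loads) / records_per_second


if __name__ == "__main__":
    records, keys, reducers, rate = 10_000_000, 1000, 10, 100_000.0
    for s in (0.0, 1.0):
        loads = partition_loads(zipf_key_counts(keys, records, s), reducers)
        seconds = estimate_reduce_phase_seconds(loads, rate)
        print(f"s={s}: estimated reduce phase {seconds:.1f} s")
```

With uniform keys (`s=0.0`) every reducer receives the same load, whereas with `s=1.0` the reducer holding the most frequent key dominates the phase, so an estimator that ignores the key distribution would underestimate the runtime. A realistic simulator would also need to model map tasks, shuffle, scheduling, and task failures, which this sketch omits.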

Original language: English
Title of host publication: 2014 20th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2014 - Proceedings
Publisher: IEEE Computer Society
Pages: 352-359
Number of pages: 8
ISBN (Electronic): 9781479976157
DOIs
Publication status: Published - 2014
Externally published: Yes
Event: 20th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2014 - Hsinchu, Taiwan, Province of China
Duration: 16 Dec 2014 to 19 Dec 2014

Publication series

Name: Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
Volume: 2015-April
ISSN (Print): 1521-9097

Conference

Conference: 20th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2014
Country/Territory: Taiwan, Province of China
City: Hsinchu
Period: 16/12/14 to 19/12/14

Keywords

  • MapReduce
  • performance tuning
  • runtime estimation
  • simulator
  • skew
