TY - JOUR
T1 - Prediction of job characteristics for intelligent resource allocation in HPC systems
T2 - a survey and future directions
AU - Hou, Zhengxiong
AU - Shen, Hong
AU - Zhou, Xingshe
AU - Gu, Jianhua
AU - Wang, Yunlan
AU - Zhao, Tianhai
N1 - Publisher Copyright: © 2022, Higher Education Press.
PY - 2022/10
Y1 - 2022/10
N2 - High-performance computing (HPC) clusters are increasingly popular, and large volumes of job logs recording many years of operation traces have been accumulated. At the same time, the HPC cloud makes it possible to access HPC services remotely. To execute applications, both HPC end-users and cloud users need to request specific resources for different workloads by themselves. Because users are usually not familiar with the hardware details and software layers, or with the performance behavior of the underlying HPC systems, it is hard for them to select optimal resource configurations in terms of performance, cost, and energy efficiency. Hence, how to provide on-demand services with intelligent resource allocation is a critical issue in the HPC community. Prediction of job characteristics plays a key role in intelligent resource allocation. This paper presents a survey of the existing work and future directions for prediction of job characteristics for intelligent resource allocation in HPC systems. We first review the existing techniques for obtaining performance and energy consumption data of jobs. Then we survey the techniques for single-objective predictions of runtime, queue time, power and energy consumption, cost, and optimal resource configuration for input jobs, as well as multi-objective predictions. We conclude after discussing future trends, research challenges, and possible solutions towards intelligent resource allocation in HPC systems.
AB - High-performance computing (HPC) clusters are increasingly popular, and large volumes of job logs recording many years of operation traces have been accumulated. At the same time, the HPC cloud makes it possible to access HPC services remotely. To execute applications, both HPC end-users and cloud users need to request specific resources for different workloads by themselves. Because users are usually not familiar with the hardware details and software layers, or with the performance behavior of the underlying HPC systems, it is hard for them to select optimal resource configurations in terms of performance, cost, and energy efficiency. Hence, how to provide on-demand services with intelligent resource allocation is a critical issue in the HPC community. Prediction of job characteristics plays a key role in intelligent resource allocation. This paper presents a survey of the existing work and future directions for prediction of job characteristics for intelligent resource allocation in HPC systems. We first review the existing techniques for obtaining performance and energy consumption data of jobs. Then we survey the techniques for single-objective predictions of runtime, queue time, power and energy consumption, cost, and optimal resource configuration for input jobs, as well as multi-objective predictions. We conclude after discussing future trends, research challenges, and possible solutions towards intelligent resource allocation in HPC systems.
KW - cloud computing
KW - high-performance computing
KW - intelligent resource allocation
KW - job characteristics
KW - machine learning
KW - performance prediction
UR - http://www.scopus.com/inward/record.url?scp=85130725697&partnerID=8YFLogxK
U2 - 10.1007/s11704-022-0625-8
DO - 10.1007/s11704-022-0625-8
M3 - Review article
AN - SCOPUS:85130725697
SN - 2095-2228
VL - 16
JO - Frontiers of Computer Science
JF - Frontiers of Computer Science
IS - 5
M1 - 165107
ER -