TY - JOUR
T1 - CAGE
T2 - A Curiosity-Driven Graph-Based Explore-Exploit Algorithm for Solving Deterministic Environment MDPs With Limited Episode Problem
AU - Yu, Yide
AU - Liu, Yue
AU - Wong, Dennis
AU - Li, Huijie
AU - Egas-Lopez, Jose Vicente
AU - Ma, Yan
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2024
Y1 - 2024
AB - The explore-exploit dilemma in Markov Decision Processes (MDPs) is a fundamental challenge, especially in deterministic environments akin to real-world scenarios. Balancing exploration and exploitation within limited episodes is crucial to optimize decision-making. Despite existing research, challenges like parameter sensitivity, lack of global optimality, and inefficient exploration of low-value regions remain. We introduce the Curiosity-driven Algorithm based on Graph for Exploration (CAGE), which addresses these issues through a graph-based framework. CAGE includes two variants: CAGE-greedy, ensuring optimal solutions with ample episodes, and CAGE-centrality, prioritizing significant states in limited episodes. Key contributions include eliminating parameter sensitivity, guaranteeing global optimality, and enhancing exploration efficiency. To validate the performance of the CAGE algorithm series, we design a grid world experiment. The experimental results demonstrate that the CAGE algorithm outperforms a comparative algorithm, indicating its feasibility for implementation in the industry and its high level of explainability. Experimental results validate CAGE's effectiveness in complex environments.
KW - Markov decision process
KW - curiosity-driven
KW - explore-exploit problem
KW - graph theory
UR - http://www.scopus.com/inward/record.url?scp=85205144777&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2024.3468027
DO - 10.1109/ACCESS.2024.3468027
M3 - Article
AN - SCOPUS:85205144777
SN - 2169-3536
VL - 12
SP - 144106
EP - 144121
JO - IEEE Access
JF - IEEE Access
ER -