TY - GEN
T1 - Measuring the State-Observation-Gap in POMDPs
T2 - 19th IFIP WG 12.5 International Conference on Artificial Intelligence Applications and Innovations, AIAI 2023
AU - Yu, Yide
AU - Ma, Yan
AU - Liu, Yue
AU - Wong, Dennis
AU - Lei, Kin
AU - Egas-López, José Vicente
N1 - Publisher Copyright:
© 2023, IFIP International Federation for Information Processing.
PY - 2023
Y1 - 2023
AB - The objective of this study is to measure the discrepancy between states and observations within the context of the Partially Observable Markov Decision Process (POMDP). The gap between states and observations is formulated as the State-Observation-Gap (SOG) problem, denoted by the symbol Δ, where states and observations are treated as sets. The study also introduces the concept of Observation Confidence (OC), which serves as an indicator of the reliability of an observation, and establishes a positive correlation between OC and Δ. To calculate the cumulative entropy λ of rewards in ⟨o, a, ·⟩, we propose two weighting algorithms, namely Universal Weighting and Specific Weighting. Empirical and theoretical assessments carried out in the Cliff Walking environment attest to the effectiveness of both algorithms in determining Δ and OC.
KW - Information Theory
KW - Partially Observable Markov Decision Process
KW - Reinforcement Learning
UR - http://www.scopus.com/inward/record.url?scp=85163405534&partnerID=8YFLogxK
DO - 10.1007/978-3-031-34111-3_13
M3 - Conference contribution
AN - SCOPUS:85163405534
SN - 9783031341106
T3 - IFIP Advances in Information and Communication Technology
SP - 137
EP - 148
BT - Artificial Intelligence Applications and Innovations - 19th IFIP WG 12.5 International Conference, AIAI 2023, Proceedings
A2 - Maglogiannis, Ilias
A2 - Iliadis, Lazaros
A2 - MacIntyre, John
A2 - Dominguez, Manuel
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 14 June 2023 through 17 June 2023
ER -