Measuring the State-Observation-Gap in POMDPs: An Exploration of Observation Confidence and Weighting Algorithms

Yide Yu, Yan Ma, Yue Liu, Dennis Wong, Kin Lei, José Vicente Egas-López

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

This study measures the discrepancy between states and observations in a Partially Observable Markov Decision Process (POMDP). The discrepancy is formulated as the State-Observation-Gap (SOG) problem and denoted Δ, with states and observations treated as sets. The study also introduces Observation Confidence (OC), an indicator of how reliable an observation is, and establishes a positive correlation between OC and Δ. To calculate the cumulative entropy λ of rewards in ⟨o, a, ·⟩, we propose two weighting algorithms, Universal Weighting and Specific Weighting. Empirical and theoretical assessments in the Cliff Walking environment attest to the effectiveness of both algorithms in determining Δ and OC.
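As a rough illustration of what an entropy of rewards grouped by ⟨o, a, ·⟩ could look like, the sketch below groups transitions by (observation, action), takes the Shannon entropy of each group's empirical reward distribution, and sums the results with optional per-pair weights. The abstract does not define λ, Universal Weighting, or Specific Weighting, so the function names, the default weight of 1.0, and the weighted sum are all assumptions made for illustration, not the authors' algorithms.

```python
from collections import Counter, defaultdict
from math import log2


def reward_entropy(rewards):
    """Shannon entropy (in bits) of the empirical distribution of rewards."""
    counts = Counter(rewards)
    total = len(rewards)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def cumulative_reward_entropy(transitions, weights=None):
    """Weighted sum of per-(observation, action) reward entropies.

    transitions: iterable of (observation, action, reward) tuples.
    weights: optional dict mapping (observation, action) to a weight.
             Missing pairs default to 1.0 (an assumption of this sketch,
             not a scheme taken from the paper).
    """
    grouped = defaultdict(list)
    for o, a, r in transitions:
        grouped[(o, a)].append(r)
    weights = weights or {}
    return sum(
        weights.get(key, 1.0) * reward_entropy(rs)
        for key, rs in grouped.items()
    )


# Hypothetical data: reward values mimic Cliff Walking's -1 step cost
# and -100 cliff penalty. Observation "o1" hides two different states,
# so the same action yields two different rewards.
transitions = [
    ("o1", "left", -1),
    ("o1", "left", -100),
    ("o2", "left", -1),
    ("o2", "left", -1),
]
total_entropy = cumulative_reward_entropy(transitions)
```

In this toy data, the pair ("o1", "left") has two equally likely rewards and contributes one bit, while ("o2", "left") always yields the same reward and contributes nothing. Intuitively, an observation whose rewards are unpredictable under a fixed action is one that likely conflates several underlying states.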

Original language: English
Title of host publication: Artificial Intelligence Applications and Innovations - 19th IFIP WG 12.5 International Conference, AIAI 2023, Proceedings
Editors: Ilias Maglogiannis, Lazaros Iliadis, John MacIntyre, Manuel Dominguez
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 137-148
Number of pages: 12
ISBN (Print): 9783031341106
Publication status: Published - 2023
Event: 19th IFIP WG 12.5 International Conference on Artificial Intelligence Applications and Innovations, AIAI 2023 - León, Spain
Duration: 14 Jun 2023 to 17 Jun 2023

Publication series

Name: IFIP Advances in Information and Communication Technology
Volume: 675 IFIP
ISSN (Print): 1868-4238
ISSN (Electronic): 1868-422X

Conference

Conference: 19th IFIP WG 12.5 International Conference on Artificial Intelligence Applications and Innovations, AIAI 2023
Country/Territory: Spain
City: León
Period: 14/06/23 to 17/06/23

Keywords

  • Information Theory
  • Partially Observable Markov Decision Process
  • Reinforcement Learning
