TY - GEN
T1 - Sequential State Q-learning Uplink Resource Allocation in Multi-AP 802.11be Network
AU - Liu, Yue
AU - Yu, Yide
AU - Du, Zhenyu
AU - Cuthbert, Laurie
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - The expected high demand from user applications in WLANs is a driver for sharing radio resources more efficiently. The move to 802.11be with OFDMA and MU-MIMO makes Radio Resource Management (RRM) a multi-dimensional problem in a complex wireless environment. Traditionally, the way an RRM problem is formulated leads to either a large state space or a large action space, making reinforcement learning infeasible to apply. In this paper, we propose a Sequential State Q-learning (SSQL) algorithm that solves Resource Unit (RU) allocation for scheduled uplink transmission to maximize system bitrate in a multi-AP 802.11be OFDMA network. The AP acts as the agent, with the serving stations as 'states' and their RU allocations as 'actions'. The AP observes the wireless environment, continuously refreshes the Q-values of the state-action pairs, and outputs the RU allocation that optimizes the objective. Through simulations, we demonstrate that SSQL achieves 89.67% of the global-optimal performance with very fast convergence, which makes it practical for use in varying wireless networks.
AB - The expected high demand from user applications in WLANs is a driver for sharing radio resources more efficiently. The move to 802.11be with OFDMA and MU-MIMO makes Radio Resource Management (RRM) a multi-dimensional problem in a complex wireless environment. Traditionally, the way an RRM problem is formulated leads to either a large state space or a large action space, making reinforcement learning infeasible to apply. In this paper, we propose a Sequential State Q-learning (SSQL) algorithm that solves Resource Unit (RU) allocation for scheduled uplink transmission to maximize system bitrate in a multi-AP 802.11be OFDMA network. The AP acts as the agent, with the serving stations as 'states' and their RU allocations as 'actions'. The AP observes the wireless environment, continuously refreshes the Q-values of the state-action pairs, and outputs the RU allocation that optimizes the objective. Through simulations, we demonstrate that SSQL achieves 89.67% of the global-optimal performance with very fast convergence, which makes it practical for use in varying wireless networks.
KW - IEEE 802.11be
KW - Markov Decision Process
KW - Q-learning
KW - Radio Resource Management
UR - http://www.scopus.com/inward/record.url?scp=85146983493&partnerID=8YFLogxK
U2 - 10.1109/VTC2022-Fall57202.2022.10013045
DO - 10.1109/VTC2022-Fall57202.2022.10013045
M3 - Conference contribution
AN - SCOPUS:85146983493
T3 - IEEE Vehicular Technology Conference
BT - 2022 IEEE 96th Vehicular Technology Conference, VTC 2022-Fall 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 96th IEEE Vehicular Technology Conference, VTC 2022-Fall 2022
Y2 - 26 September 2022 through 29 September 2022
ER -