TY - GEN
T1 - Deep Reinforcement Learning for Network Security Applications With A Safety Guide
AU - Liu, Zhibo
AU - Lu, Xiaozhen
AU - Chen, Yuhan
AU - Xiao, Yilin
AU - Xiao, Liang
AU - Bu, Yanling
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Most typical reinforcement learning algorithms help wireless devices choose a security policy, such as a moving strategy or communication policy, by exploring all possible state-action pairs, including risky policies that can cause a severe collision or network disaster. In this paper, we design a safe reinforcement learning algorithm for safety-critical applications (e.g., intelligent transportation systems) that guides the learning agent to avoid exploring risky policies. This algorithm uses a Q-network (i.e., a convolutional neural network or a deep neural network) to choose the policy and designs a safety guide to modify any chosen policy that would result in a dangerous state. More specifically, the safety guide includes a risk alarm module that evaluates the immediate warning value corresponding to the risk of each state-action pair and a G-network that estimates the long-term risk value. By adding the long-term risk value to the long-term expected reward output by the Q-network, this algorithm uses a safety dock to modify the chosen policy. The algorithm uses the immediate warning value to construct a safe buffer and a risky buffer for updating the G-network, ensuring full exploration in the initial learning process. As a case study, we apply the designed algorithm to a cargo transportation system; the experimental results verify the effectiveness of our algorithm compared with the benchmark safe deep Q-network.
AB - Most typical reinforcement learning algorithms help wireless devices choose a security policy, such as a moving strategy or communication policy, by exploring all possible state-action pairs, including risky policies that can cause a severe collision or network disaster. In this paper, we design a safe reinforcement learning algorithm for safety-critical applications (e.g., intelligent transportation systems) that guides the learning agent to avoid exploring risky policies. This algorithm uses a Q-network (i.e., a convolutional neural network or a deep neural network) to choose the policy and designs a safety guide to modify any chosen policy that would result in a dangerous state. More specifically, the safety guide includes a risk alarm module that evaluates the immediate warning value corresponding to the risk of each state-action pair and a G-network that estimates the long-term risk value. By adding the long-term risk value to the long-term expected reward output by the Q-network, this algorithm uses a safety dock to modify the chosen policy. The algorithm uses the immediate warning value to construct a safe buffer and a risky buffer for updating the G-network, ensuring full exploration in the initial learning process. As a case study, we apply the designed algorithm to a cargo transportation system; the experimental results verify the effectiveness of our algorithm compared with the benchmark safe deep Q-network.
KW - Deep reinforcement learning
KW - cargo transportation
KW - long-term risk
KW - network security
KW - safety guide
UR - http://www.scopus.com/inward/record.url?scp=85173019089&partnerID=8YFLogxK
U2 - 10.1109/ICCC57788.2023.10233612
DO - 10.1109/ICCC57788.2023.10233612
M3 - Conference contribution
AN - SCOPUS:85173019089
T3 - 2023 IEEE/CIC International Conference on Communications in China, ICCC 2023
BT - 2023 IEEE/CIC International Conference on Communications in China, ICCC 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE/CIC International Conference on Communications in China, ICCC 2023
Y2 - 10 August 2023 through 12 August 2023
ER -