With the development of the 802.11be standard, which incorporates Orthogonal Frequency Division Multiple Access (OFDMA), real-world deployments will increasingly involve dense networks with many stations (STAs) and many Access Points (APs). As the communication environment becomes more complex, STAs place higher demands on transmission, such as higher network throughput and lower latency. When reinforcement learning (RL) is applied to the resource allocation problem in such environments, the state space and action space become very complex. To address this problem, we propose a resource allocation algorithm based on the Deep Deterministic Policy Gradient (DDPG) to maximize the system bitrate in a multi-STA multi-AP 802.11be OFDMA network. We add reparameterization to adapt the DDPG framework to our problem, which has a high-dimensional discrete action space. In addition, the reward function is carefully designed to stabilize the performance of DDPG. Simulation results show that the proposed DDPG-based algorithm achieves higher and more stable performance than a traditional RL algorithm, reaching 95.45% of the global optimum, and is well suited to complex multi-AP communication environments.