Power Allocation for Delay Optimization in Device-to-Device Networks: A Graph Reinforcement Learning Approach

The pursuit of rate maximization in wireless communication frequently runs into substantial challenges related to user fairness. This paper addresses these challenges by exploring a novel power allocation approach for delay optimization, utilizing graph neural network (GNN)-based reinforcement learning (RL) in device-to-device (D2D) communication. The proposed approach incorporates not only channel state information but also packet delay, the number of backlogged packets, and the number of transmitted packets into the state representation. We adopt a centralized RL method in which a central controller collects and processes the state information. The central controller functions as an agent trained with the proximal policy optimization (PPO) algorithm. To better exploit topology information in the communication network and enhance the generalization of the proposed method, we embed GNN layers into both the actor and critic networks of the PPO algorithm. This integration allows for efficient parameter updates of the GNNs and enables the state information to be parameterized as a low-dimensional embedding, which the agent leverages to optimize power allocation strategies. Simulation results demonstrate that the proposed method effectively reduces average delay while ensuring user fairness, outperforms baseline methods, and exhibits scalability and generalization capability.
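The abstract does not specify the network architecture in detail, so the following is only a minimal sketch of the general idea: a shared GNN encoder maps per-link state features (channel gain, packet delay, backlog, transmitted packets) to low-dimensional embeddings, which feed separate actor (power-level logits) and critic (value) heads as in a standard PPO setup. All class and parameter names, the layer sizes, and the simple mean-aggregated graph convolution are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SimpleGraphConv(nn.Module):
    # One message-passing layer: average features over neighbors
    # (row-normalized adjacency), then apply a linear transform + ReLU.
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (N, in_dim) per-link states; adj: (N, N) adjacency with self-loops
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.lin((adj / deg) @ x))

class GnnActorCritic(nn.Module):
    # Hypothetical actor-critic with GNN layers embedded in the encoder:
    # per-link embeddings drive the actor head; a pooled (centralized)
    # embedding drives the critic head.
    def __init__(self, state_dim, embed_dim, n_power_levels):
        super().__init__()
        self.gnn1 = SimpleGraphConv(state_dim, embed_dim)
        self.gnn2 = SimpleGraphConv(embed_dim, embed_dim)
        self.actor = nn.Linear(embed_dim, n_power_levels)
        self.critic = nn.Linear(embed_dim, 1)

    def forward(self, x, adj):
        h = self.gnn2(self.gnn1(x, adj), adj)  # low-dimensional embeddings
        logits = self.actor(h)                 # per-link power-level logits
        value = self.critic(h.mean(dim=0))     # centralized value estimate
        return logits, value

# Toy usage: 4 D2D links, each with a 4-dim state
# (channel gain, head-of-line delay, backlog, transmitted packets).
x = torch.randn(4, 4)
adj = torch.ones(4, 4)  # fully connected interference graph incl. self-loops
model = GnnActorCritic(state_dim=4, embed_dim=16, n_power_levels=5)
logits, value = model(x, adj)
print(logits.shape, value.shape)  # torch.Size([4, 5]) torch.Size([1])
```

Sampling a discrete power level per link from `logits` (e.g. via `torch.distributions.Categorical`) and training with the clipped PPO objective would complete the loop; because the graph convolution is permutation-equivariant, the same trained weights can in principle be applied to networks with a different number of links, which is consistent with the scalability claim in the abstract.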
@article{fang2025_2505.12902,
  title={Power Allocation for Delay Optimization in Device-to-Device Networks: A Graph Reinforcement Learning Approach},
  author={Hao Fang and Kai Huang and Hao Ye and Chongtao Guo and Le Liang and Xiao Li and Shi Jin},
  journal={arXiv preprint arXiv:2505.12902},
  year={2025}
}