Reinforcement Learning With Selective Exploration for Interference Management in mmWave Networks

Dinh-van, Son and Nguyen, van-Linh and Cebecioglu, Berna Bulut and Masaracchia, Antonino and Higgins, Matthew D. (2025) Reinforcement Learning With Selective Exploration for Interference Management in mmWave Networks. IEEE Transactions on Machine Learning in Communications and Networking, 3. pp. 280-295. ISSN 2831-316X

Reinforcement_Learning_With_Selective_Exploration_for_Interference_Management_in_mmWave_Networks.pdf - Published Version
Available under License Creative Commons Attribution.

Official URL: https://ieeexplore.ieee.org/document/10869481

Abstract

The next generation of wireless systems will leverage the millimeter-wave (mmWave) bands to meet the increasing traffic volume and high data rate requirements of emerging applications (e.g., ultra HD streaming, metaverse, and holographic telepresence). In this paper, we address the joint optimization of beamforming, power control, and interference management in multi-cell mmWave networks. We propose novel reinforcement learning algorithms, including a single-agent-based method (BPC-SA) for centralized settings and a multi-agent-based method (BPC-MA) for distributed settings. To tackle the high-variance rewards caused by narrow antenna beamwidths, we introduce a selective exploration method to guide the agent towards more intelligent exploration. Our proposed algorithms are well-suited for scenarios where beamforming vectors require control in either a discrete domain, such as a codebook, or a continuous domain. Furthermore, they do not require channel state information, extensive feedback from user equipment, or any searching methods, thus reducing overhead and enhancing scalability. Numerical results demonstrate that selective exploration improves per-user spectral efficiency by up to 22.5% compared to scenarios without it. Additionally, our algorithms outperform existing methods by 50% in terms of per-user spectral efficiency and achieve 90% of the per-user spectral efficiency of the exhaustive search approach while requiring only 0.1% of its computational runtime.
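
For readers unfamiliar with the headline metric, the following is a minimal sketch of how per-user spectral efficiency is commonly computed from the signal-to-interference-plus-noise ratio (SINR) in a multi-cell downlink, using the standard Shannon expression log2(1 + SINR). It is an illustration only and is not taken from the paper: the function name, the assumption that user i is served by base station i, and the representation of beamforming as a single effective power gain per link are all hypothetical simplifications.

```python
import math


def per_user_spectral_efficiency(gains, powers, noise_power):
    """Return the spectral efficiency (bit/s/Hz) of each user.

    Assumptions made for this sketch (not from the paper):
      - gains[i][j] is the effective beamformed power gain from
        base station j to user i (linear scale, not dB);
      - powers[j] is the transmit power of base station j;
      - user i is served by base station i, and every other base
        station contributes interference.
    """
    rates = []
    for i, row in enumerate(gains):
        # Power received at user i from each base station.
        received = [g * p for g, p in zip(row, powers)]
        signal = received[i]
        interference = sum(received) - signal
        sinr = signal / (interference + noise_power)
        # Shannon spectral efficiency for the given SINR.
        rates.append(math.log2(1.0 + sinr))
    return rates
```

Under these assumptions, narrowing a beam or raising a transmit power increases the serving term for one user while raising the interference term for the others, which is why the abstract treats beamforming, power control, and interference management as one joint problem.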

Item Type:	Article
Identification Number:	10.1109/TMLCN.2025.3537967
Dates:	3 February 2025 (Accepted); 3 February 2025 (Published Online)
Uncontrolled Keywords:	Beam training, deep reinforcement learning, interference management, mmWave, multi agent, power control, selective exploration
Subjects:	CAH11 - computing > CAH11-01 - computing > CAH11-01-01 - computer science
Divisions:	Architecture, Built Environment, Computing and Engineering > Computer Science
Depositing User:	Gemma Tonks
Date Deposited:	08 Jul 2025 10:30
Last Modified:	08 Jul 2025 10:30
URI:	https://www.open-access.bcu.ac.uk/id/eprint/16493