Abstract: Q-learning is a widely used reinforcement learning (RL) algorithm for optimizing wireless networks, but it struggles to explore large state spaces efficiently. In [1], a multi-environment ...