Abstract:
Mutual information-based unsupervised skill learning algorithms entangle an agent's exploration process with its skill learning process, which can lead to exploration limitations in complex environments and long-horizon decision-making tasks. To address this issue, this paper systematically identifies the causes of these limitations through theoretical analysis and experimental validation, and proposes an unsupervised skill policy learning method based on goal-conditioned reinforcement learning. First, exploration is decoupled from skill learning by treating the goal space as the primitive skill space. Generalization across goals improves the learning efficiency of the goal-conditioned policy, thereby enhancing exploration and yielding an initial exploration policy. Subsequently, a Go-Explore interaction scheme is adopted to fine-tune the policy and further strengthen skill learning. In addition, new quality evaluation metrics are constructed from the state-space coverage rate and skill consistency to comprehensively assess the overall performance of skill policies. Experiments on four classic maze environments demonstrate that the proposed two-stage method effectively overcomes exploration limitations and accelerates skill learning, achieving an average improvement of 72.1% over existing methods under the proposed metrics.
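The abstract does not give formal definitions of the two proposed metrics, so the following is a minimal sketch of how a state-space coverage rate and a skill-consistency score might be computed for 2D maze environments. The uniform-grid discretization, the dispersion-based consistency score, and the function names (`coverage_rate`, `skill_consistency`) are all illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def coverage_rate(visited_states, grid_size, cell_size):
    """Fraction of discretized maze cells visited by the skill policies.

    visited_states: array of shape (N, 2) with (x, y) positions.
    grid_size: number of cells per axis; cell_size: side length of a cell.
    NOTE: uniform binning is an assumed discretization; the paper's exact
    definition is not stated in the abstract.
    """
    cells = np.floor(np.asarray(visited_states) / cell_size).astype(int)
    cells = np.clip(cells, 0, grid_size - 1)
    visited_cells = {tuple(c) for c in cells}
    return len(visited_cells) / grid_size**2

def skill_consistency(final_states_per_skill):
    """Mean agreement of terminal states across rollouts of the same skill.

    final_states_per_skill: list of (R, 2) arrays, one per skill, holding
    the terminal (x, y) position of each of R rollouts. Consistency is
    taken here as the mean dispersion around each skill's centroid,
    mapped into (0, 1]; this functional form is an assumption.
    """
    scores = []
    for states in final_states_per_skill:
        states = np.asarray(states)
        dispersion = np.linalg.norm(
            states - states.mean(axis=0), axis=1
        ).mean()
        scores.append(1.0 / (1.0 + dispersion))
    return float(np.mean(scores))

# Usage on synthetic rollout data from a hypothetical 10x10 maze:
rng = np.random.default_rng(0)
visited = rng.uniform(0.0, 10.0, size=(500, 2))
finals = [rng.normal(loc=k, scale=0.3, size=(10, 2)) for k in range(4)]
print(coverage_rate(visited, grid_size=10, cell_size=1.0))
print(skill_consistency(finals))
```

Under this sketch, a policy set that reaches many distinct cells scores high on coverage, while a skill whose rollouts terminate near the same location scores high on consistency, so the two metrics jointly penalize both under-exploration and unreliable skills.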