Unsupervised Skill Policy Learning Based on Goal-conditioned Reinforcement Learning

ZHANG Tian; WANG Zhenli; ZHANG Yifan

doi:10.13973/j.cnki.robot.240336

ZHANG Tian, WANG Zhenli, ZHANG Yifan. Unsupervised Skill Policy Learning Based on Goal-conditioned Reinforcement Learning[J]. Robot, 2026, 48(3): 602-613. DOI: 10.13973/j.cnki.robot.240336

Abstract

Unsupervised skill learning algorithms based on mutual information entangle an agent's exploration process with its skill learning process, which can limit exploration in complex environments and long-horizon decision-making tasks. To address this issue, this paper identifies the causes of these limitations through theoretical analysis and experimental validation, and proposes an unsupervised skill policy learning method based on goal-conditioned reinforcement learning. First, exploration is decoupled from skill learning by treating the goal space as the primitive skill space. Generalization across goals improves the learning efficiency of the goal-conditioned policy, which in turn improves exploration and yields an initial exploration policy. Second, a Go-Explore interaction scheme is adopted to fine-tune the policy and further enhance skill policy learning. In addition, new quality evaluation metrics are constructed from the state space coverage rate and the skill consistency to assess the overall performance of skill policies. Experiments on four classic maze environments demonstrate that the proposed two-stage method overcomes the exploration limitations and accelerates skill learning, achieving an average improvement of 72.1% over existing methods under the proposed metrics.
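
The abstract does not define the state space coverage rate, so the following is only an illustrative sketch of one common way such a metric could be computed, not the paper's definition. It assumes a rectangular, continuous state space discretized into a uniform grid, and it ignores maze walls, so unreachable cells would also count in the denominator. The function name `coverage_rate` and the parameters `low`, `high`, and `bins` are hypothetical.

```python
import numpy as np


def coverage_rate(states, low, high, bins):
    """Fraction of uniform grid cells visited by at least one state.

    Assumed definition for illustration only: `states` is an (N, D) array of
    visited states, `low` and `high` bound the state space per dimension, and
    `bins` is the number of cells along each dimension.
    """
    states = np.asarray(states, dtype=float)
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)

    # Map each state to integer cell indices along every dimension.
    idx = np.floor((states - low) / (high - low) * bins).astype(int)
    # States on the upper boundary fall into the last cell.
    idx = np.clip(idx, 0, bins - 1)

    # Count distinct visited cells and normalize by the total number of cells.
    visited = {tuple(row) for row in idx}
    return len(visited) / bins ** states.shape[1]


if __name__ == "__main__":
    # Three states in the unit square with a 2 x 2 grid; two of them share a cell.
    demo_states = [[0.1, 0.1], [0.9, 0.9], [0.1, 0.15]]
    print(coverage_rate(demo_states, low=[0.0, 0.0], high=[1.0, 1.0], bins=2))
```

Under this assumed definition, a skill policy whose rollouts reach more distinct cells scores higher, which is the kind of exploration behavior the coverage-based metric is meant to capture.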
