
GoalLadder: Incremental Goal Discovery with Vision-Language Models

Main: 9 pages
Figures: 5
Tables: 4
Bibliography: 4 pages
Appendix: 3 pages
Abstract

Natural language can offer a concise and human-interpretable means of specifying reinforcement learning (RL) tasks. The ability to extract rewards from a language instruction can enable the development of robotic systems that learn from human guidance; however, it remains a challenging problem, especially in visual environments. Existing approaches that employ large, pretrained language models either rely on non-visual environment representations, require prohibitively large amounts of feedback, or generate noisy, ill-shaped reward functions. In this paper, we propose a novel method, GoalLadder, that leverages vision-language models (VLMs) to train RL agents from a single language instruction in visual environments. GoalLadder works by incrementally discovering states that bring the agent closer to completing a task specified in natural language. To do so, it queries a VLM to identify states that represent an improvement in the agent's task progress and to rank them using pairwise comparisons. Unlike prior work, GoalLadder does not trust the VLM's feedback completely; instead, it uses the feedback to rank potential goal states with an ELO-based rating system, thereby reducing the detrimental effects of noisy VLM feedback. Over the course of training, the agent is tasked with minimising the distance to the top-ranked goal in a learned embedding space, which is trained on unlabelled visual data. This key feature allows us to bypass the need for the abundant and accurate feedback typically required to train a well-shaped reward function. We demonstrate that GoalLadder outperforms existing related methods on classic control and robotic manipulation environments, with an average final success rate of ~95% compared to only ~45% for the best competitor.
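
The core mechanism described in the abstract — ranking candidate goal states via noisy pairwise VLM comparisons with an ELO-style rating, then rewarding the agent for reducing distance to the top-ranked goal in a learned embedding space — can be illustrated with a short sketch. This is not the authors' implementation; the names (`GoalBuffer`, `elo_update`, `distance_reward`), the K-factor, and the use of a negative Euclidean embedding distance as the reward are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (assumptions, not the paper's code): keep ELO ratings
# for candidate goal states based on noisy pairwise "which state is closer to
# completing the task?" judgements from a VLM.

class GoalBuffer:
    def __init__(self, k_factor=32.0, init_rating=1000.0):
        self.k = k_factor
        self.init_rating = init_rating
        self.states = []    # candidate goal observations (e.g. images)
        self.ratings = []   # one ELO rating per candidate

    def add(self, state):
        self.states.append(state)
        self.ratings.append(self.init_rating)

    def elo_update(self, i, j, vlm_prefers_i):
        # Expected win probability of candidate i over candidate j.
        expected_i = 1.0 / (1.0 + 10.0 ** ((self.ratings[j] - self.ratings[i]) / 400.0))
        outcome_i = 1.0 if vlm_prefers_i else 0.0
        # Standard ELO update: repeated comparisons average out noise, so a
        # single incorrect VLM answer has limited effect on the final ranking.
        self.ratings[i] += self.k * (outcome_i - expected_i)
        self.ratings[j] += self.k * ((1.0 - outcome_i) - (1.0 - expected_i))

    def top_goal(self):
        # The agent is trained to approach the current top-ranked candidate.
        return self.states[int(np.argmax(self.ratings))]


def distance_reward(embed, obs, goal):
    # Dense reward from a learned embedding (embed() is assumed to be trained
    # separately on unlabelled visual data): closer to the goal = higher reward.
    z_obs, z_goal = embed(obs), embed(goal)
    return -float(np.linalg.norm(z_obs - z_goal))
```

Under this reading, the dense per-step reward comes entirely from embedding distances, so the comparatively sparse and noisy VLM queries are needed only to decide which state currently serves as the goal.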

@article{zakharov2025_2506.16396,
  title={GoalLadder: Incremental Goal Discovery with Vision-Language Models},
  author={Alexey Zakharov and Shimon Whiteson},
  journal={arXiv preprint arXiv:2506.16396},
  year={2025}
}