Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective

23 May 2025

Papers citing "Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective"

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
04 Mar 2022

Learning to summarize from human feedback
Nisan Stiennon, Long Ouyang, Jeff Wu, Daniel M. Ziegler, Ryan J. Lowe, Chelsea Voss, Alec Radford, Dario Amodei, Paul Christiano
02 Sep 2020

Proximal Policy Optimization Algorithms
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
20 Jul 2017

High-Dimensional Continuous Control Using Generalized Advantage Estimation
John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, Pieter Abbeel
08 Jun 2015