ResearchTrend.AI
Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective
arXiv: 2505.17997, v2 (latest)

23 May 2025
Jintian Shao
YiMing Cheng
Hongyi Huang
Beiwen Zhang
ZhiYu Wu
You Shan
Mingkai Zheng
    LRM

Papers citing "Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective"

4 / 4 papers shown
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM, ALM
883
13,176
0
04 Mar 2022
Learning to summarize from human feedback
Nisan Stiennon
Long Ouyang
Jeff Wu
Daniel M. Ziegler
Ryan J. Lowe
Chelsea Voss
Alec Radford
Dario Amodei
Paul Christiano
ALM
252
2,184
0
02 Sep 2020
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
532
19,265
0
20 Jul 2017
High-Dimensional Continuous Control Using Generalized Advantage Estimation
John Schulman
Philipp Moritz
Sergey Levine
Michael I. Jordan
Pieter Abbeel
OffRL
125
3,438
0
08 Jun 2015