ResearchTrend.AI
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

arXiv: 2504.05118

7 April 2025
Yu Yue
Yufeng Yuan
Qiying Yu
Xiaochen Zuo
Ruofei Zhu
Wenyuan Xu
Jiaze Chen
Changbo Wang
Tiantian Fan
Zhengyin Du
Xiangpeng Wei
X. Yu
Gaohong Liu
Qingbin Liu
L. Liu
H. Lin
Zhiqi Lin
Bole Ma
Chenyi Zhang
Mofan Zhang
Wang Zhang
Hang Zhu
Ru Zhang
Xin Liu
Mingxuan Wang
Yonghui Wu
Lin Yan
    OffRL
    LRM

Papers citing "VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks"

7 / 7 papers shown
SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization
Minghan Chen
Guikun Chen
Wenguan Wang
Yi Yang
12
0
0
18 May 2025
Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment
Siliang Zeng
Quan Wei
William Brown
Oana Frunza
Yuriy Nevmyvaka
Mingyi Hong
LRM
12
0
0
17 May 2025
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
Songjun Tu
Jiahao Lin
Qichao Zhang
Xiangyu Tian
Linjing Li
Xiangyuan Lan
Dongbin Zhao
OffRL
ReLM
LRM
21
0
0
16 May 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
L. Liu
...
Jianfeng Gao
Weizhu Chen
S. Wang
Simon S. Du
Yelong Shen
OffRL
ReLM
LRM
127
5
0
29 Apr 2025
Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning
Yixuan Even Xu
Yash Savani
Fei Fang
Zico Kolter
OffRL
42
2
0
18 Apr 2025
ToolRL: Reward is All Tool Learning Needs
Cheng Qian
Emre Can Acikgoz
Qi He
Hongru Wang
Xiusi Chen
Dilek Hakkani-Tur
Gokhan Tur
Heng Ji
OffRL
LRM
38
6
0
16 Apr 2025
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
Andreas Hochlehnert
Hardik Bhatnagar
Vishaal Udandarao
Samuel Albanie
Ameya Prabhu
Matthias Bethge
ReLM
ALM
LRM
100
5
0
09 Apr 2025