arXiv:2311.08290
On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling
14 November 2023
Nicholas Corrado, Josiah P. Hanna
OffRL
Papers citing "On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling"
1 / 1 papers shown
Title
Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay
Yifan Sun, Jingyan Shen, Yibin Wang, Tianyu Chen, Zhendong Wang, Mingyuan Zhou, Huan Zhang
05 Jun 2025