2506.13351
Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks
16 June 2025
Yifei Xu, Tusher Chakraborty, Srinagesh Sharma, Leonardo Nunes, Emre Kıcıman, Songwu Lu, Ranveer Chandra
OffRL
LRM
Papers citing
"Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks"
1 / 1 papers shown
Title
CC-LEARN: Cohort-based Consistency Learning
Xiao Ye, Shaswat Shrivastava, Zhaonan Li, Jacob Dineen, Shijie Lu, Avneet Ahuja, Ming Shen, Zhikun Xu, Ben Zhou
OffRL
LRM
18 Jun 2025