ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2506.13351
  4. Cited By
Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks

Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks

16 June 2025
Yifei Xu
Tusher Chakraborty
Srinagesh Sharma
Leonardo Nunes
Emre Kıcıman
Songwu Lu
Ranveer Chandra
    OffRLLRM
ArXiv (abs)PDFHTML

Papers citing "Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks"

1 / 1 papers shown
Title
CC-LEARN: Cohort-based Consistency Learning
CC-LEARN: Cohort-based Consistency Learning
Xiao Ye
Shaswat Shrivastava
Zhaonan Li
Jacob Dineen
Shijie Lu
Avneet Ahuja
Ming shen
Zhikun Xu
Ben Zhou
OffRLLRM
38
0
0
18 Jun 2025
1