ResearchTrend.AI
Teaching Language Models to Critique via Reinforcement Learning

5 February 2025
Zhihui Xie
Jie Chen
Lu Chen
Weichao Mao
Jingjing Xu
Dianbo Sui
ALM LRM

Papers citing "Teaching Language Models to Critique via Reinforcement Learning"

8 / 8 papers shown
Training Language Models to Generate Quality Code with Program Analysis Feedback
Feng Yao
Zilong Wang
Liyuan Liu
Junxia Cui
Li Zhong
Xiaohan Fu
Haohui Mai
Vish Krishnan
Jianfeng Gao
Jingbo Shang
30
0
0
28 May 2025
Think Only When You Need with Large Hybrid-Reasoning Models
Lingjie Jiang
Xun Wu
Shaohan Huang
Qingxiu Dong
Zewen Chi
Li Dong
Xingxing Zhang
Tengchao Lv
Lei Cui
Furu Wei
OffRL LRM
149
5
0
20 May 2025
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
Xiaoyuan Liu
Tian Liang
Zhiwei He
Jiahao Xu
Wenxuan Wang
Pinjia He
Zhaopeng Tu
Haitao Mi
Dong Yu
OffRL ReLM LRM
102
0
0
19 May 2025
J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization
Austin Xu
Yilun Zhou
Xuan-Phi Nguyen
Caiming Xiong
Shafiq Joty
ELM LRM
119
0
0
19 May 2025
DeepCritic: Deliberate Critique with Large Language Models
Wenkai Yang
Jingwen Chen
Yankai Lin
Ji-Rong Wen
ALM LRM
96
1
0
01 May 2025
Neural Theorem Proving: Generating and Structuring Proofs for Formal Verification
Balaji Rao
William Eiers
Carlo Lipizzi
133
0
0
23 Apr 2025
Heimdall: test-time scaling on the generative verification
Wenlei Shi
Xing Jin
LRM
95
7
0
14 Apr 2025
Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models
Liangjie Huang
Dawei Li
Huan Liu
Lu Cheng
LRM
100
0
0
03 Apr 2025