ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.13346
  4. Cited By
J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization
v1v2v3 (latest)

J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization

19 May 2025
Austin Xu
Yilun Zhou
Xuan-Phi Nguyen
Caiming Xiong
Shafiq Joty
    ELMLRM
ArXiv (abs)PDFHTML

Papers citing "J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization"

50 / 72 papers shown
Title
Process Reward Models That Think
Process Reward Models That Think
Muhammad Khalifa
Rishabh Agarwal
Lajanugen Logeswaran
Jaekyeom Kim
Hao Peng
Moontae Lee
Honglak Lee
Lu Wang
OffRLALMLRM
108
9
0
23 Apr 2025
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
Yilun Zhou
Austin Xu
Peifeng Wang
Caiming Xiong
Shafiq Joty
ELMALMLRM
142
5
0
21 Apr 2025
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Yang Yue
Zhiqi Chen
Rui Lu
Andrew Zhao
Zhaokai Wang
Yang Yue
Shiji Song
Gao Huang
ReLMLRM
196
121
0
18 Apr 2025
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
Xiangyan Liu
Jinjie Ni
Zijian Wu
Chao Du
Longxu Dou
Haoran Wang
Tianyu Pang
Michael Shieh
OffRLLRM
436
16
0
17 Apr 2025
A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems
A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems
Zixuan Ke
Fangkai Jiao
Yifei Ming
Xuan-Phi Nguyen
Austin Xu
...
Chengwei Qin
Peifeng Wang
Siyang Song
Caiming Xiong
Shafiq Joty
LRM
114
20
0
12 Apr 2025
A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future
A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future
Jialun Zhong
Wei Shen
Yanzeng Li
Songyang Gao
Hua Lu
Yicheng Chen
Yang Zhang
Wei Zhou
Jinjie Gu
Lei Zou
LRM
99
11
0
12 Apr 2025
Inference-Time Scaling for Generalist Reward Modeling
Inference-Time Scaling for Generalist Reward Modeling
Zijun Liu
P. Wang
Ran Xu
Shirong Ma
Chong Ruan
Ziwei Sun
Yang Liu
Y. Wu
OffRLLRM
175
51
0
03 Apr 2025
JudgeLRM: Large Reasoning Models as a Judge
JudgeLRM: Large Reasoning Models as a Judge
Nuo Chen
Zhiyuan Hu
Qingyun Zou
Jiaying Wu
Qian Wang
Bryan Hooi
Bingsheng He
ReLMELMLRM
152
14
0
31 Mar 2025
Understanding R1-Zero-Like Training: A Critical Perspective
Understanding R1-Zero-Like Training: A Critical Perspective
Zichen Liu
Changyu Chen
Wenjun Li
Penghui Qi
Tianyu Pang
Chao Du
Wee Sun Lee
Min Lin
OffRLLRM
192
167
0
26 Mar 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Qiying Yu
Zheng Zhang
Ruofei Zhu
Yufeng Yuan
Xiaochen Zuo
...
Ya Zhang
Lin Yan
Mu Qiao
Yonghui Wu
Mingxuan Wang
OffRLLRM
200
213
0
18 Mar 2025
Why Do Multi-Agent LLM Systems Fail?
Why Do Multi-Agent LLM Systems Fail?
Mert Cemri
Melissa Z. Pan
Shuyi Yang
Lakshya A Agrawal
Bhavya Chopra
...
Dan Klein
Kannan Ramchandran
Matei A. Zaharia
Joseph E. Gonzalez
Ion Stoica
LLMAG
Presented at ResearchTrend Connect | LLMAG on 23 Apr 2025
217
37
0
17 Mar 2025
BIG-Bench Extra Hard
BIG-Bench Extra Hard
Mehran Kazemi
Bahare Fatemi
Hritik Bansal
John Palowitch
Chrysovalantis Anastasiou
...
Kate Olszewska
Yi Tay
Vinh Q. Tran
Quoc V. Le
Orhan Firat
ELMLRM
284
13
0
26 Feb 2025
Improve LLM-as-a-Judge Ability as a General Ability
Improve LLM-as-a-Judge Ability as a General Ability
Jiachen Yu
Shaoning Sun
Xiaohui Hu
Jiaxu Yan
Kaidong Yu
Xuelong Li
ELM
146
7
0
17 Feb 2025
Teaching Language Models to Critique via Reinforcement Learning
Teaching Language Models to Critique via Reinforcement Learning
Zhihui Xie
Jie Chen
Lu Chen
Weichao Mao
Jingjing Xu
Dianbo Sui
ALMLRM
54
12
0
05 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLMVLMOffRLAI4TSLRM
377
1,967
0
22 Jan 2025
Scaling of Search and Learning: A Roadmap to Reproduce o1 from
  Reinforcement Learning Perspective
Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
Zhiyuan Zeng
Qinyuan Cheng
Zhangyue Yin
Bo Wang
Shimin Li
Yunhua Zhou
Qipeng Guo
Xuanjing Huang
Xipeng Qiu
ELMAI4TSLRM
149
36
0
18 Dec 2024
Enhancing LLM Reasoning via Critique Models with Test-Time and
  Training-Time Supervision
Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision
Zhiheng Xi
Dingwen Yang
Jixuan Huang
Jixin Tang
Guanyu Li
...
Zuxuan Wu
Qi Zhang
Xipeng Qiu
Xuanjing Huang
Yu-Gang Jiang
LRMLLMAG
111
25
0
25 Nov 2024
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and
  Evolution
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution
Maosong Cao
Alexander Lam
Haodong Duan
Hongwei Liu
Shanghang Zhang
Kai Chen
AILawELM
88
20
0
21 Oct 2024
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety
  and Style
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
Yantao Liu
Zijun Yao
Rui Min
Yixin Cao
Lei Hou
Juanzi Li
OffRLALM
95
41
0
21 Oct 2024
How to Evaluate Reward Models for RLHF
How to Evaluate Reward Models for RLHF
Evan Frick
Tianle Li
Connor Chen
Wei-Lin Chiang
Anastasios Nikolas Angelopoulos
Jiantao Jiao
Banghua Zhu
Joseph E. Gonzalez
Ion Stoica
ALMOffRL
63
18
0
18 Oct 2024
JudgeBench: A Benchmark for Evaluating LLM-based Judges
JudgeBench: A Benchmark for Evaluating LLM-based Judges
Sijun Tan
Siyuan Zhuang
Kyle Montgomery
William Y. Tang
Alejandro Cuadron
Chenguang Wang
Raluca A. Popa
Ion Stoica
ELMALM
122
52
0
16 Oct 2024
Generative Reward Models
Generative Reward Models
Dakota Mahan
Duy Phung
Rafael Rafailov
Chase Blagden
Nathan Lile
Louis Castricato
Jan-Philipp Fränken
Chelsea Finn
Alon Albalak
VLMSyDaOffRL
59
42
0
02 Oct 2024
Beyond Scalar Reward Model: Learning Generative Judge from Preference
  Data
Beyond Scalar Reward Model: Learning Generative Judge from Preference Data
Ziyi Ye
Xiangsheng Li
Qiuchi Li
Qingyao Ai
Yujia Zhou
Wei Shen
Dong Yan
Yiqun Liu
84
17
0
01 Oct 2024
Direct Judgement Preference Optimization
Direct Judgement Preference Optimization
Peifeng Wang
Austin Xu
Yilun Zhou
Caiming Xiong
Shafiq Joty
ELM
80
13
0
23 Sep 2024
Scaling LLM Test-Time Compute Optimally can be More Effective than
  Scaling Model Parameters
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Charlie Snell
Jaehoon Lee
Kelvin Xu
Aviral Kumar
LRM
192
692
0
06 Aug 2024
Self-Taught Evaluators
Self-Taught Evaluators
Tianlu Wang
Ilia Kulikov
O. Yu. Golovneva
Ping Yu
Weizhe Yuan
Jane Dwivedi-Yu
Richard Yuanzhe Pang
Maryam Fazel-Zarandi
Jason Weston
Xian Li
ALMLRM
41
27
0
05 Aug 2024
OffsetBias: Leveraging Debiased Data for Tuning Evaluators
OffsetBias: Leveraging Debiased Data for Tuning Evaluators
Junsoo Park
Seungyeon Jwa
Meiying Ren
Daeyoung Kim
Sanghyuk Choi
ALM
68
43
0
09 Jul 2024
LLM Critics Help Catch LLM Bugs
LLM Critics Help Catch LLM Bugs
Nat McAleese
Rai Michael Pokorny
Juan Felipe Cerón Uribe
Evgenia Nitishinskaya
Maja Trebacz
Jan Leike
ALMLRM
62
82
0
28 Jun 2024
LiveBench: A Challenging, Contamination-Limited LLM Benchmark
LiveBench: A Challenging, Contamination-Limited LLM Benchmark
Colin White
Samuel Dooley
Manley Roberts
Arka Pal
Ben Feuer
...
Willie Neiswanger
Micah Goldblum
Tom Goldstein
Willie Neiswanger
Micah Goldblum
ELM
100
20
0
27 Jun 2024
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and
  BenchBuilder Pipeline
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline
Tianle Li
Wei-Lin Chiang
Evan Frick
Lisa Dunlap
Tianhao Wu
Banghua Zhu
Joseph E. Gonzalez
Ion Stoica
ALM
84
181
0
17 Jun 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Seungone Kim
Juyoung Suk
Ji Yong Cho
Shayne Longpre
Chaeeun Kim
...
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
ELMALMLM&MA
175
44
0
09 Jun 2024
MMLU-Pro: A More Robust and Challenging Multi-Task Language
  Understanding Benchmark
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
Yubo Wang
Xueguang Ma
Ge Zhang
Yuansheng Ni
Abhranil Chandra
...
Kai Wang
Alex Zhuang
Rongqi Fan
Xiang Yue
Wenhu Chen
LRMELM
115
461
0
03 Jun 2024
Prometheus 2: An Open Source Language Model Specialized in Evaluating
  Other Language Models
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Seungone Kim
Juyoung Suk
Shayne Longpre
Bill Yuchen Lin
Jamin Shin
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
MoMeALMELM
108
205
0
02 May 2024
LiveCodeBench: Holistic and Contamination Free Evaluation of Large
  Language Models for Code
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Naman Jain
King Han
Alex Gu
Wen-Ding Li
Fanjia Yan
Tianjun Zhang
Sida I. Wang
Armando Solar-Lezama
Koushik Sen
Ion Stoica
ELM
113
444
0
12 Mar 2024
OlympiadBench: A Challenging Benchmark for Promoting AGI with
  Olympiad-Level Bilingual Multimodal Scientific Problems
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
Chaoqun He
Renjie Luo
Yuzhuo Bai
Shengding Hu
Zhen Leng Thai
...
Yuxiang Zhang
Jie Liu
Lei Qi
Zhiyuan Liu
Maosong Sun
ELMAIMat
122
279
0
21 Feb 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open
  Language Models
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLMLRM
146
1,274
0
05 Feb 2024
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
Nikhil Sardana
Jacob P. Portes
Sasha Doubov
Jonathan Frankle
LRM
322
86
0
31 Dec 2023
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
David Rein
Betty Li Hou
Asa Cooper Stickland
Jackson Petty
Richard Yuanzhe Pang
Julien Dirani
Julian Michael
Samuel R. Bowman
AI4MHELM
115
728
0
20 Nov 2023
Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2
Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2
Hamish Ivison
Yizhong Wang
Valentina Pyatkin
Nathan Lambert
Matthew E. Peters
...
Joel Jang
David Wadden
Noah A. Smith
Iz Beltagy
Hanna Hajishirzi
ALMELM
89
194
0
17 Nov 2023
Benchmarking Generation and Evaluation Capabilities of Large Language
  Models for Instruction Controllable Summarization
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization
Yixin Liu
Alexander R. Fabbri
Jiawen Chen
Yilun Zhao
Simeng Han
Shafiq Joty
Pengfei Liu
Dragomir R. Radev
Chien-Sheng Wu
Arman Cohan
ELM
75
64
0
15 Nov 2023
MoCa: Measuring Human-Language Model Alignment on Causal and Moral
  Judgment Tasks
MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks
Allen Nie
Yuhui Zhang
Atharva Amdekar
Chris Piech
Tatsunori Hashimoto
Tobias Gerstenberg
66
40
0
30 Oct 2023
Prometheus: Inducing Fine-grained Evaluation Capability in Language
  Models
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Seungone Kim
Jamin Shin
Yejin Cho
Joel Jang
Shayne Longpre
...
Sangdoo Yun
Seongjin Shin
Sungdong Kim
James Thorne
Minjoon Seo
ALMLM&MAELM
78
240
0
12 Oct 2023
Evaluating Large Language Models at Evaluating Instruction Following
Evaluating Large Language Models at Evaluating Instruction Following
Zhiyuan Zeng
Jiatong Yu
Tianyu Gao
Yu Meng
Tanya Goyal
Danqi Chen
ELMALM
100
191
0
11 Oct 2023
Generative Judge for Evaluating Alignment
Generative Judge for Evaluating Alignment
Junlong Li
Shichao Sun
Weizhe Yuan
Run-Ze Fan
Hai Zhao
Pengfei Liu
ELMALM
61
91
0
09 Oct 2023
Efficient Memory Management for Large Language Model Serving with
  PagedAttention
Efficient Memory Management for Large Language Model Serving with PagedAttention
Woosuk Kwon
Zhuohan Li
Siyuan Zhuang
Ying Sheng
Lianmin Zheng
Cody Hao Yu
Joseph E. Gonzalez
Haotong Zhang
Ion Stoica
VLM
192
2,303
0
12 Sep 2023
OctoPack: Instruction Tuning Code Large Language Models
OctoPack: Instruction Tuning Code Large Language Models
Niklas Muennighoff
Qian Liu
A. Zebaze
Qinkai Zheng
Binyuan Hui
Terry Yue Zhuo
Swayam Singh
Xiangru Tang
Leandro von Werra
Shayne Longpre
VLMALM
116
136
0
14 Aug 2023
Shepherd: A Critic for Language Model Generation
Shepherd: A Critic for Language Model Generation
Tianlu Wang
Ping Yu
Xiaoqing Ellen Tan
Sean O'Brien
Ramakanth Pasunuru
Jane Dwivedi-Yu
O. Yu. Golovneva
Luke Zettlemoyer
Maryam Fazel-Zarandi
Asli Celikyilmaz
ALM
74
86
0
08 Aug 2023
Retroformer: Retrospective Large Language Agents with Policy Gradient
  Optimization
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
Weiran Yao
Shelby Heinecke
Juan Carlos Niebles
Zhiwei Liu
Yihao Feng
...
Ran Xu
P. Mùi
Haiquan Wang
Caiming Xiong
Silvio Savarese
LLMAGLM&Ro
86
80
0
04 Aug 2023
A Critical Evaluation of Evaluations for Long-form Question Answering
A Critical Evaluation of Evaluations for Long-form Question Answering
Fangyuan Xu
Yixiao Song
Mohit Iyyer
Eunsol Choi
ELM
86
103
0
29 May 2023
Large Language Models are not Fair Evaluators
Large Language Models are not Fair Evaluators
Peiyi Wang
Lei Li
Liang Chen
Zefan Cai
Dawei Zhu
Binghuai Lin
Yunbo Cao
Qi Liu
Tianyu Liu
Zhifang Sui
ALM
122
571
0
29 May 2023
12
Next