2505.13346
J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization
19 May 2025
Austin Xu
Yilun Zhou
Xuan-Phi Nguyen
Caiming Xiong
Shafiq Joty
ELM
LRM
Papers citing
"J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization"
22 / 72 papers shown
Title
RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs
Afra Feyza Akyürek
Ekin Akyürek
Aman Madaan
Ashwin Kalyan
Peter Clark
Derry Wijaya
Niket Tandon
ALM
KELM
94
100
0
15 May 2023
Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
Emre Kıcıman
Robert Osazuwa Ness
Amit Sharma
Chenhao Tan
LRM
ELM
124
281
0
28 Apr 2023
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
Mirac Suzgun
Nathan Scales
Nathanael Schärli
Sebastian Gehrmann
Yi Tay
...
Aakanksha Chowdhery
Quoc V. Le
Ed H. Chi
Denny Zhou
Jason W. Wei
ALM
ELM
LRM
ReLM
266
1,134
0
17 Oct 2022
FOLIO: Natural Language Reasoning with First-Order Logic
Simeng Han
Hailey Schoelkopf
Yilun Zhao
Zhenting Qi
Martin Riddell
...
Yingbo Zhou
Caiming Xiong
Rex Ying
Arman Cohan
Dragomir R. Radev
ReLM
LRM
105
109
0
02 Sep 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
524
3,721
0
21 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
883
13,176
0
04 Mar 2022
A General Language Assistant as a Laboratory for Alignment
Amanda Askell
Yuntao Bai
Anna Chen
Dawn Drain
Deep Ganguli
...
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Jared Kaplan
ALM
118
789
0
01 Dec 2021
Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
Basel Alomair
Jacob Steinhardt
ReLM
FaML
181
2,386
0
05 Mar 2021
Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup
Luyu Gao
Yunyi Zhang
Jiawei Han
Jamie Callan
86
99
0
18 Jan 2021
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva
Daniel Khashabi
Elad Segal
Tushar Khot
Dan Roth
Jonathan Berant
RALM
345
735
0
06 Jan 2021
Learning to summarize from human feedback
Nisan Stiennon
Long Ouyang
Jeff Wu
Daniel M. Ziegler
Ryan J. Lowe
Chelsea Voss
Alec Radford
Dario Amodei
Paul Christiano
ALM
252
2,180
0
02 Sep 2020
A Simple Framework for Contrastive Learning of Visual Representations
Ting Chen
Simon Kornblith
Mohammad Norouzi
Geoffrey E. Hinton
SSL
375
18,859
0
13 Feb 2020
ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning
Weihao Yu
Zihang Jiang
Yanfei Dong
Jiashi Feng
LRM
125
254
0
11 Feb 2020
Momentum Contrast for Unsupervised Visual Representation Learning
Kaiming He
Haoqi Fan
Yuxin Wu
Saining Xie
Ross B. Girshick
SSL
210
12,121
0
13 Nov 2019
Model-Based Reinforcement Learning Exploiting State-Action Equivalence
Mahsa Asadi
M. S. Talebi
Hippolyte Bourel
Odalric-Ambrym Maillard
OffRL
89
9
0
09 Oct 2019
Latent Retrieval for Weakly Supervised Open Domain Question Answering
Kenton Lee
Ming-Wei Chang
Kristina Toutanova
RALM
112
1,017
0
01 Jun 2019
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Peter Clark
Isaac Cowhey
Oren Etzioni
Tushar Khot
Ashish Sabharwal
Carissa Schoenick
Oyvind Tafjord
ELM
RALM
LRM
167
2,648
0
14 Mar 2018
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
529
19,237
0
20 Jul 2017
Near Optimal Behavior via Approximate State Abstraction
David Abel
D Ellis Hershkowitz
Michael L. Littman
OffRL
73
164
0
15 Jan 2017
High-Dimensional Continuous Control Using Generalized Advantage Estimation
John Schulman
Philipp Moritz
Sergey Levine
Michael I. Jordan
Pieter Abbeel
OffRL
107
3,434
0
08 Jun 2015
Estimation from Pairwise Comparisons: Sharp Minimax Bounds with Topology Dependence
Nihar B. Shah
Sivaraman Balakrishnan
Joseph K. Bradley
Abhay K. Parekh
Kannan Ramchandran
Martin J. Wainwright
170
164
0
06 May 2015
Sample Complexity of Multi-task Reinforcement Learning
Emma Brunskill
Lihong Li
86
138
0
26 Sep 2013