2506.14175
Cited By
GRAM: A Generative Foundation Reward Model for Reward Generalization
17 June 2025
Chenglong Wang
Yang Gan
Yifu Huo
Yongyu Mu
Qiaozhi He
Murun Yang
Bei Li
Tong Xiao
Chunliang Zhang
Tongran Liu
Jingbo Zhu
ALM
OffRL
LRM
Papers citing
"GRAM: A Generative Foundation Reward Model for Reward Generalization"
13 / 13 papers shown
Title
RM-R1: Reward Modeling as Reasoning
Xiusi Chen
Gaotang Li
Zehua Wang
Bowen Jin
Cheng Qian
...
Yu Zhang
D. Zhang
Tong Zhang
Hanghang Tong
Heng Ji
ReLM
OffRL
LRM
359
20
0
05 May 2025
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs
Chris Yuhao Liu
Liang Zeng
Qingbin Liu
Rui Yan
Jujie He
Chaojie Wang
Shuicheng Yan
Yang Liu
Yahui Zhou
AI4TS
107
109
0
24 Oct 2024
RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data
Chenglong Wang
Yang Gan
Yifu Huo
Yongyu Mu
Murun Yang
...
Chunliang Zhang
Tongran Liu
Quan Du
Di Yang
Jingbo Zhu
VLM
113
6
0
22 Aug 2024
RewardBench: Evaluating Reward Models for Language Modeling
Nathan Lambert
Valentina Pyatkin
Jacob Morrison
Lester James V. Miranda
Bill Yuchen Lin
...
Sachin Kumar
Tom Zick
Yejin Choi
Noah A. Smith
Hanna Hajishirzi
ALM
159
261
0
20 Mar 2024
Reward Model Ensembles Help Mitigate Overoptimization
Thomas Coste
Usman Anwar
Robert Kirk
David M. Krueger
NoLa
ALM
80
138
0
04 Oct 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
393
4,422
0
09 Jun 2023
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Yann Dubois
Xuechen Li
Rohan Taori
Tianyi Zhang
Ishaan Gulrajani
Jimmy Ba
Carlos Guestrin
Percy Liang
Tatsunori B. Hashimoto
ALM
132
605
0
22 May 2023
A Survey on Aspect-Based Sentiment Classification
Gianni Brauwers
Flavius Frasincar
LLMAG
69
118
0
27 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
883
13,148
0
04 Mar 2022
A General Language Assistant as a Laboratory for Alignment
Amanda Askell
Yuntao Bai
Anna Chen
Dawn Drain
Deep Ganguli
...
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Jared Kaplan
ALM
118
789
0
01 Dec 2021
Does label smoothing mitigate label noise?
Michal Lukasik
Srinadh Bhojanapalli
A. Menon
Surinder Kumar
NoLa
187
351
0
05 Mar 2020
Fine-grained Sentiment Classification using BERT
Manish Munikar
Sushil Shakya
Aakash Shrestha
SSeg
56
204
0
04 Oct 2019
When Does Label Smoothing Help?
Rafael Müller
Simon Kornblith
Geoffrey E. Hinton
UQCV
207
1,953
0
06 Jun 2019