Self-Rewarding Language Models
arXiv: 2401.10020
18 January 2024
Weizhe Yuan
Richard Yuanzhe Pang
Kyunghyun Cho
Xian Li
Sainbayar Sukhbaatar
Jing Xu
Jason Weston
ReLM
SyDa
ALM
LRM
Papers citing "Self-Rewarding Language Models" (50 of 276 papers shown)
RTBAgent: A LLM-based Agent System for Real-Time Bidding
Leng Cai
Junxuan He
Yongqian Li
Junjie Liang
Yuanping Lin
Ziming Quan
Yawen Zeng
Jin Xu
OffRL
123
1
0
02 Feb 2025
Spend Wisely: Maximizing Post-Training Gains in Iterative Synthetic Data Bootstrapping
Pu Yang
Yunzhen Feng
Ziyuan Chen
Yuhang Wu
Zhuoyuan Li
DiffM
121
0
0
31 Jan 2025
Diverse Preference Optimization
Jack Lanchantin
Angelica Chen
Shehzaad Dhuliawala
Ping Yu
Jason Weston
Sainbayar Sukhbaatar
Ilia Kulikov
190
4
0
30 Jan 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
Yuhang Zang
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Ziyu Liu
...
Haodong Duan
Wentao Zhang
Kai Chen
Dahua Lin
Jiaqi Wang
VLM
171
21
0
21 Jan 2025
Exploring Iterative Enhancement for Improving Learnersourced Multiple-Choice Question Explanations with Large Language Models
Qiming Bao
Juho Leinonen
A. Peng
Wanjun Zhong
Gaël Gendron
Tim Pistotti
Alice Huang
Paul Denny
Michael Witbrock
Jing Liu
AI4Ed
LRM
220
1
0
20 Jan 2025
Aligning Instruction Tuning with Pre-training
Yiming Liang
Tianyu Zheng
Xinrun Du
Ge Zhang
Qingbin Liu
...
Zhaoxiang Zhang
Wenhao Huang
Jiajun Zhang
Xiang Yue
154
3
0
16 Jan 2025
PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations
Ruosen Li
Teerth Patel
Xinya Du
LLMAG
ALM
132
101
0
03 Jan 2025
An Overview and Discussion on Using Large Language Models for Implementation Generation of Solutions to Open-Ended Problems
Hashmath Shaik
Alex Doboli
OffRL
ELM
409
0
0
31 Dec 2024
Geometric-Averaged Preference Optimization for Soft Preference Labels
Hiroki Furuta
Kuang-Huei Lee
Shixiang Shane Gu
Y. Matsuo
Aleksandra Faust
Heiga Zen
Izzeddin Gur
101
11
0
31 Dec 2024
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization
Chia-Yu Hung
Navonil Majumder
Zhifeng Kong
Ambuj Mehrish
Rafael Valle
Bryan Catanzaro
Soujanya Poria
98
8
0
30 Dec 2024
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
Jiale Cheng
Xiao-Chang Liu
C. Wang
Xiaotao Gu
Yaojie Lu
Dan Zhang
Yuxiao Dong
J. Tang
Hongning Wang
Minlie Huang
LRM
145
4
0
16 Dec 2024
The Superalignment of Superhuman Intelligence with Large Language Models
Minlie Huang
Yingkang Wang
Shiyao Cui
Pei Ke
J. Tang
156
1
0
15 Dec 2024
MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization
Kangyu Zhu
Peng Xia
Yun Li
Hongtu Zhu
Sheng Wang
Huaxiu Yao
138
3
0
09 Dec 2024
Reinforcement Learning Enhanced LLMs: A Survey
Shuhe Wang
Shengyu Zhang
Jing Zhang
Runyi Hu
Xiaoya Li
Tianwei Zhang
Jiwei Li
Leilei Gan
G. Wang
Eduard H. Hovy
OffRL
167
15
0
05 Dec 2024
VideoSAVi: Self-Aligned Video Language Models without Human Supervision
Yogesh Kulkarni
Pooyan Fazli
VLM
177
2
0
01 Dec 2024
Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models
Hao Yi
Qingyang Li
Yihan Hu
Fuzheng Zhang
Di Zhang
Yong Liu
VGen
103
0
0
25 Nov 2024
Self-Generated Critiques Boost Reward Modeling for Language Models
Yue Yu
Zhengxing Chen
Aston Zhang
L Tan
Chenguang Zhu
...
Suchin Gururangan
Chao-Yue Zhang
Melanie Kambadur
Dhruv Mahajan
Rui Hou
LRM
ALM
153
24
0
25 Nov 2024
Towards Full Delegation: Designing Ideal Agentic Behaviors for Travel Planning
Song Jiang
Da JU
Andrew Cohen
Sasha Mitts
Aaron Foss
Justine T Kao
Xian Li
Yuandong Tian
113
3
0
21 Nov 2024
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Xinyan Guan
Yanjiang Liu
Xinyu Lu
Boxi Cao
Xianpei Han
...
Le Sun
Jie Lou
Bowen Yu
Yaojie Lu
Hongyu Lin
ALM
136
4
0
18 Nov 2024
SEE-DPO: Self Entropy Enhanced Direct Preference Optimization
Shivanshu Shekhar
Shreyas Singh
Tong Zhang
62
4
0
06 Nov 2024
SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models
Jianyi Zhang
Da-Cheng Juan
Cyrus Rashtchian
Chun-Sung Ferng
Heinrich Jiang
Yiran Chen
68
4
0
01 Nov 2024
GrammaMT: Improving Machine Translation with Grammar-Informed In-Context Learning
Rita Ramos
Everlyn Asiko Chimoto
Maartje ter Hoeve
Natalie Schluter
83
2
0
24 Oct 2024
M-RewardBench: Evaluating Reward Models in Multilingual Settings
Srishti Gureja
Lester James V. Miranda
Shayekh Bin Islam
Rishabh Maheshwary
Drishti Sharma
Gusti Winata
Nathan Lambert
Sebastian Ruder
Sara Hooker
Marzieh Fadaee
LRM
97
21
0
20 Oct 2024
Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning
Xiaochuan Li
Zichun Yu
Chenyan Xiong
SyDa
69
1
0
18 Oct 2024
Think Thrice Before You Act: Progressive Thought Refinement in Large Language Models
Chengyu Du
Jinyi Han
Yizhou Ying
Aili Chen
Qianyu He
...
Haoran Guo
Jiaqing Liang
Zulong Chen
Liangyue Li
Yanghua Xiao
KELM
CLL
LRM
60
1
0
17 Oct 2024
Anchored Alignment for Self-Explanations Enhancement
Luis Felipe Villa-Arenas
Ata Nizamoglu
Qianli Wang
Sebastian Möller
Vera Schmitt
52
0
0
17 Oct 2024
Retrospective Learning from Interactions
Zizhao Chen
Mustafa Omer Gul
Yiwei Chen
Gloria Geng
Anne Wu
Yoav Artzi
LRM
86
1
0
17 Oct 2024
JudgeBench: A Benchmark for Evaluating LLM-based Judges
Sijun Tan
Siyuan Zhuang
Kyle Montgomery
William Y. Tang
Alejandro Cuadron
Chenguang Wang
Raluca A. Popa
Ion Stoica
ELM
ALM
115
49
0
16 Oct 2024
CREAM: Consistency Regularized Self-Rewarding Language Models
Zhaoxiang Wang
Weilei He
Zhiyuan Liang
Xuchao Zhang
Chetan Bansal
Ying Wei
Weitong Zhang
Huaxiu Yao
ALM
132
11
0
16 Oct 2024
Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
Jihan Yao
Wenxuan Ding
Shangbin Feng
Lucy Lu Wang
Yulia Tsvetkov
59
1
0
14 Oct 2024
Thinking LLMs: General Instruction Following with Thought Generation
Tianhao Wu
Janice Lan
Weizhe Yuan
Jiantao Jiao
Jason Weston
Sainbayar Sukhbaatar
LRM
56
22
0
14 Oct 2024
Language Model Preference Evaluation with Multiple Weak Evaluators
Zhengyu Hu
Jieyu Zhang
Zhihan Xiong
Alexander Ratner
Hui Xiong
Ranjay Krishna
111
4
0
14 Oct 2024
Innovative Thinking, Infinite Humor: Humor Research of Large Language Models through Structured Thought Leaps
Han Wang
Yilin Zhao
Dian Li
Xiaohan Wang
Gang Liu
Xuguang Lan
Haoran Wang
LRM
124
1
0
14 Oct 2024
VideoAgent: Self-Improving Video Generation
Achint Soni
Sreyas Venkataraman
Abhranil Chandra
Sebastian Fischmeister
Percy Liang
Bo Dai
Sherry Yang
LM&Ro
VGen
106
9
0
14 Oct 2024
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment
Enyu Zhou
Guodong Zheng
Binghai Wang
Zhiheng Xi
Shihan Dou
...
Yurong Mou
Rui Zheng
Tao Gui
Qi Zhang
Xuanjing Huang
ALM
117
19
0
13 Oct 2024
Language Imbalance Driven Rewarding for Multilingual Self-improving
Wen Yang
Junhong Wu
Chen Wang
Chengqing Zong
J.N. Zhang
ALM
LRM
152
7
0
11 Oct 2024
Unsupervised Data Validation Methods for Efficient Model Training
Yurii Paniv
55
1
0
10 Oct 2024
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment
Yuancheng Xu
Udari Madhushani Sehwag
Alec Koppel
Sicheng Zhu
Bang An
Furong Huang
Sumitra Ganesh
106
12
0
10 Oct 2024
MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization
Yougang Lyu
Lingyong Yan
Zihan Wang
Dawei Yin
Pengjie Ren
Maarten de Rijke
Zhaochun Ren
114
10
0
10 Oct 2024
Self-Boosting Large Language Models with Synthetic Preference Data
Qingxiu Dong
Li Dong
Xingxing Zhang
Zhifang Sui
Furu Wei
SyDa
69
12
0
09 Oct 2024
Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning
Xiyao Wang
Linfeng Song
Ye Tian
Dian Yu
Baolin Peng
Haitao Mi
Furong Huang
Dong Yu
LRM
99
12
0
09 Oct 2024
Counterfactual Causal Inference in Natural Language with Large Language Models
Gaël Gendron
Jože M. Rožanec
Michael Witbrock
Gillian Dobbie
CML
61
3
0
08 Oct 2024
Accelerated Preference Optimization for Large Language Model Alignment
Jiafan He
Huizhuo Yuan
Q. Gu
49
3
0
08 Oct 2024
On the Modeling Capabilities of Large Language Models for Sequential Decision Making
Martin Klissarov
Devon Hjelm
Alexander Toshev
Bogdan Mazoure
LM&Ro
ELM
OffRL
LRM
79
2
0
08 Oct 2024
Self-rationalization improves LLM as a fine-grained judge
Prapti Trivedi
Aditya Gulati
Oliver Molenschot
Meghana Arakkal Rajeev
Rajkumar Ramamurthy
Keith Stevens
Tanveesh Singh Chaudhery
Jahnavi Jambholkar
James Zou
Nazneen Rajani
LRM
64
7
0
07 Oct 2024
Rule-based Data Selection for Large Language Models
Xiaomin Li
Mingye Gao
Zhiwei Zhang
Chang Yue
Hong Hu
70
6
0
07 Oct 2024
MVP-Bench: Can Large Vision-Language Models Conduct Multi-level Visual Perception Like Humans?
Guanzhen Li
Yuxi Xie
Min-Yen Kan
VLM
371
0
0
06 Oct 2024
RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization
Hanyang Zhao
Genta Indra Winata
Anirban Das
Shi-Xiong Zhang
D. Yao
Wenpin Tang
Sambit Sahu
84
9
0
05 Oct 2024
Better Instruction-Following Through Minimum Bayes Risk
Ian Wu
Patrick Fernandes
Amanda Bertsch
Seungone Kim
Sina Pakazad
Graham Neubig
125
11
0
03 Oct 2024
LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits
Duy Nguyen
Archiki Prasad
Elias Stengel-Eskin
Joey Tianyi Zhou
43
3
0
02 Oct 2024