2402.12366
A Critical Evaluation of AI Feedback for Aligning Large Language Models
19 February 2024
Archit Sharma
Sedrick Scott Keh
Eric Mitchell
Chelsea Finn
Kushal Arora
Thomas Kollar
ALM
LLMAG
Papers citing
"A Critical Evaluation of AI Feedback for Aligning Large Language Models"
20 / 20 papers shown
Title
Research on Superalignment Should Advance Now with Parallel Optimization of Competence and Conformity
HyunJin Kim
Xiaoyuan Yi
Jing Yao
Muhua Huang
Jinyeong Bak
James Evans
Xing Xie
44
0
0
08 Mar 2025
RLTHF: Targeted Human Feedback for LLM Alignment
Yifei Xu
Tusher Chakraborty
Emre Kıcıman
Bibek Aryal
Eduardo Rodrigues
...
Rafael Padilha
Leonardo Nunes
Shobana Balakrishnan
Songwu Lu
Ranveer Chandra
118
1
0
24 Feb 2025
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models
Alexander Zhang
Marcus Dong
Jing Liu
Wei Zhang
Yejie Wang
...
Yancheng He
K. Deng
Wangchunshu Zhou
Wenhao Huang
Zhenru Zhang
LRM
62
2
0
23 Feb 2025
Scaling Autonomous Agents via Automatic Reward Modeling And Planning
Zhenfang Chen
Delin Chen
Rui Sun
Wenjun Liu
Chuang Gan
LLMAG
62
3
0
17 Feb 2025
On The Global Convergence Of Online RLHF With Neural Parametrization
Mudit Gaur
Amrit Singh Bedi
Raghu Pasupathy
Vaneet Aggarwal
28
0
0
21 Oct 2024
Personality Alignment of Large Language Models
Minjun Zhu
Linyi Yang
Yue Zhang
ALM
67
5
0
21 Aug 2024
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Charlie Snell
Jaehoon Lee
Kelvin Xu
Aviral Kumar
LRM
68
486
0
06 Aug 2024
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs
Ashwinee Panda
Berivan Isik
Xiangyu Qi
Sanmi Koyejo
Tsachy Weissman
Prateek Mittal
MoMe
45
13
0
24 Jun 2024
SAIL: Self-Improving Efficient Online Alignment of Large Language Models
Mucong Ding
Souradip Chakraborty
Vibhu Agrawal
Zora Che
Alec Koppel
Mengdi Wang
Amrit Singh Bedi
Furong Huang
47
10
0
21 Jun 2024
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold
Amrith Rajagopal Setlur
Saurabh Garg
Xinyang Geng
Naman Garg
Virginia Smith
Aviral Kumar
42
45
0
20 Jun 2024
Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner
Kenneth Li
Yiming Wang
Fernanda Viégas
Martin Wattenberg
38
6
0
17 Jun 2024
Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning
Jifan Zhang
Lalit P. Jain
Yang Guo
Jiayi Chen
Kuan Lok Zhou
...
Scott Sievert
Timothy T. Rogers
Kevin Jamieson
Robert Mankoff
Robert Nowak
39
5
0
15 Jun 2024
Direct Preference Optimization for Suppressing Hallucinated Prior Exams in Radiology Report Generation
Oishi Banerjee
Hong-Yu Zhou
Subathra Adithan
Stephen Kwak
Kay Wu
Pranav Rajpurkar
MedIm
47
3
0
10 Jun 2024
Margin-aware Preference Optimization for Aligning Diffusion Models without Reference
Jiwoo Hong
Sayak Paul
Noah Lee
Kashif Rasul
James Thorne
Jongheon Jeong
43
13
0
10 Jun 2024
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Fahim Tajwar
Anika Singh
Archit Sharma
Rafael Rafailov
Jeff Schneider
Tengyang Xie
Stefano Ermon
Chelsea Finn
Aviral Kumar
44
108
0
22 Apr 2024
Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback
Vincent Conitzer
Rachel Freedman
J. Heitzig
Wesley H. Holliday
Bob M. Jacobs
...
Eric Pacuit
Stuart Russell
Hailey Schoelkopf
Emanuel Tewolde
W. Zwicker
43
28
0
16 Apr 2024
Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs
Ruoxi Cheng
Haoxuan Ma
Shuirong Cao
Jiaqi Li
Aihua Pei
Zhiqiang Wang
Pengliang Ji
Haoyu Wang
Jiaqi Huo
AI4CE
29
6
0
15 Apr 2024
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DeepSeek-AI
Xiao Bi
Deli Chen
Guanting Chen
...
Yao Zhao
Shangyan Zhou
Shunfeng Zhou
Qihao Zhu
Yuheng Zou
LRM
ALM
139
309
0
05 Jan 2024
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
339
12,003
0
04 Mar 2022
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
298
1,610
0
18 Sep 2019