2409.11704
From Lists to Emojis: How Format Bias Affects Model Alignment
18 September 2024
Xuanchang Zhang
Wei Xiong
Lichang Chen
Dinesh Manocha
Heng Huang
Tong Zhang
ALM
Papers citing
"From Lists to Emojis: How Format Bias Affects Model Alignment"
11 / 61 papers shown
Title
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Hanze Dong
Wei Xiong
Deepanshu Goyal
Yihan Zhang
Winnie Chow
Rui Pan
Shizhe Diao
Jipeng Zhang
Kashun Shum
Tong Zhang
ALM
27
426
0
13 Apr 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
144
13,788
0
15 Mar 2023
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
...
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
170
2,457
0
12 Apr 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
587
12,525
0
04 Mar 2022
WebGPT: Browser-assisted question-answering with human feedback
Reiichiro Nakano
Jacob Hilton
S. Balaji
Jeff Wu
Long Ouyang
...
Gretchen Krueger
Kevin Button
Matthew Knight
B. Chess
John Schulman
ALM
RALM
146
1,241
0
17 Dec 2021
Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO
Logan Engstrom
Andrew Ilyas
Shibani Santurkar
Dimitris Tsipras
Firdaus Janoos
L. Rudolph
Aleksander Madry
AAML
31
225
0
25 May 2020
Shortcut Learning in Deep Neural Networks
Robert Geirhos
J. Jacobsen
Claudio Michaelis
R. Zemel
Wieland Brendel
Matthias Bethge
Felix Wichmann
120
2,023
0
16 Apr 2020
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
391
1,664
0
18 Sep 2019
On the Weaknesses of Reinforcement Learning for Neural Machine Translation
Leshem Choshen
Lior Fox
Zohar Aizenbud
Omri Abend
51
105
0
03 Jul 2019
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
183
18,685
0
20 Jul 2017
Deep reinforcement learning from human preferences
Paul Christiano
Jan Leike
Tom B. Brown
Miljan Martic
Shane Legg
Dario Amodei
77
3,197
0
12 Jun 2017