
From Lists to Emojis: How Format Bias Affects Model Alignment
Papers citing "From Lists to Emojis: How Format Bias Affects Model Alignment"
Title | Authors |
---|---|
RRM: Robust Reward Model Training Mitigates Reward Hacking | Tianqi Liu, Wei Xiong, Jie Jessie Ren, Lichang Chen, Junru Wu, ..., Yuan Liu, Bilal Piot, Abe Ittycheriah, Aviral Kumar, Mohammad Saleh |
Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment | Keming Lu, Bowen Yu, Fei Huang, Yang Fan, Runji Lin, Chang Zhou |
RewardBench: Evaluating Reward Models for Language Modeling | Nathan Lambert, Valentina Pyatkin, Jacob Morrison, Lester James V. Miranda, Bill Yuchen Lin, ..., Sachin Kumar, Tom Zick, Yejin Choi, Noah A. Smith, Hanna Hajishirzi |
Mistral 7B | Albert Q. Jiang, Alexandre Sablayrolles, A. Mensch, Chris Bamford, Devendra Singh Chaplot, ..., Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed |
Llama 2: Open Foundation and Fine-Tuned Chat Models | Hugo Touvron, Louis Martin, Kevin R. Stone, Peter Albert, Amjad Almahairi, ..., Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom |