Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2211.05826
Cited By
The CRINGE Loss: Learning what language not to model
10 November 2022
Leonard Adolphs
Tianyu Gao
Jing Xu
Kurt Shuster
Sainbayar Sukhbaatar
Jason Weston
MU
Re-assign community
ArXiv
PDF
HTML
Papers citing
"The CRINGE Loss: Learning what language not to model"
35 / 35 papers shown
Title
Exploring Criteria of Loss Reweighting to Enhance LLM Unlearning
Puning Yang
Qizhou Wang
Zhuo Huang
Tongliang Liu
Chengqi Zhang
Bo Han
MU
26
0
0
17 May 2025
Teaching Models to Understand (but not Generate) High-risk Data
Ryan Yixiang Wang
Matthew Finlayson
Luca Soldaini
Swabha Swayamdipta
Robin Jia
157
0
0
05 May 2025
Enhancing Chart-to-Code Generation in Multimodal Large Language Models via Iterative Dual Preference Learning
Zhihan Zhang
Yixin Cao
Lizi Liao
28
0
0
03 Apr 2025
Learning from Relevant Subgoals in Successful Dialogs using Iterative Training for Task-oriented Dialog Systems
Magdalena Kaiser
P. Ernst
György Szarvas
74
0
0
25 Nov 2024
Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration
Qintong Li
Jiahui Gao
Sheng Wang
Renjie Pi
Xueliang Zhao
Chuan Wu
Xin Jiang
Zhiyu Li
Lingpeng Kong
SyDa
28
3
0
22 Oct 2024
LLM Unlearning via Loss Adjustment with Only Forget Data
Yaxuan Wang
Jiaheng Wei
Chris Liu
Jinlong Pang
Qiang Liu
A. Shah
Yujia Bao
Yang Liu
Wei Wei
KELM
MU
43
8
0
14 Oct 2024
Minor DPO reject penalty to increase training robustness
Shiming Xie
Hong Chen
Fred Yu
Zeye Sun
Xiuyu Wu
Yingfan Hu
40
2
0
19 Aug 2024
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
Karel DÓosterlinck
Winnie Xu
Chris Develder
Thomas Demeester
A. Singh
Christopher Potts
Douwe Kiela
Shikib Mehri
41
11
0
12 Aug 2024
Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models
Xavier Suau
Pieter Delobelle
Katherine Metcalf
Armand Joulin
N. Apostoloff
Luca Zappella
P. Rodríguez
MU
AAML
42
8
0
02 Jul 2024
WPO: Enhancing RLHF with Weighted Preference Optimization
Wenxuan Zhou
Ravi Agrawal
Shujian Zhang
Sathish Indurthi
Sanqiang Zhao
Kaiqiang Song
Silei Xu
Chenguang Zhu
35
18
0
17 Jun 2024
Enhancing Large Vision Language Models with Self-Training on Image Comprehension
Yihe Deng
Pan Lu
Fan Yin
Ziniu Hu
Sheng Shen
James Zou
Kai-Wei Chang
Wei Wang
SyDa
VLM
LRM
44
37
0
30 May 2024
Iterative Reasoning Preference Optimization
Richard Yuanzhe Pang
Weizhe Yuan
Kyunghyun Cho
He He
Sainbayar Sukhbaatar
Jason Weston
LRM
47
113
0
30 Apr 2024
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Fahim Tajwar
Anika Singh
Archit Sharma
Rafael Rafailov
Jeff Schneider
Tengyang Xie
Stefano Ermon
Chelsea Finn
Aviral Kumar
44
108
0
22 Apr 2024
High-Dimension Human Value Representation in Large Language Models
Samuel Cahyawijaya
Delong Chen
Yejin Bang
Leila Khalatbari
Bryan Wilie
Ziwei Ji
Etsuko Ishii
Pascale Fung
71
5
0
11 Apr 2024
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Corby Rosset
Ching-An Cheng
Arindam Mitra
Michael Santacroce
Ahmed Hassan Awadallah
Tengyang Xie
152
114
0
04 Apr 2024
Reinforcement Learning from Reflective Feedback (RLRF): Aligning and Improving LLMs via Fine-Grained Self-Reflection
Kyungjae Lee
Dasol Hwang
Sunghyun Park
Youngsoo Jang
Moontae Lee
46
8
0
21 Mar 2024
On Prompt-Driven Safeguarding for Large Language Models
Chujie Zheng
Fan Yin
Hao Zhou
Fandong Meng
Jie Zhou
Kai-Wei Chang
Minlie Huang
Nanyun Peng
AAML
52
47
0
31 Jan 2024
Self-Rewarding Language Models
Weizhe Yuan
Richard Yuanzhe Pang
Kyunghyun Cho
Xian Li
Sainbayar Sukhbaatar
Jing Xu
Jason Weston
ReLM
SyDa
ALM
LRM
242
298
0
18 Jan 2024
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Andrew Lee
Xiaoyan Bai
Itamar Pres
Martin Wattenberg
Jonathan K. Kummerfeld
Rada Mihalcea
77
96
0
03 Jan 2024
Some things are more CRINGE than others: Iterative Preference Optimization with the Pairwise Cringe Loss
Jing Xu
Andrew Lee
Sainbayar Sukhbaatar
Jason Weston
20
86
0
27 Dec 2023
Controlled Decoding from Language Models
Sidharth Mudgal
Jong Lee
H. Ganapathy
Yaguang Li
Tao Wang
...
Michael Collins
Trevor Strohman
Jilin Chen
Alex Beutel
Ahmad Beirami
36
69
0
25 Oct 2023
Large Language Model Alignment: A Survey
Tianhao Shen
Renren Jin
Yufei Huang
Chuang Liu
Weilong Dong
Zishan Guo
Xinwei Wu
Yan Liu
Deyi Xiong
LM&MA
19
177
0
26 Sep 2023
Leveraging Implicit Feedback from Deployment Data in Dialogue
Richard Yuanzhe Pang
Stephen Roller
Kyunghyun Cho
He He
Jason Weston
51
7
0
26 Jul 2023
System-Level Natural Language Feedback
Weizhe Yuan
Kyunghyun Cho
Jason Weston
41
5
0
23 Jun 2023
Improving Open Language Models by Learning from Organic Interactions
Jing Xu
Da Ju
Joshua Lane
M. Komeili
Eric Michael Smith
...
Rashel Moritz
Sainbayar Sukhbaatar
Y-Lan Boureau
Jason Weston
Kurt Shuster
25
8
0
07 Jun 2023
Click: Controllable Text Generation with Sequence Likelihood Contrastive Learning
Chujie Zheng
Pei Ke
Zheng Zhang
Minlie Huang
BDL
23
31
0
06 Jun 2023
Revisiting the Architectures like Pointer Networks to Efficiently Improve the Next Word Distribution, Summarization Factuality, and Beyond
Haw-Shiuan Chang
Zonghai Yao
Alolika Gon
Hong-ye Yu
Andrew McCallum
46
10
0
20 May 2023
Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models
Jimmy Wei
Kurt Shuster
Arthur Szlam
Jason Weston
Jack Urbanek
M. Komeili
LLMAG
44
40
0
26 Apr 2023
Learn What NOT to Learn: Towards Generative Safety in Chatbots
Leila Khalatbari
Yejin Bang
Dan Su
Willy Chung
Saeedeh Ghadimi
Hossein Sameti
Pascale Fung
41
7
0
21 Apr 2023
Controllable Text Generation with Language Constraints
Howard Chen
Huihan Li
Danqi Chen
Karthik R. Narasimhan
14
16
0
20 Dec 2022
Reward Gaming in Conditional Text Generation
Richard Yuanzhe Pang
Vishakh Padmakumar
Thibault Sellam
Ankur P. Parikh
He He
35
24
0
16 Nov 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
366
12,003
0
04 Mar 2022
Internet-Augmented Dialogue Generation
M. Komeili
Kurt Shuster
Jason Weston
RALM
244
281
0
15 Jul 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
282
2,000
0
31 Dec 2020
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
301
1,610
0
18 Sep 2019
1