Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2505.19327
Cited By
Paying Alignment Tax with Contrastive Learning
25 May 2025
Buse Sibel Korkmaz
Rahul Nair
Elizabeth M. Daly
Antonio del Rio Chanona
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Paying Alignment Tax with Contrastive Learning"
8 / 8 papers shown
Title
Mitigating the Alignment Tax of RLHF
Yong Lin
Hangyu Lin
Wei Xiong
Shizhe Diao
Zeming Zheng
...
Han Zhao
Nan Jiang
Heng Ji
Yuan Yao
Tong Zhang
MoMe
CLL
58
76
0
12 Sep 2023
A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation
Neeraj Varshney
Wenlin Yao
Hongming Zhang
Jianshu Chen
Dong Yu
HILM
85
169
0
08 Jul 2023
CLIFF: Contrastive Learning for Improving Faithfulness and Factuality in Abstractive Summarization
Shuyang Cao
Lu Wang
HILM
53
181
0
19 Sep 2021
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick
Sahana Udupa
Hinrich Schütze
302
384
0
28 Feb 2021
Measuring and Reducing Gendered Correlations in Pre-trained Models
Kellie Webster
Xuezhi Wang
Ian Tenney
Alex Beutel
Emily Pitler
Ellie Pavlick
Jilin Chen
Ed Chi
Slav Petrov
FaML
72
258
0
12 Oct 2020
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Samuel Gehman
Suchin Gururangan
Maarten Sap
Yejin Choi
Noah A. Smith
133
1,194
0
24 Sep 2020
Towards Debiasing Sentence Representations
Paul Pu Liang
Irene Li
Emily Zheng
Y. Lim
Ruslan Salakhutdinov
Louis-Philippe Morency
70
238
0
16 Jul 2020
Counterfactual Data Augmentation for Mitigating Gender Stereotypes in Languages with Rich Morphology
Ran Zmigrod
Sabrina J. Mielke
Hanna M. Wallach
Ryan Cotterell
61
281
0
11 Jun 2019
1