Aligning Language Models with Preferences through f-divergence Minimization

arXiv:2302.08215 · 16 February 2023
Dongyoung Go, Tomasz Korbak, Germán Kruszewski, Jos Rozen, Nahyeon Ryu, Marc Dymetman

Papers citing "Aligning Language Models with Preferences through f-divergence Minimization"

50 / 68 papers shown
Latent Preference Coding: Aligning Large Language Models via Discrete Latent Codes
Zhuocheng Gong, Jian-Yu Guan, Wei Yu Wu, Huishuai Zhang, Dongyan Zhao
08 May 2025
FLoRA: Sample-Efficient Preference-based RL via Low-Rank Style Adaptation of Reward Functions
Daniel Marta, Simon Holk, Miguel Vasco, Jens Lundell, Timon Homberger, F. L. Busch, Olov Andersson, Danica Kragic, Iolanda Leite
14 Apr 2025
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning [LRM]
Taiwei Shi, Yiyang Wu, Linxin Song, Dinesh Manocha, Jieyu Zhao
07 Apr 2025
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models
Gonçalo Faria, Noah A. Smith
04 Apr 2025
MultiClear: Multimodal Soft Exoskeleton Glove for Transparent Object Grasping Assistance
Chen Hu, Timothy Neate, Shan Luo, Letizia Gionfrida
04 Apr 2025
Collab: Controlled Decoding using Mixture of Agents for LLM Alignment
Souradip Chakraborty, Sujay Bhatt, Udari Madhushani Sehwag, Soumya Suvra Ghosal, Jiahao Qiu, Mengdi Wang, Dinesh Manocha, Furong Huang, Alec Koppel, Sumitra Ganesh
27 Mar 2025
Modifying Large Language Model Post-Training for Diverse Creative Writing [MoMe]
John Joon Young Chung, Vishakh Padmakumar, Melissa Roemmele, Yuqian Sun, Max Kreminski
21 Mar 2025
Representation-based Reward Modeling for Efficient Safety Alignment of Large Language Model [OffRL]
Qiyuan Deng, X. Bai, Kehai Chen, Yaowei Wang, Liqiang Nie, Min Zhang
13 Mar 2025
Alchemist: Towards the Design of Efficient Online Continual Learning System [CLL, OnRL]
Yuyang Huang, Yuhan Liu, Haryadi S. Gunawi, Beibin Li, Changho Hwang
03 Mar 2025
Simplify RLHF as Reward-Weighted SFT: A Variational Method
Yuhao Du, Zehan Li, Pengyu Cheng, Zhihong Chen, Yuejiao Xie, Xiang Wan, Anningzhe Gao
20 Feb 2025
Direct Unlearning Optimization for Robust and Safe Text-to-Image Models
Yong-Hyun Park, Sangdoo Yun, Jin-Hwa Kim, Junho Kim, Geonhui Jang, Yonghyun Jeong, Junghyo Jo, Gayoung Lee
17 Jan 2025
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
Yueqin Yin, Shentao Yang, Yujia Xie, Ziyi Yang, Yuting Sun, Hany Awadalla, Weizhu Chen, Mingyuan Zhou
07 Jan 2025
From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning
Zhirui Deng, Zhicheng Dou, Bo Li, Ji-Rong Wen, Ruibin Xiong, Mang Wang, Xin Wu
06 Nov 2024
Uncertainty-Penalized Direct Preference Optimization
Sam Houliston, Alizée Pace, Alexander Immer, Gunnar Rätsch
26 Oct 2024
Preference Optimization with Multi-Sample Comparisons
Chaoqi Wang, Zhuokai Zhao, Chen Zhu, Karthik Abinav Sankararaman, Michal Valko, ..., Zhaorun Chen, Madian Khabsa, Yuxin Chen, Hao Ma, Sinong Wang
16 Oct 2024
SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection [ALM]
Han Shen, Pin-Yu Chen, Payel Das, Tianyi Chen
09 Oct 2024
Guaranteed Generation from Large Language Models
Minbeom Kim, Thibaut Thonet, Jos Rozen, Hwaran Lee, Kyomin Jung, Marc Dymetman
09 Oct 2024
Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning
Hao Ma, Tianyi Hu, Zhiqiang Pu, Boyin Liu, Xiaolin Ai, Yanyan Liang, Min Chen
08 Oct 2024
Reward Learning From Preference With Ties
Jinsong Liu, Dongdong Ge, Ruihao Zhu
05 Oct 2024
Beyond Squared Error: Exploring Loss Design for Enhanced Training of Generative Flow Networks
Rui Hu, Yifan Zhang, Zhuoran Li, Longbo Huang
03 Oct 2024
Generalizing Alignment Paradigm of Text-to-Image Generation with Preferences through f-divergence Minimization [EGVM]
Haoyuan Sun, Bo Xia, Yongzhe Chang, Xueqian Wang
15 Sep 2024
CBF-LLM: Safe Control for LLM Alignment
Yuya Miyaoka, Masaki Inoue
28 Aug 2024
Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling
Margaret Li, Weijia Shi, Artidoro Pagnoni, Peter West, Ari Holtzman
02 Jul 2024
From Distributional to Overton Pluralism: Investigating Large Language Model Alignment
Thom Lake, Eunsol Choi, Greg Durrett
25 Jun 2024
Cascade Reward Sampling for Efficient Decoding-Time Alignment [AI4TS]
Bolian Li, Yifan Wang, A. Grama, Ruqi Zhang
24 Jun 2024
It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF
Taiming Lu, Lingfeng Shen, Xinyu Yang, Weiting Tan, Beidi Chen, Huaxiu Yao
12 Jun 2024
Preference Learning Algorithms Do Not Learn Preference Rankings
Angelica Chen, Sadhika Malladi, Lily H. Zhang, Xinyi Chen, Qiuyi Zhang, Rajesh Ranganath, Kyunghyun Cho
29 May 2024
QUEST: Quality-Aware Metropolis-Hastings Sampling for Machine Translation
Gonçalo R. A. Faria, Sweta Agrawal, António Farinhas, Ricardo Rei, José G. C. de Souza, André F. T. Martins
28 May 2024
Adversarial DPO: Harnessing Harmful Data for Reducing Toxicity with Minimal Impact on Coherence and Evasiveness in Dialogue Agents [AAML]
San Kim, Gary Geunbae Lee
21 May 2024
Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors [AAML, VLM]
Jiachen Sun, Changsheng Wang, Jiong Wang, Yiwei Zhang, Chaowei Xiao
17 May 2024
Efficient Compression of Multitask Multilingual Speech Models
Thomas Palmeira Ferraz
02 May 2024
Regularized Conditional Diffusion Model for Multi-Task Preference Alignment
Xudong Yu, Chenjia Bai, Haoran He, Changhong Wang, Xuelong Li
07 Apr 2024
A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias [LRM]
Yuemei Xu, Ling Hu, Jiayi Zhao, Zihan Qiu, Yuqi Ye, Hanwen Gu
01 Apr 2024
On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models [OffRL]
Xinpeng Wang, Shitong Duan, Xiaoyuan Yi, Jing Yao, Shanlin Zhou, Zhihua Wei, Peng Zhang, Dongkuan Xu, Maosong Sun, Xing Xie
07 Mar 2024
Exploring Precision and Recall to assess the quality and diversity of LLMs
Florian Le Bronnec, Alexandre Verine, Benjamin Négrevergne, Y. Chevaleyre, Alexandre Allauzen
16 Feb 2024
KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela
02 Feb 2024
The Language Barrier: Dissecting Safety Challenges of LLMs in Multilingual Contexts
Lingfeng Shen, Weiting Tan, Sihao Chen, Yunmo Chen, Jingyu Zhang, Haoran Xu, Boyuan Zheng, Philipp Koehn, Daniel Khashabi
23 Jan 2024
Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles [AI4CE]
Yuanzhao Zhai, Han Zhang, Yu Lei, Yue Yu, Kele Xu, Dawei Feng, Bo Ding, Huaimin Wang
30 Dec 2023
Align on the Fly: Adapting Chatbot Behavior to Established Norms
Chunpu Xu, Steffi Chern, Ethan Chern, Ge Zhang, Zekun Wang, Ruibo Liu, Jing Li, Jie Fu, Pengfei Liu
26 Dec 2023
Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections
Yuanpu Cao, Bochuan Cao, Jinghui Chen
15 Nov 2023
Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts
Thomas Palmeira Ferraz, Marcely Zanon Boito, Caroline Brun, Vassilina Nikoulina
02 Nov 2023
Unpacking the Ethical Value Alignment in Big Models
Xiaoyuan Yi, Jing Yao, Xiting Wang, Xing Xie
26 Oct 2023
COPR: Continual Learning Human Preference through Optimal Policy Regularization [CLL]
Han Zhang, Lin Gui, Yuanzhao Zhai, Hui Wang, Yu Lei, Ruifeng Xu
24 Oct 2023
Compositional preference models for aligning LMs
Dongyoung Go, Tomasz Korbak, Germán Kruszewski, Jos Rozen, Marc Dymetman
17 Oct 2023
Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation [AAML]
Yangsibo Huang, Samyak Gupta, Mengzhou Xia, Kai Li, Danqi Chen
10 Oct 2023
Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints
Chaoqi Wang, Yibo Jiang, Yuguang Yang, Han Liu, Yuxin Chen
28 Sep 2023
The Trickle-down Impact of Reward (In-)consistency on RLHF
Lingfeng Shen, Sihao Chen, Linfeng Song, Lifeng Jin, Baolin Peng, Haitao Mi, Daniel Khashabi, Dong Yu
28 Sep 2023
28 Sep 2023
Large Language Model Alignment: A Survey
Large Language Model Alignment: A Survey
Tianhao Shen
Renren Jin
Yufei Huang
Chuang Liu
Weilong Dong
Zishan Guo
Xinwei Wu
Yan Liu
Deyi Xiong
LM&MA
19
176
0
26 Sep 2023
Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM [AAML]
Bochuan Cao, Yu Cao, Lu Lin, Jinghui Chen
18 Sep 2023
Exploring the impact of low-rank adaptation on the performance, efficiency, and regularization of RLHF
Simeng Sun, Dhawal Gupta, Mohit Iyyer
16 Sep 2023