Online DPO: Online Direct Preference Optimization with Fast-Slow Chasing

8 June 2024

Bowen Zhou

Papers citing "Online DPO: Online Direct Preference Optimization with Fast-Slow Chasing"

Title | Authors | Tags | Counts (as listed) | Date
Understanding the Logic of Direct Preference Alignment through Logic | Kyle Richardson, Vivek Srikumar, Ashish Sabharwal | - | 189 2 0 | 23 Dec 2024
Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift | Seongho Son, William Bankes, Sayak Ray Chowdhury, Brooks Paige, Ilija Bogunovic | - | 103 4 0 | 26 Jul 2024
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward | Ruohong Zhang, Liangke Gui, Zhiqing Sun, Yihao Feng, Keyang Xu, ..., Di Fu, Chunyuan Li, Alexander G. Hauptmann, Yonatan Bisk, Yiming Yang | MLLM | 107 78 0 | 01 Apr 2024
Direct Language Model Alignment from Online AI Feedback | Shangmin Guo, Biao Zhang, Tianlin Liu, Tianqi Liu, Misha Khalman, ..., Thomas Mesnard, Yao-Min Zhao, Bilal Piot, Johan Ferret, Mathieu Blondel | ALM | 87 160 0 | 07 Feb 2024
Diffusion Model Alignment Using Direct Preference Optimization | Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, Nikhil Naik | EGVM | 131 287 0 | 21 Nov 2023
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization | Zhanhui Zhou, Jie Liu, Chao Yang, Jing Shao, Yu Liu, Xiangyu Yue, Wanli Ouyang, Yu Qiao | - | 68 61 0 | 05 Oct 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model | Rafael Rafailov, Archit Sharma, E. Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn | ALM | 389 4,139 0 | 29 May 2023
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization | Rajkumar Ramamurthy, Prithviraj Ammanabrolu, Kianté Brantley, Jack Hessel, R. Sifa, Christian Bauckhage, Hannaneh Hajishirzi, Yejin Choi | OffRL | 101 248 0 | 03 Oct 2022
General Incremental Learning with Domain-aware Categorical Representations | Jiangwei Xie, Shipeng Yan, Xuming He | OOD, CLL | 116 39 0 | 08 Apr 2022
On Generalizing Beyond Domains in Cross-Domain Continual Learning | Christian Simon, M. Faraki, Yi-Hsuan Tsai, Xiang Yu, S. Schulter, Yumin Suh, Mehrtash Harandi, Manmohan Chandraker | FedML, OOD, CLL | 60 32 0 | 08 Mar 2022
Generative Adversarial Networks | Gilad Cohen, Raja Giryes | GAN | 298 30,150 0 | 01 Mar 2022
LoRA: Low-Rank Adaptation of Large Language Models | J. E. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen | OffRL, AI4TS, AI4CE, ALM, AIMat | 490 10,496 0 | 17 Jun 2021
An Online Learning Approach to Interpolation and Extrapolation in Domain Generalization | Elan Rosenfeld, Pradeep Ravikumar, Andrej Risteski | - | 100 36 0 | 25 Feb 2021
Continual Adaptation of Visual Representations via Domain Randomization and Meta-learning | Riccardo Volpi, Diane Larlus, Grégory Rogez | VLM, OOD, CLL | 67 74 0 | 08 Dec 2020
Class-Incremental Domain Adaptation | Jogendra Nath Kundu, R. Venkatesh, Naveen Venkat, Ambareesh Revanur, R. Venkatesh Babu | CLL | 51 51 0 | 04 Aug 2020
Smoothed Analysis of Online and Differentially Private Learning | Nika Haghtalab, Tim Roughgarden, Abhishek Shetty | - | 70 51 0 | 17 Jun 2020
The Nonstochastic Control Problem | Elad Hazan, Sham Kakade, Karan Singh | - | 46 120 0 | 27 Nov 2019
Fine-Tuning Language Models from Human Preferences | Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving | ALM | 474 1,766 0 | 18 Sep 2019
LAMOL: LAnguage MOdeling for Lifelong Language Learning | Fan-Keng Sun, Cheng-Hao Ho, Hung-yi Lee | CLL, KELM | 87 211 0 | 07 Sep 2019
Online Control with Adversarial Disturbances | Naman Agarwal, Brian Bullins, Elad Hazan, Sham Kakade, Karan Singh | - | 44 240 0 | 23 Feb 2019
Task-Free Continual Learning | Rahaf Aljundi, Klaas Kelchtermans, Tinne Tuytelaars | CLL | 131 362 0 | 10 Dec 2018
Experience Replay for Continual Learning | David Rolnick, Arun Ahuja, Jonathan Richard Schwarz, Timothy Lillicrap, Greg Wayne | CLL | 116 1,171 0 | 28 Nov 2018
End-to-End Incremental Learning | F. M. Castro, M. Marín-Jiménez, Nicolás Guil Mata, Cordelia Schmid, Alahari Karteek | CLL | 87 1,160 0 | 25 Jul 2018
Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence | Arslan Chaudhry, P. Dokania, Thalaiyasingam Ajanthan, Philip Torr | CLL | 105 1,145 0 | 30 Jan 2018
Memory Aware Synapses: Learning what (not) to forget | Rahaf Aljundi, F. Babiloni, Mohamed Elhoseiny, Marcus Rohrbach, Tinne Tuytelaars | KELM, CLL | 87 1,646 0 | 27 Nov 2017
Deep reinforcement learning from human preferences | Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei | - | 218 3,365 0 | 12 Jun 2017
Overcoming catastrophic forgetting in neural networks | J. Kirkpatrick, Razvan Pascanu, Neil C. Rabinowitz, J. Veness, Guillaume Desjardins, ..., A. Grabska-Barwinska, Demis Hassabis, Claudia Clopath, D. Kumaran, R. Hadsell | CLL | 374 7,561 0 | 02 Dec 2016
Learning without Forgetting | Zhizhong Li, Derek Hoiem | CLL, OOD, SSL | 308 4,428 0 | 29 Jun 2016