ResearchTrend.AI
Online DPO: Online Direct Preference Optimization with Fast-Slow Chasing

8 June 2024
Biqing Qi
Pengfei Li
Fangyuan Li
Junqi Gao
Kaiyan Zhang
Bowen Zhou

Papers citing "Online DPO: Online Direct Preference Optimization with Fast-Slow Chasing"

28 / 28 papers shown
Title
Understanding the Logic of Direct Preference Alignment through Logic
Kyle Richardson
Vivek Srikumar
Ashish Sabharwal
189
2
0
23 Dec 2024
Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift
Seongho Son
William Bankes
Sayak Ray Chowdhury
Brooks Paige
Ilija Bogunovic
103
4
0
26 Jul 2024
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
Ruohong Zhang
Liangke Gui
Zhiqing Sun
Yihao Feng
Keyang Xu
...
Di Fu
Chunyuan Li
Alexander G. Hauptmann
Yonatan Bisk
Yiming Yang
MLLM
107
78
0
01 Apr 2024
Direct Language Model Alignment from Online AI Feedback
Shangmin Guo
Biao Zhang
Tianlin Liu
Tianqi Liu
Misha Khalman
...
Thomas Mesnard
Yao-Min Zhao
Bilal Piot
Johan Ferret
Mathieu Blondel
ALM
87
160
0
07 Feb 2024
Diffusion Model Alignment Using Direct Preference Optimization
Bram Wallace
Meihua Dang
Rafael Rafailov
Linqi Zhou
Aaron Lou
Senthil Purushwalkam
Stefano Ermon
Caiming Xiong
Shafiq Joty
Nikhil Naik
EGVM
131
287
0
21 Nov 2023
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
Zhanhui Zhou
Jie Liu
Chao Yang
Jing Shao
Yu Liu
Xiangyu Yue
Wanli Ouyang
Yu Qiao
68
61
0
05 Oct 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
389
4,139
0
29 May 2023
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Rajkumar Ramamurthy
Prithviraj Ammanabrolu
Kianté Brantley
Jack Hessel
R. Sifa
Christian Bauckhage
Hannaneh Hajishirzi
Yejin Choi
OffRL
101
248
0
03 Oct 2022
General Incremental Learning with Domain-aware Categorical Representations
Jiangwei Xie
Shipeng Yan
Xuming He
OOD, CLL
116
39
0
08 Apr 2022
On Generalizing Beyond Domains in Cross-Domain Continual Learning
Christian Simon
M. Faraki
Yi-Hsuan Tsai
Xiang Yu
S. Schulter
Yumin Suh
Mehrtash Harandi
Manmohan Chandraker
FedML, OOD, CLL
60
32
0
08 Mar 2022
Generative Adversarial Networks
Gilad Cohen
Raja Giryes
GAN
298
30,150
0
01 Mar 2022
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL, AI4TS, AI4CE, ALM, AIMat
490
10,496
0
17 Jun 2021
An Online Learning Approach to Interpolation and Extrapolation in Domain Generalization
Elan Rosenfeld
Pradeep Ravikumar
Andrej Risteski
100
36
0
25 Feb 2021
Continual Adaptation of Visual Representations via Domain Randomization and Meta-learning
Riccardo Volpi
Diane Larlus
Grégory Rogez
VLM, OOD, CLL
67
74
0
08 Dec 2020
Class-Incremental Domain Adaptation
Jogendra Nath Kundu
R. Venkatesh
Naveen Venkat
Ambareesh Revanur
R. Venkatesh Babu
CLL
51
51
0
04 Aug 2020
Smoothed Analysis of Online and Differentially Private Learning
Nika Haghtalab
Tim Roughgarden
Abhishek Shetty
70
51
0
17 Jun 2020
The Nonstochastic Control Problem
Elad Hazan
Sham Kakade
Karan Singh
46
120
0
27 Nov 2019
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
474
1,766
0
18 Sep 2019
LAMOL: LAnguage MOdeling for Lifelong Language Learning
Fan-Keng Sun
Cheng-Hao Ho
Hung-yi Lee
CLL, KELM
87
211
0
07 Sep 2019
Online Control with Adversarial Disturbances
Naman Agarwal
Brian Bullins
Elad Hazan
Sham Kakade
Karan Singh
44
240
0
23 Feb 2019
Task-Free Continual Learning
Rahaf Aljundi
Klaas Kelchtermans
Tinne Tuytelaars
CLL
131
362
0
10 Dec 2018
Experience Replay for Continual Learning
David Rolnick
Arun Ahuja
Jonathan Richard Schwarz
Timothy Lillicrap
Greg Wayne
CLL
116
1,171
0
28 Nov 2018
End-to-End Incremental Learning
F. M. Castro
M. Marín-Jiménez
Nicolás Guil Mata
Cordelia Schmid
Alahari Karteek
CLL
87
1,160
0
25 Jul 2018
Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence
Arslan Chaudhry
P. Dokania
Thalaiyasingam Ajanthan
Philip Torr
CLL
105
1,145
0
30 Jan 2018
Memory Aware Synapses: Learning what (not) to forget
Rahaf Aljundi
F. Babiloni
Mohamed Elhoseiny
Marcus Rohrbach
Tinne Tuytelaars
KELM, CLL
87
1,646
0
27 Nov 2017
Deep reinforcement learning from human preferences
Paul Christiano
Jan Leike
Tom B. Brown
Miljan Martic
Shane Legg
Dario Amodei
218
3,365
0
12 Jun 2017
Overcoming catastrophic forgetting in neural networks
J. Kirkpatrick
Razvan Pascanu
Neil C. Rabinowitz
J. Veness
Guillaume Desjardins
...
A. Grabska-Barwinska
Demis Hassabis
Claudia Clopath
D. Kumaran
R. Hadsell
CLL
374
7,561
0
02 Dec 2016
Learning without Forgetting
Zhizhong Li
Derek Hoiem
CLL, OOD, SSL
308
4,428
0
29 Jun 2016