Direct Preference-based Policy Optimization without Reward Modeling
arXiv: 2301.12842 · 30 January 2023
Gaon An, Junhyeok Lee, Xingdong Zuo, Norio Kosaka, Kyung-Min Kim, Hyun Oh Song
Topics: OffRL
Links: ArXiv · PDF · HTML
Papers citing "Direct Preference-based Policy Optimization without Reward Modeling" (7 / 7 papers shown)

Policy-labeled Preference Learning: Is Preference Enough for RLHF?
Taehyun Cho, Seokhun Ju, Seungyub Han, Dohyeong Kim, Kyungjae Lee, Jungwoo Lee
Topics: OffRL · Metrics: 29 / 0 / 0 · 06 May 2025

Aligning Crowd-sourced Human Feedback for Reinforcement Learning on Code Generation by Large Language Models
M. Wong, C. Tan
Topics: ALM · Metrics: 83 / 4 / 0 · 19 Mar 2025

Aligning Transformers with Continuous Feedback via Energy Rank Alignment
Shriram Chennakesavalu, Frank Hu, Sebastian Ibarraran, Grant M. Rotskoff
Metrics: 41 / 3 / 0 · 21 May 2024

Evaluation of Few-Shot Learning for Classification Tasks in the Polish Language
Tsimur Hadeliya, D. Kajtoch
Metrics: 46 / 0 / 0 · 27 Apr 2024

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
Topics: OSLM, ALM · Metrics: 339 / 12,003 / 0 · 04 Mar 2022

Offline Reinforcement Learning with Implicit Q-Learning
Ilya Kostrikov, Ashvin Nair, Sergey Levine
Topics: OffRL · Metrics: 214 / 843 / 0 · 12 Oct 2021

Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble
Gaon An, Seungyong Moon, Jang-Hyun Kim, Hyun Oh Song
Topics: OffRL · Metrics: 105 / 262 / 0 · 04 Oct 2021