2406.09574
Online Bandit Learning with Offline Preference Data for Improved RLHF
13 June 2024
Akhil Agnihotri
Rahul Jain
Deepak Ramachandran
Zheng Wen
OffRL
Papers citing
"Online Bandit Learning with Offline Preference Data for Improved RLHF"
2 / 2 papers shown
Title
e-COP: Episodic Constrained Optimization of Policies
Akhil Agnihotri
Rahul Jain
Deepak Ramachandran
Sahil Singla
OffRL
35
1
0
13 Jun 2024
Artificial Replay: A Meta-Algorithm for Harnessing Historical Data in Bandits
Siddhartha Banerjee
Sean R. Sinclair
Milind Tambe
Lily Xu
Chao Yu
AI4TS
31
6
0
30 Sep 2022