ResearchTrend.AI
Online Bandit Learning with Offline Preference Data for Improved RLHF

13 June 2024
Akhil Agnihotri
Rahul Jain
Deepak Ramachandran
Zheng Wen
    OffRL

Papers citing "Online Bandit Learning with Offline Preference Data for Improved RLHF"

2 / 2 papers shown
e-COP : Episodic Constrained Optimization of Policies
Akhil Agnihotri
Rahul Jain
Deepak Ramachandran
Sahil Singla
OffRL
35
1
0
13 Jun 2024
Artificial Replay: A Meta-Algorithm for Harnessing Historical Data in Bandits
Siddhartha Banerjee
Sean R. Sinclair
Milind Tambe
Lily Xu
Chao Yu
AI4TS
31
6
0
30 Sep 2022