ResearchTrend.AI

Online Preference-based Reinforcement Learning with Self-augmented Feedback from Large Language Model

22 December 2024
Songjun Tu
Jingbo Sun
Qichao Zhang
Xiangyuan Lan
Dongbin Zhao

Papers citing "Online Preference-based Reinforcement Learning with Self-augmented Feedback from Large Language Model"

3 / 3 papers shown
ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving
Xueyi Liu
Zuodong Zhong
Yuxin Guo
Yun-Fu Liu
Zhiguo Su
...
Yinfeng Gao
Yupeng Zheng
Qiao Lin
Huiyong Chen
Dongbin Zhao
LRM
7
0
0
26 May 2025
AED: Automatic Discovery of Effective and Diverse Vulnerabilities for Autonomous Driving Policy with Large Language Models
Le Qiu
Zelai Xu
Qixin Tan
Wenhao Tang
Chao Yu
Yu Wang
AAML
71
0
0
24 Mar 2025
Online Bandit Learning with Offline Preference Data for Improved RLHF
Akhil Agnihotri
Rahul Jain
Deepak Ramachandran
Zheng Wen
OffRL
63
2
0
13 Jun 2024