A Survey of Reinforcement Learning from Human Feedback

22 December 2023
Timo Kaufmann
Paul Weng
Viktor Bengs
Eyke Hüllermeier
    OffRL
arXiv: 2312.14925

Papers citing "A Survey of Reinforcement Learning from Human Feedback"

21 / 21 papers shown
What Can RL Bring to VLA Generalization? An Empirical Study
Jijia Liu
Feng Gao
Bingwen Wei
Xinlei Chen
Qingmin Liao
Yi Wu
Chao Yu
Yu Wang
OffRL
92
0
0
26 May 2025
SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards
Chuming Shen
Wei Wei
Xiaoye Qu
Yu Cheng
LRM
104
0
0
25 May 2025
AI and Generative AI Transforming Disaster Management: A Survey of Damage Assessment and Response Techniques
Aman Raj
Lakshit Arora
Sanjay Surendranath Girija
Shashank Kapoor
Dipen Pradhan
Ankit Shetgaonkar
134
0
0
13 May 2025
Emotions in Artificial Intelligence
Hermann Borotschnig
55
0
0
01 May 2025
Urban Computing in the Era of Large Language Models
Zhonghang Li
Lianghao Xia
Xubin Ren
J. Tang
Tianyi Chen
Yong-mei Xu
Chenyu Huang
142
0
0
02 Apr 2025
Adaptive Group Policy Optimization: Towards Stable Training and Token-Efficient Reasoning
Chen Li
Nazhou Liu
Kai Yang
78
4
0
20 Mar 2025
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Bowen Jin
Hansi Zeng
Zhenrui Yue
Jinsung Yoon
Sercan O. Arik
Dong Wang
Hamed Zamani
Jiawei Han
RALM
ReLM
KELM
OffRL
AI4TS
LRM
120
77
0
12 Mar 2025
Prompting a Weighting Mechanism into LLM-as-a-Judge in Two-Step: A Case Study
Wenwen Xie
Gray Gwizdz
Dongji Feng
101
0
0
20 Feb 2025
Competing LLM Agents in a Non-Cooperative Game of Opinion Polarisation
Amin Qasmi
Usman Naseem
Mehwish Nasim
58
1
0
17 Feb 2025
Feasible Learning
Juan Ramirez
Ignacio Hounie
Juan Elenter
Jose Gallego-Posada
Meraj Hashemizadeh
Alejandro Ribeiro
Simon Lacoste-Julien
53
2
0
28 Jan 2025
Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment
Chaoqi Wang
Zhuokai Zhao
Yibo Jiang
Zhaorun Chen
Chen Zhu
...
Jiayi Liu
Lizhu Zhang
Xiangjun Fan
Hao Ma
Sinong Wang
113
5
0
16 Jan 2025
An Overview and Discussion on Using Large Language Models for Implementation Generation of Solutions to Open-Ended Problems
Hashmath Shaik
Alex Doboli
OffRL
ELM
340
0
0
31 Dec 2024
VideoAgent: Self-Improving Video Generation
Achint Soni
Sreyas Venkataraman
Abhranil Chandra
Sebastian Fischmeister
Percy Liang
Bo Dai
Sherry Yang
LM&Ro
VGen
70
8
0
14 Oct 2024
LoRTA: Low Rank Tensor Adaptation of Large Language Models
Ignacio Hounie
Charilaos I. Kanatsoulis
Arnuv Tandon
Alejandro Ribeiro
89
0
0
05 Oct 2024
Problem Solving Through Human-AI Preference-Based Cooperation
Subhabrata Dutta
Timo Kaufmann
Goran Glavaš
Ivan Habernal
Kristian Kersting
Frauke Kreuter
Mira Mezini
Iryna Gurevych
Eyke Hüllermeier
Hinrich Schuetze
117
1
0
14 Aug 2024
Natural Language Outlines for Code: Literate Programming in the LLM Era
Kensen Shi
Deniz Altınbüken
Saswat Anand
Mihai Christodorescu
Katja Grünwedel
...
Tobias Welp
Pengcheng Yin
Manzil Zaheer
Satish Chandra
Charles Sutton
83
6
0
09 Aug 2024
Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing
Huanqian Wang
Yang Yue
Rui Lu
Jingxin Shi
Andrew Zhao
Shenzhi Wang
Shiji Song
Gao Huang
LM&Ro
KELM
93
7
0
11 Jul 2024
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
Wei Chen
Lin Li
Yongqi Yang
Bin Wen
Fan Yang
Tingting Gao
Yu Wu
Long Chen
VLM
VGen
67
7
0
15 Jun 2024
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang
Honghao Wei
Lei Ying
OffRL
83
2
0
11 Jun 2024
Inverse Constitutional AI: Compressing Preferences into Principles
Arduin Findeis
Timo Kaufmann
Eyke Hüllermeier
Samuel Albanie
Robert Mullins
SyDa
70
12
0
02 Jun 2024
Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations
Ziqiao Ma
Zekun Wang
Joyce Chai
94
4
0
22 May 2024