Humans are not Boltzmann Distributions: Challenges and Opportunities for Modelling Human Feedback and Interaction in Reinforcement Learning

David Lindner, Mennatallah El-Assady
27 June 2022
arXiv:2206.13316
OffRL

Papers citing "Humans are not Boltzmann Distributions: Challenges and Opportunities for Modelling Human Feedback and Interaction in Reinforcement Learning"

12 papers shown
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Kai Ye, Hongyi Zhou, Jin Zhu, Francesco Quinzan, C. Shi
03 Apr 2025
On the Effect of Robot Errors on Human Teaching Dynamics
Jindan Huang, Isaac S. Sheidlower, Reuben M. Aronson, E. Short
15 Sep 2024
Mutual Theory of Mind in Human-AI Collaboration: An Empirical Study with LLM-driven AI Agents in a Real-time Shared Workspace Task
Shao Zhang, Xihuai Wang, Wenhao Zhang, Yongshan Chen, Landi Gao, Dakuo Wang, Weinan Zhang, Xinbing Wang, Ying Wen
LLMAG
13 Sep 2024
Towards Trustworthy AI: A Review of Ethical and Robust Large Language Models
Meftahul Ferdaus, Mahdi Abdelguerfi, Elias Ioup, Kendall N. Niles, Ken Pathak, Steve Sloan
01 Jun 2024
Direct Preference Optimization With Unobserved Preference Heterogeneity
Keertana Chidambaram, Karthik Vinay Seetharaman, Vasilis Syrgkanis
23 May 2024
Impact of Preference Noise on the Alignment Performance of Generative Language Models
Yang Gao, Dana Alon, Donald Metzler
15 Apr 2024
Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback
Yifu Yuan, Jianye Hao, Yi Ma, Zibin Dong, Hebin Liang, Jinyi Liu, Zhixin Feng, Kai-Wen Zhao, Yan Zheng
OffRL, ALM
04 Feb 2024
Towards Understanding Sycophancy in Language Models
Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, ..., Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez
20 Oct 2023
RLHF-Blender: A Configurable Interactive Interface for Learning from Diverse Human Feedback
Yannick Metz, David Lindner, Raphael Baur, Daniel A. Keim, Mennatallah El-Assady
AI4CE
08 Aug 2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper, Xander Davies, Claudia Shi, T. Gilbert, Jérémy Scheurer, ..., Erdem Biyik, Anca Dragan, David M. Krueger, Dorsa Sadigh, Dylan Hadfield-Menell
ALM, OffRL
27 Jul 2023
Explore, Establish, Exploit: Red Teaming Language Models from Scratch
Stephen Casper, Jason Lin, Joe Kwon, Gatlen Culp, Dylan Hadfield-Menell
AAML
15 Jun 2023
Reward (Mis)design for Autonomous Driving
W. B. Knox, A. Allievi, Holger Banzhaf, Felix Schmitt, Peter Stone
28 Apr 2021