Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

2 June 2023
Zeqiu Wu
Yushi Hu
Weijia Shi
Nouha Dziri
Alane Suhr
Prithviraj Ammanabrolu
Noah A. Smith
Mari Ostendorf
Hannaneh Hajishirzi
    ALM

Papers citing "Fine-Grained Human Feedback Gives Better Rewards for Language Model Training"

50 / 254 papers shown
Rule Based Rewards for Language Model Safety
Tong Mu
Alec Helyar
Johannes Heidecke
Joshua Achiam
Andrea Vallone
Ian Kivlichan
Molly Lin
Alex Beutel
John Schulman
Lilian Weng
ALM
48
36
0
02 Nov 2024
Token-level Proximal Policy Optimization for Query Generation
Yichen Ouyang
Lu Wang
Fangkai Yang
Pu Zhao
Chenghua Huang
...
Saravan Rajmohan
Weiwei Deng
Dongmei Zhang
Feng Sun
Qi Zhang
OffRL
184
3
0
01 Nov 2024
OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models
Junda Wu
Xintong Li
Ruoyu Wang
Yu Xia
Yuxin Xiong
...
Xiang Chen
B. Kveton
Lina Yao
Jingbo Shang
Julian McAuley
OffRL
LRM
29
1
0
31 Oct 2024
MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
Gabrielle Kaili-May Liu
Bowen Shi
Avi Caciularu
Idan Szpektor
Arman Cohan
72
4
0
30 Oct 2024
L3Ms -- Lagrange Large Language Models
Guneet S. Dhillon
Xingjian Shi
Yee Whye Teh
Alex Smola
201
0
0
28 Oct 2024
2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision
Shilong Li
Yancheng He
Hui Huang
Xingyuan Bu
Qingbin Liu
Hangyu Guo
Weixun Wang
Jihao Gu
Wenbo Su
Bo Zheng
34
5
0
25 Oct 2024
MAP: Multi-Human-Value Alignment Palette
Xinran Wang
Qi Le
A. N. Ahmed
Enmao Diao
Yi Zhou
Nathalie Baracaldo
Jie Ding
Ali Anwar
16
2
0
24 Oct 2024
Science Out of Its Ivory Tower: Improving Accessibility with Reinforcement Learning
Haining Wang
Jason Clark
Hannah McKelvey
Leila Sterman
Zheng Gao
Zuoyu Tian
Sandra Kübler
Xiaozhong Liu
35
1
0
22 Oct 2024
Learning from others' mistakes: Finetuning machine translation models with span-level error annotations
Lily H. Zhang
Hamid Dadkhahi
M. Finkelstein
Firas Trabelsi
Jiaming Luo
Markus Freitag
34
1
0
21 Oct 2024
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
Yantao Liu
Zijun Yao
Rui Min
Yixin Cao
Lei Hou
Juanzi Li
OffRL
ALM
25
32
0
21 Oct 2024
Self-Explained Keywords Empower Large Language Models for Code Generation
Lishui Fan
Mouxiang Chen
Zhongxin Liu
45
1
0
21 Oct 2024
Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models
Qitan Lv
Jie Wang
Hanzhu Chen
Bin Li
Yongdong Zhang
Feng Wu
HILM
31
3
0
19 Oct 2024
Personalized Adaptation via In-Context Preference Learning
Allison Lau
Younwoo Choi
Vahid Balazadeh
Keertana Chidambaram
Vasilis Syrgkanis
Rahul G. Krishnan
VLM
OffRL
22
3
0
17 Oct 2024
MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback
Zonghai Yao
Aditya Parashar
Huixue Zhou
Won Seok Jang
Feiyun Ouyang
Zhichao Yang
Hong-ye Yu
ELM
53
2
0
17 Oct 2024
MIRROR: A Novel Approach for the Automated Evaluation of Open-Ended Question Generation
Aniket Deroy
Subhankar Maity
Sudeshna Sarkar
LLMAG
LRM
41
3
0
16 Oct 2024
Multi-objective Reinforcement Learning: A Tool for Pluralistic Alignment
Peter Vamplew
Conor F. Hayes
Cameron Foale
Richard Dazeley
Hadassah Harland
48
0
0
15 Oct 2024
Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements
Jingyu Zhang
Ahmed Elgohary
Ahmed Magooda
Daniel Khashabi
Benjamin Van Durme
180
2
0
11 Oct 2024
HyperDPO: Hypernetwork-based Multi-Objective Fine-Tuning Framework
Yinuo Ren
Tesi Xiao
Michael Shavlovsky
Lexing Ying
Holakou Rahmanian
23
0
0
10 Oct 2024
Increasing the Difficulty of Automatically Generated Questions via Reinforcement Learning with Synthetic Preference
William Thorne
Ambrose Robinson
Bohua Peng
Chenghua Lin
Diana Maynard
16
2
0
10 Oct 2024
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment
Yuancheng Xu
Udari Madhushani Sehwag
Alec Koppel
Sicheng Zhu
Bang An
Furong Huang
Sumitra Ganesh
60
6
0
10 Oct 2024
ReinDiffuse: Crafting Physically Plausible Motions with Reinforced Diffusion Model
Gaoge Han
Mingjiang Liang
Jinglei Tang
Yongkang Cheng
Wei Liu
Shaoli Huang
VGen
46
5
0
09 Oct 2024
Uncovering Factor Level Preferences to Improve Human-Model Alignment
Juhyun Oh
Eunsu Kim
Jiseon Kim
Wenda Xu
Inha Cha
William Yang Wang
Alice Oh
34
0
0
09 Oct 2024
The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models
Yanjun Chen
Dawei Zhu
Yirong Sun
Xinghao Chen
Wei Zhang
Xiaoyu Shen
ALM
31
1
0
09 Oct 2024
TLDR: Token-Level Detective Reward Model for Large Vision Language Models
Deqing Fu
Tong Xiao
Rui Wang
Wang Zhu
Pengchuan Zhang
Guan Pang
Robin Jia
Lawrence Chen
68
6
0
07 Oct 2024
LRHP: Learning Representations for Human Preferences via Preference Pairs
Chenglong Wang
Yang Gan
Yifu Huo
Yongyu Mu
Qiaozhi He
Murun Yang
Tong Xiao
Chunliang Zhang
Tongran Liu
Jingbo Zhu
AI4TS
37
0
0
06 Oct 2024
Structured List-Grounded Question Answering
Mujeen Sung
Song Feng
James Gung
Raphael Shu
Yi Zhang
Saab Mansour
RALM
32
0
0
04 Oct 2024
Can LLMs Generate Diverse Molecules? Towards Alignment with Structural Diversity
Hyosoon Jang
Yunhui Jang
Jaehyung Kim
Sungsoo Ahn
25
2
0
04 Oct 2024
Grounded Answers for Multi-agent Decision-making Problem through Generative World Model
Zeyang Liu
Xinrui Yang
Shiguang Sun
Long Qian
Lipeng Wan
Xingyu Chen
Xuguang Lan
22
3
0
03 Oct 2024
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
Yekun Chai
Haoran Sun
Huang Fang
Shuohuan Wang
Yu Sun
Hua Wu
201
1
0
03 Oct 2024
LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits
Duy Nguyen
Archiki Prasad
Elias Stengel-Eskin
Joey Tianyi Zhou
23
3
0
02 Oct 2024
FactAlign: Long-form Factuality Alignment of Large Language Models
Chao-Wei Huang
Yun-Nung Chen
HILM
30
2
0
02 Oct 2024
Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo
Shengyu Feng
Xiang Kong
Shuang Ma
Aonan Zhang
Dong Yin
Chong-Jun Wang
Ruoming Pang
Yiming Yang
LRM
32
0
0
02 Oct 2024
Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models
Angela Lopez-Cardona
Carlos Segura
Alexandros Karatzoglou
Sergi Abadal
Ioannis Arapakis
ALM
62
2
0
02 Oct 2024
The Perfect Blend: Redefining RLHF with Mixture of Judges
Tengyu Xu
Eryk Helenowski
Karthik Abinav Sankararaman
Di Jin
Kaiyan Peng
...
Gabriel Cohen
Yuandong Tian
Hao Ma
Sinong Wang
Han Fang
41
9
0
30 Sep 2024
Post-hoc Reward Calibration: A Case Study on Length Bias
Zeyu Huang
Zihan Qiu
Zili Wang
Edoardo M. Ponti
Ivan Titov
40
5
0
25 Sep 2024
Aligning Language Models Using Follow-up Likelihood as Reward Signal
Chen Zhang
Dading Chong
Feng Jiang
Chengguang Tang
Anningzhe Gao
Guohua Tang
Haizhou Li
ALM
33
2
0
20 Sep 2024
Language Models Learn to Mislead Humans via RLHF
Jiaxin Wen
Ruiqi Zhong
Akbir Khan
Ethan Perez
Jacob Steinhardt
Minlie Huang
Samuel R. Bowman
He He
Shi Feng
32
34
0
19 Sep 2024
Model-in-the-Loop (MILO): Accelerating Multimodal AI Data Annotation with LLMs
Yifan Wang
David Stevens
Pranay Shah
Wenwen Jiang
Miao Liu
...
Boying Gong
Daniel Lee
Jiabo Hu
Ning Zhang
Bob Kamma
40
1
0
16 Sep 2024
AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents
Zhe Su
Xuhui Zhou
Sanketh Rangreji
Anubha Kabra
Julia Mendelsohn
Faeze Brahman
Maarten Sap
LLMAG
106
3
0
13 Sep 2024
Towards a Unified View of Preference Learning for Large Language Models: A Survey
Bofei Gao
Feifan Song
Yibo Miao
Zefan Cai
Zhiyong Yang
...
Houfeng Wang
Zhifang Sui
Peiyi Wang
Baobao Chang
53
12
0
04 Sep 2024
Pre-Training Multimodal Hallucination Detectors with Corrupted Grounding Data
Spencer Whitehead
Jacob Phillips
Sean Hendryx
31
0
0
30 Aug 2024
Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback
Jiayi Zhou
Yalan Qin
Juntao Dai
Yaodong Yang
41
4
0
30 Aug 2024
Beyond Preferences in AI Alignment
Tan Zhi-Xuan
Micah Carroll
Matija Franklin
Hal Ashton
41
16
0
30 Aug 2024
Selective Preference Optimization via Token-Level Reward Function Estimation
Kailai Yang
Zhiwei Liu
Qianqian Xie
Jimin Huang
Erxue Min
Sophia Ananiadou
33
10
0
24 Aug 2024
RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data
Chenglong Wang
Yang Gan
Yifu Huo
Yongyu Mu
Murun Yang
...
Chunliang Zhang
Tongran Liu
Quan Du
Di Yang
Jingbo Zhu
VLM
71
4
0
22 Aug 2024
Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning
S. Poddar
Yanming Wan
Hamish Ivison
Abhishek Gupta
Natasha Jaques
40
37
0
19 Aug 2024
HateSieve: A Contrastive Learning Framework for Detecting and Segmenting Hateful Content in Multimodal Memes
Xuanyu Su
Yansong Li
Diana Inkpen
Nathalie Japkowicz
VLM
89
2
0
11 Aug 2024
Diffusion Guided Language Modeling
Justin Lovelace
Varsha Kishore
Yiwei Chen
Kilian Q. Weinberger
44
6
0
08 Aug 2024
Patchview: LLM-Powered Worldbuilding with Generative Dust and Magnet Visualization
John Joon Young Chung
Max Kreminski
45
10
0
07 Aug 2024
Self-Directed Synthetic Dialogues and Revisions Technical Report
Nathan Lambert
Hailey Schoelkopf
Aaron Gokaslan
Luca Soldaini
Valentina Pyatkin
Louis Castricato
SyDa
45
2
0
25 Jul 2024