ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2009.01325
  4. Cited By
Learning to summarize from human feedback

Learning to summarize from human feedback

2 September 2020
Nisan Stiennon
Long Ouyang
Jeff Wu
Daniel M. Ziegler
Ryan J. Lowe
Chelsea Voss
Alec Radford
Dario Amodei
Paul Christiano
    ALM
ArXivPDFHTML

Papers citing "Learning to summarize from human feedback"

50 / 1,443 papers shown
Title
The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback
The Power of Active Multi-Task Learning in Reinforcement Learning from Human Feedback
Ruitao Chen
Liwei Wang
75
1
0
18 May 2024
Tailoring Vaccine Messaging with Common-Ground Opinions
Tailoring Vaccine Messaging with Common-Ground Opinions
Rickard Stureborg
Sanxing Chen
Ruoyu Xie
Aayushi Patel
Christopher Li
Chloe Qinyu Zhu
Tingnan Hu
Jun Yang
Bhuwan Dhingra
47
0
0
17 May 2024
Language Models can Evaluate Themselves via Probability Discrepancy
Language Models can Evaluate Themselves via Probability Discrepancy
Tingyu Xia
Bowen Yu
Yuan Wu
Yi-Ju Chang
Chang Zhou
ELM
37
4
0
17 May 2024
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via
  Reinforcement Learning
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
Yuexiang Zhai
Hao Bai
Zipeng Lin
Jiayi Pan
Shengbang Tong
...
Alane Suhr
Saining Xie
Yann LeCun
Yi Ma
Sergey Levine
LLMAG
LRM
52
63
0
16 May 2024
IntelliExplain: Enhancing Interactive Code Generation through Natural
  Language Explanations for Non-Professional Programmers
IntelliExplain: Enhancing Interactive Code Generation through Natural Language Explanations for Non-Professional Programmers
Hao Yan
Thomas D. Latoza
Ziyu Yao
LRM
45
0
0
16 May 2024
A Design Trajectory Map of Human-AI Collaborative Reinforcement Learning
  Systems: Survey and Taxonomy
A Design Trajectory Map of Human-AI Collaborative Reinforcement Learning Systems: Survey and Taxonomy
Zhaoxing Li
30
2
0
16 May 2024
NIFTY Financial News Headlines Dataset
NIFTY Financial News Headlines Dataset
Raeid Saqur
Ken Kato
Nicholas Vinden
Frank Rudzicz
AIFin
44
1
0
16 May 2024
Spectral Editing of Activations for Large Language Model Alignment
Spectral Editing of Activations for Large Language Model Alignment
Yifu Qiu
Zheng Zhao
Yftah Ziser
Anna Korhonen
Edoardo Ponti
Shay B. Cohen
KELM
LLMSV
33
16
0
15 May 2024
Enhancing Maritime Trajectory Forecasting via H3 Index and Causal
  Language Modelling (CLM)
Enhancing Maritime Trajectory Forecasting via H3 Index and Causal Language Modelling (CLM)
Nicolas Drapier
Aladine Chetouani
A. Chateigner
40
2
0
15 May 2024
IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning
  Inner Monologues
IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning Inner Monologues
Diji Yang
Jinmeng Rao
Kezhen Chen
Xiaoyuan Guo
Yawen Zhang
Jie Yang
Yi Zhang
LRM
RALM
55
12
0
15 May 2024
Understanding the performance gap between online and offline alignment
  algorithms
Understanding the performance gap between online and offline alignment algorithms
Yunhao Tang
Daniel Guo
Zeyu Zheng
Daniele Calandriello
Yuan Cao
...
Rémi Munos
Bernardo Avila-Pires
Michal Valko
Yong Cheng
Will Dabney
OffRL
OnRL
32
61
0
14 May 2024
RLHF Workflow: From Reward Modeling to Online RLHF
RLHF Workflow: From Reward Modeling to Online RLHF
Hanze Dong
Wei Xiong
Bo Pang
Haoxiang Wang
Han Zhao
Yingbo Zhou
Nan Jiang
Doyen Sahoo
Caiming Xiong
Tong Zhang
OffRL
29
99
0
13 May 2024
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
Asaf B. Cassel
Haipeng Luo
Aviv A. Rosenberg
Dmitry Sotnikov
OffRL
36
3
0
13 May 2024
Integrating Emotional and Linguistic Models for Ethical Compliance in
  Large Language Models
Integrating Emotional and Linguistic Models for Ethical Compliance in Large Language Models
Edward Y. Chang
32
3
0
11 May 2024
Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation
Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation
JoonHo Lee
Jae Oh Woo
Juree Seok
Parisa Hassanzadeh
Wooseok Jang
...
Hankyu Moon
Wenjun Hu
Yeong-Dae Kwon
Taehee Lee
Seungjai Min
52
2
0
10 May 2024
Conversational Topic Recommendation in Counseling and Psychotherapy with
  Decision Transformer and Large Language Models
Conversational Topic Recommendation in Counseling and Psychotherapy with Decision Transformer and Large Language Models
Aylin Gunal
Baihan Lin
Djallel Bouneffouf
OffRL
AI4MH
LM&MA
40
1
0
08 May 2024
MoDiPO: text-to-motion alignment via AI-feedback-driven Direct
  Preference Optimization
MoDiPO: text-to-motion alignment via AI-feedback-driven Direct Preference Optimization
Massimiliano Pappa
Luca Collorone
Giovanni Ficarra
Indro Spinelli
Fabio Galasso
54
1
0
06 May 2024
Select to Perfect: Imitating desired behavior from large multi-agent
  data
Select to Perfect: Imitating desired behavior from large multi-agent data
Tim Franzmeyer
Edith Elkind
Philip Torr
Jakob N. Foerster
Joao Henriques
44
2
0
06 May 2024
MedAdapter: Efficient Test-Time Adaptation of Large Language Models
  towards Medical Reasoning
MedAdapter: Efficient Test-Time Adaptation of Large Language Models towards Medical Reasoning
Wenqi Shi
Ran Xu
Yuchen Zhuang
Yue Yu
Hang Wu
Carl Yang
M. D. Wang
MedIm
LM&MA
55
18
0
05 May 2024
Conformal Prediction for Natural Language Processing: A Survey
Conformal Prediction for Natural Language Processing: A Survey
Margarida M. Campos
António Farinhas
Chrysoula Zerva
Mário A. T. Figueiredo
André F. T. Martins
AI4CE
60
15
0
03 May 2024
Reinforcement Learning-Guided Semi-Supervised Learning
Reinforcement Learning-Guided Semi-Supervised Learning
Marzi Heidari
Hanping Zhang
Yuhong Guo
OffRL
44
0
0
02 May 2024
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
Gerald Shen
Zhilin Wang
Olivier Delalleau
Jiaqi Zeng
Yi Dong
...
Sahil Jain
Ali Taghibakhshi
Markel Sanz Ausin
Ashwath Aithal
Oleksii Kuchaiev
48
13
0
02 May 2024
WildChat: 1M ChatGPT Interaction Logs in the Wild
WildChat: 1M ChatGPT Interaction Logs in the Wild
Wenting Zhao
Xiang Ren
Jack Hessel
Claire Cardie
Yejin Choi
Yuntian Deng
44
181
0
02 May 2024
The Real, the Better: Aligning Large Language Models with Online Human
  Behaviors
The Real, the Better: Aligning Large Language Models with Online Human Behaviors
Guanying Jiang
Lingyong Yan
Haibo Shi
Dawei Yin
41
2
0
01 May 2024
Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models
Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models
Xiaoshi Wu
Yiming Hao
Manyuan Zhang
Keqiang Sun
Zhaoyang Huang
Guanglu Song
Yu Liu
Hongsheng Li
EGVM
76
19
0
01 May 2024
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference
  Learning
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning
Yuxi Xie
Anirudh Goyal
Wenyue Zheng
Min-Yen Kan
Timothy Lillicrap
Kenji Kawaguchi
Michael Shieh
ReLM
LRM
60
91
0
01 May 2024
MetaRM: Shifted Distributions Alignment via Meta-Learning
MetaRM: Shifted Distributions Alignment via Meta-Learning
Shihan Dou
Yan Liu
Enyu Zhou
Tianlong Li
Haoxiang Jia
...
Junjie Ye
Rui Zheng
Tao Gui
Qi Zhang
Xuanjing Huang
OOD
76
2
0
01 May 2024
RLHF from Heterogeneous Feedback via Personalization and Preference
  Aggregation
RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation
Chanwoo Park
Mingyang Liu
Dingwen Kong
Kaiqing Zhang
Asuman Ozdaglar
52
30
0
30 Apr 2024
Iterative Reasoning Preference Optimization
Iterative Reasoning Preference Optimization
Richard Yuanzhe Pang
Weizhe Yuan
Kyunghyun Cho
He He
Sainbayar Sukhbaatar
Jason Weston
LRM
52
114
0
30 Apr 2024
Countering Reward Over-optimization in LLM with Demonstration-Guided
  Reinforcement Learning
Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning
Mathieu Rita
Florian Strub
Rahma Chaabouni
Paul Michel
Emmanuel Dupoux
Olivier Pietquin
42
8
0
30 Apr 2024
More RLHF, More Trust? On The Impact of Human Preference Alignment On
  Language Model Trustworthiness
More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness
Aaron Jiaxun Li
Satyapriya Krishna
Himabindu Lakkaraju
48
3
0
29 Apr 2024
Hallucination of Multimodal Large Language Models: A Survey
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM
LRM
97
145
0
29 Apr 2024
MRScore: Evaluating Radiology Report Generation with LLM-based Reward
  System
MRScore: Evaluating Radiology Report Generation with LLM-based Reward System
Yunyi Liu
Zhanyu Wang
Yingshu Li
Xinyu Liang
Lingqiao Liu
Lei Wang
Luping Zhou
LM&MA
19
3
0
27 Apr 2024
Probabilistic Inference in Language Models via Twisted Sequential Monte
  Carlo
Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo
Stephen Zhao
Rob Brekelmans
Alireza Makhzani
Roger C. Grosse
47
33
0
26 Apr 2024
REBEL: Reinforcement Learning via Regressing Relative Rewards
REBEL: Reinforcement Learning via Regressing Relative Rewards
Zhaolin Gao
Jonathan D. Chang
Wenhao Zhan
Owen Oertell
Gokul Swamy
Kianté Brantley
Thorsten Joachims
J. Andrew Bagnell
Jason D. Lee
Wen Sun
OffRL
48
31
0
25 Apr 2024
Hippocrates: An Open-Source Framework for Advancing Large Language
  Models in Healthcare
Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare
Emre Can Acikgoz
Osman Batur .Ince
Rayene Bench
Arda Anil Boz
.Ilker Kesen
Aykut Erdem
Erkut Erdem
LM&MA
40
10
0
25 Apr 2024
Towards Adapting Open-Source Large Language Models for Expert-Level Clinical Note Generation
Towards Adapting Open-Source Large Language Models for Expert-Level Clinical Note Generation
Hanyin Wang
Chufan Gao
Bolun Liu
Qiping Xu
Guleid Hussein
Mohamad El Labban
Kingsley Iheasirim
H. Korsapati
Chuck Outcalt
Jiashuo Sun
LM&MA
AI4MH
40
2
0
25 Apr 2024
A Human-Computer Collaborative Tool for Training a Single Large Language
  Model Agent into a Network through Few Examples
A Human-Computer Collaborative Tool for Training a Single Large Language Model Agent into a Network through Few Examples
Lihang Pan
Yuxuan Li
Chun Yu
Yuanchun Shi
LLMAG
38
2
0
24 Apr 2024
The AI Companion in Education: Analyzing the Pedagogical Potential of
  ChatGPT in Computer Science and Engineering
The AI Companion in Education: Analyzing the Pedagogical Potential of ChatGPT in Computer Science and Engineering
Z. He
Thomas Nguyen
Tahereh Miari
Mehrdad Aliasgari
S. Rafatirad
Hossein Sayadi
29
2
0
23 Apr 2024
Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal
  LLMs
Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs
Davide Caffagni
Federico Cocchi
Nicholas Moratelli
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
KELM
42
38
0
23 Apr 2024
Aligning LLM Agents by Learning Latent Preference from User Edits
Aligning LLM Agents by Learning Latent Preference from User Edits
Ge Gao
Alexey Taymanov
Eduardo Salinas
Paul Mineiro
Dipendra Kumar Misra
LLMAG
42
27
0
23 Apr 2024
Multimodal Large Language Model is a Human-Aligned Annotator for
  Text-to-Image Generation
Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation
Xun Wu
Shaohan Huang
Furu Wei
51
8
0
23 Apr 2024
Self-Supervised Alignment with Mutual Information: Learning to Follow
  Principles without Preference Labels
Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels
Jan-Philipp Fränken
E. Zelikman
Rafael Rafailov
Kanishk Gandhi
Tobias Gerstenberg
Noah D. Goodman
39
10
0
22 Apr 2024
Protecting Your LLMs with Information Bottleneck
Protecting Your LLMs with Information Bottleneck
Zichuan Liu
Zefan Wang
Linjie Xu
Jinyu Wang
Lei Song
Tianchun Wang
Chunlin Chen
Wei Cheng
Jiang Bian
KELM
AAML
66
15
0
22 Apr 2024
Generating Attractive and Authentic Copywriting from Customer Reviews
Generating Attractive and Authentic Copywriting from Customer Reviews
Yu-Xiang Lin
Wei-Yun Ma
46
2
0
22 Apr 2024
Filtered Direct Preference Optimization
Filtered Direct Preference Optimization
Tetsuro Morimura
Mitsuki Sakamoto
Yuu Jinnai
Kenshi Abe
Kaito Air
48
13
0
22 Apr 2024
MM-PhyRLHF: Reinforcement Learning Framework for Multimodal Physics Question-Answering
MM-PhyRLHF: Reinforcement Learning Framework for Multimodal Physics Question-Answering
Avinash Anand
Janak Kapuriya
Chhavi Kirtani
Apoorv Singh
Jay Saraf
Naman Lal
Jatin Kumar
A. Shivam
Astha Verma
R. Shah
OffRL
53
9
0
19 Apr 2024
Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual
  Alignment
Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment
Zhaofeng Wu
Ananth Balashankar
Yoon Kim
Jacob Eisenstein
Ahmad Beirami
46
14
0
18 Apr 2024
FedEval-LLM: Federated Evaluation of Large Language Models on Downstream
  Tasks with Collective Wisdom
FedEval-LLM: Federated Evaluation of Large Language Models on Downstream Tasks with Collective Wisdom
Yuanqin He
Yan Kang
Lixin Fan
Qiang Yang
35
3
0
18 Apr 2024
Exploring the landscape of large language models: Foundations,
  techniques, and challenges
Exploring the landscape of large language models: Foundations, techniques, and challenges
M. Moradi
Ke Yan
David Colwell
Matthias Samwald
Rhona Asgari
OffRL
46
1
0
18 Apr 2024
Previous
123...121314...272829
Next