Learning to summarize from human feedback

2 September 2020
Nisan Stiennon
Long Ouyang
Jeff Wu
Daniel M. Ziegler
Ryan J. Lowe
Chelsea Voss
Alec Radford
Dario Amodei
Paul Christiano
ALM
ArXiv (abs) · PDF · HTML

Papers citing "Learning to summarize from human feedback"

50 / 1,548 papers shown
Fine-Tuning Language Models with Advantage-Induced Policy Alignment
Banghua Zhu
Hiteshi Sharma
Felipe Vieira Frujeri
Shi Dong
Chenguang Zhu
Michael I. Jordan
Jiantao Jiao
OSLM
85
41
0
04 Jun 2023
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Zeqiu Wu
Yushi Hu
Weijia Shi
Nouha Dziri
Alane Suhr
Prithviraj Ammanabrolu
Noah A. Smith
Mari Ostendorf
Hannaneh Hajishirzi
ALM
168
336
0
02 Jun 2023
ChatGPT for Zero-shot Dialogue State Tracking: A Solution or an Opportunity?
Michael Heck
Nurul Lubis
Benjamin Ruppik
Renato Vukovic
Shutong Feng
Christian Geishauser
Hsien-chin Lin
Carel van Niekerk
Milica Gašić
127
47
0
02 Jun 2023
Hybrid Long Document Summarization using C2F-FAR and ChatGPT: A Practical Study
Guang Lu
Sylvia B. Larcher
Tu-Anh Tran
56
9
0
01 Jun 2023
The ethical ambiguity of AI data enrichment: Measuring gaps in research ethics norms and practices
Will Hawkins
Brent Mittelstadt
111
10
0
01 Jun 2023
Identifiability and Generalizability in Constrained Inverse Reinforcement Learning
Andreas Schlaginhaufen
Maryam Kamgarpour
106
12
0
01 Jun 2023
Preference-grounded Token-level Guidance for Language Model Fine-tuning
Shentao Yang
Shujian Zhang
Congying Xia
Yihao Feng
Caiming Xiong
Mi Zhou
146
28
0
01 Jun 2023
Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback
Paul Roit
Johan Ferret
Lior Shani
Roee Aharoni
Geoffrey Cideron
...
Olivier Bachem
G. Elidan
Avinatan Hassidim
Olivier Pietquin
Idan Szpektor
HILM
94
87
0
31 May 2023
Let's Verify Step by Step
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM, OffRL, LRM
256
1,241
0
31 May 2023
LLM-BRAIn: AI-driven Fast Generation of Robot Behaviour Tree based on Large Language Model
Artem Lykov
Dzmitry Tsetserukou
LM&Ro
52
30
0
30 May 2023
Strategic Reasoning with Language Models
Kanishk Gandhi
Dorsa Sadigh
Noah D. Goodman
LM&Ro, LRM
84
41
0
30 May 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
405
4,190
0
29 May 2023
Provable Reward-Agnostic Preference-Based Reinforcement Learning
Wenhao Zhan
Masatoshi Uehara
Wen Sun
Jason D. Lee
78
11
0
29 May 2023
Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models
Shuai Zhao
Xiaohan Wang
Linchao Zhu
Yezhou Yang
VLM
111
23
0
29 May 2023
Generating EDU Extracts for Plan-Guided Summary Re-Ranking
Griffin Adams
Alexander R. Fabbri
Faisal Ladhak
Kathleen McKeown
Noémie Elhadad
84
11
0
28 May 2023
Language Models are Bounded Pragmatic Speakers: Understanding RLHF from a Bayesian Cognitive Modeling Perspective
Khanh Nguyen
LRM
142
8
0
28 May 2023
Query-Policy Misalignment in Preference-Based Reinforcement Learning
Xiao Hu
Jianxiong Li
Xianyuan Zhan
Qing-Shan Jia
Ya Zhang
106
9
0
27 May 2023
Fine-Tuning Language Models with Just Forward Passes
Sadhika Malladi
Tianyu Gao
Eshaan Nichani
Alexandru Damian
Jason D. Lee
Danqi Chen
Sanjeev Arora
167
205
0
27 May 2023
Language Models Can Improve Event Prediction by Few-Shot Abductive Reasoning
Xiaoming Shi
Siqiao Xue
Kangrui Wang
Fan Zhou
James Y. Zhang
Jun-ping Zhou
Chenhao Tan
Hongyuan Mei
ReLM, LRM
86
48
0
26 May 2023
Coarse-Tuning Models of Code with Reinforcement Learning Feedback
Abhinav C. P. Jain
Chima Adiole
Swarat Chaudhuri
Thomas W. Reps
Chris Jermaine
ALM
63
2
0
25 May 2023
DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models
Ying Fan
Olivia Watkins
Yuqing Du
Hao Liu
Moonkyung Ryu
Craig Boutilier
Pieter Abbeel
Mohammad Ghavamzadeh
Kangwook Lee
Kimin Lee
172
167
0
25 May 2023
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
Yuandong Tian
Yiping Wang
Beidi Chen
S. Du
MLT
115
79
0
25 May 2023
Role-Play with Large Language Models
Murray Shanahan
Kyle McDonell
Laria Reynolds
LLMAG
84
307
0
25 May 2023
PandaGPT: One Model To Instruction-Follow Them All
Yixuan Su
Tian Lan
Huayang Li
Jialu Xu
Yan Wang
Deng Cai
MLLM
101
295
0
25 May 2023
Inverse Preference Learning: Preference-based RL without a Reward Function
Joey Hejna
Dorsa Sadigh
OffRL
109
56
0
24 May 2023
Science in the Era of ChatGPT, Large Language Models and Generative AI: Challenges for Research Ethics and How to Respond
Evangelos Pournaras
56
4
0
24 May 2023
Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning
Ximing Lu
Faeze Brahman
Peter West
Jaehun Jang
Khyathi Chandu
...
Bill Yuchen Lin
Skyler Hallinan
Xiang Ren
Sean Welleck
Yejin Choi
133
29
0
24 May 2023
Active Learning for Natural Language Generation
Yotam Perlitz
Ariel Gera
Michal Shmueli-Scheuer
D. Sheinwald
Noam Slonim
L. Ein-Dor
100
3
0
24 May 2023
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
Katherine Tian
E. Mitchell
Allan Zhou
Archit Sharma
Rafael Rafailov
Huaxiu Yao
Chelsea Finn
Christopher D. Manning
183
357
0
24 May 2023
PURR: Efficiently Editing Language Model Hallucinations by Denoising Language Model Corruptions
Anthony Chen
Panupong Pasupat
Sameer Singh
Hongrae Lee
Kelvin Guu
118
48
0
24 May 2023
Provable Offline Preference-Based Reinforcement Learning
Wenhao Zhan
Masatoshi Uehara
Nathan Kallus
Jason D. Lee
Wen Sun
OffRL
127
32
0
24 May 2023
Using Natural Language Explanations to Rescale Human Judgments
Manya Wadhwa
Jifan Chen
Junyi Jessy Li
Greg Durrett
94
8
0
24 May 2023
DecipherPref: Analyzing Influential Factors in Human Preference Judgments via GPT-4
Ye Hu
Kaiqiang Song
Sangwoo Cho
Xiaoyang Wang
H. Foroosh
Fei Liu
99
13
0
24 May 2023
PEARL: Prompting Large Language Models to Plan and Execute Actions Over Long Documents
Simeng Sun
Yongxu Liu
Shuohang Wang
Chenguang Zhu
Mohit Iyyer
RALM, LRM, ReLM
89
55
0
23 May 2023
Language Model Self-improvement by Reinforcement Learning Contemplation
Jing-Cheng Pang
Pengyuan Wang
Kaiyuan Li
Xiong-Hui Chen
Jiacheng Xu
Zongzhang Zhang
Yang Yu
LRM, KELM
64
52
0
23 May 2023
DirecT2V: Large Language Models are Frame-Level Directors for Zero-Shot Text-to-Video Generation
Susung Hong
Junyoung Seo
Heeseong Shin
Sung-Jin Hong
Seung Wook Kim
DiffM, VGen
106
36
0
23 May 2023
Navigating Prompt Complexity for Zero-Shot Classification: A Study of Large Language Models in Computational Social Science
Yida Mu
Benze Wu
William Thorne
Ambrose Robinson
Nikolaos Aletras
Carolina Scarton
Kalina Bontcheva
Xingyi Song
111
18
0
23 May 2023
On Learning to Summarize with Large Language Models as References
Yixin Liu
Kejian Shi
Katherine S He
Longtian Ye
Alexander R. Fabbri
Pengfei Liu
Dragomir R. Radev
Arman Cohan
ELM
119
82
0
23 May 2023
Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization
Jeonghoon Kim
J. H. Lee
Sungdong Kim
Joonsuk Park
Kang Min Yoo
S. Kwon
Dongsoo Lee
MQ
166
106
0
23 May 2023
Learning from Mistakes via Cooperative Study Assistant for Large Language Models
Danqing Wang
Lei Li
80
8
0
23 May 2023
Aligning Large Language Models through Synthetic Feedback
Sungdong Kim
Sanghwan Bae
Jamin Shin
Soyoung Kang
Donghyun Kwak
Kang Min Yoo
Minjoon Seo
ALM, SyDa
155
70
0
23 May 2023
Training Priors Predict Text-To-Image Model Performance
Charles Lovering
Ellie Pavlick
CoGe
78
3
0
23 May 2023
Clembench: Using Game Play to Evaluate Chat-Optimized Language Models as Conversational Agents
Kranti Chalamalasetti
Jana Götze
Sherzod Hakimov
Brielen Madureira
P. Sadler
David Schlangen
ELM, ALM, LLMAG
107
36
0
22 May 2023
Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method
Yiming Wang
Zhuosheng Zhang
Rui Wang
117
88
0
22 May 2023
If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection
Shyamgopal Karthik
Karsten Roth
Massimiliano Mancini
Zeynep Akata
96
21
0
22 May 2023
Training Diffusion Models with Reinforcement Learning
Kevin Black
Michael Janner
Yilun Du
Ilya Kostrikov
Sergey Levine
EGVM
171
379
0
22 May 2023
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Yann Dubois
Xuechen Li
Rohan Taori
Tianyi Zhang
Ishaan Gulrajani
Jimmy Ba
Carlos Guestrin
Percy Liang
Tatsunori B. Hashimoto
ALM
156
608
0
22 May 2023
SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation
Elizabeth Clark
Shruti Rijhwani
Sebastian Gehrmann
Joshua Maynez
Roee Aharoni
Vitaly Nikolaev
Thibault Sellam
Aditya Siddhant
Dipanjan Das
Ankur P. Parikh
97
41
0
22 May 2023
Observations on LLMs for Telecom Domain: Capabilities and Limitations
Sumit Soman
Ranjani H. G.
63
27
0
22 May 2023
Distilling ChatGPT for Explainable Automated Student Answer Assessment
Jiazheng Li
Lin Gui
Yuxiang Zhou
David West
Cesare Aloisi
Yulan He
79
28
0
22 May 2023