Learning to summarize from human feedback

2 September 2020
Nisan Stiennon
Long Ouyang
Jeff Wu
Daniel M. Ziegler
Ryan J. Lowe
Chelsea Voss
Alec Radford
Dario Amodei
Paul Christiano
    ALM
arXiv: 2009.01325

Papers citing "Learning to summarize from human feedback"

Showing 50 of 1,548 citing papers.
Faster Machine Translation Ensembling with Reinforcement Learning and Competitive Correction
Kritarth Prasad
Mohammadi Zaki
Pratik Rakesh Singh
Pankaj Wasnik
65
1
0
28 Jan 2025
Learning to Summarize from LLM-generated Feedback
Hwanjun Song
Taewon Yun
Yuho Lee
Jihwan Oh
Gihun Lee
Jason (Jinglun) Cai
Hang Su
229
10
0
28 Jan 2025
Controllable Protein Sequence Generation with LLM Preference Optimization
Xiangyu Liu
Yi Liu
Silei Chen
Wei Hu
119
1
0
28 Jan 2025
LiPO: Listwise Preference Optimization through Learning-to-Rank
Tianqi Liu
Zhen Qin
Junru Wu
Jiaming Shen
Misha Khalman
...
Mohammad Saleh
Simon Baumgartner
Jialu Liu
Peter J. Liu
Xuanhui Wang
336
60
0
28 Jan 2025
Reference-free Evaluation Metrics for Text Generation: A Survey
Takumi Ito
Kees van Deemter
Jun Suzuki
ELM
129
2
0
21 Jan 2025
RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs
Jiaxing Wu
Lin Ning
Luyang Liu
Harrison Lee
Neo Wu
Chao Wang
Sushant Prakash
S. O’Banion
Bradley Green
Jun Xie
201
1
0
20 Jan 2025
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Yinhong Liu
Han Zhou
Zhijiang Guo
Ehsan Shareghi
Ivan Vulić
Anna Korhonen
Nigel Collier
ALM
219
83
0
20 Jan 2025
A Comprehensive Survey of Foundation Models in Medicine
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE, LM&MA, VLM
302
27
0
17 Jan 2025
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion
Yannis Flet-Berliac
Nathan Grinsztajn
Florian Strub
Bill Wu
Eugene Choi
...
Arash Ahmadian
Yash Chandak
M. G. Azar
Olivier Pietquin
Matthieu Geist
OffRL
167
10
0
17 Jan 2025
Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment
Chaoqi Wang
Zhuokai Zhao
Yibo Jiang
Zhaorun Chen
Chen Zhu
...
Jiayi Liu
Lizhu Zhang
Xiangjun Fan
Hao Ma
Sinong Wang
189
5
0
16 Jan 2025
Foundation Models at Work: Fine-Tuning for Fairness in Algorithmic Hiring
Buse Sibel Korkmaz
Rahul Nair
Elizabeth M. Daly
Evangelos Anagnostopoulos
Christos Varytimidis
Antonio del Rio Chanona
80
0
0
13 Jan 2025
MedCT: A Clinical Terminology Graph for Generative AI Applications in Healthcare
Ye Chen
Dongdong Huang
Haoyun Xu
Cong Fu
Lin Sheng
Qingli Zhou
Yuqiang Shen
Kai Wang
VLM, MedIm
123
1
0
11 Jan 2025
FocalPO: Enhancing Preference Optimizing by Focusing on Correct Preference Rankings
Tong Liu
Xiao Yu
Wenxuan Zhou
Jindong Gu
Volker Tresp
82
1
0
11 Jan 2025
Tailored-LLaMA: Optimizing Few-Shot Learning in Pruned LLaMA Models with Task-Specific Prompts
Danyal Aftab
Steven Davy
ALM
122
1
0
10 Jan 2025
LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs
Tongshuang Wu
Haiyi Zhu
Maya Albayrak
Alexis Axon
Amanda Bertsch
...
Ying-Jui Tseng
Patricia Vaidos
Zhijin Wu
Wei Wu
Chenyang Yang
182
34
0
10 Jan 2025
Utility-inspired Reward Transformations Improve Reinforcement Learning Training of Language Models
Roberto-Rafael Maura-Rivero
Chirag Nagpal
Roma Patel
Francesco Visin
143
1
0
08 Jan 2025
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
Yueqin Yin
Shentao Yang
Yujia Xie
Ziyi Yang
Yuting Sun
Hany Awadalla
Weizhu Chen
Mingyuan Zhou
137
2
0
07 Jan 2025
Improving GenIR Systems Based on User Feedback
Qingyao Ai
Zhicheng Dou
Min Zhang
422
0
0
06 Jan 2025
Enhancing Preference-based Linear Bandits via Human Response Time
Shen Li
Yuyang Zhang
Tongzheng Ren
Claire Liang
Na Li
J. Shah
183
1
0
03 Jan 2025
An Overview and Discussion on Using Large Language Models for Implementation Generation of Solutions to Open-Ended Problems
Hashmath Shaik
Alex Doboli
OffRL, ELM
471
0
0
31 Dec 2024
A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine
Hanguang Xiao
Feizhong Zhou
Xianglong Liu
Tianqi Liu
Zhipeng Li
Xin Liu
Xiaoxuan Huang
AILaw, LM&MA, LRM
166
30
0
31 Dec 2024
Disentangling Preference Representation and Text Generation for Efficient Individual Preference Alignment
Jianfei Zhang
Jun Bai
Yangqiu Song
Yanmeng Wang
Rumei Li
Chenghua Lin
Wenge Rong
152
0
0
31 Dec 2024
Geometric-Averaged Preference Optimization for Soft Preference Labels
Hiroki Furuta
Kuang-Huei Lee
Shixiang Shane Gu
Y. Matsuo
Aleksandra Faust
Heiga Zen
Izzeddin Gur
155
13
0
31 Dec 2024
From Generalist to Specialist: A Survey of Large Language Models for Chemistry
Yang Han
Ziping Wan
Lu Chen
Kai Yu
Xin Chen
LM&MA
111
3
0
31 Dec 2024
Cannot or Should Not? Automatic Analysis of Refusal Composition in IFT/RLHF Datasets and Refusal Behavior of Black-Box LLMs
Alexander von Recum
Christoph Schnabl
Gabor Hollbeck
Silas Alberti
Philip Blinde
Marvin von Hagen
145
2
0
22 Dec 2024
Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning
Sungjin Park
Xiao Liu
Yeyun Gong
Edward Choi
LRM
126
10
0
20 Dec 2024
FedRLHF: A Convergence-Guaranteed Federated Framework for Privacy-Preserving and Personalized RLHF
Flint Xiaofeng Fan
Cheston Tan
Yew-Soon Ong
Roger Wattenhofer
Wei Tsang Ooi
177
1
0
20 Dec 2024
REFA: Reference Free Alignment for multi-preference optimization
Taneesh Gupta
Rahul Madhavan
Xuchao Zhang
Chetan Bansal
Saravan Rajmohan
191
1
0
20 Dec 2024
Learning to Generate Research Idea with Dynamic Control
Ruochen Li
Liqiang Jing
Chi Han
Jiawei Zhou
Xinya Du
LRM
124
6
0
19 Dec 2024
Energy-Based Preference Model Offers Better Offline Alignment than the Bradley-Terry Preference Model
Yuzhong Hong
Hanshan Zhang
Junwei Bao
Hongfei Jiang
Yang Song
OffRL
122
4
0
18 Dec 2024
Dual Traits in Probabilistic Reasoning of Large Language Models
Shenxiong Li
Huaxia Rui
122
0
0
15 Dec 2024
Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets
Zhen Liu
Tim Z. Xiao
Weiyang Liu
Yoshua Bengio
Dinghuai Zhang
283
6
0
10 Dec 2024
MVReward: Better Aligning and Evaluating Multi-View Diffusion Models with Human Preferences
Weitao Wang
Haoran Xu
Yuxiao Yang
Zhifang Liu
Jun Meng
Haoqian Wang
EGVM
126
3
0
09 Dec 2024
Beyond the Binary: Capturing Diverse Preferences With Reward Regularization
Vishakh Padmakumar
Chuanyang Jin
Hannah Rose Kirk
He He
104
6
0
05 Dec 2024
CPTQuant -- A Novel Mixed Precision Post-Training Quantization Techniques for Large Language Models
Amitash Nanda
Sree Bhargavi Balija
D. Sahoo
MQ
115
0
0
03 Dec 2024
Time-Reversal Provides Unsupervised Feedback to LLMs
Yerram Varun
Rahul Madhavan
Sravanti Addepalli
A. Suggala
Karthikeyan Shanmugam
Prateek Jain
LRM, SyDa
123
0
0
03 Dec 2024
Detecting Memorization in Large Language Models
Eduardo Slonski
90
0
0
02 Dec 2024
VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
Lei Li
Y. X. Wei
Zhihui Xie
Xuqing Yang
Yifan Song
...
Tianyu Liu
Sujian Li
Bill Yuchen Lin
Dianbo Sui
Qiang Liu
VLM, CoGe
200
32
0
26 Nov 2024
Learning from Relevant Subgoals in Successful Dialogs using Iterative Training for Task-oriented Dialog Systems
Magdalena Kaiser
P. Ernst
György Szarvas
110
1
0
25 Nov 2024
Self-Generated Critiques Boost Reward Modeling for Language Models
Yue Yu
Zhengxing Chen
Aston Zhang
L Tan
Chenguang Zhu
...
Suchin Gururangan
Chao-Yue Zhang
Melanie Kambadur
Dhruv Mahajan
Rui Hou
LRM, ALM
210
27
0
25 Nov 2024
Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark
Rong-Cheng Tu
Zi-Ao Ma
Tian Lan
Yuehao Zhao
Heyan Huang
Xian-Ling Mao
MLLM, VLM, EGVM
183
4
0
23 Nov 2024
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Xinyan Guan
Yanjiang Liu
Xinyu Lu
Boxi Cao
Xianpei Han
...
Le Sun
Jie Lou
Bowen Yu
Yaojie Lu
Hongyu Lin
ALM
188
5
0
18 Nov 2024
PerfCodeGen: Improving Performance of LLM Generated Code with Execution Feedback
Yun Peng
Akhilesh Deepak Gotmare
Michael R. Lyu
Caiming Xiong
Silvio Savarese
Doyen Sahoo
97
0
0
18 Nov 2024
Drowning in Documents: Consequences of Scaling Reranker Inference
Mathew Jacob
Erik Lindgren
Matei A. Zaharia
Michael Carbin
Omar Khattab
Andrew Drozdov
OffRL
198
6
0
18 Nov 2024
Learning Quantitative Automata Modulo Theories
Eric Hsiung
Swarat Chaudhuri
Joydeep Biswas
49
0
0
15 Nov 2024
Chain of Alignment: Integrating Public Will with Expert Intelligence for Language Model Alignment
Andrew Konya
Aviv Ovadya
K. J. Kevin Feng
Quan Ze Chen
Lisa Schirch
Colin Irwin
Amy X. Zhang
ALM
105
2
0
15 Nov 2024
Approximated Variational Bayesian Inverse Reinforcement Learning for Large Language Model Alignment
Yuang Cai
Yuyu Yuan
Jinsheng Shi
Qinhong Lin
81
0
0
14 Nov 2024
Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset
Khaoula Chehbouni
Jonathan Colaço-Carr
Yash More
Jackie CK Cheung
G. Farnadi
179
1
0
12 Nov 2024
CoPrompter: User-Centric Evaluation of LLM Instruction Alignment for Improved Prompt Engineering
Ishika Joshi
Simra Shahid
Shreeya Venneti
Manushree Vasu
Yantao Zheng
Yunyao Li
Balaji Krishnamurthy
Gromit Yeuk-Yin Chan
99
4
0
09 Nov 2024
Kwai-STaR: Transform LLMs into State-Transition Reasoners
Xingyu Lu
Yihan Hu
Changyi Liu
Tianke Zhang
Zhenyu Yang
...
Fan Yang
Yan Li
Tingting Gao
Hai-Tao Zheng
Bin Wen
LRM
64
1
0
07 Nov 2024