2009.01325
Learning to summarize from human feedback
2 September 2020
Nisan Stiennon
Long Ouyang
Jeff Wu
Daniel M. Ziegler
Ryan J. Lowe
Chelsea Voss
Alec Radford
Dario Amodei
Paul Christiano
ALM
Papers citing "Learning to summarize from human feedback"
50 / 1,440 papers shown
Title
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design
Jiachen Li
Qian Long
Jian Zheng
Xiaofeng Gao
Robinson Piramuthu
Wenhu Chen
William Yang Wang
VGen
36
22
0
08 Oct 2024
On the Modeling Capabilities of Large Language Models for Sequential Decision Making
Martin Klissarov
Devon Hjelm
Alexander Toshev
Bogdan Mazoure
LM&Ro
ELM
OffRL
LRM
34
2
0
08 Oct 2024
Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths
Yew Ken Chia
Guizhen Chen
Weiwen Xu
Luu Anh Tuan
Soujanya Poria
Lidong Bing
LRM
28
0
0
07 Oct 2024
LRHP: Learning Representations for Human Preferences via Preference Pairs
Chenglong Wang
Yang Gan
Yifu Huo
Yongyu Mu
Qiaozhi He
Murun Yang
Tong Xiao
Chunliang Zhang
Tongran Liu
Jingbo Zhu
AI4TS
37
0
0
06 Oct 2024
An evaluation of LLM code generation capabilities through graded exercises
Álvaro Barbero Jiménez
ELM
36
1
0
06 Oct 2024
MVP-Bench: Can Large Vision–Language Models Conduct Multi-level Visual Perception Like Humans?
Guanzhen Li
Yuxi Xie
Min-Yen Kan
VLM
148
0
0
06 Oct 2024
OD-Stega: LLM-Based Near-Imperceptible Steganography via Optimized Distributions
Yu-Shin Huang
Peter Just
Krishna Narayanan
Chao Tian
43
4
0
06 Oct 2024
Reward Learning From Preference With Ties
Jinsong Liu
Dongdong Ge
Ruihao Zhu
29
3
0
05 Oct 2024
RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization
Hanyang Zhao
Genta Indra Winata
Anirban Das
Shi-Xiong Zhang
D. Yao
Wenpin Tang
Sambit Sahu
62
6
0
05 Oct 2024
Text2Chart31: Instruction Tuning for Chart Generation with Automatic Feedback
Fatemeh Pesaran Zadeh
Juyeon Kim
Jin-Hwa Kim
Gunhee Kim
ALM
54
2
0
05 Oct 2024
Investigating and Mitigating Object Hallucinations in Pretrained Vision-Language (CLIP) Models
Yufang Liu
Tao Ji
Changzhi Sun
Yuanbin Wu
Aimin Zhou
VLM
MLLM
46
2
0
04 Oct 2024
Margin Matching Preference Optimization: Enhanced Model Alignment with Granular Feedback
Kyuyoung Kim
Ah Jeong Seo
Hao Liu
Jinwoo Shin
Kimin Lee
30
2
0
04 Oct 2024
Frame-Voyager: Learning to Query Frames for Video Large Language Models
Sicheng Yu
Chengkai Jin
Huanyu Wang
Zhenghao Chen
Sheng Jin
...
Zhenbang Sun
Bingni Zhang
Jiawei Wu
Hao Zhang
Qianru Sun
77
5
0
04 Oct 2024
Guided Stream of Search: Learning to Better Search with Language Models via Optimal Path Guidance
Seungyong Moon
Bumsoo Park
Hyun Oh Song
RALM
AIFin
29
1
0
03 Oct 2024
Coal Mining Question Answering with LLMs
Antonio Carlos Rivera
Anthony Moore
Steven Robinson
28
0
0
03 Oct 2024
Unlocking Structured Thinking in Language Models with Cognitive Prompting
Oliver Kramer
Jill Baumann
ReLM
LRM
29
3
0
03 Oct 2024
Cognitive Biases in Large Language Models for News Recommendation
Yougang Lyu
Xiaoyu Zhang
Zhaochun Ren
Maarten de Rijke
34
2
0
03 Oct 2024
Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation
Rohin Manvi
Anikait Singh
Stefano Ermon
SyDa
27
15
0
03 Oct 2024
Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions
Angana Borah
Rada Mihalcea
42
9
0
03 Oct 2024
Reward-RAG: Enhancing RAG with Reward Driven Supervision
Thang Nguyen
Peter Chin
Yu-Wing Tai
RALM
42
4
0
03 Oct 2024
CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning
Huimu Yu
Xing Wu
Weidong Yin
Debing Zhang
Songlin Hu
LRM
36
5
0
03 Oct 2024
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
Yekun Chai
Haoran Sun
Huang Fang
Shuohuan Wang
Yu Sun
Hua Wu
183
1
0
03 Oct 2024
Better Instruction-Following Through Minimum Bayes Risk
Ian Wu
Patrick Fernandes
Amanda Bertsch
Seungone Kim
Sina Pakazad
Graham Neubig
48
9
0
03 Oct 2024
Aligning with Logic: Measuring, Evaluating and Improving Logical Preference Consistency in Large Language Models
Yinhong Liu
Zhijiang Guo
Tianya Liang
Ehsan Shareghi
Ivan Vulić
Nigel Collier
149
0
0
03 Oct 2024
Strong Preferences Affect the Robustness of Preference Models and Value Alignment
Ziwei Xu
Mohan Kankanhalli
AAML
27
0
0
03 Oct 2024
Investigating on RLHF methodology
Alexey Kutalev
Sergei Markoff
34
0
0
02 Oct 2024
Evaluating Robustness of Reward Models for Mathematical Reasoning
Sunghwan Kim
Dongjin Kang
Taeyoon Kwon
Hyungjoo Chae
Jungsoo Won
Dongha Lee
Jinyoung Yeo
38
5
0
02 Oct 2024
Integrative Decoding: Improve Factuality via Implicit Self-consistency
Yi Cheng
Xiao Liang
Yeyun Gong
Wen Xiao
Song Wang
...
Wenjie Li
Jian Jiao
Qi Chen
Peng Cheng
Wayne Xiong
HILM
59
1
0
02 Oct 2024
Moral Alignment for LLM Agents
Elizaveta Tennant
Stephen Hailes
Mirco Musolesi
45
1
0
02 Oct 2024
Mixing It Up: The Cocktail Effect of Multi-Task Fine-Tuning on LLM Performance -- A Case Study in Finance
Meni Brief
Oded Ovadia
Gil Shenderovitz
Noga Ben Yoash
Rachel Lemberg
Eitam Sheetrit
55
4
0
01 Oct 2024
FlipGuard: Defending Preference Alignment against Update Regression with Constrained Optimization
Mingye Zhu
Yi Liu
Quan Wang
Junbo Guo
Zhendong Mao
29
1
0
01 Oct 2024
Beyond Scalar Reward Model: Learning Generative Judge from Preference Data
Ziyi Ye
Xiangsheng Li
Qiuchi Li
Qingyao Ai
Yujia Zhou
Wei Shen
Dong Yan
Yiqun Liu
50
10
0
01 Oct 2024
A Taxonomy of Loss Functions for Stochastic Optimal Control
Carles Domingo-Enrich
37
3
0
01 Oct 2024
Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown
Xingzhou Lou
Dong Yan
Wei Shen
Yuzi Yan
Jian Xie
Junge Zhang
53
22
0
01 Oct 2024
Are Large Language Models In-Context Personalized Summarizers? Get an iCOPERNICUS Test Done!
Divya Patel
Pathik Patel
Ankush Chander
Sourish Dasgupta
Tanmoy Chakraborty
24
1
0
30 Sep 2024
The Perfect Blend: Redefining RLHF with Mixture of Judges
Tengyu Xu
Eryk Helenowski
Karthik Abinav Sankararaman
Di Jin
Kaiyan Peng
...
Gabriel Cohen
Yuandong Tian
Hao Ma
Sinong Wang
Han Fang
41
9
0
30 Sep 2024
A Critical Look at Meta-evaluating Summarisation Evaluation Metrics
Xiang Dai
Sarvnaz Karimi
Biaoyan Fang
36
0
0
29 Sep 2024
LLMs4Synthesis: Leveraging Large Language Models for Scientific Synthesis
Hamed Babaei Giglou
Jennifer D'Souza
Sören Auer
23
5
0
27 Sep 2024
Model-based Preference Optimization in Abstractive Summarization without Human Feedback
Jaepill Choi
Kyubyung Chae
Jiwoo Song
Yohan Jo
Taesup Kim
26
0
0
27 Sep 2024
VickreyFeedback: Cost-efficient Data Construction for Reinforcement Learning from Human Feedback
Guoxi Zhang
Jiuding Duan
40
1
0
27 Sep 2024
Learning to Love Edge Cases in Formative Math Assessment: Using the AMMORE Dataset and Chain-of-Thought Prompting to Improve Grading Accuracy
Owen Henkel
Hannah Horne-Robinson
Maria Dyshel
Nabil Ch
Baptiste Moreau-Pernet
Ralph Abood
37
0
0
26 Sep 2024
Inference-Time Language Model Alignment via Integrated Value Guidance
Zhixuan Liu
Zhanhui Zhou
Yuanfu Wang
Chao Yang
Yu Qiao
35
7
0
26 Sep 2024
Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness
Jian Li
Haojing Huang
Yujia Zhang
Pengfei Xu
Xi Chen
Rui Song
Lida Shi
Jingwen Wang
Hao Xu
31
0
0
26 Sep 2024
Just Say What You Want: Only-prompting Self-rewarding Online Preference Optimization
Ruijie Xu
Zhihan Liu
Yongfei Liu
Shipeng Yan
Zhaoran Wang
Zhi-Li Zhang
Xuming He
ALM
40
1
0
26 Sep 2024
Autoregressive Multi-trait Essay Scoring via Reinforcement Learning with Scoring-aware Multiple Rewards
Heejin Do
Sangwon Ryu
Gary Geunbae Lee
34
2
0
26 Sep 2024
On Extending Direct Preference Optimization to Accommodate Ties
Jinghong Chen
Guangyu Yang
Weizhe Lin
Jingbiao Mei
Bill Byrne
32
3
0
25 Sep 2024
Post-hoc Reward Calibration: A Case Study on Length Bias
Zeyu Huang
Zihan Qiu
Zili Wang
Edoardo M. Ponti
Ivan Titov
40
5
0
25 Sep 2024
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Qining Zhang
Lei Ying
OffRL
37
2
0
25 Sep 2024
CJEval: A Benchmark for Assessing Large Language Models Using Chinese Junior High School Exam Data
Qian-Wen Zhang
Haochen Wang
Fang Li
Siyu An
Lingfeng Qiao
Liangcai Gao
Di Yin
Xing Sun
ELM
AI4Ed
32
0
0
24 Sep 2024
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond
Hong Chen
Xin Wang
Yuwei Zhou
Bin Huang
Yipeng Zhang
Wei Feng
Houlun Chen
Zeyang Zhang
Siao Tang
Wenwu Zhu
DiffM
55
7
0
23 Sep 2024