Understanding the Effects of RLHF on LLM Generalisation and Diversity
Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis, Jelena Luketina, Eric Hambro, Edward Grefenstette, Roberta Raileanu
arXiv:2310.06452 · 10 October 2023 · [AI4CE, ALM]

Papers citing "Understanding the Effects of RLHF on LLM Generalisation and Diversity"

50 / 104 papers shown
- RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
  John Dang, Arash Ahmadian, Kelly Marchisio, Julia Kreutzer, A. Ustun, Sara Hooker (02 Jul 2024)
- Generative Monoculture in Large Language Models
  Fan Wu, Emily Black, Varun Chandrasekaran (02 Jul 2024) [SyDa]
- Detection and Measurement of Syntactic Templates in Generated Text
  Chantal Shaib, Yanai Elazar, Junyi Jessy Li, Byron C. Wallace (28 Jun 2024)
- AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations
  Adam Dahlgren Lindström, Leila Methnani, Lea Krause, Petter Ericson, Íñigo Martínez de Rituerto de Troya, Dimitri Coelho Mollo, Roel Dobbe (26 Jun 2024) [ALM]
- Multi-property Steering of Large Language Models with Dynamic Activation Composition
  Daniel Scalena, Gabriele Sarti, Malvina Nissim (25 Jun 2024) [KELM, LLMSV, AI4CE]
- From Distributional to Overton Pluralism: Investigating Large Language Model Alignment
  Thom Lake, Eunsol Choi, Greg Durrett (25 Jun 2024)
- WARP: On the Benefits of Weight Averaged Rewarded Policies
  Alexandre Ramé, Johan Ferret, Nino Vieillard, Robert Dadashi, Léonard Hussenot, Pierre-Louis Cedoz, Pier Giuseppe Sessa, Sertan Girgin, Arthur Douillard, Olivier Bachem (24 Jun 2024)
- Preference Tuning For Toxicity Mitigation Generalizes Across Languages
  Xiaochen Li, Zheng-Xin Yong, Stephen H. Bach (23 Jun 2024) [CLL]
- Self-Evolution Fine-Tuning for Policy Optimization
  Ruijun Chen, Jiehao Liang, Shiping Gao, Fanqi Wan, Xiaojun Quan (16 Jun 2024)
- Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning
  Jifan Zhang, Lalit P. Jain, Yang Guo, Jiayi Chen, Kuan Lok Zhou, ..., Scott Sievert, Timothy Rogers, Kevin Jamieson, Robert Mankoff, Robert Nowak (15 Jun 2024)
- Optimizing Autonomous Driving for Safety: A Human-Centric Approach with LLM-Enhanced RLHF
  Yuan Sun, Navid Salami Pargoo, Peter J. Jin, Jorge Ortiz (06 Jun 2024)
- Self-Improving Robust Preference Optimization
  Eugene Choi, Arash Ahmadian, Matthieu Geist, Olivier Pietquin, M. G. Azar (03 Jun 2024)
- Evaluating Large Language Model Biases in Persona-Steered Generation
  Andy Liu, Mona Diab, Daniel Fried (30 May 2024)
- Offline Regularised Reinforcement Learning for Large Language Models Alignment
  Pierre Harvey Richemond, Yunhao Tang, Daniel Guo, Daniele Calandriello, M. G. Azar, ..., Gil Shamir, Rishabh Joshi, Tianqi Liu, Rémi Munos, Bilal Piot (29 May 2024) [OffRL]
- Automating Thematic Analysis: How LLMs Analyse Controversial Topics
  Awais Hameed Khan, H. Kegalle, Rhea D'Silva, Ned Watt, Daniel Whelan-Shamy, Lida Ghahremanlou, Liam Magee (11 May 2024)
- Position: Understanding LLMs Requires More Than Statistical Generalization
  Patrik Reizinger, Szilvia Ujváry, Anna Mészáros, A. Kerekes, Wieland Brendel, Ferenc Huszár (03 May 2024)
- Artificial General Intelligence (AGI)-Native Wireless Systems: A Journey Beyond 6G
  Walid Saad, Omar Hashash, Christo Kurisummoottil Thomas, Christina Chaccour, Merouane Debbah, N. Mandayam, Zhu Han (29 Apr 2024) [AI4CE]
- Towards Adapting Open-Source Large Language Models for Expert-Level Clinical Note Generation
  Hanyin Wang, Chufan Gao, Bolun Liu, Qiping Xu, Guleid Hussein, Mohamad El Labban, Kingsley Iheasirim, H. Korsapati, Chuck Outcalt, Jiashuo Sun (25 Apr 2024) [LM&MA, AI4MH]
- Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
  Fahim Tajwar, Anika Singh, Archit Sharma, Rafael Rafailov, Jeff Schneider, Tengyang Xie, Stefano Ermon, Chelsea Finn, Aviral Kumar (22 Apr 2024)
- Filtered Direct Preference Optimization
  Tetsuro Morimura, Mitsuki Sakamoto, Yuu Jinnai, Kenshi Abe, Kaito Ariu (22 Apr 2024)
- Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications
  Charith Chandra Sai Balne, S. Bhaduri, Tamoghna Roy, Vinija Jain, Aman Chadha (21 Apr 2024)
- MM-PhyRLHF: Reinforcement Learning Framework for Multimodal Physics Question-Answering
  Avinash Anand, Janak Kapuriya, Chhavi Kirtani, Apoorv Singh, Jay Saraf, Naman Lal, Jatin Kumar, A. Shivam, Astha Verma, R. Shah (19 Apr 2024) [OffRL]
- Stepwise Alignment for Constrained Language Model Policy Optimization
  Akifumi Wachi, Thien Q. Tran, Rei Sato, Takumi Tanabe, Yohei Akimoto (17 Apr 2024)
- Self-Explore to Avoid the Pit: Improving the Reasoning Capabilities of Language Models with Fine-grained Rewards
  Hyeonbin Hwang, Doyoung Kim, Seungone Kim, Seonghyeon Ye, Minjoon Seo (16 Apr 2024) [LRM, ReLM]
- MAD Speech: Measures of Acoustic Diversity of Speech
  Matthieu Futeral, A. Agostinelli, Marco Tagliasacchi, Neil Zeghidour, Eugene Kharitonov (16 Apr 2024)
- Learn Your Reference Model for Real Good Alignment
  Alexey Gorbatovski, Boris Shaposhnikov, Alexey Malakhov, Nikita Surnachev, Yaroslav Aksenov, Ian Maksimov, Nikita Balagansky, Daniil Gavrilov (15 Apr 2024) [OffRL]
- RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs
  Shreyas Chaudhari, Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, A. Kalyan, Karthik Narasimhan, A. Deshpande, Bruno Castro da Silva (12 Apr 2024)
- Unveiling the Generalization Power of Fine-Tuned Large Language Models
  Haoran Yang, Yumeng Zhang, Jiaqi Xu, Hongyuan Lu, Pheng Ann Heng, Wai Lam (14 Mar 2024)
- Human Alignment of Large Language Models through Online Preference Optimisation
  Daniele Calandriello, Daniel Guo, Rémi Munos, Mark Rowland, Yunhao Tang, ..., Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot (13 Mar 2024)
- ORPO: Monolithic Preference Optimization without Reference Model
  Jiwoo Hong, Noah Lee, James Thorne (12 Mar 2024) [OSLM]
- Teaching Large Language Models to Reason with Reinforcement Learning
  Alex Havrilla, Yuqing Du, Sharath Chandra Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Sainbayar Sukhbaatar, Roberta Raileanu (07 Mar 2024) [ReLM, LRM]
- Unintended Impacts of LLM Alignment on Global Representation
  Michael Joseph Ryan, William B. Held, Diyi Yang (22 Feb 2024)
- MoRAL: MoE Augmented LoRA for LLMs' Lifelong Learning
  Shu Yang, Muhammad Asif Ali, Cheng-Long Wang, Lijie Hu, Di Wang (17 Feb 2024) [CLL, MoE]
- Exploring Precision and Recall to assess the quality and diversity of LLMs
  Florian Le Bronnec, Alexandre Verine, Benjamin Négrevergne, Y. Chevaleyre, Alexandre Allauzen (16 Feb 2024)
- Had enough of experts? Quantitative knowledge retrieval from large language models
  David Selby, Kai Spriestersbach, Yuichiro Iwashita, Dennis Bappert, Archana Warrier, Sumantrak Mukherjee, Muhammad Nabeel Asim, Koichi Kise, Sebastian Vollmer (12 Feb 2024)
- A Roadmap to Pluralistic Alignment
  Taylor Sorensen, Jared Moore, Jillian R. Fisher, Mitchell L. Gordon, Niloofar Mireshghallah, ..., Liwei Jiang, Ximing Lu, Nouha Dziri, Tim Althoff, Yejin Choi (07 Feb 2024)
- From PARIS to LE-PARIS: Toward Patent Response Automation with Recommender Systems and Collaborative Large Language Models
  Jung-Mei Chu, Hao-Cheng Lo, Jieh Hsiang, Chun-Chieh Cho (01 Feb 2024)
- Does DetectGPT Fully Utilize Perturbation? Bridging Selective Perturbation to Fine-tuned Contrastive Learning Detector would be Better
  Shengchao Liu, Xiaoming Liu, Yichen Wang, Zehua Cheng, Chengzhengxu Li, Zhaohan Zhang, Y. Lan, Chao Shen (01 Feb 2024) [DeLMO]
- Integrating Physician Diagnostic Logic into Large Language Models: Preference Learning from Process Feedback
  Chengfeng Dou, Zhi Jin, Wenpin Jiao, Haiyan Zhao, Yongqiang Zhao, Zhenwei Tao (11 Jan 2024) [LM&MA]
- The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback
  Nathan Lambert, Roberto Calandra (31 Oct 2023) [ALM]
- Towards Understanding Sycophancy in Language Models
  Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, ..., Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez (20 Oct 2023)
- ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
  Ziniu Li, Tian Xu, Yushun Zhang, Zhihang Lin, Yang Yu, Ruoyu Sun, Zhimin Luo (16 Oct 2023)
- From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning
  Xuansheng Wu, Wenlin Yao, Jianshu Chen, Xiaoman Pan, Xiaoyang Wang, Ninghao Liu, Dong Yu (30 Sep 2023) [LRM]
- GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond
  Timothée Darcet, Yuyu Zhang, Yijie Zhu, Chenguang Xi, Pengyang Gao, Piotr Bojanowski, Kevin Chen-Chuan Chang (28 Sep 2023) [ELM]
- Emergent autonomous scientific research capabilities of large language models
  Daniil A. Boiko, R. MacKnight, Gabe Gomes (11 Apr 2023) [ELM, LM&Ro, AI4CE, LLMAG]
- Instruction Tuning with GPT-4
  Baolin Peng, Chunyuan Li, Pengcheng He, Michel Galley, Jianfeng Gao (06 Apr 2023) [SyDa, ALM, LM&MA]
- On the Creativity of Large Language Models
  Giorgio Franceschelli, Mirco Musolesi (27 Mar 2023)
- State-of-the-art generalisation research in NLP: A taxonomy and review
  Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, ..., Leila Khalatbari, Maria Ryskina, Rita Frieske, Ryan Cotterell, Zhijing Jin (06 Oct 2022)
- Improving alignment of dialogue agents via targeted human judgements
  Amelia Glaese, Nat McAleese, Maja Trębacz, John Aslanides, Vlad Firoiu, ..., John F. J. Mellor, Demis Hassabis, Koray Kavukcuoglu, Lisa Anne Hendricks, G. Irving (28 Sep 2022) [ALM, AAML]
- Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
  Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark (23 Aug 2022)