ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2308.14328
  4. Cited By
Reinforcement Learning for Generative AI: A Survey
v1v2v3 (latest)

Reinforcement Learning for Generative AI: A Survey

28 August 2023
Yuanjiang Cao
Quan.Z Sheng
Julian McAuley
Lina Yao
    SyDa
ArXiv (abs)PDFHTML

Papers citing "Reinforcement Learning for Generative AI: A Survey"

50 / 205 papers shown
Title
Decision Flow Policy Optimization
Decision Flow Policy Optimization
Jifeng Hu
Sili Huang
Siyuan Guo
Zhaogeng Liu
Li Shen
Lichao Sun
Hechang Chen
Yi-Ju Chang
Dacheng Tao
58
0
0
26 May 2025
The Superalignment of Superhuman Intelligence with Large Language Models
The Superalignment of Superhuman Intelligence with Large Language Models
Minlie Huang
Yingkang Wang
Shiyao Cui
Pei Ke
J. Tang
176
1
0
15 Dec 2024
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement
  Learning
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
Jonas Gehring
Kunhao Zheng
Jade Copet
Vegard Mella
Taco Cohen
Gabriel Synnaeve
LLMAG
55
35
0
02 Oct 2024
Imitating Language via Scalable Inverse Reinforcement Learning
Imitating Language via Scalable Inverse Reinforcement Learning
Markus Wulfmeier
Michael Bloesch
Nino Vieillard
Arun Ahuja
Jorg Bornschein
...
Jost Tobias Springenberg
Nikola Momchev
Olivier Bachem
Matthieu Geist
Martin Riedmiller
109
10
0
02 Sep 2024
Enhancing the Capability and Robustness of Large Language Models through Reinforcement Learning-Driven Query Refinement
Enhancing the Capability and Robustness of Large Language Models through Reinforcement Learning-Driven Query Refinement
Zisu Huang
Xiaohua Wang
Feiran Zhang
Zhibo Xu
Cenyuan Zhang
Qi Qian
Xiaoqing Zheng
Xuanjing Huang
AAMLLRM
102
4
0
01 Jul 2024
Offline Regularised Reinforcement Learning for Large Language Models
  Alignment
Offline Regularised Reinforcement Learning for Large Language Models Alignment
Pierre Harvey Richemond
Yunhao Tang
Daniel Guo
Daniele Calandriello
M. G. Azar
...
Gil Shamir
Rishabh Joshi
Tianqi Liu
Rémi Munos
Bilal Piot
OffRL
118
29
0
29 May 2024
RLSF: Fine-tuning LLMs via Symbolic Feedback
RLSF: Fine-tuning LLMs via Symbolic Feedback
Piyush Jha
Prithwish Jana
Pranavkrishna Suresh
Arnav Arora
Vijay Ganesh
LRM
85
4
0
26 May 2024
Multi-turn Reinforcement Learning from Preference Human Feedback
Multi-turn Reinforcement Learning from Preference Human Feedback
Lior Shani
Aviv Rosenberg
Asaf B. Cassel
Oran Lang
Daniele Calandriello
...
Bilal Piot
Idan Szpektor
Avinatan Hassidim
Yossi Matias
Rémi Munos
97
34
0
23 May 2024
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via
  Reinforcement Learning
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
Yuexiang Zhai
Hao Bai
Zipeng Lin
Jiayi Pan
Shengbang Tong
...
Alane Suhr
Saining Xie
Yann LeCun
Yi-An Ma
Sergey Levine
LLMAGLRM
131
80
0
16 May 2024
Automating Creativity
Automating Creativity
Ming-Hui Huang
R. Rust
97
0
0
11 May 2024
Self-Training Large Language Models for Improved Visual Program
  Synthesis With Visual Reinforcement
Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement
Zaid Khan
B. Vijaykumar
S. Schulter
Yun Fu
Manmohan Chandraker
LRMReLM
98
8
0
06 Apr 2024
Reinforcement Learning with Token-level Feedback for Controllable Text
  Generation
Reinforcement Learning with Token-level Feedback for Controllable Text Generation
Wendi Li
Xiaoye Qu
Kaihe Xu
Wenfeng Xie
Dangyang Chen
Yu Cheng
88
7
0
18 Mar 2024
Teaching Large Language Models to Reason with Reinforcement Learning
Teaching Large Language Models to Reason with Reinforcement Learning
Alex Havrilla
Yuqing Du
Sharath Chandra Raparthy
Christoforos Nalmpantis
Jane Dwivedi-Yu
Maksym Zhuravinskyi
Eric Hambro
Sainbayar Sukhbaatar
Roberta Raileanu
ReLMLRM
111
94
0
07 Mar 2024
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
Yifei Zhou
Andrea Zanette
Jiayi Pan
Sergey Levine
Aviral Kumar
135
79
0
29 Feb 2024
The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement
  Learning and Large Language Models
The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models
M. Pternea
Prerna Singh
Abir Chakraborty
Y. Oruganti
M. Milletarí
Sayli Bapat
Kebei Jiang
OffRL
75
9
0
02 Feb 2024
StepCoder: Improve Code Generation with Reinforcement Learning from
  Compiler Feedback
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
Shihan Dou
Yan Liu
Haoxiang Jia
Limao Xiong
Enyu Zhou
...
Tao Ji
Rui Zheng
Qi Zhang
Xuanjing Huang
Tao Gui
LLMAG
121
43
0
02 Feb 2024
Improving Reinforcement Learning from Human Feedback with Efficient
  Reward Model Ensemble
Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble
Shun Zhang
Zhenfang Chen
Sunli Chen
Yikang Shen
Zhiqing Sun
Chuang Gan
75
27
0
30 Jan 2024
Improving Large Language Models via Fine-grained Reinforcement Learning
  with Minimum Editing Constraint
Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint
Zhipeng Chen
Kun Zhou
Wayne Xin Zhao
Junchen Wan
Fuzheng Zhang
Di Zhang
Ji-Rong Wen
KELM
99
35
0
11 Jan 2024
RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with
  Human Feedback in Large Language Models
RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with Human Feedback in Large Language Models
Jiong Wang
Junlin Wu
Muhao Chen
Yevgeniy Vorobeychik
Chaowei Xiao
AAML
94
15
0
16 Nov 2023
Safe RLHF: Safe Reinforcement Learning from Human Feedback
Safe RLHF: Safe Reinforcement Learning from Human Feedback
Josef Dai
Xuehai Pan
Ruiyang Sun
Jiaming Ji
Xinbo Xu
Mickel Liu
Yizhou Wang
Yaodong Yang
133
364
0
19 Oct 2023
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method
  for Aligning Large Language Models
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Ziniu Li
Tian Xu
Yushun Zhang
Zhihang Lin
Yang Yu
Ruoyu Sun
Zhimin Luo
120
79
0
16 Oct 2023
Reinforcement Learning in the Era of LLMs: What is Essential? What is
  needed? An RL Perspective on RLHF, Prompting, and Beyond
Reinforcement Learning in the Era of LLMs: What is Essential? What is needed? An RL Perspective on RLHF, Prompting, and Beyond
Hao Sun
OffRL
87
23
0
09 Oct 2023
Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning
  from Human Feedback
Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback
Wei Shen
Rui Zheng
Wenyu Zhan
Jun Zhao
Shihan Dou
Tao Gui
Qi Zhang
Xuanjing Huang
ALM
106
52
0
08 Oct 2023
Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation
Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation
Benjamin Steenhoek
Michele Tufano
Neel Sundaresan
Alexey Svyatkovskiy
OffRLALM
138
22
0
03 Oct 2023
Prompt-Based Length Controlled Generation with Reinforcement Learning
Prompt-Based Length Controlled Generation with Reinforcement Learning
Renlong Jie
Xiaojun Meng
Lifeng Shang
Xin Jiang
Qun Liu
77
10
0
23 Aug 2023
ESRL: Efficient Sampling-based Reinforcement Learning for Sequence
  Generation
ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation
Chenglong Wang
Hang Zhou
Yimin Hu
Yi Huo
Bei Li
Tongran Liu
Tong Xiao
Jingbo Zhu
79
9
0
04 Aug 2023
Reinforcement Learning for Generative AI: State of the Art,
  Opportunities and Open Research Challenges
Reinforcement Learning for Generative AI: State of the Art, Opportunities and Open Research Challenges
Giorgio Franceschelli
Mirco Musolesi
AI4CE
139
22
0
31 Jul 2023
Selective Perception: Optimizing State Descriptions with Reinforcement
  Learning for Language Model Actors
Selective Perception: Optimizing State Descriptions with Reinforcement Learning for Language Model Actors
Kolby Nottingham
Yasaman Razeghi
Kyungmin Kim
JB Lanier
Pierre Baldi
Roy Fox
Sameer Singh
89
10
0
21 Jul 2023
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with
  Language Models
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
Wenlong Huang
Chen Wang
Ruohan Zhang
Yunzhu Li
Jiajun Wu
Li Fei-Fei
LM&Ro
132
518
0
12 Jul 2023
RLTF: Reinforcement Learning from Unit Test Feedback
RLTF: Reinforcement Learning from Unit Test Feedback
Jiate Liu
Yiqin Zhu
Kaiwen Xiao
Qiang Fu
Xiao Han
Wei Yang
Deheng Ye
OffRL
93
62
0
10 Jul 2023
Fine-Grained Human Feedback Gives Better Rewards for Language Model
  Training
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Zeqiu Wu
Yushi Hu
Weijia Shi
Nouha Dziri
Alane Suhr
Prithviraj Ammanabrolu
Noah A. Smith
Mari Ostendorf
Hannaneh Hajishirzi
ALM
159
335
0
02 Jun 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward
  Model
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
391
4,177
0
29 May 2023
Voyager: An Open-Ended Embodied Agent with Large Language Models
Voyager: An Open-Ended Embodied Agent with Large Language Models
Guanzhi Wang
Yuqi Xie
Yunfan Jiang
Ajay Mandlekar
Chaowei Xiao
Yuke Zhu
Linxi Fan
Anima Anandkumar
LM&RoSyDa
167
842
0
25 May 2023
DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion
  Models
DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models
Ying Fan
Olivia Watkins
Yuqing Du
Hao Liu
Moonkyung Ryu
Craig Boutilier
Pieter Abbeel
Mohammad Ghavamzadeh
Kangwook Lee
Kimin Lee
144
167
0
25 May 2023
Leftover Lunch: Advantage-based Offline Reinforcement Learning for
  Language Models
Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models
Ashutosh Baheti
Ximing Lu
Faeze Brahman
Ronan Le Bras
Maarten Sap
Mark O. Riedl
104
10
0
24 May 2023
Language Model Self-improvement by Reinforcement Learning Contemplation
Language Model Self-improvement by Reinforcement Learning Contemplation
Jing-Cheng Pang
Pengyuan Wang
Kaiyuan Li
Xiong-Hui Chen
Jiacheng Xu
Zongzhang Zhang
Yang Yu
LRMKELM
55
52
0
23 May 2023
Large Language Models as Commonsense Knowledge for Large-Scale Task
  Planning
Large Language Models as Commonsense Knowledge for Large-Scale Task Planning
Zirui Zhao
W. Lee
David Hsu
LRMLLMAGLM&Ro
111
227
0
23 May 2023
Training Diffusion Models with Reinforcement Learning
Training Diffusion Models with Reinforcement Learning
Kevin Black
Michael Janner
Yilun Du
Ilya Kostrikov
Sergey Levine
EGVM
152
377
0
22 May 2023
Synthesizing Diverse Human Motions in 3D Indoor Scenes
Synthesizing Diverse Human Motions in 3D Indoor Scenes
Kaifeng Zhao
Yan Zhang
Shaofei Wang
Thabo Beeler
Siyu Tang
106
70
0
21 May 2023
RL4F: Generating Natural Language Feedback with Reinforcement Learning
  for Repairing Model Outputs
RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs
Afra Feyza Akyürek
Ekin Akyürek
Aman Madaan
Ashwin Kalyan
Peter Clark
Derry Wijaya
Niket Tandon
ALMKELM
103
101
0
15 May 2023
Replicating Complex Dialogue Policy of Humans via Offline Imitation
  Learning with Supervised Regularization
Replicating Complex Dialogue Policy of Humans via Offline Imitation Learning with Supervised Regularization
Zhoujian Sun
Chenyang Zhao
Zheng-Wei Huang
Nai Ding
OffRL
44
1
0
06 May 2023
Causal Decision Transformer for Recommender Systems via Offline
  Reinforcement Learning
Causal Decision Transformer for Recommender Systems via Offline Reinforcement Learning
Siyu Wang
Xiaocong Chen
Dietmar Jannach
Lina Yao
CMLOffRL
114
28
0
17 Apr 2023
Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4
Evaluating the Logical Reasoning Ability of ChatGPT and GPT-4
Hanmeng Liu
Ruoxi Ning
Zhiyang Teng
Jian Liu
Qiji Zhou
Yuexin Zhang
ELMReLMLRM
121
258
0
07 Apr 2023
Cross-Domain Image Captioning with Discriminative Finetuning
Cross-Domain Image Captioning with Discriminative Finetuning
Roberto Dessì
Michele Bevilacqua
Eleonora Gualdoni
Nathanaël Carraz Rakotonirina
Francesca Franzon
Marco Baroni
CLIP
92
19
0
04 Apr 2023
Reflexion: Language Agents with Verbal Reinforcement Learning
Reflexion: Language Agents with Verbal Reinforcement Learning
Noah Shinn
Federico Cassano
Beck Labash
A. Gopinath
Karthik Narasimhan
Shunyu Yao
LLMAGKELM
139
1,322
0
20 Mar 2023
Pretraining Language Models with Human Preferences
Pretraining Language Models with Human Preferences
Tomasz Korbak
Kejian Shi
Angelica Chen
Rasika Bhalerao
C. L. Buckley
Jason Phang
Sam Bowman
Ethan Perez
ALMSyDa
96
230
0
16 Feb 2023
Aligning Language Models with Preferences through f-divergence
  Minimization
Aligning Language Models with Preferences through f-divergence Minimization
Dongyoung Go
Tomasz Korbak
Germán Kruszewski
Jos Rozen
Nahyeon Ryu
Marc Dymetman
99
76
0
16 Feb 2023
Execution-based Code Generation using Deep Reinforcement Learning
Execution-based Code Generation using Deep Reinforcement Learning
Parshin Shojaee
Aneesh Jain
Sindhu Tipirneni
Chandan K. Reddy
128
58
0
31 Jan 2023
Optimizing DDPM Sampling with Shortcut Fine-Tuning
Optimizing DDPM Sampling with Shortcut Fine-Tuning
Ying Fan
Kangwook Lee
108
60
0
31 Jan 2023
Response-act Guided Reinforced Dialogue Generation for Mental Health
  Counseling
Response-act Guided Reinforced Dialogue Generation for Mental Health Counseling
Aseem Srivastava
Ishan Pandey
Md. Shad Akhtar
Tanmoy Chakraborty
OffRL
73
13
0
30 Jan 2023
12345
Next