ResearchTrend.AI
Natural Language Reinforcement Learning

11 February 2024
Xidong Feng
Bo Liu
Mengyue Yang
Ziyan Wang
Girish A. Koushik
Yali Du
Ying Wen
Jun Wang
    OffRL

Papers citing "Natural Language Reinforcement Learning"

50 / 56 papers shown
debug-gym: A Text-Based Environment for Interactive Debugging
Xingdi Yuan
Morgane M Moss
Charbel El Feghali
Chinmay Singh
Darya Moldavskaya
...
Lucas Caccia
Matheus Pereira
Minseon Kim
Alessandro Sordoni
Marc-Alexandre Côté
LLMAG
83
2
0
27 Mar 2025
Generative Reward Models
Dakota Mahan
Duy Phung
Rafael Rafailov
Chase Blagden
Nathan Lile
Louis Castricato
Jan-Philipp Fränken
Chelsea Finn
Alon Albalak
VLM
SyDa
OffRL
34
36
0
02 Oct 2024
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
Huajian Xin
Zhaochun Ren
Junxiao Song
Zhihong Shao
Wanjia Zhao
...
Dejian Yang
Zhibin Gou
Z. F. Wu
Fuli Luo
Chong Ruan
AIMat
LRM
61
59
0
15 Aug 2024
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
Pranav Putta
Edmund Mills
Naman Garg
S. Motwani
Chelsea Finn
Divyansh Garg
Rafael Rafailov
LLMAG
LRM
43
76
0
13 Aug 2024
TextGrad: Automatic "Differentiation" via Text
Mert Yuksekgonul
Federico Bianchi
Joseph Boen
Sheng Liu
Zhi Huang
Carlos Guestrin
James Zou
LLMAG
OOD
AI4CE
51
39
0
11 Jun 2024
Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf
Xuanfa Jin
Ziyan Wang
Yali Du
Meng Fang
Haifeng Zhang
Jun Wang
OffRL
LLMAG
78
6
0
30 May 2024
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot Interaction
Jie Xu
Hanbo Zhang
Xinghang Li
Huaping Liu
Xuguang Lan
Tao Kong
LM&Ro
45
3
0
19 Feb 2024
Policy Improvement using Language Feedback Models
Victor Zhong
Dipendra Kumar Misra
Xingdi Yuan
Marc-Alexandre Côté
23
10
0
12 Feb 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM
LM&MA
87
37
0
02 Feb 2024
Large Language Models for Generative Information Extraction: A Survey
Derong Xu
Wei-neng Chen
Wenjun Peng
Chao Zhang
Tong Xu
Xiangyu Zhao
Xian Wu
Yefeng Zheng
Yang Wang
Enhong Chen
113
154
0
29 Dec 2023
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning
Filippos Christianos
Georgios Papoudakis
Matthieu Zimmer
Thomas Coste
Zhihao Wu
...
Yicheng Luo
Jianye Hao
Kun Shao
Haitham Bou-Ammar
Jun Wang
46
20
0
22 Dec 2023
LLF-Bench: Benchmark for Interactive Learning from Language Feedback
Ching-An Cheng
Andrey Kolobov
Dipendra Kumar Misra
Allen Nie
Adith Swaminathan
37
19
0
11 Dec 2023
Can language agents be alternatives to PPO? A Preliminary Empirical Study On OpenAI Gym
Junjie Sheng
Zixiao Huang
Chuyun Shen
Wenhao Li
Yun Hua
Bo Jin
Hongyuan Zha
Xiangfeng Wang
59
1
0
06 Dec 2023
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
Marwa Abdulhai
Isadora White
Charles Burton Snell
Charles Sun
Joey Hong
Yuexiang Zhai
Kelvin Xu
Sergey Levine
LLMAG
OffRL
LRM
49
34
0
30 Nov 2023
Bridging the Human-AI Knowledge Gap: Concept Discovery and Transfer in AlphaZero
Lisa Schut
Nenad Tomašev
Tom McGrath
Demis Hassabis
Ulrich Paquet
Been Kim
26
35
0
25 Oct 2023
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
Juan Rocamonde
Victoriano Montesinos
Elvis Nava
Ethan Perez
David Lindner
VLM
42
81
0
19 Oct 2023
Generative Judge for Evaluating Alignment
Junlong Li
Shichao Sun
Weizhe Yuan
Run-Ze Fan
Hai Zhao
Pengfei Liu
ELM
ALM
39
81
0
09 Oct 2023
TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks
Dongfu Jiang
Yishan Li
Ge Zhang
Wenhao Huang
Bill Yuchen Lin
Wenhu Chen
ALM
61
64
0
01 Oct 2023
Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training
Xidong Feng
Bo Liu
Muning Wen
Stephen Marcus McAleer
Ying Wen
Weinan Zhang
Jun Wang
LRM
AI4CE
40
170
0
29 Sep 2023
State2Explanation: Concept-Based Explanations to Benefit Agent Learning and User Understanding
Devleena Das
Sonia Chernova
Been Kim
LRM
LLMAG
72
22
0
21 Sep 2023
Efficient Memory Management for Large Language Model Serving with PagedAttention
Woosuk Kwon
Zhuohan Li
Siyuan Zhuang
Ying Sheng
Lianmin Zheng
Cody Hao Yu
Joseph E. Gonzalez
Haotong Zhang
Ion Stoica
VLM
95
2,049
0
12 Sep 2023
Large Language Models as Optimizers
Chengrun Yang
Xuezhi Wang
Yifeng Lu
Hanxiao Liu
Quoc V. Le
Denny Zhou
Xinyun Chen
ODL
53
395
0
07 Sep 2023
AgentBench: Evaluating LLMs as Agents
Xiao Liu
Hao Yu
Hanchen Zhang
Yifan Xu
Xuanyu Lei
...
Yu-Chuan Su
Huan Sun
Minlie Huang
Yuxiao Dong
Jie Tang
ELM
LLMAG
68
277
0
07 Aug 2023
Secrets of RLHF in Large Language Models Part I: PPO
Rui Zheng
Shihan Dou
Songyang Gao
Yuan Hua
Wei Shen
...
Hang Yan
Tao Gui
Qi Zhang
Xipeng Qiu
Xuanjing Huang
ALM
OffRL
57
163
0
11 Jul 2023
RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation
Konstantinos Bousmalis
Giulia Vezzani
Dushyant Rao
Coline Devin
Alex X. Lee
...
Martin Riedmiller
Jost Tobias Springenberg
R. Hadsell
F. Nori
N. Heess
LM&Ro
12
49
0
20 Jun 2023
ChessGPT: Bridging Policy Learning and Language Modeling
Xidong Feng
Yicheng Luo
Ziyan Wang
Hongrui Tang
Mengyue Yang
Kun Shao
D. Mguni
Yali Du
Jun Wang
37
41
0
15 Jun 2023
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization
Yidong Wang
Zhuohao Yu
Zhengran Zeng
Linyi Yang
Cunxiang Wang
...
Jindong Wang
Xingxu Xie
Wei Ye
Shi-Bo Zhang
Yue Zhang
ALM
ELM
82
237
0
08 Jun 2023
Voyager: An Open-Ended Embodied Agent with Large Language Models
Guanzhi Wang
Yuqi Xie
Yunfan Jiang
Ajay Mandlekar
Chaowei Xiao
Yuke Zhu
Linxi Fan
Anima Anandkumar
LM&Ro
SyDa
77
781
0
25 May 2023
Reasoning with Language Model is Planning with World Model
Shibo Hao
Yi Gu
Haodi Ma
Joshua Jiahua Hong
Zhen Wang
D. Wang
Zhiting Hu
ReLM
LRM
LLMAG
87
539
0
24 May 2023
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Shunyu Yao
Dian Yu
Jeffrey Zhao
Izhak Shafran
Thomas Griffiths
Yuan Cao
Karthik Narasimhan
LM&Ro
LRM
AI4CE
80
1,850
0
17 May 2023
PaLM 2 Technical Report
Rohan Anil
Andrew M. Dai
Orhan Firat
Melvin Johnson
Dmitry Lepikhin
...
Ce Zheng
Wei Zhou
Denny Zhou
Slav Petrov
Yonghui Wu
ReLM
LRM
146
1,172
0
17 May 2023
Self-Refine: Iterative Refinement with Self-Feedback
Aman Madaan
Niket Tandon
Prakhar Gupta
Skyler Hallinan
Luyu Gao
...
Bodhisattwa Prasad Majumder
Katherine Hermann
Sean Welleck
Amir Yazdanbakhsh
Peter Clark
ReLM
LRM
DiffM
95
1,549
0
30 Mar 2023
Reflexion: Language Agents with Verbal Reinforcement Learning
Noah Shinn
Federico Cassano
Beck Labash
A. Gopinath
Karthik Narasimhan
Shunyu Yao
LLMAG
KELM
30
1,190
0
20 Mar 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
337
13,788
0
15 Mar 2023
Reward Design with Language Models
Minae Kwon
Sang Michael Xie
Kalesha Bullard
Dorsa Sadigh
LM&Ro
79
209
0
27 Feb 2023
Benchmarking Large Language Models for News Summarization
Tianyi Zhang
Faisal Ladhak
Esin Durmus
Percy Liang
Kathleen McKeown
Tatsunori B. Hashimoto
ELM
57
501
0
31 Jan 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
331
2,709
0
06 Oct 2022
Inner Monologue: Embodied Reasoning through Planning with Language Models
Wenlong Huang
F. Xia
Ted Xiao
Harris Chan
Jacky Liang
...
Tomas Jackson
Linda Luu
Sergey Levine
Karol Hausman
Brian Ichter
LLMAG
LM&Ro
LRM
62
880
0
12 Jul 2022
Emergent Abilities of Large Language Models
Jason W. Wei
Yi Tay
Rishi Bommasani
Colin Raffel
Barret Zoph
...
Tatsunori Hashimoto
Oriol Vinyals
Percy Liang
J. Dean
W. Fedus
ELM
ReLM
LRM
166
2,428
0
15 Jun 2022
Spatial-temporal Concept based Explanation of 3D ConvNets
Yi Ji
Yu Wang
K. Mori
Jien Kato
3DPC
FAtt
39
7
0
09 Jun 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
627
12,525
0
04 Mar 2022
Retrieval-Augmented Reinforcement Learning
Anirudh Goyal
A. Friesen
Andrea Banino
T. Weber
Nan Rosemary Ke
...
Michal Valko
Simon Osindero
Timothy Lillicrap
N. Heess
Charles Blundell
OffRL
46
53
0
17 Feb 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
522
9,009
0
28 Jan 2022
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
174
4,175
0
27 Oct 2021
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
132
5,328
0
07 Jul 2021
A Survey of Embodied AI: From Simulators to Research Tasks
Jiafei Duan
Samson Yu
Tangyao Li
Huaiyu Zhu
Cheston Tan
LM&Ro
45
280
0
08 Mar 2021
Learning Rewards from Linguistic Feedback
T. Sumers
Mark K. Ho
Robert D. Hawkins
Karthik Narasimhan
Thomas Griffiths
73
54
0
30 Sep 2020
What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study
Marcin Andrychowicz
Anton Raichuk
Piotr Stańczyk
Manu Orsini
Sertan Girgin
...
Matthieu Geist
Olivier Pietquin
Marcin Michalski
Sylvain Gelly
Olivier Bachem
OffRL
36
217
0
10 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
369
41,106
0
28 May 2020
Bridging the Gap: Providing Post-Hoc Symbolic Explanations for Sequential Decision-Making Problems with Inscrutable Representations
S. Sreedharan
Utkarsh Soni
Mudit Verma
Siddharth Srivastava
S. Kambhampati
95
30
0
04 Feb 2020