ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

MARFT: Multi-Agent Reinforcement Fine-Tuning
Versions: v1, v2, v3 (latest)

21 April 2025
Junwei Liao
Muning Wen
Jun Wang
Weinan Zhang
    OffRL
arXiv: 2504.16129

Papers citing "MARFT: Multi-Agent Reinforcement Fine-Tuning"

50 / 57 papers shown
OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
Mengkang Hu
Yuhang Zhou
Wendong Fan
Yuzhou Nie
Bowei Xia
...
Yifeng Wang
Qianshuo Ye
Bernard Ghanem
Ping Luo
Guohao Li
90
10
0
29 May 2025
Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration
Zijun Liu
Zhennan Wan
Peng Li
Ming Yan
Ji Zhang
Fei Huang
Yang Liu
LLMAG
74
0
0
27 May 2025
Why Do Multi-Agent LLM Systems Fail?
Mert Cemri
Melissa Z. Pan
Shuyi Yang
Lakshya A Agrawal
Bhavya Chopra
...
Dan Klein
Kannan Ramchandran
Matei A. Zaharia
Joseph E. Gonzalez
Ion Stoica
LLMAG
Presented at ResearchTrend Connect | LLMAG on 23 Apr 2025
225
38
0
17 Mar 2025
Interactive Debugging and Steering of Multi-Agent AI Systems
Will Epperson
Gagan Bansal
Victor C. Dibia
Adam Fourney
Jack Gerrits
Erkang Zhu
Saleema Amershi
105
7
0
03 Mar 2025
Networked Agents in the Dark: Team Value Learning under Partial Observability
G. Varela
Alberto Sardinha
Francisco S. Melo
55
1
0
15 Jan 2025
Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
Zhiyuan Zeng
Qinyuan Cheng
Zhangyue Yin
Bo Wang
Shimin Li
Yunhua Zhou
Qipeng Guo
Xuanjing Huang
Xipeng Qiu
ELM, AI4TS, LRM
149
36
0
18 Dec 2024
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
Jun Wang
Meng Fang
Bo Liu
Muning Wen
Jiachen Zhu
...
Lei Chen
Lionel M. Ni
Linyi Yang
Ying Wen
Weinan Zhang
LRM
93
39
0
12 Oct 2024
Qwen2.5-Coder Technical Report
Binyuan Hui
Jian Yang
Zeyu Cui
Jiaxi Yang
Dayiheng Liu
...
Fei Huang
Xingzhang Ren
Xuancheng Ren
Jingren Zhou
Junyang Lin
OSLM
115
336
0
18 Sep 2024
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities
Jiarui Lu
Thomas Holleis
Yizhe Zhang
Bernhard Aumayer
Feng Nan
...
Shen Ma
Mengyu Li
Guoli Yin
Zirui Wang
Ruoming Pang
LLMAG, ELM
102
39
0
08 Aug 2024
From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future
Haolin Jin
Linghan Huang
Haipeng Cai
Jun Yan
Bo Li
Huaming Chen
149
37
0
05 Aug 2024
Qwen2 Technical Report
An Yang
Baosong Yang
Binyuan Hui
Jian Xu
Bowen Yu
...
Yuqiong Liu
Zeyu Cui
Zhenru Zhang
Zhifang Guo
Zhi-Wei Fan
OSLM, VLM, MU
194
981
0
15 Jul 2024
Reinforcing Language Agents via Policy Optimization with Action Decomposition
Muning Wen
Bo Liu
Weinan Zhang
Jun Wang
Ying Wen
78
10
0
23 May 2024
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Jian Hu
Xibin Wu
Weixun Wang
OpenLLMAI Team
Dehao Zhang
Yu Cao
AI4CE, VLM
106
130
0
20 May 2024
Octopus: On-device language model for function calling of software APIs
Wei Chen
Zhiyuan Li
Mingyuan Ma
LLMAG
102
16
0
02 Apr 2024
MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution
Wei Tao
Yucheng Zhou
Yanlin Wang
Wenqiang Zhang
Hongyu Zhang
Yu Cheng
LLMAG
101
46
0
26 Mar 2024
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Naman Jain
King Han
Alex Gu
Wen-Ding Li
Fanjia Yan
Tianjun Zhang
Sida I. Wang
Armando Solar-Lezama
Koushik Sen
Ion Stoica
ELM
127
449
0
12 Mar 2024
Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement
Muning Wen
Junwei Liao
Cheng Deng
Jun Wang
Weinan Zhang
Ying Wen
57
3
0
09 Feb 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLM, LRM
167
1,288
0
05 Feb 2024
GAIA: a benchmark for General AI Assistants
Grégoire Mialon
Clémentine Fourrier
Craig Swift
Thomas Wolf
Yann LeCun
Thomas Scialom
AI4MH, ALM, ELM, RALM
94
185
0
21 Nov 2023
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
Omar Khattab
Arnav Singhvi
Paridhi Maheshwari
Zhiyuan Zhang
Keshav Santhanam
...
Thomas T. Joshi
Hanna Moazam
Heather Miller
Matei A. Zaharia
Christopher Potts
RALM
89
280
0
05 Oct 2023
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
Chrisantha Fernando
Dylan Banarse
Henryk Michalewski
Simon Osindero
Tim Rocktäschel
LLMAG, ReLM, LRM
83
208
0
28 Sep 2023
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
Qingyun Wu
Gagan Bansal
Jieyu Zhang
Yiran Wu
Beibin Li
...
Jiale Liu
Ahmed Hassan Awadallah
Ryen W. White
Doug Burger
Chi Wang
LLMAG, AI4CE
103
391
0
16 Aug 2023
MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
Sirui Hong
Mingchen Zhuge
Jonathan Chen
Xiawu Zheng
Yuheng Cheng
...
Liyang Zhou
Chenyu Ran
Lingfeng Xiao
Chenglin Wu
Jürgen Schmidhuber
LLMAG, AIFin
103
548
0
01 Aug 2023
ChatDev: Communicative Agents for Software Development
Cheng Qian
Wei Liu
Hongzhang Liu
Nuo Chen
Yufan Dang
...
Xin Cong
Juyuan Xu
Dahai Li
Zhiyuan Liu
Maosong Sun
LLMAG
98
219
0
16 Jul 2023
RoCo: Dialectic Multi-Robot Collaboration with Large Language Models
Zhao Mandi
Shreeya Jain
Shuran Song
LM&Ro, LLMAG
73
140
0
10 Jul 2023
AD-AutoGPT: An Autonomous GPT for Alzheimer's Disease Infodemiology
Haixing Dai
Yiwei Li
Zheng Liu
Lin Zhao
Zihao Wu
...
Quanzheng Li
Zhuo Chen
D. Zhang
Gengchen Mai
Tianming Liu
LM&MA
92
30
0
16 Jun 2023
Gorilla: Large Language Model Connected with Massive APIs
Shishir G. Patil
Tianjun Zhang
Xin Wang
Joseph E. Gonzalez
ELM, CLL, ALM, SyDa
93
568
0
24 May 2023
An Empirical Study on Google Research Football Multi-agent Scenarios
Yan Song
He Jiang
Zheng Tian
Haifeng Zhang
Yingping Zhang
Jiangcheng Zhu
Zonghong Dai
Weinan Zhang
Jun Wang
63
6
0
16 May 2023
Order Matters: Agent-by-agent Policy Optimization
Xihuai Wang
Zheng Tian
Bo Liu
Ying Wen
Jun Wang
Weinan Zhang
73
29
0
13 Feb 2023
Toolformer: Language Models Can Teach Themselves to Use Tools
Timo Schick
Jane Dwivedi-Yu
Roberto Dessì
Roberta Raileanu
Maria Lomeli
Luke Zettlemoyer
Nicola Cancedda
Thomas Scialom
SyDa, RALM
164
1,772
0
09 Feb 2023
Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
Thomas Carta
Clément Romac
Thomas Wolf
Sylvain Lamprier
Olivier Sigaud
Pierre-Yves Oudeyer
LM&Ro, LLMAG
88
194
0
06 Feb 2023
WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
Shunyu Yao
Howard Chen
John Yang
Karthik Narasimhan
LLMAG, LM&Ro
168
518
0
04 Jul 2022
Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
Muning Wen
J. Kuba
Runji Lin
Weinan Zhang
Ying Wen
Jun Wang
Yaodong Yang
96
188
0
30 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM, ALM
888
13,207
0
04 Mar 2022
Communication-Efficient Actor-Critic Methods for Homogeneous Markov Games
Dingyang Chen
Yile Li
Qi Zhang
OffRL
99
11
0
18 Feb 2022
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM, OffRL, LRM
353
4,598
0
27 Oct 2021
Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning
J. Kuba
Ruiqing Chen
Muning Wen
Ying Wen
Fanglei Sun
Jun Wang
Yaodong Yang
115
245
0
23 Sep 2021
Settling the Variance of Multi-Agent Policy Gradients
J. Kuba
Muning Wen
Yaodong Yang
Linghui Meng
Shangding Gu
Haifeng Zhang
D. Mguni
Jun Wang
64
65
0
19 Aug 2021
MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning
Ming Zhou
Bo Liu
Hanjing Wang
Muning Wen
Runzhe Wu
Ying Wen
Yaodong Yang
Weinan Zhang
Jun Wang
OffRL
61
49
0
05 Jun 2021
Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
Basel Alomair
Jacob Steinhardt
ReLM, FaML
194
2,407
0
05 Mar 2021
The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games
Chao Yu
Akash Velu
Eugene Vinitsky
Jiaxuan Gao
Yu Wang
Alexandre M. Bayen
Yi Wu
OffRL
158
1,272
0
02 Mar 2021
Independent Policy Gradient Methods for Competitive Reinforcement Learning
C. Daskalakis
Dylan J. Foster
Noah Golowich
238
163
0
11 Jan 2021
On the Utility of Learning about Humans for Human-AI Coordination
Micah Carroll
Rohin Shah
Mark K. Ho
Thomas Griffiths
Sanjit A. Seshia
Pieter Abbeel
Anca Dragan
HAI
71
403
0
13 Oct 2019
Deep Reinforcement Learning for Swarm Systems
Maximilian Hüttenrauch
Adrian Šošić
Gerhard Neumann
48
198
0
17 Jul 2018
VirtualHome: Simulating Household Activities via Programs
Xavier Puig
K. Ra
Marko Boben
Jiaman Li
Tingwu Wang
Sanja Fidler
Antonio Torralba
LM&Ro
100
500
0
19 Jun 2018
Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents
Kai Zhang
Zhuoran Yang
Han Liu
Tong Zhang
Tamer Basar
108
592
0
23 Feb 2018
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
550
19,296
0
20 Jul 2017
Emergence of Locomotion Behaviours in Rich Environments
N. Heess
TB Dhruva
S. Sriram
Jay Lemmon
J. Merel
...
Tom Erez
Ziyun Wang
S. M. Ali Eslami
Martin Riedmiller
David Silver
208
938
0
07 Jul 2017
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
Ryan J. Lowe
Yi Wu
Aviv Tamar
J. Harb
Pieter Abbeel
Igor Mordatch
162
4,520
0
07 Jun 2017
Counterfactual Multi-Agent Policy Gradients
Jakob N. Foerster
Gregory Farquhar
Triantafyllos Afouras
Nantas Nardelli
Shimon Whiteson
156
2,090
0
24 May 2017