ResearchTrend.AI
Heterogeneous Group-Based Reinforcement Learning for LLM-based Multi-Agent Systems

3 June 2025
Guanzhong Chen
Shaoxiong Yang
Chao Li
Wei Liu
Jian Luan
Zenglin Xu
Author Contacts:
muxichenz@outlook.com
yangshaoxiong@xiaomi.com
lichao75@xiaomi.com
liuwei40@xiaomi.com
luanjian@xiaomi.com
zenglinxu@fudan.edu.cn
arXiv: 2506.02718

Papers citing "Heterogeneous Group-Based Reinforcement Learning for LLM-based Multi-Agent Systems"

16 / 16 papers shown
MARFT: Multi-Agent Reinforcement Fine-Tuning
Junwei Liao
Muning Wen
Jun Wang
Weinan Zhang
OffRL
112
4
0
21 Apr 2025
Why Do Multi-Agent LLM Systems Fail?
Mert Cemri
Melissa Z. Pan
Shuyi Yang
Lakshya A Agrawal
Bhavya Chopra
...
Dan Klein
Kannan Ramchandran
Matei A. Zaharia
Joseph E. Gonzalez
Ion Stoica
LLMAG
Presented at ResearchTrend Connect | LLMAG on 23 Apr 2025
212
31
0
17 Mar 2025
ReAgent: Reversible Multi-Agent Reasoning for Knowledge-Enhanced Multi-Hop QA
Zhao Xinjie
Fan Gao
Rui Yang
Yingjian Chen
Yuyang Wang
Ying Zhu
Jiacheng Tang
Irene Li
Y. Matsuo
KELM, LRM
91
1
0
10 Mar 2025
Talk to Right Specialists: Routing and Planning in Multi-agent System for Question Answering
Feijie Wu
Zitao Li
Fei Wei
Yaliang Li
Bolin Ding
Jing Gao
55
4
0
14 Jan 2025
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLM, LRM
138
1,119
0
05 Feb 2024
Reinforcement Learning for Optimizing RAG for Domain Chatbots
Mandar Kulkarni
Praveen Tangarajan
Kyung Kim
Anusua Trivedi
OffRL, RALM, SILM
56
30
0
10 Jan 2024
Retrieval-Augmented Generation for Large Language Models: A Survey
Yunfan Gao
Yun Xiong
Xinyu Gao
Kangxiang Jia
Jinliu Pan
Yuxi Bi
Yi Dai
Jiawei Sun
Meng Wang
Haofen Wang
3DV, RALM
177
1,776
1
18 Dec 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH, ALM
302
11,894
0
18 Jul 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM, PILM
1.5K
13,247
0
27 Feb 2023
Unsupervised Dense Information Retrieval with Contrastive Learning
Gautier Izacard
Mathilde Caron
Lucas Hosseini
Sebastian Riedel
Piotr Bojanowski
Armand Joulin
Edouard Grave
RALM
195
907
0
16 Dec 2021
MuSiQue: Multihop Questions via Single-hop Question Composition
H. Trivedi
Niranjan Balasubramanian
Tushar Khot
Ashish Sabharwal
LRM
110
278
0
02 Aug 2021
The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games
Chao Yu
Akash Velu
Eugene Vinitsky
Jiaxuan Gao
Yu Wang
Alexandre M. Bayen
Yi Wu
OffRL
137
1,252
0
02 Mar 2021
Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps
Xanh Ho
A. Nguyen
Saku Sugawara
Akiko Aizawa
RALM, LRM
78
451
0
02 Nov 2020
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Zhilin Yang
Peng Qi
Saizheng Zhang
Yoshua Bengio
William W. Cohen
Ruslan Salakhutdinov
Christopher D. Manning
RALM
174
2,655
0
25 Sep 2018
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
517
19,065
0
20 Jul 2017
High-Dimensional Continuous Control Using Generalized Advantage Estimation
John Schulman
Philipp Moritz
Sergey Levine
Michael I. Jordan
Pieter Abbeel
OffRL
101
3,414
0
08 Jun 2015