2110.14555
V-Learning -- A Simple, Efficient, Decentralized Algorithm for Multiagent RL
27 October 2021
Chi Jin
Qinghua Liu
Yuanhao Wang
Tiancheng Yu
OffRL
Papers citing
"V-Learning -- A Simple, Efficient, Decentralized Algorithm for Multiagent RL"
41 / 41 papers shown
Title
Improving LLM General Preference Alignment via Optimistic Online Mirror Descent
Yuheng Zhang
Dian Yu
Tao Ge
Linfeng Song
Zhichen Zeng
Haitao Mi
Nan Jiang
Dong Yu
95
4
0
24 Feb 2025
Incentivize without Bonus: Provably Efficient Model-based Online Multi-agent RL for Markov Games
Tong Yang
Bo Dai
Lin Xiao
Yuejie Chi
OffRL
90
2
0
13 Feb 2025
Mean-Field Sampling for Cooperative Multi-Agent Reinforcement Learning
Emile Anand
Ishani Karmarkar
Guannan Qu
109
2
0
01 Dec 2024
The Bandit Whisperer: Communication Learning for Restless Bandits
Yunfan Zhao
Tonghan Wang
Dheeraj M. Nagaraj
Aparna Taneja
Milind Tambe
81
5
0
11 Aug 2024
Learning to Steer Markovian Agents under Model Uncertainty
Jiawei Huang
Vinzenz Thoma
Zebang Shen
H. Nax
Niao He
78
2
0
14 Jul 2024
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
Yuheng Zhang
Dian Yu
Baolin Peng
Linfeng Song
Ye Tian
Mingyue Huo
Nan Jiang
Haitao Mi
Dong Yu
113
16
0
30 Jun 2024
Independent RL for Cooperative-Competitive Agents: A Mean-Field Perspective
Muhammad Aneeq uz Zaman
Alec Koppel
Mathieu Laurière
Tamer Basar
52
3
0
17 Mar 2024
Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games
Weichao Mao
Tamer Basar
50
66
0
12 Oct 2021
When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently?
Ziang Song
Song Mei
Yu Bai
86
67
0
08 Oct 2021
The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces
Chi Jin
Qinghua Liu
Tiancheng Yu
50
50
0
07 Jun 2021
Decentralized Q-Learning in Zero-sum Markov Games
M. O. Sayin
Kai Zhang
David S. Leslie
Tamer Basar
Asuman Ozdaglar
38
83
0
04 Jun 2021
Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games
Chen-Yu Wei
Chung-Wei Lee
Mengxiao Zhang
Haipeng Luo
40
82
0
08 Feb 2021
Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms
Chi Jin
Qinghua Liu
Sobhan Miryoosefi
OffRL
72
216
0
01 Feb 2021
Independent Policy Gradient Methods for Competitive Reinforcement Learning
C. Daskalakis
Dylan J. Foster
Noah Golowich
143
161
0
11 Jan 2021
A Sharp Analysis of Model-based Reinforcement Learning with Self-Play
Qinghua Liu
Tiancheng Yu
Yu Bai
Chi Jin
58
122
0
04 Oct 2020
Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity
Kai Zhang
Sham Kakade
Tamer Basar
Lin F. Yang
84
122
0
15 Jul 2020
Near-Optimal Reinforcement Learning with Self-Play
Yu Bai
Chi Jin
Tiancheng Yu
119
131
0
22 Jun 2020
No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium
A. Celli
A. Marchesi
Gabriele Farina
N. Gatti
72
46
0
01 Apr 2020
Learning Near Optimal Policies with Low Inherent Bellman Error
Andrea Zanette
A. Lazaric
Mykel Kochenderfer
Emma Brunskill
OffRL
63
222
0
29 Feb 2020
Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium
Qiaomin Xie
Yudong Chen
Zhaoran Wang
Zhuoran Yang
105
125
0
17 Feb 2020
Provable Self-Play Algorithms for Competitive Reinforcement Learning
Yu Bai
Chi Jin
SSL
119
149
0
10 Feb 2020
Emergent Tool Use From Multi-Agent Autocurricula
Bowen Baker
I. Kanitscheider
Todor Markov
Yi Wu
Glenn Powell
Bob McGrew
Igor Mordatch
LRM
62
649
0
17 Sep 2019
Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity
Aaron Sidford
Mengdi Wang
Lin F. Yang
Yinyu Ye
45
70
0
29 Aug 2019
Provably Efficient Reinforcement Learning with Linear Function Approximation
Chi Jin
Zhuoran Yang
Zhaoran Wang
Michael I. Jordan
76
549
0
11 Jul 2019
Feature-Based Q-Learning for Two-Player Stochastic Games
Zeyu Jia
Lin F. Yang
Mengdi Wang
53
45
0
02 Jun 2019
QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning
Kyunghwan Son
Daewoo Kim
Wan Ju Kang
D. Hostallero
Yung Yi
OffRL
50
799
0
14 May 2019
Learning to Collaborate in Markov Decision Processes
Goran Radanović
R. Devidze
David C. Parkes
Adish Singla
60
33
0
23 Jan 2019
Actor-Attention-Critic for Multi-Agent Reinforcement Learning
Shariq Iqbal
Fei Sha
57
743
0
05 Oct 2018
Is Q-learning Provably Efficient?
Chi Jin
Zeyuan Allen-Zhu
Sébastien Bubeck
Michael I. Jordan
OffRL
52
801
0
10 Jul 2018
QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
Tabish Rashid
Mikayel Samvelyan
Christian Schroeder de Witt
Gregory Farquhar
Jakob N. Foerster
Shimon Whiteson
118
1,662
0
30 Mar 2018
Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents
Kai Zhang
Zhuoran Yang
Han Liu
Tong Zhang
Tamer Basar
66
584
0
23 Feb 2018
Online Reinforcement Learning in Stochastic Games
Chen-Yu Wei
Yi-Te Hong
Chi-Jen Lu
OffRL
27
120
0
02 Dec 2017
Value-Decomposition Networks For Cooperative Multi-Agent Learning
P. Sunehag
Guy Lever
A. Gruslys
Wojciech M. Czarnecki
V. Zambaldi
...
Marc Lanctot
Nicolas Sonnerat
Joel Z Leibo
K. Tuyls
T. Graepel
64
997
0
16 Jun 2017
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
Ryan J. Lowe
Yi Wu
Aviv Tamar
J. Harb
Pieter Abbeel
Igor Mordatch
116
4,441
0
07 Jun 2017
Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
Christoph Dann
Tor Lattimore
Emma Brunskill
60
307
0
22 Mar 2017
Minimax Regret Bounds for Reinforcement Learning
M. G. Azar
Ian Osband
Rémi Munos
65
771
0
16 Mar 2017
Contextual Decision Processes with Low Bellman Rank are PAC-Learnable
Nan Jiang
A. Krishnamurthy
Alekh Agarwal
John Langford
Robert Schapire
90
417
0
29 Oct 2016
Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving
Shai Shalev-Shwartz
Shaked Shammah
Amnon Shashua
40
830
0
11 Oct 2016
On Lower Bounds for Regret in Reinforcement Learning
Ian Osband
Benjamin Van Roy
63
101
0
09 Aug 2016
Explore no more: Improved high-probability regret bounds for non-stochastic bandits
Gergely Neu
181
182
0
10 Jun 2015
Generalization and Exploration via Randomized Value Functions
Ian Osband
Benjamin Van Roy
Zheng Wen
67
314
0
04 Feb 2014