Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes

12 December 2022
Chenlu Ye
Wei Xiong
Quanquan Gu
Tong Zhang

Papers citing "Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes"

25 / 25 papers shown
Catoni Contextual Bandits are Robust to Heavy-tailed Rewards
Chenlu Ye
Yujia Jin
Alekh Agarwal
Tong Zhang
103
0
0
04 Feb 2025
A Model Selection Approach for Corruption Robust Reinforcement Learning
Chen-Yu Wei
Christoph Dann
Julian Zimmert
85
44
0
31 Dec 2024
Variance-Aware Linear UCB with Deep Representation for Neural Contextual Bandits
H. Bui
Enrique Mallada
Anqi Liu
108
0
0
08 Nov 2024
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
57
3
0
07 Nov 2024
Uncertainty-based Offline Variational Bayesian Reinforcement Learning for Robustness under Diverse Data Corruptions
Rui Yang
Jie Wang
Guoping Wu
Yangqiu Song
AAML
OffRL
34
1
0
01 Nov 2024
How Does Variance Shape the Regret in Contextual Bandits?
Zeyu Jia
Jian Qian
Alexander Rakhlin
Chen-Yu Wei
35
4
0
16 Oct 2024
Corruption-Robust Linear Bandits: Minimax Optimality and Gap-Dependent Misspecification
Haolin Liu
Artin Tajdini
Andrew Wagenmaker
Chen-Yu Wei
31
0
0
10 Oct 2024
Learning from Imperfect Human Feedback: a Tale from Corruption-Robust Dueling
Yuwei Cheng
Fan Yao
Xuefeng Liu
Haifeng Xu
46
1
0
18 May 2024
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Qiwei Di
Jiafan He
Quanquan Gu
29
1
0
16 Apr 2024
Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm
Miao Lu
Han Zhong
Tong Zhang
Jose H. Blanchet
OffRL
OOD
73
4
0
04 Apr 2024
Corruption-Robust Offline Two-Player Zero-Sum Markov Games
Andi Nika
Debmalya Mandal
Adish Singla
Goran Radanović
OffRL
34
2
0
04 Mar 2024
Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption
Chenlu Ye
Jiafan He
Quanquan Gu
Tong Zhang
46
5
0
14 Feb 2024
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
Chenlu Ye
Wei Xiong
Yuheng Zhang
Nan Jiang
Tong Zhang
OffRL
38
9
0
11 Feb 2024
Corruption Robust Offline Reinforcement Learning with Human Feedback
Debmalya Mandal
Andi Nika
Parameswaran Kamalaruban
Adish Singla
Goran Radanović
OffRL
30
8
0
09 Feb 2024
Best-of-Both-Worlds Algorithms for Linear Contextual Bandits
Yuko Kuroki
Alberto Rumi
Taira Tsuchiya
Fabio Vitale
Nicolò Cesa-Bianchi
36
5
0
24 Dec 2023
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
Wei Xiong
Hanze Dong
Chenlu Ye
Ziqi Wang
Han Zhong
Heng Ji
Nan Jiang
Tong Zhang
OffRL
38
161
0
18 Dec 2023
Corruption-Robust Offline Reinforcement Learning with General Function Approximation
Chenlu Ye
Rui Yang
Quanquan Gu
Tong Zhang
OffRL
33
17
0
23 Oct 2023
Towards Robust Offline Reinforcement Learning under Diverse Data Corruption
Rui Yang
Han Zhong
Jiawei Xu
Amy Zhang
Chong Zhang
Lei Han
Tong Zhang
OffRL
OnRL
41
15
0
19 Oct 2023
Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning
Qiwei Di
Heyang Zhao
Jiafan He
Quanquan Gu
OffRL
55
5
0
02 Oct 2023
Optimal Sample Selection Through Uncertainty Estimation and Its Application in Deep Learning
Yong Lin
Chen Liu
Chenlu Ye
Qing Lian
Yuan Yao
Tong Zhang
27
4
0
05 Sep 2023
On the Model-Misspecification in Reinforcement Learning
Yunfan Li
Lin F. Yang
36
5
0
19 Jun 2023
Robust Lipschitz Bandits to Adversarial Corruptions
Yue Kang
Cho-Jui Hsieh
T. C. Lee
AAML
30
8
0
29 May 2023
When is Realizability Sufficient for Off-Policy Reinforcement Learning?
Andrea Zanette
OffRL
16
14
0
10 Nov 2022
Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions
Jiafan He
Dongruo Zhou
Tong Zhang
Quanquan Gu
66
46
0
13 May 2022
Fast Rates in Pool-Based Batch Active Learning
Claudio Gentile
Zhilei Wang
Tong Zhang
24
14
0
11 Feb 2022