ResearchTrend.AI
Offline Reinforcement Learning (OffRL)

Offline Reinforcement Learning focuses on learning policies from previously collected data without further interaction with the environment.

All papers

50 / 8,940 papers shown
OneTrans: Unified Feature Interaction and Sequence Modeling with One Transformer in Industrial Recommender
Zhaoqi Zhang
Haolei Pei
Jun Guo
Tianyu Wang
Yufei Feng
Hui Sun
Shaowei Liu
Aixin Sun
OffRL
4
0
0
30 Oct 2025
The End of Manual Decoding: Towards Truly End-to-End Language Models
Zhichao Wang
Dongyang Ma
Xinting Huang
Deng Cai
Tian Lan
Jiahao Xu
Haitao Mi
Xiaoying Tang
Yan Wang
SyDa, OffRL
0
0
0
30 Oct 2025
Human-in-the-loop Online Rejection Sampling for Robotic Manipulation
Guanxing Lu
Rui Zhao
Haitao Lin
He Zhang
Yansong Tang
OffRL
0
0
0
30 Oct 2025
Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error
Chenming Tang
Hsiu-Yuan Huang
Weijie Liu
Saiyong Yang
Yunfang Wu
OffRL, LRM
4
0
0
30 Oct 2025
Data-Efficient RLVR via Off-Policy Influence Guidance
Erle Zhu
Dazhi Jiang
Yuan Wang
Xujun Li
Jiale Cheng
...
Yilin Niu
Aohan Zeng
Jie Tang
Minlie Huang
Hongning Wang
OffRL
4
0
0
30 Oct 2025
Bridging the Gap between Empirical Welfare Maximization and Conditional Average Treatment Effect Estimation in Policy Learning
Masahiro Kato
OffRL, CML
4
0
0
30 Oct 2025
Reinforcement Learning for Pollution Detection in a Randomized, Sparse and Nonstationary Environment with an Autonomous Underwater Vehicle
Sebastian Zieglmeier
Niklas Erdmann
Narada D. Warakagoda
OffRL
0
0
0
30 Oct 2025
Engineering Social Optimality via Utility Shaping in Non-Cooperative Games under Incomplete Information and Imperfect Monitoring
David Smith
Jie Dong
Yizhou Yang
OffRL
0
0
0
30 Oct 2025
Offline Clustering of Preference Learning with Active-data Augmentation
Jingyuan Liu
Fatemeh Ghaffari
Xuchuang Wang
Mohammad Hajiesmaili
Carlee Joe-Wong
OffRL
0
0
0
30 Oct 2025
BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning
Qianli Shen
Daoyuan Chen
Yilun Huang
Zhenqing Ling
Yaliang Li
Bolin Ding
Jingren Zhou
OffRL
4
0
0
30 Oct 2025
Think Outside the Policy: In-Context Steered Policy Optimization
Hsiu-Yuan Huang
Chenming Tang
Weijie Liu
Saiyong Yang
Yunfang Wu
OffRL
4
0
0
30 Oct 2025
KnowCoder-A1: Incentivizing Agentic Reasoning Capability with Outcome Supervision for KBQA
Zhuo Chen
Fei Wang
Zixuan Li
Zhao Zhang
Weiwei Ding
Chuanguang Yang
Yongjun Xu
Xiaolong Jin
Jiafeng Guo
OffRL, LRM
0
0
0
29 Oct 2025
Zero Reinforcement Learning Towards General Domains
Yuyuan Zeng
Yufei Huang
Can Xu
Qingfeng Sun
Jianfeng Yan
Guanghui Xu
Tao Yang
Fengzong Lian
OffRL, ReLM, LRM, AI4CE
0
0
0
29 Oct 2025
Right for the Right Reasons: Avoiding Reasoning Shortcuts via Prototypical Neurosymbolic AI
Luca Andolfi
Eleonora Giunchiglia
NAI, OffRL
0
0
0
29 Oct 2025
Sim-to-Real Gentle Manipulation of Deformable and Fragile Objects with Stress-Guided Reinforcement Learning
Kei Ikemura
Yifei Dong
David Blanco-Mulero
Alberta Longhini
Li Chen
Florian T. Pokorny
OffRL
0
0
0
29 Oct 2025
TheraMind: A Strategic and Adaptive Agent for Longitudinal Psychological Counseling
He Hu
Yucheng Zhou
Chiyuan Ma
Qianning Wang
Zheng Zhang
Fei Ma
Laizhong Cui
Qi Tian
OffRL
0
0
0
29 Oct 2025
Off-policy Reinforcement Learning with Model-based Exploration Augmentation
Likun Wang
Xiangteng Zhang
Yinuo Wang
Guojian Zhan
Wenxuan Wang
Haoyu Gao
Jingliang Duan
Shengbo Eben Li
OffRL
0
0
0
29 Oct 2025
$π_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models
Kang Chen
Zhihao Liu
Tonghe Zhang
Zhen Guo
Si Xu
...
Zhaofei Yu
Guoliang Fan
Tiejun Huang
Yu Wang
Chao Yu
OffRL, VLM
20
0
0
29 Oct 2025
Metis-SPECS: Decoupling Multimodal Learning via Self-distilled Preference-based Cold Start
Kun Chen
Peng Shi
Haibo Qiu
Zhixiong Zeng
Siqi Yang
Wenji Mao
Lin Ma
OffRL, VLM, LRM
67
0
0
29 Oct 2025
Learning to Plan & Schedule with Reinforcement-Learned Bimanual Robot Skills
Weikang Wan
Fabio Ramos
Xuning Yang
Caelan Garrett
OffRL
0
0
0
29 Oct 2025
Convergence of off-policy TD(0) with linear function approximation for reversible Markov chains
Maik Overmars
Jasper Goseling
Richard Boucherie
OffRL
0
0
0
29 Oct 2025
Grounded in Reality: Learning and Deploying Proactive LLM from Offline Logs
Fei Wei
Daoyuan Chen
Ce Wang
Yilun Huang
Yushuo Chen
Xuchen Pan
Yaliang Li
Bolin Ding
OffRL, LLMAG
8
0
0
29 Oct 2025
RLMEval: Evaluating Research-Level Neural Theorem Proving
Auguste Poiroux
Antoine Bosselut
Viktor Kunčak
AIMat, OffRL
61
0
0
29 Oct 2025
Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation
Feichen Gan
Youcun Lu
Yingying Zhang
Yukun Liu
OffRL
0
0
0
29 Oct 2025
Scaling Latent Reasoning via Looped Language Models
Rui-Jie Zhu
Zixuan Wang
Kai Hua
Tianyu Zhang
Ziniu Li
...
Tianle Cai
Ge Zhang
Wenhao Huang
Yoshua Bengio
Jason Eshraghian
ReLM, OffRL, LRM
4
0
0
29 Oct 2025
Survey and Tutorial of Reinforcement Learning Methods in Process Systems Engineering
Maximilian Bloor
M. Mowbray
Ehecatl Antonio del Rio Chanona
Calvin Tsay
OffRL
0
0
0
28 Oct 2025
Sample-efficient and Scalable Exploration in Continuous-Time RL
Klemens Iten
Lenart Treven
Bhavya Sukhija
Florian Dorfler
Andreas Krause
OffRL
0
0
0
28 Oct 2025
Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning
Zhiheng Xi
Jixuan Huang
Xin Guo
Boyang Hong
Dingwen Yang
...
Jiecao Chen
Rui Zheng
Tao Gui
Qi Zhang
Xuanjing Huang
OffRL, LRM
0
0
0
28 Oct 2025
Advancing site-specific disease and pest management in precision agriculture: From reasoning-driven foundation models to adaptive, feedback-based learning
Nitin Rai
Daeun Choi
Nathan Boyd
Arnold W. Schumann
OffRL, AI4CE
0
0
0
28 Oct 2025
FunReason-MT Technical Report: Overcoming the Complexity Barrier in Multi-Turn Function Calling
Zengzhuang Xu
Bingguang Hao
Zechuan Wang
Y. Wen
Maolin Wang
...
Chenyi Zhuang
Jinjie Gu
Leilei Gan
X. Zhao
Shi Gu
LLMAG, OffRL, LRM
0
0
0
28 Oct 2025
Blindfolded Experts Generalize Better: Insights from Robotic Manipulation and Videogames
E. Zisselman
Mirco Mutti
Shelly Francis-Meretzki
Elisei Shafer
Aviv Tamar
OffRL
0
0
0
28 Oct 2025
LRT-Diffusion: Calibrated Risk-Aware Guidance for Diffusion Policies
Ximan Sun
Xiang Cheng
OffRL
0
0
0
28 Oct 2025
Fill in the Blanks: Accelerating Q-Learning with a Handful of Demonstrations in Sparse Reward Settings
Seyed Mahdi Basiri Azad
Joschka Boedecker
OffRL, OnRL
64
0
0
28 Oct 2025
Success and Cost Elicit Convention Formation for Efficient Communication
Saujas Vaduguru
Yilun Hua
Yoav Artzi
Daniel Fried
OffRL
0
0
0
28 Oct 2025
ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization
Guoxin Chen
Jing Wu
Xinjie Chen
Wayne Xin Zhao
Ruihua Song
Chengxi Li
Kai Fan
Dayiheng Liu
Minpeng Liao
AIMat, OffRL
74
0
0
28 Oct 2025
Think Twice: Branch-and-Rethink Reasoning Reward Model
Yizhu Jiao
Jiaqi Zeng
Julien Veron Vialard
Oleksii Kuchaiev
Jiawei Han
Olivier Delalleau
OffRL, LRM
24
0
0
27 Oct 2025
Offline Preference Optimization via Maximum Marginal Likelihood Estimation
Saeed Najafi
Alona Fyshe
OffRL
16
0
0
27 Oct 2025
A Survey on LLM Mid-training
Chengying Tu
Xuemiao Zhang
Rongxiang Weng
Rumei Li
Chen Zhang
Yang Bai
Hongfei Yan
Jingang Wang
Xunliang Cai
OffRL, LRM
20
0
0
27 Oct 2025
Advantage Shaping as Surrogate Reward Maximization: Unifying Pass@K Policy Gradients
Christos Thrampoulidis
Sadegh Mahdavi
Wenlong Deng
OffRL
12
0
0
27 Oct 2025
Learning to Reason Efficiently with Discounted Reinforcement Learning
Alex Ayoub
Kavosh Asadi
Dale Schuurmans
Csaba Szepesvári
Karim Bouyarmane
OffRL, LRM
16
0
0
27 Oct 2025
Latent Chain-of-Thought for Visual Reasoning
Guohao Sun
Hang Hua
Jian Wang
Jiebo Luo
S. Dianat
Majid Rabbani
Raghuveer Rao
Zhiqiang Tao
BDL, OffRL, LRM
8
0
0
27 Oct 2025
RL-AUX: Reinforcement Learning for Auxiliary Task Generation
Judah Goldfeder
Matthew So
Hod Lipson
OffRL
16
0
0
27 Oct 2025
VOLD: Reasoning Transfer from LLMs to Vision-Language Models via On-Policy Distillation
Walid Bousselham
Hilde Kuehne
Cordelia Schmid
OffRL, LRM, VLM
20
0
0
27 Oct 2025
VideoTG-R1: Boosting Video Temporal Grounding via Curriculum Reinforcement Learning on Reflected Boundary Annotations
Lu Dong
H. Zhang
Han Lin
Ziang Yan
Xiangyu Zeng
...
Yifei Huang
Yi Wang
Z. Ling
Limin Wang
Yali Wang
OffRL
16
0
0
27 Oct 2025
FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning
Yuyang Ding
Chi Zhang
Juntao Li
H. Lin
Xin Liu
Min-Ling Zhang
OffRL, LRM
16
0
0
26 Oct 2025
Guardian: Decoupling Exploration from Safety in Reinforcement Learning
Kaitong Cai
Jusheng Zhang
Jing Yang
Keze Wang
OffRL, OnRL
64
0
0
26 Oct 2025
Policies over Poses: Reinforcement Learning based Distributed Pose-Graph Optimization for Multi-Robot SLAM
Sai Krishna Ghanta
Ramviyas Parasuraman
OffRL
16
0
0
26 Oct 2025
Transitive RL: Value Learning via Divide and Conquer
S. Park
Aditya Oberai
P. Atreya
Sergey Levine
OffRL
12
0
0
26 Oct 2025
FlowCritic: Bridging Value Estimation with Flow Matching in Reinforcement Learning
Shan Zhong
Shutong Ding
He Diao
Xiangyu Wang
Kah Chan Teh
Bei Peng
OffRL
16
0
0
26 Oct 2025
BLIP-FusePPO: A Vision-Language Deep Reinforcement Learning Framework for Lane Keeping in Autonomous Vehicles
Seyed Ahmad Hosseini Miangoleh
Amin Jalal Aghdasian
Farzaneh Abdollahi
OffRL
16
0
0
25 Oct 2025