2501.17161
Cited By
v1
v2 (latest)
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
28 January 2025
Tianzhe Chu
Yuexiang Zhai
Jihan Yang
Shengbang Tong
Saining Xie
Dale Schuurmans
Quoc V. Le
Sergey Levine
Yi-An Ma
OffRL
Papers citing
"SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training"
50 / 133 papers shown
Title
BREAD: Branched Rollouts from Expert Anchors Bridge SFT & RL for Reasoning
Xuechen Zhang
Zijian Huang
Yingcong Li
Chenshun Ni
Jiasi Chen
Samet Oymak
OffRL
MoE
LRM
52
0
0
20 Jun 2025
Enhancing Step-by-Step and Verifiable Medical Reasoning in MLLMs
Haoran Sun
Yankai Jiang
Wenjie Lou
Yujie Zhang
Wenjie Li
Lilong Wang
Mianxin Liu
Lei Liu
Xiaosong Wang
LRM
27
0
0
20 Jun 2025
EvoLM: In Search of Lost Language Model Training Dynamics
Zhenting Qi
Fan Nie
Alexandre Alahi
James Zou
Himabindu Lakkaraju
Yilun Du
Eric P. Xing
Sham Kakade
Hanlin Zhang
59
1
0
19 Jun 2025
Learning a Continue-Thinking Token for Enhanced Test-Time Scaling
Liran Ringel
Elad Tolochinsky
Yaniv Romano
LRM
31
0
0
12 Jun 2025
LEO-VL: Towards 3D Vision-Language Generalists via Data Scaling with Efficient Representation
J. Huang
Xiaojian Ma
Xiongkun Linghu
Yue Fan
Junchao He
...
Qing Li
Song-Chun Zhu
Yixin Chen
Baoxiong Jia
Siyuan Huang
88
0
0
11 Jun 2025
VerIF: Verification Engineering for Reinforcement Learning in Instruction Following
Hao Peng
Yunjia Qi
Xiaozhi Wang
Bin Xu
Lei Hou
Juanzi Li
OffRL
84
0
0
11 Jun 2025
TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization
Zengjue Chen
Runliang Niu
He Kong
Qi Wang
68
0
0
10 Jun 2025
Reinforcement Learning Teachers of Test Time Scaling
Edoardo Cetin
Tianyu Zhao
Yujin Tang
OffRL
ReLM
LRM
70
0
0
10 Jun 2025
Reinforce LLM Reasoning through Multi-Agent Reflection
Yurun Yuan
Tengyang Xie
LRM
37
0
0
10 Jun 2025
ABC-FHE: A Resource-Efficient Accelerator Enabling Bootstrappable Parameters for Client-Side Fully Homomorphic Encryption
Sungwoong Yune
Hyojeong Lee
Adiwena Putra
Hyunjun Cho
Cuong Duong Manh
Jaeho Jeon
Joo-Young Kim
28
0
0
10 Jun 2025
Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions
Lu Ma
Hao Liang
Meiyi Qiang
Lexiang Tang
Xiaochen Ma
...
Junbo Niu
Chengyu Shen
Runming He
Bin Cui
Wentao Zhang
ReLM
OffRL
LRM
38
0
0
09 Jun 2025
Writing-RL: Advancing Long-form Writing via Adaptive Curriculum Reinforcement Learning
Xuanyu Lei
Chenliang Li
Y. Wu
Kaiming Liu
Weizhou Shen
Peng Li
Ming Yan
Ji Zhang
Fei Huang
Yang Liu
59
0
0
06 Jun 2025
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
S. Wang
Le Yu
Chang Gao
Chujie Zheng
Shixuan Liu
...
Yang Yue
S. Song
Bowen Yu
Gao Huang
Junyang Lin
LRM
76
9
0
02 Jun 2025
Generalizable LLM Learning of Graph Synthetic Data with Reinforcement Learning
Yizhuo Zhang
Heng Wang
Shangbin Feng
Zhaoxuan Tan
Xinyun Liu
Yulia Tsvetkov
OffRL
92
0
0
01 Jun 2025
LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning
Zihang Liu
Tianyu Pang
Oleg Balabanov
Chaoqun Yang
Tianjin Huang
L. Yin
Yaoqing Yang
Shiwei Liu
LRM
69
1
0
01 Jun 2025
Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs
Yufa Zhou
S. Wang
Xingyu Dong
Xiangqi Jin
Yifang Chen
Yue Min
Kexin Yang
Xingzhang Ren
Dayiheng Liu
Linfeng Zhang
OffRL
LRM
37
0
0
31 May 2025
RAST: Reasoning Activation in LLMs via Small-model Transfer
Siru Ouyang
Xinyu Zhu
Zilin Xiao
Minhao Jiang
Yu Meng
Jiawei Han
OffRL
ReLM
LRM
39
0
0
30 May 2025
TimeHC-RL: Temporal-aware Hierarchical Cognitive Reinforcement Learning for Enhancing LLMs' Social Intelligence
Guiyang Hou
Xing Gao
Yuchuan Wu
Xiang Huang
Wenqi Zhang
...
Yongliang Shen
Jialu Du
Fei Huang
Yongbin Li
Weiming Lu
53
0
0
30 May 2025
Self-Correcting Code Generation Using Small Language Models
Jeonghun Cho
Deokhyung Kang
Hyounghun Kim
Gary Lee
KELM
3DV
LRM
57
0
0
29 May 2025
Beyond path selection: Better LLMs for Scientific Information Extraction with MimicSFT and Relevance and Rule-induced (R^2)GRPO
Ran Li
Shimin Di
Yuchen Liu
Chen Jing
Yu Qiu
Lei Chen
LRM
81
0
0
28 May 2025
SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning
Jiaqi Huang
Zunnan Xu
Jun Zhou
Ting Liu
Yicheng Xiao
Mingwen Ou
Bowen Ji
Xiu Li
Kehong Yuan
VLM
93
0
0
28 May 2025
Learning to Route Queries Across Knowledge Bases for Step-wise Retrieval-Augmented Reasoning
Chunyi Peng
Zhipeng Xu
Zhenghao Liu
Yishan Li
Yukun Yan
...
Zhiyuan Liu
Yu Gu
Minghe Yu
Ge Yu
Maosong Sun
LRM
112
1
0
28 May 2025
R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning
Yongchao Chen
Y. Liu
Junwei Zhou
Yilun Hao
Jingquan Wang
Yang Zhang
Chuchu Fan
OffRL
ReLM
AI4TS
SyDa
ALM
LRM
79
0
0
27 May 2025
What Can RL Bring to VLA Generalization? An Empirical Study
Jijia Liu
Feng Gao
Bingwen Wei
Xinlei Chen
Qingmin Liao
Yi Wu
Chao Yu
Yu Wang
OffRL
320
0
0
26 May 2025
Collision- and Reachability-Aware Multi-Robot Control with Grounded LLM Planners
Jiabao Ji
Yongchao Chen
Yang Zhang
Ramana Rao Kompella
Chuchu Fan
Gaowen Liu
Shiyu Chang
126
0
0
26 May 2025
Unveiling the Compositional Ability Gap in Vision-Language Reasoning Model
Tianle Li
Jihai Zhang
Yongming Rao
Yu Cheng
CoGe
LRM
VLM
104
0
0
26 May 2025
Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective
Junnan Liu
Hongwei Liu
Linchen Xiao
Shudong Liu
Taolin Zhang
Zihan Ma
Songyang Zhang
Kai Chen
LRM
136
0
0
26 May 2025
Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
Huayu Chen
Kaiwen Zheng
Qinsheng Zhang
Ganqu Cui
Yin Cui
Haotian Ye
Tsung-Yi Lin
Ming-Yu Liu
Jun Zhu
Haoxiang Wang
OffRL
LRM
268
3
0
23 May 2025
Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL
Che Liu
Haozhe Wang
J. Pan
Zhongwei Wan
Yong Dai
Fangzhen Lin
Wenjia Bai
Daniel Rueckert
Rossella Arcucci
OffRL
LRM
ELM
118
1
0
23 May 2025
Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning
Yutong Chen
Jiandong Gao
Ji Wu
ALM
228
0
0
23 May 2025
SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning
Kaiwen Zhou
Xuandong Zhao
Gaowen Liu
Jayanth Srinivasa
Aosong Feng
Dawn Song
Xin Eric Wang
LRM
LLMSV
101
0
0
22 May 2025
Distilling the Implicit Multi-Branch Structure in LLMs' Reasoning via Reinforcement Learning
Shicheng Xu
Liang Pang
Yunchang Zhu
Jia Gu
Zihao Wei
Jingcheng Deng
Feiyang Pan
Huawei Shen
Xueqi Cheng
OffRL
LRM
119
0
0
22 May 2025
R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning
Huatong Song
Jinhao Jiang
Wenqing Tian
Zhongfu Chen
Yuhuan Wu
Jiahao Zhao
Yingqian Min
Wayne Xin Zhao
Lei Fang
Ji-Rong Wen
RALM
KELM
AI4TS
LRM
103
1
0
22 May 2025
RoT: Enhancing Table Reasoning with Iterative Row-Wise Traversals
Xuanliang Zhang
Dingzirui Wang
Keyan Xu
Qingfu Zhu
Wanxiang Che
LMTD
ReLM
LRM
121
0
0
21 May 2025
Teaching Language Models to Evolve with Users: Dynamic Profile Modeling for Personalized Alignment
Weixiang Zhao
Xingyu Sui
Yulin Hu
Jiahe Guo
Haixiao Liu
Biye Li
Yanyan Zhao
Bing Qin
Ting Liu
OffRL
115
1
0
21 May 2025
NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning
Wei Liu
Siya Qi
Xinyu Wang
Chen Qian
Yali Du
Yulan He
OffRL
LRM
97
0
0
21 May 2025
Procedural Environment Generation for Tool-Use Agents
Michael Sullivan
Mareike Hartmann
Alexander Koller
SyDa
35
0
0
21 May 2025
ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning
Changtai Zhu
Siyin Wang
Ruijun Feng
Kai Song
Xipeng Qiu
LRM
95
0
0
21 May 2025
Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning
Jiaer Xia
Yuhang Zang
Peng Gao
Yixuan Li
Kaiyang Zhou
OffRL
ReLM
AI4TS
VLM
LRM
119
0
0
20 May 2025
Effective and Transparent RAG: Adaptive-Reward Reinforcement Learning for Decision Traceability
Jingyi Ren
Yekun Xu
Xiaolong Wang
Weitao Li
Weizhi Ma
Yang Liu
RALM
85
0
0
19 May 2025
Observe-R1: Unlocking Reasoning Abilities of MLLMs with Dynamic Progressive Reinforcement Learning
Zirun Guo
Minjie Hong
Tao Jin
OffRL
LRM
134
0
0
18 May 2025
Visual Planning: Let's Think Only with Images
Yi Xu
Chengzu Li
Han Zhou
Xingchen Wan
Caiqi Zhang
Anna Korhonen
Ivan Vulić
LM&Ro
LRM
170
1
0
16 May 2025
Critique-Guided Distillation: Improving Supervised Fine-tuning via Better Distillation
Berkcan Kapusuzoglu
Supriyo Chakraborty
Chia-Hsuan Lee
Sambit Sahu
139
0
0
16 May 2025
Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs
Yaorui Shi
Shihan Li
Chang Wu
Zhiyuan Liu
Sihang Li
Hengxing Cai
An Zhang
Xiang Wang
ReLM
LRM
179
0
0
16 May 2025
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
Zhaochen Su
Linjie Li
Mingyang Song
Yunzhuo Hao
Zhengyuan Yang
...
Guanjie Chen
Jiawei Gu
Juntao Li
Xiaoye Qu
Yu Cheng
OffRL
LRM
99
11
0
13 May 2025
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Yi-Fan Zhang
Xingyu Lu
X. Hu
Chaoyou Fu
Bin Wen
...
Jianfei Chen
Fan Yang
Zheng Zhang
Yan Li
Liang Wang
OffRL
LRM
139
6
0
05 May 2025
TWIST: Teleoperated Whole-Body Imitation System
Yanjie Ze
Zixuan Chen
Joao Pedro Araujo
Zi-ang Cao
Xue Bin Peng
Jiajun Wu
Chao Liu
103
5
0
05 May 2025
RM-R1: Reward Modeling as Reasoning
Xiusi Chen
Gaotang Li
Zehua Wang
Bowen Jin
Cheng Qian
...
Yu Zhang
D. Zhang
Tong Zhang
Hanghang Tong
Heng Ji
ReLM
OffRL
LRM
398
21
0
05 May 2025
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
Chong Chen
Jiadong Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRL
LRM
216
8
0
30 Apr 2025
Reason Like a Radiologist: Chain-of-Thought and Reinforcement Learning for Verifiable Report Generation
Peiyuan Jing
Kinhei Lee
Zhenxuan Zhang
Huichi Zhou
Zhengqing Yuan
Zhifan Gao
Lei Zhu
G. Papanastasiou
Yingying Fang
Guang Yang
MedIm
OffRL
LRM
125
0
0
25 Apr 2025