arXiv 2501.17161 (v2, latest)
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
28 January 2025
Tianzhe Chu
Yuexiang Zhai
Jihan Yang
Shengbang Tong
Saining Xie
Dale Schuurmans
Quoc V. Le
Sergey Levine
Yi-An Ma
OffRL
Papers citing
"SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training"
50 / 133 papers shown
TTRL: Test-Time Reinforcement Learning
Yuxin Zuo
Kaiyan Zhang
Li Sheng
Xuekai Zhu
...
Youbang Sun
Zhiyuan Ma
Lifan Yuan
Ning Ding
Bowen Zhou
OffRL
444
31
0
22 Apr 2025
Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning
Wang Lin
Liyu Jia
Wentao Hu
Kaihang Pan
Zhongqi Yue
Wei Zhao
Jingyuan Chen
Fei Wu
Hanwang Zhang
VGen
105
2
0
22 Apr 2025
SWE-Synth: Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs
Minh V.T. Pham
Huy N. Phan
Hoang N. Phan
Cuong Le Chi
Thien Hang Nguyen
Nghi D. Q. Bui
SyDa
104
0
0
20 Apr 2025
Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension
Lin Li
Wei Chen
Jiahui Li
Lu Chen
Long Chen
LRM
147
2
0
20 Apr 2025
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
Siyan Zhao
Devaansh Gupta
Qinqing Zheng
Aditya Grover
DiffM
LRM
AI4CE
172
9
0
16 Apr 2025
Slow Thinking for Sequential Recommendation
Junjie Zhang
Beichen Zhang
Wenqi Sun
Hongyu Lu
Wayne Xin Zhao
Yu Chen
Ji-Rong Wen
OffRL
LRM
111
1
0
13 Apr 2025
Playpen: An Environment for Exploring Learning Through Conversational Interaction
Nicola Horst
Davide Mazzaccara
Antonia Schmidt
Michael Sullivan
Filippo Momentè
...
Alexander Koller
Oliver Lemon
David Schlangen
Mario Giulianelli
Alessandro Suglia
OffRL
122
0
0
11 Apr 2025
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Hardy Chen
Haoqin Tu
Fali Wang
Hui Liu
Xianfeng Tang
Xinya Du
Yuyin Zhou
Cihang Xie
ReLM
VLM
OffRL
LRM
173
36
0
10 Apr 2025
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
Rosie Zhao
Alexandru Meterez
Sham Kakade
Cengiz Pehlevan
Samy Jelassi
Eran Malach
ReLM
LRM
384
20
0
10 Apr 2025
Perception-R1: Pioneering Perception Policy with Reinforcement Learning
En Yu
Kangheng Lin
Liang Zhao
Jisheng Yin
Yana Wei
...
Zheng Ge
Xiangyu Zhang
Daxin Jiang
Jingyu Wang
Wenbing Tao
VLM
OffRL
LRM
113
18
0
10 Apr 2025
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
Yan Ma
Steffi Chern
Xuyang Shen
Yiran Zhong
Pengfei Liu
OffRL
LRM
141
9
0
03 Apr 2025
A Survey of Scaling in Large Language Model Reasoning
Zihan Chen
Song Wang
Zhen Tan
Xingbo Fu
Zhenyu Lei
Peng Wang
Huan Liu
Cong Shen
Jundong Li
LRM
248
2
0
02 Apr 2025
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL
Mohammadreza Pourreza
Shayan Talaei
Ruoxi Sun
Xingchen Wan
Hailong Li
Azalia Mirhoseini
Amin Saberi
Sercan O. Arik
ReLM
AI4TS
LRM
152
12
0
29 Mar 2025
Video-R1: Reinforcing Video Reasoning in MLLMs
Kaituo Feng
Kaixiong Gong
Yangqiu Song
Zonghao Guo
Yibing Wang
Tianshuo Peng
Jian Wu
Xiaoying Zhang
Benyou Wang
Xiangyu Yue
AI4TS
SyDa
LRM
195
62
0
27 Mar 2025
Reasoning Beyond Limits: Advances and Open Problems for LLMs
M. Ferrag
Norbert Tihanyi
Merouane Debbah
ELM
OffRL
LRM
AI4CE
442
4
0
26 Mar 2025
Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning
Huajie Tan
Yuheng Ji
Xiaoshuai Hao
Minglan Lin
Pengwei Wang
Zhongyuan Wang
Shanghang Zhang
ReLM
OffRL
LRM
212
0
0
26 Mar 2025
Test-Time Reasoning Through Visual Human Preferences with VLMs and Soft Rewards
Alexander Gambashidze
Konstantin Sobolev
Andrey Kuznetsov
Ivan Oseledets
VLM
LRM
96
0
0
25 Mar 2025
OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning
Zhiyuan Liu
Yuting Zhang
Feng Liu
Changwang Zhang
Ying Sun
Jun Wang
LRM
172
12
0
20 Mar 2025
Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models
Yuxiang Lai
Shitian Zhao
Ming Li
Jike Zhong
Xiaofeng Yang
OffRL
LRM
LM&MA
VLM
197
31
0
18 Mar 2025
Aligning Multimodal LLM with Human Preference: A Survey
Tao Yu
Yize Zhang
Chaoyou Fu
Junkang Wu
Jinda Lu
...
Qingsong Wen
Zheng Zhang
Yan Huang
Liang Wang
Tieniu Tan
443
4
0
18 Mar 2025
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing
Cheng Deng
Luoyang Sun
Jiwen Jiang
Yongcheng Zeng
Xinjian Wu
...
Haoyang Li
Lei Chen
Lionel M. Ni
Jun Wang
436
0
0
15 Mar 2025
Thinking Machines: A Survey of LLM based Reasoning Strategies
Dibyanayan Bandyopadhyay
Soham Bhattacharjee
Asif Ekbal
LRM
ELM
102
10
0
13 Mar 2025
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning
Bo Liu
Yunxiang Li
Yangqiu Song
Hanjing Wang
Linyi Yang
...
Jun Wang
Weinan Zhang
Shuyue Hu
Ying Wen
LLMAG
KELM
LRM
AI4CE
139
11
0
12 Mar 2025
Efficient Algorithms for Verifying Kruskal Rank in Sparse Linear Regression and Related Applications
Fengqin Zhou
123
6
0
06 Mar 2025
High-Precision Transformer-Based Visual Servoing for Humanoid Robots in Aligning Tiny Objects
Jialong Xue
Wei Gao
Yu Wang
Chao Ji
Dongdong Zhao
Shi Yan
Shiwu Zhang
94
1
0
06 Mar 2025
All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning
Gokul Swamy
Sanjiban Choudhury
Wen Sun
Zhiwei Steven Wu
J. Andrew Bagnell
OffRL
142
20
0
03 Mar 2025
Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning
Sheng Zhang
Qianchu Liu
Guanghui Qin
Tristan Naumann
Hoifung Poon
ReLM
OffRL
LRM
141
9
0
27 Feb 2025
R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning
Minggui He
Yilun Liu
Shimin Tao
Yuanchang Luo
Hongyong Zeng
...
Daimeng Wei
Weibin Meng
Hao Yang
Boxing Chen
Osamu Yoshie
LRM
171
8
0
27 Feb 2025
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
Jiazhen Pan
Che Liu
Junde Wu
Fenglin Liu
Jiayuan Zhu
Hongwei Bran Li
Chen Chen
Cheng Ouyang
Daniel Rueckert
LRM
LM&MA
VLM
158
42
0
26 Feb 2025
DocPuzzle: A Process-Aware Benchmark for Evaluating Realistic Long-Context Reasoning Capabilities
Tianyi Zhuang
Chuqiao Kuang
Xiaoguang Li
Yihua Teng
Jihao Wu
Yijiao Wang
Lifeng Shang
RALM
ELM
LRM
89
1
0
25 Feb 2025
IPO: Your Language Model is Secretly a Preference Classifier
Shivank Garg
Ayush Singh
Shweta Singh
Paras Chopra
479
1
0
22 Feb 2025
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind
William Rudman
Michal Golovanevsky
Amir Bar
Vedant Palit
Yann LeCun
Carsten Eickhoff
Ritambhara Singh
LRM
193
4
0
21 Feb 2025
Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving
Xin Xu
Yan Xu
Tianhao Chen
Yuchen Yan
Chengwu Liu
...
Yansen Wang
Yichun Yin
Yijiao Wang
Lifeng Shang
Qiang Liu
LRM
190
3
0
17 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
398
2,033
0
22 Jan 2025
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
Shengbang Tong
David Fan
Jiachen Zhu
Yunyang Xiong
Xinlei Chen
Koustuv Sinha
Michael G. Rabbat
Yann LeCun
Saining Xie
Zhuang Liu
VLM
137
54
0
18 Dec 2024
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
Jihan Yang
Shusheng Yang
Anjali W. Gupta
Rilyn Han
Li Fei-Fei
Saining Xie
LRM
215
107
0
18 Dec 2024
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
Amrith Rajagopal Setlur
Chirag Nagpal
Adam Fisch
Xinyang Geng
Jacob Eisenstein
Rishabh Agarwal
Alekh Agarwal
Jonathan Berant
Aviral Kumar
OffRL
LRM
143
77
0
10 Oct 2024
Quantifying Generalization Complexity for Large Language Models
Zhenting Qi
Hongyin Luo
Xuliang Huang
Zhuokai Zhao
Yibo Jiang
Xiangjun Fan
Himabindu Lakkaraju
James Glass
LRM
ELM
89
7
0
02 Oct 2024
Law of the Weakest Link: Cross Capabilities of Large Language Models
Ming Zhong
Aston Zhang
Xuewei Wang
Rui Hou
Wenhan Xiong
...
Melanie Kambadur
Dhruv Mahajan
Sergey Edunov
Jiawei Han
Laurens van der Maaten
ELM
76
8
0
30 Sep 2024
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Charlie Snell
Jaehoon Lee
Kelvin Xu
Aviral Kumar
LRM
274
702
0
06 Aug 2024
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
Tian Ye
Zicheng Xu
Yuanzhi Li
Zeyuan Allen-Zhu
ReLM
LRM
65
59
0
29 Jul 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Shengbang Tong
Ellis L Brown
Penghao Wu
Sanghyun Woo
Manoj Middepogu
...
Xichen Pan
Austin Wang
Rob Fergus
Yann LeCun
Saining Xie
3DV
MLLM
166
377
0
24 Jun 2024
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
Yuexiang Zhai
Hao Bai
Zipeng Lin
Jiayi Pan
Shengbang Tong
...
Alane Suhr
Saining Xie
Yann LeCun
Yi-An Ma
Sergey Levine
LLMAG
LRM
143
81
0
16 May 2024
AlphaMath Almost Zero: Process Supervision without Process
Guoxin Chen
Minpeng Liao
Chengxi Li
Kai Fan
AIMat
LRM
100
113
0
06 May 2024
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
Ye Tian
Baolin Peng
Linfeng Song
Lifeng Jin
Dian Yu
Haitao Mi
Dong Yu
LRM
ReLM
112
85
0
18 Apr 2024
BRAVE: Broadening the visual encoding of vision-language models
Oğuzhan Fatih Kar
A. Tonioni
Petra Poklukar
Achin Kulshrestha
Amir Zamir
Federico Tombari
MLLM
VLM
80
32
0
10 Apr 2024
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
Zeyuan Allen-Zhu
Yuanzhi Li
KELM
76
70
0
08 Apr 2024
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
Yifei Zhou
Andrea Zanette
Jiayi Pan
Sergey Levine
Aviral Kumar
160
79
0
29 Feb 2024
V-STaR: Training Verifiers for Self-Taught Reasoners
Arian Hosseini
Xingdi Yuan
Nikolay Malkin
Rameswar Panda
Alessandro Sordoni
Rishabh Agarwal
ReLM
LRM
124
137
0
09 Feb 2024
V-IRL: Grounding Virtual Intelligence in Real Life
Jihan Yang
Runyu Ding
Ellis L Brown
Xiaojuan Qi
Saining Xie
LM&Ro
123
22
0
05 Feb 2024