2502.19655
Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning
27 February 2025
Sheng Zhang
Qianchu Liu
Guanghui Qin
Tristan Naumann
Hoifung Poon
ReLM
OffRL
LRM
Papers citing
"Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning"
8 / 8 papers shown
Title
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps
Sicheng Feng
Song Wang
Shuyi Ouyang
Lingdong Kong
Zikai Song
Jianke Zhu
Huan Wang
Xinchao Wang
LRM
14
0
0
24 May 2025
Reasoning BO: Enhancing Bayesian Optimization with Long-Context Reasoning Power of LLMs
Zhuo Yang
Lingli Ge
Dong Han
Tianfan Fu
Yuqiang Li
42
0
0
19 May 2025
BLEUBERI: BLEU is a surprisingly effective reward for instruction following
Yapei Chang
Yekyung Kim
Michael Krumdick
Amir Zadeh
Chuan Li
Chris Tanner
Mohit Iyyer
ALM
70
0
0
16 May 2025
Open-Medical-R1: How to Choose Data for RLVR Training at Medicine Domain
Zhongxi Qiu
Zhang Zhang
Yan Hu
Heng Li
Jiang-Dong Liu
OffRL
321
0
0
16 Apr 2025
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
Andreas Hochlehnert
Hardik Bhatnagar
Vishaal Udandarao
Samuel Albanie
Ameya Prabhu
Matthias Bethge
ReLM
ALM
LRM
129
10
0
09 Apr 2025
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu
Yuexiang Zhai
Jihan Yang
Shengbang Tong
Saining Xie
Dale Schuurmans
Quoc V. Le
Sergey Levine
Yi-An Ma
OffRL
92
85
0
28 Jan 2025
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Jian Hu
Xibin Wu
Weixun Wang
OpenLLMAI Team
Dehao Zhang
Yu Cao
AI4CE
VLM
46
108
0
20 May 2024
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
196
18,685
0
20 Jul 2017