Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2409.19256
Cited By
HybridFlow: A Flexible and Efficient RLHF Framework
28 September 2024
Guangming Sheng
Chi Zhang
Zilingfeng Ye
Xibin Wu
Wang Zhang
Ru Zhang
Size Zheng
Haibin Lin
Chuan Wu
AI4CE
Re-assign community
ArXiv
PDF
HTML
Papers citing
"HybridFlow: A Flexible and Efficient RLHF Framework"
29 / 29 papers shown
Title
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
Songjun Tu
Jiahao Lin
Qichao Zhang
Xiangyu Tian
Linjing Li
Xiangyuan Lan
Dongbin Zhao
OffRL
ReLM
LRM
17
0
0
16 May 2025
Disentangling Reasoning and Knowledge in Medical Large Language Models
Rahul Thapa
Qingyang Wu
Kevin Wu
Harrison Zhang
Angela Zhang
...
Joseph Boen
Shriya Reddy
Ben Athiwaratkun
Shuaiwen Leon Song
James Zou
ELM
AI4MH
LM&MA
LRM
24
0
0
16 May 2025
Beyond Áha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Zhiyuan Hu
Yixuan Wang
Hanze Dong
Yuhui Xu
Amrita Saha
Caiming Xiong
Bryan Hooi
Junnan Li
LRM
21
0
0
15 May 2025
Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent
Ziyang Huang
Xiaowei Yuan
Yiming Ju
Jun Zhao
Kang Liu
RALM
KELM
26
0
0
12 May 2025
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
Jiarui Yao
Yifan Hao
Hanning Zhang
Hanze Dong
Wei Xiong
Nan Jiang
Tong Zhang
LRM
62
0
0
05 May 2025
RM-R1: Reward Modeling as Reasoning
Xiusi Chen
Gaotang Li
Zehua Wang
Bowen Jin
Cheng Qian
...
Y. Zhang
D. Zhang
Tong Zhang
Hanghang Tong
Heng Ji
ReLM
OffRL
LRM
165
1
0
05 May 2025
Bielik v3 Small: Technical Report
Krzysztof Ociepa
Łukasz Flis
Remigiusz Kinas
Krzysztof Wróbel
Adrian Gwoździej
27
0
0
05 May 2025
Think on your Feet: Adaptive Thinking via Reinforcement Learning for Social Agents
Minzheng Wang
Y. Li
Haozhao Wang
Xinghua Zhang
Nan Xu
Bingli Wu
Fei Huang
Haiyang Yu
Wenji Mao
LLMAG
LRM
43
1
0
04 May 2025
ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning
Jingyang Yi
Jiazheng Wang
Sida Li
ReLM
OODD
LRM
144
1
0
30 Apr 2025
Phi-4-reasoning Technical Report
Marah Abdin
Sahaj Agarwal
Ahmed Hassan Awadallah
Vidhisha Balachandran
Harkirat Singh Behl
...
Vaishnavi Shrivastava
Vibhav Vineet
Yue Wu
Safoora Yousefi
Guoqing Zheng
ReLM
LRM
87
0
0
30 Apr 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
L. Liu
...
Jianfeng Gao
Weizhu Chen
S. Wang
Simon S. Du
Yelong Shen
OffRL
ReLM
LRM
118
4
0
29 Apr 2025
Fast-Slow Thinking for Large Vision-Language Model Reasoning
W. L. Xiao
Leilei Gan
Weilong Dai
Wanggui He
Ziwei Huang
...
Fangxun Shu
Zhelun Yu
Peng Zhang
Hao Jiang
Fei Wu
ReLM
LRM
AI4CE
164
1
0
25 Apr 2025
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Yang Yue
Zhiqi Chen
Rui Lu
Andrew Zhao
Zhaokai Wang
Yang Yue
Shiji Song
Gao Huang
ReLM
LRM
55
12
0
18 Apr 2025
Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning
Syeda Nahida Akter
Shrimai Prabhumoye
Matvei Novikov
Seungju Han
Ying Lin
...
Eric Nyberg
Yejin Choi
M. Patwary
M. Shoeybi
Bryan Catanzaro
ReLM
OffRL
LRM
158
0
1
15 Apr 2025
Better Estimation of the KL Divergence Between Language Models
Afra Amini
Tim Vieira
Ryan Cotterell
46
0
0
14 Apr 2025
PathVLM-R1: A Reinforcement Learning-Driven Reasoning Model for Pathology Visual-Language Tasks
Jian Wu
Hao Yang
Xinhua Zeng
Guibing He
Zhengzhang Chen
Z. Li
Xinming Zhang
Yangyang Ma
Run Fang
Yang Liu
LRM
133
0
0
12 Apr 2025
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
Taiwei Shi
Yiyang Wu
Linxin Song
Dinesh Manocha
Jieyu Zhao
LRM
78
1
0
07 Apr 2025
Do Theory of Mind Benchmarks Need Explicit Human-like Reasoning in Language Models?
Yi-Long Lu
Chunhui Zhang
Jiajun Song
Lifeng Fan
Wei Wang
OffRL
48
0
0
02 Apr 2025
ToRL: Scaling Tool-Integrated RL
Xuefeng Li
Haoyang Zou
Pengfei Liu
OffRL
LRM
44
3
0
30 Mar 2025
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
M. Ben-Chen
Tianpeng Li
Haoze Sun
Yijie Zhou
Chenzheng Zhu
...
Xin Wu
Haofen Wang
Jeff Z. Pan
Wen Zhang
H. Chen
ReLM
OffRL
AI4TS
LRM
69
6
0
25 Mar 2025
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Weihao Zeng
Yuzhen Huang
Qian Liu
Wei Liu
Keqing He
Zejun Ma
Junxian He
OffRL
ReLM
LRM
91
38
0
24 Mar 2025
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Nvidia
A. Azzolini
Junjie Bai
Prithvijit Chattopadhyay
Huayu Chen
...
Xiaodong Yang
Zhuolin Yang
Jingyang Zhang
Xiaohui Zeng
Zhe Zhang
AI4CE
LM&Ro
LRM
54
5
0
18 Mar 2025
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Bowen Jin
Hansi Zeng
Zhenrui Yue
Dong Wang
Sercan Ö. Arik
Dong Wang
Hamed Zamani
J. Han
RALM
ReLM
KELM
OffRL
AI4TS
LRM
84
24
0
12 Mar 2025
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
Wenxuan Huang
Bohan Jia
Zijie Zhai
Shaosheng Cao
Zheyu Ye
Fei Zhao
Zhe Xu
Yao Hu
Shaohui Lin
MU
OffRL
LRM
MLLM
ReLM
VLM
59
41
0
09 Mar 2025
An Empirical Study on Eliciting and Improving R1-like Reasoning Models
Z. Chen
Yingqian Min
Beichen Zhang
Jie Chen
Jinhao Jiang
...
Xu Miao
Yaojie Lu
Lei Fang
Zhongyuan Wang
Ji-Rong Wen
ReLM
OffRL
LRM
81
16
0
06 Mar 2025
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Kanishk Gandhi
Ayush Chakravarthy
Anikait Singh
Nathan Lile
Noah D. Goodman
ReLM
LRM
93
30
0
03 Mar 2025
Autellix: An Efficient Serving Engine for LLM Agents as General Programs
Michael Luo
Xiaoxiang Shi
Colin Cai
Tianjun Zhang
Justin Wong
...
Chi Wang
Yanping Huang
Zhifeng Chen
Joseph E. Gonzalez
Ion Stoica
55
3
0
20 Feb 2025
Flaming-hot Initiation with Regular Execution Sampling for Large Language Models
Weizhe Chen
Zhicheng Zhang
Guanlin Liu
Renjie Zheng
Wenlei Shi
Chen Dun
Zheng Wu
Xing Jin
Lin Yan
ALM
LRM
51
1
0
17 Feb 2025
FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning
Xupeng Miao
Gabriele Oliaro
Xinhao Cheng
Vineeth Kada
Ruohan Gao
...
April Yang
Yingcheng Wang
Mengdi Wu
Colin Unger
Zhihao Jia
MoE
94
9
0
29 Feb 2024
1