2505.11080
Cited By
v1
v2 (latest)
BLEUBERI: BLEU is a surprisingly effective reward for instruction following
16 May 2025
Yapei Chang
Yekyung Kim
Michael Krumdick
Amir Zadeh
Chuan Li
Chris Tanner
Mohit Iyyer
ALM
Papers citing
"BLEUBERI: BLEU is a surprisingly effective reward for instruction following"
10 / 10 papers shown
Title
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math
Haoran Xu
Baolin Peng
Hany Awadalla
DongDong Chen
Yen-Chun Chen
...
Yelong Shen
Shuaiqiang Wang
Weijian Xu
Jianfeng Gao
Weizhu Chen
ReLM
LRM
185
5
0
30 Apr 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
Liu Liu
...
Jianfeng Gao
Weizhu Chen
Shuaiqiang Wang
Simon Shaolei Du
Yelong Shen
OffRL
ReLM
LRM
364
47
0
29 Apr 2025
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
Zihan Wang
Kaidi Wang
Q. Wang
Pingyue Zhang
Linjie Li
...
Jiajun Wu
L. Fei-Fei
Lijuan Wang
Yejin Choi
Pengfei Yu
295
30
0
24 Apr 2025
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Hardy Chen
Haoqin Tu
Fali Wang
Hui Liu
Xianfeng Tang
Xinya Du
Yuyin Zhou
Cihang Xie
ReLM
VLM
OffRL
LRM
186
36
0
10 Apr 2025
LExT: Towards Evaluating Trustworthiness of Natural Language Explanations
Krithi Shailya
Shreya Rajpal
Gokul S Krishnan
Balaraman Ravindran
ELM
132
1
0
08 Apr 2025
Learning to Reason for Long-Form Story Generation
Alexander Gurung
Mirella Lapata
ReLM
OffRL
LRM
134
3
0
28 Mar 2025
Understanding R1-Zero-Like Training: A Critical Perspective
Zichen Liu
Changyu Chen
Wenjun Li
Penghui Qi
Tianyu Pang
Chao Du
Wee Sun Lee
Jialin Li
OffRL
LRM
248
172
0
26 Mar 2025
Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning
Sheng Zhang
Qianchu Liu
Guanghui Qin
Tristan Naumann
Hoifung Poon
ReLM
OffRL
LRM
141
9
0
27 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
481
2,033
0
22 Jan 2025
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Zayne Sprague
Fangcong Yin
Juan Diego Rodriguez
Dongwei Jiang
Manya Wadhwa
Prasann Singhal
Xinyu Zhao
Xi Ye
Kyle Mahowald
Greg Durrett
ReLM
LRM
257
132
0
18 Sep 2024