2505.07961
Cited By
v1
v2
v3 (latest)
Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
12 May 2025
Xuechen Zhang
Zijian Huang
Chenshun Ni
Ziyang Xiong
Jiasi Chen
Samet Oymak
ReLM
LRM
Papers citing
"Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement"
23 / 23 papers shown
Title
Continuous Chain of Thought Enables Parallel Exploration and Reasoning
Halil Alperen Gozeten
M. E. Ildiz
Xuechen Zhang
Hrayr Harutyunyan
A. S. Rawat
Samet Oymak
LRM
46
0
0
29 May 2025
Scalable Chain of Thoughts via Elastic Reasoning
Yuhui Xu
Hanze Dong
Lei Wang
Doyen Sahoo
Junnan Li
Caiming Xiong
OffRL
LRM
102
8
0
08 May 2025
ThinkPrune: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning
Bairu Hou
Yang Zhang
Jiabao Ji
Yujian Liu
Kaizhi Qian
Jacob Andreas
Shiyu Chang
OffRL
LRM
103
35
0
02 Apr 2025
Test-Time Training Provably Improves Transformers as In-context Learners
Halil Alperen Gozeten
M. E. Ildiz
Xuechen Zhang
Mahdi Soltanolkotabi
Marco Mondelli
Samet Oymak
130
3
0
14 Mar 2025
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
Wenkai Yang
Shuming Ma
Yankai Lin
Furu Wei
LRM
97
50
0
25 Feb 2025
Small Models Struggle to Learn from Strong Reasoners
Yuetai Li
Xiang Yue
Zhangchen Xu
Fengqing Jiang
Luyao Niu
Bill Yuchen Lin
Bhaskar Ramasubramanian
Radha Poovendran
LRM
108
30
0
17 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
373
1,967
0
22 Jan 2025
Precise Length Control in Large Language Models
Bradley Butcher
Michael O'Keefe
James Titchener
KELM
105
6
0
16 Dec 2024
Selective Attention: Enhancing Transformer through Principled Context Control
Xuechen Zhang
Xiangyu Chang
Mingchen Li
Amit K. Roy-Chowdhury
Jiasi Chen
Samet Oymak
119
3
0
19 Nov 2024
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
An Yang
Beichen Zhang
Binyuan Hui
Bofei Gao
Bowen Yu
...
Mingfeng Xue
Runji Lin
Tianyu Liu
Xingzhang Ren
Zhenru Zhang
OSLM
LRM
103
310
0
18 Sep 2024
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Charlie Snell
Jaehoon Lee
Kelvin Xu
Aviral Kumar
LRM
185
681
0
06 Aug 2024
Following Length Constraints in Instructions
Weizhe Yuan
Ilia Kulikov
Ping Yu
Kyunghyun Cho
Sainbayar Sukhbaatar
Jason Weston
Jing Xu
FaML
ALM
74
25
0
25 Jun 2024
Language Model Cascades: Token-level uncertainty and beyond
Neha Gupta
Harikrishna Narasimhan
Wittawat Jitkrittum
A. S. Rawat
A. Menon
Sanjiv Kumar
UQLM
126
51
0
15 Apr 2024
Class-attribute Priors: Adapting Optimization to Heterogeneity and Fairness Objective
Xuechen Zhang
Mingchen Li
Jiasi Chen
Christos Thrampoulidis
Samet Oymak
86
3
0
25 Jan 2024
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
David Rein
Betty Li Hou
Asa Cooper Stickland
Jackson Petty
Richard Yuanzhe Pang
Julien Dirani
Julian Michael
Samuel R. Bowman
AI4MH
ELM
107
682
0
20 Nov 2023
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Ji Lin
Jiaming Tang
Haotian Tang
Shang Yang
Wei-Ming Chen
Wei-Chen Wang
Guangxuan Xiao
Xingyu Dang
Chuang Gan
Song Han
EDL
MQ
95
574
0
01 Jun 2023
Let's Verify Step by Step
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM
OffRL
LRM
193
1,228
0
31 May 2023
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
Lingjiao Chen
Matei A. Zaharia
James Zou
LLMAG
165
244
0
09 May 2023
Fast Inference from Transformers via Speculative Decoding
Yaniv Leviathan
Matan Kalman
Yossi Matias
LRM
147
724
0
30 Nov 2022
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Elias Frantar
Saleh Ashkboos
Torsten Hoefler
Dan Alistarh
MQ
129
989
0
31 Oct 2022
AutoBalance: Optimized Loss Functions for Imbalanced Data
Mingchen Li
Xuechen Zhang
Christos Thrampoulidis
Jiasi Chen
Samet Oymak
54
68
0
04 Jan 2022
Long-tail learning via logit adjustment
A. Menon
Sadeep Jayasumana
A. S. Rawat
Himanshu Jain
Andreas Veit
Sanjiv Kumar
123
710
0
14 Jul 2020
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Noam M. Shazeer
Azalia Mirhoseini
Krzysztof Maziarz
Andy Davis
Quoc V. Le
Geoffrey E. Hinton
J. Dean
MoE
251
2,683
0
23 Jan 2017