ResearchTrend.AI



Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency

10 June 2025
Chenlong Wang
Yuanning Feng
Dongping Chen
Zhaoyang Chu
Ranjay Krishna
Tianyi Zhou
Abstract

Recent advances in large reasoning models have enabled complex, step-by-step reasoning but often introduce significant overthinking, resulting in verbose and redundant outputs that hinder efficiency. In this study, we examine whether explicit self-reflection, signaled by tokens such as "Wait" and "Hmm", is necessary for advanced reasoning. We propose NoWait, a simple yet effective approach that disables explicit self-reflection by suppressing these tokens during inference. Extensive experiments on ten benchmarks across textual, visual, and video reasoning tasks show that NoWait reduces chain-of-thought trajectory length by 27%–51% across five R1-style model series, without compromising model utility. NoWait thus offers a plug-and-play solution for efficient and utility-preserving multimodal reasoning.
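The core mechanism the abstract describes — suppressing self-reflection tokens at inference time — amounts to masking the logits of banned token ids at each decoding step so they can never be sampled. The sketch below illustrates that idea with a toy vocabulary and pure-Python greedy decoding; the token ids, scores, and function names are illustrative assumptions, not the authors' implementation (a real setup would resolve the ids of "Wait"/"Hmm" via the model's tokenizer).

```python
import math

# Hypothetical ids for reflection keywords such as "Wait" / "Hmm".
# In practice these would come from the tokenizer, e.g. tokenizer.encode("Wait").
REFLECTION_TOKEN_IDS = {3, 7}

def nowait_filter(logits, banned_ids=REFLECTION_TOKEN_IDS):
    """Mask banned-token logits to -inf so neither greedy decoding nor
    sampling can ever emit them (the NoWait-style suppression idea)."""
    return [-math.inf if i in banned_ids else x for i, x in enumerate(logits)]

def greedy_pick(logits):
    """Pick the highest-scoring token id (greedy decoding for one step)."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Toy per-step scores over an 8-token vocabulary.
step_logits = [0.2, 1.1, 0.5, 4.0, 0.9, 0.1, 0.3, 2.5]

assert greedy_pick(step_logits) == 3                  # the "Wait" token would win...
assert greedy_pick(nowait_filter(step_logits)) == 1   # ...but is suppressed
```

In frameworks such as Hugging Face `transformers`, the same effect is typically achieved with a logits processor (or the `bad_words_ids` generation option) applied at every decoding step, rather than by hand-rolled masking.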

@article{wang2025_2506.08343,
  title={Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency},
  author={Chenlong Wang and Yuanning Feng and Dongping Chen and Zhaoyang Chu and Ranjay Krishna and Tianyi Zhou},
  journal={arXiv preprint arXiv:2506.08343},
  year={2025}
}
Main: 7 pages · Bibliography: 3 pages · Appendix: 14 pages · 12 figures · 6 tables