arXiv:2505.18454
Hybrid Latent Reasoning via Reinforcement Learning
24 May 2025
Zhenrui Yue
Bowen Jin
Huimin Zeng
Honglei Zhuang
Zhen Qin
Jinsung Yoon
Lanyu Shang
Jiawei Han
Dong Wang
OffRL
BDL
LRM
Papers citing "Hybrid Latent Reasoning via Reinforcement Learning"
39 / 39 papers shown
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Bowen Jin
Hansi Zeng
Zhenrui Yue
Jinsung Yoon
Sercan O. Arik
Dong Wang
Hamed Zamani
Jiawei Han
RALM
ReLM
KELM
OffRL
AI4TS
LRM
139
77
0
12 Mar 2025
CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation
Zhenyi Shen
Hanqi Yan
Linhai Zhang
Zhanghao Hu
Yali Du
Yulan He
LRM
118
19
0
28 Feb 2025
Vector-ICL: In-context Learning with Continuous Vector Representations
Yufan Zhuang
Chandan Singh
Liyuan Liu
Jingbo Shang
Jianfeng Gao
114
6
0
21 Feb 2025
LLM Pretraining with Continuous Concepts
Jihoon Tack
Jack Lanchantin
Jane Dwivedi-Yu
Andrew Cohen
Ilia Kulikov
Janice Lan
Shibo Hao
Yuandong Tian
Jason Weston
Xian Li
CLL
101
3
0
12 Feb 2025
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
DiJia Su
Hanlin Zhu
Yingchen Xu
Jiantao Jiao
Yuandong Tian
Qinqing Zheng
LRM
85
21
0
05 Feb 2025
Scalable Language Models with Posterior Inference of Latent Thought Vectors
Deqian Kong
Minglu Zhao
Dehong Xu
Bo Pang
Shu Wang
...
Zhangzhang Si
Chuan Li
Jianwen Xie
Sirui Xie
Ying Nian Wu
VLM
LRM
BDL
104
9
0
03 Feb 2025
Efficient Reasoning with Hidden Thinking
Xuan Shen
Yizhou Wang
Xiangxi Shi
Yanzhi Wang
Pu Zhao
Jiuxiang Gu
LRM
80
15
0
31 Jan 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
303
1,503
0
22 Jan 2025
Accelerating Inference of Networks in the Frequency Domain
Chenqiu Zhao
Guanfang Dong
Anup Basu
82
13
0
06 Oct 2024
Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries
Eden Biran
Daniela Gottesman
Sohee Yang
Mor Geva
Amir Globerson
LRM
67
32
0
18 Jun 2024
From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step
Yuntian Deng
Yejin Choi
Stuart M. Shieber
ReLM
LRM
57
69
0
23 May 2024
SimPO: Simple Preference Optimization with a Reference-Free Reward
Yu Meng
Mengzhou Xia
Danqi Chen
90
425
0
23 May 2024
MAmmoTH2: Scaling Instructions from the Web
Xiang Yue
Tuney Zheng
Ge Zhang
Wenhu Chen
ALM
LRM
62
97
0
06 May 2024
Let's Think Dot by Dot: Hidden Computation in Transformer Language Models
Jacob Pfau
William Merrill
Samuel R. Bowman
LRM
58
76
0
24 Apr 2024
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Shusheng Xu
Wei Fu
Jiaxuan Gao
Wenjie Ye
Weiling Liu
Zhiyu Mei
Guangju Wang
Chao Yu
Yi Wu
95
149
0
16 Apr 2024
Evidence-Driven Retrieval Augmented Response Generation for Online Misinformation
Zhenrui Yue
Huimin Zeng
Yimeng Lu
Lanyu Shang
Yang Zhang
Dong Wang
RALM
OffRL
62
21
0
22 Mar 2024
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Soham De
Samuel L. Smith
Anushan Fernando
Aleksandar Botev
George-Christian Muraru
...
David Budden
Yee Whye Teh
Razvan Pascanu
Nando de Freitas
Çağlar Gülçehre
Mamba
89
127
0
29 Feb 2024
Do Large Language Models Latently Perform Multi-Hop Reasoning?
Sohee Yang
E. Gribovskaya
Nora Kassner
Mor Geva
Sebastian Riedel
ReLM
LRM
91
98
0
26 Feb 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLM
LRM
96
953
0
05 Feb 2024
Implicit Chain of Thought Reasoning via Knowledge Distillation
Yuntian Deng
Kiran Prasad
Roland Fernandez
P. Smolensky
Vishrav Chaudhary
Stuart M. Shieber
ReLM
LRM
42
51
0
02 Nov 2023
Think before you speak: Training Language Models With Pause Tokens
Sachin Goyal
Ziwei Ji
A. S. Rawat
A. Menon
Sanjiv Kumar
Vaishnavh Nagarajan
LRM
82
114
0
03 Oct 2023
Let's Verify Step by Step
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM
OffRL
LRM
126
1,044
0
31 May 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
300
3,712
0
29 May 2023
Resurrecting Recurrent Neural Networks for Long Sequences
Antonio Orvieto
Samuel L. Smith
Albert Gu
Anushan Fernando
Çağlar Gülçehre
Razvan Pascanu
Soham De
240
282
0
11 Mar 2023
Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions
H. Trivedi
Niranjan Balasubramanian
Tushar Khot
Ashish Sabharwal
KELM
RALM
LRM
78
441
0
20 Dec 2022
Text Embeddings by Weakly-Supervised Contrastive Pre-training
Liang Wang
Nan Yang
Xiaolong Huang
Binxing Jiao
Linjun Yang
Daxin Jiang
Rangan Majumder
Furu Wei
VLM
191
576
0
07 Dec 2022
Measuring and Narrowing the Compositionality Gap in Language Models
Ofir Press
Muru Zhang
Sewon Min
Ludwig Schmidt
Noah A. Smith
M. Lewis
ReLM
KELM
LRM
127
595
0
07 Oct 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
730
12,525
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
616
9,009
0
28 Jan 2022
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
214
4,175
0
27 Oct 2021
LoRA: Low-Rank Adaptation of Large Language Models
Edward J. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
312
10,099
0
17 Jun 2021
Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
D. Song
Jacob Steinhardt
ReLM
FaML
134
2,109
0
05 Mar 2021
Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps
Xanh Ho
A. Nguyen
Saku Sugawara
Akiko Aizawa
RALM
LRM
54
425
0
02 Nov 2020
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
D. Song
Jacob Steinhardt
ELM
RALM
155
4,222
0
07 Sep 2020
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Zhilin Yang
Peng Qi
Saizheng Zhang
Yoshua Bengio
William W. Cohen
Ruslan Salakhutdinov
Christopher D. Manning
RALM
128
2,577
0
25 Sep 2018
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Peter Clark
Isaac Cowhey
Oren Etzioni
Tushar Khot
Ashish Sabharwal
Carissa Schoenick
Oyvind Tafjord
ELM
RALM
LRM
113
2,474
0
14 Mar 2018
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
288
18,685
0
20 Jul 2017
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
Mandar Joshi
Eunsol Choi
Daniel S. Weld
Luke Zettlemoyer
RALM
187
2,610
0
09 May 2017
Asynchronous Methods for Deep Reinforcement Learning
Volodymyr Mnih
Adria Puigdomenech Badia
M. Berk Mirza
Alex Graves
Timothy Lillicrap
Tim Harley
David Silver
Koray Kavukcuoglu
170
8,805
0
04 Feb 2016