2505.15107
Cited By
v1
v2 (latest)
StepSearch: Igniting LLMs Search Ability via Step-Wise Proximal Policy Optimization
21 May 2025
Ziliang Wang
Xuhui Zheng
Kang An
Cijun Ouyang
Jialu Cai
Yuhang Wang
Yichao Wu
LRM
Papers citing
"StepSearch: Igniting LLMs Search Ability via Step-Wise Proximal Policy Optimization"
6 / 6 papers shown
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Jingyi Zhang
Jiaxing Huang
Huanjin Yao
Shunyu Liu
Xikun Zhang
Shijian Lu
Dacheng Tao
LRM
142
73
0
17 Mar 2025
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Huatong Song
Jinhao Jiang
Yingqian Min
Jie Chen
Zhongfu Chen
Wayne Xin Zhao
Lei Fang
Ji-Rong Wen
AI4TS
LRM
KELM
192
43
0
07 Mar 2025
Chain-of-Thought Matters: Improving Long-Context Language Models with Reasoning Path Supervision
Dawei Zhu
Xiyu Wei
Guangxiang Zhao
Wenhao Wu
Haosheng Zou
Junfeng Ran
Xun Wang
Lin Sun
Xiangzheng Zhang
Sujian Li
LRM
132
3
0
28 Feb 2025
Uncertainty-Aware Step-wise Verification with Generative Reward Models
Zihuiwen Ye
Luckeciano C. Melo
Younesse Kaddar
Phil Blunsom
Shivalika Singh
Yarin Gal
LRM
144
5
0
16 Feb 2025
Chain-of-Retrieval Augmented Generation
Liang Wang
Haonan Chen
Nan Yang
Xiaolong Huang
Zhicheng Dou
Furu Wei
RALM
LRM
ReLM
3DV
144
7
0
24 Jan 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi Team
Angang Du
Bofei Gao
Bowei Xing
Changjiu Jiang
...
Zihao Huang
Ziyao Xu
Zhiyong Yang
Zonghan Yang
Zongyu Lin
OffRL
ALM
AI4TS
VLM
LRM
343
338
0
22 Jan 2025