ResearchTrend.AI
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning

28 April 2025
Joykirat Singh
Raghav Magazine
Yash Pandya
A. Nambi
LLMAG, KELM, OffRL, LRM

Papers citing "Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning"

19 / 19 papers shown
Title
Distilling LLM Agent into Small Models with Retrieval and Code Tools
Minki Kang
Jongwon Jeong
Seanie Lee
Jaewoong Cho
Sung Ju Hwang
LRM
240
2
0
23 May 2025
Divide-Fuse-Conquer: Eliciting "Aha Moments" in Multi-Scenario Games
Xiaoqing Zhang
Huabin Zheng
Ang Lv
Yuhan Liu
Zirui Song
Flood Sung
Xiuying Chen
Rui Yan
OffRL, ReLM, LRM, AI4CE
94
0
0
22 May 2025
Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs
Yaorui Shi
Shihan Li
Chang Wu
Zhiyuan Liu
Sihang Li
Hengxing Cai
An Zhang
Xiang Wang
ReLM, LRM
143
0
0
16 May 2025
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Jiazhan Feng
Shijue Huang
Xingwei Qu
Ge Zhang
Yujia Qin
Baoquan Zhong
Chengquan Jiang
Jinxin Chi
Wanjun Zhong
OffRL, ReLM, SyDa, KELM, LRM
152
34
0
15 Apr 2025
Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions
Xinyi Hou
Yanjie Zhao
Shenao Wang
Haoyu Wang
89
34
0
30 Mar 2025
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
Mingyang Chen
Tianpeng Li
Haoze Sun
Yijie Zhou
Chenzheng Zhu
...
Xin Wu
Haofen Wang
Jeff Z. Pan
Wen Zhang
Ningyu Zhang
ReLM, OffRL, AI4TS, LRM
152
23
0
25 Mar 2025
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Huatong Song
Jinhao Jiang
Yingqian Min
Jie Chen
Zhongfu Chen
Wayne Xin Zhao
Lei Fang
Ji-Rong Wen
AI4TS, LRM, KELM
175
43
0
07 Mar 2025
Self-Training Large Language Models for Tool-Use Without Demonstrations
Ne Luo
Aryo Pradipta Gema
Xuanli He
Emile van Krieken
Pietro Lesci
Pasquale Minervini
LLMAG
147
2
0
09 Feb 2025
Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research
Junde Wu
Jiayuan Zhu
Yuyuan Liu
LRM
99
25
0
07 Feb 2025
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
Chaoqun He
Renjie Luo
Yuzhuo Bai
Shengding Hu
Zhen Leng Thai
...
Yuxiang Zhang
Jie Liu
Lei Qi
Zhiyuan Liu
Maosong Sun
ELM, AIMat
122
279
0
21 Feb 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLM, LRM
152
1,287
0
05 Feb 2024
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
389
4,163
0
29 May 2023
MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting
Tatsuro Inaba
Hirokazu Kiyomaru
Fei Cheng
Sadao Kurohashi
KELM, LRM
92
23
0
26 May 2023
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
Lei Wang
Wanyu Xu
Yihuai Lan
Zhiqiang Hu
Yunshi Lan
Roy Ka-wei Lee
Ee-Peng Lim
ReLM, LRM
114
356
0
06 May 2023
ART: Automatic multi-step reasoning and tool-use for large language models
Bhargavi Paranjape
Scott M. Lundberg
Sameer Singh
Hannaneh Hajishirzi
Luke Zettlemoyer
Marco Tulio Ribeiro
KELM, ReLM, LRM
91
152
0
16 Mar 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM, ALM
886
13,207
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro, LRM, AI4CE, ReLM
845
9,683
0
28 Jan 2022
Array Programming with NumPy
Charles R. Harris
K. Millman
S. Walt
R. Gommers
Pauli Virtanen
...
Tyler Reddy
Warren Weckesser
Hameer Abbasi
C. Gohlke
T. Oliphant
156
15,026
0
18 Jun 2020
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
541
19,296
0
20 Jul 2017