ResearchTrend.AI
arXiv: 2503.22732
Reasoning Beyond Limits: Advances and Open Problems for LLMs


26 March 2025
M. Ferrag
Norbert Tihanyi
Merouane Debbah
ELM, OffRL, LRM, AI4CE

Papers citing "Reasoning Beyond Limits: Advances and Open Problems for LLMs"

50 / 96 papers shown
From Emergence to Control: Probing and Modulating Self-Reflection in Language Models
Xudong Zhu
Jiachen Jiang
Mohammad Mahdi Khalili
Zhihui Zhu
ReLM, LM&Ro, LRM
60
0
0
13 Jun 2025
PhysicsArena: The First Multimodal Physics Reasoning Benchmark Exploring Variable, Process, and Solution Dimensions
Song Dai
Yibo Yan
Jiamin Su
Dongfang Zihao
Yubo Gao
...
Jungang Li
Junyan Zhang
Sicheng Tao
Zhuoran Gao
Xuming Hu
LRM, AI4CE
68
0
0
21 May 2025
WebThinker: Empowering Large Reasoning Models with Deep Research Capability
Xiaochen Li
Jiajie Jin
Guanting Dong
Hongjin Qian
Yutao Zhu
Yongkang Wu
Ji-Rong Wen
Zhicheng Dou
LLMAG, LRM
199
19
0
30 Apr 2025
Survey on Evaluation of LLM-based Agents
Asaf Yehudai
Lilach Eden
Alan Li
Guy Uziel
Yilun Zhao
Roy Bar-Haim
Arman Cohan
Michal Shmueli-Scheuer
LLMAG, ELM
Presented at ResearchTrend Connect | LLMAG on 07 May 2025
204
14
0
20 Mar 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Qiying Yu
Zheng Zhang
Ruofei Zhu
Yufeng Yuan
Xiaochen Zuo
...
Ya Zhang
Lin Yan
Mu Qiao
Yonghui Wu
Mingxuan Wang
OffRL, LRM
251
217
0
18 Mar 2025
DIMSUM: Discourse in Mathematical Reasoning as a Supervision Module
Krish Sharma
Niyar R. Barman
Nicholas M. Asher
Akshay Chaturvedi
LRM, AIMat
130
15
0
06 Mar 2025
Qwen2.5-VL Technical Report
S. Bai
Keqin Chen
Xuejing Liu
Jialin Wang
Wenbin Ge
...
Zesen Cheng
Hang Zhang
Zhibo Yang
Haiyang Xu
Junyang Lin
VLM
449
699
0
20 Feb 2025
Process Reinforcement through Implicit Rewards
Ganqu Cui
Lifan Yuan
Ziyi Wang
Hanbin Wang
Wendi Li
...
Yu Cheng
Zhiyuan Liu
Maosong Sun
Bowen Zhou
Ning Ding
OffRL, LRM
197
103
0
03 Feb 2025
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu
Yuexiang Zhai
Jihan Yang
Shengbang Tong
Saining Xie
Dale Schuurmans
Quoc V. Le
Sergey Levine
Yi-An Ma
OffRL
253
128
0
28 Jan 2025
Chain-of-Retrieval Augmented Generation
Liang Wang
Haonan Chen
Nan Yang
Xiaolong Huang
Zhicheng Dou
Furu Wei
RALM, LRM, ReLM, 3DV
157
7
0
24 Jan 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM, VLM, OffRL, AI4TS, LRM
398
2,034
0
22 Jan 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi Team
Angang Du
Bofei Gao
Bowei Xing
Changjiu Jiang
...
Zihao Huang
Ziyao Xu
Zhiyong Yang
Zonghan Yang
Zongyu Lin
OffRL, ALM, AI4TS, VLM, LRM
355
338
0
22 Jan 2025
UI-TARS: Pioneering Automated GUI Interaction with Native Agents
Yujia Qin
Yining Ye
Junjie Fang
Han Wang
Shihao Liang
...
Haifeng Liu
F. Lin
Tao Peng
Xin Liu
Guang Shi
LLMAG, LM&Ro
118
69
0
21 Jan 2025
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Xinyu Guan
Lefei Zhang
Yifei Liu
Ning Shang
Youran Sun
Yi Zhu
Fan Yang
Mao Yang
LRM, SyDa, ReLM
154
133
0
08 Jan 2025
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Bradley Brown
Jordan Juravsky
Ryan Ehrlich
Ronald Clark
Quoc V. Le
Christopher Ré
Azalia Mirhoseini
ALM, LRM
317
331
0
03 Jan 2025
Enhancing Code LLMs with Reinforcement Learning in Code Generation: A Survey
Junqiao Wang
Zeng Zhang
Yangfan He
Yuyang Song
Tianyu Shi
...
Tang Jingqun
Guangwu Qian
Keqin Li
Qiuwu Chen
Lewei He
149
22
0
29 Dec 2024
Malware Classification using a Hybrid Hidden Markov Model-Convolutional Neural Network
Ritik Mehta
Olha Jurecková
Mark Stamp
137
61
0
25 Dec 2024
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Benjamin Warner
Antoine Chaffin
Benjamin Clavié
Orion Weller
Oskar Hallström
...
Tom Aarsen
Nathan Cooper
Griffin Adams
Jeremy Howard
Iacopo Poli
164
130
0
18 Dec 2024
Phi-4 Technical Report
Marah Abdin
J. Aneja
Harkirat Singh Behl
Sébastien Bubeck
Ronen Eldan
...
Rachel A. Ward
Yue Wu
Dingli Yu
Cyril Zhang
Yi Zhang
ALM, SyDa
195
154
0
12 Dec 2024
Reinforcement Learning Enhanced LLMs: A Survey
Shuhe Wang
Shengyu Zhang
Jing Zhang
Runyi Hu
Xiaoya Li
Tianwei Zhang
Jiwei Li
Leilei Gan
G. Wang
Eduard H. Hovy
OffRL
260
16
0
05 Dec 2024
Free Process Rewards without Process Labels
Lifan Yuan
Wendi Li
Huayu Chen
Ganqu Cui
Ning Ding
Kaiyan Zhang
Bowen Zhou
Ziqiang Liu
Hao Peng
OffRL
129
65
0
02 Dec 2024
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
Xingwu Sun
Yanfeng Chen
Yanwen Huang
Ruobing Xie
Jiaqi Zhu
...
Zhanhui Kang
Yong Yang
Yuhong Liu
Di Wang
Jie Jiang
MoE, ALM, ELM
167
34
0
04 Nov 2024
A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration
Yingqian Cui
Pengfei He
Xianfeng Tang
Qi He
Chen Luo
Jiliang Tang
Yue Xing
LRM
83
9
0
21 Oct 2024
Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling
Wenyuan Xu
Rujun Han
Zhenting Wang
L. Le
Dhruv Madeka
Lei Li
Wenjie Wang
Rishabh Agarwal
Chen-Yu Lee
Tomas Pfister
205
11
0
15 Oct 2024
Thinking LLMs: General Instruction Following with Thought Generation
Tianhao Wu
Janice Lan
Weizhe Yuan
Jiantao Jiao
Jason Weston
Sainbayar Sukhbaatar
LRM
81
22
0
14 Oct 2024
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
Amrith Rajagopal Setlur
Chirag Nagpal
Adam Fisch
Xinyang Geng
Jacob Eisenstein
Rishabh Agarwal
Alekh Agarwal
Jonathan Berant
Aviral Kumar
OffRL, LRM
132
77
0
10 Oct 2024
Pyramidal Flow Matching for Efficient Video Generative Modeling
Yang Jin
Zhicheng Sun
Ningyuan Li
Kun Xu
K. Xu
...
Nan Zhuang
Quzhe Huang
Yang Song
Yadong Mu
Zhouchen Lin
VGen
173
87
0
08 Oct 2024
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
Di Zhang
Jianbo Wu
Jingdi Lei
Tong Che
Jiatong Li
...
Shufei Zhang
Marco Pavone
Yuqiang Li
Wanli Ouyang
Dongzhan Zhou
LRM
106
61
0
03 Oct 2024
Contextual Document Embeddings
John X. Morris
Alexander M. Rush
99
9
0
03 Oct 2024
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Jintao Zhang
Jia Wei
Pengle Zhang
Jun-Jie Zhu
Jun Zhu
Jianfei Chen
VLM, MQ
186
39
0
03 Oct 2024
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
Jonas Gehring
Kunhao Zheng
Jade Copet
Vegard Mella
Taco Cohen
Gabriel Synnaeve
LLMAG
58
36
0
02 Oct 2024
Not All LLM Reasoners Are Created Equal
Arian Hosseini
Alessandro Sordoni
Daniel Toyama
Rameswar Panda
Rishabh Agarwal
LRM
111
15
0
02 Oct 2024
The Perfect Blend: Redefining RLHF with Mixture of Judges
Tengyu Xu
Eryk Helenowski
Karthik Abinav Sankararaman
Di Jin
Kaiyan Peng
...
Gabriel Cohen
Yuandong Tian
Hao Ma
Sinong Wang
Han Fang
142
14
0
30 Sep 2024
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Matt Deitke
Christopher Clark
Sangho Lee
Rohun Tripathi
Yue Yang
...
Noah A. Smith
Hannaneh Hajishirzi
Ross Girshick
Ali Farhadi
Aniruddha Kembhavi
OSLM, VLM
130
13
0
25 Sep 2024
Direct Judgement Preference Optimization
Peifeng Wang
Austin Xu
Yilun Zhou
Caiming Xiong
Shafiq Joty
ELM
114
13
0
23 Sep 2024
Moshi: a speech-text foundation model for real-time dialogue
Alexandre Défossez
Laurent Mazaré
Manu Orsini
Amélie Royer
P. Pérez
Hervé Jégou
Edouard Grave
Neil Zeghidour
AuLLM
165
150
0
17 Sep 2024
Critique-out-Loud Reward Models
Zachary Ankner
Mansheej Paul
Brandon Cui
Jonathan D. Chang
Prithviraj Ammanabrolu
ALM, LRM
110
38
0
21 Aug 2024
HMoE: Heterogeneous Mixture of Experts for Language Modeling
An Wang
Xingwu Sun
Ruobing Xie
Shuaipeng Li
Jiaqi Zhu
...
J. N. Han
Zhanhui Kang
Di Wang
Naoaki Okazaki
Cheng-zhong Xu
MoE
127
18
0
20 Aug 2024
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
Pranav Putta
Edmund Mills
Naman Garg
S. Motwani
Chelsea Finn
Divyansh Garg
Rafael Rafailov
LLMAG, LRM
103
88
0
13 Aug 2024
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
Karel D'Oosterlinck
Winnie Xu
Chris Develder
Thomas Demeester
A. Singh
Christopher Potts
Douwe Kiela
Shikib Mehri
80
17
0
12 Aug 2024
Self-Taught Evaluators
Tianlu Wang
Ilia Kulikov
O. Yu. Golovneva
Ping Yu
Weizhe Yuan
Jane Dwivedi-Yu
Richard Yuanzhe Pang
Maryam Fazel-Zarandi
Jason Weston
Xian Li
ALM, LRM
84
27
0
05 Aug 2024
MiniCPM-V: A GPT-4V Level MLLM on Your Phone
Yuan Yao
Tianyu Yu
Ao Zhang
Chongyi Wang
Junbo Cui
...
Xu Han
Guoyang Zeng
Dahai Li
Zhiyuan Liu
Maosong Sun
VLM, MLLM
149
481
0
03 Aug 2024
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
Tianhao Wu
Weizhe Yuan
O. Yu. Golovneva
Jing Xu
Yuandong Tian
Jiantao Jiao
Jason Weston
Sainbayar Sukhbaatar
ALM, KELM, LRM
142
96
0
28 Jul 2024
A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More
Zhichao Wang
Bin Bi
Shiva K. Pentyala
Kiran Ramnath
Sougata Chaudhuri
...
Z. Zhu
Xiang-Bo Mao
S. Asur
Na Cheng
OffRL
97
58
0
23 Jul 2024
Understanding Reference Policies in Direct Preference Optimization
Yixin Liu
Pengfei Liu
Arman Cohan
73
11
0
18 Jul 2024
Reasoning with Large Language Models, a Survey
Aske Plaat
Annie Wong
Suzan Verberne
Joost Broekens
Niki van Stein
Thomas Back
OffRL, LRM, AI4CE, ReLM
61
73
0
16 Jul 2024
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
John Dang
Arash Ahmadian
Kelly Marchisio
Julia Kreutzer
Ahmet Üstün
Sara Hooker
103
28
0
02 Jul 2024
Agentless: Demystifying LLM-based Software Engineering Agents
Chunqiu Steven Xia
Yinlin Deng
Soren Dunn
Lingming Zhang
LLMAG
118
121
0
01 Jul 2024
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems
Philippe Laban
Alexander R. Fabbri
Caiming Xiong
Chien-Sheng Wu
RALM
124
51
0
01 Jul 2024
Searching for Best Practices in Retrieval-Augmented Generation
Xiaohua Wang
Zhenghua Wang
Xuan Gao
Feiran Zhang
Yixin Wu
...
Qi Qian
Ruicheng Yin
Changze Lv
Xiaoqing Zheng
Xuanjing Huang
113
62
0
01 Jul 2024