ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2503.22732
  4. Cited By
Reasoning Beyond Limits: Advances and Open Problems for LLMs

Reasoning Beyond Limits: Advances and Open Problems for LLMs

26 March 2025
M. Ferrag
Norbert Tihanyi
Merouane Debbah
    ELMOffRLLRMAI4CE
ArXiv (abs)PDFHTML

Papers citing "Reasoning Beyond Limits: Advances and Open Problems for LLMs"

46 / 96 papers shown
Title
Meta Large Language Model Compiler: Foundation Models of Compiler
  Optimization
Meta Large Language Model Compiler: Foundation Models of Compiler Optimization
Chris Cummins
Volker Seeker
Dejan Grubisic
Baptiste Roziere
Jonas Gehring
Gabriel Synnaeve
Hugh Leather
100
29
0
27 Jun 2024
Following Length Constraints in Instructions
Following Length Constraints in Instructions
Weizhe Yuan
Ilia Kulikov
Ping Yu
Kyunghyun Cho
Sainbayar Sukhbaatar
Jason Weston
Jing Xu
FaMLALM
105
26
0
25 Jun 2024
Learn Beyond The Answer: Training Language Models with Reflection for
  Mathematical Reasoning
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning
Zhihan Zhang
Zhenwen Liang
Wenhao Yu
Dian Yu
Mengzhao Jia
Dong Yu
Meng Jiang
AIMatRALMLRMReLM
93
16
0
17 Jun 2024
Iterative Length-Regularized Direct Preference Optimization: A Case
  Study on Improving 7B Language Models to GPT-4 Level
Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level
Jie Liu
Zhanhui Zhou
Jiaheng Liu
Xingyuan Bu
Chao Yang
Han-Sen Zhong
Wanli Ouyang
74
21
0
17 Jun 2024
Mixture-of-Agents Enhances Large Language Model Capabilities
Mixture-of-Agents Enhances Large Language Model Capabilities
Junlin Wang
Jue Wang
Ben Athiwaratkun
Ce Zhang
James Zou
LLMAGAIFin
107
138
0
07 Jun 2024
Improve Mathematical Reasoning in Language Models by Automated Process
  Supervision
Improve Mathematical Reasoning in Language Models by Automated Process Supervision
Liangchen Luo
Yinxiao Liu
Rosanne Liu
Samrat Phatale
Harsh Lara
...
Lei Shu
Yun Zhu
Lei Meng
Jiao Sun
Abhinav Rastogi
LRM
141
193
0
05 Jun 2024
SimPO: Simple Preference Optimization with a Reference-Free Reward
SimPO: Simple Preference Optimization with a Reference-Free Reward
Yu Meng
Mengzhou Xia
Danqi Chen
188
494
0
23 May 2024
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Yunxin Li
Shenyuan Jiang
Baotian Hu
Longyue Wang
Wanqi Zhong
Wenhan Luo
Lin Ma
Min Zhang
MoE
111
42
0
18 May 2024
LoRA Learns Less and Forgets Less
LoRA Learns Less and Forgets Less
D. Biderman
Jose Javier Gonzalez Ortiz
Jacob P. Portes
Mansheej Paul
Philip Greengard
...
Sam Havens
Vitaliy Chiley
Jonathan Frankle
Cody Blakeney
John P. Cunningham
CLL
133
142
0
15 May 2024
Understanding the performance gap between online and offline alignment
  algorithms
Understanding the performance gap between online and offline alignment algorithms
Yunhao Tang
Daniel Guo
Zeyu Zheng
Daniele Calandriello
Yuan Cao
...
Rémi Munos
Bernardo Avila-Pires
Michal Valko
Yong Cheng
Will Dabney
OffRLOnRL
109
75
0
14 May 2024
MuMath-Code: Combining Tool-Use Large Language Models with
  Multi-perspective Data Augmentation for Mathematical Reasoning
MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning
Shuo Yin
Weihao You
Zhilong Ji
Guoqiang Zhong
Jinfeng Bai
LRMSyDa
82
11
0
13 May 2024
Stream of Search (SoS): Learning to Search in Language
Stream of Search (SoS): Learning to Search in Language
Kanishk Gandhi
Denise Lee
Gabriel Grand
Muxin Liu
Winson Cheng
Archit Sharma
Noah D. Goodman
RALMAIFinLRM
103
68
0
01 Apr 2024
Survey on Large Language Model-Enhanced Reinforcement Learning: Concept,
  Taxonomy, and Methods
Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods
Yuji Cao
Huan Zhao
Yuheng Cheng
Ting Shu
Guolong Liu
Gaoqi Liang
Junhua Zhao
Yun Li
LLMAGKELMOffRLLM&Ro
137
71
0
30 Mar 2024
InternLM2 Technical Report
InternLM2 Technical Report
Zheng Cai
Maosong Cao
Haojiong Chen
Kai-xiang Chen
Keyu Chen
...
Jingming Zhuo
Yi-Ling Zou
Xipeng Qiu
Yu Qiao
Dahua Lin
ALM
86
209
0
26 Mar 2024
Quiet-STaR: Language Models Can Teach Themselves to Think Before
  Speaking
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
E. Zelikman
Georges Harik
Yijia Shao
Varuna Jayasiri
Nick Haber
Noah D. Goodman
LLMAGReLMLRM
140
151
0
14 Mar 2024
OlympiadBench: A Challenging Benchmark for Promoting AGI with
  Olympiad-Level Bilingual Multimodal Scientific Problems
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
Chaoqun He
Renjie Luo
Yuzhuo Bai
Shengding Hu
Zhen Leng Thai
...
Yuxiang Zhang
Jie Liu
Lei Qi
Zhiyuan Liu
Maosong Sun
ELMAIMat
173
282
0
21 Feb 2024
Chain-of-Thought Reasoning Without Prompting
Chain-of-Thought Reasoning Without Prompting
Xuezhi Wang
Denny Zhou
ReLMLRM
278
125
0
15 Feb 2024
V-STaR: Training Verifiers for Self-Taught Reasoners
V-STaR: Training Verifiers for Self-Taught Reasoners
Arian Hosseini
Xingdi Yuan
Nikolay Malkin
Rameswar Panda
Alessandro Sordoni
Rishabh Agarwal
ReLMLRM
118
137
0
09 Feb 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open
  Language Models
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLMLRM
219
1,289
0
05 Feb 2024
KTO: Model Alignment as Prospect Theoretic Optimization
KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh
Winnie Xu
Niklas Muennighoff
Dan Jurafsky
Douwe Kiela
309
570
0
02 Feb 2024
Mixtral of Experts
Mixtral of Experts
Albert Q. Jiang
Alexandre Sablayrolles
Antoine Roux
A. Mensch
Blanche Savary
...
Théophile Gervet
Thibaut Lavril
Thomas Wang
Timothée Lacroix
William El Sayed
MoELLMAG
176
1,129
0
08 Jan 2024
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human
  Annotations
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
Peiyi Wang
Lei Li
Zhihong Shao
R. X. Xu
Damai Dai
Yifei Li
Deli Chen
Y.Wu
Zhifang Sui
AIMatLRMALM
205
398
0
14 Dec 2023
A General Theoretical Paradigm to Understand Learning from Human
  Preferences
A General Theoretical Paradigm to Understand Learning from Human Preferences
M. G. Azar
Mark Rowland
Bilal Piot
Daniel Guo
Daniele Calandriello
Michal Valko
Rémi Munos
281
648
0
18 Oct 2023
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving
Zhibin Gou
Zhihong Shao
Yeyun Gong
Yelong Shen
Yujiu Yang
Minlie Huang
Nan Duan
Weizhu Chen
LRMAI4CELLMAG
161
168
0
29 Sep 2023
Cognitive Architectures for Language Agents
Cognitive Architectures for Language Agents
T. Sumers
Shunyu Yao
Karthik Narasimhan
Thomas Griffiths
LLMAGLM&Ro
166
182
0
05 Sep 2023
BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents
BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents
Zhiwei Liu
Weiran Yao
Jianguo Zhang
Le Xue
Shelby Heinecke
...
Ran Xu
P. Mùi
Haiquan Wang
Caiming Xiong
Silvio Savarese
LLMAG
102
87
0
11 Aug 2023
From Sparse to Soft Mixtures of Experts
From Sparse to Soft Mixtures of Experts
J. Puigcerver
C. Riquelme
Basil Mustafa
N. Houlsby
MoE
204
130
0
02 Aug 2023
A Survey on Multimodal Large Language Models
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Xu
Enhong Chen
MLLMLRM
152
615
0
23 Jun 2023
Let's Verify Step by Step
Let's Verify Step by Step
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALMOffRLLRM
251
1,241
0
31 May 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward
  Model
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
405
4,190
0
29 May 2023
Self-Refine: Iterative Refinement with Self-Feedback
Self-Refine: Iterative Refinement with Self-Feedback
Aman Madaan
Niket Tandon
Prakhar Gupta
Skyler Hallinan
Luyu Gao
...
Bodhisattwa Prasad Majumder
Katherine Hermann
Sean Welleck
Amir Yazdanbakhsh
Peter Clark
ReLMLRMDiffM
256
1,690
0
30 Mar 2023
Reflexion: Language Agents with Verbal Reinforcement Learning
Reflexion: Language Agents with Verbal Reinforcement Learning
Noah Shinn
Federico Cassano
Beck Labash
A. Gopinath
Karthik Narasimhan
Shunyu Yao
LLMAGKELM
154
1,330
0
20 Mar 2023
Solving math word problems with process- and outcome-based feedback
Solving math word problems with process- and outcome-based feedback
J. Uesato
Nate Kushman
Ramana Kumar
Francis Song
Noah Y. Siegel
L. Wang
Antonia Creswell
G. Irving
I. Higgins
FaMLReLMAIMatLRM
141
362
0
25 Nov 2022
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLMLRM
338
3,179
0
20 Oct 2022
ReAct: Synergizing Reasoning and Acting in Language Models
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAGReLMLRM
498
3,007
0
06 Oct 2022
STaR: Bootstrapping Reasoning With Reasoning
STaR: Bootstrapping Reasoning With Reasoning
E. Zelikman
Yuhuai Wu
Jesse Mu
Noah D. Goodman
ReLMLRM
160
512
0
28 Mar 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLMALM
1.2K
13,290
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&RoLRMAI4CEReLM
1.1K
9,827
0
28 Jan 2022
Training Verifiers to Solve Math Word Problems
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLMOffRLLRM
442
4,609
0
27 Oct 2021
Scaling Vision with Sparse Mixture of Experts
Scaling Vision with Sparse Mixture of Experts
C. Riquelme
J. Puigcerver
Basil Mustafa
Maxim Neumann
Rodolphe Jenatton
André Susano Pinto
Daniel Keysers
N. Houlsby
MoE
150
613
0
10 Jun 2021
Learning to summarize from human feedback
Learning to summarize from human feedback
Nisan Stiennon
Long Ouyang
Jeff Wu
Daniel M. Ziegler
Ryan J. Lowe
Chelsea Voss
Alec Radford
Dario Amodei
Paul Christiano
ALM
314
2,195
0
02 Sep 2020
GShard: Scaling Giant Models with Conditional Computation and Automatic
  Sharding
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin
HyoukJoong Lee
Yuanzhong Xu
Dehao Chen
Orhan Firat
Yanping Huang
M. Krikun
Noam M. Shazeer
Zhiwen Chen
MoE
206
1,199
0
30 Jun 2020
Proximal Policy Optimization Algorithms
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
709
19,377
0
20 Jul 2017
Attention Is All You Need
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
1.0K
133,589
0
12 Jun 2017
Deep reinforcement learning from human preferences
Deep reinforcement learning from human preferences
Paul Christiano
Jan Leike
Tom B. Brown
Miljan Martic
Shane Legg
Dario Amodei
246
3,389
0
12 Jun 2017
Outrageously Large Neural Networks: The Sparsely-Gated
  Mixture-of-Experts Layer
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Noam M. Shazeer
Azalia Mirhoseini
Krzysztof Maziarz
Andy Davis
Quoc V. Le
Geoffrey E. Hinton
J. Dean
MoE
268
2,709
0
23 Jan 2017
Previous
12