2410.05229
Cited By
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
7 October 2024
Iman Mirzadeh
Keivan Alizadeh
Hooman Shahrokhi
Oncel Tuzel
Samy Bengio
Mehrdad Farajtabar
AIMat
LRM
Papers citing "GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models"
50 / 92 papers shown
Title
Detection and Mitigation of Hallucination in Large Reasoning Models: A Mechanistic Perspective
Zhongxiang Sun
Qipeng Wang
Haoyu Wang
Xiao Zhang
Jun Xu
HILM
LRM
9
0
0
19 May 2025
Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis
Akarsh Kumar
Jeff Clune
Joel Lehman
Kenneth O. Stanley
OOD
21
0
0
16 May 2025
Decomposed Inductive Procedure Learning: Learning Academic Tasks with Human-Like Data Efficiency
Daniel Weitekamp
Christopher MacLellan
Erik Harpstead
Kenneth R. Koedinger
21
0
0
15 May 2025
Enigme: Generative Text Puzzles for Evaluating Reasoning in Language Models
John Hawkins
ReLM
LRM
57
0
0
08 May 2025
R^3-VQA: "Read the Room" by Video Social Reasoning
Lixing Niu
Jiapeng Li
Xingping Yu
Shu Wang
Ruining Feng
Bo Wu
Ping Wei
Yansen Wang
Lifeng Fan
51
0
0
07 May 2025
Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving
Qi Liu
Xinhao Zheng
Renqiu Xia
Xingzhi Qi
Qinxiang Cao
Junchi Yan
AIMat
52
0
0
07 May 2025
Optimization Problem Solving Can Transition to Evolutionary Agentic Workflows
Wenhao Li
Bo Jin
Mingyi Hong
Changhong Lu
Xiangfeng Wang
48
0
0
07 May 2025
TutorGym: A Testbed for Evaluating AI Agents as Tutors and Students
Daniel Weitekamp
M. N. Siddiqui
Christopher MacLellan
LLMAG
ELM
37
0
0
02 May 2025
Lightweight Latent Verifiers for Efficient Meta-Generation Strategies
Bartosz Piotrowski
Witold Drzewakowski
Konrad Staniszewski
Piotr Miłoś
LRM
36
0
0
23 Apr 2025
A Call for New Recipes to Enhance Spatial Reasoning in MLLMs
Huanyu Zhang
Chengzu Li
Wenshan Wu
Shaoguang Mao
Yan Xia
Ivan Vulić
Z. Zhang
Liang Wang
Tieniu Tan
Furu Wei
LRM
39
2
0
21 Apr 2025
An LLM-enabled Multi-Agent Autonomous Mechatronics Design Framework
Zeyu Wang
Frank P.-W. Lo
Qian Chen
Yongqi Zhang
Chen Lin
Xu Chen
Zhenhua Yu
Alexander J. Thompson
Eric M. Yeatman
Benny Lo
AI4CE
28
0
0
20 Apr 2025
Sleep-time Compute: Beyond Inference Scaling at Test-time
Kevin Lin
Charlie Snell
Yansen Wang
Charles Packer
Sarah Wooders
Ion Stoica
Joseph E. Gonzalez
47
2
0
17 Apr 2025
Identifying and Mitigating the Influence of the Prior Distribution in Large Language Models
Liyi Zhang
Veniamin Veselovsky
R. Thomas McCoy
Thomas L. Griffiths
61
0
0
17 Apr 2025
FLIP Reasoning Challenge
Andreas Plesner
Turlan Kuzhagaliyev
Roger Wattenhofer
AAML
VLM
LRM
83
0
0
16 Apr 2025
Mathematical Capabilities of Large Language Models in Finnish Matriculation Examination
Mika Setälä
Pieta Sikström
Ville Heilala
T. Karkkainen
ELM
LRM
36
1
0
15 Apr 2025
LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models
Parshin Shojaee
Ngoc-Hieu Nguyen
Kazem Meidani
A. Farimani
Khoa D. Doan
Chandan K. Reddy
31
1
0
14 Apr 2025
Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems
Zaid Khan
Elias Stengel-Eskin
Archiki Prasad
Jaemin Cho
Joey Tianyi Zhou
34
0
0
14 Apr 2025
DeduCE: Deductive Consistency as a Framework to Evaluate LLM Reasoning
Atharva Pandey
Kshitij Dubey
Rahul Sharma
Amit Sharma
ReLM
ELM
LRM
52
0
0
09 Apr 2025
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
Andreas Hochlehnert
Hardik Bhatnagar
Vishaal Udandarao
Samuel Albanie
Ameya Prabhu
Matthias Bethge
ReLM
ALM
LRM
100
5
0
09 Apr 2025
Evaluating the Generalization Capabilities of Large Language Models on Code Reasoning
Rem Yang
Julian Dai
N. Vasilakis
Martin Rinard
ELM
LRM
34
0
0
07 Apr 2025
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models
Gonçalo Faria
Noah A. Smith
34
0
0
04 Apr 2025
Style over Substance: Distilled Language Models Reason Via Stylistic Replication
Philip Lippmann
Jie-jin Yang
LRM
46
0
0
02 Apr 2025
Medical large language models are easily distracted
Krithik Vishwanath
Anton Alyakin
Daniel Alber
Jin Vivian Lee
Douglas Kondziolka
E. Oermann
36
0
0
01 Apr 2025
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?
Kai Yan
Yufei Xu
Zhengyin Du
Xuesong Yao
Zhigang Wang
Xiaowen Guo
Jiecao Chen
ReLM
ELM
LRM
95
4
0
01 Apr 2025
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics
Hamed Mahdavi
Alireza Hashemi
Majid Daliri
Pegah Mohammadipour
Alireza Farhadi
Samira Malek
Yekta Yazdanifard
Amir Khasahmadi
V. Honavar
ELM
LRM
66
1
0
01 Apr 2025
Do Large Language Models Exhibit Spontaneous Rational Deception?
Samuel M. Taylor
Benjamin K. Bergen
LRM
62
0
0
31 Mar 2025
CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation
Jixuan Leng
Chengsong Huang
Langlin Huang
Bill Yuchen Lin
William W. Cohen
Haohan Wang
Jiaxin Huang
LRM
49
0
0
30 Mar 2025
TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper
Roland Fernandez
P. Smolensky
Jianfeng Gao
46
0
0
29 Mar 2025
The Reasoning-Memorization Interplay in Language Models Is Mediated by a Single Direction
Yihuai Hong
Dian Zhou
Meng Cao
Lei Yu
Zhijing Jin
LRM
46
0
0
29 Mar 2025
L0-Reasoning Bench: Evaluating Procedural Correctness in Language Models via Simple Program Execution
Simeng Sun
Cheng-Ping Hsieh
Faisal Ladhak
Erik Arakelyan
Santiago Akle Serano
Boris Ginsburg
ReLM
ELM
LRM
175
0
0
28 Mar 2025
Gemma 3 Technical Report
Gemma Team
Aishwarya B Kamath
Johan Ferret
Shreya Pathak
Nino Vieillard
...
Harshal Tushar Lehri
Hussein Hazimeh
Ian Ballantyne
Idan Szpektor
Ivan Nardini
VLM
93
41
0
25 Mar 2025
VecTrans: LLM Transformation Framework for Better Auto-vectorization on High-performance CPU
Zhongchun Zheng
Long Cheng
Lu Li
Rodrigo C. O. Rocha
Tianyi Liu
Wei Wei
Xuzhi Zhang
Yaoqing Gao
36
0
0
25 Mar 2025
Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?
Aabid Karim
Abdul Karim
Bhoomika Lohana
Matt Keon
Jaswinder Singh
A. Sattar
52
0
0
23 Mar 2025
Can Large Reasoning Models do Analogical Reasoning under Perceptual Uncertainty?
Giacomo Camposampiero
Michael Hersche
Roger Wattenhofer
Abu Sebastian
Abbas Rahimi
LRM
56
1
0
14 Mar 2025
Evaluating Mathematical Reasoning Across Large Language Models: A Fine-Grained Approach
Afrar Jahin
Arif Hassan Zidan
Wei Zhang
Yu Bao
Tianming Liu
LRM
76
1
0
13 Mar 2025
Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More
Arvid Frydenlund
LRM
55
0
0
13 Mar 2025
Effectiveness of Zero-shot-CoT in Japanese Prompts
Shusuke Takayama
Ian Frank
LRM
49
0
0
09 Mar 2025
Toward an Evaluation Science for Generative AI Systems
Laura Weidinger
Deb Raji
Hanna M. Wallach
Margaret Mitchell
Angelina Wang
Olawale Salaudeen
Rishi Bommasani
Sayash Kapoor
Deep Ganguli
Sanmi Koyejo
EGVM
ELM
67
4
0
07 Mar 2025
Pi-GPS: Enhancing Geometry Problem Solving by Unleashing the Power of Diagrammatic Information
Junbo Zhao
Ting Zhang
Jiayu Sun
Mi Tian
Hua Huang
36
0
0
07 Mar 2025
Graph-Augmented Reasoning: Evolving Step-by-Step Knowledge Graph Retrieval for LLM Reasoning
Wenjie Wu
Yongcheng Jing
Yingjie Wang
Wenbin Hu
Dacheng Tao
RALM
LRM
69
2
0
03 Mar 2025
PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation
Albert Gong
Kamilė Stankevičiūtė
Chao-gang Wan
Anmol Kabra
Raphael Thesmar
Johann Lee
Julius Klenke
Carla P. Gomes
Kilian Q. Weinberger
RALM
LRM
62
0
0
27 Feb 2025
An Extensive Evaluation of PDDL Capabilities in off-the-shelf LLMs
Kaustubh Vyas
D. Graux
Sébastien Montella
Pavlos Vougiouklis
Ruofei Lai
Keshuang Li
Yang Ren
Jeff Z. Pan
LLMAG
ELM
65
1
0
27 Feb 2025
Can Large Language Models Unveil the Mysteries? An Exploration of Their Ability to Unlock Information in Complex Scenarios
Chao Wang
Luning Zhang
Ziyi Wang
Yang Zhou
ELM
VLM
LRM
60
1
0
27 Feb 2025
BIG-Bench Extra Hard
Mehran Kazemi
Bahare Fatemi
Hritik Bansal
John Palowitch
Chrysovalantis Anastasiou
...
Kate Olszewska
Yi Tay
Vinh Q. Tran
Quoc V. Le
Orhan Firat
ELM
LRM
122
6
0
26 Feb 2025
General Reasoning Requires Learning to Reason from the Get-go
Seungwook Han
Jyothish Pari
Samuel J. Gershman
Pulkit Agrawal
LRM
175
1
0
26 Feb 2025
Broadening Discovery through Structural Models: Multimodal Combination of Local and Structural Properties for Predicting Chemical Features
Nikolai Rekut
Alexey Orlov
Klea Ziu
Elizaveta Starykh
Martin Takáč
Aleksandr Beznosikov
66
0
0
25 Feb 2025
Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments
Patomporn Payoungkhamdee
Pume Tuchinda
Jinheon Baek
Samuel Cahyawijaya
Can Udomcharoenchaikit
Potsawee Manakul
Peerat Limkonchotiwat
E. Chuangsuwanich
Sarana Nutanong
LRM
54
0
0
25 Feb 2025
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
Alon Albalak
Duy Phung
Nathan Lile
Rafael Rafailov
Kanishk Gandhi
...
Anikait Singh
Chase Blagden
Violet Xiang
Dakota Mahan
Nick Haber
OffRL
LRM
53
6
0
24 Feb 2025
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer
Marthe Ballon
Andres Algaba
Vincent Ginis
LRM
ReLM
44
5
0
24 Feb 2025
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
Zhenheng Tang
Xiang Liu
Qian Wang
Peijie Dong
Bingsheng He
Xiaowen Chu
Bo Li
LRM
61
1
0
24 Feb 2025