2502.15823
Cited By
InductionBench: LLMs Fail in the Simplest Complexity Class
20 February 2025
Wenyue Hua
Tyler Wong
Sun Fei
Liangming Pan
Adam Jardine
William Yang Wang
LRM
Papers citing
"InductionBench: LLMs Fail in the Simplest Complexity Class"
22 / 22 papers shown
Title
From Reasoning to Learning: A Survey on Hypothesis Discovery and Rule Learning with Large Language Models
Kaiyu He
Zhiyu Chen
ReLM
LRM
ELM
42
0
0
28 May 2025
A Minimum Description Length Approach to Regularization in Neural Networks
Matan Abudy
Orr Well
Emmanuel Chemla
Roni Katzir
Nur Lan
52
0
0
19 May 2025
Towards Artificial Intelligence Research Assistant for Expert-Involved Learning
Tianyu Liu
Simeng Han
Xiao Luo
Haoyu Wang
Pan Lu
...
Arman Cohan
Hua Xu
Mark B. Gerstein
James Zou
Hongyu Zhao
54
0
0
03 May 2025
HypoBench: Towards Systematic and Principled Benchmarking for Hypothesis Generation
Haokun Liu
Sicong Huang
Jingyu Hu
Yangqiaoyu Zhou
Chenhao Tan
55
1
0
15 Apr 2025
GPT-4o System Card
OpenAI
Aaron Hurst
Adam Lerer
Adam P. Goucher
...
Yuchen He
Yuchen Zhang
Yujia Jin
Yunxing Dai
Yury Malkov
MLLM
157
876
0
25 Oct 2024
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
Iman Mirzadeh
Keivan Alizadeh
Hooman Shahrokhi
Oncel Tuzel
Samy Bengio
Mehrdad Farajtabar
AIMat
LRM
84
168
0
07 Oct 2024
Disentangling Logic: The Role of Context in Large Language Model Reasoning Capabilities
Wenyue Hua
Kaijie Zhu
Lingyao Li
Lizhou Fan
Shuhang Lin
Mingyu Jin
Haochen Xue
Zelong Li
Jindong Wang
Yongfeng Zhang
LRM
74
11
0
04 Jun 2024
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
Yubo Wang
Xueguang Ma
Ge Zhang
Yuansheng Ni
Abhranil Chandra
...
Kai Wang
Alex Zhuang
Rongqi Fan
Xiang Yue
Wenhu Chen
LRM
ELM
82
409
0
03 Jun 2024
LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models
Mihir Parmar
Nisarg Patel
Neeraj Varshney
Mutsumi Nakamura
Man Luo
Santosh Mashetty
Arindam Mitra
Chitta Baral
LRM
ReLM
ELM
145
28
0
23 Apr 2024
An Incomplete Loop: Deductive, Inductive, and Abductive Learning in Large Language Models
Emmy Liu
Graham Neubig
Jacob Andreas
ReLM
LRM
50
10
0
03 Apr 2024
Long-context LLMs Struggle with Long In-context Learning
Tianle Li
Ge Zhang
Quy Duc Do
Xiang Yue
Wenhu Chen
79
182
0
02 Apr 2024
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Naman Jain
King Han
Alex Gu
Wen-Ding Li
Fanjia Yan
Tianjun Zhang
Sida I. Wang
Armando Solar-Lezama
Koushik Sen
Ion Stoica
ELM
82
383
0
12 Mar 2024
NPHardEval: Dynamic Benchmark on Reasoning Ability of Large Language Models via Complexity Classes
Lizhou Fan
Wenyue Hua
Lingyao Li
Haoyang Ling
Yongfeng Zhang
LRM
56
52
0
22 Dec 2023
Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement
Linlu Qiu
Liwei Jiang
Ximing Lu
Melanie Sclar
Valentina Pyatkin
...
Bailin Wang
Yoon Kim
Yejin Choi
Nouha Dziri
Xiang Ren
LRM
ReLM
71
86
0
12 Oct 2023
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Carlos E. Jimenez
John Yang
Alexander Wettig
Shunyu Yao
Kexin Pei
Ofir Press
Karthik Narasimhan
ELM
66
564
0
10 Oct 2023
DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks
Kaijie Zhu
Jiaao Chen
Jindong Wang
Neil Zhenqiang Gong
Diyi Yang
Xing Xie
ELM
LRM
98
51
0
29 Sep 2023
FOLIO: Natural Language Reasoning with First-Order Logic
Simeng Han
Hailey Schoelkopf
Yilun Zhao
Zhenting Qi
Martin Riddell
...
Yingbo Zhou
Caiming Xiong
Rex Ying
Arman Cohan
Dragomir R. Radev
ReLM
LRM
79
101
0
02 Sep 2022
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
225
4,354
0
27 Oct 2021
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
201
5,454
0
07 Jul 2021
Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
D. Song
Jacob Steinhardt
ReLM
FaML
147
2,220
0
05 Mar 2021
ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning
Weihao Yu
Zihang Jiang
Yanfei Dong
Jiashi Feng
LRM
112
250
0
11 Feb 2020
Human few-shot learning of compositional instructions
Brenden M. Lake
Tal Linzen
Marco Baroni
55
112
0
14 Jan 2019