Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2301.13867
Cited By
Mathematical Capabilities of ChatGPT
31 January 2023
Simon Frieder
Luca Pinchetti
Alexis Chevalier
Ryan-Rhys Griffiths
Tommaso Salvatori
Thomas Lukasiewicz
P. Petersen
Julius Berner
ELM
AI4MH
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Mathematical Capabilities of ChatGPT"
50 / 200 papers shown
Title
Do GPT Language Models Suffer From Split Personality Disorder? The Advent Of Substrate-Free Psychometrics
P. Romero
Stephen Fitz
T. Nakatsuma
30
10
0
14 Aug 2024
Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information
Yauwai Yim
Chunkit Chan
Tianyu Shi
Zheye Deng
Wei Fan
Tianshi Zheng
Yangqiu Song
LLMAG
36
10
0
05 Aug 2024
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
Zihao Zhou
Shudong Liu
Maizhen Ning
Wei Liu
Jindong Wang
Derek F. Wong
Xiaowei Huang
Qiufeng Wang
Kaizhu Huang
ELM
LRM
71
25
0
11 Jul 2024
From Data to Commonsense Reasoning: The Use of Large Language Models for Explainable AI
Stefanie Krause
Frieder Stolzenburg
ELM
LRM
41
1
0
04 Jul 2024
MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data
Meng Fang
Xiangpeng Wan
Fei Lu
Fei Xing
Kai Zou
26
20
0
26 Jun 2024
A Moonshot for AI Oracles in the Sciences
Bryan Kaiser
Tailin Wu
Maike Sonnewald
Colin Thackray
Skylar Callis
AI4CE
51
0
0
25 Jun 2024
Modulating Language Model Experiences through Frictions
Katherine M. Collins
Valerie Chen
Ilia Sucholutsky
Hannah Rose Kirk
Malak Sadek
Holli Sargeant
Ameet Talwalkar
Adrian Weller
Umang Bhatt
KELM
71
4
0
24 Jun 2024
Évaluation des capacités de réponse de larges modèles de langage (LLM) pour des questions d'historiens
M. Chartier
Nabil Dakkoune
G. Bourgeois
Stéphane Jean
KELM
ELM
31
1
0
21 Jun 2024
Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective
Yang Chen
Cong Fang
Zhouchen Lin
Bing Liu
36
0
0
17 Jun 2024
Pre-trained Large Language Models Use Fourier Features to Compute Addition
Tianyi Zhou
Deqing Fu
Vatsal Sharan
Robin Jia
LRM
34
9
0
05 Jun 2024
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
Marianna Nezhurina
Lucia Cipolina-Kun
Mehdi Cherti
J. Jitsev
LLMAG
LRM
ELM
ReLM
58
28
0
04 Jun 2024
Applying Fine-Tuned LLMs for Reducing Data Needs in Load Profile Analysis
Yi Hu
Hyeonjin Kim
Kai Ye
Ning Lu
54
5
0
02 Jun 2024
Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction
Xiaoyuan Li
Wenjie Wang
Moxin Li
Junrong Guo
Yang Zhang
Fuli Feng
ELM
LRM
44
16
0
02 Jun 2024
Models That Prove Their Own Correctness
Noga Amit
S. Goldwasser
Orr Paradise
G. Rothblum
LRM
44
2
0
24 May 2024
Investigating Symbolic Capabilities of Large Language Models
Neisarg Dave
Daniel Kifer
C. Lee Giles
A. Mali
ELM
LRM
42
2
0
21 May 2024
Can formal argumentative reasoning enhance LLMs performances?
Federico Castagna
I. Sassoon
Simon Parsons
LRM
LLMAG
30
2
0
16 May 2024
Exploring the Impact of ChatGPT on Wikipedia Engagement
Neal Reeves
Wenjie Yin
Elena Simperl
KELM
30
2
0
16 May 2024
The AI Companion in Education: Analyzing the Pedagogical Potential of ChatGPT in Computer Science and Engineering
Z. He
Thomas Nguyen
Tahereh Miari
Mehrdad Aliasgari
S. Rafatirad
Hossein Sayadi
27
2
0
23 Apr 2024
NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding
Chunkit Chan
Cheng Jiayang
Yauwai Yim
Zheye Deng
Wei Fan
Haoran Li
Xin Liu
Hongming Zhang
Weiqi Wang
Yangqiu Song
LLMAG
35
23
0
21 Apr 2024
Large Language Models as Test Case Generators: Performance Evaluation and Enhancement
Ke-Shen Li
Yuan Yuan
LLMAG
30
12
0
20 Apr 2024
Can LLMs Understand Computer Networks? Towards a Virtual System Administrator
Denis Donadel
Francesco Marchiori
Luca Pajola
Mauro Conti
34
7
0
19 Apr 2024
A Survey on Deep Learning for Theorem Proving
Zhaoyu Li
Jialiang Sun
Logan Murphy
Qidong Su
Zenan Li
Xian Zhang
Kaiyu Yang
Xujie Si
LRM
56
22
0
15 Apr 2024
Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward
Xuan Xie
Jiayang Song
Zhehua Zhou
Yuheng Huang
Da Song
Lei Ma
OffRL
53
6
0
12 Apr 2024
Capabilities of Large Language Models in Control Engineering: A Benchmark Study on GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra
Darioush Kevian
U. Syed
Xing-ming Guo
Aaron J. Havens
Geir Dullerud
Peter M. Seiler
Lianhui Qin
Bin Hu
ELM
44
29
0
04 Apr 2024
From Large to Tiny: Distilling and Refining Mathematical Expertise for Math Word Problems with Weakly Supervision
Qingwen Lin
Boyan Xu
Zhengting Huang
Ruichu Cai
31
2
0
21 Mar 2024
Review of Generative AI Methods in Cybersecurity
Yagmur Yigit
William J. Buchanan
Madjid G Tehrani
Leandros A. Maglaras
AAML
56
20
0
13 Mar 2024
Human I/O: Towards a Unified Approach to Detecting Situational Impairments
Xingyu Bruce Liu
Jiahao Nick Li
David Kim
Xiang Ánthony' Chen
Andrea Colaço
42
13
0
06 Mar 2024
Chaining thoughts and LLMs to learn DNA structural biophysics
Tyler D. Ross
Ashwin Gopinath
AI4CE
35
2
0
02 Mar 2024
Large Language Models and Games: A Survey and Roadmap
Roberto Gallotta
Graham Todd
Marvin Zammit
Sam Earle
Antonios Liapis
Julian Togelius
Georgios N. Yannakakis
LLMAG
LM&MA
AI4CE
LRM
50
73
0
28 Feb 2024
A New Era in LLM Security: Exploring Security Concerns in Real-World LLM-based Systems
Fangzhou Wu
Ning Zhang
Somesh Jha
P. McDaniel
Chaowei Xiao
34
69
0
28 Feb 2024
WIPI: A New Web Threat for LLM-Driven Web Agents
Fangzhou Wu
Shutong Wu
Yulong Cao
Chaowei Xiao
LLMAG
34
20
0
26 Feb 2024
How Large Language Models Encode Context Knowledge? A Layer-Wise Probing Study
Tianjie Ju
Weiwei Sun
Wei Du
Xinwei Yuan
Zhaochun Ren
Gongshen Liu
KELM
39
24
0
25 Feb 2024
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
Chaoqun He
Renjie Luo
Yuzhuo Bai
Shengding Hu
Zhen Leng Thai
...
Yuxiang Zhang
Jie Liu
Lei Qi
Zhiyuan Liu
Maosong Sun
ELM
AIMat
35
161
0
21 Feb 2024
FormulaReasoning: A Dataset for Formula-Based Numerical Reasoning
Xiao Li
Bolin Zhu
Kaiwen Shi
Sichen Liu
Yin Zhu
Yiwei Liu
Gong Cheng
AIMat
40
0
0
20 Feb 2024
Language Models as Science Tutors
Alexis Chevalier
Jiayi Geng
Alexander Wettig
Howard Chen
Sebastian Mizera
...
Jiatong Yu
Jun-Jie Zhu
Z. Ren
Sanjeev Arora
Danqi Chen
ELM
27
11
0
16 Feb 2024
UrbanKGent: A Unified Large Language Model Agent Framework for Urban Knowledge Graph Construction
Yansong Ning
Hao Liu
LLMAG
31
2
0
10 Feb 2024
Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs
Simone Balloccu
Patrícia Schmidtová
Mateusz Lango
Ondrej Dusek
SILM
ELM
PILM
35
159
0
06 Feb 2024
Large Language Models for Mathematical Reasoning: Progresses and Challenges
Janice Ahn
Rishu Verma
Renze Lou
Di Liu
Rui Zhang
Wenpeng Yin
LRM
40
122
0
31 Jan 2024
ChatGPT in the classroom. Exploring its potential and limitations in a Functional Programming course
Dan-Matei Popovici
27
29
0
20 Jan 2024
Code Simulation Challenges for Large Language Models
Emanuele La Malfa
Christoph Weinhuber
Orazio Torre
Fangru Lin
Samuele Marro
Anthony Cohn
Nigel Shadbolt
Michael Wooldridge
LLMAG
LRM
22
8
0
17 Jan 2024
Stability Analysis of ChatGPT-based Sentiment Analysis in AI Quality Assurance
Tinghui Ouyang
AprilPyone Maungmaung
Koichi Konishi
Yoshiki Seo
Isao Echizen
AI4MH
28
5
0
15 Jan 2024
Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Yiqi Wang
Wentao Chen
Xiaotian Han
Xudong Lin
Haiteng Zhao
Yongfei Liu
Bohan Zhai
Jianbo Yuan
Quanzeng You
Hongxia Yang
LRM
47
71
0
10 Jan 2024
AI Hallucinations: A Misnomer Worth Clarifying
Negar Maleki
Balaji Padmanabhan
Kaushik Dutta
28
34
0
09 Jan 2024
Computational Argumentation-based Chatbots: a Survey
Federico Castagna
Nadin Kökciyan
I. Sassoon
Simon Parsons
Elizabeth I. Sklar
39
6
0
07 Jan 2024
Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives
Wenqi Zhang
Yongliang Shen
Linjuan Wu
Qiuying Peng
Jun Wang
Yueting Zhuang
Weiming Lu
LRM
LLMAG
40
51
0
04 Jan 2024
NPHardEval: Dynamic Benchmark on Reasoning Ability of Large Language Models via Complexity Classes
Lizhou Fan
Wenyue Hua
Lingyao Li
Haoyang Ling
Yongfeng Zhang
LRM
31
46
0
22 Dec 2023
Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities
Yuhao Chen
Chloe Wong
Hanwen Yang
Juan Aguenza
Sai Bhujangari
...
Eric Phuong
Minghao Liu
Raja Kumar
Vanshika Vats
James Davis
37
1
0
22 Dec 2023
Evaluating AI Vocational Skills Through Professional Testing
David Noever
Matt Ciolino
ELM
32
0
0
17 Dec 2023
Resolving Crash Bugs via Large Language Models: An Empirical Study
Xueying Du
Mingwei Liu
Juntao Li
Hanlin Wang
Xin Peng
Yiling Lou
LRM
16
8
0
16 Dec 2023
Early ChatGPT User Portrait through the Lens of Data
Yuyang Deng
Ni Zhao
Xin Huang
19
3
0
10 Dec 2023
Previous
1
2
3
4
Next