ResearchTrend.AI
An Independent Evaluation of ChatGPT on Mathematical Word Problems (MWP)

23 February 2023
Paulo Shakarian
Abhinav Koyyalamudi
Noel Ngu
Lakshmivihari Mareedu

Papers citing "An Independent Evaluation of ChatGPT on Mathematical Word Problems (MWP)"

27 / 27 papers shown
MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs
Andreas Opedal
Haruki Shirakami
Bernhard Schölkopf
Abulhair Saparov
Mrinmaya Sachan
LRM
57
1
0
17 Feb 2025
Understanding and Evaluating Trust in Generative AI and Large Language Models for Spreadsheets
Simon Thorne
77
1
0
18 Dec 2024
Neuro-Symbolic Data Generation for Math Reasoning
Zenan Li
Zhi-Hua Zhou
Yuan Yao
Yu Li
Chun Cao
Fan Yang
Xian Zhang
Xiaoxing Ma
OffRL
LRM
76
7
0
06 Dec 2024
UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts
Bo Yang
Qingping Yang
Runtao Liu
LRM
ReLM
ELM
AIMat
67
1
0
11 Nov 2024
Next-Token Prediction Task Assumes Optimal Data Ordering for LLM Training in Proof Generation
Chenyang An
Shima Imani
Feng Yao
Chengyu Dong
Ali Abbasi
...
Samuel Buss
Jingbo Shang
Gayathri Mahalingam
Pramod Sharma
Maurice Diesendruck
LRM
31
1
0
30 Oct 2024
How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs
Guhao Feng
Kai-Bo Yang
Yuntian Gu
Xinyue Ai
Shengjie Luo
Jiacheng Sun
Di He
Zechao Li
Liwei Wang
LRM
37
6
0
17 Oct 2024
When Not to Answer: Evaluating Prompts on GPT Models for Effective Abstention in Unanswerable Math Word Problems
Asir Saadat
Tasmia Binte Sogir
Md Taukir Azam Chowdhury
Syem Aziz
79
1
0
16 Oct 2024
Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction
Xiaoyuan Li
Wenjie Wang
Moxin Li
Junrong Guo
Yang Zhang
Fuli Feng
ELM
LRM
38
15
0
02 Jun 2024
Exploring the Limits of Fine-grained LLM-based Physics Inference via Premise Removal Interventions
Jordan Meadows
Tamsin James
André Freitas
ReLM
LRM
AI4CE
38
1
0
29 Apr 2024
A New Era in LLM Security: Exploring Security Concerns in Real-World LLM-based Systems
Fangzhou Wu
Ning Zhang
Somesh Jha
P. McDaniel
Chaowei Xiao
34
68
0
28 Feb 2024
Exploring Failure Cases in Multimodal Reasoning About Physical Dynamics
Sadaf Ghaffari
Nikhil Krishnaswamy
LRM
34
3
0
24 Feb 2024
Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs
Simone Balloccu
Patrícia Schmidtová
Mateusz Lango
Ondrej Dusek
SILM
ELM
PILM
23
156
0
06 Feb 2024
Large Language Models for Mathematical Reasoning: Progresses and Challenges
Janice Ahn
Rishu Verma
Renze Lou
Di Liu
Rui Zhang
Wenpeng Yin
LRM
38
116
0
31 Jan 2024
Evaluating and Enhancing Large Language Models for Conversational Reasoning on Knowledge Graphs
Yuxuan Huang
Lida Shi
Anqi Liu
Hao Xu
LLMAG
ELM
KELM
LRM
16
2
0
18 Dec 2023
DeceptPrompt: Exploiting LLM-driven Code Generation via Adversarial Natural Language Instructions
Fangzhou Wu
Xiaogeng Liu
Chaowei Xiao
AAML
SILM
29
26
0
07 Dec 2023
Exploring the Potential of Large Language Models in Computational Argumentation
Guizhen Chen
Liying Cheng
Anh Tuan Luu
Lidong Bing
LLMAG
LRM
24
23
0
15 Nov 2023
ChatGPT & Mechanical Engineering: Examining performance on the FE Mechanical Engineering and Undergraduate Exams
Matthew Frenkel
Hebah Emara
34
2
0
26 Sep 2023
Diversity Measures: Domain-Independent Proxies for Failure in Language Model Queries
Noel Ngu
Nathaniel Lee
Paulo Shakarian
21
4
0
22 Aug 2023
ARB: Advanced Reasoning Benchmark for Large Language Models
Tomohiro Sawada
Daniel Paleka
Alexander Havrilla
Pranav Tadepalli
Paula Vidas
Alexander Kranias
John J. Nay
Kshitij Gupta
Aran Komatsuzaki
ELM
LRM
45
37
0
25 Jul 2023
How is ChatGPT's behavior changing over time?
Lingjiao Chen
Matei A. Zaharia
James Zou
ELM
KELM
AI4MH
44
414
0
18 Jul 2023
Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey
Chen Ling
Xujiang Zhao
Jiaying Lu
Chengyuan Deng
Can Zheng
...
Chris White
Quanquan Gu
Jian Pei
Carl Yang
Liang Zhao
ALM
30
126
0
30 May 2023
LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities
Yuqi Zhu
Xiaohan Wang
Jing Chen
Shuofei Qiao
Yixin Ou
Yunzhi Yao
Shumin Deng
Huajun Chen
Ningyu Zhang
LLMAG
41
110
0
22 May 2023
ChatLog: Carefully Evaluating the Evolution of ChatGPT Across Time
Shangqing Tu
Chunyang Li
Jifan Yu
Xiaozhi Wang
Lei Hou
Juanzi Li
LLMAG
AI4MH
75
10
0
27 Apr 2023
Summary of ChatGPT-Related Research and Perspective Towards the Future of Large Language Models
Yi-Hsien Liu
Tianle Han
Siyuan Ma
Jia-Yu Zhang
Yuanyu Yang
...
Xiang Li
Ning Qiang
Dingang Shen
Tianming Liu
Bao Ge
ALM
ELM
AI4CE
LM&MA
LLMAG
38
464
0
04 Apr 2023
Matrix diagonalization and singular value decomposition: Static SageMath and dynamic ChatGPT juxtaposed
N. Karjanto
19
0
0
30 Mar 2023
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
314
3,273
0
21 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
395
8,559
0
28 Jan 2022