NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks

12 April 2022
Swaroop Mishra
Arindam Mitra
Neeraj Varshney
Bhavdeep Singh Sachdeva
Peter Clark
Chitta Baral
Ashwin Kalyan
    AIMat
    ReLM
    ELM
    LRM

Papers citing "NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks"

50 / 73 papers shown
A Survey on Mathematical Reasoning and Optimization with Large Language Models
Ali Forootani
OffRL
LRM
AI4CE
45
0
0
22 Mar 2025
Formalizing Complex Mathematical Statements with LLMs: A Study on Mathematical Definitions
Lan Zhang
Marco Valentino
André Freitas
49
0
0
17 Feb 2025
Mathematical Language Models: A Survey
Wei Liu
Hanglei Hu
Jie Zhou
Yuyang Ding
Junsong Li
...
Mengliang He
Qin Chen
Bo Jiang
Aimin Zhou
Liang He
LRM
79
12
0
03 Jan 2025
Towards Adaptive Mechanism Activation in Language Agent
Ziyang Huang
Jun Zhao
Kang Liu
LLMAG
AI4CE
80
0
0
01 Dec 2024
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
Haocheng Xi
Han Cai
Ligeng Zhu
Yunfan LU
Kurt Keutzer
Jianfei Chen
Song Han
MQ
75
9
0
25 Oct 2024
VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks
Shailaja Keyur Sampat
Mutsumi Nakamura
Shankar Kailas
Kartik Aggarwal
Mandy Zhou
Yezhou Yang
Chitta Baral
MLLM
CoGe
ReLM
VLM
LRM
37
0
0
17 Oct 2024
Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling
Wenyuan Xu
Rujun Han
Zhenting Wang
L. Le
Dhruv Madeka
Lei Li
Luu Anh Tuan
Rishabh Agarwal
Chen-Yu Lee
Tomas Pfister
80
8
0
15 Oct 2024
ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement
Xiangyu Peng
Congying Xia
Xinyi Yang
Caiming Xiong
Chien-Sheng Wu
Chen Xing
LRM
48
2
0
03 Oct 2024
Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo
Shengyu Feng
Xiang Kong
Shuang Ma
Aonan Zhang
Dong Yin
Chong-Jun Wang
Ruoming Pang
Yiming Yang
LRM
32
0
0
02 Oct 2024
A Looming Replication Crisis in Evaluating Behavior in Language Models? Evidence and Solutions
Laurène Vaugrante
Mathias Niepert
Thilo Hagendorff
LRM
43
1
0
30 Sep 2024
LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models
Jiayi Gui
Yiming Liu
Jiale Cheng
Xiaotao Gu
Xiao-Yang Liu
Hongning Wang
Yuxiao Dong
Jie Tang
Minlie Huang
ELM
LLMAG
LRM
37
2
0
28 Aug 2024
Multi-tool Integration Application for Math Reasoning Using Large Language Model
Zhihua Duan
Jialin Wang
LLMAG
LRM
43
0
0
22 Aug 2024
PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning
Min Jae Jung
Romain Rouvoy
KELM
MoE
CLL
44
2
0
31 Jul 2024
Do Large Language Models Exhibit Cognitive Dissonance? Studying the Difference Between Revealed Beliefs and Stated Answers
Manuel Mondal
Ljiljana Dolamic
Gérôme Bovet
Philippe Cudré-Mauroux
Julien Audiffren
40
2
0
21 Jun 2024
Timo: Towards Better Temporal Reasoning for Language Models
Zhaochen Su
Jun Zhang
Tong Zhu
Xiaoye Qu
Juntao Li
Min Zhang
Yu Cheng
LRM
47
17
0
20 Jun 2024
GenQA: Generating Millions of Instructions from a Handful of Prompts
Jiuhai Chen
Rifaa Qadri
Yuxin Wen
Neel Jain
John Kirchenbauer
Dinesh Manocha
Tom Goldstein
ALM
40
14
0
14 Jun 2024
Pre-trained Large Language Models Use Fourier Features to Compute Addition
Tianyi Zhou
Deqing Fu
Vatsal Sharan
Robin Jia
LRM
34
9
0
05 Jun 2024
NUMCoT: Numerals and Units of Measurement in Chain-of-Thought Reasoning using Large Language Models
Ancheng Xu
Minghuan Tan
Lei Wang
Min Yang
Ruifeng Xu
LRM
57
0
0
05 Jun 2024
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
Hongwei Liu
Zilong Zheng
Yuxuan Qiao
Haodong Duan
Zhiwei Fei
Fengzhe Zhou
Wenwei Zhang
Songyang Zhang
Dahua Lin
Kai-xiang Chen
53
57
0
20 May 2024
Continual Learning of Large Language Models: A Comprehensive Survey
Haizhou Shi
Zihao Xu
Hengyi Wang
Weiyi Qin
Wenyuan Wang
Yibin Wang
Zifeng Wang
Sayna Ebrahimi
Hao Wang
CLL
KELM
LRM
52
64
0
25 Apr 2024
LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models
Mihir Parmar
Nisarg Patel
Neeraj Varshney
Mutsumi Nakamura
Man Luo
Santosh Mashetty
Arindam Mitra
Chitta Baral
LRM
ReLM
ELM
38
23
0
23 Apr 2024
Pre-Calc: Learning to Use the Calculator Improves Numeracy in Language Models
Vishruth Veerendranath
Vishwa Shah
Kshitish Ghate
30
0
0
22 Apr 2024
Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks
Avinash Anand
Mohit Gupta
Kritarth Prasad
Navya Singla
Sanjana Sanjeev
Jatin Kumar
A. Shivam
R. Shah
LRM
55
14
0
19 Apr 2024
SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models
Hyeonwoo Kim
Gyoungjin Gim
Yungi Kim
Jihoo Kim
Byungju Kim
Wonseok Lee
Chanjun Park
ReLM
LRM
34
1
0
05 Apr 2024
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
Yifan Xu
Xiao Liu
Xinghan Liu
Zhenyu Hou
Yueyan Li
...
Aohan Zeng
Zhengxiao Du
Wenyi Zhao
Jie Tang
Yuxiao Dong
LRM
49
35
0
03 Apr 2024
Advancing LLM Reasoning Generalists with Preference Trees
Lifan Yuan
Ganqu Cui
Hanbin Wang
Ning Ding
Xingyao Wang
...
Zhenghao Liu
Bowen Zhou
Hao Peng
Zhiyuan Liu
Maosong Sun
LRM
39
98
0
02 Apr 2024
Dual Instruction Tuning with Large Language Models for Mathematical Reasoning
Yongwei Zhou
Tiejun Zhao
LRM
30
6
0
27 Mar 2024
SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models
Yu Yang
Siddhartha Mishra
Jeffrey N Chiang
Baharan Mirzasoleiman
40
17
0
12 Mar 2024
GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers
Qintong Li
Leyang Cui
Xueliang Zhao
Lingpeng Kong
Wei Bi
LRM
40
46
0
29 Feb 2024
Analyzing and Reducing Catastrophic Forgetting in Parameter Efficient Tuning
Weijieying Ren
Xinlong Li
Lei Wang
Tianxiang Zhao
Wei Qin
CLL
KELM
38
34
0
29 Feb 2024
How Do Humans Write Code? Large Models Do It the Same Way Too
Long Li
Xuzheng He
LRM
43
0
0
24 Feb 2024
CriticBench: Evaluating Large Language Models as Critic
Tian Lan
Wenwei Zhang
Chen Xu
Heyan Huang
Dahua Lin
Kai-xiang Chen
Xian-Ling Mao
ELM
AI4MH
LRM
47
3
0
21 Feb 2024
Can Deception Detection Go Deeper? Dataset, Evaluation, and Benchmark for Deception Reasoning
Kang Chen
Zheng Lian
Haiyang Sun
Bin Liu
Jianhua Tao
36
0
0
18 Feb 2024
Orca-Math: Unlocking the potential of SLMs in Grade School Math
Arindam Mitra
Hamed Khanpour
Corby Rosset
Ahmed Hassan Awadallah
ALM
MoE
LRM
37
62
0
16 Feb 2024
MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization
Shuaijie She
Wei Zou
Shujian Huang
Wenhao Zhu
Xiang Liu
Xiang Geng
Jiajun Chen
LRM
75
31
0
12 Jan 2024
Conic10K: A Challenging Math Problem Understanding and Reasoning Dataset
Haoyi Wu
Wenyang Hui
Yezeng Chen
Weiqi Wu
Kewei Tu
Yi Zhou
LRM
43
3
0
09 Nov 2023
Multi-Operational Mathematical Derivations in Latent Space
Marco Valentino
Jordan Meadows
Lan Zhang
André Freitas
29
5
0
02 Nov 2023
A Comprehensive Evaluation of Tool-Assisted Generation Strategies
Alon Jacovi
Avi Caciularu
Jonathan Herzig
Roee Aharoni
Bernd Bohnet
Mor Geva
ELM
31
6
0
16 Oct 2023
TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models
Xiao Wang
Yuan Zhang
Tianze Chen
Songyang Gao
Senjie Jin
...
Rui Zheng
Yicheng Zou
Tao Gui
Qi Zhang
Xuanjing Huang
ALM
LRM
CLL
60
18
0
10 Oct 2023
Large Language Models Only Pass Primary School Exams in Indonesia: A Comprehensive Test on IndoMMLU
Fajri Koto
Nurul Aisyah
Haonan Li
Timothy Baldwin
AI4Ed
LRM
ELM
30
37
0
07 Oct 2023
Novice Learner and Expert Tutor: Evaluating Math Reasoning Abilities of Large Language Models with Misconceptions
Naiming Liu
Shashank Sonkar
Zichao Wang
Simon Woodhead
Richard G. Baraniuk
LRM
AI4Ed
28
14
0
03 Oct 2023
Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models
Man Luo
Shrinidhi Kumbhar
Ming Shen
Mihir Parmar
Neeraj Varshney
Pratyay Banerjee
Somak Aditya
Chitta Baral
ReLM
ELM
LRM
45
25
0
02 Oct 2023
Navigate through Enigmatic Labyrinth A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future
Zheng Chu
Jingchang Chen
Qianglong Chen
Weijiang Yu
Tao He
Haotian Wang
Weihua Peng
Ming-Yu Liu
Bing Qin
Ting Liu
LRM
AI4CE
31
151
0
27 Sep 2023
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning
Xiang Yue
Xingwei Qu
Ge Zhang
Yao Fu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
AIMat
LRM
62
361
0
11 Sep 2023
Can NLP Models 'Identify', 'Distinguish', and 'Justify' Questions that Don't have a Definitive Answer?
Ayushi Agarwal
Nisarg Patel
Neeraj Varshney
Mihir Parmar
Pavan Mallina
Aryan Bhavin Shah
Srihari Sangaraju
Tirth Patel
Nihar Thakkar
Chitta Baral
ELM
14
3
0
08 Sep 2023
Through the Lens of Core Competency: Survey on Evaluation of Large Language Models
Ziyu Zhuang
Qiguang Chen
Longxuan Ma
Mingda Li
Yi Han
Yushan Qian
Haopeng Bai
Zixian Feng
Weinan Zhang
Ting Liu
ELM
26
9
0
15 Aug 2023
FERMAT: An Alternative to Accuracy for Numerical Reasoning
Jasivan Sivakumar
N. Moosavi
ReLM
LRM
40
3
0
27 May 2023
MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting
Tatsuro Inaba
Hirokazu Kiyomaru
Fei Cheng
Sadao Kurohashi
KELM
LRM
24
23
0
26 May 2023
A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis
Alessandro Stolfo
Yonatan Belinkov
Mrinmaya Sachan
MILM
KELM
LRM
33
50
0
24 May 2023
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
Gen Luo
Yiyi Zhou
Tianhe Ren
Shen Chen
Xiaoshuai Sun
Rongrong Ji
VLM
MLLM
29
89
0
24 May 2023