ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2405.19444
  4. Cited By
MathChat: Benchmarking Mathematical Reasoning and Instruction Following
  in Multi-Turn Interactions

MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions

29 May 2024
Zhenwen Liang
Dian Yu
Wenhao Yu
Wenlin Yao
Zhihan Zhang
Xiangliang Zhang
Dong Yu
    LRM
ArXivPDFHTML

Papers citing "MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions"

16 / 16 papers shown
Title
LLMs Get Lost In Multi-Turn Conversation
LLMs Get Lost In Multi-Turn Conversation
Philippe Laban
Hiroaki Hayashi
Yingbo Zhou
Jennifer Neville
50
1
0
09 May 2025
EducationQ: Evaluating LLMs' Teaching Capabilities Through Multi-Agent Dialogue Framework
EducationQ: Evaluating LLMs' Teaching Capabilities Through Multi-Agent Dialogue Framework
Yao Shi
Rongkeng Liang
Yong Xu
LLMAG
AI4Ed
ELM
67
0
0
21 Apr 2025
Cultural Learning-Based Culture Adaptation of Language Models
Cultural Learning-Based Culture Adaptation of Language Models
Chen Cecilia Liu
Anna Korhonen
Iryna Gurevych
39
0
0
03 Apr 2025
LEMMA: Learning from Errors for MatheMatical Advancement in LLMs
LEMMA: Learning from Errors for MatheMatical Advancement in LLMs
Zhuoshi Pan
Yu Li
Honglin Lin
Qizhi Pei
Zinan Tang
Wei Wu
Chenlin Ming
H. V. Zhao
Zeang Sheng
Lijun Wu
LRM
59
1
0
21 Mar 2025
MathTutorBench: A Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors
MathTutorBench: A Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors
Jakub Macina
Nico Daheim
Ido Hakimi
Manu Kapur
Iryna Gurevych
Mrinmaya Sachan
ELM
68
1
0
26 Feb 2025
FB-Bench: A Fine-Grained Multi-Task Benchmark for Evaluating LLMs' Responsiveness to Human Feedback
FB-Bench: A Fine-Grained Multi-Task Benchmark for Evaluating LLMs' Responsiveness to Human Feedback
Heng Chang
Miao Zheng
Fan Yang
Guosheng Dong
Bin Cui
Xin Wu
Zenan Zhou
Wentao Zhang
ALM
51
6
0
12 Oct 2024
Improving LLM Reasoning through Scaling Inference Computation with
  Collaborative Verification
Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification
Zhenwen Liang
Ye Liu
Tong Niu
Xiangliang Zhang
Yingbo Zhou
Semih Yavuz
LRM
32
18
0
05 Oct 2024
Learn Beyond The Answer: Training Language Models with Reflection for
  Mathematical Reasoning
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning
Zhihan Zhang
Zhenwen Liang
Wenhao Yu
Dian Yu
Mengzhao Jia
Dong Yu
Meng Jiang
AIMat
RALM
LRM
ReLM
35
13
0
17 Jun 2024
Evaluating Mathematical Reasoning of Large Language Models: A Focus on
  Error Identification and Correction
Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction
Xiaoyuan Li
Wenjie Wang
Moxin Li
Junrong Guo
Yang Zhang
Fuli Feng
ELM
LRM
42
15
0
02 Jun 2024
SceMQA: A Scientific College Entrance Level Multimodal Question
  Answering Benchmark
SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark
Zhenwen Liang
Kehan Guo
Gang Liu
Taicheng Guo
Yujun Zhou
Tianyu Yang
Jiajun Jiao
Renjie Pi
Jipeng Zhang
Xiangliang Zhang
ELM
36
18
0
06 Feb 2024
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language
  Feedback
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
Xingyao Wang
Zihan Wang
Jiateng Liu
Yangyi Chen
Lifan Yuan
Hao Peng
Heng Ji
LRM
133
142
0
19 Sep 2023
Instruction Tuning with GPT-4
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa
ALM
LM&MA
162
585
0
06 Apr 2023
Large Language Models are Zero-Shot Reasoners
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
328
4,077
0
24 May 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
314
3,273
0
21 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
398
8,559
0
28 Jan 2022
Math Word Problem Generation with Mathematical Consistency and Problem
  Context Constraints
Math Word Problem Generation with Mathematical Consistency and Problem Context Constraints
Zichao Wang
Andrew S. Lan
Richard G. Baraniuk
64
45
0
09 Sep 2021
1