2502.18940
Cited By
MathTutorBench: A Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors
26 February 2025
Jakub Macina
Nico Daheim
Ido Hakimi
Manu Kapur
Iryna Gurevych
Mrinmaya Sachan
ELM
Papers citing
"MathTutorBench: A Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors"
14 / 14 papers shown
Title
DualSchool: How Reliable are LLMs for Optimization Education?
Michael Klamkin
Arnaud Deza
Sikai Cheng
Haoruo Zhao
Pascal Van Hentenryck
41
0
0
27 May 2025
Pedagogy-R1: Pedagogically-Aligned Reasoning Model with Balanced Educational Benchmark
Unggi Lee
Jaeyong Lee
Jiyeong Bae
Yeil Jeong
Junbo Koh
Gyeonggeon Lee
Gunho Lee
Taekyung Ahn
Hyeoncheol Kim
LRM
49
0
0
24 May 2025
LLM Agents for Education: Advances and Applications
Zhendong Chu
Shen Wang
Jian Xie
Tinghui Zhu
Yibo Yan
...
Aoxiao Zhong
Xuming Hu
Jing Liang
Philip S. Yu
Qingsong Wen
LLMAG
ELM
158
7
0
14 Mar 2025
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Seungone Kim
Juyoung Suk
Shayne Longpre
Bill Yuchen Lin
Jamin Shin
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
MoMe
ALM
ELM
130
205
0
02 May 2024
The Promises and Pitfalls of Using Language Models to Measure Instruction Quality in Education
Paiheng Xu
Jing Liu
Nathan Jones
Julie Cohen
Wei Ai
AI4Ed
90
7
0
03 Apr 2024
Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistakes
Rose E. Wang
Qingyang Zhang
Carly Robinson
Susanna Loeb
Dorottya Demszky
115
38
0
16 Oct 2023
MathDial: A Dialogue Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems
Jakub Macina
Nico Daheim
Sankalan Pal Chowdhury
Tanmay Sinha
Manu Kapur
Iryna Gurevych
Mrinmaya Sachan
LRM
103
68
0
23 May 2023
Automatic Generation of Socratic Subquestions for Teaching Math Word Problems
Kumar Shridhar
Jakub Macina
Mennatallah El-Assady
Tanmay Sinha
Manu Kapur
Mrinmaya Sachan
AIMat
85
48
0
23 Nov 2022
Dialog Inpainting: Turning Documents into Dialogs
Zhuyun Dai
Arun Tejasvi Chaganty
Vincent Zhao
Aida Amini
Q. Rashid
Mike Green
Kelvin Guu
63
67
0
18 May 2022
The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues
Anaïs Tack
Chris Piech
ELM
98
94
0
16 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
900
13,228
0
04 Mar 2022
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
367
4,598
0
27 Oct 2021
Measuring Conversational Uptake: A Case Study on Student-Teacher Interactions
Dorottya Demszky
Jing Liu
Zid Mancenido
Julie Cohen
H. Hill
Dan Jurafsky
Tatsunori Hashimoto
114
67
0
07 Jun 2021
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
384
5,872
0
21 Apr 2019