Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2310.00074
Cited By
v1
v2 (latest)
SocREval: Large Language Models with the Socratic Method for Reference-Free Reasoning Evaluation
29 September 2023
Hangfeng He
Hongming Zhang
Dan Roth
LRM
ELM
ReLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"SocREval: Large Language Models with the Socratic Method for Reference-Free Reasoning Evaluation"
21 / 21 papers shown
Title
Examining False Positives under Inference Scaling for Mathematical Reasoning
Yu Guang Wang
Nan Yang
Liang Wang
Furu Wei
LRM
123
4
0
10 Feb 2025
Mathematical Language Models: A Survey
Wen Liu
Hanglei Hu
Jie Zhou
Yuyang Ding
Junsong Li
...
Mengliang He
Qin Chen
Bo Jiang
Aimin Zhou
Liang He
LRM
200
14
0
03 Jan 2025
MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning Steps
Xiongtao Zhou
Jie He
Lanyu Chen
Jingyu Li
Haojing Chen
Víctor Gutiérrez-Basulto
Jeff Z. Pan
Ningyu Zhang
LRM
125
2
0
18 Oct 2024
Can We Verify Step by Step for Incorrect Answer Detection?
Xin Xu
Shizhe Diao
Can Yang
Yang Wang
LRM
249
15
0
16 Feb 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM
LM&MA
176
40
0
02 Feb 2024
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Maciej Besta
Nils Blach
Aleš Kubíček
Robert Gerstenberger
Michal Podstawski
...
Joanna Gajda
Tomasz Lehmann
H. Niewiadomski
Piotr Nyczyk
Torsten Hoefler
LRM
AI4CE
LM&Ro
153
701
0
18 Aug 2023
GPT-4 Can't Reason
Konstantine Arkoudas
ELM
LRM
AI4MH
56
34
0
21 Jul 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
410
4,422
0
09 Jun 2023
Is ChatGPT a Good NLG Evaluator? A Preliminary Study
Jiaan Wang
Yunlong Liang
Fandong Meng
Zengkui Sun
Haoxiang Shi
Zhixu Li
Jinan Xu
Jianfeng Qu
Jie Zhou
LM&MA
ELM
ALM
AI4MH
131
470
0
07 Mar 2023
ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning
O. Yu. Golovneva
Moya Chen
Spencer Poff
Martin Corredor
Luke Zettlemoyer
Maryam Fazel-Zarandi
Asli Celikyilmaz
ReLM
LRM
91
152
0
15 Dec 2022
FOLIO: Natural Language Reasoning with First-Order Logic
Simeng Han
Hailey Schoelkopf
Yilun Zhao
Zhenting Qi
Martin Riddell
...
Yingbo Zhou
Caiming Xiong
Rex Ying
Arman Cohan
Dragomir R. Radev
ReLM
LRM
105
109
0
02 Sep 2022
Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations
Jaehun Jung
Lianhui Qin
Sean Welleck
Faeze Brahman
Chandra Bhagavatula
Ronan Le Bras
Yejin Choi
ReLM
LRM
271
196
0
24 May 2022
The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning
Xi Ye
Greg Durrett
ReLM
LRM
69
185
0
06 May 2022
STaR: Bootstrapping Reasoning With Reasoning
E. Zelikman
Yuhuai Wu
Jesse Mu
Noah D. Goodman
ReLM
LRM
144
508
0
28 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
833
9,644
0
28 Jan 2022
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
339
4,569
0
27 Oct 2021
Explaining Answers with Entailment Trees
Bhavana Dalvi
Peter Alexander Jansen
Oyvind Tafjord
Zhengnan Xie
Hannah Smith
Leighanna Pipatanangkura
Peter Clark
ReLM
FAtt
LRM
293
185
0
17 Apr 2021
Evaluation of Text Generation: A Survey
Asli Celikyilmaz
Elizabeth Clark
Jianfeng Gao
ELM
LM&MA
115
387
0
26 Jun 2020
SUPERT: Towards New Frontiers in Unsupervised Evaluation Metrics for Multi-Document Summarization
Yang Gao
Wei Zhao
Steffen Eger
ELM
92
126
0
07 May 2020
Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning
Lifu Huang
Ronan Le Bras
Chandra Bhagavatula
Yejin Choi
AIMat
RALM
LRM
115
457
0
31 Aug 2019
e-SNLI: Natural Language Inference with Natural Language Explanations
Oana-Maria Camburu
Tim Rocktaschel
Thomas Lukasiewicz
Phil Blunsom
LRM
417
640
0
04 Dec 2018
1