Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2304.10703
Cited By
ReCEval: Evaluating Reasoning Chains via Correctness and Informativeness
21 April 2023
Archiki Prasad
Swarnadeep Saha
Xiang Zhou
Joey Tianyi Zhou
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ReCEval: Evaluating Reasoning Chains via Correctness and Informativeness"
48 / 48 papers shown
Title
MINERVA: Evaluating Complex Video Reasoning
Arsha Nagrani
Sachit Menon
Ahmet Iscen
Shyamal Buch
Ramin Mehran
...
Yukun Zhu
Carl Vondrick
Mikhail Sirotenko
Cordelia Schmid
Tobias Weyand
58
0
0
01 May 2025
Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and Reasoning
Sky CH-Wang
Darshan Deshpande
Smaranda Muresan
Anand Kannappan
Rebecca Qian
54
1
0
24 Mar 2025
VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity
Jing Bi
Junjia Guo
Susan Liang
Guangyu Sun
Luchuan Song
...
Jinxi He
Jiarui Wu
A. Vosoughi
Cheng Chen
Chenliang Xu
LRM
74
1
0
14 Mar 2025
DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding
Ayesha Ishaq
Jean Lahoud
Ketan More
Omkar Thawakar
Ritesh Thawkar
...
F. Khan
Hisham Cholakkal
Ivan Laptev
Rao Muhammad Anwer
Salman Khan
LRM
71
0
0
13 Mar 2025
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
Dongzhi Jiang
Renrui Zhang
Ziyu Guo
Yanwei Li
Yu Qi
...
Shen Yan
Bo Zhang
Chaoyou Fu
Peng Gao
Hongsheng Li
MLLM
LRM
91
21
0
13 Feb 2025
Examining False Positives under Inference Scaling for Mathematical Reasoning
Yu Guang Wang
Nan Yang
Liang Wang
Furu Wei
LRM
67
3
0
10 Feb 2025
MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning Steps
Xiongtao Zhou
Jie He
Lanyu Chen
Jingyu Li
Haojing Chen
Víctor Gutiérrez-Basulto
Jeff Z. Pan
H. Chen
LRM
57
1
0
18 Oct 2024
Identifying Task Groupings for Multi-Task Learning Using Pointwise V-Usable Information
Yingya Li
Timothy A. Miller
Steven Bethard
G. Savova
24
0
0
16 Oct 2024
FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning
Ruosen Li
Ziming Luo
Xinya Du
LRM
29
0
0
08 Oct 2024
LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits
Duy Nguyen
Archiki Prasad
Elias Stengel-Eskin
Joey Tianyi Zhou
23
3
0
02 Oct 2024
Step-by-Step Reasoning to Solve Grid Puzzles: Where do LLMs Falter?
Nemika Tyagi
Mihir Parmar
Mohith Kulkarni
Aswin Rrv
Nisarg Patel
Mutsumi Nakamura
Arindam Mitra
Chitta Baral
LRM
37
6
0
20 Jul 2024
Free-text Rationale Generation under Readability Level Control
Yi-Sheng Hsu
Nils Feldhus
Sherzod Hakimov
38
0
0
01 Jul 2024
Advancing Process Verification for Large Language Models via Tree-Based Preference Learning
Mingqian He
Yongliang Shen
Wenqi Zhang
Zeqi Tan
Weiming Lu
LRM
35
5
0
29 Jun 2024
ACCORD: Closing the Commonsense Measurability Gap
François Roewer-Després
Jinyue Feng
Zining Zhu
Frank Rudzicz
LRM
48
0
0
04 Jun 2024
NExT: Teaching Large Language Models to Reason about Code Execution
Ansong Ni
Miltiadis Allamanis
Arman Cohan
Yinlin Deng
Kensen Shi
Charles Sutton
Pengcheng Yin
ReLM
LRM
36
34
0
23 Apr 2024
Evaluating Mathematical Reasoning Beyond Accuracy
Shijie Xia
Xuefeng Li
Yixin Liu
Tongshuang Wu
Pengfei Liu
LRM
ReLM
47
21
0
08 Apr 2024
LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models
Shibo Hao
Yi Gu
Haotian Luo
Tianyang Liu
Xiyan Shao
...
Haodi Ma
Adithya Samavedhi
Qiyue Gao
Zhen Wang
Zhiting Hu
LRM
ELM
89
22
0
08 Apr 2024
Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought
Jooyoung Lee
Fan Yang
Thanh Tran
Qian Hu
Emre Barut
Kai-Wei Chang
Chengwei Su
ReLM
LLMAG
LRM
21
10
0
04 Apr 2024
Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey
Philipp Mondorf
Barbara Plank
ELM
LRM
LM&MA
33
35
0
02 Apr 2024
RORA: Robust Free-Text Rationale Evaluation
Zhengping Jiang
Yining Lu
Hanjie Chen
Daniel Khashabi
Benjamin Van Durme
Anqi Liu
50
1
0
28 Feb 2024
Soft Self-Consistency Improves Language Model Agents
Han Wang
Archiki Prasad
Elias Stengel-Eskin
Mohit Bansal
LLMAG
24
7
0
20 Feb 2024
How Interpretable are Reasoning Explanations from Prompting Large Language Models?
Yeo Wei Jie
Ranjan Satapathy
Rick Mong
Erik Cambria
ReLM
LRM
57
16
0
19 Feb 2024
Can We Verify Step by Step for Incorrect Answer Detection?
Xin Xu
Shizhe Diao
Can Yang
Yang Wang
LRM
122
13
0
16 Feb 2024
Through the Lens of Split Vote: Exploring Disagreement, Difficulty and Calibration in Legal Case Outcome Classification
Shanshan Xu
Santosh T.Y.S.S
O. Ichim
Barbara Plank
Matthias Grabmair
37
4
0
11 Feb 2024
A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains
Alon Jacovi
Yonatan Bitton
Bernd Bohnet
Jonathan Herzig
Or Honovich
Michael Tseng
Michael Collins
Roee Aharoni
Mor Geva
LRM
37
19
0
01 Feb 2024
PathFinder: Guided Search over Multi-Step Reasoning Paths
O. Yu. Golovneva
Sean O'Brien
Ramakanth Pasunuru
Tianlu Wang
Luke Zettlemoyer
Maryam Fazel-Zarandi
Asli Celikyilmaz
LRM
27
7
0
08 Dec 2023
CLOMO: Counterfactual Logical Modification with Large Language Models
Yinya Huang
Ruixin Hong
Hongming Zhang
Wei Shao
Zhicheng YANG
Dong Yu
Changshui Zhang
Xiaodan Liang
Linqi Song
LRM
34
7
0
29 Nov 2023
Digital Socrates: Evaluating LLMs through Explanation Critiques
Yuling Gu
Oyvind Tafjord
Peter Clark
ELM
LRM
27
2
0
16 Nov 2023
Can Knowledge Graphs Reduce Hallucinations in LLMs? : A Survey
Garima Agrawal
Tharindu Kumarage
Zeyad Alghami
Huanmin Liu
37
81
0
14 Nov 2023
Eliminating Reasoning via Inferring with Planning: A New Framework to Guide LLMs' Non-linear Thinking
Yongqi Tong
Yifan Wang
Dawei Li
Sizhe Wang
Zi Lin
Simeng Han
Jingbo Shang
LRM
13
17
0
18 Oct 2023
Measuring Pointwise
V
\mathcal{V}
V
-Usable Information In-Context-ly
Sheng Lu
Shan Chen
Yingya Li
Danielle Bitterman
G. Savova
Iryna Gurevych
19
0
0
18 Oct 2023
GLoRE: Evaluating Logical Reasoning of Large Language Models
Hanmeng Liu
Zhiyang Teng
Ruoxi Ning
Jian Liu
Qiji Zhou
Yuexin Zhang
Yue Zhang
ReLM
ELM
LRM
70
7
0
13 Oct 2023
SocREval: Large Language Models with the Socratic Method for Reference-Free Reasoning Evaluation
Hangfeng He
Hongming Zhang
Dan Roth
LRM
ELM
ReLM
28
13
0
29 Sep 2023
Explainability for Large Language Models: A Survey
Haiyan Zhao
Hanjie Chen
Fan Yang
Ninghao Liu
Huiqi Deng
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Jundong Li
LRM
26
409
0
02 Sep 2023
Deductive Verification of Chain-of-Thought Reasoning
Z. Ling
Yunhao Fang
Xuanlin Li
Zhiao Huang
Mingu Lee
Roland Memisevic
Hao Su
ReLM
LRM
29
125
0
06 Jun 2023
Leveraging Training Data in Few-Shot Prompting for Numerical Reasoning
Zhanming Jie
Wei Lu
LRM
ReLM
25
15
0
29 May 2023
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs
Albert Q. Jiang
Sean Welleck
Jin Peng Zhou
Wenda Li
Jiacheng Liu
M. Jamnik
Timothée Lacroix
Yuhuai Wu
Guillaume Lample
AIMat
70
158
0
21 Oct 2022
REV: Information-Theoretic Evaluation of Free-Text Rationales
Hanjie Chen
Faeze Brahman
Xiang Ren
Yangfeng Ji
Yejin Choi
Swabha Swayamdipta
89
23
0
10 Oct 2022
Complexity-Based Prompting for Multi-Step Reasoning
Yao Fu
Hao-Chun Peng
Ashish Sabharwal
Peter Clark
Tushar Khot
ReLM
LRM
162
414
0
03 Oct 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
325
4,077
0
24 May 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
314
3,248
0
21 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
367
8,495
0
28 Jan 2022
Understanding Dataset Difficulty with
V
\mathcal{V}
V
-Usable Information
Kawin Ethayarajh
Yejin Choi
Swabha Swayamdipta
164
157
0
16 Oct 2021
Finding a Balanced Degree of Automation for Summary Evaluation
Shiyue Zhang
Joey Tianyi Zhou
52
43
0
23 Sep 2021
Explaining Answers with Entailment Trees
Bhavana Dalvi
Peter Alexander Jansen
Oyvind Tafjord
Zhengnan Xie
Hannah Smith
Leighanna Pipatanangkura
Peter Clark
ReLM
FAtt
LRM
239
184
0
17 Apr 2021
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva
Daniel Khashabi
Elad Segal
Tushar Khot
Dan Roth
Jonathan Berant
RALM
250
673
0
06 Jan 2021
Language Models as Knowledge Bases?
Fabio Petroni
Tim Rocktaschel
Patrick Lewis
A. Bakhtin
Yuxiang Wu
Alexander H. Miller
Sebastian Riedel
KELM
AI4MH
415
2,586
0
03 Sep 2019
1