ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning

15 December 2022
O. Yu. Golovneva
Moya Chen
Spencer Poff
Martin Corredor
Luke Zettlemoyer
Maryam Fazel-Zarandi
Asli Celikyilmaz
    ReLM
    LRM
arXiv: 2212.07919

Papers citing "ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning"

50 / 111 papers shown
Think Twice Before Trusting: Self-Detection for Large Language Models through Comprehensive Answer Reflection
Moxin Li
Wenjie Wang
Fuli Feng
Fengbin Zhu
Qifan Wang
Tat-Seng Chua
HILM
LRM
46
13
0
15 Mar 2024
Soft Self-Consistency Improves Language Model Agents
Han Wang
Archiki Prasad
Elias Stengel-Eskin
Mohit Bansal
LLMAG
24
7
0
20 Feb 2024
A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task
Jannik Brinkmann
Abhay Sheshadri
Victor Levoso
Paul Swoboda
Christian Bartelt
LRM
27
21
0
19 Feb 2024
How Interpretable are Reasoning Explanations from Prompting Large Language Models?
Yeo Wei Jie
Ranjan Satapathy
Rick Mong
Erik Cambria
ReLM
LRM
57
16
0
19 Feb 2024
Can We Verify Step by Step for Incorrect Answer Detection?
Xin Xu
Shizhe Diao
Can Yang
Yang Wang
LRM
122
13
0
16 Feb 2024
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data
Yinya Huang
Xiaohan Lin
Zhengying Liu
Qingxing Cao
Huajian Xin
Haiming Wang
Zhenguo Li
Linqi Song
Xiaodan Liang
ALM
38
35
0
14 Feb 2024
Plausible Extractive Rationalization through Semi-Supervised Entailment Signal
Yeo Wei Jie
Ranjan Satapathy
Erik Cambria
19
5
0
13 Feb 2024
A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains
Alon Jacovi
Yonatan Bitton
Bernd Bohnet
Jonathan Herzig
Or Honovich
Michael Tseng
Michael Collins
Roee Aharoni
Mor Geva
LRM
41
19
0
01 Feb 2024
Over-Reasoning and Redundant Calculation of Large Language Models
Cheng-Han Chiang
Hunghuei Lee
LRM
34
9
0
21 Jan 2024
PathFinder: Guided Search over Multi-Step Reasoning Paths
O. Yu. Golovneva
Sean O'Brien
Ramakanth Pasunuru
Tianlu Wang
Luke Zettlemoyer
Maryam Fazel-Zarandi
Asli Celikyilmaz
LRM
27
7
0
08 Dec 2023
CLadder: Assessing Causal Reasoning in Language Models
Zhijing Jin
Yuen Chen
Felix Leeb
Luigi Gresele
Ojasv Kamal
...
Kevin Blin
Fernando Gonzalez Adauto
Max Kleiman-Weiner
Mrinmaya Sachan
Bernhard Schölkopf
ReLM
ELM
LRM
45
62
0
07 Dec 2023
Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving
Ming-Jun Nie
Renyuan Peng
Chunwei Wang
Xinyue Cai
Jianhua Han
Hang Xu
Li Zhang
LRM
29
45
0
06 Dec 2023
CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation
Pei Ke
Bosi Wen
Andrew Feng
Xiao-Yang Liu
Xuanyu Lei
...
Aohan Zeng
Yuxiao Dong
Hongning Wang
Jie Tang
Minlie Huang
ELM
ALM
42
22
0
30 Nov 2023
CLOMO: Counterfactual Logical Modification with Large Language Models
Yinya Huang
Ruixin Hong
Hongming Zhang
Wei Shao
Zhicheng YANG
Dong Yu
Changshui Zhang
Xiaodan Liang
Linqi Song
LRM
34
7
0
29 Nov 2023
Digital Socrates: Evaluating LLMs through Explanation Critiques
Yuling Gu
Oyvind Tafjord
Peter Clark
ELM
LRM
27
2
0
16 Nov 2023
Self-Contradictory Reasoning Evaluation and Detection
Ziyi Liu
Isabelle G. Lee
Yongkang Du
Soumya Sanyal
Jieyu Zhao
LRM
30
2
0
16 Nov 2023
Towards A Unified View of Answer Calibration for Multi-Step Reasoning
Shumin Deng
Ningyu Zhang
Nay Oo
Bryan Hooi
LRM
48
1
0
15 Nov 2023
Empowering Multi-step Reasoning across Languages via Tree-of-Thoughts
Leonardo Ranaldi
Giulia Pucci
Federico Ranaldi
Elena Sofia Ruzzetti
Fabio Massimo Zanzotto
LRM
29
12
0
14 Nov 2023
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Lei Huang
Weijiang Yu
Weitao Ma
Weihong Zhong
Zhangyin Feng
...
Qianglong Chen
Weihua Peng
Xiaocheng Feng
Bing Qin
Ting Liu
LRM
HILM
39
722
0
09 Nov 2023
Leveraging Structured Information for Explainable Multi-hop Question Answering and Reasoning
Ruosen Li
Xinya Du
LRM
44
15
0
07 Nov 2023
Cross-lingual Prompting: Improving Zero-shot Chain-of-Thought Reasoning across Languages
Libo Qin
Qiguang Chen
Fuxuan Wei
Shijue Huang
Wanxiang Che
LRM
27
72
0
23 Oct 2023
Can Language Models Laugh at YouTube Short-form Videos?
Dayoon Ko
Sangho Lee
Gunhee Kim
36
6
0
22 Oct 2023
MAF: Multi-Aspect Feedback for Improving Reasoning in Large Language Models
Deepak Nathani
David Wang
Liangming Pan
Luu Anh Tuan
KELM
LRM
ReLM
20
10
0
19 Oct 2023
Learning To Teach Large Language Models Logical Reasoning
Meiqi Chen
Yubo Ma
Kaitao Song
Yixin Cao
Yan Zhang
Dongsheng Li
ELM
LRM
28
14
0
13 Oct 2023
SocREval: Large Language Models with the Socratic Method for Reference-Free Reasoning Evaluation
Hangfeng He
Hongming Zhang
Dan Roth
LRM
ELM
ReLM
28
13
0
29 Sep 2023
Design of Chain-of-Thought in Math Problem Solving
Zhanming Jie
Trung Quoc Luong
Xinbo Zhang
Xiaoran Jin
Hang Li
LRM
55
11
0
20 Sep 2023
Contrastive Decoding Improves Reasoning in Large Language Models
Sean O'Brien
Mike Lewis
SyDa
LRM
ReLM
24
31
0
17 Sep 2023
Explainability for Large Language Models: A Survey
Haiyan Zhao
Hanjie Chen
Fan Yang
Ninghao Liu
Huiqi Deng
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Jundong Li
LRM
29
411
0
02 Sep 2023
Shepherd: A Critic for Language Model Generation
Tianlu Wang
Ping Yu
Xiaoqing Ellen Tan
Sean O'Brien
Ramakanth Pasunuru
Jane Dwivedi-Yu
O. Yu. Golovneva
Luke Zettlemoyer
Maryam Fazel-Zarandi
Asli Celikyilmaz
ALM
33
78
0
08 Aug 2023
Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies
Liangming Pan
Michael Stephen Saxon
Wenda Xu
Deepak Nathani
Xinyi Wang
William Yang Wang
KELM
LRM
44
201
0
06 Aug 2023
Question Decomposition Improves the Faithfulness of Model-Generated Reasoning
Ansh Radhakrishnan
Karina Nguyen
Anna Chen
Carol Chen
Carson E. Denison
...
Zac Hatfield-Dodds
Jared Kaplan
J. Brauner
Sam Bowman
Ethan Perez
ReLM
LRM
HILM
27
84
0
17 Jul 2023
Let Me Teach You: Pedagogical Foundations of Feedback for Language Models
Beatriz Borges
Niket Tandon
Tanja Kaser
Antoine Bosselut
22
3
0
01 Jul 2023
Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step
Liunian Harold Li
Jack Hessel
Youngjae Yu
Xiang Ren
Kai-Wei Chang
Yejin Choi
LRM
AI4CE
ReLM
22
129
0
24 Jun 2023
Towards Explainable Evaluation Metrics for Machine Translation
Christoph Leiter
Piyawat Lertvittayakumjorn
M. Fomicheva
Wei-Ye Zhao
Yang Gao
Steffen Eger
ELM
30
13
0
22 Jun 2023
Unifying Large Language Models and Knowledge Graphs: A Roadmap
Shirui Pan
Linhao Luo
Yufei Wang
Chen Chen
Jiapu Wang
Xindong Wu
KELM
35
715
0
14 Jun 2023
Deductive Verification of Chain-of-Thought Reasoning
Z. Ling
Yunhao Fang
Xuanlin Li
Zhiao Huang
Mingu Lee
Roland Memisevic
Hao Su
ReLM
LRM
32
125
0
06 Jun 2023
Leveraging Training Data in Few-Shot Prompting for Numerical Reasoning
Zhanming Jie
Wei Lu
LRM
ReLM
30
15
0
29 May 2023
GRACE: Discriminator-Guided Chain-of-Thought Reasoning
Muhammad Khalifa
Lajanugen Logeswaran
Moontae Lee
Ho Hin Lee
Lu Wang
LRM
29
37
0
24 May 2023
The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning
Seungone Kim
Se June Joo
Doyoung Kim
Joel Jang
Seonghyeon Ye
Jamin Shin
Minjoon Seo
ALM
RALM
LRM
23
96
0
23 May 2023
PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning
Xuekai Zhu
Biqing Qi
Kaiyan Zhang
Xingwei Long
Zhouhan Lin
Bowen Zhou
ALM
LRM
35
19
0
23 May 2023
Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models
Oana Ignat
Zhijing Jin
Artem Abzaliev
Laura Biester
Santiago Castro
...
Verónica Pérez-Rosas
Siqi Shen
Zekun Wang
Winston Wu
Rada Mihalcea
LRM
41
6
0
21 May 2023
Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning
Liangming Pan
Alon Albalak
Xinyi Wang
William Yang Wang
ReLM
LRM
AI4CE
49
233
0
20 May 2023
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
Zhibin Gou
Zhihong Shao
Yeyun Gong
Yelong Shen
Yujiu Yang
Nan Duan
Weizhu Chen
KELM
LRM
36
357
0
19 May 2023
RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought
Tianci Xue
Ziqi Wang
Zhenhailong Wang
Chi Han
Pengfei Yu
Heng Ji
KELM
LRM
35
31
0
19 May 2023
Natural Language Decomposition and Interpretation of Complex Utterances
Harsh Jhamtani
Hao Fang
Patrick Xia
Eran Levy
Jacob Andreas
Benjamin Van Durme
ReLM
23
7
0
15 May 2023
Are Machine Rationales (Not) Useful to Humans? Measuring and Improving Human Utility of Free-Text Rationales
Brihi Joshi
Ziyi Liu
Sahana Ramnath
Aaron Chan
Zhewei Tong
Shaoliang Nie
Qifan Wang
Yejin Choi
Xiang Ren
HAI
LRM
34
29
0
11 May 2023
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
Miles Turpin
Julian Michael
Ethan Perez
Sam Bowman
ReLM
LRM
27
383
0
07 May 2023
Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework
Ruochen Zhao
Xingxuan Li
Shafiq R. Joty
Chengwei Qin
Lidong Bing
LRM
KELM
21
156
0
05 May 2023
ReCEval: Evaluating Reasoning Chains via Correctness and Informativeness
Archiki Prasad
Swarnadeep Saha
Xiang Zhou
Joey Tianyi Zhou
LRM
32
45
0
21 Apr 2023
REFINER: Reasoning Feedback on Intermediate Representations
Debjit Paul
Mete Ismayilzada
Maxime Peyrard
Beatriz Borges
Antoine Bosselut
Robert West
Boi Faltings
ReLM
LRM
26
171
0
04 Apr 2023