2307.08678
Cited By
Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations
17 July 2023
Yanda Chen
Ruiqi Zhong
Narutatsu Ri
Chen Zhao
He He
Jacob Steinhardt
Zhou Yu
Kathleen McKeown
LRM
Papers citing
"Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations"
47 / 47 papers shown
Reasoning Models Don't Always Say What They Think
Yanda Chen
Joe Benton
Ansh Radhakrishnan
Jonathan Uesato
Carson E. Denison
...
Vlad Mikulik
Samuel R. Bowman
Jan Leike
Jared Kaplan
E. Perez
ReLM
LRM
68
12
1
08 May 2025
LExT: Towards Evaluating Trustworthiness of Natural Language Explanations
Krithi Shailya
Shreya Rajpal
Gokul S Krishnan
Balaraman Ravindran
ELM
59
1
0
08 Apr 2025
Chain-of-Thought Reasoning In The Wild Is Not Always Faithful
Iván Arcuschin
Jett Janiak
Robert Krzyzanowski
Senthooran Rajamanoharan
Neel Nanda
Arthur Conmy
LRM
ReLM
63
6
0
11 Mar 2025
Cross-Examiner: Evaluating Consistency of Large Language Model-Generated Explanations
Danielle Villa
Maria Chang
K. Murugesan
Rosario A. Uceda-Sosa
K. Ramamurthy
LRM
50
0
0
11 Mar 2025
Can LLMs Explain Themselves Counterfactually?
Zahra Dehghanighobadi
Asja Fischer
Muhammad Bilal Zafar
LRM
47
0
0
25 Feb 2025
Let your LLM generate a few tokens and you will reduce the need for retrieval
Hervé Déjean
83
0
0
16 Dec 2024
Bridging the Gap between Expert and Language Models: Concept-guided Chess Commentary Generation and Evaluation
Jaechang Kim
Jinmin Goh
Inseok Hwang
Jaewoong Cho
Jungseul Ok
ELM
28
1
0
28 Oct 2024
Causality for Large Language Models
Anpeng Wu
Kun Kuang
Minqin Zhu
Yingrong Wang
Yujia Zheng
Kairong Han
Yangqiu Song
Guangyi Chen
Fei Wu
Anton van den Hengel
LRM
46
7
0
20 Oct 2024
XForecast: Evaluating Natural Language Explanations for Time Series Forecasting
Taha Aksu
Chenghao Liu
Amrita Saha
Sarah Tan
Caiming Xiong
Doyen Sahoo
AI4TS
26
1
0
18 Oct 2024
PromptExp: Multi-granularity Prompt Explanation of Large Language Models
Ximing Dong
Shaowei Wang
Dayi Lin
Gopi Krishnan Rajbahadur
Boquan Zhou
Shichao Liu
Ahmed E. Hassan
AAML
LRM
30
1
0
16 Oct 2024
LLMs are One-Shot URL Classifiers and Explainers
Fariza Rashid
Nishavi Ranaweera
Ben Doyle
Suranga Seneviratne
LRM
33
2
0
22 Sep 2024
Self-Alignment: Improving Alignment of Cultural Values in LLMs via In-Context Learning
Rochelle Choenni
Ekaterina Shutova
44
6
0
29 Aug 2024
Evaluating Human Alignment and Model Faithfulness of LLM Rationale
Mohsen Fayyaz
Fan Yin
Jiao Sun
Nanyun Peng
62
3
0
28 Jun 2024
xTower: A Multilingual LLM for Explaining and Correcting Translation Errors
Marcos Vinícius Treviso
Nuno M. Guerreiro
Sweta Agrawal
Ricardo Rei
José P. Pombal
Tânia Vaz
Helena Wu
Beatriz Silva
Daan van Stigt
André F. T. Martins
LRM
39
7
0
27 Jun 2024
Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs?
Peter Hase
Thomas Hofweber
Xiang Zhou
Elias Stengel-Eskin
Joey Tianyi Zhou
KELM
LRM
43
12
0
27 Jun 2024
Designing a Dashboard for Transparency and Control of Conversational AI
Yida Chen
Aoyu Wu
Trevor DePodesta
Catherine Yeh
Kenneth Li
...
Jan Riecke
Shivam Raval
Olivia Seow
Martin Wattenberg
Fernanda Viégas
44
16
0
12 Jun 2024
Why Would You Suggest That? Human Trust in Language Model Responses
Manasi Sharma
H. Siu
Rohan R. Paleja
Jaime D. Peña
LRM
26
6
0
04 Jun 2024
Securing the Future of GenAI: Policy and Technology
Mihai Christodorescu
Craven
S. Feizi
Neil Zhenqiang Gong
Mia Hoffmann
...
Jessica Newman
Emelia Probasco
Yanjun Qi
Khawaja Shams
Turek
SILM
49
3
0
21 May 2024
Interpretability Needs a New Paradigm
Andreas Madsen
Himabindu Lakkaraju
Siva Reddy
Sarath Chandar
39
4
0
08 May 2024
Large Language Models and Causal Inference in Collaboration: A Survey
Xiaoyu Liu
Paiheng Xu
Junda Wu
Jiaxin Yuan
Yifan Yang
...
Haoliang Wang
Tong Yu
Julian McAuley
Wei Ai
Furong Huang
ELM
LRM
77
34
0
14 Mar 2024
Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought
James Chua
Edward Rees
Hunar Batra
Samuel R. Bowman
Julian Michael
Ethan Perez
Miles Turpin
LRM
39
13
0
08 Mar 2024
A Survey of AI-generated Text Forensic Systems: Detection, Attribution, and Characterization
Tharindu Kumarage
Garima Agrawal
Paras Sheth
Raha Moraffah
Amanat Chadha
Joshua Garland
Huan Liu
DeLMO
36
11
0
02 Mar 2024
RORA: Robust Free-Text Rationale Evaluation
Zhengping Jiang
Yining Lu
Hanjie Chen
Daniel Khashabi
Benjamin Van Durme
Anqi Liu
50
1
0
28 Feb 2024
FaithLM: Towards Faithful Explanations for Large Language Models
Yu-Neng Chuang
Guanchu Wang
Chia-Yuan Chang
Ruixiang Tang
Shaochen Zhong
Fan Yang
Mengnan Du
Xuanting Cai
Xia Hu
LRM
74
0
0
07 Feb 2024
Rethinking Interpretability in the Era of Large Language Models
Chandan Singh
J. Inala
Michel Galley
Rich Caruana
Jianfeng Gao
LRM
AI4CE
77
62
0
30 Jan 2024
Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning
Yanda Chen
Chandan Singh
Xiaodong Liu
Simiao Zuo
Bin-Xia Yu
He He
Jianfeng Gao
LRM
20
13
0
25 Jan 2024
Are self-explanations from Large Language Models faithful?
Andreas Madsen
Sarath Chandar
Siva Reddy
LRM
30
24
0
15 Jan 2024
ALMANACS: A Simulatability Benchmark for Language Model Explainability
Edmund Mills
Shiye Su
Stuart J. Russell
Scott Emmons
51
7
0
20 Dec 2023
Towards Evaluating AI Systems for Moral Status Using Self-Reports
Ethan Perez
Robert Long
ELM
38
8
0
14 Nov 2023
Predicting Text Preference Via Structured Comparative Reasoning
Jing Nathan Yan
Tianqi Liu
Justin T Chiu
Jiaming Shen
Zhen Qin
...
Charumathi Lakshmanan
Y. Kurzion
Alexander M. Rush
Jialu Liu
Michael Bendersky
LRM
38
7
0
14 Nov 2023
Explanation-aware Soft Ensemble Empowers Large Language Model In-context Learning
Yue Yu
Jiaming Shen
Tianqi Liu
Zhen Qin
Jing Nathan Yan
Jialu Liu
Chao Zhang
Michael Bendersky
51
6
0
13 Nov 2023
Self-Consistency of Large Language Models under Ambiguity
Henning Bartsch
Ole Jorgensen
Domenic Rosati
Jason Hoelscher-Obermaier
Jacob Pfau
HILM
27
4
0
20 Oct 2023
Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations
Shiyuan Huang
Siddarth Mamidanna
Shreedhar Jangam
Yilun Zhou
Leilani H. Gilpin
LRM
MILM
ELM
43
66
0
17 Oct 2023
Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement
Linlu Qiu
Liwei Jiang
Ximing Lu
Melanie Sclar
Valentina Pyatkin
...
Bailin Wang
Yoon Kim
Yejin Choi
Nouha Dziri
Xiang Ren
LRM
ReLM
45
75
0
12 Oct 2023
Controlled Generation with Prompt Insertion for Natural Language Explanations in Grammatical Error Correction
Masahiro Kaneko
Naoaki Okazaki
LRM
29
4
0
20 Sep 2023
Explainability for Large Language Models: A Survey
Haiyan Zhao
Hanjie Chen
Fan Yang
Ninghao Liu
Huiqi Deng
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Jundong Li
LRM
26
409
0
02 Sep 2023
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
Miles Turpin
Julian Michael
Ethan Perez
Sam Bowman
ReLM
LRM
27
383
0
07 May 2023
REV: Information-Theoretic Evaluation of Free-Text Rationales
Hanjie Chen
Faeze Brahman
Xiang Ren
Yangfeng Ji
Yejin Choi
Swabha Swayamdipta
89
23
0
10 Oct 2022
Towards Faithful Model Explanation in NLP: A Survey
Qing Lyu
Marianna Apidianaki
Chris Callison-Burch
XAI
109
107
0
22 Sep 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
314
3,248
0
21 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
370
8,495
0
28 Jan 2022
Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information
Kawin Ethayarajh
Yejin Choi
Swabha Swayamdipta
167
157
0
16 Oct 2021
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva
Daniel Khashabi
Elad Segal
Tushar Khot
Dan Roth
Jonathan Berant
RALM
250
677
0
06 Jan 2021
Measuring Association Between Labels and Free-Text Rationales
Sarah Wiegreffe
Ana Marasović
Noah A. Smith
282
170
0
24 Oct 2020
e-SNLI: Natural Language Inference with Natural Language Explanations
Oana-Maria Camburu
Tim Rocktäschel
Thomas Lukasiewicz
Phil Blunsom
LRM
257
620
0
04 Dec 2018
Towards A Rigorous Science of Interpretable Machine Learning
Finale Doshi-Velez
Been Kim
XAI
FaML
254
3,684
0
28 Feb 2017