arXiv: 2305.04388
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
Miles Turpin, Julian Michael, Ethan Perez, Sam Bowman (7 May 2023)
Papers citing "Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting" (35 of 85 papers shown)
Evaluating Class Membership Relations in Knowledge Graphs using Large Language Models
Bradley Paul Allen, Paul T. Groth (25 Apr 2024)
Self-playing Adversarial Language Game Enhances LLM Reasoning
Pengyu Cheng, Tianhao Hu, Han Xu, Zhisong Zhang, Yong Dai, Lei Han, Nan Du, Xiaolong Li (16 Apr 2024)
On the Challenges and Opportunities in Generative AI
Laura Manduchi, Kushagra Pandey, Robert Bamler, Ryan Cotterell, Sina Däubener, ..., F. Wenzel, Frank Wood, Stephan Mandt, Vincent Fortuin (28 Feb 2024)
LLM Voting: Human Choices and AI Collective Decision Making
Joshua C. Yang, Damian Dailisan, Marcin Korecki, C. I. Hausladen, Dirk Helbing (31 Jan 2024)
Evaluating Gender Bias in Large Language Models via Chain-of-Thought Prompting
Masahiro Kaneko, Danushka Bollegala, Naoaki Okazaki, Timothy Baldwin (28 Jan 2024)
Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, ..., Michael Gerovitch, David Bau, Max Tegmark, David M. Krueger, Dylan Hadfield-Menell (25 Jan 2024)
Evaluating Language Model Agency through Negotiations
Tim R. Davidson, V. Veselovsky, Martin Josifoski, Maxime Peyrard, Antoine Bosselut, Michal Kosinski, Robert West (09 Jan 2024)
Exploring the Robustness of Model-Graded Evaluations and Automated Interpretability
Simon Lermen, Ondřej Kvapil (26 Nov 2023)
Self-Contradictory Reasoning Evaluation and Detection
Ziyi Liu, Isabelle G. Lee, Yongkang Du, Soumya Sanyal, Jieyu Zhao (16 Nov 2023)
Explain-then-Translate: An Analysis on Improving Program Translation with Self-generated Explanations
Zilu Tang, Mayank Agarwal, Alex Shypula, Bailin Wang, Derry Wijaya, Jie Chen, Yoon Kim (13 Nov 2023)
Quantifying Uncertainty in Natural Language Explanations of Large Language Models
Sree Harsha Tanneru, Chirag Agarwal, Himabindu Lakkaraju (06 Nov 2023)
Towards Understanding Sycophancy in Language Models
Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, ..., Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez (20 Oct 2023)
Generative AI Text Classification using Ensemble LLM Approaches
Harika Abburi, Michael Suesserman, Nirmala Pudota, Balaji Veeramani, Edward Bowen, Sanmitra Bhattacharya (14 Sep 2023)
Simple synthetic data reduces sycophancy in large language models
Jerry W. Wei, Da Huang, Yifeng Lu, Denny Zhou, Quoc V. Le (07 Aug 2023)
Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations
Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu, Kathleen McKeown (17 Jul 2023)
AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap
Q. V. Liao, J. Vaughan (02 Jun 2023)
Self-ICL: Zero-Shot In-Context Learning with Self-Generated Demonstrations
Wei-Lin Chen, Cheng-Kuang Wu, Yun-Nung Chen, Hsin-Hsi Chen (24 May 2023)
Leveraging GPT-4 for Automatic Translation Post-Editing
Vikas Raunak, Amr Sharaf, Yiren Wang, H. Awadallah, Arul Menezes (24 May 2023)
Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate
Boshi Wang, Xiang Yue, Huan Sun (22 May 2023)
ExplainCPE: A Free-text Explanation Benchmark of Chinese Pharmacist Examination
Dongfang Li, Jindi Yu, Baotian Hu, Zhenran Xu, Hao Fei (22 May 2023)
The Case Against Explainability
Hofit Wasserman Rozen, N. Elkin-Koren, Ran Gilad-Bachrach (20 May 2023)
ReCEval: Evaluating Reasoning Chains via Correctness and Informativeness
Archiki Prasad, Swarnadeep Saha, Xiang Zhou, Joey Tianyi Zhou (21 Apr 2023)
Faithful Chain-of-Thought Reasoning
Qing Lyu, Shreya Havaldar, Adam Stein, Li Zhang, D. Rao, Eric Wong, Marianna Apidianaki, Chris Callison-Burch (31 Jan 2023)
Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning
Oyvind Tafjord, Bhavana Dalvi, Peter Clark (21 Oct 2022)
Honest Students from Untrusted Teachers: Learning an Interpretable Question-Answering Pipeline from a Pretrained Language Model
Jacob Eisenstein, D. Andor, Bernd Bohnet, Michael Collins, David M. Mimno (05 Oct 2022)
Towards Faithful Model Explanation in NLP: A Survey
Qing Lyu, Marianna Apidianaki, Chris Callison-Burch (22 Sep 2022)
Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango
Aman Madaan, Amir Yazdanbakhsh (16 Sep 2022)
The Alignment Problem from a Deep Learning Perspective
Richard Ngo, Lawrence Chan, Sören Mindermann (30 Aug 2022)
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima, S. Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa (24 May 2022)
Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations
Jaehun Jung, Lianhui Qin, Sean Welleck, Faeze Brahman, Chandra Bhagavatula, Ronan Le Bras, Yejin Choi (24 May 2022)
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe (04 Mar 2022)
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou (28 Jan 2022)
BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish, Angelica Chen, Nikita Nangia, Vishakh Padmakumar, Jason Phang, Jana Thompson, Phu Mon Htut, Sam Bowman (15 Oct 2021)
Measuring Association Between Labels and Free-Text Rationales
Sarah Wiegreffe, Ana Marasović, Noah A. Smith (24 Oct 2020)
Towards A Rigorous Science of Interpretable Machine Learning
Finale Doshi-Velez, Been Kim (28 Feb 2017)