Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

7 May 2023
Miles Turpin
Julian Michael
Ethan Perez
Sam Bowman
    ReLM
    LRM

Papers citing "Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting"

50 / 85 papers shown
Towards Budget-Friendly Model-Agnostic Explanation Generation for Large Language Models
Junhao Liu
Haonan Yu
Xin Zhang
LRM
4
0
0
18 May 2025
Automated Meta Prompt Engineering for Alignment with the Theory of Mind
Aaron Baughman
Rahul Agarwal
Eduardo Morales
Gozde Akay
36
0
0
13 May 2025
Reasoning Models Don't Always Say What They Think
Yanda Chen
Joe Benton
Ansh Radhakrishnan
Jonathan Uesato
Carson E. Denison
...
Vlad Mikulik
Samuel R. Bowman
Jan Leike
Jared Kaplan
E. Perez
ReLM
LRM
68
14
1
08 May 2025
LogiDebrief: A Signal-Temporal Logic based Automated Debriefing Approach with Large Language Models Integration
Zirong Chen
Ziyan An
Jennifer Reynolds
Kristin Mullen
Stephen Martini
Meiyi Ma
34
0
0
06 May 2025
A Mathematical Philosophy of Explanations in Mechanistic Interpretability -- The Strange Science Part I.i
Kola Ayonrinde
Louis Jaburi
MILM
88
1
0
01 May 2025
Thoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic Pipelines
R. Manuvinakurike
Emanuel Moss
E. A. Watkins
Saurav Sahay
G. Raffa
L. Nachman
LRM
31
0
0
01 May 2025
Phi-4-reasoning Technical Report
Marah Abdin
Sahaj Agarwal
Ahmed Hassan Awadallah
Vidhisha Balachandran
Harkirat Singh Behl
...
Vaishnavi Shrivastava
Vibhav Vineet
Yue Wu
Safoora Yousefi
Guoqing Zheng
ReLM
LRM
90
1
0
30 Apr 2025
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
Joykirat Singh
Raghav Magazine
Yash Pandya
A. Nambi
LLMAG
KELM
OffRL
LRM
168
2
0
28 Apr 2025
From Inductive to Deductive: LLMs-Based Qualitative Data Analysis in Requirements Engineering
Sahil Sethi
Mohamad Hussein
Ann Barcomb
Mohammad Moshirpour
53
0
0
27 Apr 2025
Evolution of AI in Education: Agentic Workflows
Firuz Kamalov
David Santandreu Calonge
Linda Smail
Dilshod Azizov
Dimple R. Thadani
Theresa Kwong
Amara Atif
50
1
0
25 Apr 2025
The Geometry of Self-Verification in a Task-Specific Reasoning Model
Andrew Lee
Lihao Sun
Chris Wendler
Fernanda Viégas
Martin Wattenberg
LRM
34
0
0
19 Apr 2025
Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations
Katie Matton
Robert Osazuwa Ness
John Guttag
Emre Kıcıman
26
2
0
19 Apr 2025
Cognitive Debiasing Large Language Models for Decision-Making
Yougang Lyu
Shijie Ren
Yue Feng
Zihan Wang
Z. Chen
Z. Z. Ren
Maarten de Rijke
43
0
0
05 Apr 2025
Chain-of-Thought Reasoning In The Wild Is Not Always Faithful
Iván Arcuschin
Jett Janiak
Robert Krzyzanowski
Senthooran Rajamanoharan
Neel Nanda
Arthur Conmy
LRM
ReLM
63
7
0
11 Mar 2025
Intent-Aware Self-Correction for Mitigating Social Biases in Large Language Models
Panatchakorn Anantaprayoon
Masahiro Kaneko
Naoaki Okazaki
LRM
KELM
55
0
0
08 Mar 2025
From Perceptions to Decisions: Wildfire Evacuation Decision Prediction with Behavioral Theory-informed LLMs
Ruxiao Chen
Chenguang Wang
Yuran Sun
Xilei Zhao
Susu Xu
95
1
0
24 Feb 2025
The Call for Socially Aware Language Technologies
Diyi Yang
Dirk Hovy
David Jurgens
Barbara Plank
VLM
61
11
0
24 Feb 2025
Social Genome: Grounded Social Reasoning Abilities of Multimodal Models
Leena Mathur
Marian Qian
Paul Pu Liang
Louis-Philippe Morency
LRM
190
1
0
21 Feb 2025
MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs
Andreas Opedal
Haruki Shirakami
Bernhard Schölkopf
Abulhair Saparov
Mrinmaya Sachan
LRM
57
1
0
17 Feb 2025
Dynamic Chain-of-Thought: Towards Adaptive Deep Reasoning
Libo Wang
LRM
171
1
0
07 Feb 2025
SEER: Self-Explainability Enhancement of Large Language Models' Representations
Guanxu Chen
Dongrui Liu
Tao Luo
Jing Shao
LRM
MILM
67
1
0
07 Feb 2025
IAO Prompting: Making Knowledge Flow Explicit in LLMs through Structured Reasoning Templates
Aissatou Diallo
Antonis Bikakis
Luke Dickens
Anthony Hunter
Rob Miller
LRM
36
0
0
05 Feb 2025
FitCF: A Framework for Automatic Feature Importance-guided Counterfactual Example Generation
Qianli Wang
Nils Feldhus
Simon Ostermann
Luis Felipe Villa-Arenas
Sebastian Möller
Vera Schmitt
AAML
34
1
0
01 Jan 2025
Evaluating Vision-Language Models as Evaluators in Path Planning
Mohamed Aghzal
Xiang Yue
Erion Plaku
Ziyu Yao
LRM
77
1
0
27 Nov 2024
On the Impact of Fine-Tuning on Chain-of-Thought Reasoning
Elita Lobo
Chirag Agarwal
Himabindu Lakkaraju
LRM
75
5
0
22 Nov 2024
Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina
Yuan Gao
Dokyun Lee
Gordon Burtch
Sina Fazelpour
LRM
56
7
0
25 Oct 2024
LLMScan: Causal Scan for LLM Misbehavior Detection
Mengdi Zhang
Kai Kiat Goh
Peixin Zhang
Jun Sun
Rose Lin Xin
Hongyu Zhang
25
0
0
22 Oct 2024
MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning Steps
Xiongtao Zhou
Jie He
Lanyu Chen
Jingyu Li
Haojing Chen
Víctor Gutiérrez-Basulto
Jeff Z. Pan
H. Chen
LRM
63
1
0
18 Oct 2024
Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors
Georgios Chochlakis
Alexandros Potamianos
Kristina Lerman
Shrikanth Narayanan
34
0
0
17 Oct 2024
FLARE: Faithful Logic-Aided Reasoning and Exploration
Erik Arakelyan
Pasquale Minervini
Pat Verga
Patrick Lewis
Isabelle Augenstein
ReLM
LRM
69
2
0
14 Oct 2024
CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning
Joshua Ong Jun Leang
Aryo Pradipta Gema
Shay B. Cohen
ReLM
LRM
ReCod
41
2
0
14 Oct 2024
COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act
Philipp Guldimann
Alexander Spiridonov
Robin Staab
Nikola Jovanović
Mark Vero
...
Mislav Balunović
Nikola Konstantinov
Pavol Bielik
Petar Tsankov
Martin Vechev
ELM
53
4
0
10 Oct 2024
Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models
Tongxuan Liu
Wenjiang Xu
Weizhe Huang
Yuting Zeng
Jiaxing Wang
Hailong Yang
Jing Li
LRM
ReLM
52
5
0
26 Sep 2024
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Nitay Calderon
Roi Reichart
42
10
0
27 Jul 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
82
19
0
02 Jul 2024
AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations
Adam Dahlgren Lindstrom
Leila Methnani
Lea Krause
Petter Ericson
Ínigo Martínez de Rituerto de Troya
Dimitri Coelho Mollo
Roel Dobbe
ALM
45
2
0
26 Jun 2024
Chain-of-Probe: Examining the Necessity and Accuracy of CoT Step-by-Step
Zezhong Wang
Xingshan Zeng
Weiwen Liu
Yufei Wang
Liangyou Li
Yasheng Wang
Lifeng Shang
Xin Jiang
Qun Liu
Kam-Fai Wong
LRM
64
3
0
23 Jun 2024
Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs
Jannik Kossen
Jiatong Han
Muhammed Razzak
Lisa Schut
Shreshth A. Malik
Yarin Gal
HILM
60
35
0
22 Jun 2024
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Aman Singh Thakur
Kartik Choudhary
Venkat Srinik Ramayapally
Sankaran Vaidyanathan
Dieuwke Hupkes
ELM
ALM
61
57
0
18 Jun 2024
On the Hardness of Faithful Chain-of-Thought Reasoning in Large Language Models
Sree Harsha Tanneru
Dan Ley
Chirag Agarwal
Himabindu Lakkaraju
LRM
31
4
0
15 Jun 2024
Designing a Dashboard for Transparency and Control of Conversational AI
Yida Chen
Aoyu Wu
Trevor DePodesta
Catherine Yeh
Kenneth Li
...
Jan Riecke
Shivam Raval
Olivia Seow
Martin Wattenberg
Fernanda Viégas
44
16
0
12 Jun 2024
Cycles of Thought: Measuring LLM Confidence through Stable Explanations
Evan Becker
Stefano Soatto
45
6
0
05 Jun 2024
Break the Chain: Large Language Models Can be Shortcut Reasoners
Mengru Ding
Hanmeng Liu
Zhizhang Fu
Jian Song
Wenbo Xie
Yue Zhang
KELM
LRM
36
7
0
04 Jun 2024
ACCORD: Closing the Commonsense Measurability Gap
François Roewer-Després
Jinyue Feng
Zining Zhu
Frank Rudzicz
LRM
48
0
0
04 Jun 2024
From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks
Jacob Russin
Sam Whitman McGrath
Danielle J. Williams
Lotem Elber-Dorozko
AI4CE
75
3
0
24 May 2024
Securing the Future of GenAI: Policy and Technology
Mihai Christodorescu
Craven
S. Feizi
Neil Zhenqiang Gong
Mia Hoffmann
...
Jessica Newman
Emelia Probasco
Yanjun Qi
Khawaja Shams
Turek
SILM
52
3
0
21 May 2024
Chain of Thoughtlessness? An Analysis of CoT in Planning
Kaya Stechly
Karthik Valmeekam
Subbarao Kambhampati
LRM
LM&Ro
75
40
0
08 May 2024
Large Language Models Cannot Explain Themselves
Advait Sarkar
LRM
43
7
0
07 May 2024
General Purpose Verification for Chain of Thought Prompting
Robert Vacareanu
Anurag Pratik
Evangelia Spiliopoulou
Zheng Qi
Giovanni Paolini
Neha Ann John
Jie Ma
Yassine Benajiba
Miguel Ballesteros
LRM
32
8
0
30 Apr 2024
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
Letitia Parcalabescu
Anette Frank
MLLM
CoGe
VLM
84
3
0
29 Apr 2024