Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

7 May 2023

Papers citing "Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting"

50 / 85 papers shown

Title
Towards Budget-Friendly Model-Agnostic Explanation Generation for Large Language Models Junhao Liu Haonan Yu Xin Zhang LRM 4 0 0 18 May 2025
Automated Meta Prompt Engineering for Alignment with the Theory of Mind Aaron Baughman Rahul Agarwal Eduardo Morales Gozde Akay 36 0 0 13 May 2025
Reasoning Models Don't Always Say What They Think Yanda Chen Joe Benton Ansh Radhakrishnan Jonathan Uesato Carson E. Denison ... Vlad Mikulik Samuel R. Bowman Jan Leike Jared Kaplan E. Perez ReLM LRM 68 14 1 08 May 2025
LogiDebrief: A Signal-Temporal Logic based Automated Debriefing Approach with Large Language Models Integration Zirong Chen Ziyan An Jennifer Reynolds Kristin Mullen Stephen Martini Meiyi Ma 34 0 0 06 May 2025
A Mathematical Philosophy of Explanations in Mechanistic Interpretability -- The Strange Science Part I.i Kola Ayonrinde Louis Jaburi MILM 88 1 0 01 May 2025
Thoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic Pipelines R. Manuvinakurike Emanuel Moss E. A. Watkins Saurav Sahay G. Raffa L. Nachman LRM 31 0 0 01 May 2025
Phi-4-reasoning Technical Report Marah Abdin Sahaj Agarwal Ahmed Hassan Awadallah Vidhisha Balachandran Harkirat Singh Behl ... Vaishnavi Shrivastava Vibhav Vineet Yue Wu Safoora Yousefi Guoqing Zheng ReLM LRM 90 1 0 30 Apr 2025
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning Joykirat Singh Raghav Magazine Yash Pandya A. Nambi LLMAG KELM OffRL LRM 168 2 0 28 Apr 2025
From Inductive to Deductive: LLMs-Based Qualitative Data Analysis in Requirements Engineering Sahil Sethi Mohamad Hussein Ann Barcomb Mohammad Moshirpour 53 0 0 27 Apr 2025
Evolution of AI in Education: Agentic Workflows Firuz Kamalov David Santandreu Calonge Linda Smail Dilshod Azizov Dimple R. Thadani Theresa Kwong Amara Atif 50 1 0 25 Apr 2025
The Geometry of Self-Verification in a Task-Specific Reasoning Model Andrew Lee Lihao Sun Chris Wendler Fernanda Viégas Martin Wattenberg LRM 34 0 0 19 Apr 2025
Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations Katie Matton Robert Osazuwa Ness John Guttag Emre Kıcıman 26 2 0 19 Apr 2025
Cognitive Debiasing Large Language Models for Decision-Making Yougang Lyu Shijie Ren Yue Feng Zihan Wang Z. Chen Z. Z. Ren Maarten de Rijke 43 0 0 05 Apr 2025
Chain-of-Thought Reasoning In The Wild Is Not Always Faithful Iván Arcuschin Jett Janiak Robert Krzyzanowski Senthooran Rajamanoharan Neel Nanda Arthur Conmy LRM ReLM 63 7 0 11 Mar 2025
Intent-Aware Self-Correction for Mitigating Social Biases in Large Language Models Panatchakorn Anantaprayoon Masahiro Kaneko Naoaki Okazaki LRM KELM 55 0 0 08 Mar 2025
From Perceptions to Decisions: Wildfire Evacuation Decision Prediction with Behavioral Theory-informed LLMs Ruxiao Chen Chenguang Wang Yuran Sun Xilei Zhao Susu Xu 95 1 0 24 Feb 2025
The Call for Socially Aware Language Technologies Diyi Yang Dirk Hovy David Jurgens Barbara Plank VLM 61 11 0 24 Feb 2025
Social Genome: Grounded Social Reasoning Abilities of Multimodal Models Leena Mathur Marian Qian Paul Pu Liang Louis-Philippe Morency LRM 190 1 0 21 Feb 2025
MathGAP: Out-of-Distribution Evaluation on Problems with Arbitrarily Complex Proofs Andreas Opedal Haruki Shirakami Bernhard Schölkopf Abulhair Saparov Mrinmaya Sachan LRM 57 1 0 17 Feb 2025
Dynamic Chain-of-Thought: Towards Adaptive Deep Reasoning Libo Wang LRM 171 1 0 07 Feb 2025
SEER: Self-Explainability Enhancement of Large Language Models' Representations Guanxu Chen Dongrui Liu Tao Luo Jing Shao LRM MILM 67 1 0 07 Feb 2025
IAO Prompting: Making Knowledge Flow Explicit in LLMs through Structured Reasoning Templates Aissatou Diallo Antonis Bikakis Luke Dickens Anthony Hunter Rob Miller LRM 36 0 0 05 Feb 2025
FitCF: A Framework for Automatic Feature Importance-guided Counterfactual Example Generation Qianli Wang Nils Feldhus Simon Ostermann Luis Felipe Villa-Arenas Sebastian Möller Vera Schmitt AAML 34 1 0 01 Jan 2025
Evaluating Vision-Language Models as Evaluators in Path Planning Mohamed Aghzal Xiang Yue Erion Plaku Ziyu Yao LRM 77 1 0 27 Nov 2024
On the Impact of Fine-Tuning on Chain-of-Thought Reasoning Elita Lobo Chirag Agarwal Himabindu Lakkaraju LRM 75 5 0 22 Nov 2024
Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina Yuan Gao Dokyun Lee Gordon Burtch Sina Fazelpour LRM 56 7 0 25 Oct 2024
LLMScan: Causal Scan for LLM Misbehavior Detection Mengdi Zhang Kai Kiat Goh Peixin Zhang Jun Sun Rose Lin Xin Hongyu Zhang 25 0 0 22 Oct 2024
MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning Steps Xiongtao Zhou Jie He Lanyu Chen Jingyu Li Haojing Chen Víctor Gutiérrez-Basulto Jeff Z. Pan H. Chen LRM 63 1 0 18 Oct 2024
Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors Georgios Chochlakis Alexandros Potamianos Kristina Lerman Shrikanth Narayanan 34 0 0 17 Oct 2024
FLARE: Faithful Logic-Aided Reasoning and Exploration Erik Arakelyan Pasquale Minervini Pat Verga Patrick Lewis Isabelle Augenstein ReLM LRM 69 2 0 14 Oct 2024
CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning Joshua Ong Jun Leang Aryo Pradipta Gema Shay B. Cohen ReLM LRM ReCod 41 2 0 14 Oct 2024
COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act Philipp Guldimann Alexander Spiridonov Robin Staab Nikola Jovanović Mark Vero ... Mislav Balunović Nikola Konstantinov Pavol Bielik Petar Tsankov Martin Vechev ELM 53 4 0 10 Oct 2024
Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models Tongxuan Liu Wenjiang Xu Weizhe Huang Yuting Zeng Jiaxing Wang Hailong Yang Hailong Yang Jing Li LRM ReLM 52 5 0 26 Sep 2024
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs Nitay Calderon Roi Reichart 42 10 0 27 Jul 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models Daking Rai Yilun Zhou Shi Feng Abulhair Saparov Ziyu Yao 82 19 0 02 Jul 2024
AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations Adam Dahlgren Lindstrom Leila Methnani Lea Krause Petter Ericson Ínigo Martínez de Rituerto de Troya Dimitri Coelho Mollo Roel Dobbe ALM 45 2 0 26 Jun 2024
Chain-of-Probe: Examining the Necessity and Accuracy of CoT Step-by-Step Zezhong Wang Xingshan Zeng Weiwen Liu Yufei Wang Liangyou Li Yasheng Wang Lifeng Shang Xin Jiang Qun Liu Kam-Fai Wong LRM 64 3 0 23 Jun 2024
Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs Jannik Kossen Jiatong Han Muhammed Razzak Lisa Schut Shreshth A. Malik Yarin Gal HILM 60 35 0 22 Jun 2024
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges Aman Singh Thakur Kartik Choudhary Venkat Srinik Ramayapally Sankaran Vaidyanathan Dieuwke Hupkes ELM ALM 61 57 0 18 Jun 2024
On the Hardness of Faithful Chain-of-Thought Reasoning in Large Language Models Sree Harsha Tanneru Dan Ley Chirag Agarwal Himabindu Lakkaraju LRM 31 4 0 15 Jun 2024
Designing a Dashboard for Transparency and Control of Conversational AI Yida Chen Aoyu Wu Trevor DePodesta Catherine Yeh Kenneth Li ... Jan Riecke Shivam Raval Olivia Seow Martin Wattenberg Fernanda Viégas 44 16 0 12 Jun 2024
Cycles of Thought: Measuring LLM Confidence through Stable Explanations Evan Becker Stefano Soatto 45 6 0 05 Jun 2024
Break the Chain: Large Language Models Can be Shortcut Reasoners Mengru Ding Hanmeng Liu Zhizhang Fu Jian Song Wenbo Xie Yue Zhang KELM LRM 36 7 0 04 Jun 2024
ACCORD: Closing the Commonsense Measurability Gap François Roewer-Després Jinyue Feng Zining Zhu Frank Rudzicz LRM 48 0 0 04 Jun 2024
From Frege to chatGPT: Compositionality in language, cognition, and deep neural networks Jacob Russin Sam Whitman McGrath Danielle J. Williams Lotem Elber-Dorozko AI4CE 75 3 0 24 May 2024
Securing the Future of GenAI: Policy and Technology Mihai Christodorescu Craven S. Feizi Neil Zhenqiang Gong Mia Hoffmann ... Jessica Newman Emelia Probasco Yanjun Qi Khawaja Shams Turek SILM 52 3 0 21 May 2024
Chain of Thoughtlessness? An Analysis of CoT in Planning Kaya Stechly Karthik Valmeekam Subbarao Kambhampati LRM LM&Ro 75 40 0 08 May 2024
Large Language Models Cannot Explain Themselves Advait Sarkar LRM 43 7 0 07 May 2024
General Purpose Verification for Chain of Thought Prompting Robert Vacareanu Anurag Pratik Evangelia Spiliopoulou Zheng Qi Giovanni Paolini Neha Ann John Jie Ma Yassine Benajiba Miguel Ballesteros LRM 32 8 0 30 Apr 2024
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations? Letitia Parcalabescu Anette Frank MLLM CoGe VLM 84 3 0 29 Apr 2024