Chain-of-Thought Reasoning In The Wild Is Not Always Faithful

11 March 2025 · Iván Arcuschin, Jett Janiak, Robert Krzyzanowski, Senthooran Rajamanoharan, Neel Nanda, Arthur Conmy
Topics: ReLM, LRM

Papers citing "Chain-of-Thought Reasoning In The Wild Is Not Always Faithful" (30 papers shown)

  • CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring (29 May 2025). Benjamin Arnav, Pablo Bernabeu-Pérez, Nathan Helm-Burger, Tim Kostolansky, Hannes Whittingham, Mary Phuong. Topics: LRM. Citations: 1.
  • Search-Based Correction of Reasoning Chains for Language Models (17 May 2025). Minsu Kim, Jean-Pierre Falet, Oliver E. Richardson, Xiaoyin Chen, Moksh Jain, Sungjin Ahn, Sungsoo Ahn, Yoshua Bengio. Topics: KELM, ReLM, LRM. Citations: 0.
  • Noise Injection Systemically Degrades Large Language Model Safety Guardrails (16 May 2025). Prithviraj Singh Shahani, Matthias Scheutz. Topics: AAML. Citations: 0.
  • Reasoning Models Don't Always Say What They Think (08 May 2025). Yanda Chen, Joe Benton, Ansh Radhakrishnan, Jonathan Uesato, Carson E. Denison, ..., Vlad Mikulik, Samuel R. Bowman, Jan Leike, Jared Kaplan, E. Perez. Topics: ReLM, LRM. Citations: 49.
  • Phi-4-reasoning Technical Report (30 Apr 2025). Marah Abdin, Sahaj Agarwal, Ahmed Hassan Awadallah, Vidhisha Balachandran, Harkirat Singh Behl, ..., Vaishnavi Shrivastava, Vibhav Vineet, Yue Wu, Safoora Yousefi, Guoqing Zheng. Topics: ReLM, LRM. Citations: 15.
  • The Geometry of Self-Verification in a Task-Specific Reasoning Model (19 Apr 2025). Andrew Lee, Lihao Sun, Chris Wendler, Fernanda Viégas, Martin Wattenberg. Topics: LRM. Citations: 1.
  • Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? (18 Apr 2025). Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, Gao Huang. Topics: ReLM, LRM. Citations: 128.
  • Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation (14 Mar 2025). Bowen Baker, Joost Huizinga, Leo Gao, Zehao Dou, M. Guan, Aleksander Mądry, Wojciech Zaremba, J. Pachocki, David Farhi. Topics: LRM. Citations: 38.
  • DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (22 Jan 2025). DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, ..., Shiyu Wang, S. Yu, Shunfeng Zhou, Shuting Pan, S.S. Li. Topics: ReLM, VLM, OffRL, AI4TS, LRM. Citations: 2,000.
  • PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition (15 Jul 2024). George Tsoukalas, Jasper Lee, John Jennings, Jimmy Xin, Michelle Ding, Michael Jennings, Amitayush Thakur, Swarat Chaudhuri. Topics: LRM, AIMat. Citations: 28.
  • Dissociation of Faithful and Unfaithful Reasoning in LLMs (23 May 2024). Evelyn Yee, Alice Li, Chenyu Tang, Yeon Ho Jung, R. Paturi, Leon Bergen. Topics: LRM. Citations: 5.
  • The Probabilities Also Matter: A More Faithful Metric for Faithfulness of Free-Text Explanations in Large Language Models (04 Apr 2024). Noah Y. Siegel, Oana-Maria Camburu, N. Heess, Maria Perez-Ortiz. Citations: 10.
  • Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought (08 Mar 2024). James Chua, Edward Rees, Hunar Batra, Samuel R. Bowman, Julian Michael, Ethan Perez, Miles Turpin. Topics: LRM. Citations: 13.
  • Deceptive Semantic Shortcuts on Reasoning Chains: How Far Can Models Go without Hallucination? (16 Nov 2023). Bangzheng Li, Ben Zhou, Fei Wang, Xingyu Fu, Dan Roth, Muhao Chen. Topics: HILM, LRM. Citations: 22.
  • On Measuring Faithfulness or Self-consistency of Natural Language Explanations (13 Nov 2023). Letitia Parcalabescu, Anette Frank. Topics: LRM. Citations: 28.
  • Preventing Language Models From Hiding Their Reasoning (27 Oct 2023). Fabien Roger, Ryan Greenblatt. Topics: LRM. Citations: 18.
  • Language Models Represent Space and Time (03 Oct 2023). Wes Gurnee, Max Tegmark. Citations: 167.
  • Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations (17 Jul 2023). Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu, Kathleen McKeown. Topics: LRM. Citations: 55.
  • Measuring Faithfulness in Chain-of-Thought Reasoning (17 Jul 2023). Tamera Lanham, Anna Chen, Ansh Radhakrishnan, Benoit Steiner, Carson E. Denison, ..., Zac Hatfield-Dodds, Jared Kaplan, J. Brauner, Sam Bowman, Ethan Perez. Topics: ReLM, LRM. Citations: 193.
  • Question Decomposition Improves the Faithfulness of Model-Generated Reasoning (17 Jul 2023). Ansh Radhakrishnan, Karina Nguyen, Anna Chen, Carol Chen, Carson E. Denison, ..., Zac Hatfield-Dodds, Jared Kaplan, J. Brauner, Sam Bowman, Ethan Perez. Topics: ReLM, LRM, HILM. Citations: 89.
  • Faithfulness Tests for Natural Language Explanations (29 May 2023). Pepa Atanasova, Oana-Maria Camburu, Christina Lioma, Thomas Lukasiewicz, J. Simonsen, Isabelle Augenstein. Topics: FAtt. Citations: 67.
  • Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting (07 May 2023). Miles Turpin, Julian Michael, Ethan Perez, Sam Bowman. Topics: ReLM, LRM. Citations: 443.
  • Faithful Chain-of-Thought Reasoning (31 Jan 2023). Qing Lyu, Shreya Havaldar, Adam Stein, Li Zhang, D. Rao, Eric Wong, Marianna Apidianaki, Chris Callison-Burch. Topics: ReLM, LRM. Citations: 228.
  • Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (28 Jan 2022). Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou. Topics: LM&Ro, LRM, AI4CE, ReLM. Citations: 9,683.
  • Show Your Work: Scratchpads for Intermediate Computation with Language Models (30 Nov 2021). Maxwell Nye, Anders Andreassen, Guy Gur-Ari, Henryk Michalewski, Jacob Austin, ..., Aitor Lewkowycz, Maarten Bosma, D. Luan, Charles Sutton, Augustus Odena. Topics: ReLM, LRM. Citations: 756.
  • Training Verifiers to Solve Math Word Problems (27 Oct 2021). K. Cobbe, V. Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, ..., Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman. Topics: ReLM, OffRL, LRM. Citations: 4,596.
  • Measuring Mathematical Problem Solving With the MATH Dataset (05 Mar 2021). Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, Jacob Steinhardt. Topics: ReLM, FaML. Citations: 2,405.
  • Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm (15 Feb 2021). Laria Reynolds, Kyle McDonell. Citations: 918.
  • Measuring Massive Multitask Language Understanding (07 Sep 2020). Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt. Topics: ELM, RALM. Citations: 4,572.
  • Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness? (07 Apr 2020). Alon Jacovi, Yoav Goldberg. Topics: XAI. Citations: 600.