arXiv:2503.03750
The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
5 March 2025
Richard Ren
Arunim Agarwal
Mantas Mazeika
Cristina Menghini
Robert Vacareanu
Brad Kenstler
Mick Yang
Isabelle Barrass
Alice Gatti
Xuwang Yin
Eduardo Trevino
Matias Geralnik
Adam Khoja
Dean Lee
Summer Yue
Dan Hendrycks
HILM
ALM
Papers citing
"The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems"
12 / 12 papers shown
Mitigating Deceptive Alignment via Self-Monitoring
Jiaming Ji
Wenqi Chen
Kaile Wang
Donghai Hong
Sitong Fang
...
Jiayi Zhou
Juntao Dai
Sirui Han
Yike Guo
Yaodong Yang
LRM
33
1
0
24 May 2025
AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents
Zhe Su
Xuhui Zhou
Sanketh Rangreji
Anubha Kabra
Julia Mendelsohn
Faeze Brahman
Maarten Sap
LLMAG
123
5
0
13 Sep 2024
Does ChatGPT Have a Mind?
Simon Goldstein
B. Levinstein
AI4MH
LRM
52
6
0
27 Jun 2024
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation
Tu Vu
Mohit Iyyer
Xuezhi Wang
Noah Constant
Jerry W. Wei
...
Chris Tar
Yun-hsuan Sung
Denny Zhou
Quoc Le
Thang Luong
KELM
HILM
LRM
87
214
0
05 Oct 2023
Do Large Language Models Know What They Don't Know?
Zhangyue Yin
Qiushi Sun
Qipeng Guo
Jiawen Wu
Xipeng Qiu
Xuanjing Huang
ELM
AI4MH
65
160
0
29 May 2023
e-CARE: a New Dataset for Exploring Explainable Causal Reasoning
Li Du
Xiao Ding
Kai Xiong
Ting Liu
Bing Qin
CML
66
65
0
12 May 2022
Prompt Consistency for Zero-Shot Task Generalization
Chunting Zhou
Junxian He
Xuezhe Ma
Taylor Berg-Kirkpatrick
Graham Neubig
VLM
72
78
0
29 Apr 2022
Locating and Editing Factual Associations in GPT
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
213
1,330
0
10 Feb 2022
Truthful AI: Developing and governing AI that does not lie
Owain Evans
Owen Cotton-Barratt
Lukas Finnveden
Adam Bales
Avital Balwit
Peter Wills
Luca Righetti
William Saunders
HILM
283
116
0
13 Oct 2021
Unsolved Problems in ML Safety
Dan Hendrycks
Nicholas Carlini
John Schulman
Jacob Steinhardt
236
286
0
28 Sep 2021
CREAK: A Dataset for Commonsense Reasoning over Entity Knowledge
Yasumasa Onoe
Michael J.Q. Zhang
Eunsol Choi
Greg Durrett
HILM
69
87
0
03 Sep 2021
Explaining Answers with Entailment Trees
Bhavana Dalvi
Peter Alexander Jansen
Oyvind Tafjord
Zhengnan Xie
Hannah Smith
Leighanna Pipatanangkura
Peter Clark
ReLM
FAtt
LRM
280
185
0
17 Apr 2021