Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2110.06674
Cited By
Truthful AI: Developing and governing AI that does not lie
13 October 2021
Owain Evans
Owen Cotton-Barratt
Lukas Finnveden
Adam Bales
Avital Balwit
Peter Wills
Luca Righetti
William Saunders
HILM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Truthful AI: Developing and governing AI that does not lie"
22 / 22 papers shown
Title
The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
Richard Ren
Arunim Agarwal
Mantas Mazeika
Cristina Menghini
Robert Vacareanu
...
Matias Geralnik
Adam Khoja
Dean Lee
Summer Yue
Dan Hendrycks
HILM
ALM
88
0
0
05 Mar 2025
ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation
Weilong Dong
Xinwei Wu
Renren Jin
Shaoyang Xu
Deyi Xiong
61
7
0
31 Dec 2024
Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation
Dongryeol Lee
Yerin Hwang
Yongil Kim
Joonsuk Park
Kyomin Jung
ELM
72
5
0
28 Oct 2024
Can Language Model Understand Word Semantics as A Chatbot? An Empirical Study of Language Model Internal External Mismatch
Jinman Zhao
Xueyan Zhang
Xingyu Yue
Weizhe Chen
Zifan Qian
Ruiyu Wang
LRM
34
0
0
21 Sep 2024
AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents
Zhe Su
Xuhui Zhou
Sanketh Rangreji
Anubha Kabra
Julia Mendelsohn
Faeze Brahman
Maarten Sap
LLMAG
100
2
0
13 Sep 2024
Standards for Belief Representations in LLMs
Daniel A. Herrmann
B. Levinstein
39
6
0
31 May 2024
VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding
Ahmad A Mahmood
Ashmal Vayani
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
LRM
54
7
0
21 Mar 2024
BaRDa: A Belief and Reasoning Dataset that Separates Factual Accuracy and Reasoning Ability
Peter Clark
Bhavana Dalvi
Oyvind Tafjord
23
2
0
12 Dec 2023
Calibrated Language Models Must Hallucinate
Adam Tauman Kalai
Santosh Vempala
HILM
22
75
0
24 Nov 2023
Secret-Keeping in Question Answering
Nathaniel W. Rollings
Kent O'Sullivan
Sakshum Kulshrestha
KELM
30
0
0
16 Mar 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Yihan Cao
Siyu Li
Yixin Liu
Zhiling Yan
Yutong Dai
Philip S. Yu
Lichao Sun
29
506
0
07 Mar 2023
Auditing large language models: a three-layered approach
Jakob Mokander
Jonas Schuett
Hannah Rose Kirk
Luciano Floridi
AILaw
MLAU
42
194
0
16 Feb 2023
Truth Machines: Synthesizing Veracity in AI Language Models
Luke Munn
Liam Magee
Vanicka Arora
SyDa
HILM
20
28
0
28 Jan 2023
Discovering Latent Knowledge in Language Models Without Supervision
Collin Burns
Haotian Ye
Dan Klein
Jacob Steinhardt
61
324
0
07 Dec 2022
Broken Neural Scaling Laws
Ethan Caballero
Kshitij Gupta
Irina Rish
David M. Krueger
19
74
0
26 Oct 2022
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trkebacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
227
500
0
28 Sep 2022
In conversation with Artificial Intelligence: aligning language models with human values
Atoosa Kasirzadeh
Iason Gabriel
18
98
0
01 Sep 2022
Language Models (Mostly) Know What They Know
Saurav Kadavath
Tom Conerly
Amanda Askell
T. Henighan
Dawn Drain
...
Nicholas Joseph
Benjamin Mann
Sam McCandlish
C. Olah
Jared Kaplan
ELM
47
712
0
11 Jul 2022
Forecasting Future World Events with Neural Networks
Andy Zou
Tristan Xiao
Ryan Jia
Joe Kwon
Mantas Mazeika
Richard Li
Dawn Song
Jacob Steinhardt
Owain Evans
Dan Hendrycks
22
22
0
30 Jun 2022
WebGPT: Browser-assisted question-answering with human feedback
Reiichiro Nakano
Jacob Hilton
S. Balaji
Jeff Wu
Ouyang Long
...
Gretchen Krueger
Kevin Button
Matthew Knight
B. Chess
John Schulman
ALM
RALM
63
1,195
0
17 Dec 2021
Impossibility Results in AI: A Survey
Mario Brčič
Roman V. Yampolskiy
11
25
0
01 Sep 2021
Deep Reinforcement Learning for Dialogue Generation
Jiwei Li
Will Monroe
Alan Ritter
Michel Galley
Jianfeng Gao
Dan Jurafsky
214
1,326
0
05 Jun 2016
1