Truthful AI: Developing and governing AI that does not lie

13 October 2021

Papers citing "Truthful AI: Developing and governing AI that does not lie"

22 / 22 papers shown

Title
The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems Richard Ren Arunim Agarwal Mantas Mazeika Cristina Menghini Robert Vacareanu ... Matias Geralnik Adam Khoja Dean Lee Summer Yue Dan Hendrycks HILM ALM 88 0 0 05 Mar 2025
ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation Weilong Dong Xinwei Wu Renren Jin Shaoyang Xu Deyi Xiong 61 7 0 31 Dec 2024
Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation Dongryeol Lee Yerin Hwang Yongil Kim Joonsuk Park Kyomin Jung ELM 72 5 0 28 Oct 2024
Can Language Model Understand Word Semantics as A Chatbot? An Empirical Study of Language Model Internal External Mismatch Jinman Zhao Xueyan Zhang Xingyu Yue Weizhe Chen Zifan Qian Ruiyu Wang LRM 34 0 0 21 Sep 2024
AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents Zhe Su Xuhui Zhou Sanketh Rangreji Anubha Kabra Julia Mendelsohn Faeze Brahman Maarten Sap LLMAG 100 2 0 13 Sep 2024
Standards for Belief Representations in LLMs Daniel A. Herrmann B. Levinstein 39 6 0 31 May 2024
VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding Ahmad A Mahmood Ashmal Vayani Muzammal Naseer Salman Khan Fahad Shahbaz Khan LRM 54 7 0 21 Mar 2024
BaRDa: A Belief and Reasoning Dataset that Separates Factual Accuracy and Reasoning Ability Peter Clark Bhavana Dalvi Oyvind Tafjord 23 2 0 12 Dec 2023
Calibrated Language Models Must Hallucinate Adam Tauman Kalai Santosh Vempala HILM 22 75 0 24 Nov 2023
Secret-Keeping in Question Answering Nathaniel W. Rollings Kent O'Sullivan Sakshum Kulshrestha KELM 30 0 0 16 Mar 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT Yihan Cao Siyu Li Yixin Liu Zhiling Yan Yutong Dai Philip S. Yu Lichao Sun 29 506 0 07 Mar 2023
Auditing large language models: a three-layered approach Jakob Mokander Jonas Schuett Hannah Rose Kirk Luciano Floridi AILaw MLAU 42 194 0 16 Feb 2023
Truth Machines: Synthesizing Veracity in AI Language Models Luke Munn Liam Magee Vanicka Arora SyDa HILM 20 28 0 28 Jan 2023
Discovering Latent Knowledge in Language Models Without Supervision Collin Burns Haotian Ye Dan Klein Jacob Steinhardt 61 324 0 07 Dec 2022
Broken Neural Scaling Laws Ethan Caballero Kshitij Gupta Irina Rish David M. Krueger 19 74 0 26 Oct 2022
Improving alignment of dialogue agents via targeted human judgements Amelia Glaese Nat McAleese Maja Trkebacz John Aslanides Vlad Firoiu ... John F. J. Mellor Demis Hassabis Koray Kavukcuoglu Lisa Anne Hendricks G. Irving ALM AAML 227 500 0 28 Sep 2022
In conversation with Artificial Intelligence: aligning language models with human values Atoosa Kasirzadeh Iason Gabriel 18 98 0 01 Sep 2022
Language Models (Mostly) Know What They Know Saurav Kadavath Tom Conerly Amanda Askell T. Henighan Dawn Drain ... Nicholas Joseph Benjamin Mann Sam McCandlish C. Olah Jared Kaplan ELM 47 712 0 11 Jul 2022
Forecasting Future World Events with Neural Networks Andy Zou Tristan Xiao Ryan Jia Joe Kwon Mantas Mazeika Richard Li Dawn Song Jacob Steinhardt Owain Evans Dan Hendrycks 22 22 0 30 Jun 2022
WebGPT: Browser-assisted question-answering with human feedback Reiichiro Nakano Jacob Hilton S. Balaji Jeff Wu Ouyang Long ... Gretchen Krueger Kevin Button Matthew Knight B. Chess John Schulman ALM RALM 63 1,195 0 17 Dec 2021
Impossibility Results in AI: A Survey Mario Brčič Roman V. Yampolskiy 11 25 0 01 Sep 2021
Deep Reinforcement Learning for Dialogue Generation Jiwei Li Will Monroe Alan Ritter Michel Galley Jianfeng Gao Dan Jurafsky 214 1,326 0 05 Jun 2016