Auditing large language models: a three-layered approach
Jakob Mokander, Jonas Schuett, Hannah Rose Kirk, Luciano Floridi
16 February 2023 (AILaw, MLAU)

Papers citing "Auditing large language models: a three-layered approach"

Showing 50 of 98 citing papers.

Human-AI Governance (HAIG): A Trust-Utility Approach
Zeynep Engin
03 May 2025

Position: Ensuring mutual privacy is necessary for effective external evaluation of proprietary AI systems
Ben Bucknall, Robert F. Trager, Michael A. Osborne
03 Mar 2025

Multilingual != Multicultural: Evaluating Gaps Between Multilingual Capabilities and Cultural Alignment in LLMs
Jonathan Rystrøm, Hannah Rose Kirk, Scott A. Hale
23 Feb 2025

Addressing the regulatory gap: moving towards an EU AI audit ecosystem beyond the AI Act by including civil society
David Hartmann, José Renato Laranjeira de Pereira, Chiara Streitbörger, Bettina Berendt
20 Feb 2025

CALM: Curiosity-Driven Auditing for Large Language Models
Xiang Zheng, Longxiang Wang, Yi Liu, Xingjun Ma, Chao Shen, Cong Wang
06 Jan 2025 (MLAU)

The Systems Engineering Approach in Times of Large Language Models
Christian Cabrera, Viviana Bastidas, Jennifer Schooling, Neil D. Lawrence
13 Nov 2024

Safety case template for frontier AI: A cyber inability argument
Arthur Goemans, Marie Davidsen Buhl, Jonas Schuett, Tomek Korbak, Jessica Wang, Benjamin Hilton, Geoffrey Irving
12 Nov 2024

Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset
Khaoula Chehbouni, Jonathan Colaço-Carr, Yash More, Jackie CK Cheung, G. Farnadi
12 Nov 2024

A Clinical Trial Design Approach to Auditing Language Models in Healthcare Setting
Lovedeep Gondara, Jonathan Simkin
11 Nov 2024 (LM&MA)

PRISM: A Methodology for Auditing Biases in Large Language Models
Leif Azzopardi, Yashar Moshfeghi
24 Oct 2024

Causality for Large Language Models
Anpeng Wu, Kun Kuang, Minqin Zhu, Yingrong Wang, Yujia Zheng, Kairong Han, B. Li, Guangyi Chen, Fei Wu, Kun Zhang
20 Oct 2024 (LRM)

Truth or Deceit? A Bayesian Decoding Game Enhances Consistency and Reliability
Weitong Zhang, Chengqi Zang, Bernhard Kainz
01 Oct 2024

Responsible AI in Open Ecosystems: Reconciling Innovation with Risk Assessment and Disclosure
Mahasweta Chakraborti, Bert Joseph Prestoza, Nicholas Vincent, Seth Frey
27 Sep 2024

Improving governance outcomes through AI documentation: Bridging theory and practice
Amy A. Winecoff, Miranda Bogen
13 Sep 2024

Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models
Bang An, Sicheng Zhu, Ruiyi Zhang, Michael-Andrei Panaitescu-Liess, Yuancheng Xu, Furong Huang
01 Sep 2024 (AAML)

Design of a Quality Management System based on the EU Artificial Intelligence Act
Henryk Mustroph, Stefanie Rinderle-Ma
08 Aug 2024

Can LLMs be Fooled? Investigating Vulnerabilities in LLMs
Sara Abdali, Jia He, C. Barberan, Richard Anarfi
30 Jul 2024

Thorns and Algorithms: Navigating Generative AI Challenges Inspired by Giraffes and Acacias
Waqar Hussain
16 Jul 2024

Auditing of AI: Legal, Ethical and Technical Approaches
Jakob Mokander
07 Jul 2024

JailbreakHunter: A Visual Analytics Approach for Jailbreak Prompts Discovery from Large-Scale Human-LLM Conversational Datasets
Zhihua Jin, Shiyi Liu, Haotian Li, Xun Zhao, Huamin Qu
03 Jul 2024

Nicer Than Humans: How do Large Language Models Behave in the Prisoner's Dilemma?
Nicoló Fontana, Francesco Pierri, L. Aiello
19 Jun 2024

Towards Trustworthy AI: A Review of Ethical and Robust Large Language Models
Meftahul Ferdaus, Mahdi Abdelguerfi, Elias Ioup, Kendall N. Niles, Ken Pathak, Steve Sloan
01 Jun 2024

Risks and Opportunities of Open-Source Generative AI
Francisco Eiras, Aleksander Petrov, Bertie Vidgen, Christian Schroeder, Fabio Pizzati, ..., Matthew Jackson, Phillip H. S. Torr, Trevor Darrell, Y. Lee, Jakob N. Foerster
14 May 2024

Navigating LLM Ethics: Advancements, Challenges, and Future Directions
Junfeng Jiao, S. Afroogh, Yiming Xu, Connor Phillips
14 May 2024 (AILaw)

Concerns on Bias in Large Language Models when Creating Synthetic Personae
Helena A. Haxvig
08 May 2024 (SyDa)

More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness
Aaron Jiaxun Li, Satyapriya Krishna, Himabindu Lakkaraju
29 Apr 2024

Near to Mid-term Risks and Opportunities of Open-Source Generative AI
Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder de Witt, Fabio Pizzati, ..., Paul Röttger, Philip H. S. Torr, Trevor Darrell, Y. Lee, Jakob N. Foerster
25 Apr 2024

Holistic Safety and Responsibility Evaluations of Advanced AI Models
Laura Weidinger, Joslyn Barnhart, Jenny Brennan, Christina Butterfield, Susie Young, ..., Sebastian Farquhar, Lewis Ho, Iason Gabriel, Allan Dafoe, William S. Isaac
22 Apr 2024 (ELM)

The Necessity of AI Audit Standards Boards
David Manheim, Sammy Martin, Mark Bailey, Mikhail Samin, Ross Greutzmacher
11 Apr 2024

The Impact of Unstated Norms in Bias Analysis of Language Models
Farnaz Kohankhaki, David B. Emerson, Laleh Seyyed-Kalantari, Faiza Khan Khattak
04 Apr 2024

Responsible Reporting for Frontier AI Development
Noam Kolt, Markus Anderljung, Joslyn Barnhart, Asher Brass, K. Esvelt, Gillian K. Hadfield, Lennart Heim, Mikel Rodriguez, Jonas B. Sandbrink, Thomas Woodside
03 Apr 2024

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order
Taishi Nakamura, Mayank Mishra, Simone Tedeschi, Yekun Chai, Jason T Stillerman, ..., Virendra Mehta, Matthew Blumberg, Victor May, Huu Nguyen, S. Pyysalo
30 Mar 2024 (LRM)

"I'm categorizing LLM as a productivity tool": Examining ethics of LLM use in HCI research practices
Shivani Kapania, Ruiyi Wang, Toby Jia-Jun Li, Tianshi Li, Hong Shen
28 Mar 2024

Accelerating Greedy Coordinate Gradient via Probe Sampling
Yiran Zhao, Wenyue Zheng, Tianle Cai, Xuan Long Do, Kenji Kawaguchi, Anirudh Goyal, Michael Shieh
02 Mar 2024

Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency
Akila Wickramasekara, F. Breitinger, Mark Scanlon
29 Feb 2024

An Empirical Categorization of Prompting Techniques for Large Language Models: A Practitioner's Guide
Oluwole Fagbohun, Rachel M. Harrison, Anton Dereventsov
18 Feb 2024

Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation
Jessica Quaye, Alicia Parrish, Oana Inel, Charvi Rastogi, Hannah Rose Kirk, ..., Nathan Clement, Rafael Mosquera, Juan Ciro, Vijay Janapa Reddi, Lora Aroyo
14 Feb 2024

AuditLLM: A Tool for Auditing Large Language Models Using Multiprobe Approach
Maryam Amirizaniani, Elias Martin, Tanya Roosta, Aman Chadha, Chirag Shah
14 Feb 2024

Mapping the Ethics of Generative AI: A Comprehensive Scoping Review
Thilo Hagendorff
13 Feb 2024

Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, ..., Michael Gerovitch, David Bau, Max Tegmark, David M. Krueger, Dylan Hadfield-Menell
25 Jan 2024 (AAML)

Visibility into AI Agents
Alan Chan, Carson Ezell, Max Kaufmann, K. Wei, Lewis Hammond, ..., Nitarshan Rajkumar, David M. Krueger, Noam Kolt, Lennart Heim, Markus Anderljung
23 Jan 2024

LLM-Assisted Crisis Management: Building Advanced LLM Platforms for Effective Emergency Response and Public Collaboration
Hakan T. Otal, M. A. Canbaz
12 Jan 2024

From Prompt Engineering to Prompt Science With Human in the Loop
Chirag Shah
01 Jan 2024

Foundational Moral Values for AI Alignment
Betty Hou, Brian Patrick Green
28 Nov 2023

Challenges of Large Language Models for Mental Health Counseling
N. C. Chung, George C. Dyer, L. Brocki
23 Nov 2023 (LM&MA, AI4MH)

Rethinking Large Language Models in Mental Health Applications
Shaoxiong Ji, Tianlin Zhang, Kailai Yang, Sophia Ananiadou, Erik Cambria
19 Nov 2023 (LM&MA, AI4MH)

Towards Publicly Accountable Frontier LLMs: Building an External Scrutiny Ecosystem under the ASPIRE Framework
Markus Anderljung, Everett Thornton Smith, Joe O'Brien, Lisa Soder, Ben Bucknall, Emma Bluemke, Jonas Schuett, Robert F. Trager, Lacey Strahm, Rumman Chowdhury
15 Nov 2023

A Survey of Large Language Models in Medicine: Progress, Application, and Challenge
Hongjian Zhou, Fenglin Liu, Boyang Gu, Xinyu Zou, Jinfa Huang, ..., Yefeng Zheng, Lei A. Clifton, Zheng Li, Fenglin Liu, David A. Clifton
09 Nov 2023 (LM&MA)

Contextual Confidence and Generative AI
Shrey Jain, Zoe Hitzig, Pamela Mishkin
02 Nov 2023

Trust, Accountability, and Autonomy in Knowledge Graph-based AI for Self-determination
Luis-Daniel Ibánez, J. Domingue, Sabrina Kirrane, O. Seneviratne, Aisling Third, Maria-Esther Vidal
30 Oct 2023