Black-Box Access is Insufficient for Rigorous AI Audits (arXiv:2401.14446)

25 January 2024
Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Ben Bucknall, Andreas A. Haupt, K. Wei, Jérémy Scheurer, Marius Hobbhahn, Lee D. Sharkey, Satyapriya Krishna, Marvin von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David M. Krueger, Dylan Hadfield-Menell
AAML

Papers citing "Black-Box Access is Insufficient for Rigorous AI Audits"

Showing 50 of 77 citing papers.

Explanations as Bias Detectors: A Critical Study of Local Post-hoc XAI Methods for Fairness Exploration
Vasiliki Papanikou, Danae Pla Karidi, E. Pitoura, Emmanouil Panagiotou, Eirini Ntoutsi
01 May 2025

P2NIA: Privacy-Preserving Non-Iterative Auditing
Jade Garcia Bourrée, H. Lautraite, Sébastien Gambs, Gilles Tredan, Erwan Le Merrer, Benoit Rottembourg
01 Apr 2025

Evidencing Unauthorized Training Data from AI Generated Content using Information Isotopes
Qi Tao, Yin Jinhua, Cai Dongqi, Xie Yueqi, Wang Huili, ..., Zhou Zhili, Wang Shangguang, Lyu Lingjuan, Huang Yongfeng, Lane Nicholas
24 Mar 2025

AI Companies Should Report Pre- and Post-Mitigation Safety Evaluations
Dillon Bowen, Ann-Kathrin Dombrowski, Adam Gleave, Chris Cundy
ELM
17 Mar 2025

DarkBench: Benchmarking Dark Patterns in Large Language Models
Esben Kran, Hieu Minh "Jord" Nguyen, Akash Kundu, Sami Jawhar, Jinsuk Park, Mateusz Maria Jurewicz
13 Mar 2025

Position: Ensuring mutual privacy is necessary for effective external evaluation of proprietary AI systems
Ben Bucknall, Robert F. Trager, Michael A. Osborne
03 Mar 2025

Lost in Moderation: How Commercial Content Moderation APIs Over- and Under-Moderate Group-Targeted Hate Speech and Linguistic Variations
David Hartmann, Amin Oueslati, Dimitri Staufer, Lena Pohlmann, Simon Munzert, Hendrik Heuer
03 Mar 2025

Paradigms of AI Evaluation: Mapping Goals, Methodologies and Culture
John Burden, Marko Tesic, Lorenzo Pacchiardi, José Hernández Orallo
21 Feb 2025

Addressing the regulatory gap: moving towards an EU AI audit ecosystem beyond the AI Act by including civil society
David Hartmann, José Renato Laranjeira de Pereira, Chiara Streitbörger, Bettina Berendt
20 Feb 2025

Mechanistic Interpretability of Emotion Inference in Large Language Models
Ala Nekouvaght Tak, Amin Banayeeanzade, Anahita Bolourani, Mina Kian, Robin Jia, Jonathan Gratch
08 Feb 2025

Adversarial ML Problems Are Getting Harder to Solve and to Evaluate
Javier Rando, Jie Zhang, Nicholas Carlini, F. Tramèr
AAML, ELM
04 Feb 2025

CALM: Curiosity-Driven Auditing for Large Language Models
Xiang Zheng, Longxiang Wang, Yi Liu, Xingjun Ma, Chao Shen, Cong Wang
MLAU
06 Jan 2025

Combining Large Language Models with Tutoring System Intelligence: A Case Study in Caregiver Homework Support
Devika Venugopalan, Ziwen Yan, Conrad Borchers, Jionghao Lin, Vincent Aleven
16 Dec 2024

Regulation of Language Models With Interpretability Will Likely Result In A Performance Trade-Off
Eoin M. Kenny, Julie A. Shah
12 Dec 2024

What AI evaluations for preventing catastrophic risks can and cannot do
Peter Barnett, Lisa Thiergart
ELM
26 Nov 2024

Declare and Justify: Explicit assumptions in AI evaluations are necessary for effective regulation
Peter Barnett, Lisa Thiergart
ELM
19 Nov 2024

SoK: Dataset Copyright Auditing in Machine Learning Systems
L. Du, Xuanru Zhou, M. Chen, Chusong Zhang, Zhou Su, Peng Cheng, Jiming Chen, Zhikun Zhang
MLAU
22 Oct 2024

To Err is AI : A Case Study Informing LLM Flaw Reporting Practices
Sean McGregor, Allyson Ettinger, Nick Judd, Paul Albee, Liwei Jiang, ..., Avijit Ghosh, Christopher Fiorelli, Michelle Hoang, Sven Cattell, Nouha Dziri
15 Oct 2024

Language model developers should report train-test overlap
Andy K. Zhang, Kevin Klyman, Yifan Mai, Yoav Levine, Yian Zhang, Rishi Bommasani, Percy Liang
VLM, ELM
10 Oct 2024

From Transparency to Accountability and Back: A Discussion of Access and Evidence in AI Auditing
Sarah H. Cen, Rohan Alur
07 Oct 2024

Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs
Yohan Mathew, Ollie Matthews, Robert McCarthy, Joan Velja, Christian Schroeder de Witt, Dylan R. Cope, Nandi Schoots
02 Oct 2024

Truth or Deceit? A Bayesian Decoding Game Enhances Consistency and Reliability
Weitong Zhang, Chengqi Zang, Bernhard Kainz
01 Oct 2024

Designing an Intervention Tool for End-User Algorithm Audits in Personalized Recommendation Systems
Qunfang Wu, Lu Xian
MLAU
20 Sep 2024

OATH: Efficient and Flexible Zero-Knowledge Proofs of End-to-End ML Fairness
Olive Franzese, Ali Shahin Shamsabadi, Hamed Haddadi
17 Sep 2024

Towards Safe Multilingual Frontier AI
Artūrs Kanepajs, Vladimir Ivanov, Richard Moulange
06 Sep 2024

Verification methods for international AI agreements
Akash R. Wasil, Tom Reed, Jack William Miller, Peter Barnett
28 Aug 2024

LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
Nathaniel Li, Ziwen Han, Ian Steneker, Willow Primack, Riley Goodside, Hugh Zhang, Zifan Wang, Cristina Menghini, Summer Yue
AAML, MU
27 Aug 2024

Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette, Nhathai Phan
20 Jul 2024

NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals
Jaden Fiotto-Kaufman, Alexander R. Loftus, Eric Todd, Jannik Brinkmann, Caden Juang, ..., Carla Brodley, Arjun Guha, Jonathan Bell, Byron C. Wallace, David Bau
18 Jul 2024

Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space
Core Francisco Park, Maya Okawa, Andrew Lee, Ekdeep Singh Lubana, Hidenori Tanaka
27 Jun 2024

Watching the Watchers: A Comparative Fairness Audit of Cloud-based Content Moderation Services
David Hartmann, Amin Oueslati, Dimitri Staufer
MLAU
20 Jun 2024

Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation
Jirui Qi, Gabriele Sarti, Raquel Fernández, Arianna Bisazza
RALM
19 Jun 2024

STAR: SocioTechnical Approach to Red Teaming Language Models
Laura Weidinger, John F. J. Mellor, Bernat Guillen Pegueroles, Nahema Marchal, Ravin Kumar, ..., Mark Diaz, Stevie Bergman, Mikel Rodriguez, Verena Rieser, William S. Isaac
VLM
17 Jun 2024

Do Parameters Reveal More than Loss for Membership Inference?
Anshuman Suri, Xiao Zhang, David E. Evans
MIACV, MIALM, AAML
17 Jun 2024

AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij, Felix Hofstätter, Ollie Jaffe, Samuel F. Brown, Francis Rhys Ward
ELM
11 Jun 2024

Position: An Inner Interpretability Framework for AI Inspired by Lessons from Cognitive Neuroscience
Martina G. Vilas, Federico Adolfi, David Poeppel, Gemma Roig
03 Jun 2024

Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits
Andis Draguns, Andrew Gritsevskiy, S. Motwani, Charlie Rogers-Smith, Jeffrey Ladish, Christian Schroeder de Witt
03 Jun 2024

Stress-Testing Capability Elicitation With Password-Locked Models
Ryan Greenblatt, Fabien Roger, Dmitrii Krasheninnikov, David M. Krueger
29 May 2024

Risks and Opportunities of Open-Source Generative AI
Francisco Eiras, Aleksander Petrov, Bertie Vidgen, Christian Schroeder, Fabio Pizzati, ..., Matthew Jackson, Phillip H. S. Torr, Trevor Darrell, Y. Lee, Jakob N. Foerster
14 May 2024

Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals
Joshua Clymer, Caden Juang, Severin Field
CVBM
08 May 2024

Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models
Nishad Singhi, Jae Myung Kim, Karsten Roth, Zeynep Akata
02 May 2024

Near to Mid-term Risks and Opportunities of Open-Source Generative AI
Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder de Witt, Fabio Pizzati, ..., Paul Röttger, Philip H. S. Torr, Trevor Darrell, Y. Lee, Jakob N. Foerster
25 Apr 2024

Mechanistic Interpretability for AI Safety -- A Review
Leonard Bereska, E. Gavves
AI4CE
22 Apr 2024

Disguised Copyright Infringement of Latent Diffusion Models
Yiwei Lu, Matthew Y.R. Yang, Zuoqiu Liu, Gautam Kamath, Yaoliang Yu
WIGM
10 Apr 2024

Responsible Reporting for Frontier AI Development
Noam Kolt, Markus Anderljung, Joslyn Barnhart, Asher Brass, K. Esvelt, Gillian K. Hadfield, Lennart Heim, Mikel Rodriguez, Jonas B. Sandbrink, Thomas Woodside
03 Apr 2024

Risks from Language Models for Automated Mental Healthcare: Ethics and Structure for Implementation
D. Grabb, Max Lamparth, N. Vasan
02 Apr 2024

A Path Towards Legal Autonomy: An interoperable and explainable approach to extracting, transforming, loading and computing legal information using large language models, expert systems and Bayesian networks
Axel Constant, Hannes Westermann, Bryan Wilson, Alex B. Kiefer, Ines Hipólito, Sylvain Pronovost, Steven Swanson, Mahault Albarracin, M. Ramstead
AILaw
27 Mar 2024

A Safe Harbor for AI Evaluation and Red Teaming
Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, ..., Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, Peter Henderson
07 Mar 2024

FairProof : Confidential and Certifiable Fairness for Neural Networks
Chhavi Yadav, A. Chowdhury, Dan Boneh, Kamalika Chaudhuri
MLAU
19 Feb 2024

Rethinking Machine Unlearning for Large Language Models
Sijia Liu, Yuanshun Yao, Jinghan Jia, Stephen Casper, Nathalie Baracaldo, ..., Hang Li, Kush R. Varshney, Mohit Bansal, Sanmi Koyejo, Yang Liu
AILaw, MU
13 Feb 2024