2302.08500
Cited By
Auditing large language models: a three-layered approach
16 February 2023
Jakob Mokander
Jonas Schuett
Hannah Rose Kirk
Luciano Floridi
AILaw
MLAU
Papers citing
"Auditing large language models: a three-layered approach"
48 / 98 papers shown
Title
Managing extreme AI risks amid rapid progress
Yoshua Bengio
Geoffrey Hinton
Andrew Yao
Dawn Song
Pieter Abbeel
...
Philip H. S. Torr
Stuart J. Russell
Daniel Kahneman
J. Brauner
Sören Mindermann
29
63
0
26 Oct 2023
SoK: Memorization in General-Purpose Large Language Models
Valentin Hartmann
Anshuman Suri
Vincent Bindschaedler
David E. Evans
Shruti Tople
Robert West
KELM
LLMAG
18
20
0
24 Oct 2023
Can ChatGPT Perform Reasoning Using the IRAC Method in Analyzing Legal Scenarios Like a Lawyer?
Xiaoxi Kang
Lizhen Qu
Lay-Ki Soon
Adnan Trakic
Terry Yue Zhuo
Patrick Charles Emerton
Genevieve Grant
LRM
AILaw
ELM
123
13
0
23 Oct 2023
An International Consortium for Evaluations of Societal-Scale Risks from Advanced AI
Ross Gruetzemacher
Alan Chan
Kevin Frazier
Christy Manning
Stepán Los
...
Clíodhna Ní Ghuidhir
Mark M. Bailey
Daniel Eth
Toby D. Pilditch
Kyle A. Kilian
24
5
0
22 Oct 2023
Open-Sourcing Highly Capable Foundation Models: An evaluation of risks, benefits, and alternative methods for pursuing open-source objectives
Elizabeth Seger
Noemi Dreksler
Richard Moulange
Emily Dardaman
Jonas Schuett
...
Emma Bluemke
Michael Aird
Patrick Levermore
Julian Hazell
Abhishek Gupta
20
40
0
29 Sep 2023
Bias and Fairness in Chatbots: An Overview
Jintang Xue
Yun Cheng Wang
Chengwei Wei
Xiaofeng Liu
Jonghye Woo
C.-C. Jay Kuo
36
29
0
16 Sep 2023
Using Large Language Models to Generate, Validate, and Apply User Intent Taxonomies
C. Shah
Ryen W. White
Reid Andersen
Georg Buscher
Scott Counts
...
Tara Safavi
Siddharth Suri
Mengting Wan
Leijie Wang
Longfei Yang
29
23
0
14 Sep 2023
Decolonial AI Alignment: Openness, Viśeṣa-Dharma, and Including Excluded Knowledges
Kush R. Varshney
41
2
0
10 Sep 2023
Trustworthy and Synergistic Artificial Intelligence for Software Engineering: Vision and Roadmaps
David Lo
34
39
0
08 Sep 2023
International Governance of Civilian AI: A Jurisdictional Certification Approach
Robert F. Trager
Benjamin Harack
Anka Reuel
A. Carnegie
Lennart Heim
...
R. Lall
Owen Larter
Seán Ó hÉigeartaigh
Simon Staffell
José Jaime Villalobos
26
20
0
29 Aug 2023
Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models
Keyu Pan
Yawen Zeng
LLMAG
21
41
0
30 Jul 2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper
Xander Davies
Claudia Shi
T. Gilbert
Jérémy Scheurer
...
Erdem Biyik
Anca Dragan
David M. Krueger
Dorsa Sadigh
Dylan Hadfield-Menell
ALM
OffRL
47
473
0
27 Jul 2023
Wisdom of Instruction-Tuned Language Model Crowds. Exploring Model Label Variation
Flor Miriam Plaza del Arco
Debora Nozza
Dirk Hovy
ALM
51
6
0
24 Jul 2023
Shaping New Norms for AI
Andrea Baronchelli
17
14
0
17 Jul 2023
A Comprehensive Overview of Large Language Models
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Ajmal Saeed Mian
OffRL
70
525
0
12 Jul 2023
Frontier AI Regulation: Managing Emerging Risks to Public Safety
Markus Anderljung
Joslyn Barnhart
Anton Korinek
Jade Leung
Cullen O'Keefe
...
Jonas Schuett
Yonadav Shavit
Divya Siddarth
Robert F. Trager
Kevin J. Wolf
SILM
44
118
0
06 Jul 2023
Personality Traits in Large Language Models
Gregory Serapio-García
Mustafa Safdari
Clément Crepy
Luning Sun
Stephen Fitz
P. Romero
Marwa Abdulhai
Aleksandra Faust
Maja J. Matarić
LM&MA
LLMAG
58
119
0
01 Jul 2023
Apolitical Intelligence? Auditing Delphi's responses on controversial political issues in the US
J. H. Rystrøm
16
0
0
22 Jun 2023
AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap
Q. V. Liao
J. Vaughan
38
158
0
02 Jun 2023
Training Data Extraction From Pre-trained Language Models: A Survey
Shotaro Ishihara
26
46
0
25 May 2023
Model evaluation for extreme risks
Toby Shevlane
Sebastian Farquhar
Ben Garfinkel
Mary Phuong
Jess Whittlestone
...
Vijay Bolina
Jack Clark
Yoshua Bengio
Paul Christiano
Allan Dafoe
ELM
32
152
0
24 May 2023
Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety of Text-to-Image Models
Alicia Parrish
Hannah Rose Kirk
Jessica Quaye
Charvi Rastogi
Max Bartolo
...
Addison Howard
William J. Cukierski
D. Sculley
Vijay Janapa Reddi
Lora Aroyo
DiffM
43
13
0
22 May 2023
Inducing anxiety in large language models increases exploration and bias
Julian Coda-Forno
Kristin Witte
A. Jagadish
Marcel Binz
Zeynep Akata
Eric Schulz
AI4CE
33
41
0
21 Apr 2023
Auditing and Generating Synthetic Data with Controllable Trust Trade-offs
Brian M. Belgodere
Pierre L. Dognin
Adam Ivankay
Igor Melnyk
Youssef Mroueh
...
Mattia Rigotti
Jerret Ross
Yair Schiff
Radhika Vedpathak
Richard A. Young
29
12
0
21 Apr 2023
Assessing Language Model Deployment with Risk Cards
Leon Derczynski
Hannah Rose Kirk
Vidhisha Balachandran
Sachin Kumar
Yulia Tsvetkov
M. Leiser
Saif Mohammad
28
42
0
31 Mar 2023
Machine Psychology: Investigating Emergent Capabilities and Behavior in Large Language Models Using Psychological Methods
Thilo Hagendorff
LLMAG
37
4
0
24 Mar 2023
Exploring the Relevance of Data Privacy-Enhancing Technologies for AI Governance Use Cases
Emma Bluemke
Tantum Collins
Ben Garfinkel
Andrew Trask
14
10
0
15 Mar 2023
Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback
Hannah Rose Kirk
Bertie Vidgen
Paul Röttger
Scott A. Hale
33
99
0
09 Mar 2023
Spacerini: Plug-and-play Search Engines with Pyserini and Hugging Face
Christopher Akiki
Odunayo Ogundepo
Aleksandra Piktus
Xinyu Crystina Zhang
Akintunde Oladipo
Jimmy J. Lin
Martin Potthast
25
5
0
28 Feb 2023
BiasTestGPT: Using ChatGPT for Social Bias Testing of Language Models
Rafal Kocielnik
Shrimai Prabhumoye
Vivian Zhang
Roy Jiang
R. Alvarez
Anima Anandkumar
41
6
0
14 Feb 2023
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
225
446
0
23 Aug 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
328
4,106
0
24 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
Algorithmic audits of algorithms, and the law
Erwan Le Merrer
Ronan Pons
Gilles Trédan
MLAU
FaML
6
11
0
15 Feb 2022
Conformity Assessments and Post-market Monitoring: A Guide to the Role of Auditing in the Proposed European AI Regulation
Jakob Mokander
M. Axente
F. Casolari
Luciano Floridi
64
85
0
09 Nov 2021
Truthful AI: Developing and governing AI that does not lie
Owain Evans
Owen Cotton-Barratt
Lukas Finnveden
Adam Bales
Avital Balwit
Peter Wills
Luca Righetti
William Saunders
HILM
236
109
0
13 Oct 2021
Challenges in Detoxifying Language Models
Johannes Welbl
Amelia Glaese
J. Uesato
Sumanth Dathathri
John F. J. Mellor
Lisa Anne Hendricks
Kirsty Anderson
Pushmeet Kohli
Ben Coppin
Po-Sen Huang
LM&MA
250
193
0
15 Sep 2021
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
Yue Wang
Weishi Wang
Shafiq R. Joty
S. Hoi
235
1,489
0
02 Sep 2021
Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-based Hate
Hannah Rose Kirk
B. Vidgen
Paul Röttger
Tristan Thrush
Scott A. Hale
67
57
0
12 Aug 2021
Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models
Alex Tamkin
Miles Brundage
Jack Clark
Deep Ganguli
AILaw
ELM
200
259
0
04 Feb 2021
Robustness Gym: Unifying the NLP Evaluation Landscape
Karan Goel
Nazneen Rajani
Jesse Vig
Samson Tan
Jason M. Wu
Stephan Zheng
Caiming Xiong
Joey Tianyi Zhou
Christopher Ré
AAML
OffRL
OOD
154
136
0
13 Jan 2021
Misspelling Correction with Pre-trained Contextual Language Model
Yifei Hu
X. Jing
Youlim Ko
Julia Taylor Rayz
KELM
32
26
0
08 Jan 2021
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
290
1,815
0
14 Dec 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
255
4,489
0
23 Jan 2020
The Woman Worked as a Babysitter: On Biases in Language Generation
Emily Sheng
Kai-Wei Chang
Premkumar Natarajan
Nanyun Peng
223
616
0
03 Sep 2019
A Survey on Bias and Fairness in Machine Learning
Ninareh Mehrabi
Fred Morstatter
N. Saxena
Kristina Lerman
Aram Galstyan
SyDa
FaML
323
4,212
0
23 Aug 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,959
0
20 Apr 2018
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
296
39,198
0
01 Sep 2014