arXiv: 2404.12241 (v2, latest)
Introducing v0.5 of the AI Safety Benchmark from MLCommons
18 April 2024
Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Max Bartolo, Borhane Blili-Hamelin, K. Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, J. Ezick, Heather Frase, Brian Fuller, Ram Gandikota, Agasthya Gangavarapu, Ananya Gangavarapu, J. Gealy, Rajat Ghosh, James Goel, Usman Gohar, Sujata Goswami, Scott A. Hale, Wiebke Hutiri, Joseph Marvin Imperial, Surgan Jandial, Nicholas C. Judd, Felix Juefei Xu, Foutse Khomh, Bhavya Kailkhura, Hannah Rose Kirk, Kevin Klyman, Chris Knotz, Michael Kuchnik, Shachi H. Kumar, Srijan Kumar, Chris Lengerich, Bo Li, Zeyi Liao, E. Long, Victor Lu, Sarah Luger, Yifan Mai, P. Mammen, Kelvin Manyeki, Sean McGregor, Virendra Mehta, Shafee Mohammed, Emanuel Moss, L. Nachman, Dinesh Jinenhally Naganna, Amin Nikanjam, Besmira Nushi, Luis Oala, Iftach Orr, Alicia Parrish, Çigdem Patlak, William Pietri, Forough Poursabzi-Sangdeh, Eleonora Presani, Fabrizio Puletti, Paul Röttger, Saurav Sahay, Tim Santos, Nino Scherrer, Alice Schoenauer Sebag, P. Schramowski, Abolfazl Shahbazi, Vin Sharma, Xudong Shen, Vamsi Sistla, Leonard Tang, Davide Testuggine, Vithursan Thangarasa, E. A. Watkins, Rebecca Weiss, Christoper A. Welty, Tyler Wilbers, Adina Williams, Carole-Jean Wu, Poonam Yadav, Xianjun Yang, Yi Zeng, Wenhui Zhang, Fedor Zhdanov, Jiacheng Zhu, Percy Liang, Peter Mattson, Joaquin Vanschoren
Papers citing "Introducing v0.5 of the AI Safety Benchmark from MLCommons" (26 papers shown)
Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
Xuannan Liu, Zekun Li, Zheqi He, Peipei Li, Shuhan Xia, Xing Cui, Huaibo Huang, Xi Yang, Ran He
17 May 2025
RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models
Bang An, Shiyue Zhang, Mark Dredze
25 Apr 2025
Safety Pretraining: Toward the Next Generation of Safe AI
Pratyush Maini, Sachin Goyal, Dylan Sam, Alex Robey, Yash Savani, Yiding Jiang, Andy Zou, Zacharcy C. Lipton, J. Zico Kolter
23 Apr 2025
RealHarm: A Collection of Real-World Language Model Application Failures
Pierre Le Jeune, Jiaen Liu, Luca Rossi, Matteo Dora
14 Apr 2025
Know Thy Judge: On the Robustness Meta-Evaluation of LLM Safety Judges
Francisco Eiras, Eliott Zemour, Eric Lin, Vaikkunth Mugunthan
06 Mar 2025
Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models
Alberto Purpura, Sahil Wadhwa, Jesse Zymet, Akshay Gupta, Andy Luo, Melissa Kazemi Rad, Swapnil Shinde, Mohammad Sorower
03 Mar 2025
Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation
Dimosthenis Antypas, Indira Sen, Carla Pérez-Almendros, Jose Camacho-Collados, Francesco Barbieri
29 Nov 2024
Llama Guard 3 Vision: Safeguarding Human-AI Image Understanding Conversations
Jianfeng Chi, Ujjwal Karn, Hongyuan Zhan, Eric Michael Smith, Javier Rando, Yiming Zhang, Kate Plawiak, Zacharie Delpierre Coudert, Kartikeya Upasani, Mahesh Pasupuleti
15 Nov 2024
SafetyAnalyst: Interpretable, Transparent, and Steerable Safety Moderation for AI Behavior
Jing-Jing Li, Valentina Pyatkin, Max Kleiman-Weiner, Liwei Jiang, Nouha Dziri, Anne Collins, Jana Schaich Borg, Maarten Sap, Yejin Choi, Sydney Levine
22 Oct 2024
SocialGaze: Improving the Integration of Human Social Norms in Large Language Models
Anvesh Rao Vijjini, Rakesh R Menon, Jiayi Fu, Shashank Srivastava, Snigdha Chaturvedi
11 Oct 2024
11 Oct 2024
ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents
Ido Levy, Ben Wiesel, Sami Marreed, Alon Oved, Avi Yaeli, Segev Shlomov
09 Oct 2024
SoK: Towards Security and Safety of Edge AI
Tatjana Wingarz, Anne Lauscher, Janick Edinger, Dominik Kaaser, Stefan Schulte, Mathias Fischer
07 Oct 2024
The Perfect Blend: Redefining RLHF with Mixture of Judges
Tengyu Xu, Eryk Helenowski, Karthik Abinav Sankararaman, Di Jin, Kaiyan Peng, ..., Gabriel Cohen, Yuandong Tian, Hao Ma, Sinong Wang, Han Fang
30 Sep 2024
WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models
Prannaya Gupta, Le Qi Yau, Hao Han Low, I-Shiang Lee, Hugo Maximus Lim, ..., Jia Hng Koh, Dar Win Liew, Rishabh Bhardwaj, Rajat Bhardwaj, Soujanya Poria
07 Aug 2024
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
Richard Ren, Steven Basart, Adam Khoja, Alice Gatti, Long Phan, ..., Alexander Pan, Gabriel Mukobi, Ryan H. Kim, Stephen Fitz, Dan Hendrycks
31 Jul 2024
Can Editing LLMs Inject Harm?
Canyu Chen, Baixiang Huang, Zekun Li, Zhaorun Chen, Shiyang Lai, ..., Xifeng Yan, William Wang, Philip Torr, Dawn Song, Kai Shu
29 Jul 2024
ToVo: Toxicity Taxonomy via Voting
Tinh Son Luong, Thanh-Thien Le, Thang Viet Doan, Linh Ngo Van, Thien Huu Nguyen, Diep Thi-Ngoc Nguyen
21 Jun 2024
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal
Tinghao Xie, Xiangyu Qi, Yi Zeng, Yangsibo Huang, Udari Madhushani Sehwag, ..., Bo Li, Kai Li, Danqi Chen, Peter Henderson, Prateek Mittal
20 Jun 2024
Exploring Safety-Utility Trade-Offs in Personalized Language Models
Anvesh Rao Vijjini, Somnath Basu Roy Chowdhury, Snigdha Chaturvedi
17 Jun 2024
A Standardized Machine-readable Dataset Documentation Format for Responsible AI
Nitisha Jain, Mubashara Akhtar, Joan Giner-Miguelez, Rajat Shinde, Joaquin Vanschoren, ..., Costanza Conforti, Michael Kuchnik, Lora Aroyo, Omar Benjelloun, Elena Simperl
04 Jun 2024
AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways
Zehang Deng, Yongjian Guo, Changzhou Han, Wanlun Ma, Junwu Xiong, Sheng Wen, Yang Xiang
04 Jun 2024
Risks and Opportunities of Open-Source Generative AI
Francisco Eiras, Aleksander Petrov, Bertie Vidgen, Christian Schroeder, Fabio Pizzati, ..., Matthew Jackson, Phillip H. S. Torr, Trevor Darrell, Y. Lee, Jakob N. Foerster
14 May 2024
An AI System Evaluation Framework for Advancing AI Safety: Terminology, Taxonomy, Lifecycle Mapping
Boming Xia, Qinghua Lu, Liming Zhu, Zhenchang Xing
08 Apr 2024
Can Large Language Models Identify Authorship?
Baixiang Huang, Canyu Chen, Kai Shu
13 Mar 2024
Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech Generators
Wiebke Hutiri, Orestis Papakyriakopoulos, Alice Xiang
25 Jan 2024
OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research
Jiaming Ji, Jiayi Zhou, Borong Zhang, Juntao Dai, Xuehai Pan, Ruiyang Sun, Weidong Huang, Yiran Geng, Mickel Liu, Yaodong Yang
16 May 2023