2406.17864
Cited By
AI Risk Categorization Decoded (AIR 2024): From Government Regulations to Corporate Policies
25 June 2024
Yi Zeng
Kevin Klyman
Andy Zhou
Yu Yang
Minzhou Pan
Ruoxi Jia
Dawn Song
Percy Liang
Bo Li
Papers citing "AI Risk Categorization Decoded (AIR 2024): From Government Regulations to Corporate Policies"
11 / 11 papers shown
Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control
Hannah Cyberey
David E. Evans
LLMSV
76
0
0
23 Apr 2025
AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration
Andy Zhou
Kevin E. Wu
Francesco Pinto
Z. Chen
Yi Zeng
Yu Yang
Shuang Yang
Sanmi Koyejo
James Zou
Bo Li
LLMAG
AAML
77
0
0
20 Mar 2025
MinorBench: A hand-built benchmark for content-based risks for children
Shaun Khoo
Gabriel Chua
Rachel Shong
31
0
0
13 Mar 2025
A Systematic Review of Open Datasets Used in Text-to-Image (T2I) Gen AI Model Safety
Rakeen Rouf
Trupti Bavalatti
Osama Ahmed
Dhaval Potdar
Faraz Jawed
EGVM
66
1
0
23 Feb 2025
Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
Yueying Zou
Peipei Li
Zekun Li
Huaibo Huang
Xing Cui
Xuannan Liu
Chenghanyu Zhang
Ran He
DeLMO
125
2
0
07 Feb 2025
Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice
A. Feder Cooper
Christopher A. Choquette-Choo
Miranda Bogen
Matthew Jagielski
Katja Filippova
...
Abigail Z. Jacobs
Andreas Terzis
Hanna M. Wallach
Nicolas Papernot
Katherine Lee
AILaw
MU
93
10
0
09 Dec 2024
Standardization Trends on Safety and Trustworthiness Technology for Advanced AI
Jonghong Jeon
36
2
0
29 Oct 2024
SafetyAnalyst: Interpretable, transparent, and steerable safety moderation for AI behavior
Jing-Jing Li
Valentina Pyatkin
Max Kleiman-Weiner
Liwei Jiang
Nouha Dziri
Anne Collins
Jana Schaich Borg
Maarten Sap
Yejin Choi
Sydney Levine
29
1
0
22 Oct 2024
Data-Centric AI Governance: Addressing the Limitations of Model-Focused Policies
Ritwik Gupta
Leah Walker
Rodolfo Corona
Stephanie Fu
Suzanne Petryk
Janet Napolitano
Trevor Darrell
Andrew W. Reddie
ELM
40
3
0
25 Sep 2024
Acceptable Use Policies for Foundation Models
Kevin Klyman
31
14
0
29 Aug 2024
BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models
Yi Zeng
Weiyu Sun
Tran Ngoc Huynh
Dawn Song
Bo Li
Ruoxi Jia
AAML
LLMSV
42
19
0
24 Jun 2024