AI Risk Categorization Decoded (AIR 2024): From Government Regulations to Corporate Policies

25 June 2024
Yi Zeng, Kevin Klyman, Andy Zhou, Yu Yang, Minzhou Pan, Ruoxi Jia, Dawn Song, Percy Liang, Bo Li

Papers citing "AI Risk Categorization Decoded (AIR 2024): From Government Regulations to Corporate Policies"

11 papers

Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control
Hannah Cyberey, David E. Evans
23 Apr 2025

AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration
Andy Zhou, Kevin E. Wu, Francesco Pinto, Z. Chen, Yi Zeng, Yu Yang, Shuang Yang, Sanmi Koyejo, James Zou, Bo Li
20 Mar 2025

MinorBench: A hand-built benchmark for content-based risks for children
Shaun Khoo, Gabriel Chua, Rachel Shong
13 Mar 2025

A Systematic Review of Open Datasets Used in Text-to-Image (T2I) Gen AI Model Safety
Rakeen Rouf, Trupti Bavalatti, Osama Ahmed, Dhaval Potdar, Faraz Jawed
23 Feb 2025

Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
Yueying Zou, Peipei Li, Zekun Li, Huaibo Huang, Xing Cui, Xuannan Liu, Chenghanyu Zhang, Ran He
07 Feb 2025

Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice
A. Feder Cooper, Christopher A. Choquette-Choo, Miranda Bogen, Matthew Jagielski, Katja Filippova, ..., Abigail Z. Jacobs, Andreas Terzis, Hanna M. Wallach, Nicolas Papernot, Katherine Lee
09 Dec 2024

Standardization Trends on Safety and Trustworthiness Technology for Advanced AI
Jonghong Jeon
29 Oct 2024

SafetyAnalyst: Interpretable, transparent, and steerable safety moderation for AI behavior
Jing-Jing Li, Valentina Pyatkin, Max Kleiman-Weiner, Liwei Jiang, Nouha Dziri, Anne Collins, Jana Schaich Borg, Maarten Sap, Yejin Choi, Sydney Levine
22 Oct 2024

Data-Centric AI Governance: Addressing the Limitations of Model-Focused Policies
Ritwik Gupta, Leah Walker, Rodolfo Corona, Stephanie Fu, Suzanne Petryk, Janet Napolitano, Trevor Darrell, Andrew W. Reddie
25 Sep 2024

Acceptable Use Policies for Foundation Models
Kevin Klyman
29 Aug 2024

BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models
Yi Zeng, Weiyu Sun, Tran Ngoc Huynh, Dawn Song, Bo Li, Ruoxi Jia
24 Jun 2024