Adversaries Can Misuse Combinations of Safe Models

20 June 2024

Papers citing "Adversaries Can Misuse Combinations of Safe Models"

7 / 7 papers shown

Title
Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents Christian Schroeder de Witt AAML AI4CE 147 0 0 04 May 2025
Superintelligence Strategy: Expert Version Dan Hendrycks Eric Schmidt Alexandr Wang 59 1 0 07 Mar 2025
Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice A. Feder Cooper Christopher A. Choquette-Choo Miranda Bogen Matthew Jagielski Katja Filippova ... Abigail Z. Jacobs Andreas Terzis Hanna M. Wallach Nicolas Papernot Katherine Lee AILaw MU 93 10 0 09 Dec 2024
Chain-of-Jailbreak Attack for Image Generation Models via Editing Step by Step Wenxuan Wang Kuiyi Gao Zihan Jia Youliang Yuan Jen-tse Huang Qiuzhi Liu Shuai Wang Wenxiang Jiao Zhaopeng Tu 121 2 0 04 Oct 2024
Grounding and Evaluation for Large Language Models: Practical Challenges and Lessons Learned (Survey) K. Kenthapadi M. Sameki Ankur Taly HILM ELM AILaw 39 12 0 10 Jul 2024
Feedback Loops With Language Models Drive In-Context Reward Hacking Alexander Pan Erik Jones Meena Jagadeesan Jacob Steinhardt KELM 47 26 0 09 Feb 2024
Generative Agents: Interactive Simulacra of Human Behavior J. Park Joseph C. O'Brien Carrie J. Cai Meredith Ringel Morris Percy Liang Michael S. Bernstein LM&Ro AI4CE 232 1,734 0 07 Apr 2023