ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2406.14595
  4. Cited By
Adversaries Can Misuse Combinations of Safe Models

Adversaries Can Misuse Combinations of Safe Models

20 June 2024
Erik Jones
Anca Dragan
Jacob Steinhardt
ArXivPDFHTML

Papers citing "Adversaries Can Misuse Combinations of Safe Models"

7 / 7 papers shown
Title
Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents
Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents
Christian Schroeder de Witt
AAML
AI4CE
147
0
0
04 May 2025
Superintelligence Strategy: Expert Version
Superintelligence Strategy: Expert Version
Dan Hendrycks
Eric Schmidt
Alexandr Wang
59
1
0
07 Mar 2025
Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI
  Policy, Research, and Practice
Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice
A. Feder Cooper
Christopher A. Choquette-Choo
Miranda Bogen
Matthew Jagielski
Katja Filippova
...
Abigail Z. Jacobs
Andreas Terzis
Hanna M. Wallach
Nicolas Papernot
Katherine Lee
AILaw
MU
93
10
0
09 Dec 2024
Chain-of-Jailbreak Attack for Image Generation Models via Editing Step
  by Step
Chain-of-Jailbreak Attack for Image Generation Models via Editing Step by Step
Wenxuan Wang
Kuiyi Gao
Zihan Jia
Youliang Yuan
Jen-tse Huang
Qiuzhi Liu
Shuai Wang
Wenxiang Jiao
Zhaopeng Tu
121
2
0
04 Oct 2024
Grounding and Evaluation for Large Language Models: Practical Challenges
  and Lessons Learned (Survey)
Grounding and Evaluation for Large Language Models: Practical Challenges and Lessons Learned (Survey)
K. Kenthapadi
M. Sameki
Ankur Taly
HILM
ELM
AILaw
39
12
0
10 Jul 2024
Feedback Loops With Language Models Drive In-Context Reward Hacking
Feedback Loops With Language Models Drive In-Context Reward Hacking
Alexander Pan
Erik Jones
Meena Jagadeesan
Jacob Steinhardt
KELM
47
26
0
09 Feb 2024
Generative Agents: Interactive Simulacra of Human Behavior
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
232
1,734
0
07 Apr 2023
1