ResearchTrend.AI
arXiv: 2506.14539
Doppelgänger Method: Breaking Role Consistency in LLM Agent via Prompt-based Transferable Adversarial Attack

17 June 2025
Daewon Kang
YeongHwan Shin
Doyeon Kim
Kyu-Hwan Jung
Meong Hi Son
AAML, SILM

Papers citing "Doppelgänger Method: Breaking Role Consistency in LLM Agent via Prompt-based Transferable Adversarial Attack"

6 / 6 papers shown
Enhancing Persona Consistency for LLMs' Role-Playing using Persona-Aware Contrastive Learning
Ke Ji
Yixin Lian
Linxu Li
Jingsheng Gao
Weiyuan Li
Bin Dai
69
2
0
22 Mar 2025
Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents
Junkai Li
Yunghwei Lai
Weitao Li
Jingyi Ren
Meng Zhang
...
Siyu Wang
Ziwei Sun
Yanzhe Zhang
Weizhi Ma
Yang Liu
LLMAG, LM&MA, LM&Ro, MedIm
152
121
0
20 Jan 2025
What Limits LLM-based Human Simulation: LLMs or Our Design?
Qian Wang
Jiaying Wu
Zhenheng Tang
B. Luo
Nuo Chen
Wei Chen
Bingsheng He
AI4CE
50
6
0
15 Jan 2025
Large Language Models Are Involuntary Truth-Tellers: Exploiting Fallacy Failure for Jailbreak Attacks
Yue Zhou
Henry Peng Zou
Barbara Di Eugenio
Yang Zhang
LRM, HILM
80
6
0
01 Jul 2024
AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments
Samuel Schmidgall
Rojin Ziaei
Carl Harris
Eduardo Reis
Jeffrey Jopling
Michael Moor
142
54
0
13 May 2024
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou
Zifan Wang
Nicholas Carlini
Milad Nasr
J. Zico Kolter
Matt Fredrikson
291
1,498
0
27 Jul 2023