Jatmo: Prompt Injection Defense by Task-Specific Finetuning

29 December 2023

Zeming Wei

Papers citing "Jatmo: Prompt Injection Defense by Task-Specific Finetuning"

21 / 21 papers shown

Title
OET: Optimization-based prompt injection Evaluation Toolkit Jinsheng Pan Xiaogeng Liu Chaowei Xiao AAML 69 0 0 01 May 2025
Robustness via Referencing: Defending against Prompt Injection Attacks by Referencing the Executed Instruction Y. Chen Haoran Li Yuan Sui Yi Liu Yufei He Yangqiu Song Bryan Hooi AAML SILM 63 0 0 29 Apr 2025
ACE: A Security Architecture for LLM-Integrated App Systems Evan Li Tushin Mallick Evan Rose William K. Robertson Alina Oprea Cristina Nita-Rotaru 52 0 0 29 Apr 2025
Prompt Injection Attack to Tool Selection in LLM Agents Jiawen Shi Zenghui Yuan Guiyao Tie Pan Zhou Neil Zhenqiang Gong Lichao Sun LLMAG 51 0 0 28 Apr 2025
WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks Ivan Evtimov Arman Zharmagambetov Aaron Grattafiori Chuan Guo Kamalika Chaudhuri AAML 35 1 0 22 Apr 2025
Exploring the Role of Large Language Models in Cybersecurity: A Systematic Survey Shuang Tian Tao Zhang Jiaheng Liu Jiacheng Wang Xuangou Wu ... Ruichen Zhang W. Zhang Zhenhui Yuan Shiwen Mao Dong In Kim 60 0 0 22 Apr 2025
ASIDE: Architectural Separation of Instructions and Data in Language Models Egor Zverev Evgenii Kortukov Alexander Panfilov Soroush Tabesh Alexandra Volkova Sebastian Lapuschkin Wojciech Samek Christoph H. Lampert AAML 54 1 0 13 Mar 2025
Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs Giulio Zizzo Giandomenico Cornacchia Kieran Fraser Muhammad Zaid Hameed Ambrish Rawat Beat Buesser Mark Purcell Pin-Yu Chen P. Sattigeri Kush R. Varshney AAML 43 2 0 24 Feb 2025
Attention Tracker: Detecting Prompt Injection Attacks in LLMs Kuo-Han Hung Ching-Yun Ko Ambrish Rawat I-Hsin Chung Winston H. Hsu Pin-Yu Chen 49 7 0 01 Nov 2024
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond Shanshan Han 84 1 0 09 Oct 2024
Non-Halting Queries: Exploiting Fixed Points in LLMs Ghaith Hammouri Kemal Derya B. Sunar 33 0 0 08 Oct 2024
When LLMs Meet Cybersecurity: A Systematic Literature Review Jie Zhang Haoyu Bu Hui Wen Yu Chen Lun Li Hongsong Zhu 42 36 0 06 May 2024
Optimization-based Prompt Injection Attack to LLM-as-a-Judge Jiawen Shi Zenghui Yuan Yinuo Liu Yue Huang Pan Zhou Lichao Sun Neil Zhenqiang Gong AAML 45 39 0 26 Mar 2024
Can LLMs Separate Instructions From Data? And What Do We Even Mean By That? Egor Zverev Sahar Abdelnabi Soroush Tabesh Mario Fritz Christoph H. Lampert 56 19 0 11 Mar 2024
StruQ: Defending Against Prompt Injection with Structured Queries Sizhe Chen Julien Piet Chawin Sitawarin David A. Wagner SILM AAML 30 65 0 09 Feb 2024
Enhancing Adversarial Attacks: The Similar Target Method Shuo Zhang Ziruo Wang Zikai Zhou Huanran Chen AAML 54 1 0 21 Aug 2023
Can Large Language Models Be an Alternative to Human Evaluations? Cheng-Han Chiang Hung-yi Lee ALM LM&MA 224 572 0 03 May 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4 Sébastien Bubeck Varun Chandrasekaran Ronen Eldan J. Gehrke Eric Horvitz ... Scott M. Lundberg Harsha Nori Hamid Palangi Marco Tulio Ribeiro Yi Zhang ELM AI4MH AI4CE ALM 298 2,232 0 22 Mar 2023
Training language models to follow instructions with human feedback Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 319 11,953 0 04 Mar 2022
Extracting Training Data from Large Language Models Nicholas Carlini Florian Tramèr Eric Wallace Matthew Jagielski Ariel Herbert-Voss ... Tom B. Brown D. Song Ulfar Erlingsson Alina Oprea Colin Raffel MLAU SILM 290 1,815 0 14 Dec 2020
Teaching Machines to Read and Comprehend Karl Moritz Hermann Tomás Kociský Edward Grefenstette L. Espeholt W. Kay Mustafa Suleyman Phil Blunsom 175 3,510 0 10 Jun 2015