Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
2 November 2023
Sam Toyer, Olivia Watkins, Ethan Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart J. Russell
Papers citing "Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game" (11 of 61 shown)
SPML: A DSL for Defending Language Models Against Prompt Attacks
Reshabh K Sharma, Vinayak Gupta, Dan Grossman
AAML · 19 Feb 2024

Stealthy Attack on Large Language Model based Recommendation
Jinghao Zhang, Yuting Liu, Qiang Liu, Shu Wu, Guibing Guo, Liang Wang
18 Feb 2024

A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents
Lingbo Mo, Zeyi Liao, Boyuan Zheng, Yu-Chuan Su, Chaowei Xiao, Huan Sun
AAML, LLMAG · 15 Feb 2024

Rethinking Machine Unlearning for Large Language Models
Sijia Liu, Yuanshun Yao, Jinghan Jia, Stephen Casper, Nathalie Baracaldo, ..., Hang Li, Kush R. Varshney, Mohit Bansal, Sanmi Koyejo, Yang Liu
AILaw, MU · 13 Feb 2024

Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Xiangming Gu, Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Ye Wang, Jing Jiang, Min Lin
LLMAG, LM&Ro · 13 Feb 2024

StruQ: Defending Against Prompt Injection with Structured Queries
Sizhe Chen, Julien Piet, Chawin Sitawarin, David Wagner
SILM, AAML · 09 Feb 2024

Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Against LLM-Integrated Applications
Xuchen Suo
AAML, SILM · 15 Jan 2024

Jatmo: Prompt Injection Defense by Task-Specific Finetuning
Julien Piet, Maha Alrashed, Chawin Sitawarin, Sizhe Chen, Zeming Wei, Elizabeth Sun, Basel Alomair, David Wagner
AAML, SyDa · 29 Dec 2023

Can LLMs Follow Simple Rules?
Norman Mu, Sarah Chen, Zifan Wang, Sizhe Chen, David Karamardian, Lulwa Aljeraisy, Basel Alomair, Dan Hendrycks, David Wagner
ALM · 06 Nov 2023

Privacy in Large Language Models: Attacks, Defenses and Future Directions
Haoran Li, Yulin Chen, Jinglong Luo, Yan Kang, Xiaojin Zhang, Qi Hu, Chunkit Chan, Yangqiu Song
PILM · 16 Oct 2023

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM, ALM · 04 Mar 2022