Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection

arXiv:2308.10819 · 17 August 2023
Zekun Li, Baolin Peng, Pengcheng He, Xifeng Yan
Topics: ELM, SILM, AAML

Papers citing "Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection"

15 of 15 citing papers shown.

 1. PR-Attack: Coordinated Prompt-RAG Attacks on Retrieval-Augmented Generation in Large Language Models via Bilevel Optimization
    Yang Jiao, Xiao Wang, Kai Yang
    Topics: AAML, SILM · Citations: 0 · 10 Apr 2025

 2. Defense Against Prompt Injection Attack by Leveraging Attack Techniques
    Yulin Chen, Haoran Li, Zihao Zheng, Yangqiu Song, Dekai Wu, Bryan Hooi
    Topics: SILM, AAML · Citations: 6 · 01 Nov 2024

 3. LUK: Empowering Log Understanding with Expert Knowledge from Large Language Models
    Lipeng Ma, Weidong Yang, Sihang Jiang, Ben Fei, Mingjie Zhou, Shuhao Li, Bo Xu, Yanghua Xiao
    Citations: 0 · 03 Sep 2024

 4. On the Exploitability of Instruction Tuning
    Manli Shu, Jiong Wang, Chen Zhu, Jonas Geiping, Chaowei Xiao, Tom Goldstein
    Topics: SILM · Citations: 97 · 28 Jun 2023

 5. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
    Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, ..., Dacheng Li, Eric Xing, Haotong Zhang, Joseph E. Gonzalez, Ion Stoica
    Topics: ALM, OSLM, ELM · Citations: 4,298 · 09 Jun 2023

 6. A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation
    Xiaowei Huang, Wenjie Ruan, Wei Huang, Gao Jin, Yizhen Dong, ..., Sihao Wu, Peipei Xu, Dengyu Wu, André Freitas, Mustafa A. Mustafa
    Topics: ALM · Citations: 88 · 19 May 2023

 7. On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective
    Jindong Wang, Xixu Hu, Wenxin Hou, Hao Chen, Runkai Zheng, ..., Weirong Ye, Xiubo Geng, Binxing Jiao, Yue Zhang, Xingxu Xie
    Topics: AI4MH · Citations: 233 · 22 Feb 2023

 8. Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks
    Daniel Kang, Xuechen Li, Ion Stoica, Carlos Guestrin, Matei A. Zaharia, Tatsunori Hashimoto
    Topics: AAML · Citations: 251 · 11 Feb 2023

 9. Improving language models by retrieving from trillions of tokens
    Sebastian Borgeaud, A. Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, ..., Simon Osindero, Karen Simonyan, Jack W. Rae, Erich Elsen, Laurent Sifre
    Topics: KELM, RALM · Citations: 1,085 · 08 Dec 2021

10. Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models
    Wei Ping, Chejian Xu, Shuohang Wang, Zhe Gan, Yu Cheng, Jianfeng Gao, Ahmed Hassan Awadallah, Yangqiu Song
    Topics: VLM, ELM, AAML · Citations: 222 · 04 Nov 2021

11. Finetuned Language Models Are Zero-Shot Learners
    Jason W. Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le
    Topics: ALM, UQCV · Citations: 3,743 · 03 Sep 2021

12. Adversarial NLI: A New Benchmark for Natural Language Understanding
    Yixin Nie, Adina Williams, Emily Dinan, Joey Tianyi Zhou, Jason Weston, Douwe Kiela
    Citations: 1,005 · 31 Oct 2019

13. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
    Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning
    Topics: RALM · Citations: 2,647 · 25 Sep 2018

14. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
    Mandar Joshi, Eunsol Choi, Daniel S. Weld, Luke Zettlemoyer
    Topics: RALM · Citations: 2,646 · 09 May 2017

15. SQuAD: 100,000+ Questions for Machine Comprehension of Text
    Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang
    Topics: RALM · Citations: 8,127 · 16 Jun 2016