Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks

11 February 2023

Tatsunori Hashimoto

Papers citing "Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks"

28 / 178 papers shown

"Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models Xinyue Shen Zhenpeng Chen Michael Backes Yun Shen Yang Zhang SILM 40 249 0 07 Aug 2023
Mondrian: Prompt Abstraction Attack Against Large Language Models for Cheaper API Pricing Waiman Si Michael Backes Yang Zhang 30 5 0 07 Aug 2023
From Prompt Injections to SQL Injection Attacks: How Protected is Your LLM-Integrated Web Application? Rodrigo Pedro Daniel Castro Paulo Carreira Nuno Santos SILM AAML 41 51 0 03 Aug 2023
Embedding Democratic Values into Social Media AIs via Societal Objective Functions Chenyan Jia Michelle S. Lam Minh Chau Mai Jeffrey T. Hancock Michael S. Bernstein 23 27 0 26 Jul 2023
LLM Censorship: A Machine Learning Challenge or a Computer Security Problem? David Glukhov Ilia Shumailov Y. Gal Nicolas Papernot Vardan Papyan AAML ELM 30 57 0 20 Jul 2023
Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models Huachuan Qiu Shuai Zhang Anqi Li Hongliang He Zhenzhong Lan ALM 44 48 0 17 Jul 2023
Jailbroken: How Does LLM Safety Training Fail? Alexander Wei Nika Haghtalab Jacob Steinhardt 107 852 0 05 Jul 2023
TrustGPT: A Benchmark for Trustworthy and Responsible Large Language Models Yue Huang Qihui Zhang Philip S. Y Lichao Sun 18 46 0 20 Jun 2023
From Bad to Worse: Using Private Data to Propagate Disinformation on Online Platforms with a Greater Efficiency Protik Bose Pranto Waqar Hassan Khan Sahar Abdelnabi Rebecca Weil Mario Fritz Rakibul Hasan 13 2 0 08 Jun 2023
Can large language models democratize access to dual-use biotechnology? Emily H. Soice R. Rocha Kimberlee Cordova Michael A. Specter K. Esvelt 14 46 0 06 Jun 2023
Spear or Shield: Leveraging Generative AI to Tackle Security Threats of Intelligent Network Services Hongyang Du Dusit Niyato Jiawen Kang Zehui Xiong K. Lam Ya-Nan Fang Yonghui Li AAML 29 13 0 04 Jun 2023
Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks Abhinav Rao S. Vashistha Atharva Naik Somak Aditya Monojit Choudhury 35 17 0 24 May 2023
In-Context Impersonation Reveals Large Language Models' Strengths and Biases Leonard Salewski Stephan Alaniz Isabel Rio-Torto Eric Schulz Zeynep Akata 44 151 0 24 May 2023
Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models Oana Ignat Zhijing Jin Artem Abzaliev Laura Biester Santiago Castro ... Verónica Pérez-Rosas Siqi Shen Zekun Wang Winston Wu Rada Mihalcea LRM 41 6 0 21 May 2023
A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation Xiaowei Huang Wenjie Ruan Wei Huang Gao Jin Yizhen Dong ... Sihao Wu Peipei Xu Dengyu Wu André Freitas Mustafa A. Mustafa ALM 45 82 0 19 May 2023
Beyond the Safeguards: Exploring the Security Risks of ChatGPT Erik Derner Kristina Batistic SILM 27 65 0 13 May 2023
In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT Xinyue Shen Zhenpeng Chen Michael Backes Yang Zhang 27 55 0 18 Apr 2023
Multi-step Jailbreaking Privacy Attacks on ChatGPT Haoran Li Dadi Guo Wei Fan Mingshi Xu Jie Huang Fanpu Meng Yangqiu Song SILM 47 321 0 11 Apr 2023
Large Language Model Instruction Following: A Survey of Progresses and Challenges Renze Lou Kai Zhang Wenpeng Yin ALM LRM 32 20 0 18 Mar 2023
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection Kai Greshake Sahar Abdelnabi Shailesh Mishra C. Endres Thorsten Holz Mario Fritz SILM 49 436 0 23 Feb 2023
On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning Omar Shaikh Hongxin Zhang William B. Held Michael S. Bernstein Diyi Yang ReLM LRM 35 184 0 15 Dec 2022
Improving alignment of dialogue agents via targeted human judgements Amelia Glaese Nat McAleese Maja Trębacz John Aslanides Vlad Firoiu ... John F. J. Mellor Demis Hassabis Koray Kavukcuoglu Lisa Anne Hendricks G. Irving ALM AAML 227 506 0 28 Sep 2022
TempLM: Distilling Language Models into Template-Based Generators Tianyi Zhang Mina Lee Lisa Li Ende Shen Tatsunori B. Hashimoto VLM 40 5 0 23 May 2022
Training language models to follow instructions with human feedback Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 339 12,003 0 04 Mar 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization Victor Sanh Albert Webson Colin Raffel Stephen H. Bach Lintang Sutawika ... T. Bers Stella Biderman Leo Gao Thomas Wolf Alexander M. Rush LRM 213 1,661 0 15 Oct 2021
Measuring Coding Challenge Competence With APPS Dan Hendrycks Steven Basart Saurav Kadavath Mantas Mazeika Akul Arora ... Collin Burns Samir Puranik Horace He D. Song Jacob Steinhardt ELM AIMat ALM 208 627 0 20 May 2021
LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning Yuhuai Wu M. Rabe Wenda Li Jimmy Ba Roger C. Grosse Christian Szegedy AIMat LRM 75 51 0 15 Jan 2021
Robust Encodings: A Framework for Combating Adversarial Typos Erik Jones Robin Jia Aditi Raghunathan Percy Liang AAML 142 102 0 04 May 2020