The Butterfly Effect of Altering Prompts: How Small Changes and Jailbreaks Affect Large Language Model Performance

8 January 2024

Papers citing "The Butterfly Effect of Altering Prompts: How Small Changes and Jailbreaks Affect Large Language Model Performance"

Title | Authors | Tags | Counts | Date
Guillotine: Hypervisors for Isolating Malicious AIs | James Mickens, Sarah Radway, Ravi Netravali | - | 30 0 0 | 22 Apr 2025
Identifying and Mitigating the Influence of the Prior Distribution in Large Language Models | Liyi Zhang, Veniamin Veselovsky, R. Thomas McCoy, Thomas L. Griffiths | - | 56 0 0 | 17 Apr 2025
Has the Creativity of Large-Language Models peaked? An analysis of inter- and intra-LLM variability | Jennifer Haase, P. Hanel, Sebastian Pokutta | ALM, LRM | 67 0 0 | 10 Apr 2025
LLM Social Simulations Are a Promising Research Method | Jacy Reese Anthis, Ryan Liu, Sean M. Richardson, Austin C. Kozlowski, Bernard Koch, James A. Evans, Erik Brynjolfsson, Michael S. Bernstein | ALM | 51 5 0 | 03 Apr 2025
Interpretation Gaps in LLM-Assisted Comprehension of Privacy Documents | Rinku Dewri | - | 50 0 0 | 15 Mar 2025
Beyond Black-Box Benchmarking: Observability, Analytics, and Optimization of Agentic Systems | Dany Moshkovich, Hadar Mulian, Sergey Zeltyn, Natti Eder, Inna Skarbovsky, Roy Abitbol | - | 47 1 0 | 09 Mar 2025
Shifting Perspectives: Steering Vector Ensembles for Robust Bias Mitigation in LLMs | Zara Siddique, Irtaza Khalid, Liam D. Turner, Luis Espinosa-Anke | LLMSV | 58 1 0 | 07 Mar 2025
Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness | Tingchen Fu, Fazl Barez | AAML | 65 0 0 | 03 Mar 2025
Steered Generation via Gradient Descent on Sparse Features | Sumanta Bhattacharyya, Pedram Rooshenas | LLMSV | 43 0 0 | 25 Feb 2025
From Text to Space: Mapping Abstract Spatial Models in LLMs during a Grid-World Navigation Task | Nicolas Martorell | LLMAG | 61 1 0 | 23 Feb 2025
Evaluating the Systematic Reasoning Abilities of Large Language Models through Graph Coloring | Alex Heyman, Joel Zylberberg | LRM | 45 0 0 | 10 Feb 2025
Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization | Yuanye Liu, Jiahang Xu, Li Lyna Zhang, Qi Chen, Xuan Feng, Yang Chen, Zhongxin Guo, Yuqing Yang, Cheng Peng | - | 84 2 0 | 06 Feb 2025
Generics are puzzling. Can language models find the missing piece? | Gustavo Cilleruelo Calderón, Emily Allaway, Barry Haddow, Alexandra Birch | - | 69 0 0 | 15 Dec 2024
Do LLMs Understand Ambiguity in Text? A Case Study in Open-world Question Answering | Aryan Keluskar, Amrita Bhattacharjee, Huan Liu | - | 72 2 0 | 19 Nov 2024
Desert Camels and Oil Sheikhs: Arab-Centric Red Teaming of Frontier LLMs | Muhammed Saeed, Elgizouli Mohamed, Mukhtar Mohamed, Shaina Raza, Muhammad Abdul-Mageed, Shady Shehata | - | 43 0 0 | 31 Oct 2024
Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting | Mohamed Salim Aissi, Clément Romac, Thomas Carta, Sylvain Lamprier, Pierre-Yves Oudeyer, Olivier Sigaud, Laure Soulier, Nicolas Thome | - | 24 2 0 | 25 Oct 2024
PromptHive: Bringing Subject Matter Experts Back to the Forefront with Collaborative Prompt Engineering for Educational Content Creation | Mohi Reza, Ioannis Anastasopoulos, Shreya Bhandari, Z. Pardos | - | 23 0 0 | 21 Oct 2024
A Comprehensive Evaluation of Large Language Models on Mental Illnesses | Abdelrahman Hanafi, Mohammed Saad, Noureldin Zahran, Radwa J. Hanafy, Mohammed E. Fouda | ELM, LM&MA, AI4MH | 26 4 0 | 24 Sep 2024
Learning from Contrastive Prompts: Automated Optimization and Adaptation | Mingqi Li, Karan Aggarwal, Yong Xie, Aitzaz Ahmad, Stephen Lau | - | 30 2 0 | 23 Sep 2024
A sound description: Exploring prompt templates and class descriptions to enhance zero-shot audio classification | Michel Olvera, Paraskevas Stamatiadis, S. Essid | VLM | 32 1 0 | 19 Sep 2024
Towards More Realistic Extraction Attacks: An Adversarial Perspective | Yash More, Prakhar Ganesh, G. Farnadi | AAML | 71 6 0 | 02 Jul 2024
COMMUNITY-CROSS-INSTRUCT: Unsupervised Instruction Generation for Aligning Large Language Models to Online Communities | Zihao He, Rebecca Dorn, Siyi Guo, Minh Duc Hoang Chu, Kristina Lerman | - | 44 6 0 | 17 Jun 2024
Bayesian Statistical Modeling with Predictors from LLMs | Michael Franke, Polina Tsvilodub, Fausto Carcassi | - | 45 4 0 | 13 Jun 2024
On the Worst Prompt Performance of Large Language Models | Bowen Cao, Deng Cai, Zhisong Zhang, Yuexian Zou, Wai Lam | ALM, LRM | 30 5 0 | 08 Jun 2024
Empirical influence functions to understand the logic of fine-tuning | Jordan K Matelsky, Lyle Ungar, Konrad Paul Kording | - | 26 0 0 | 01 Jun 2024
Harmful Speech Detection by Language Models Exhibits Gender-Queer Dialect Bias | Rebecca Dorn, Lee Kezar, Fred Morstatter, Kristina Lerman | - | 27 7 0 | 23 May 2024
Your Large Language Models Are Leaving Fingerprints | Hope McGovern, Rickard Stureborg, Yoshi Suhara, Dimitris Alikaniotis | DeLMO | 49 11 0 | 22 May 2024
MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing | Siddhant Agarwal, Shivam Sharma, Preslav Nakov, Tanmoy Chakraborty | - | 24 4 0 | 18 May 2024
Capabilities of Large Language Models in Control Engineering: A Benchmark Study on GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra | Darioush Kevian, U. Syed, Xing-ming Guo, Aaron J. Havens, Geir Dullerud, Peter M. Seiler, Lianhui Qin, Bin Hu | ELM | 41 29 0 | 04 Apr 2024
Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction | Koki Maeda, Shuhei Kurita, Taiki Miyanishi, Naoaki Okazaki | - | 38 2 0 | 28 Feb 2024
An Empirical Study of In-context Learning in LLMs for Machine Translation | Pranjal A. Chitale, Jay Gala, Raj Dabre | LRM | 31 5 0 | 22 Jan 2024
Large Language Models Help Reveal Unhealthy Diet and Body Concerns in Online Eating Disorders Communities | Minh Duc Hoang Chu, Zihao He, Rebecca Dorn, Kristina Lerman | - | 16 0 0 | 17 Jan 2024
A Corpus for Sentence-level Subjectivity Detection on English News Articles | Francesco Antici, Andrea Galassi, Federico Ruggeri, Katerina Korre, Arianna Muti, Alessandra Bardi, Alice Fedotova, Alberto Barrón-Cedeño | - | 32 11 0 | 29 May 2023
Self-Consistency Improves Chain of Thought Reasoning in Language Models | Xuezhi Wang, Jason W. Wei, Dale Schuurmans, Quoc Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou | ReLM, BDL, LRM, AI4CE | 314 3,248 0 | 21 Mar 2022
WARP: Word-level Adversarial ReProgramming | Karen Hambardzumyan, Hrant Khachatrian, Jonathan May | AAML | 254 342 0 | 01 Jan 2021