Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting

17 October 2023

Yejin Choi

Papers citing "Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting"

50 / 235 papers shown

Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization Yuanye Liu Jiahang Xu Li Zhang Qi Chen Xuan Feng Yang Chen Zhongxin Guo Yuqing Yang Cheng Peng 84 2 0 06 Feb 2025
The Curious Case of Arbitrariness in Machine Learning Prakhar Ganesh Afaf Taik G. Farnadi 59 2 0 28 Jan 2025
Understanding the LLM-ification of CHI: Unpacking the Impact of LLMs at CHI through a Systematic Literature Review Rock Yuren Pang Hope Schroeder Kynnedy Simone Smith Solon Barocas Ziang Xiao Emily Tseng Danielle Bragg 77 3 0 22 Jan 2025
ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates Fengqing Jiang Zhangchen Xu Luyao Niu Bill Yuchen Lin Radha Poovendran SILM 81 6 0 08 Jan 2025
Are You Doubtful? Oh, It Might Be Difficult Then! Exploring the Use of Model Uncertainty for Question Difficulty Estimation Leonidas Zotos H. Rijn Malvina Nissim 75 0 0 16 Dec 2024
SailCompass: Towards Reproducible and Robust Evaluation for Southeast Asian Languages Jia Guo Longxu Dou Guangtao Zeng Stanley Kok Wei Lu Qian Liu ELM LRM 81 1 0 02 Dec 2024
Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS Jinyang Wu Mingkuan Feng Shuai Zhang Feihu Che Zengqi Wen J. Tao ReLM LRM 115 9 0 27 Nov 2024
Explaining GPT-4's Schema of Depression Using Machine Behavior Analysis Adithya V Ganesan Vasudha Varadarajan Yash Kumar Lal Veerle C. Eijsbroek Katarina Kjell ... Elizabeth C. Stade J. Eichstaedt Ryan L. Boyd H. A. Schwartz Lucie Flek AI4MH 77 0 0 21 Nov 2024
AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations Gaurav Verma Rachneet Kaur Nishan Srishankar Zhen Zeng T. Balch Manuela Veloso LLMAG 72 5 0 20 Nov 2024
BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices Anka Reuel Amelia F. Hardy Chandler Smith Max Lamparth Malcolm Hardy Mykel J. Kochenderfer ELM 81 17 0 20 Nov 2024
Do LLMs Understand Ambiguity in Text? A Case Study in Open-world Question Answering Aryan Keluskar Amrita Bhattacharjee Huan Liu 72 2 0 19 Nov 2024
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model Dongyoung Go Taesun Whang Chanhee Lee Hwayeon Kim Sunghoon Park Seunghwan Ji Dongchan Kim Young-Bum Kim LRM 195 1 0 19 Nov 2024
Does Prompt Formatting Have Any Impact on LLM Performance? Jia He Mukund Rungta David Koleczek Arshdeep Sekhon Franklin X Wang Sadid Hasan LLMAG LRM 27 36 0 15 Nov 2024
CoPrompter: User-Centric Evaluation of LLM Instruction Alignment for Improved Prompt Engineering Ishika Joshi Simra Shahid Shreeya Venneti Manushree Vasu Yantao Zheng Yunyao Li Balaji Krishnamurthy Gromit Yeuk-Yin Chan 31 3 0 09 Nov 2024
Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress? Daniel P. Jeong Saurabh Garg Zachary Chase Lipton Michael Oberst LM&MA VLM ELM 37 9 0 06 Nov 2024
Controlling Language and Diffusion Models by Transporting Activations P. Rodríguez Arno Blaas Michal Klein Luca Zappella N. Apostoloff Marco Cuturi Xavier Suau LLMSV 40 4 0 30 Oct 2024
Attention Speaks Volumes: Localizing and Mitigating Bias in Language Models Rishabh Adiga Besmira Nushi Varun Chandrasekaran 49 0 0 29 Oct 2024
A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution Zhengmian Hu Tong Zheng Heng Huang BDL 29 2 0 29 Oct 2024
Vulnerability of LLMs to Vertically Aligned Text Manipulations Zhecheng Li Yijiao Wang Bryan Hooi Yujun Cai Zhen Xiong Nanyun Peng Kai-Wei Chang 56 1 0 26 Oct 2024
Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting Mohamed Salim Aissi Clément Romac Thomas Carta Sylvain Lamprier Pierre-Yves Oudeyer Olivier Sigaud Laure Soulier Nicolas Thome 24 2 0 25 Oct 2024
Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina Yuan Gao Dokyun Lee Gordon Burtch Sina Fazelpour LRM 56 7 0 25 Oct 2024
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs) Leander Girrbach Yiran Huang Stephan Alaniz Trevor Darrell Zeynep Akata VLM 47 2 0 25 Oct 2024
LanFL: Differentially Private Federated Learning with Large Language Models using Synthetic Samples Huiyu Wu Diego Klabjan FedML 46 0 0 24 Oct 2024
TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts Yuxuan Xie Tianhua Li Wenqi Shao Kaipeng Zhang 25 0 0 23 Oct 2024
BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data Wenkai Li Jiarui Liu Andy Liu Xuhui Zhou Mona Diab Maarten Sap 56 6 0 21 Oct 2024
Do LLMs "know" internally when they follow instructions? Juyeon Heo Christina Heinze-Deml Oussama Elachqar Shirley Ren Udhay Nallasamy Andy Miller Kwan Ho Ryan Chan Jaya Narain 51 5 0 18 Oct 2024
LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks Akshara Prabhakar Yuanzhi Li Karthik Narasimhan Sham Kakade Eran Malach Samy Jelassi MoMe 36 9 0 16 Oct 2024
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs Jingming Zhuo S. Zhang Xinyu Fang Haodong Duan Dahua Lin Kai Chen 34 19 0 16 Oct 2024
Beyond the Comfort Zone: Emerging Solutions to Overcome Challenges in Integrating LLMs into Software Products Nadia Nahar Christian Kastner Jenna L. Butler Chris Parnin Thomas Zimmermann Christian Bird 57 3 0 15 Oct 2024
Evaluating Gender Bias of LLMs in Making Morality Judgements Divij Bajaj Yuanyuan Lei Jonathan Tong Ruihong Huang 37 3 0 13 Oct 2024
Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities Andrey Anurin Jonathan Ng Kibo Schaffer Jason Schreiber Esben Kran ELM 40 5 0 10 Oct 2024
ReIFE: Re-evaluating Instruction-Following Evaluation Yixin Liu Kejian Shi Alexander R. Fabbri Yilun Zhao Peifeng Wang Chien-Sheng Wu Shafiq Joty Arman Cohan 27 6 0 09 Oct 2024
POSIX: A Prompt Sensitivity Index For Large Language Models Anwoy Chatterjee H. S. V. N. S. K. Renduchintala S. Bhatia Tanmoy Chakraborty AAML 39 6 0 03 Oct 2024
Can LLMs Reliably Simulate Human Learner Actions? A Simulation Authoring Framework for Open-Ended Learning Environments Amogh Mannekote Adam Davies Jina Kang K. Boyer 33 1 0 03 Oct 2024
'Simulacrum of Stories': Examining Large Language Models as Qualitative Research Participants Shivani Kapania William Agnew Motahhare Eslami Hoda Heidari Sarah E Fox 42 4 0 28 Sep 2024
A Survey on the Honesty of Large Language Models Siheng Li Cheng Yang Taiqiang Wu Chufan Shi Yuji Zhang ... Jie Zhou Yujiu Yang Ngai Wong Xixin Wu Wai Lam HILM 35 4 0 27 Sep 2024
Data Analysis in the Era of Generative AI J. Inala Chenglong Wang Steven Drucker Gonzalo Ramos Victor C. Dibia N. Riche Dave Brown Dan Marshall Jianfeng Gao 29 7 0 27 Sep 2024
DARE: Diverse Visual Question Answering with Robustness Evaluation Hannah Sterz Jonas Pfeiffer Ivan Vulić OOD VLM 26 2 0 26 Sep 2024
Revise, Reason, and Recognize: LLM-Based Emotion Recognition via Emotion-Specific Prompts and ASR Error Correction Yuanchao Li Yuan Gong Chao-Han Huck Yang P. Bell Catherine Lai 45 1 0 23 Sep 2024
SSE: Multimodal Semantic Data Selection and Enrichment for Industrial-scale Data Assimilation Maying Shen Nadine Chang Sifei Liu Jose M. Alvarez 36 0 0 20 Sep 2024
Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time David Herel Vojtech Bartek Jiri Jirak Tomáš Mikolov 50 2 0 20 Sep 2024
Pay Attention to What Matters Pedro Luiz Silva Antonio De Domenico Ali Maatouk Fadhel Ayed ALM 29 0 0 19 Sep 2024
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination Eva Sánchez Salido Roser Morante Julio Gonzalo Guillermo Marco Jorge Carrillo-de-Albornoz ... Enrique Amigó Andrés Fernández Alejandro Benito-Santos Adrián Ghajari Espinosa Victor Fresno ELM 51 0 0 19 Sep 2024
LLM-Measure: Generating Valid, Consistent, and Reproducible Text-Based Measures for Social Science Research Yi Yang Hanyu Duan Jiaxin Liu Kar Yan Tam 21 0 0 19 Sep 2024
A sound description: Exploring prompt templates and class descriptions to enhance zero-shot audio classification Michel Olvera Paraskevas Stamatiadis S. Essid VLM 37 1 0 19 Sep 2024
CAST: Cross-modal Alignment Similarity Test for Vision Language Models Gautier Dagan Olga Loginova Anil Batra CoGe 72 1 0 17 Sep 2024
AI Conversational Interviewing: Transforming Surveys with LLMs as Adaptive Interviewers Alexander Wuttke Matthias Aßenmacher Christopher Klamm Max M. Lang Quirin Würschinger Frauke Kreuter 44 2 0 16 Sep 2024
GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering Sacha Muller António Loison Bilel Omrani Gautier Viaud RALM ELM 38 1 0 10 Sep 2024
End User Authoring of Personalized Content Classifiers: Comparing Example Labeling, Rule Writing, and LLM Prompting Leijie Wang Kathryn Yurechko Pranati Dani Quan Ze Chen Amy X. Zhang 50 3 0 05 Sep 2024
Irrelevant Alternatives Bias Large Language Model Hiring Decisions Kremena Valkanova Pencho Yordanov 23 0 0 04 Sep 2024