Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting

17 October 2023

Yejin Choi

Papers citing "Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting"

50 / 235 papers shown

Leveraging LLM Inconsistency to Boost Pass@k Performance. Uri Dalal, Meirav Segal, Zvika Ben-Haim, Dan Lahav, Omer Nevo. 19 May 2025.
PromptPrism: A Linguistically-Inspired Taxonomy for Prompts. Sullam Jeoung, Yueyan Chen, Yi Zhang, Shuai Wang, Haibo Ding, Lin Lee Cheong. 19 May 2025.
How Reliable is Multilingual LLM-as-a-Judge? Xiyan Fu, Wei Liu. [ELM] 18 May 2025.
Improving Fairness in LLMs Through Testing-Time Adversaries. Isabela Pereira Gregio, Ian Pons, Anna Helena Reali Costa, Artur Jordao. [AAML] 17 May 2025.
The Effects of Demographic Instructions on LLM Personas. Angel Felipe Magnossão de Paula, J. Shane Culpepper, Alistair Moffat, Sachin Pathiyan Cherumanal, Falk Scholer, Johanne Trippas. 17 May 2025.
LLM Agents Are Hypersensitive to Nudges. Manuel Cherep, Pattie Maes, Nikhil Singh. 16 May 2025.
DIF: A Framework for Benchmarking and Verifying Implicit Bias in LLMs. Lake Yin, Fan Huang. 15 May 2025.
LLM-Augmented Chemical Synthesis and Design Decision Programs. Haorui Wang, Jeff Guo, Lingkai Kong, R. Ramprasad, Philippe Schwaller, Yuanqi Du, Chao Zhang. 11 May 2025.
Say It Another Way: A Framework for User-Grounded Paraphrasing. Cléa Chataigner, Rebecca Ma, Prakhar Ganesh, Afaf Taik, Elliot Creager, G. Farnadi. 06 May 2025.
Colombian Waitresses y Jueces canadienses: Gender and Country Biases in Occupation Recommendations from LLMs. Elisa Forcada Rodríguez, Olatz Perez-de-Viñaspre, Jon Ander Campos, Dietrich Klakow, Vagrant Gautam. 05 May 2025.
ConSens: Assessing context grounding in open-book question answering. Ivan Vankov, Matyo Ivanov, Adriana Correia, Victor Botev. [ELM] 30 Apr 2025.
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts. Hanhua Hong, Chenghao Xiao, Yang Wang, Y. Liu, Wenge Rong, Chenghua Lin. 29 Apr 2025.
Cooking Up Creativity: A Cognitively-Inspired Approach for Enhancing LLM Creativity through Structured Representations. Moran Mizrahi, Chen Shani, Gabriel Stanovsky, Dan Jurafsky, Dafna Shahaf. 29 Apr 2025.
MEQA: A Meta-Evaluation Framework for Question & Answer LLM Benchmarks. Jaime Raldua Veuthey, Zainab Ali Majid, Suhas Hariharan, Jacob Haimes. [ELM] 18 Apr 2025.
A Human-AI Comparative Analysis of Prompt Sensitivity in LLM-Based Relevance Judgment. Negar Arabzadeh, Charles L. A. Clarke. 16 Apr 2025.
DICE: A Framework for Dimensional and Contextual Evaluation of Language Models. Aryan Shrivastava, Paula Akemi Aoyagui. 14 Apr 2025.
LLM-driven Constrained Copy Generation through Iterative Refinement. Varun Vasudevan, Faezeh Akhavizadegan, Abhinav Prakash, Yokila Arora, Jason H. D. Cho, Tanya Mendiratta, Sushant Kumar, Kannan Achan. 14 Apr 2025.
HypoEval: Hypothesis-Guided Evaluation for Natural Language Generation. Mingxuan Li, Hanchen Li, Chenhao Tan. [ALM, ELM] 09 Apr 2025.
Towards LLMs Robustness to Changes in Prompt Format Styles. Lilian Ngweta, Kiran Kate, Jason Tsay, Yara Rizk. [AAML, VLM] 09 Apr 2025.
Model-Agnostic Policy Explanations with Large Language Models. Zhang Xi-Jia, Yue (Sophie) Guo, Shufei Chen, Simon Stepputtis, Matthew C. Gombolay, Katia P. Sycara, Joseph Campbell. [LM&Ro, LRM] 08 Apr 2025.
Accelerating Particle-based Energetic Variational Inference. Xuelian Bao, Lulu Kang, Chun Liu, Yiwei Wang. [BDL] 04 Apr 2025.
A Framework for Situating Innovations, Opportunities, and Challenges in Advancing Vertical Systems with Large AI Models. Gaurav Verma, Jiawei Zhou, Mohit Chandra, Srijan Kumar, M. D. Choudhury. 03 Apr 2025.
The quasi-semantic competence of LLMs: a case study on the part-whole relation. Mattia Proietti, Alessandro Lenci. 03 Apr 2025.
A Framework for Robust Cognitive Evaluation of LLMs. Karin de Langis, J. Park, Bin Hu, Khanh Chi Le, Andreas Schramm, Michael C. Mensink, Andrew Elfenbein, Dongyeop Kang. 03 Apr 2025.
Token embeddings violate the manifold hypothesis. Michael Robinson, Sourya Dey, Tony Chiang. 01 Apr 2025.
A Large Scale Analysis of Gender Biases in Text-to-Image Generative Models. Leander Girrbach, Stephan Alaniz, Genevieve Smith, Zeynep Akata. 30 Mar 2025.
Firm or Fickle? Evaluating Large Language Models Consistency in Sequential Interactions. Yubo Li, Yidi Miao, Xueying Ding, Ramayya Krishnan, R. Padman. 28 Mar 2025.
LLM-based Agent Simulation for Maternal Health Interventions: Uncertainty Estimation and Decision-focused Evaluation. Sarah Martinson, Lingkai Kong, Cheol Woo Kim, Aparna Taneja, Milind Tambe. 25 Mar 2025.
HoarePrompt: Structural Reasoning About Program Correctness in Natural Language. Dimitrios Stamatios Bouras, Yihan Dai, Tairan Wang, Yingfei Xiong, Sergey Mechtaev. [LRM] 25 Mar 2025.
Uncertainty Distillation: Teaching Language Models to Express Semantic Confidence. Sophia Hager, David Mueller, Kevin Duh, Nicholas Andrews. 18 Mar 2025.
Aligned Probing: Relating Toxic Behavior and Model Internals. Andreas Waldis, Vagrant Gautam, Anne Lauscher, Dietrich Klakow, Iryna Gurevych. 17 Mar 2025.
reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs. Zhaofeng Wu, Michihiro Yasunaga, Andrew Cohen, Yoon Kim, Asli Celikyilmaz, Marjan Ghazvininejad. 14 Mar 2025.
Ensemble Debiasing Across Class and Sample Levels for Fairer Prompting Accuracy. Ruixi Lin, Ziqiao Wang, Yang You. [FaML] 07 Mar 2025.
Quantifying the Robustness of Retrieval-Augmented Language Models Against Spurious Features in Grounding Data. Shiping Yang, Jie Wu, Wenbiao Ding, Ning Wu, Shining Liang, Ming Gong, Hengyuan Zhang, Dongmei Zhang. [AAML] 07 Mar 2025.
Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness. Tingchen Fu, Fazl Barez. [AAML] 03 Mar 2025.
DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation. Eliya Habba, Ofir Arviv, Itay Itzhak, Yotam Perlitz, Elron Bandel, Leshem Choshen, Michal Shmueli-Scheuer, Gabriel Stanovsky. 03 Mar 2025.
ECLeKTic: a Novel Challenge Set for Evaluation of Cross-Lingual Knowledge Transfer. Omer Goldman, Uri Shaham, Dan Malkin, Sivan Eiger, Avinatan Hassidim, ..., Shruti Rijhwani, Laura Rimell, Idan Szpektor, Reut Tsarfaty, Matan Eyal. 28 Feb 2025.
SCORE: Systematic COnsistency and Robustness Evaluation for Large Language Models. Grigor Nalbandyan, Rima Shahbazyan, Evelina Bakhturina. [ELM] 28 Feb 2025.
Erasing Without Remembering: Implicit Knowledge Forgetting in Large Language Models. Huazheng Wang, Yongcheng Jing, Haifeng Sun, Yingjie Wang, Jingchao Wang, Jianxin Liao, Dacheng Tao. [KELM, MU] 27 Feb 2025.
END: Early Noise Dropping for Efficient and Effective Context Denoising. Hongye Jin, Pei Chen, Jingfeng Yang, Zhaoxiang Wang, Meng Jiang, ..., Xuzhi Zhang, Zheng Li, Tianyi Liu, Huasheng Li, Bing Yin. 26 Feb 2025.
Policy-as-Prompt: Rethinking Content Moderation in the Age of Large Language Models. Konstantina Palla, José Luis Redondo García, C. Hauff, Francesco Fabbri, Henrik Lindström, Daniel R. Taber, Andreas Damianou, M. Lalmas. [AILaw] 25 Feb 2025.
Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions. Joseph Suh, Erfan Jahanparast, Suhong Moon, Minwoo Kang, Serina Chang. [ALM, LM&MA] 24 Feb 2025.
From Text to Space: Mapping Abstract Spatial Models in LLMs during a Grid-World Navigation Task. Nicolas Martorell. [LLMAG] 23 Feb 2025.
Human Preferences in Large Language Model Latent Space: A Technical Analysis on the Reliability of Synthetic Data in Voting Outcome Prediction. Sarah Ball, Simeon Allmendinger, Frauke Kreuter, Niklas Kühl. 22 Feb 2025.
Blessing of Multilinguality: A Systematic Analysis of Multilingual In-Context Learning. Yilei Tu, Andrew Xue, Freda Shi. 17 Feb 2025.
PlanGenLLMs: A Modern Survey of LLM Planning Capabilities. Hui Wei, Zihao Zhang, Shenghua He, Tian Xia, Shijia Pan, Fei Liu. 16 Feb 2025.
Expect the Unexpected: FailSafe Long Context QA for Finance. Kiran Kamble, M. Russak, Dmytro Mozolevskyi, Muayad Ali, Mateusz Russak, Waseem Alshikh. 10 Feb 2025.
SMAB: MAB based word Sensitivity Estimation Framework and its Applications in Adversarial Text Generation. Saurabh Kumar Pandey, S. Vashistha, Debrup Das, Somak Aditya, Monojit Choudhury. [AAML] 10 Feb 2025.
Benchmarking Prompt Sensitivity in Large Language Models. Amirhossein Razavi, Mina Soltangheis, Negar Arabzadeh, Sara Salamat, Morteza Zihayat, Ebrahim Bagheri. 09 Feb 2025.
Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization. Yuanye Liu, Jiahang Xu, Li Zhang, Qi Chen, Xuan Feng, Yang Chen, Zhongxin Guo, Yuqing Yang, Cheng Peng. 06 Feb 2025.