ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting

arXiv:2310.11324
17 October 2023
Melanie Sclar
Yejin Choi
Yulia Tsvetkov
Alane Suhr

Papers citing "Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting"

35 / 235 papers shown
The Impact of Demonstrations on Multilingual In-Context Learning: A Multidimensional Analysis
Miaoran Zhang
Vagrant Gautam
Mingyang Wang
Jesujoba Oluwadara Alabi
Xiaoyu Shen
Dietrich Klakow
Marius Mosbach
47
8
0
20 Feb 2024
On Sensitivity of Learning with Limited Labelled Data to the Effects of Randomness: Impact of Interactions and Systematic Choices
Branislav Pecher
Ivan Srba
Maria Bielikova
69
3
0
20 Feb 2024
Comparing Specialised Small and General Large Language Models on Text Classification: 100 Labelled Samples to Achieve Break-Even Performance
Branislav Pecher
Ivan Srba
Maria Bielikova
ALM
39
7
0
20 Feb 2024
An Empirical Categorization of Prompting Techniques for Large Language Models: A Practitioner's Guide
Oluwole Fagbohun
Rachel M. Harrison
Anton Dereventsov
52
6
0
18 Feb 2024
Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents
Renxi Wang
Haonan Li
Xudong Han
Yixuan Zhang
Timothy Baldwin
LLMAG
27
22
0
18 Feb 2024
Large Language Models Can Better Understand Knowledge Graphs Than We Thought
Xinbang Dai
Yuncheng Hua
Tongtong Wu
Yang Sheng
Qiu Ji
Guilin Qi
82
0
0
18 Feb 2024
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
Ajay Patel
Colin Raffel
Chris Callison-Burch
SyDa
AI4CE
33
25
0
16 Feb 2024
Understanding the Effects of Iterative Prompting on Truthfulness
Satyapriya Krishna
Chirag Agarwal
Himabindu Lakkaraju
HILM
27
9
0
09 Feb 2024
Intent-based Prompt Calibration: Enhancing prompt optimization with synthetic boundary cases
Elad Levi
Eli Brosh
Matan Friedmann
24
8
0
05 Feb 2024
When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards
Norah A. Alzahrani
H. A. Alyahya
Sultan Yazeed Alnumay
Muhtasim Tahmid
Shaykhah Alsubaie
...
Saleh Soltan
Nathan Scales
Marie-Anne Lachaux
Samuel R. Bowman
Haidar Khan
ELM
17
69
0
01 Feb 2024
Evaluating Large Language Models for Generalization and Robustness via Data Compression
Yucheng Li
Yunhao Guo
Frank Guerin
Chenghua Lin
ELM
27
5
0
01 Feb 2024
What Does the Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detection
Shangbin Feng
Herun Wan
Ningnan Wang
Zhaoxuan Tan
Minnan Luo
Yulia Tsvetkov
AAML
DeLMO
25
16
0
01 Feb 2024
Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration
Shangbin Feng
Weijia Shi
Yike Wang
Wenxuan Ding
Vidhisha Balachandran
Yulia Tsvetkov
29
78
0
01 Feb 2024
Rethinking Interpretability in the Era of Large Language Models
Chandan Singh
J. Inala
Michel Galley
Rich Caruana
Jianfeng Gao
LRM
AI4CE
77
62
0
30 Jan 2024
Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning
Yanda Chen
Chandan Singh
Xiaodong Liu
Simiao Zuo
Bin-Xia Yu
He He
Jianfeng Gao
LRM
25
13
0
25 Jan 2024
WARM: On the Benefits of Weight Averaged Reward Models
Alexandre Ramé
Nino Vieillard
Léonard Hussenot
Robert Dadashi
Geoffrey Cideron
Olivier Bachem
Johan Ferret
120
94
0
22 Jan 2024
An Empirical Study of In-context Learning in LLMs for Machine Translation
Pranjal A. Chitale
Jay Gala
Raj Dabre
LRM
31
5
0
22 Jan 2024
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents
Tongxin Yuan
Zhiwei He
Lingzhong Dong
Yiming Wang
Ruijie Zhao
...
Binglin Zhou
Fangqi Li
Zhuosheng Zhang
Rui Wang
Gongshen Liu
ELM
34
61
0
18 Jan 2024
Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements
Anton Voronov
Lena Wolf
Max Ryabinin
30
46
0
12 Jan 2024
The Butterfly Effect of Altering Prompts: How Small Changes and Jailbreaks Affect Large Language Model Performance
A. Salinas
Fred Morstatter
45
49
0
08 Jan 2024
Generalist embedding models are better at short-context clinical semantic search than specialized embedding models
Jean-Baptiste Excoffier
Tom Roehr
Alexei Figueroa
Jens-Michalis Papaioannou
Keno Bressem
Matthieu Ortala
45
4
0
03 Jan 2024
State of What Art? A Call for Multi-Prompt LLM Evaluation
Moran Mizrahi
Guy Kaplan
Daniel Malkin
Rotem Dror
Dafna Shahaf
Gabriel Stanovsky
ELM
32
127
0
31 Dec 2023
You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments
Bangzhao Shu
Lechen Zhang
Minje Choi
Lavinia Dunagan
Lajanugen Logeswaran
Moontae Lee
Dallas Card
David Jurgens
24
33
0
16 Nov 2023
How are Prompts Different in Terms of Sensitivity?
Sheng Lu
Hendrik Schuff
Iryna Gurevych
40
18
0
13 Nov 2023
Prompt Engineering a Prompt Engineer
Qinyuan Ye
Maxamed Axmed
Reid Pryzant
Fereshte Khani
VLM
LLMAG
LRM
27
28
0
09 Nov 2023
Do LLMs exhibit human-like response biases? A case study in survey design
Lindia Tjuatja
Valerie Chen
Sherry Tongshuang Wu
Ameet Talwalkar
Graham Neubig
32
80
0
07 Nov 2023
Principles from Clinical Research for NLP Model Generalization
Aparna Elangovan
Jiayuan He
Yuan Li
Karin Verspoor
CML
32
3
0
07 Nov 2023
ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models
Ben Feuer
Yurong Liu
Chinmay Hegde
Juliana Freire
AI4TS
VLM
27
9
0
27 Oct 2023
A Corpus for Sentence-level Subjectivity Detection on English News Articles
Francesco Antici
Andrea Galassi
Federico Ruggeri
Katerina Korre
Arianna Muti
Alessandra Bardi
Alice Fedotova
Alberto Barrón-Cedeño
40
11
0
29 May 2023
Instruction Induction: From Few Examples to Natural Language Task Descriptions
Or Honovich
Uri Shaham
Samuel R. Bowman
Omer Levy
ELM
LRM
120
137
0
22 May 2022
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu
Max Bartolo
Alastair Moore
Sebastian Riedel
Pontus Stenetorp
AILaw
LRM
279
1,124
0
18 Apr 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
280
3,858
0
18 Apr 2021
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick
Sahana Udupa
Hinrich Schütze
259
374
0
28 Feb 2021
Measuring and Improving Consistency in Pretrained Language Models
Yanai Elazar
Nora Kassner
Shauli Ravfogel
Abhilasha Ravichander
Eduard H. Hovy
Hinrich Schütze
Yoav Goldberg
HILM
269
346
0
01 Feb 2021
Making Pre-trained Language Models Better Few-shot Learners
Tianyu Gao
Adam Fisch
Danqi Chen
243
1,924
0
31 Dec 2020