ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting

arXiv:2310.11324
17 October 2023
Melanie Sclar
Yejin Choi
Yulia Tsvetkov
Alane Suhr

Papers citing "Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting"

35 / 235 papers shown
The Impact of Demonstrations on Multilingual In-Context Learning: A Multidimensional Analysis
Miaoran Zhang
Vagrant Gautam
Mingyang Wang
Jesujoba Oluwadara Alabi
Xiaoyu Shen
Dietrich Klakow
Marius Mosbach
47
8
0
20 Feb 2024
On Sensitivity of Learning with Limited Labelled Data to the Effects of Randomness: Impact of Interactions and Systematic Choices
Branislav Pecher
Ivan Srba
Maria Bielikova
69
3
0
20 Feb 2024
Comparing Specialised Small and General Large Language Models on Text Classification: 100 Labelled Samples to Achieve Break-Even Performance
Branislav Pecher
Ivan Srba
Maria Bielikova
ALM
39
7
0
20 Feb 2024
An Empirical Categorization of Prompting Techniques for Large Language Models: A Practitioner's Guide
Oluwole Fagbohun
Rachel M. Harrison
Anton Dereventsov
52
6
0
18 Feb 2024
Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents
Renxi Wang
Haonan Li
Xudong Han
Yixuan Zhang
Timothy Baldwin
LLMAG
27
22
0
18 Feb 2024
Large Language Models Can Better Understand Knowledge Graphs Than We Thought
Xinbang Dai
Yuncheng Hua
Tongtong Wu
Yang Sheng
Qiu Ji
Guilin Qi
82
0
0
18 Feb 2024
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
Ajay Patel
Colin Raffel
Chris Callison-Burch
SyDa
AI4CE
33
25
0
16 Feb 2024
Understanding the Effects of Iterative Prompting on Truthfulness
Satyapriya Krishna
Chirag Agarwal
Himabindu Lakkaraju
HILM
27
9
0
09 Feb 2024
Intent-based Prompt Calibration: Enhancing prompt optimization with synthetic boundary cases
Elad Levi
Eli Brosh
Matan Friedmann
24
8
0
05 Feb 2024
When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards
Norah A. Alzahrani
H. A. Alyahya
Sultan Yazeed Alnumay
Muhtasim Tahmid
Shaykhah Alsubaie
...
Saleh Soltan
Nathan Scales
Marie-Anne Lachaux
Samuel R. Bowman
Haidar Khan
ELM
17
69
0
01 Feb 2024
Evaluating Large Language Models for Generalization and Robustness via Data Compression
Yucheng Li
Yunhao Guo
Frank Guerin
Chenghua Lin
ELM
27
5
0
01 Feb 2024
What Does the Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detection
Shangbin Feng
Herun Wan
Ningnan Wang
Zhaoxuan Tan
Minnan Luo
Yulia Tsvetkov
AAML
DeLMO
25
16
0
01 Feb 2024
Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration
Shangbin Feng
Weijia Shi
Yike Wang
Wenxuan Ding
Vidhisha Balachandran
Yulia Tsvetkov
29
78
0
01 Feb 2024
Rethinking Interpretability in the Era of Large Language Models
Chandan Singh
J. Inala
Michel Galley
Rich Caruana
Jianfeng Gao
LRM
AI4CE
77
62
0
30 Jan 2024
Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning
Yanda Chen
Chandan Singh
Xiaodong Liu
Simiao Zuo
Bin-Xia Yu
He He
Jianfeng Gao
LRM
25
13
0
25 Jan 2024
WARM: On the Benefits of Weight Averaged Reward Models
Alexandre Ramé
Nino Vieillard
Léonard Hussenot
Robert Dadashi
Geoffrey Cideron
Olivier Bachem
Johan Ferret
120
94
0
22 Jan 2024
An Empirical Study of In-context Learning in LLMs for Machine Translation
Pranjal A. Chitale
Jay Gala
Raj Dabre
LRM
31
5
0
22 Jan 2024
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents
Tongxin Yuan
Zhiwei He
Lingzhong Dong
Yiming Wang
Ruijie Zhao
...
Binglin Zhou
Fangqi Li
Zhuosheng Zhang
Rui Wang
Gongshen Liu
ELM
34
61
0
18 Jan 2024
Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements
Anton Voronov
Lena Wolf
Max Ryabinin
30
46
0
12 Jan 2024
The Butterfly Effect of Altering Prompts: How Small Changes and Jailbreaks Affect Large Language Model Performance
A. Salinas
Fred Morstatter
45
49
0
08 Jan 2024
Generalist embedding models are better at short-context clinical semantic search than specialized embedding models
Jean-Baptiste Excoffier
Tom Roehr
Alexei Figueroa
Jens-Michalis Papaioannou
Keno Bressem
Matthieu Ortala
45
4
0
03 Jan 2024
State of What Art? A Call for Multi-Prompt LLM Evaluation
Moran Mizrahi
Guy Kaplan
Daniel Malkin
Rotem Dror
Dafna Shahaf
Gabriel Stanovsky
ELM
32
127
0
31 Dec 2023
You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments
Bangzhao Shu
Lechen Zhang
Minje Choi
Lavinia Dunagan
Lajanugen Logeswaran
Moontae Lee
Dallas Card
David Jurgens
24
33
0
16 Nov 2023
How are Prompts Different in Terms of Sensitivity?
Sheng Lu
Hendrik Schuff
Iryna Gurevych
40
18
0
13 Nov 2023
Prompt Engineering a Prompt Engineer
Qinyuan Ye
Maxamed Axmed
Reid Pryzant
Fereshte Khani
VLM
LLMAG
LRM
27
28
0
09 Nov 2023
Do LLMs exhibit human-like response biases? A case study in survey design
Lindia Tjuatja
Valerie Chen
Sherry Tongshuang Wu
Ameet Talwalkar
Graham Neubig
32
80
0
07 Nov 2023
Principles from Clinical Research for NLP Model Generalization
Aparna Elangovan
Jiayuan He
Yuan Li
Karin Verspoor
CML
32
3
0
07 Nov 2023
ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models
Ben Feuer
Yurong Liu
Chinmay Hegde
Juliana Freire
AI4TS
VLM
27
9
0
27 Oct 2023
A Corpus for Sentence-level Subjectivity Detection on English News Articles
Francesco Antici
Andrea Galassi
Federico Ruggeri
Katerina Korre
Arianna Muti
Alessandra Bardi
Alice Fedotova
Alberto Barrón-Cedeño
40
11
0
29 May 2023
Instruction Induction: From Few Examples to Natural Language Task Descriptions
Or Honovich
Uri Shaham
Samuel R. Bowman
Omer Levy
ELM
LRM
120
137
0
22 May 2022
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu
Max Bartolo
Alastair Moore
Sebastian Riedel
Pontus Stenetorp
AILaw
LRM
279
1,124
0
18 Apr 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
280
3,858
0
18 Apr 2021
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick
Sahana Udupa
Hinrich Schütze
259
374
0
28 Feb 2021
Measuring and Improving Consistency in Pretrained Language Models
Yanai Elazar
Nora Kassner
Shauli Ravfogel
Abhilasha Ravichander
Eduard H. Hovy
Hinrich Schütze
Yoav Goldberg
HILM
269
346
0
01 Feb 2021
Making Pre-trained Language Models Better Few-shot Learners
Tianyu Gao
Adam Fisch
Danqi Chen
243
1,924
0
31 Dec 2020