ResearchTrend.AI

Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting (arXiv:2310.11324)

17 October 2023
Melanie Sclar
Yejin Choi
Yulia Tsvetkov
Alane Suhr

Papers citing "Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting"

Showing 50 of 235 citing papers
OLMES: A Standard for Language Model Evaluations
Yuling Gu
Oyvind Tafjord
Bailey Kuehl
Dany Haddad
Jesse Dodge
Hannaneh Hajishirzi
ELM
40
14
0
12 Jun 2024
The Impact of Quantization on Retrieval-Augmented Generation: An Analysis of Small LLMs
Mert Yazan
Suzan Verberne
F. Situmeang
MQ
36
3
0
10 Jun 2024
On the Worst Prompt Performance of Large Language Models
Bowen Cao
Deng Cai
Zhisong Zhang
Yuexian Zou
Wai Lam
ALM
LRM
30
5
0
08 Jun 2024
Text-Guided Alternative Image Clustering
Andreas Stephan
Lukas Miklautz
Collin Leiber
Pedro Henrique Luz de Araujo
Dominik Répás
Claudia Plant
Benjamin Roth
VLM
26
0
0
07 Jun 2024
HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits
Tim Franzmeyer
Aleksandar Shtedritski
Samuel Albanie
Philip Torr
João F. Henriques
Jakob N. Foerster
32
1
0
05 Jun 2024
ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models
Aparna Elangovan
Ling Liu
Lei Xu
S. Bodapati
Dan Roth
ELM
30
9
0
28 May 2024
Efficient multi-prompt evaluation of LLMs
Felipe Maia Polo
Ronald Xu
Lucas Weber
Mírian Silva
Onkar Bhardwaj
Leshem Choshen
Allysson Flavio Melo de Oliveira
Yuekai Sun
Mikhail Yurochkin
45
19
0
27 May 2024
Building Better AI Agents: A Provocation on the Utilisation of Persona in LLM-based Conversational Agents
Guangzhi Sun
Xiao Zhan
Jose Such
44
24
0
26 May 2024
Lessons from the Trenches on Reproducible Evaluation of Language Models
Stella Biderman
Hailey Schoelkopf
Lintang Sutawika
Leo Gao
J. Tow
...
Xiangru Tang
Kevin A. Wang
Genta Indra Winata
François Yvon
Andy Zou
ELM
ALM
138
53
3
23 May 2024
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Christopher Rawles
Sarah Clinckemaillie
Yifan Chang
Jonathan Waltz
Gabrielle Lau
...
Daniel Toyama
Robert Berry
Divya Tyamagundlu
Timothy Lillicrap
Oriana Riva
LLMAG
69
44
0
23 May 2024
Prompt Exploration with Prompt Regression
Michael Feffer
Ronald Xu
Yuekai Sun
Mikhail Yurochkin
38
0
0
17 May 2024
Natural Language Processing RELIES on Linguistics
Juri Opitz
Shira Wein
Nathan Schneider
AI4CE
55
7
0
09 May 2024
Exploring prompts to elicit memorization in masked language model-based named entity recognition
Yuxi Xia
Anastasiia Sedova
Pedro Henrique Luz de Araujo
Vasiliki Kougia
Lisa Nussbaumer
Benjamin Roth
28
1
0
05 May 2024
Recall Them All: Retrieval-Augmented Language Models for Long Object List Extraction from Long Documents
Sneha Singhania
Simon Razniewski
Gerhard Weikum
RALM
34
1
0
04 May 2024
Harnessing the Power of Multiple Minds: Lessons Learned from LLM Routing
KV Aditya Srivatsa
Kaushal Kumar Maurya
Ekaterina Kochmar
54
15
0
01 May 2024
In-Context Learning with Long-Context Models: An In-Depth Exploration
Amanda Bertsch
Maor Ivgi
Uri Alon
Jonathan Berant
Matthew R. Gormley
Graham Neubig
ReLM
AIMat
93
64
0
30 Apr 2024
Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs
Valeriia Cherepanova
James Zou
AAML
33
4
0
26 Apr 2024
Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks
Melissa Ailem
Katerina Marazopoulou
Charlotte Siska
James Bono
59
14
0
25 Apr 2024
Does Instruction Tuning Make LLMs More Consistent?
Constanza Fierro
Jiaang Li
Anders Søgaard
LRM
35
2
0
23 Apr 2024
Better Synthetic Data by Retrieving and Transforming Existing Datasets
Saumya Gandhi
Ritu Gala
Vijay Viswanathan
Tongshuang Wu
Graham Neubig
SyDa
50
17
0
22 Apr 2024
Stronger Random Baselines for In-Context Learning
Gregory Yauney
David M. Mimno
47
2
0
19 Apr 2024
Towards Reliable Latent Knowledge Estimation in LLMs: In-Context Learning vs. Prompting Based Factual Knowledge Extraction
Qinyuan Wu
Mohammad Aflah Khan
Soumi Das
Vedant Nanda
Bishwamittra Ghosh
Camila Kolling
Till Speicher
Laurent Bindschaedler
Krishna P. Gummadi
Evimaria Terzi
KELM
34
4
0
19 Apr 2024
Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences
Shreya Shankar
J.D. Zamfirescu-Pereira
Björn Hartmann
Aditya G. Parameswaran
Ian Arawjo
ALM
40
86
0
18 Apr 2024
Compression Represents Intelligence Linearly
Yuzhen Huang
Jinghan Zhang
Zifei Shan
Junxian He
50
26
0
15 Apr 2024
Resilience of Large Language Models for Noisy Instructions
Bin Wang
Chengwei Wei
Zhengyuan Liu
Geyu Lin
Nancy F. Chen
47
11
0
15 Apr 2024
The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models
Giwon Hong
Aryo Pradipta Gema
Rohit Saxena
Xiaotang Du
Ping Nie
...
Laura Perez-Beltrachini
Max Ryabinin
Xuanli He
Clémentine Fourrier
Pasquale Minervini
LRM
HILM
38
11
0
08 Apr 2024
Capabilities of Large Language Models in Control Engineering: A Benchmark Study on GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra
Darioush Kevian
U. Syed
Xing-ming Guo
Aaron J. Havens
Geir Dullerud
Peter M. Seiler
Lianhui Qin
Bin Hu
ELM
44
29
0
04 Apr 2024
Robust Pronoun Fidelity with English LLMs: Are they Reasoning, Repeating, or Just Biased?
Vagrant Gautam
Eileen Bingert
D. Zhu
Anne Lauscher
Dietrich Klakow
45
8
0
04 Apr 2024
Symbolic Prompt Program Search: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization
Tobias Schnabel
Jennifer Neville
LRM
29
6
0
02 Apr 2024
HyperCLOVA X Technical Report
Kang Min Yoo
Jaegeun Han
Sookyo In
Heewon Jeon
Jisu Jeong
...
Hyunkyung Noh
Se-Eun Choi
Sang-Woo Lee
Jung Hwa Lim
Nako Sung
VLM
37
8
0
02 Apr 2024
PATCH -- Psychometrics-AssisTed benCHmarking of Large Language Models: A Case Study of Mathematics Proficiency
Qixiang Fang
Daniel L. Oberski
Dong Nguyen
38
3
0
02 Apr 2024
Classifying Cancer Stage with Open-Source Clinical Large Language Models
Chia-Hsuan Chang
Mary M. Lucas
Grace Lu-Yao
Christopher C. Yang
21
5
0
02 Apr 2024
IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations
Deqing Fu
Ghazal Khalighinejad
Ollie Liu
Bhuwan Dhingra
Dani Yogatama
Robin Jia
W. Neiswanger
33
14
0
01 Apr 2024
Efficient Prompting Methods for Large Language Models: A Survey
Kaiyan Chang
Songcheng Xu
Chenglong Wang
Yingfeng Luo
Tong Xiao
Jingbo Zhu
LRM
45
32
0
01 Apr 2024
"Sorry, Come Again?" Prompting -- Enhancing Comprehension and
  Diminishing Hallucination with [PAUSE]-injected Optimal Paraphrasing
"Sorry, Come Again?" Prompting -- Enhancing Comprehension and Diminishing Hallucination with [PAUSE]-injected Optimal Paraphrasing
Vipula Rawte
Islam Tonmoy
M. M. Zaman
Prachi Priya
Marcin Kardas
Alan Schelten
Ruan Silva
LRM
30
1
0
27 Mar 2024
Can large language models explore in-context?
Akshay Krishnamurthy
Keegan Harris
Dylan J. Foster
Cyril Zhang
Aleksandrs Slivkins
LM&Ro
LLMAG
LRM
126
23
0
22 Mar 2024
On Prompt Sensitivity of ChatGPT in Affective Computing
Mostafa M. Amin
Björn W. Schuller
19
6
0
20 Mar 2024
Teacher-Student Training for Debiasing: General Permutation Debiasing for Large Language Models
Adian Liusie
Yassir Fathullah
Mark J. F. Gales
30
4
0
20 Mar 2024
From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models
Kung-Hsiang Huang
Hou Pong Chan
Yi R. Fung
Haoyi Qiu
Mingyang Zhou
Chenyu You
Shih-Fu Chang
Chenhui Xu
AI4TS
66
14
0
18 Mar 2024
Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought
James Chua
Edward Rees
Hunar Batra
Samuel R. Bowman
Julian Michael
Ethan Perez
Miles Turpin
LRM
44
13
0
08 Mar 2024
Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks
Linyuan Gong
Sida Wang
Mostafa Elhoushi
Alvin Cheung
32
15
0
07 Mar 2024
Designing Informative Metrics for Few-Shot Example Selection
Rishabh Adiga
Lakshminarayanan Subramanian
Varun Chandrasekaran
32
1
0
06 Mar 2024
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Nathaniel Li
Alexander Pan
Anjali Gopal
Summer Yue
Daniel Berrios
...
Yan Shoshitaishvili
Jimmy Ba
K. Esvelt
Alexandr Wang
Dan Hendrycks
ELM
54
144
0
05 Mar 2024
Towards Measuring and Modeling "Culture" in LLMs: A Survey
Muhammad Farid Adilazuarda
Sagnik Mukherjee
Pradhyumna Lavania
Siddhant Singh
Alham Fikri Aji
Jacki O'Neill
Ashutosh Modi
Monojit Choudhury
67
54
0
05 Mar 2024
ChatGPT4PCG 2 Competition: Prompt Engineering for Science Birds Level Generation
Pittawat Taveekitworachai
Febri Abdullah
Mury F. Dewantoro
Yi Xia
Pratch Suntichaikul
R. Thawonmas
Julian Togelius
Jochen Renz
34
1
0
05 Mar 2024
PHAnToM: Personality Has An Effect on Theory-of-Mind Reasoning in Large Language Models
Fiona Anting Tan
G. Yeo
Fanyou Wu
Weijie Xu
Vinija Jain
Aman Chadha
Kokil Jaidka
Yang Liu
See-Kiong Ng
LRM
38
6
0
04 Mar 2024
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
Paul Röttger
Valentin Hofmann
Valentina Pyatkin
Musashi Hinck
Hannah Rose Kirk
Hinrich Schütze
Dirk Hovy
ELM
26
53
0
26 Feb 2024
Repetition Improves Language Model Embeddings
Jacob Mitchell Springer
Suhas Kotha
Daniel Fried
Graham Neubig
Aditi Raghunathan
48
29
0
23 Feb 2024
tinyBenchmarks: evaluating LLMs with fewer examples
Felipe Maia Polo
Lucas Weber
Leshem Choshen
Yuekai Sun
Gongjun Xu
Mikhail Yurochkin
ELM
26
77
0
22 Feb 2024
Identifying Multiple Personalities in Large Language Models with External Evaluation
Xiaoyang Song
Yuta Adachi
Jessie Feng
Mouwei Lin
Linhao Yu
Frank Li
Akshat Gupta
Gopala Anumanchipalli
Simerjot Kaur
LLMAG
27
8
0
22 Feb 2024