2310.11324
Cited By
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting
17 October 2023
Melanie Sclar
Yejin Choi
Yulia Tsvetkov
Alane Suhr
Papers citing "Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting"
50 / 235 papers shown
Title
Can Unconfident LLM Annotations Be Used for Confident Conclusions?
Kristina Gligorić
Tijana Zrnic
Cinoo Lee
Emmanuel J. Candès
Dan Jurafsky
72
5
0
27 Aug 2024
Toward the Evaluation of Large Language Models Considering Score Variance across Instruction Templates
Yusuke Sakai
Adam Nohejl
Jiangnan Hang
Hidetaka Kamigaito
Taro Watanabe
ELM
36
2
0
22 Aug 2024
Estimating Contribution Quality in Online Deliberations Using a Large Language Model
Lodewijk Gelauff
Mohak Goyal
Bhargav Dindukurthi
Ashish Goel
Alice Siu
43
0
0
21 Aug 2024
DELIA: Diversity-Enhanced Learning for Instruction Adaptation in Large Language Models
Yuanhao Zeng
Fei Ren
Xinpeng Zhou
Yihang Wang
Yingxia Shao
ALM
36
0
0
19 Aug 2024
SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models
Kaushal Kumar Maurya
KV Aditya Srivatsa
Ekaterina Kochmar
40
2
0
16 Aug 2024
Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models
Hila Gonen
Terra Blevins
Alisa Liu
Luke Zettlemoyer
Noah A. Smith
31
5
0
12 Aug 2024
Evaluating Language Model Math Reasoning via Grounding in Educational Curricula
L. Lucy
Tal August
Rose E. Wang
Luca Soldaini
Courtney Allison
Kyle Lo
ReLM
LRM
29
1
0
08 Aug 2024
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models
Zhi Rui Tam
Cheng-Kuang Wu
Yi-Lin Tsai
Chieh-Yen Lin
Hung-yi Lee
Yun-Nung Chen
27
24
0
05 Aug 2024
Why Are My Prompts Leaked? Unraveling Prompt Extraction Threats in Customized Large Language Models
Zi Liang
Haibo Hu
Qingqing Ye
Yaxin Xiao
Haoyang Li
AAML
ELM
SILM
56
6
0
05 Aug 2024
A Novel Metric for Measuring the Robustness of Large Language Models in Non-adversarial Scenarios
Samuel Ackerman
Ella Rabinovich
E. Farchi
Ateret Anaby-Tavor
33
1
0
04 Aug 2024
Gender, Race, and Intersectional Bias in Resume Screening via Language Model Retrieval
Kyra Wilson
Aylin Caliskan
47
17
0
29 Jul 2024
Improving Minimum Bayes Risk Decoding with Multi-Prompt
David Heineman
Yao Dou
Wei-ping Xu
36
6
0
22 Jul 2024
Adversarial Databases Improve Success in Retrieval-based Large Language Models
Sean Wu
Michael Koo
Li Yo Kao
Andy Black
L. Blum
Fabien Scalzo
Ira Kurtz
RALM
38
0
0
19 Jul 2024
Internal Consistency and Self-Feedback in Large Language Models: A Survey
Xun Liang
Shichao Song
Zifan Zheng
Hanyu Wang
Qingchen Yu
...
Rong-Hua Li
Peng Cheng
Zhonghao Wang
Zhiyu Li
HILM
LRM
70
25
0
19 Jul 2024
The Better Angels of Machine Personality: How Personality Relates to LLM Safety
Jie Zhang
Dongrui Liu
Chao Qian
Ziyue Gan
Yong-jin Liu
Yu Qiao
Jing Shao
LLMAG
PILM
53
12
0
17 Jul 2024
Questionable practices in machine learning
Gavin Leech
Juan J. Vazquez
Misha Yagudin
Niclas Kupper
Laurence Aitchison
56
4
0
17 Jul 2024
SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning
Chenyang Zhao
Xueying Jia
Vijay Viswanathan
Tongshuang Wu
Graham Neubig
SyDa
ALM
53
25
0
16 Jul 2024
MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants
John Heibel
Daniel Lowd
AAML
40
3
0
12 Jul 2024
Open (Clinical) LLMs are Sensitive to Instruction Phrasings
Alberto Mario Ceballos Arroyo
Monica Munnangi
Jiuding Sun
Karen Y.C. Zhang
Denis Jered McInerney
Byron C. Wallace
Silvio Amir
LM&MA
26
8
0
12 Jul 2024
Accuracy is Not All You Need
Abhinav Dutta
Sanjeev Krishnan
Nipun Kwatra
Ramachandran Ramjee
49
3
0
12 Jul 2024
Probability of Differentiation Reveals Brittleness of Homogeneity Bias in Large Language Models
Messi H.J. Lee
Calvin K. Lai
28
0
0
10 Jul 2024
Can Model Uncertainty Function as a Proxy for Multiple-Choice Question Item Difficulty?
Leonidas Zotos
H. Rijn
Malvina Nissim
ELM
46
2
0
07 Jul 2024
Towards Automating Text Annotation: A Case Study on Semantic Proximity Annotation using GPT-4
Sachin Yadav
Tejaswi Choppa
Dominik Schlechtweg
VLM
45
4
0
04 Jul 2024
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
Md Tahmid Rahman Laskar
Sawsan Alqahtani
M Saiful Bari
Mizanur Rahman
Mohammad Abdullah Matin Khan
...
Chee Wei Tan
Md. Rizwan Parvez
Enamul Hoque
Chenyu You
Jimmy Huang
ELM
ALM
31
28
0
04 Jul 2024
Social Bias Evaluation for Large Language Models Requires Prompt Variations
Rem Hida
Masahiro Kaneko
Naoaki Okazaki
40
14
0
03 Jul 2024
Towards More Realistic Extraction Attacks: An Adversarial Perspective
Yash More
Prakhar Ganesh
G. Farnadi
AAML
74
6
0
02 Jul 2024
Generative Monoculture in Large Language Models
Fan Wu
Emily Black
Varun Chandrasekaran
SyDa
35
3
0
02 Jul 2024
Leveraging Machine-Generated Rationales to Facilitate Social Meaning Detection in Conversations
Ritam Dutt
Zhen Wu
Kelly Shi
Divyanshu Sheth
Prakhar Gupta
Carolyn Rose
46
2
0
27 Jun 2024
The Illusion of Competence: Evaluating the Effect of Explanations on Users' Mental Models of Visual Question Answering Systems
Judith Sieker
Simeon Junker
R. Utescher
Nazia Attari
H. Wersing
Hendrik Buschmeier
Sina Zarrieß
30
1
0
27 Jun 2024
SSP: Self-Supervised Prompting for Cross-Lingual Transfer to Low-Resource Languages using Large Language Models
Vipul Rathore
Aniruddha Deb
Ankish Chandresh
Parag Singla
Mausam
LRM
52
0
0
27 Jun 2024
PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation
Christoph Leiter
Steffen Eger
34
8
0
26 Jun 2024
Native Design Bias: Studying the Impact of English Nativeness on Language Model Performance
Manon Reusens
Philipp Borchert
Jochen De Weerdt
Bart Baesens
44
1
0
25 Jun 2024
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt
Deng Cai
Huayang Li
Tingchen Fu
Siheng Li
Weiwen Xu
...
Leyang Cui
Yan Wang
Lemao Liu
Taro Watanabe
Shuming Shi
KELM
30
2
0
24 Jun 2024
Investigating the Influence of Prompt-Specific Shortcuts in AI Generated Text Detection
Choonghyun Park
Hyuhng Joon Kim
Junyeob Kim
Youna Kim
Taeuk Kim
Hyunsoo Cho
Hwiyeol Jo
Sang-goo Lee
Kang Min Yoo
AAML
46
1
0
24 Jun 2024
Large Language Models Assume People are More Rational than We Really are
Ryan Liu
Jiayi Geng
Joshua C. Peterson
Ilia Sucholutsky
Thomas L. Griffiths
76
17
0
24 Jun 2024
SEAM: A Stochastic Benchmark for Multi-Document Tasks
Gili Lior
Avi Caciularu
Arie Cattan
Shahar Levy
Ori Shapira
Gabriel Stanovsky
RALM
40
4
0
23 Jun 2024
Serial Position Effects of Large Language Models
Xiaobo Guo
Soroush Vosoughi
45
3
0
23 Jun 2024
Evidence of a log scaling law for political persuasion with large language models
Kobi Hackenburg
Ben M. Tappin
Paul Röttger
Scott Hale
Jonathan Bright
Helen Z. Margetts
36
7
0
20 Jun 2024
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics
Seungbeen Lee
Seungwon Lim
Seungju Han
Giyeong Oh
Hyungjoo Chae
...
Beong-woo Kwak
Yeonsoo Lee
Dongha Lee
Jinyoung Yeo
Youngjae Yu
41
8
0
20 Jun 2024
BeHonest: Benchmarking Honesty in Large Language Models
Steffi Chern
Zhulin Hu
Yuqing Yang
Ethan Chern
Yuan Guo
Jiahe Jin
Binjie Wang
Pengfei Liu
HILM
ALM
86
3
0
19 Jun 2024
When Parts are Greater Than Sums: Individual LLM Components Can Outperform Full Models
Ting-Yun Chang
Jesse Thomason
Robin Jia
45
4
0
19 Jun 2024
What Did I Do Wrong? Quantifying LLMs' Sensitivity and Consistency to Prompt Engineering
Federico Errica
G. Siracusano
D. Sanvito
Roberto Bifulco
80
19
0
18 Jun 2024
Large Scale Transfer Learning for Tabular Data via Language Modeling
Josh Gardner
Juan C. Perdomo
Ludwig Schmidt
LMTD
36
13
0
17 Jun 2024
Cultural Conditioning or Placebo? On the Effectiveness of Socio-Demographic Prompting
Sagnik Mukherjee
Muhammad Farid Adilazuarda
Sunayana Sitaram
Kalika Bali
Alham Fikri Aji
Monojit Choudhury
43
5
0
17 Jun 2024
Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments
Han Zhou
Xingchen Wan
Yinhong Liu
Nigel Collier
Ivan Vulić
Anna Korhonen
ALM
44
9
0
17 Jun 2024
SUGARCREPE++ Dataset: Vision-Language Model Sensitivity to Semantic and Lexical Alterations
Sri Harsha Dumpala
Aman Jaiswal
Chandramouli Shama Sastry
E. Milios
Sageev Oore
Hassan Sajjad
CoGe
43
9
0
17 Jun 2024
E-Bench: Towards Evaluating the Ease-of-Use of Large Language Models
Zhenyu Zhang
Bingguang Hao
Jinpeng Li
Zekai Zhang
Dongyan Zhao
33
0
0
16 Jun 2024
Quantifying Variance in Evaluation Benchmarks
Lovish Madaan
Aaditya K. Singh
Rylan Schaeffer
Andrew Poulton
Sanmi Koyejo
Pontus Stenetorp
Sharan Narang
Dieuwke Hupkes
51
10
0
14 Jun 2024
ECBD: Evidence-Centered Benchmark Design for NLP
Yu Lu Liu
Su Lin Blodgett
Jackie Chi Kit Cheung
Q. Vera Liao
Alexandra Olteanu
Ziang Xiao
30
10
0
13 Jun 2024
Talking Heads: Understanding Inter-layer Communication in Transformer Language Models
Jack Merullo
Carsten Eickhoff
Ellie Pavlick
58
13
0
13 Jun 2024