Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting

17 October 2023
Melanie Sclar
Yejin Choi
Yulia Tsvetkov
Alane Suhr
arXiv: 2310.11324

Papers citing "Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting"

50 / 235 papers shown
Can Unconfident LLM Annotations Be Used for Confident Conclusions?
Kristina Gligorić
Tijana Zrnic
Cinoo Lee
Emmanuel J. Candès
Dan Jurafsky
72
5
0
27 Aug 2024
Toward the Evaluation of Large Language Models Considering Score Variance across Instruction Templates
Yusuke Sakai
Adam Nohejl
Jiangnan Hang
Hidetaka Kamigaito
Taro Watanabe
ELM
36
2
0
22 Aug 2024
Estimating Contribution Quality in Online Deliberations Using a Large Language Model
Lodewijk Gelauff
Mohak Goyal
Bhargav Dindukurthi
Ashish Goel
Alice Siu
43
0
0
21 Aug 2024
DELIA: Diversity-Enhanced Learning for Instruction Adaptation in Large Language Models
Yuanhao Zeng
Fei Ren
Xinpeng Zhou
Yihang Wang
Yingxia Shao
ALM
36
0
0
19 Aug 2024
SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models
Kaushal Kumar Maurya
KV Aditya Srivatsa
Ekaterina Kochmar
40
2
0
16 Aug 2024
Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models
Hila Gonen
Terra Blevins
Alisa Liu
Luke Zettlemoyer
Noah A. Smith
31
5
0
12 Aug 2024
Evaluating Language Model Math Reasoning via Grounding in Educational Curricula
L. Lucy
Tal August
Rose E. Wang
Luca Soldaini
Courtney Allison
Kyle Lo
ReLM
LRM
29
1
0
08 Aug 2024
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models
Zhi Rui Tam
Cheng-Kuang Wu
Yi-Lin Tsai
Chieh-Yen Lin
Hung-yi Lee
Yun-Nung Chen
27
24
0
05 Aug 2024
Why Are My Prompts Leaked? Unraveling Prompt Extraction Threats in Customized Large Language Models
Zi Liang
Haibo Hu
Qingqing Ye
Yaxin Xiao
Haoyang Li
AAML
ELM
SILM
56
6
0
05 Aug 2024
A Novel Metric for Measuring the Robustness of Large Language Models in Non-adversarial Scenarios
Samuel Ackerman
Ella Rabinovich
E. Farchi
Ateret Anaby-Tavor
33
1
0
04 Aug 2024
Gender, Race, and Intersectional Bias in Resume Screening via Language Model Retrieval
Kyra Wilson
Aylin Caliskan
47
17
0
29 Jul 2024
Improving Minimum Bayes Risk Decoding with Multi-Prompt
David Heineman
Yao Dou
Wei-ping Xu
36
6
0
22 Jul 2024
Adversarial Databases Improve Success in Retrieval-based Large Language Models
Sean Wu
Michael Koo
Li Yo Kao
Andy Black
L. Blum
Fabien Scalzo
Ira Kurtz
RALM
38
0
0
19 Jul 2024
Internal Consistency and Self-Feedback in Large Language Models: A Survey
Xun Liang
Shichao Song
Zifan Zheng
Hanyu Wang
Qingchen Yu
...
Rong-Hua Li
Peng Cheng
Zhonghao Wang
Zhiyu Li
HILM
LRM
70
25
0
19 Jul 2024
The Better Angels of Machine Personality: How Personality Relates to LLM Safety
Jie Zhang
Dongrui Liu
Chao Qian
Ziyue Gan
Yong-jin Liu
Yu Qiao
Jing Shao
LLMAG
PILM
53
12
0
17 Jul 2024
Questionable practices in machine learning
Gavin Leech
Juan J. Vazquez
Misha Yagudin
Niclas Kupper
Laurence Aitchison
56
4
0
17 Jul 2024
SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning
Chenyang Zhao
Xueying Jia
Vijay Viswanathan
Tongshuang Wu
Graham Neubig
SyDa
ALM
53
25
0
16 Jul 2024
MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants
John Heibel
Daniel Lowd
AAML
40
3
0
12 Jul 2024
Open (Clinical) LLMs are Sensitive to Instruction Phrasings
Alberto Mario Ceballos Arroyo
Monica Munnangi
Jiuding Sun
Karen Y.C. Zhang
Denis Jered McInerney
Byron C. Wallace
Silvio Amir
LM&MA
26
8
0
12 Jul 2024
Accuracy is Not All You Need
Abhinav Dutta
Sanjeev Krishnan
Nipun Kwatra
Ramachandran Ramjee
49
3
0
12 Jul 2024
Probability of Differentiation Reveals Brittleness of Homogeneity Bias in Large Language Models
Messi H.J. Lee
Calvin K. Lai
28
0
0
10 Jul 2024
Can Model Uncertainty Function as a Proxy for Multiple-Choice Question Item Difficulty?
Leonidas Zotos
H. Rijn
Malvina Nissim
ELM
46
2
0
07 Jul 2024
Towards Automating Text Annotation: A Case Study on Semantic Proximity Annotation using GPT-4
Sachin Yadav
Tejaswi Choppa
Dominik Schlechtweg
VLM
45
4
0
04 Jul 2024
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
Md Tahmid Rahman Laskar
Sawsan Alqahtani
M Saiful Bari
Mizanur Rahman
Mohammad Abdullah Matin Khan
...
Chee Wei Tan
Md. Rizwan Parvez
Enamul Hoque
Chenyu You
Jimmy Huang
ELM
ALM
31
28
0
04 Jul 2024
Social Bias Evaluation for Large Language Models Requires Prompt Variations
Rem Hida
Masahiro Kaneko
Naoaki Okazaki
40
14
0
03 Jul 2024
Towards More Realistic Extraction Attacks: An Adversarial Perspective
Yash More
Prakhar Ganesh
G. Farnadi
AAML
74
6
0
02 Jul 2024
Generative Monoculture in Large Language Models
Fan Wu
Emily Black
Varun Chandrasekaran
SyDa
35
3
0
02 Jul 2024
Leveraging Machine-Generated Rationales to Facilitate Social Meaning Detection in Conversations
Ritam Dutt
Zhen Wu
Kelly Shi
Divyanshu Sheth
Prakhar Gupta
Carolyn Rose
46
2
0
27 Jun 2024
The Illusion of Competence: Evaluating the Effect of Explanations on Users' Mental Models of Visual Question Answering Systems
Judith Sieker
Simeon Junker
R. Utescher
Nazia Attari
H. Wersing
Hendrik Buschmeier
Sina Zarrieß
30
1
0
27 Jun 2024
SSP: Self-Supervised Prompting for Cross-Lingual Transfer to Low-Resource Languages using Large Language Models
Vipul Rathore
Aniruddha Deb
Ankish Chandresh
Parag Singla
Mausam
LRM
52
0
0
27 Jun 2024
PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation
Christoph Leiter
Steffen Eger
34
8
0
26 Jun 2024
Native Design Bias: Studying the Impact of English Nativeness on Language Model Performance
Manon Reusens
Philipp Borchert
Jochen De Weerdt
Bart Baesens
44
1
0
25 Jun 2024
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt
Deng Cai
Huayang Li
Tingchen Fu
Siheng Li
Weiwen Xu
...
Leyang Cui
Yan Wang
Lemao Liu
Taro Watanabe
Shuming Shi
KELM
30
2
0
24 Jun 2024
Investigating the Influence of Prompt-Specific Shortcuts in AI Generated Text Detection
Choonghyun Park
Hyuhng Joon Kim
Junyeob Kim
Youna Kim
Taeuk Kim
Hyunsoo Cho
Hwiyeol Jo
Sang-goo Lee
Kang Min Yoo
AAML
46
1
0
24 Jun 2024
Large Language Models Assume People are More Rational than We Really are
Ryan Liu
Jiayi Geng
Joshua C. Peterson
Ilia Sucholutsky
Thomas L. Griffiths
76
17
0
24 Jun 2024
SEAM: A Stochastic Benchmark for Multi-Document Tasks
Gili Lior
Avi Caciularu
Arie Cattan
Shahar Levy
Ori Shapira
Gabriel Stanovsky
RALM
40
4
0
23 Jun 2024
Serial Position Effects of Large Language Models
Xiaobo Guo
Soroush Vosoughi
45
3
0
23 Jun 2024
Evidence of a log scaling law for political persuasion with large language models
Kobi Hackenburg
Ben M. Tappin
Paul Röttger
Scott Hale
Jonathan Bright
Helen Z. Margetts
36
7
0
20 Jun 2024
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics
Seungbeen Lee
Seungwon Lim
Seungju Han
Giyeong Oh
Hyungjoo Chae
...
Beong-woo Kwak
Yeonsoo Lee
Dongha Lee
Jinyoung Yeo
Youngjae Yu
41
8
0
20 Jun 2024
BeHonest: Benchmarking Honesty in Large Language Models
Steffi Chern
Zhulin Hu
Yuqing Yang
Ethan Chern
Yuan Guo
Jiahe Jin
Binjie Wang
Pengfei Liu
HILM
ALM
86
3
0
19 Jun 2024
When Parts are Greater Than Sums: Individual LLM Components Can Outperform Full Models
Ting-Yun Chang
Jesse Thomason
Robin Jia
45
4
0
19 Jun 2024
What Did I Do Wrong? Quantifying LLMs' Sensitivity and Consistency to Prompt Engineering
Federico Errica
G. Siracusano
D. Sanvito
Roberto Bifulco
80
19
0
18 Jun 2024
Large Scale Transfer Learning for Tabular Data via Language Modeling
Josh Gardner
Juan C. Perdomo
Ludwig Schmidt
LMTD
36
13
0
17 Jun 2024
Cultural Conditioning or Placebo? On the Effectiveness of Socio-Demographic Prompting
Sagnik Mukherjee
Muhammad Farid Adilazuarda
Sunayana Sitaram
Kalika Bali
Alham Fikri Aji
Monojit Choudhury
43
5
0
17 Jun 2024
Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments
Han Zhou
Xingchen Wan
Yinhong Liu
Nigel Collier
Ivan Vulić
Anna Korhonen
ALM
44
9
0
17 Jun 2024
SUGARCREPE++ Dataset: Vision-Language Model Sensitivity to Semantic and Lexical Alterations
Sri Harsha Dumpala
Aman Jaiswal
Chandramouli Shama Sastry
E. Milios
Sageev Oore
Hassan Sajjad
CoGe
43
9
0
17 Jun 2024
E-Bench: Towards Evaluating the Ease-of-Use of Large Language Models
Zhenyu Zhang
Bingguang Hao
Jinpeng Li
Zekai Zhang
Dongyan Zhao
33
0
0
16 Jun 2024
Quantifying Variance in Evaluation Benchmarks
Lovish Madaan
Aaditya K. Singh
Rylan Schaeffer
Andrew Poulton
Sanmi Koyejo
Pontus Stenetorp
Sharan Narang
Dieuwke Hupkes
51
10
0
14 Jun 2024
ECBD: Evidence-Centered Benchmark Design for NLP
Yu Lu Liu
Su Lin Blodgett
Jackie Chi Kit Cheung
Q. Vera Liao
Alexandra Olteanu
Ziang Xiao
30
10
0
13 Jun 2024
Talking Heads: Understanding Inter-layer Communication in Transformer Language Models
Jack Merullo
Carsten Eickhoff
Ellie Pavlick
58
13
0
13 Jun 2024