ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting

17 October 2023
Melanie Sclar
Yejin Choi
Yulia Tsvetkov
Alane Suhr

Papers citing "Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting"

50 / 235 papers shown
Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization
Yuanye Liu
Jiahang Xu
Li Zhang
Qi Chen
Xuan Feng
Yang Chen
Zhongxin Guo
Yuqing Yang
Cheng Peng
84
2
0
06 Feb 2025
The Curious Case of Arbitrariness in Machine Learning
Prakhar Ganesh
Afaf Taik
G. Farnadi
59
2
0
28 Jan 2025
Understanding the LLM-ification of CHI: Unpacking the Impact of LLMs at CHI through a Systematic Literature Review
Rock Yuren Pang
Hope Schroeder
Kynnedy Simone Smith
Solon Barocas
Ziang Xiao
Emily Tseng
Danielle Bragg
77
3
0
22 Jan 2025
ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates
Fengqing Jiang
Zhangchen Xu
Luyao Niu
Bill Yuchen Lin
Radha Poovendran
SILM
81
6
0
08 Jan 2025
Are You Doubtful? Oh, It Might Be Difficult Then! Exploring the Use of Model Uncertainty for Question Difficulty Estimation
Leonidas Zotos
H. Rijn
Malvina Nissim
75
0
0
16 Dec 2024
SailCompass: Towards Reproducible and Robust Evaluation for Southeast Asian Languages
Jia Guo
Longxu Dou
Guangtao Zeng
Stanley Kok
Wei Lu
Qian Liu
ELM
LRM
81
1
0
02 Dec 2024
Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS
Jinyang Wu
Mingkuan Feng
Shuai Zhang
Feihu Che
Zengqi Wen
J. Tao
ReLM
LRM
115
9
0
27 Nov 2024
Explaining GPT-4's Schema of Depression Using Machine Behavior Analysis
Adithya V Ganesan
Vasudha Varadarajan
Yash Kumar Lal
Veerle C. Eijsbroek
Katarina Kjell
...
Elizabeth C. Stade
J. Eichstaedt
Ryan L. Boyd
H. A. Schwartz
Lucie Flek
AI4MH
77
0
0
21 Nov 2024
AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations
Gaurav Verma
Rachneet Kaur
Nishan Srishankar
Zhen Zeng
T. Balch
Manuela Veloso
LLMAG
72
5
0
20 Nov 2024
BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices
Anka Reuel
Amelia F. Hardy
Chandler Smith
Max Lamparth
Malcolm Hardy
Mykel J. Kochenderfer
ELM
81
17
0
20 Nov 2024
Do LLMs Understand Ambiguity in Text? A Case Study in Open-world Question Answering
Aryan Keluskar
Amrita Bhattacharjee
Huan Liu
72
2
0
19 Nov 2024
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model
Dongyoung Go
Taesun Whang
Chanhee Lee
Hwayeon Kim
Sunghoon Park
Seunghwan Ji
Dongchan Kim
Young-Bum Kim
LRM
195
1
0
19 Nov 2024
Does Prompt Formatting Have Any Impact on LLM Performance?
Jia He
Mukund Rungta
David Koleczek
Arshdeep Sekhon
Franklin X Wang
Sadid Hasan
LLMAG
LRM
27
36
0
15 Nov 2024
CoPrompter: User-Centric Evaluation of LLM Instruction Alignment for Improved Prompt Engineering
Ishika Joshi
Simra Shahid
Shreeya Venneti
Manushree Vasu
Yantao Zheng
Yunyao Li
Balaji Krishnamurthy
Gromit Yeuk-Yin Chan
31
3
0
09 Nov 2024
Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?
Daniel P. Jeong
Saurabh Garg
Zachary Chase Lipton
Michael Oberst
LM&MA
VLM
ELM
37
9
0
06 Nov 2024
Controlling Language and Diffusion Models by Transporting Activations
P. Rodríguez
Arno Blaas
Michal Klein
Luca Zappella
N. Apostoloff
Marco Cuturi
Xavier Suau
LLMSV
40
4
0
30 Oct 2024
Attention Speaks Volumes: Localizing and Mitigating Bias in Language Models
Rishabh Adiga
Besmira Nushi
Varun Chandrasekaran
49
0
0
29 Oct 2024
A Bayesian Approach to Harnessing the Power of LLMs in Authorship Attribution
Zhengmian Hu
Tong Zheng
Heng Huang
BDL
29
2
0
29 Oct 2024
Vulnerability of LLMs to Vertically Aligned Text Manipulations
Zhecheng Li
Yijiao Wang
Bryan Hooi
Yujun Cai
Zhen Xiong
Nanyun Peng
Kai-Wei Chang
56
1
0
26 Oct 2024
Reinforcement Learning for Aligning Large Language Models Agents with Interactive Environments: Quantifying and Mitigating Prompt Overfitting
Mohamed Salim Aissi
Clément Romac
Thomas Carta
Sylvain Lamprier
Pierre-Yves Oudeyer
Olivier Sigaud
Laure Soulier
Nicolas Thome
24
2
0
25 Oct 2024
Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina
Yuan Gao
Dokyun Lee
Gordon Burtch
Sina Fazelpour
LRM
56
7
0
25 Oct 2024
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)
Leander Girrbach
Yiran Huang
Stephan Alaniz
Trevor Darrell
Zeynep Akata
VLM
47
2
0
25 Oct 2024
LanFL: Differentially Private Federated Learning with Large Language Models using Synthetic Samples
Huiyu Wu
Diego Klabjan
FedML
46
0
0
24 Oct 2024
TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts
Yuxuan Xie
Tianhua Li
Wenqi Shao
Kaipeng Zhang
25
0
0
23 Oct 2024
BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data
Wenkai Li
Jiarui Liu
Andy Liu
Xuhui Zhou
Mona Diab
Maarten Sap
56
6
0
21 Oct 2024
Do LLMs "know" internally when they follow instructions?
Juyeon Heo
Christina Heinze-Deml
Oussama Elachqar
Shirley Ren
Udhay Nallasamy
Andy Miller
Kwan Ho Ryan Chan
Jaya Narain
51
5
0
18 Oct 2024
LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks
Akshara Prabhakar
Yuanzhi Li
Karthik Narasimhan
Sham Kakade
Eran Malach
Samy Jelassi
MoMe
36
9
0
16 Oct 2024
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
Jingming Zhuo
S. Zhang
Xinyu Fang
Haodong Duan
Dahua Lin
Kai Chen
34
19
0
16 Oct 2024
Beyond the Comfort Zone: Emerging Solutions to Overcome Challenges in Integrating LLMs into Software Products
Nadia Nahar
Christian Kastner
Jenna L. Butler
Chris Parnin
Thomas Zimmermann
Christian Bird
57
3
0
15 Oct 2024
Evaluating Gender Bias of LLMs in Making Morality Judgements
Divij Bajaj
Yuanyuan Lei
Jonathan Tong
Ruihong Huang
37
3
0
13 Oct 2024
Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Andrey Anurin
Jonathan Ng
Kibo Schaffer
Jason Schreiber
Esben Kran
ELM
40
5
0
10 Oct 2024
ReIFE: Re-evaluating Instruction-Following Evaluation
Yixin Liu
Kejian Shi
Alexander R. Fabbri
Yilun Zhao
Peifeng Wang
Chien-Sheng Wu
Shafiq Joty
Arman Cohan
27
6
0
09 Oct 2024
POSIX: A Prompt Sensitivity Index For Large Language Models
Anwoy Chatterjee
H. S. V. N. S. K. Renduchintala
S. Bhatia
Tanmoy Chakraborty
AAML
39
6
0
03 Oct 2024
Can LLMs Reliably Simulate Human Learner Actions? A Simulation Authoring Framework for Open-Ended Learning Environments
Amogh Mannekote
Adam Davies
Jina Kang
K. Boyer
33
1
0
03 Oct 2024
'Simulacrum of Stories': Examining Large Language Models as Qualitative Research Participants
Shivani Kapania
William Agnew
Motahhare Eslami
Hoda Heidari
Sarah E Fox
42
4
0
28 Sep 2024
A Survey on the Honesty of Large Language Models
Siheng Li
Cheng Yang
Taiqiang Wu
Chufan Shi
Yuji Zhang
...
Jie Zhou
Yujiu Yang
Ngai Wong
Xixin Wu
Wai Lam
HILM
35
4
0
27 Sep 2024
Data Analysis in the Era of Generative AI
J. Inala
Chenglong Wang
Steven Drucker
Gonzalo Ramos
Victor C. Dibia
N. Riche
Dave Brown
Dan Marshall
Jianfeng Gao
29
7
0
27 Sep 2024
DARE: Diverse Visual Question Answering with Robustness Evaluation
Hannah Sterz
Jonas Pfeiffer
Ivan Vulić
OOD
VLM
26
2
0
26 Sep 2024
Revise, Reason, and Recognize: LLM-Based Emotion Recognition via Emotion-Specific Prompts and ASR Error Correction
Yuanchao Li
Yuan Gong
Chao-Han Huck Yang
P. Bell
Catherine Lai
45
1
0
23 Sep 2024
SSE: Multimodal Semantic Data Selection and Enrichment for Industrial-scale Data Assimilation
Maying Shen
Nadine Chang
Sifei Liu
Jose M. Alvarez
36
0
0
20 Sep 2024
Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time
David Herel
Vojtech Bartek
Jiri Jirak
Tomáš Mikolov
50
2
0
20 Sep 2024
Pay Attention to What Matters
Pedro Luiz Silva
Antonio De Domenico
Ali Maatouk
Fadhel Ayed
ALM
29
0
0
19 Sep 2024
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination
Eva Sánchez Salido
Roser Morante
Julio Gonzalo
Guillermo Marco
Jorge Carrillo-de-Albornoz
...
Enrique Amigó
Andrés Fernández
Alejandro Benito-Santos
Adrián Ghajari Espinosa
Victor Fresno
ELM
51
0
0
19 Sep 2024
LLM-Measure: Generating Valid, Consistent, and Reproducible Text-Based Measures for Social Science Research
Yi Yang
Hanyu Duan
Jiaxin Liu
Kar Yan Tam
21
0
0
19 Sep 2024
A sound description: Exploring prompt templates and class descriptions to enhance zero-shot audio classification
Michel Olvera
Paraskevas Stamatiadis
S. Essid
VLM
37
1
0
19 Sep 2024
CAST: Cross-modal Alignment Similarity Test for Vision Language Models
Gautier Dagan
Olga Loginova
Anil Batra
CoGe
72
1
0
17 Sep 2024
AI Conversational Interviewing: Transforming Surveys with LLMs as Adaptive Interviewers
Alexander Wuttke
Matthias Aßenmacher
Christopher Klamm
Max M. Lang
Quirin Würschinger
Frauke Kreuter
44
2
0
16 Sep 2024
GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering
Sacha Muller
António Loison
Bilel Omrani
Gautier Viaud
RALM
ELM
38
1
0
10 Sep 2024
End User Authoring of Personalized Content Classifiers: Comparing Example Labeling, Rule Writing, and LLM Prompting
Leijie Wang
Kathryn Yurechko
Pranati Dani
Quan Ze Chen
Amy X. Zhang
50
3
0
05 Sep 2024
Irrelevant Alternatives Bias Large Language Model Hiring Decisions
Kremena Valkanova
Pencho Yordanov
23
0
0
04 Sep 2024