arXiv: 2310.11324
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting

17 October 2023
Melanie Sclar
Yejin Choi
Yulia Tsvetkov
Alane Suhr

Papers citing "Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting"

50 / 235 papers shown
Leveraging LLM Inconsistency to Boost Pass@k Performance
Uri Dalal
Meirav Segal
Zvika Ben-Haim
Dan Lahav
Omer Nevo
9
0
0
19 May 2025
PromptPrism: A Linguistically-Inspired Taxonomy for Prompts
Sullam Jeoung
Yueyan Chen
Yi Zhang
Shuai Wang
Haibo Ding
Lin Lee Cheong
12
0
0
19 May 2025
How Reliable is Multilingual LLM-as-a-Judge?
Xiyan Fu
Wei Liu
ELM
4
0
0
18 May 2025
Improving Fairness in LLMs Through Testing-Time Adversaries
Isabela Pereira Gregio
Ian Pons
Anna Helena Reali Costa
Artur Jordao
AAML
9
0
0
17 May 2025
The Effects of Demographic Instructions on LLM Personas
Angel Felipe Magnossão de Paula
J. Shane Culpepper
Alistair Moffat
Sachin Pathiyan Cherumanal
Falk Scholer
Johanne Trippas
4
0
0
17 May 2025
LLM Agents Are Hypersensitive to Nudges
Manuel Cherep
Pattie Maes
Nikhil Singh
2
0
0
16 May 2025
DIF: A Framework for Benchmarking and Verifying Implicit Bias in LLMs
Lake Yin
Fan Huang
19
0
0
15 May 2025
LLM-Augmented Chemical Synthesis and Design Decision Programs
Haorui Wang
Jeff Guo
Lingkai Kong
R. Ramprasad
Philippe Schwaller
Yuanqi Du
Chao Zhang
31
0
0
11 May 2025
Say It Another Way: A Framework for User-Grounded Paraphrasing
Cléa Chataigner
Rebecca Ma
Prakhar Ganesh
Afaf Taik
Elliot Creager
G. Farnadi
42
0
0
06 May 2025
Colombian Waitresses y Jueces canadienses: Gender and Country Biases in Occupation Recommendations from LLMs
Elisa Forcada Rodríguez
Olatz Perez-de-Viñaspre
Jon Ander Campos
Dietrich Klakow
Vagrant Gautam
32
0
0
05 May 2025
ConSens: Assessing context grounding in open-book question answering
Ivan Vankov
Matyo Ivanov
Adriana Correia
Victor Botev
ELM
69
0
0
30 Apr 2025
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts
Hanhua Hong
Chenghao Xiao
Yang Wang
Y. Liu
Wenge Rong
Chenghua Lin
31
0
0
29 Apr 2025
Cooking Up Creativity: A Cognitively-Inspired Approach for Enhancing LLM Creativity through Structured Representations
Moran Mizrahi
Chen Shani
Gabriel Stanovsky
Dan Jurafsky
Dafna Shahaf
29
0
0
29 Apr 2025
MEQA: A Meta-Evaluation Framework for Question & Answer LLM Benchmarks
Jaime Raldua Veuthey
Zainab Ali Majid
Suhas Hariharan
Jacob Haimes
ELM
31
0
0
18 Apr 2025
A Human-AI Comparative Analysis of Prompt Sensitivity in LLM-Based Relevance Judgment
Negar Arabzadeh
Charles L. A. Clarke
31
1
0
16 Apr 2025
DICE: A Framework for Dimensional and Contextual Evaluation of Language Models
Aryan Shrivastava
Paula Akemi Aoyagui
29
0
0
14 Apr 2025
LLM-driven Constrained Copy Generation through Iterative Refinement
Varun Vasudevan
Faezeh Akhavizadegan
Abhinav Prakash
Yokila Arora
Jason H. D. Cho
Tanya Mendiratta
Sushant Kumar
Kannan Achan
37
0
0
14 Apr 2025
HypoEval: Hypothesis-Guided Evaluation for Natural Language Generation
Mingxuan Li
Hanchen Li
Chenhao Tan
ALM
ELM
49
0
0
09 Apr 2025
Towards LLMs Robustness to Changes in Prompt Format Styles
Lilian Ngweta
Kiran Kate
Jason Tsay
Yara Rizk
AAML
VLM
35
0
0
09 Apr 2025
Model-Agnostic Policy Explanations with Large Language Models
Zhang Xi-Jia
Yue (Sophie) Guo
Shufei Chen
Simon Stepputtis
Matthew C. Gombolay
Katia P. Sycara
Joseph Campbell
LM&Ro
LRM
57
0
0
08 Apr 2025
Accelerating Particle-based Energetic Variational Inference
Xuelian Bao
Lulu Kang
Chun Liu
Yiwei Wang
BDL
64
0
0
04 Apr 2025
A Framework for Situating Innovations, Opportunities, and Challenges in Advancing Vertical Systems with Large AI Models
Gaurav Verma
Jiawei Zhou
Mohit Chandra
Srijan Kumar
M. D. Choudhury
53
0
0
03 Apr 2025
The quasi-semantic competence of LLMs: a case study on the part-whole relation
Mattia Proietti
Alessandro Lenci
48
0
0
03 Apr 2025
A Framework for Robust Cognitive Evaluation of LLMs
Karin de Langis
J. Park
Bin Hu
Khanh Chi Le
Andreas Schramm
Michael C. Mensink
Andrew Elfenbein
Dongyeop Kang
37
0
0
03 Apr 2025
Token embeddings violate the manifold hypothesis
Michael Robinson
Sourya Dey
Tony Chiang
41
1
0
01 Apr 2025
A Large Scale Analysis of Gender Biases in Text-to-Image Generative Models
Leander Girrbach
Stephan Alaniz
Genevieve Smith
Zeynep Akata
42
0
0
30 Mar 2025
Firm or Fickle? Evaluating Large Language Models Consistency in Sequential Interactions
Yubo Li
Yidi Miao
Xueying Ding
Ramayya Krishnan
R. Padman
37
0
0
28 Mar 2025
LLM-based Agent Simulation for Maternal Health Interventions: Uncertainty Estimation and Decision-focused Evaluation
Sarah Martinson
Lingkai Kong
Cheol Woo Kim
Aparna Taneja
Milind Tambe
37
0
0
25 Mar 2025
HoarePrompt: Structural Reasoning About Program Correctness in Natural Language
Dimitrios Stamatios Bouras
Yihan Dai
Tairan Wang
Yingfei Xiong
Sergey Mechtaev
LRM
53
0
0
25 Mar 2025
Uncertainty Distillation: Teaching Language Models to Express Semantic Confidence
Sophia Hager
David Mueller
Kevin Duh
Nicholas Andrews
67
0
0
18 Mar 2025
Aligned Probing: Relating Toxic Behavior and Model Internals
Andreas Waldis
Vagrant Gautam
Anne Lauscher
Dietrich Klakow
Iryna Gurevych
45
0
0
17 Mar 2025
reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs
Zhaofeng Wu
Michihiro Yasunaga
Andrew Cohen
Yoon Kim
Asli Celikyilmaz
Marjan Ghazvininejad
46
2
0
14 Mar 2025
Ensemble Debiasing Across Class and Sample Levels for Fairer Prompting Accuracy
Ruixi Lin
Ziqiao Wang
Yang You
FaML
86
1
0
07 Mar 2025
Quantifying the Robustness of Retrieval-Augmented Language Models Against Spurious Features in Grounding Data
Shiping Yang
Jie Wu
Wenbiao Ding
Ning Wu
Shining Liang
Ming Gong
Hengyuan Zhang
Dongmei Zhang
AAML
66
1
0
07 Mar 2025
Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness
Tingchen Fu
Fazl Barez
AAML
65
0
0
03 Mar 2025
DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation
Eliya Habba
Ofir Arviv
Itay Itzhak
Yotam Perlitz
Elron Bandel
Leshem Choshen
Michal Shmueli-Scheuer
Gabriel Stanovsky
77
2
0
03 Mar 2025
ECLeKTic: a Novel Challenge Set for Evaluation of Cross-Lingual Knowledge Transfer
Omer Goldman
Uri Shaham
Dan Malkin
Sivan Eiger
Avinatan Hassidim
...
Shruti Rijhwani
Laura Rimell
Idan Szpektor
Reut Tsarfaty
Matan Eyal
47
3
0
28 Feb 2025
SCORE: Systematic COnsistency and Robustness Evaluation for Large Language Models
Grigor Nalbandyan
Rima Shahbazyan
Evelina Bakhturina
ELM
38
0
0
28 Feb 2025
Erasing Without Remembering: Implicit Knowledge Forgetting in Large Language Models
Huazheng Wang
Yongcheng Jing
Haifeng Sun
Yingjie Wang
Jingchao Wang
Jianxin Liao
Dacheng Tao
KELM
MU
47
0
0
27 Feb 2025
END: Early Noise Dropping for Efficient and Effective Context Denoising
Hongye Jin
Pei Chen
Jingfeng Yang
Zhaoxiang Wang
Meng Jiang
...
Xuzhi Zhang
Zheng Li
Tianyi Liu
Huasheng Li
Bing Yin
152
0
0
26 Feb 2025
Policy-as-Prompt: Rethinking Content Moderation in the Age of Large Language Models
Konstantina Palla
José Luis Redondo García
C. Hauff
Francesco Fabbri
Henrik Lindström
Daniel R. Taber
Andreas Damianou
M. Lalmas
AILaw
67
0
0
25 Feb 2025
Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions
Joseph Suh
Erfan Jahanparast
Suhong Moon
Minwoo Kang
Serina Chang
ALM
LM&MA
57
1
0
24 Feb 2025
From Text to Space: Mapping Abstract Spatial Models in LLMs during a Grid-World Navigation Task
Nicolas Martorell
LLMAG
66
1
0
23 Feb 2025
Human Preferences in Large Language Model Latent Space: A Technical Analysis on the Reliability of Synthetic Data in Voting Outcome Prediction
Sarah Ball
Simeon Allmendinger
Frauke Kreuter
Niklas Kühl
57
0
0
22 Feb 2025
Blessing of Multilinguality: A Systematic Analysis of Multilingual In-Context Learning
Yilei Tu
Andrew Xue
Freda Shi
49
0
0
17 Feb 2025
PlanGenLLMs: A Modern Survey of LLM Planning Capabilities
Hui Wei
Zihao Zhang
Shenghua He
Tian Xia
Shijia Pan
Fei Liu
58
4
0
16 Feb 2025
Expect the Unexpected: FailSafe Long Context QA for Finance
Kiran Kamble
M. Russak
Dmytro Mozolevskyi
Muayad Ali
Mateusz Russak
Waseem Alshikh
85
0
0
10 Feb 2025
SMAB: MAB based word Sensitivity Estimation Framework and its Applications in Adversarial Text Generation
Saurabh Kumar Pandey
S. Vashistha
Debrup Das
Somak Aditya
Monojit Choudhury
AAML
74
0
0
10 Feb 2025
Benchmarking Prompt Sensitivity in Large Language Models
Amirhossein Razavi
Mina Soltangheis
Negar Arabzadeh
Sara Salamat
Morteza Zihayat
Ebrahim Bagheri
69
2
0
09 Feb 2025
Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization
Yuanye Liu
Jiahang Xu
Li Zhang
Qi Chen
Xuan Feng
Yang Chen
Zhongxin Guo
Yuqing Yang
Cheng Peng
84
2
0
06 Feb 2025