Lost in the Haystack: Smaller Needles are More Difficult for LLMs to Find

Abstract

Large language models (LLMs) face significant challenges with needle-in-a-haystack tasks, where relevant information ("the needle") must be drawn from a large pool of irrelevant context ("the haystack"). Previous studies have highlighted positional bias and distractor quantity as critical factors affecting model performance, yet the influence of gold context size has received little attention. We address this gap by systematically studying how variations in gold context length impact LLM performance on long-context question answering tasks. Our experiments reveal that LLM performance drops sharply when the gold context is shorter, i.e., smaller gold contexts consistently degrade model performance and amplify positional sensitivity, posing a major challenge for agentic systems that must integrate scattered, fine-grained information of varying lengths. This pattern holds across three diverse domains (general knowledge, biomedical reasoning, and mathematical reasoning) and seven state-of-the-art LLMs of various sizes and architectures. Our work provides clear insights to guide the design of robust, context-aware LLM-driven systems.

@article{bianchi2025_2505.18148,
  title={Lost in the Haystack: Smaller Needles are More Difficult for LLMs to Find},
  author={Owen Bianchi and Mathew J. Koretsky and Maya Willey and Chelsea X. Alvarado and Tanay Nayak and Adi Asija and Nicole Kuznetsov and Mike A. Nalls and Faraz Faghri and Daniel Khashabi},
  journal={arXiv preprint arXiv:2505.18148},
  year={2025}
}