Lost in the Haystack: Smaller Needles are More Difficult for LLMs to Find


23 May 2025
Owen Bianchi
Mathew J. Koretsky
Maya Willey
Chelsea X. Alvarado
Tanay Nayak
Adi Asija
Nicole Kuznetsov
M. Nalls
F. Faghri
Daniel Khashabi

Papers citing "Lost in the Haystack: Smaller Needles are More Difficult for LLMs to Find"

42 / 42 papers shown
Can LLMs Generate Tabular Summaries of Science Papers? Rethinking the Evaluation Protocol
Weiqi Wang
Jiefu Ou
Yangqiu Song
Benjamin Van Durme
Daniel Khashabi
LMTD
83
2
0
14 Apr 2025
InfiniteICL: Breaking the Limit of Context Window Size via Long Short-term Memory Transformation
Bowen Cao
Deng Cai
W. Lam
CLL
56
1
0
02 Apr 2025
CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning
Hao Cui
Zahra Shamsi
Gowoon Cheon
Xuejian Ma
Shutong Li
...
Eun-Ah Kim
M. Brenner
Viren Jain
Sameera Ponda
Subhashini Venugopalan
ELM
LRM
79
2
0
14 Mar 2025
U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack
Yunfan Gao
Yun Xiong
Wenlong Wu
Zijing Huang
Bohan Li
Haoyu Wang
80
4
0
01 Mar 2025
DocPuzzle: A Process-Aware Benchmark for Evaluating Realistic Long-Context Reasoning Capabilities
Tianyi Zhuang
Chuqiao Kuang
Xiaoguang Li
Yihua Teng
Jihao Wu
Yijiao Wang
Lifeng Shang
RALM
ELM
LRM
75
1
0
25 Feb 2025
LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion
Zhan Ling
Kang Liu
Kai Yan
Yue Yang
Weijian Lin
Ting-Han Fan
Lingfeng Shen
Zhengyin Du
Jiecao Chen
ReLM
ELM
LRM
64
4
0
25 Jan 2025
MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems
Yannis Katsis
Sara Rosenthal
Kshitij P. Fadnis
Chulaka Gunasekara
Young-Suk Lee
Lucian Popa
Vraj Shah
Huaiyu Zhu
Danish Contractor
Marina Danilevsky
RALM
LRM
36
10
0
08 Jan 2025
The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input
Alon Jacovi
Andrew Wang
Chris Alberti
Connie Tao
Jon Lipovetz
...
Rachana Fellinger
Rui Wang
Zizhao Zhang
Sasha Goldshtein
Dipanjan Das
HILM
ALM
144
15
0
06 Jan 2025
Understanding Synthetic Context Extension via Retrieval Heads
Xinyu Zhao
Fangcong Yin
Greg Durrett
69
1
0
29 Oct 2024
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Haotian Tang
Yecheng Wu
Shang Yang
Enze Xie
Junsong Chen
Junyu Chen
Zhuoyang Zhang
Han Cai
Yaojie Lu
Song Han
125
18
0
14 Oct 2024
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly
Howard Yen
Tianyu Gao
Minmin Hou
Ke Ding
Daniel Fleischer
Peter Izsak
Moshe Wasserblat
Danqi Chen
ALM
ELM
90
31
0
03 Oct 2024
mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval
Xin Zhang
Yanzhao Zhang
Dingkun Long
Wen Xie
Ziqi Dai
...
Pengjun Xie
Fei Huang
Meishan Zhang
Wenjie Li
Min Zhang
55
91
0
29 Jul 2024
From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data
Zheyang Xiong
Vasilis Papageorgiou
Kangwook Lee
Dimitris Papailiopoulos
SyDa
RALM
59
13
0
27 Jun 2024
Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization
Cheng-Yu Hsieh
Yung-Sung Chuang
Chun-Liang Li
Zifeng Wang
Long T. Le
...
James R. Glass
Alexander Ratner
Chen-Yu Lee
Ranjay Krishna
Tomas Pfister
65
34
0
23 Jun 2024
Insights into LLM Long-Context Failures: When Transformers Know but Don't Tell
Taiming Lu
Muhan Gao
Kuai Yu
Adam Byerly
Daniel Khashabi
71
14
0
20 Jun 2024
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?
Jinhyuk Lee
Anthony Chen
Zhuyun Dai
Dheeru Dua
Devendra Singh Sachan
...
Jeremy R. Cole
Sebastian Riedel
Iftekhar Naim
Ming-Wei Chang
Kelvin Guu
RALM
LRM
71
35
0
19 Jun 2024
Long Code Arena: a Set of Benchmarks for Long-Context Code Models
Egor Bogomolov
Aleksandra V. Eliseeva
Timur Galimzyanov
Evgeniy Glukhov
Anton Shapkin
...
Yaroslav Golubev
Alexander Kovrigin
Arie van Deursen
Maliheh Izadi
T. Bryksin
ELM
35
21
0
17 Jun 2024
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models
Jinheon Baek
S. Jauhar
Silviu Cucerzan
Sung Ju Hwang
AI4CE
LLMAG
LM&Ro
70
48
0
11 Apr 2024
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding
Zhenyu Zhang
Runjin Chen
Shiwei Liu
Zhewei Yao
Olatunji Ruwase
Beidi Chen
Xiaoxia Wu
Zhangyang Wang
60
30
0
05 Mar 2024
∞Bench: Extending Long Context Evaluation Beyond 100K Tokens
Xinrong Zhang
Yingfa Chen
Shengding Hu
Zihang Xu
Junhao Chen
...
Xu Han
Zhen Leng Thai
Shuo Wang
Zhiyuan Liu
Maosong Sun
RALM
LRM
55
173
0
21 Feb 2024
AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies
Xiao Ye
Andrew Wang
Jacob Choi
Yining Lu
Shreya Sharma
Lingfeng Shen
Vijay Tiyyala
Nicholas Andrews
Daniel Khashabi
ELM
44
9
0
19 Feb 2024
Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
Mosh Levy
Alon Jacoby
Yoav Goldberg
67
74
0
19 Feb 2024
In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss
Yuri Kuratov
Aydar Bulatov
Petr Anokhin
Dmitry Sorokin
Artyom Sorokin
Andrey Kravchenko
RALM
131
34
0
16 Feb 2024
ChemReasoner: Heuristic Search over a Large Language Model's Knowledge Space using Quantum-Chemical Feedback
Henry W Sprueill
Carl Edwards
Khushbu Agarwal
Mariefel V. Olarte
Udishnu Sanyal
Conrad Johnston
Hongbin Liu
Heng Ji
Sutanay Choudhury
LRM
47
9
0
15 Feb 2024
PubTator 3.0: an AI-powered Literature Resource for Unlocking Biomedical Knowledge
Chih-Hsuan Wei
Alexis Allot
Po-Ting Lai
Robert Leaman
Shubo Tian
Ling Luo
Qiao Jin
Zhizheng Wang
Qingyu Chen
Zhiyong Lu
16
30
0
19 Jan 2024
DocFinQA: A Long-Context Financial Reasoning Dataset
Varshini Reddy
Rik Koncel-Kedziorski
Viet Dac Lai
Michael Krumdick
Charles Lovering
Chris Tanner
RALM
59
19
0
12 Jan 2024
LooGLE: Can Long-Context Language Models Understand Long Contexts?
Jiaqi Li
Mengmeng Wang
Zilong Zheng
Muhan Zhang
ELM
RALM
42
123
0
08 Nov 2023
Primacy Effect of ChatGPT
Yiwei Wang
Yujun Cai
Muhao Chen
Yuxuan Liang
Bryan Hooi
ALM
AI4MH
LRM
41
16
0
20 Oct 2023
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression
Huiqiang Jiang
Qianhui Wu
Xufang Luo
Dongsheng Li
Chin-Yew Lin
Yuqing Yang
Lili Qiu
RALM
142
200
0
10 Oct 2023
BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models
Zican Dong
Tianyi Tang
Junyi Li
Wayne Xin Zhao
Ji-Rong Wen
RALM
ALM
53
36
0
23 Sep 2023
Large Language Models Are Not Robust Multiple Choice Selectors
Chujie Zheng
Hao Zhou
Fandong Meng
Jie Zhou
Minlie Huang
46
233
0
07 Sep 2023
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
Yushi Bai
Xin Lv
Jiajie Zhang
Hong Lyu
Jiankai Tang
...
Aohan Zeng
Lei Hou
Yuxiao Dong
Jie Tang
Juanzi Li
LLMAG
RALM
50
548
0
28 Aug 2023
L-Eval: Instituting Standardized Evaluation for Long Context Language Models
Chen An
Shansan Gong
Ming Zhong
Xingjian Zhao
Mukai Li
Jun Zhang
Lingpeng Kong
Xipeng Qiu
ELM
ALM
65
147
0
20 Jul 2023
Lost in the Middle: How Language Models Use Long Contexts
Nelson F. Liu
Kevin Lin
John Hewitt
Ashwin Paranjape
Michele Bevilacqua
Fabio Petroni
Percy Liang
RALM
74
1,521
0
06 Jul 2023
RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems
Tianyang Liu
Canwen Xu
Julian McAuley
ALM
54
161
0
05 Jun 2023
ZeroSCROLLS: A Zero-Shot Benchmark for Long Text Understanding
Uri Shaham
Maor Ivgi
Avia Efrat
Jonathan Berant
Omer Levy
VLM
56
133
0
23 May 2023
RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation
Fengji Zhang
B. Chen
Yue Zhang
Jacky Keung
Jin Liu
Daoguang Zan
Yi Mao
Jian-Guang Lou
Weizhu Chen
46
228
0
22 Mar 2023
MuLD: The Multitask Long Document Benchmark
G. Hudson
Noura Al Moubayed
73
11
0
15 Feb 2022
SCROLLS: Standardized CompaRison Over Long Language Sequences
Uri Shaham
Elad Segal
Maor Ivgi
Avia Efrat
Ori Yoran
...
Ankit Gupta
Wenhan Xiong
Mor Geva
Jonathan Berant
Omer Levy
RALM
60
136
0
10 Jan 2022
LOT: A Story-Centric Benchmark for Evaluating Chinese Long Text Understanding and Generation
Jian Guan
Zhuoer Feng
Yamei Chen
Ru He
Xiaoxi Mao
Changjie Fan
Minlie Huang
58
34
0
30 Aug 2021
Simplified Data Wrangling with ir_datasets
Sean MacAvaney
Andrew Yates
Sergey Feldman
Doug Downey
Arman Cohan
Nazli Goharian
122
109
0
03 Mar 2021
Long Range Arena: A Benchmark for Efficient Transformers
Yi Tay
Mostafa Dehghani
Samira Abnar
Yikang Shen
Dara Bahri
Philip Pham
J. Rao
Liu Yang
Sebastian Ruder
Donald Metzler
89
706
0
08 Nov 2020