ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2407.01370
  4. Cited By
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems

Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems

1 July 2024
Philippe Laban
Alexander R. Fabbri
Caiming Xiong
Chien-Sheng Wu
    RALM
ArXivPDFHTML

Papers citing "Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems"

36 / 36 papers shown
Title
LLMs Get Lost In Multi-Turn Conversation
LLMs Get Lost In Multi-Turn Conversation
Philippe Laban
Hiroaki Hayashi
Yingbo Zhou
Jennifer Neville
42
1
0
09 May 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
X. Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Yu Jiang
ALM
ELM
84
1
0
26 Apr 2025
Estimating Optimal Context Length for Hybrid Retrieval-augmented Multi-document Summarization
Estimating Optimal Context Length for Hybrid Retrieval-augmented Multi-document Summarization
Adithya Pratapa
Teruko Mitamura
RALM
34
0
0
17 Apr 2025
ML For Hardware Design Interpretability: Challenges and Opportunities
ML For Hardware Design Interpretability: Challenges and Opportunities
Raymond Baartmans
Andrew Ensinger
Victor Agostinelli
Lizhong Chen
29
0
0
11 Apr 2025
Reasoning Beyond Limits: Advances and Open Problems for LLMs
Reasoning Beyond Limits: Advances and Open Problems for LLMs
M. Ferrag
Norbert Tihanyi
Merouane Debbah
ELM
OffRL
LRM
AI4CE
131
2
0
26 Mar 2025
Extract, Match, and Score: An Evaluation Paradigm for Long Question-context-answer Triplets in Financial Analysis
Extract, Match, and Score: An Evaluation Paradigm for Long Question-context-answer Triplets in Financial Analysis
Bo Hu
Han Yuan
Vlad Pandelea
Wuqiong Luo
Yingzhu Zhao
Zheng Ma
53
0
0
20 Mar 2025
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
Austin Xu
Srijan Bansal
Yifei Ming
Semih Yavuz
Shafiq R. Joty
ELM
95
3
0
19 Mar 2025
RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration
RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration
Hong Qing Yu
Frank McQuade
48
1
0
14 Mar 2025
Lost-in-the-Middle in Long-Text Generation: Synthetic Dataset, Evaluation Framework, and Mitigation
Junhao Zhang
Richong Zhang
Fanshuang Kong
Ziyang Miao
Yanhan Ye
Yaowei Zheng
SyDa
46
0
0
10 Mar 2025
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning
Zhibin Lan
Liqiang Niu
Fandong Meng
Jie Zhou
Jinsong Su
VLM
69
1
0
04 Mar 2025
U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack
Yunfan Gao
Yun Xiong
Wenlong Wu
Zijing Huang
Bohan Li
H. Wang
52
3
0
01 Mar 2025
Do Retrieval-Augmented Language Models Adapt to Varying User Needs?
Do Retrieval-Augmented Language Models Adapt to Varying User Needs?
Peilin Wu
Xinlu Zhang
Wenhao Yu
Xingyu Liu
Xinya Du
Zhiyu Zoey Chen
RALM
45
0
0
27 Feb 2025
Evaluating the Effect of Retrieval Augmentation on Social Biases
Evaluating the Effect of Retrieval Augmentation on Social Biases
Tianhui Zhang
Yi Zhou
Danushka Bollegala
38
0
0
24 Feb 2025
Scaling Multi-Document Event Summarization: Evaluating Compression vs. Full-Text Approaches
Scaling Multi-Document Event Summarization: Evaluating Compression vs. Full-Text Approaches
Adithya Pratapa
Teruko Mitamura
94
1
0
10 Feb 2025
From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap
From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap
Gopi Krishnan Rajbahadur
G. Oliva
Dayi Lin
Ahmed E. Hassan
46
1
0
28 Jan 2025
Understanding Synthetic Context Extension via Retrieval Heads
Understanding Synthetic Context Extension via Retrieval Heads
Xinyu Zhao
Fangcong Yin
Greg Durrett
41
0
0
31 Dec 2024
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
Jonathan Roberts
Kai Han
Samuel Albanie
LLMAG
136
0
0
07 Nov 2024
Long Context RAG Performance of Large Language Models
Long Context RAG Performance of Large Language Models
Quinn Leng
Jacob P. Portes
Sam Havens
Matei A. Zaharia
Michael Carbin
AIFin
RALM
3DV
41
8
0
05 Nov 2024
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments
Kung-Hsiang Huang
Akshara Prabhakar
Sidharth Dhawan
Yixin Mao
Huan Wang
Silvio Savarese
Caiming Xiong
Philippe Laban
C. Wu
44
7
0
04 Nov 2024
On Positional Bias of Faithfulness for Long-form Summarization
On Positional Bias of Faithfulness for Long-form Summarization
David Wan
Jesse Vig
Mohit Bansal
Shafiq R. Joty
HILM
48
3
0
31 Oct 2024
Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses
  with Sub-Question Coverage
Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage
Kaige Xie
Philippe Laban
Prafulla Kumar Choubey
Caiming Xiong
C. Wu
31
1
0
20 Oct 2024
From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization
From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization
Catarina G. Belem
Pouya Pezeskhpour
Hayate Iso
Seiji Maekawa
Nikita Bhutani
Estevam R. Hruschka
HILM
65
1
0
17 Oct 2024
Search Engines in an AI Era: The False Promise of Factual and Verifiable
  Source-Cited Responses
Search Engines in an AI Era: The False Promise of Factual and Verifiable Source-Cited Responses
Pranav Narayanan Venkit
Philippe Laban
Yilun Zhou
Yixin Mao
C. Wu
ELM
32
7
0
15 Oct 2024
Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data
Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data
Seiji Maekawa
Hayate Iso
Nikita Bhutani
RALM
103
1
0
15 Oct 2024
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning
  in LLMs
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs
Lei Wang
Shan Dong
Yuhui Xu
Hanze Dong
Yalu Wang
Amrita Saha
Ee-Peng Lim
Caiming Xiong
Doyen Sahoo
LRM
40
1
0
07 Oct 2024
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly
Howard Yen
Tianyu Gao
Minmin Hou
Ke Ding
Daniel Fleischer
Peter Izsak
Moshe Wasserblat
Danqi Chen
ALM
ELM
62
25
0
03 Oct 2024
DeFine: Enhancing LLM Decision-Making with Factor Profiles and
  Analogical Reasoning
DeFine: Enhancing LLM Decision-Making with Factor Profiles and Analogical Reasoning
Yebowen Hu
Xiaoyang Wang
Wenlin Yao
Yiming Lu
Daoan Zhang
H. Foroosh
Dong Yu
Fei Liu
34
4
0
02 Oct 2024
Retrieval Or Holistic Understanding? Dolce: Differentiate Our Long
  Context Evaluation Tasks
Retrieval Or Holistic Understanding? Dolce: Differentiate Our Long Context Evaluation Tasks
Zi Yang
30
0
0
10 Sep 2024
The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned
  Language Models
The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models
Karime Maamari
Fadhil Abubaker
Daniel Jaroslawicz
Amine Mhedhbi
LRM
52
25
0
14 Aug 2024
Introducing a new hyper-parameter for RAG: Context Window Utilization
Introducing a new hyper-parameter for RAG: Context Window Utilization
Kush Juvekar
A. Purwar
40
3
0
29 Jul 2024
Perceptions of Linguistic Uncertainty by Language Models and Humans
Perceptions of Linguistic Uncertainty by Language Models and Humans
Catarina G Belém
Markelle Kelly
M. Steyvers
Sameer Singh
Padhraic Smyth
43
3
0
22 Jul 2024
One Thousand and One Pairs: A "novel" challenge for long-context
  language models
One Thousand and One Pairs: A "novel" challenge for long-context language models
Marzena Karpinska
Katherine Thai
Kyle Lo
Tanya Goyal
Mohit Iyyer
LRM
41
40
0
24 Jun 2024
MileBench: Benchmarking MLLMs in Long Context
MileBench: Benchmarking MLLMs in Long Context
Dingjie Song
Shunian Chen
Guiming Hardy Chen
Fei Yu
Xiang Wan
Benyou Wang
VLM
78
34
0
29 Apr 2024
Benchmarking Large Language Models in Complex Question Answering
  Attribution using Knowledge Graphs
Benchmarking Large Language Models in Complex Question Answering Attribution using Knowledge Graphs
Nan Hu
Jiaoyan Chen
Yike Wu
Guilin Qi
Sheng Bi
Tongtong Wu
Jeff Z. Pan
HILM
37
8
0
26 Jan 2024
M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context
  Evaluation Benchmark for Large Language Models
M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models
Wai-Chung Kwan
Xingshan Zeng
Yufei Wang
Yusen Sun
Liangyou Li
Lifeng Shang
Qun Liu
Kam-Fai Wong
ELM
89
10
0
30 Oct 2023
Teaching Machines to Read and Comprehend
Teaching Machines to Read and Comprehend
Karl Moritz Hermann
Tomás Kociský
Edward Grefenstette
L. Espeholt
W. Kay
Mustafa Suleyman
Phil Blunsom
175
3,509
0
10 Jun 2015
1