RECALL: A Benchmark for LLMs Robustness against External Counterfactual Knowledge

14 November 2023

Shicheng Li

Papers citing "RECALL: A Benchmark for LLMs Robustness against External Counterfactual Knowledge"

GaRAGe: A Benchmark with Grounding Annotations for RAG Evaluation. Ionut Teodor Sorodoc, Leonardo F. R. Ribeiro, Rexhina Blloshmi, Christopher Davis, Adria de Gispert. 09 Jun 2025.
Magic Mushroom: A Customizable Benchmark for Fine-grained Analysis of Retrieval Noise Erosion in RAG Systems. Yuxin Zhang, Yan Wang, Yongrui Chen, Shenyu Zhang, Xinbang Dai, Sheng Bi, Guilin Qi. 04 Jun 2025.
The Viability of Crowdsourcing for RAG Evaluation. Lukas Gienapp, Tim Hagen, Maik Fröbe, Matthias Hagen, Benno Stein, Martin Potthast, Harrisen Scells. 22 Apr 2025.
Enhancing Robustness in Large Language Models: Prompting for Mitigating the Impact of Irrelevant Information. Ming Jiang, Tingting Huang, Biao Guo, Yao Lu, Feng Zhang. 20 Aug 2024.
Evaluation of Retrieval-Augmented Generation: A Survey. Hao Yu, Aoran Gan, Kai Zhang, Shiwei Tong, Qi Liu, Zhaofeng Liu. 13 May 2024.
RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing. Yucheng Hu, Yuxing Lu. 30 Apr 2024.