ResearchTrend.AI
AttributionBench: How Hard is Automatic Attribution Evaluation?
23 February 2024
Yifei Li, Xiang Yue, Zeyi Liao, Huan Sun
HILM

Papers citing "AttributionBench: How Hard is Automatic Attribution Evaluation?"

13 papers shown.
Document Attribution: Examining Citation Relationships using Large Language Models
Vipula Rawte, Ryan A. Rossi, Franck Dernoncourt, Nedim Lipka
HILM · 09 May 2025

FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents
Nandan Thakur, Jimmy J. Lin, Sam Havens, Michael Carbin, Omar Khattab, Andrew Drozdov
17 Apr 2025

Verify with Caution: The Pitfalls of Relying on Imperfect Factuality Metrics
Ameya Godbole, Robin Jia
HILM · 24 Jan 2025

RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems
Robert Friel, Masha Belyi, Atindriyo Sanyal
17 Jan 2025

Evaluation of Attribution Bias in Retrieval-Augmented Large Language Models
Amin Abolghasemi, Leif Azzopardi, Seyyed Hadi Hashemi, Maarten de Rijke, Suzan Verberne
16 Oct 2024

A Comparative Analysis of Faithfulness Metrics and Humans in Citation Evaluation
Weijia Zhang, Mohammad Aliannejadi, Jiahuan Pei, Yifei Yuan, Jia-Hong Huang, Evangelos Kanoulas
HILM · 22 Aug 2024

CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation
Tong Chen, Akari Asai, Niloofar Mireshghallah, Sewon Min, James Grimmelmann, Yejin Choi, Hannaneh Hajishirzi, Luke Zettlemoyer, Pang Wei Koh
09 Jul 2024

Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems
Philippe Laban, Alexander R. Fabbri, Caiming Xiong, Chien-Sheng Wu
RALM · 01 Jul 2024

Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics
Weijia Zhang, Mohammad Aliannejadi, Yifei Yuan, Jiahuan Pei, Jia-Hong Huang, Evangelos Kanoulas
HILM · 21 Jun 2024

Luna: An Evaluation Foundation Model to Catch Language Model Hallucinations with High Accuracy and Low Cost
Masha Belyi, Robert Friel, Shuai Shao, Atindriyo Sanyal
HILM, RALM · 03 Jun 2024

Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese, Nat McAleese, Maja Trębacz, John Aslanides, Vlad Firoiu, ..., John F. J. Mellor, Demis Hassabis, Koray Kavukcuoglu, Lisa Anne Hendricks, G. Irving
ALM, AAML · 28 Sep 2022

Internet-Augmented Dialogue Generation
M. Komeili, Kurt Shuster, Jason Weston
RALM · 15 Jul 2021

Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark
Nouha Dziri, Hannah Rashkin, Tal Linzen, David Reitter
ALM · 30 Apr 2021