Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2311.09476
Cited By
ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems
16 November 2023
Jon Saad-Falcon
Omar Khattab
Christopher Potts
Matei A. Zaharia
RALM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems"
31 / 81 papers shown
Title
Evaluation of RAG Metrics for Question Answering in the Telecom Domain
Sujoy Roychowdhury
Sumit Soman
H. G. Ranjani
Neeraj Gunda
Vansh Chhabra
Sai Krishna Bala
66
14
0
15 Jul 2024
Lynx: An Open Source Hallucination Evaluation Model
Selvan Sunitha Ravi
B. Mielczarek
Anand Kannappan
Douwe Kiela
Rebecca Qian
VLM
RALM
HILM
53
17
0
11 Jul 2024
Grounding and Evaluation for Large Language Models: Practical Challenges and Lessons Learned (Survey)
K. Kenthapadi
M. Sameki
Ankur Taly
HILM
ELM
AILaw
39
12
0
10 Jul 2024
Towards Optimizing and Evaluating a Retrieval Augmented QA Chatbot using LLMs with Human in the Loop
Anum Afzal
Alexander Kowsik
Rajna Fani
Florian Matthes
52
6
0
08 Jul 2024
BERGEN: A Benchmarking Library for Retrieval-Augmented Generation
David Rau
Hervé Déjean
Nadezhda Chirkova
Thibault Formal
Shuai Wang
Vassilina Nikoulina
S. Clinchant
45
12
0
01 Jul 2024
When Search Engine Services meet Large Language Models: Visions and Challenges
Haoyi Xiong
Jiang Bian
Yuchen Li
Xuhong Li
Jundong Li
Shuaiqiang Wang
Dawei Yin
Sumi Helal
53
28
0
28 Jun 2024
Evaluating Quality of Answers for Retrieval-Augmented Generation: A Strong LLM Is All You Need
Yang Wang
Alberto Garcia Hernandez
Roman Kyslyi
Nicholas S. Kersting
36
3
0
26 Jun 2024
Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework
Zackary Rackauckas
Arthur Camara
Jakub Zavrel
42
8
0
20 Jun 2024
Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation
Adam Fisch
Joshua Maynez
R. A. Hofer
Bhuwan Dhingra
Amir Globerson
William W. Cohen
44
8
0
06 Jun 2024
The Challenges of Evaluating LLM Applications: An Analysis of Automated, Human, and LLM-Based Approaches
Bhashithe Abeysinghe
Ruhan Circi
ELM
38
22
0
05 Jun 2024
Luna: An Evaluation Foundation Model to Catch Language Model Hallucinations with High Accuracy and Low Cost
Masha Belyi
Robert Friel
Shuai Shao
Atindriyo Sanyal
HILM
RALM
64
5
0
03 Jun 2024
CMDBench: A Benchmark for Coarse-to-fine Multimodal Data Discovery in Compound AI Systems
Yanlin Feng
Sajjadur Rahman
Aaron Feng
Vincent Chen
Eser Kandogan
48
4
0
02 Jun 2024
Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation
Gauthier Guinet
Behrooz Omidvar-Tehrani
Anoop Deoras
Laurent Callot
RALM
73
17
0
22 May 2024
Evaluation of Retrieval-Augmented Generation: A Survey
Hao Yu
Aoran Gan
Kai Zhang
Shiwei Tong
Qi Liu
Zhaofeng Liu
3DV
62
82
0
13 May 2024
Self-Improving Customer Review Response Generation Based on LLMs
Guy Azov
Tatiana Pelc
Adi Fledel Alon
Gila Kamhi
40
0
0
06 May 2024
On the Evaluation of Machine-Generated Reports
James Mayfield
Eugene Yang
Dawn J Lawrie
Sean MacAvaney
Paul McNamee
...
Orion Weller
Efsun Kayi
Kate Sanders
Marc Mason
Noah Hibbler
ALM
101
12
0
02 May 2024
RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing
Yucheng Hu
Yuxing Lu
RALM
60
18
0
30 Apr 2024
GRAMMAR: Grounded and Modular Methodology for Assessment of Closed-Domain Retrieval-Augmented Language Model
Xinzhe Li
Ming Liu
Shang Gao
RALM
40
0
0
30 Apr 2024
InspectorRAGet: An Introspection Platform for RAG Evaluation
Kshitij P. Fadnis
Siva Sankalp Patel
O. Boni
Yannis Katsis
Sara Rosenthal
Benjamin Sznajder
Marina Danilevsky
40
2
0
26 Apr 2024
Evaluating Retrieval Quality in Retrieval-Augmented Generation
Alireza Salemi
Hamed Zamani
RALM
46
62
0
21 Apr 2024
Unlocking Multi-View Insights in Knowledge-Dense Retrieval-Augmented Generation
Guanhua Chen
Wenhan Yu
Lei Sha
3DV
39
0
0
19 Apr 2024
A Survey on Retrieval-Augmented Text Generation for Large Language Models
Yizheng Huang
Jimmy X. Huang
3DV
RALM
66
46
0
17 Apr 2024
ClashEval: Quantifying the tug-of-war between an LLM's internal prior and external evidence
Kevin Wu
Eric Wu
James Zou
AAML
61
40
0
16 Apr 2024
AutoEval Done Right: Using Synthetic Data for Model Evaluation
Pierre Boyeau
Anastasios Nikolas Angelopoulos
N. Yosef
Jitendra Malik
Michael I. Jordan
SyDa
38
14
0
09 Mar 2024
Retrieval-Augmented Generation for AI-Generated Content: A Survey
Penghao Zhao
Hailin Zhang
Qinhan Yu
Zhengren Wang
Yunteng Geng
Fangcheng Fu
Ling Yang
Wentao Zhang
Jie Jiang
Bin Cui
3DV
121
228
0
29 Feb 2024
Prediction-Powered Ranking of Large Language Models
Ivi Chatzi
Eleni Straitouri
Suhas Thejaswi
Manuel Gomez Rodriguez
ALM
29
5
0
27 Feb 2024
RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering
Zihan Zhang
Meng Fang
Ling-Hao Chen
RALM
56
13
0
26 Feb 2024
Towards Faithful and Robust LLM Specialists for Evidence-Based Question-Answering
Tobias Schimanski
Jingwei Ni
Mathias Kraus
Elliott Ash
Markus Leippold
29
4
0
13 Feb 2024
CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models
Yuanjie Lyu
Zhiyu Li
Simin Niu
Zhiyu Li
Bo Tang
Wenjin Wang
Hao Wu
Huan Liu
Tong Xu
Enhong Chen
RALM
44
32
0
30 Jan 2024
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
Yixuan Tang
Yi Yang
RALM
41
79
0
27 Jan 2024
Retrieval-Augmented Generation for Large Language Models: A Survey
Yunfan Gao
Yun Xiong
Xinyu Gao
Kangxiang Jia
Jinliu Pan
Yuxi Bi
Yi Dai
Jiawei Sun
Meng Wang
Haofen Wang
3DV
RALM
61
1,530
1
18 Dec 2023
Previous
1
2