2311.09476
ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems
16 November 2023
Jon Saad-Falcon
Omar Khattab
Christopher Potts
Matei A. Zaharia
RALM
Papers citing
"ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems"
50 / 81 papers shown
Title
Optimizing Retrieval-Augmented Generation: Analysis of Hyperparameter Impact on Performance and Efficiency
Adel Ammar
Anis Koubaa
Omer Nacar
W. Boulila
RALM
3DV
37
0
0
13 May 2025
Securing RAG: A Risk Assessment and Mitigation Framework
Lukas Ammann
Sara Ott
Christoph R. Landolt
Marco P. Lehmann
SILM
33
0
0
13 May 2025
ConSens: Assessing context grounding in open-book question answering
Ivan Vankov
Matyo Ivanov
Adriana Correia
Victor Botev
ELM
67
0
0
30 Apr 2025
Can LLMs Be Trusted for Evaluating RAG Systems? A Survey of Methods and Datasets
Lorenz Brehme
Thomas Ströhle
Ruth Breu
65
0
0
28 Apr 2025
MIRAGE: A Metric-Intensive Benchmark for Retrieval-Augmented Generation Evaluation
Chanhee Park
Hyeonseok Moon
Chanjun Park
Heuiseok Lim
RALM
62
0
0
23 Apr 2025
The Viability of Crowdsourcing for RAG Evaluation
Lukas Gienapp
Tim Hagen
Maik Fröbe
Matthias Hagen
Benno Stein
Martin Potthast
Harrisen Scells
23
0
0
22 Apr 2025
Retrieval Augmented Generation Evaluation in the Era of Large Language Models: A Comprehensive Survey
Aoran Gan
Hao Yu
Kai Zhang
Qi Liu
Wenyu Yan
Zhenya Huang
Shiwei Tong
Guoping Hu
RALM
3DV
43
0
0
21 Apr 2025
Support Evaluation for the TREC 2024 RAG Track: Comparing Human versus LLM Judges
Nandan Thakur
Ronak Pradeep
Shivani Upadhyay
Daniel Fernando Campos
Nick Craswell
Jimmy Lin
ELM
40
0
0
21 Apr 2025
The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models
Ronak Pradeep
Nandan Thakur
Shivani Upadhyay
Daniel Fernando Campos
Nick Craswell
Jimmy Lin
31
0
0
21 Apr 2025
Benchmarking Biopharmaceuticals Retrieval-Augmented Generation Evaluation
Hanmeng Zhong
Linqing Chen
Weilei Wang
Wentao Wu
30
0
0
15 Apr 2025
TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning
Hang Ni
Fan Liu
Xinyu Ma
Lixin Su
S. Wang
Dawei Yin
Hui Xiong
Hao Liu
LLMAG
AI4TS
52
0
0
11 Apr 2025
Automated Construction of a Knowledge Graph of Nuclear Fusion Energy for Effective Elicitation and Retrieval of Information
A. Loreti
K. Chen
R. George
R. Firth
A. Agnello
S. Tanaka
38
0
0
10 Apr 2025
A System for Comprehensive Assessment of RAG Frameworks
Mattia Rengo
Senad Beadini
Domenico Alfano
Roberto Abbruzzese
45
1
0
10 Apr 2025
QE-RAG: A Robust Retrieval-Augmented Generation Benchmark for Query Entry Errors
Kepu Zhang
ZhongXiang Sun
Weijie Yu
Xiaoxue Zang
Kai Zheng
Yang Song
Han Li
Jun Xu
3DV
43
0
0
05 Apr 2025
MHTS: Multi-Hop Tree Structure Framework for Generating Difficulty-Controllable QA Datasets for RAG Evaluation
Jeongsoo Lee
Daeyong Kwon
Kyohoon Jin
Junnyeong Jeong
Minwoo Sim
Minwoo Kim
31
0
0
29 Mar 2025
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
Austin Xu
Srijan Bansal
Yifei Ming
Semih Yavuz
Shafiq R. Joty
ELM
95
3
0
19 Mar 2025
Conversational Gold: Evaluating Personalized Conversational Search System using Gold Nuggets
Zahra Abbasiantaeb
Simon Lupart
Leif Azzopardi
Jeffrey Dalton
Mohammad Aliannejadi
RALM
62
1
0
12 Mar 2025
A Survey on Knowledge-Oriented Retrieval-Augmented Generation
Mingyue Cheng
Yucong Luo
Jie Ouyang
Qiang Liu
Huijie Liu
...
Bohou Zhang
Jiawei Cao
Jie Ma
Daoyu Wang
Enhong Chen
3DV
73
3
0
11 Mar 2025
SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction
Lu Dai
Yijie Xu
Jinhui Ye
Hao Liu
Hui Xiong
3DV
RALM
83
2
0
03 Mar 2025
Towards Efficient Educational Chatbots: Benchmarking RAG Frameworks
Umar Ali Khan
Ekram Khan
Fiza Khan
A. A. Moinuddin
48
0
0
02 Mar 2025
PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation
Albert Gong
Kamilė Stankevičiūtė
Chao-gang Wan
Anmol Kabra
Raphael Thesmar
Johann Lee
Julius Klenke
Carla P. Gomes
Kilian Q. Weinberger
RALM
LRM
62
0
0
27 Feb 2025
Bián: A Bilingual Benchmark and Model for Hallucination Detection in Retrieval-Augmented Generation
Zhouyu Jiang
Mengshu Sun
Qing Cui
Lei Liang
RALM
3DV
233
0
0
26 Feb 2025
Trustworthy Answers, Messier Data: Bridging the Gap in Low-Resource Retrieval-Augmented Generation for Domain Expert Systems
Nayoung Choi
Grace Byun
Andrew Chung
Ellie S. Paek
S. Lee
Jinho D. Choi
RALM
86
1
0
26 Feb 2025
Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models
Shuliang Liu
Xinze Li
Zhenghao Liu
Yukun Yan
Cheng Yang
Zheni Zeng
Zhiyuan Liu
Maosong Sun
Ge Yu
RALM
110
1
0
26 Feb 2025
LettuceDetect: A Hallucination Detection Framework for RAG Applications
Adam Kovacs
Gábor Recski
45
2
0
24 Feb 2025
Evaluation of Large Language Models via Coupled Token Generation
N. C. Benz
Stratis Tsirtsis
Eleni Straitouri
Ivi Chatzi
Ander Artola Velasco
Suhas Thejaswi
Manuel Gomez Rodriguez
51
0
0
03 Feb 2025
RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems
Robert Friel
Masha Belyi
Atindriyo Sanyal
82
19
0
17 Jan 2025
ASTRID -- An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems
Mohita Chowdhury
Yajie Vera He
Aisling Higham
Ernest Lim
58
1
0
14 Jan 2025
LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts
Helia Hashemi
J. Eisner
Corby Rosset
Benjamin Van Durme
Chris Kedzie
68
1
0
03 Jan 2025
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
Zhuoran Jin
Hongbang Yuan
Tianyi Men
Pengfei Cao
Yubo Chen
Kang-Jun Liu
Jun Zhao
ALM
82
7
0
18 Dec 2024
Know Your RAG: Dataset Taxonomy and Generation Strategies for Evaluating RAG Systems
Rafael Teixeira de Lima
Shubham Gupta
Cesar Berrospi
Lokesh Mishra
Michele Dolfi
Peter W. J. Staar
Panagiotis Vagenas
87
1
0
29 Nov 2024
RAGulator: Lightweight Out-of-Context Detectors for Grounded Text Generation
Ian Poey
Jiajun Liu
Qishuai Zhong
Adrien Chenailler
63
0
0
06 Nov 2024
VERITAS: A Unified Approach to Reliability Evaluation
Rajkumar Ramamurthy
Meghana Arakkal Rajeev
Oliver Molenschot
James Zou
Nazneen Rajani
HILM
49
1
0
05 Nov 2024
Rationale-Guided Retrieval Augmented Generation for Medical Question Answering
Jiwoong Sohn
Yein Park
Chanwoong Yoon
Sihyeon Park
Hyeon Hwang
Mujeen Sung
Hyunjae Kim
Jaewoo Kang
RALM
67
6
0
01 Nov 2024
LLM-Ref: Enhancing Reference Handling in Technical Writing with Large Language Models
Kazi Ahmed Asif Fuad
Lizhong Chen
26
0
0
01 Nov 2024
Not All Languages are Equal: Insights into Multilingual Retrieval-Augmented Generation
Suhang Wu
Jialong Tang
Baosong Yang
Ante Wang
Kaidi Jia
Jiawei Yu
Junfeng Yao
Jinsong Su
43
1
0
29 Oct 2024
Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage
Kaige Xie
Philippe Laban
Prafulla Kumar Choubey
Caiming Xiong
C. Wu
37
1
0
20 Oct 2024
Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data
Florian E. Dorner
Vivian Y. Nastl
Moritz Hardt
ELM
ALM
50
5
0
17 Oct 2024
CoFE-RAG: A Comprehensive Full-chain Evaluation Framework for Retrieval-Augmented Generation with Enhanced Data Diversity
Jintao Liu
Ruixue Ding
Linhao Zhang
Pengjun Xie
Fei Huang
33
3
0
16 Oct 2024
Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval-Augmented Generation
Tobias Leemann
Periklis Petridis
G. Vietri
Dionysis Manousakas
Aaron Roth
Sergul Aydore
56
0
0
04 Oct 2024
LINKAGE: Listwise Ranking among Varied-Quality References for Non-Factoid QA Evaluation via LLMs
Sihui Yang
Keping Bi
Wanqing Cui
Jiafeng Guo
Xueqi Cheng
23
2
0
23 Sep 2024
Towards Fair RAG: On the Impact of Fair Ranking in Retrieval-Augmented Generation
To Eun Kim
Fernando Diaz
56
2
0
17 Sep 2024
HyPA-RAG: A Hybrid Parameter Adaptive Retrieval-Augmented Generation System for AI Legal and Policy Applications
Rishi Kalra
Zekun Wu
Ayesha Gulley
Airlie Hilliard
Xin Guan
Adriano Soares Koshiyama
Philip C. Treleaven
RALM
AILaw
54
5
0
29 Aug 2024
Measuring text summarization factuality using atomic facts entailment metrics in the context of retrieval augmented generation
N. E. Kriman
HILM
54
0
0
27 Aug 2024
Can Unconfident LLM Annotations Be Used for Confident Conclusions?
Kristina Gligorić
Tijana Zrnic
Cinoo Lee
Emmanuel J. Candès
Dan Jurafsky
72
5
0
27 Aug 2024
Web Retrieval Agents for Evidence-Based Misinformation Detection
Jacob-Junqi Tian
Hao Yu
Yury Orlovskiy
Tyler Vergho
Mauricio Rivera
Mayank Goel
Zachary Yang
Jean-Francois Godbout
Reihaneh Rabbany
Kellin Pelrine
LLMAG
OffRL
28
4
0
15 Aug 2024
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation
Daniel Fleischer
Moshe Berchansky
Moshe Wasserblat
Peter Izsak
3DV
58
4
0
05 Aug 2024
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework
Kunlun Zhu
Yifan Luo
Dingling Xu
Ruobing Wang
Shi Yu
...
Yishan Li
Zhiyuan Liu
Xu Han
Zhiyuan Liu
Maosong Sun
31
17
0
02 Aug 2024
Retrieval-Augmented Generation for Natural Language Processing: A Survey
Shangyu Wu
Ying Xiong
Yufei Cui
Haolun Wu
Can Chen
...
Lianming Huang
Xue Liu
Tei-Wei Kuo
Nan Guan
Chun Jason Xue
3DV
RALM
38
26
0
18 Jul 2024
Retrieval-Enhanced Machine Learning: Synthesis and Opportunities
To Eun Kim
Alireza Salemi
Andrew Drozdov
Fernando Diaz
Hamed Zamani
56
7
0
17 Jul 2024