ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2311.09476
  4. Cited By
ARES: An Automated Evaluation Framework for Retrieval-Augmented
  Generation Systems

ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems

16 November 2023
Jon Saad-Falcon
Omar Khattab
Christopher Potts
Matei A. Zaharia
    RALM
ArXivPDFHTML

Papers citing "ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems"

50 / 81 papers shown
Title
Optimizing Retrieval-Augmented Generation: Analysis of Hyperparameter Impact on Performance and Efficiency
Optimizing Retrieval-Augmented Generation: Analysis of Hyperparameter Impact on Performance and Efficiency
Adel Ammar
Anis Koubaa
Omer Nacar
W. Boulila
RALM
3DV
37
0
0
13 May 2025
Securing RAG: A Risk Assessment and Mitigation Framework
Securing RAG: A Risk Assessment and Mitigation Framework
Lukas Ammann
Sara Ott
Christoph R. Landolt
Marco P. Lehmann
SILM
33
0
0
13 May 2025
ConSens: Assessing context grounding in open-book question answering
ConSens: Assessing context grounding in open-book question answering
Ivan Vankov
Matyo Ivanov
Adriana Correia
Victor Botev
ELM
67
0
0
30 Apr 2025
Can LLMs Be Trusted for Evaluating RAG Systems? A Survey of Methods and Datasets
Can LLMs Be Trusted for Evaluating RAG Systems? A Survey of Methods and Datasets
Lorenz Brehme
Thomas Ströhle
Ruth Breu
65
0
0
28 Apr 2025
MIRAGE: A Metric-Intensive Benchmark for Retrieval-Augmented Generation Evaluation
MIRAGE: A Metric-Intensive Benchmark for Retrieval-Augmented Generation Evaluation
Chanhee Park
Hyeonseok Moon
Chanjun Park
Heuiseok Lim
RALM
62
0
0
23 Apr 2025
The Viability of Crowdsourcing for RAG Evaluation
The Viability of Crowdsourcing for RAG Evaluation
Lukas Gienapp
Tim Hagen
Maik Frobe
Matthias Hagen
Benno Stein
Martin Potthast
Harrisen Scells
23
0
0
22 Apr 2025
Retrieval Augmented Generation Evaluation in the Era of Large Language Models: A Comprehensive Survey
Retrieval Augmented Generation Evaluation in the Era of Large Language Models: A Comprehensive Survey
Aoran Gan
Hao Yu
Kai Zhang
Qi Liu
Wenyu Yan
Zhenya Huang
Shiwei Tong
Guoping Hu
RALM
3DV
43
0
0
21 Apr 2025
Support Evaluation for the TREC 2024 RAG Track: Comparing Human versus LLM Judges
Support Evaluation for the TREC 2024 RAG Track: Comparing Human versus LLM Judges
Nandan Thakur
Ronak Pradeep
Shivani Upadhyay
Daniel Fernando Campos
Nick Craswell
Jimmy Lin
ELM
40
0
0
21 Apr 2025
The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models
The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models
Ronak Pradeep
Nandan Thakur
Shivani Upadhyay
Daniel Fernando Campos
Nick Craswell
Jimmy Lin
31
0
0
21 Apr 2025
Benchmarking Biopharmaceuticals Retrieval-Augmented Generation Evaluation
Benchmarking Biopharmaceuticals Retrieval-Augmented Generation Evaluation
Hanmeng Zhong
Linqing Chen
Weilei Wang
Wentao Wu
30
0
0
15 Apr 2025
TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning
TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning
Hang Ni
Fan Liu
Xinyu Ma
Lixin Su
S. Wang
Dawei Yin
Hui Xiong
Hao Liu
LLMAG
AI4TS
52
0
0
11 Apr 2025
Automated Construction of a Knowledge Graph of Nuclear Fusion Energy for Effective Elicitation and Retrieval of Information
Automated Construction of a Knowledge Graph of Nuclear Fusion Energy for Effective Elicitation and Retrieval of Information
A. Loreti
K. Chen
R. George
R. Firth
A. Agnello
S. Tanaka
38
0
0
10 Apr 2025
A System for Comprehensive Assessment of RAG Frameworks
A System for Comprehensive Assessment of RAG Frameworks
Mattia Rengo
Senad Beadini
Domenico Alfano
Roberto Abbruzzese
45
1
0
10 Apr 2025
QE-RAG: A Robust Retrieval-Augmented Generation Benchmark for Query Entry Errors
QE-RAG: A Robust Retrieval-Augmented Generation Benchmark for Query Entry Errors
Kepu Zhang
ZhongXiang Sun
Weijie Yu
Xiaoxue Zang
Kai Zheng
Yang Song
Han Li
Jun Xu
3DV
43
0
0
05 Apr 2025
MHTS: Multi-Hop Tree Structure Framework for Generating Difficulty-Controllable QA Datasets for RAG Evaluation
MHTS: Multi-Hop Tree Structure Framework for Generating Difficulty-Controllable QA Datasets for RAG Evaluation
Jeongsoo Lee
Daeyong Kwon
Kyohoon Jin
Junnyeong Jeong
Minwoo Sim
Minwoo Kim
31
0
0
29 Mar 2025
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
Austin Xu
Srijan Bansal
Yifei Ming
Semih Yavuz
Shafiq R. Joty
ELM
95
3
0
19 Mar 2025
Conversational Gold: Evaluating Personalized Conversational Search System using Gold Nuggets
Zahra Abbasiantaeb
Simon Lupart
Leif Azzopardi
Jeffery Dalton
Mohammad Aliannejadi
RALM
62
1
0
12 Mar 2025
A Survey on Knowledge-Oriented Retrieval-Augmented Generation
A Survey on Knowledge-Oriented Retrieval-Augmented Generation
Mingyue Cheng
Yucong Luo
Jie Ouyang
Qiang Liu
Huijie Liu
...
Bohou Zhang
Jiawei Cao
Jie Ma
Daoyu Wang
Enhong Chen
3DV
73
3
0
11 Mar 2025
SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction
SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction
Lu Dai
Yijie Xu
Jinhui Ye
Hao Liu
Hui Xiong
3DV
RALM
83
2
0
03 Mar 2025
Towards Efficient Educational Chatbots: Benchmarking RAG Frameworks
Umar Ali Khan
Ekram Khan
Fiza Khan
A. A. Moinuddin
48
0
0
02 Mar 2025
PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation
PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation
Albert Gong
Kamilė Stankevičiūtė
Chao-gang Wan
Anmol Kabra
Raphael Thesmar
Johann Lee
Julius Klenke
Carla P. Gomes
Kilian Q. Weinberger
RALM
LRM
62
0
0
27 Feb 2025
Bián: A Bilingual Benchmark and Model for Hallucination Detection in Retrieval-Augmented Generation
Bián: A Bilingual Benchmark and Model for Hallucination Detection in Retrieval-Augmented Generation
Zhouyu Jiang
Mengshu Sun
Qing Cui
Lei Liang
RALM
3DV
233
0
0
26 Feb 2025
Trustworthy Answers, Messier Data: Bridging the Gap in Low-Resource Retrieval-Augmented Generation for Domain Expert Systems
Trustworthy Answers, Messier Data: Bridging the Gap in Low-Resource Retrieval-Augmented Generation for Domain Expert Systems
Nayoung Choi
Grace Byun
Andrew Chung
Ellie S. Paek
S. Lee
Jinho D. Choi
RALM
86
1
0
26 Feb 2025
Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models
Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models
Shuliang Liu
Xinze Li
Zhenghao Liu
Yukun Yan
Cheng Yang
Zheni Zeng
Zhiyuan Liu
Maosong Sun
Ge Yu
RALM
110
1
0
26 Feb 2025
LettuceDetect: A Hallucination Detection Framework for RAG Applications
LettuceDetect: A Hallucination Detection Framework for RAG Applications
Adam Kovacs
Gábor Recski
45
2
0
24 Feb 2025
Evaluation of Large Language Models via Coupled Token Generation
Evaluation of Large Language Models via Coupled Token Generation
N. C. Benz
Stratis Tsirtsis
Eleni Straitouri
Ivi Chatzi
Ander Artola Velasco
Suhas Thejaswi
Manuel Gomez Rodriguez
51
0
0
03 Feb 2025
RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems
RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems
Robert Friel
Masha Belyi
Atindriyo Sanyal
82
19
0
17 Jan 2025
ASTRID -- An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems
ASTRID -- An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems
Mohita Chowdhury
Yajie Vera He
Aisling Higham
Ernest Lim
58
1
0
14 Jan 2025
LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts
Helia Hashemi
J. Eisner
Corby Rosset
Benjamin Van Durme
Chris Kedzie
68
1
0
03 Jan 2025
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented
  Generation for Preference Alignment
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
Zhuoran Jin
Hongbang Yuan
Tianyi Men
Pengfei Cao
Yubo Chen
Kang-Jun Liu
Jun Zhao
ALM
82
7
0
18 Dec 2024
Know Your RAG: Dataset Taxonomy and Generation Strategies for Evaluating
  RAG Systems
Know Your RAG: Dataset Taxonomy and Generation Strategies for Evaluating RAG Systems
Rafael Teixeira de Lima
Shubham Gupta
Cesar Berrospi
Lokesh Mishra
Michele Dolfi
Peter W. J. Staar
Panagiotis Vagenas
87
1
0
29 Nov 2024
RAGulator: Lightweight Out-of-Context Detectors for Grounded Text
  Generation
RAGulator: Lightweight Out-of-Context Detectors for Grounded Text Generation
Ian Poey
Jiajun Liu
Qishuai Zhong
Adrien Chenailler
63
0
0
06 Nov 2024
VERITAS: A Unified Approach to Reliability Evaluation
VERITAS: A Unified Approach to Reliability Evaluation
Rajkumar Ramamurthy
Meghana Arakkal Rajeev
Oliver Molenschot
James Zou
Nazneen Rajani
HILM
49
1
0
05 Nov 2024
Rationale-Guided Retrieval Augmented Generation for Medical Question
  Answering
Rationale-Guided Retrieval Augmented Generation for Medical Question Answering
Jiwoong Sohn
Yein Park
Chanwoong Yoon
Sihyeon Park
Hyeon Hwang
Mujeen Sung
Hyunjae Kim
Jaewoo Kang
RALM
67
6
0
01 Nov 2024
LLM-Ref: Enhancing Reference Handling in Technical Writing with Large
  Language Models
LLM-Ref: Enhancing Reference Handling in Technical Writing with Large Language Models
Kazi Ahmed Asif Fuad
Lizhong Chen
26
0
0
01 Nov 2024
Not All Languages are Equal: Insights into Multilingual
  Retrieval-Augmented Generation
Not All Languages are Equal: Insights into Multilingual Retrieval-Augmented Generation
Suhang Wu
Jialong Tang
Baosong Yang
Ante Wang
Kaidi Jia
Jiawei Yu
Junfeng Yao
Jinsong Su
43
1
0
29 Oct 2024
Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses
  with Sub-Question Coverage
Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage
Kaige Xie
Philippe Laban
Prafulla Kumar Choubey
Caiming Xiong
C. Wu
37
1
0
20 Oct 2024
Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data
Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data
Florian E. Dorner
Vivian Y. Nastl
Moritz Hardt
ELM
ALM
50
5
0
17 Oct 2024
CoFE-RAG: A Comprehensive Full-chain Evaluation Framework for
  Retrieval-Augmented Generation with Enhanced Data Diversity
CoFE-RAG: A Comprehensive Full-chain Evaluation Framework for Retrieval-Augmented Generation with Enhanced Data Diversity
Jintao Liu
Ruixue Ding
Linhao Zhang
Pengjun Xie
Fie Huang
33
3
0
16 Oct 2024
Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval-Augmented Generation
Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval-Augmented Generation
Tobias Leemann
Periklis Petridis
G. Vietri
Dionysis Manousakas
Aaron Roth
Sergul Aydore
56
0
0
04 Oct 2024
LINKAGE: Listwise Ranking among Varied-Quality References for
  Non-Factoid QA Evaluation via LLMs
LINKAGE: Listwise Ranking among Varied-Quality References for Non-Factoid QA Evaluation via LLMs
Sihui Yang
Keping Bi
Wanqing Cui
Jiafeng Guo
Xueqi Cheng
23
2
0
23 Sep 2024
Towards Fair RAG: On the Impact of Fair Ranking in Retrieval-Augmented Generation
Towards Fair RAG: On the Impact of Fair Ranking in Retrieval-Augmented Generation
To Eun Kim
Fernando Diaz
56
2
0
17 Sep 2024
HyPA-RAG: A Hybrid Parameter Adaptive Retrieval-Augmented Generation System for AI Legal and Policy Applications
HyPA-RAG: A Hybrid Parameter Adaptive Retrieval-Augmented Generation System for AI Legal and Policy Applications
Rishi Kalra
Zekun Wu
Ayesha Gulley
Airlie Hilliard
Xin Guan
Adriano Soares Koshiyama
Philip C. Treleaven
RALM
AILaw
54
5
0
29 Aug 2024
Measuring text summarization factuality using atomic facts entailment
  metrics in the context of retrieval augmented generation
Measuring text summarization factuality using atomic facts entailment metrics in the context of retrieval augmented generation
N. E. Kriman
HILM
54
0
0
27 Aug 2024
Can Unconfident LLM Annotations Be Used for Confident Conclusions?
Can Unconfident LLM Annotations Be Used for Confident Conclusions?
Kristina Gligorić
Tijana Zrnic
Cinoo Lee
Emmanuel J. Candès
Dan Jurafsky
72
5
0
27 Aug 2024
Web Retrieval Agents for Evidence-Based Misinformation Detection
Web Retrieval Agents for Evidence-Based Misinformation Detection
Jacob-Junqi Tian
Hao Yu
Yury Orlovskiy
Tyler Vergho
Mauricio Rivera
Mayank Goel
Zachary Yang
Jean-Francois Godbout
Reihaneh Rabbany
Kellin Pelrine
LLMAG
OffRL
28
4
0
15 Aug 2024
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented
  Generation
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation
Daniel Fleischer
Moshe Berchansky
Moshe Wasserblat
Peter Izsak
3DV
58
4
0
05 Aug 2024
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework
Kunlun Zhu
Yifan Luo
Dingling Xu
Ruobing Wang
Shi Yu
...
Yishan Li
Zhiyuan Liu
Xu Han
Zhiyuan Liu
Maosong Sun
31
17
0
02 Aug 2024
Retrieval-Augmented Generation for Natural Language Processing: A Survey
Retrieval-Augmented Generation for Natural Language Processing: A Survey
Shangyu Wu
Ying Xiong
Yufei Cui
Haolun Wu
Can Chen
...
Lianming Huang
Xue Liu
Tei-Wei Kuo
Nan Guan
Chun Jason Xue
3DV
RALM
38
26
0
18 Jul 2024
Retrieval-Enhanced Machine Learning: Synthesis and Opportunities
Retrieval-Enhanced Machine Learning: Synthesis and Opportunities
To Eun Kim
Alireza Salemi
Andrew Drozdov
Fernando Diaz
Hamed Zamani
56
7
0
17 Jul 2024
12
Next