THELMA: Task Based Holistic Evaluation of Large Language Model Applications-RAG Question Answering

16 May 2025
Udita Patel
Rutu Mulkar
Jay Roberts
Cibi Chakravarthy Senthilkumar
Sujay Gandhi
Xiaofei Zheng
Naumaan Nayyar
Parul Kalra
Rafael Castrillo

Papers citing "THELMA: Task Based Holistic Evaluation of Large Language Model Applications-RAG Question Answering"

8 / 8 papers shown
On the Implications of Verbose LLM Outputs: A Case Study in Translation Evaluation
Eleftheria Briakou
Zhongtao Liu
Colin Cherry
Markus Freitag
32
3
0
01 Oct 2024
Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering
Zhentao Xu
Mark Jerome Cruz
Matthew Guevara
Tie Wang
Manasi Deshpande
Xiaofeng Wang
Zheng Li
RALM
43
73
0
26 Apr 2024
The Power of Noise: Redefining Retrieval for RAG Systems
Florin Cuconasu
Giovanni Trappolini
F. Siciliano
Simone Filice
Cesare Campagnano
Y. Maarek
Nicola Tonellotto
Fabrizio Silvestri
RALM
89
169
0
26 Jan 2024
ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems
Jon Saad-Falcon
Omar Khattab
Christopher Potts
Matei A. Zaharia
RALM
68
116
0
16 Nov 2023
Lost in the Middle: How Language Models Use Long Contexts
Nelson F. Liu
Kevin Lin
John Hewitt
Ashwin Paranjape
Michele Bevilacqua
Fabio Petroni
Percy Liang
RALM
84
1,570
0
06 Jul 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
312
4,253
0
09 Jun 2023
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
113
678
0
23 May 2023
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
275
5,764
0
21 Apr 2019