Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2409.03759
Cited By
VERA: Validation and Evaluation of Retrieval-Augmented Systems
16 August 2024
Tianyu Ding
Adi Banerjee
Laurent Mombaerts
Yunhong Li
Tarik Borogovac
Juan Pablo De la Cruz Weinstein
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"VERA: Validation and Evaluation of Retrieval-Augmented Systems"
12 / 12 papers shown
Title
Can LLMs Be Trusted for Evaluating RAG Systems? A Survey of Methods and Datasets
Lorenz Brehme
Thomas Ströhle
Ruth Breu
199
0
0
28 Apr 2025
InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
Wei Ping
Ming-Yu Liu
Lawrence C. McAfee
Peng Xu
Bo Li
Mohammad Shoeybi
Bryan Catanzaro
RALM
90
52
0
11 Oct 2023
Benchmarking Large Language Models in Retrieval-Augmented Generation
Jiawei Chen
Hongyu Lin
Xianpei Han
Le Sun
3DV
RALM
92
307
0
04 Sep 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
458
4,444
0
09 Jun 2023
GPT-4 Technical Report
OpenAI OpenAI
OpenAI Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.5K
14,761
0
15 Mar 2023
Is ChatGPT a Good NLG Evaluator? A Preliminary Study
Jiaan Wang
Yunlong Liang
Fandong Meng
Zengkui Sun
Haoxiang Shi
Zhixu Li
Jinan Xu
Jianfeng Qu
Jie Zhou
LM&MA
ELM
ALM
AI4MH
138
472
0
07 Mar 2023
REPLUG: Retrieval-Augmented Black-Box Language Models
Weijia Shi
Sewon Min
Michihiro Yasunaga
Minjoon Seo
Rich James
M. Lewis
Luke Zettlemoyer
Wen-tau Yih
RALM
VLM
KELM
165
643
0
30 Jan 2023
Text Embeddings by Weakly-Supervised Contrastive Pre-training
Liang Wang
Nan Yang
Xiaolong Huang
Binxing Jiao
Linjun Yang
Daxin Jiang
Rangan Majumder
Furu Wei
VLM
249
623
0
07 Dec 2022
MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
Wei Zhao
Maxime Peyrard
Fei Liu
Yang Gao
Christian M. Meyer
Steffen Eger
194
602
0
05 Sep 2019
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
371
5,872
0
21 Apr 2019
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Zhilin Yang
Peng Qi
Saizheng Zhang
Yoshua Bengio
William W. Cohen
Ruslan Salakhutdinov
Christopher D. Manning
RALM
212
2,703
0
25 Sep 2018
FEVER: a large-scale dataset for Fact Extraction and VERification
James Thorne
Andreas Vlachos
Christos Christodoulopoulos
Arpit Mittal
HILM
174
1,667
0
14 Mar 2018
1