2410.13716
Cited By
MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems
17 October 2024
Nandan Thakur
Suleman Kazi
Ge Luo
Jimmy J. Lin
Amin Ahmad
VLM
RALM
Papers citing
"MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems"
50 / 53 papers shown
XRAG: Cross-lingual Retrieval-Augmented Generation
Wei Liu
Sony Trenous
Leonardo F. R. Ribeiro
Bill Byrne
Felix Hieber
RALM
92
0
0
15 May 2025
Can LLMs Be Trusted for Evaluating RAG Systems? A Survey of Methods and Datasets
Lorenz Brehme
Thomas Ströhle
Ruth Breu
176
0
0
28 Apr 2025
Chatbot Arena Meets Nuggets: Towards Explanations and Diagnostics in the Evaluation of LLM Responses
Sahel Sharifymoghaddam
Shivani Upadhyay
Nandan Thakur
Ronak Pradeep
Jimmy Lin
RALM
145
0
0
28 Apr 2025
The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models
Ronak Pradeep
Nandan Thakur
Shivani Upadhyay
Daniel Fernando Campos
Nick Craswell
Jimmy Lin
52
2
0
21 Apr 2025
Multilingual Retrieval-Augmented Generation for Knowledge-Intensive Task
Leonardo Ranaldi
Barry Haddow
Alexandra Birch
RALM
115
1
0
04 Apr 2025
On the Consistency of Multilingual Context Utilization in Retrieval-Augmented Generation
Jirui Qi
Raquel Fernández
Arianna Bisazza
RALM
110
0
0
01 Apr 2025
Initial Nugget Evaluation Results for the TREC 2024 RAG Track with the AutoNuggetizer Framework
Ronak Pradeep
Nandan Thakur
Shivani Upadhyay
Daniel Fernando Campos
Nick Craswell
Jimmy J. Lin
49
11
0
14 Nov 2024
Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track
Ronak Pradeep
Nandan Thakur
Sahel Sharifymoghaddam
Eric Zhang
Ryan Nguyen
Daniel Campos
Nick Craswell
Jimmy Lin
86
15
0
24 Jun 2024
Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework
Zackary Rackauckas
Arthur Camara
Jakub Zavrel
60
10
0
20 Jun 2024
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Aman Singh Thakur
Kartik Choudhary
Venkat Srinik Ramayapally
Sankaran Vaidyanathan
Dieuwke Hupkes
ELM
ALM
121
64
0
18 Jun 2024
CRAG -- Comprehensive RAG Benchmark
Xiao Yang
Kai Sun
Hao Xin
Yushi Sun
Nikita Bhalla
...
Nirav Shah
Rakesh Wanga
Anuj Kumar
Wen-tau Yih
Xin Luna Dong
65
28
0
07 Jun 2024
MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures
Jinjie Ni
Fuzhao Xue
Xiang Yue
Yuntian Deng
Mahir Shah
Kabir Jain
Graham Neubig
Yang You
ELM
65
45
0
03 Jun 2024
Aya 23: Open Weight Releases to Further Multilingual Progress
Viraat Aryabumi
John Dang
Dwarak Talupuru
Saurabh Dash
David Cairuz
...
Aidan Gomez
Phil Blunsom
Marzieh Fadaee
Ahmet Üstün
Sara Hooker
OSLM
69
85
0
23 May 2024
On the Evaluation of Machine-Generated Reports
James Mayfield
Eugene Yang
Dawn J Lawrie
Sean MacAvaney
Paul McNamee
...
Orion Weller
Efsun Kayi
Kate Sanders
Marc Mason
Noah Hibbler
ALM
134
14
0
02 May 2024
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Marah Abdin
Sam Ade Jacobs
A. A. Awan
J. Aneja
Ahmed Hassan Awadallah
...
Li Zhang
Yi Zhang
Yue Zhang
Yunan Zhang
Xiren Zhou
LRM
ALM
123
1,208
0
22 Apr 2024
LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery
Samuel R. Bowman
Shi Feng
84
185
0
15 Apr 2024
CLAPNQ: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems
Sara Rosenthal
Avirup Sil
Radu Florian
Salim Roukos
82
13
0
02 Apr 2024
RAFT: Adapting Language Model to Domain Specific RAG
Tianjun Zhang
Shishir G. Patil
Naman Jain
Sheng Shen
Matei A. Zaharia
Ion Stoica
Joseph E. Gonzalez
RALM
79
200
0
15 Mar 2024
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Wei-Lin Chiang
Lianmin Zheng
Ying Sheng
Anastasios Nikolas Angelopoulos
Tianle Li
...
Hao Zhang
Banghua Zhu
Michael I. Jordan
Joseph E. Gonzalez
Ion Stoica
OSLM
142
570
0
07 Mar 2024
Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement
Wenda Xu
Guanglei Zhu
Xuandong Zhao
Liangming Pan
Lei Li
Wenjie Wang
62
60
0
18 Feb 2024
Humans or LLMs as the Judge? A Study on Judgement Biases
Guiming Hardy Chen
Shunian Chen
Ziche Liu
Feng Jiang
Benyou Wang
103
107
0
16 Feb 2024
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation
Jianlv Chen
Shitao Xiao
Peitian Zhang
Kun Luo
Defu Lian
Zheng Liu
619
409
0
05 Feb 2024
ChatQA: Surpassing GPT-4 on Conversational QA and RAG
Zihan Liu
Ming-Yu Liu
Rajarshi Roy
Peng Xu
Chankyu Lee
Mohammad Shoeybi
Bryan Catanzaro
ALM
RALM
AI4MH
72
47
0
18 Jan 2024
RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models
Cheng Niu
Yuanhao Wu
Juno Zhu
Siliang Xu
Kashun Shum
Randy Zhong
Juntong Song
Tong Zhang
HILM
72
103
0
31 Dec 2023
Retrieval-Augmented Generation for Large Language Models: A Survey
Yunfan Gao
Yun Xiong
Xinyu Gao
Kangxiang Jia
Jinliu Pan
Yuxi Bi
Yi Dai
Jiawei Sun
Meng Wang
Haofen Wang
3DV
RALM
147
1,767
1
18 Dec 2023
Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval
Nandan Thakur
Jianmo Ni
Gustavo Hernández Ábrego
John Wieting
Jimmy J. Lin
Daniel Cer
RALM
84
13
0
10 Nov 2023
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation
Tu Vu
Mohit Iyyer
Xuezhi Wang
Noah Constant
Jerry W. Wei
...
Chris Tar
Yun-hsuan Sung
Denny Zhou
Quoc Le
Thang Luong
KELM
HILM
LRM
92
215
0
05 Oct 2023
Efficient Memory Management for Large Language Model Serving with PagedAttention
Woosuk Kwon
Zhuohan Li
Siyuan Zhuang
Ying Sheng
Lianmin Zheng
Cody Hao Yu
Joseph E. Gonzalez
Haotong Zhang
Ion Stoica
VLM
175
2,197
0
12 Sep 2023
Benchmarking Large Language Models in Retrieval-Augmented Generation
Jiawei Chen
Hongyu Lin
Xianpei Han
Le Sun
3DV
RALM
55
294
0
04 Sep 2023
HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution
Ehsan Kamalloo
A. Jafari
Xinyu Crystina Zhang
Nandan Thakur
Jimmy J. Lin
50
43
0
31 Jul 2023
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Tri Dao
LRM
110
1,277
0
17 Jul 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
344
4,298
0
09 Jun 2023
Large Language Models are not Fair Evaluators
Peiyi Wang
Lei Li
Liang Chen
Zefan Cai
Dawei Zhu
Binghuai Lin
Yunbo Cao
Qi Liu
Tianyu Liu
Zhifang Sui
ALM
112
561
0
29 May 2023
Enabling Large Language Models to Generate Text with Citations
Tianyu Gao
Howard Yen
Jiatong Yu
Danqi Chen
LM&MA
HILM
80
348
0
24 May 2023
SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation
Elizabeth Clark
Shruti Rijhwani
Sebastian Gehrmann
Joshua Maynez
Roee Aharoni
Vitaly Nikolaev
Thibault Sellam
Aditya Siddhant
Dipanjan Das
Ankur P. Parikh
46
41
0
22 May 2023
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELM
ALM
LM&MA
163
1,187
0
29 Mar 2023
Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking
Keshav Santhanam
Jon Saad-Falcon
M. Franz
Omar Khattab
Avirup Sil
Radu Florian
Md Arafat Sultan
Salim Roukos
Matei A. Zaharia
Christopher Potts
OffRL
64
10
0
02 Dec 2022
Improving language models by retrieving from trillions of tokens
Sebastian Borgeaud
A. Mensch
Jordan Hoffmann
Trevor Cai
Eliza Rutherford
...
Simon Osindero
Karen Simonyan
Jack W. Rae
Erich Elsen
Laurent Sifre
KELM
RALM
242
1,085
0
08 Dec 2021
DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
Pengcheng He
Jianfeng Gao
Weizhu Chen
151
1,192
0
18 Nov 2021
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models
Nandan Thakur
Nils Reimers
Andreas Rucklé
Abhishek Srivastava
Iryna Gurevych
VLM
425
1,034
0
17 Apr 2021
Dynabench: Rethinking Benchmarking in NLP
Douwe Kiela
Max Bartolo
Yixin Nie
Divyansh Kaushik
Atticus Geiger
...
Pontus Stenetorp
Robin Jia
Joey Tianyi Zhou
Christopher Potts
Adina Williams
198
407
0
07 Apr 2021
MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers
Krishna Pillutla
Swabha Swayamdipta
Rowan Zellers
John Thickstun
Sean Welleck
Yejin Choi
Zaïd Harchaoui
98
360
0
02 Feb 2021
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
Gautier Izacard
Edouard Grave
RALM
119
1,171
0
02 Jul 2020
ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
Omar Khattab
Matei A. Zaharia
136
1,364
0
27 Apr 2020
Dense Passage Retrieval for Open-Domain Question Answering
Vladimir Karpukhin
Barlas Oğuz
Sewon Min
Patrick Lewis
Ledell Yu Wu
Sergey Edunov
Danqi Chen
Wen-tau Yih
RALM
180
3,755
0
10 Apr 2020
TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages
J. Clark
Eunsol Choi
Michael Collins
Dan Garrette
Tom Kwiatkowski
Vitaly Nikolaev
J. Palomaki
141
609
0
10 Mar 2020
REALM: Retrieval-Augmented Language Model Pre-Training
Kelvin Guu
Kenton Lee
Zora Tung
Panupong Pasupat
Ming-Wei Chang
RALM
122
2,095
0
10 Feb 2020
Neural Machine Translation: A Review and Survey
Felix Stahlberg
3DV
AI4TS
MedIm
67
325
0
04 Dec 2019
Unsupervised Cross-lingual Representation Learning at Scale
Alexis Conneau
Kartikay Khandelwal
Naman Goyal
Vishrav Chaudhary
Guillaume Wenzek
Francisco Guzmán
Edouard Grave
Myle Ott
Luke Zettlemoyer
Veselin Stoyanov
212
6,555
0
05 Nov 2019
Generalization through Memorization: Nearest Neighbor Language Models
Urvashi Khandelwal
Omer Levy
Dan Jurafsky
Luke Zettlemoyer
M. Lewis
RALM
149
837
0
01 Nov 2019