MTRAG-UN: A Benchmark for Open Challenges in Multi-Turn RAG Conversations

26 February 2026

Sara Rosenthal

Yannis Katsis

Vraj Shah

Lihong He

Lucian Popa

Marina Danilevsky

RALM

LRM

Main: 3 Pages

7 Figures

Bibliography: 2 Pages

6 Tables

Appendix: 2 Pages

Abstract

We present MTRAG-UN, a benchmark for exploring open challenges in multi-turn retrieval-augmented generation, a popular use of large language models. We release a benchmark of 666 tasks containing over 2,800 conversation turns across 6 domains with accompanying corpora. Our experiments show that retrieval and generation models continue to struggle on conversations with UNanswerable, UNderspecified, and NONstandalone questions and UNclear responses. Our benchmark is available at this https URL.
