MUG-Eval: A Proxy Evaluation Framework for Multilingual Generation Capabilities in Any Language
arXiv:2505.14395 · 20 May 2025
Seyoung Song, Seogyeong Jeong, Eunsu Kim, Jiho Jin, Dongkwan Kim, Jay Shin, Alice Oh

Papers citing "MUG-Eval: A Proxy Evaluation Framework for Multilingual Generation Capabilities in Any Language"

11 citing papers

M-Prometheus: A Suite of Open Multilingual LLM Judges
José P. Pombal, Dongkeun Yoon, Patrick Fernandes, Ian Wu, Seungone Kim, Ricardo Rei, Graham Neubig, André F. T. Martins
07 Apr 2025 · ELM

CoGen: Learning from Feedback with Coupled Comprehension and Generation
Mustafa Omer Gul, Yoav Artzi
28 Aug 2024

Evaluating the Elementary Multilingual Capabilities of Large Language Models with MultiQ
Carolin Holtermann, Paul Röttger, Timm Dill, Anne Lauscher
06 Mar 2024 · ELM, LRM

Probing the Multi-turn Planning Capabilities of LLMs via 20 Question Games
Yizhe Zhang, Jiarui Lu, Navdeep Jaitly
02 Oct 2023 · LRM, ELM

Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?
Rishav Hada, Varun Gumma, Adrian de Wynter, Harshita Diddee, Mohamed Ahmed, Monojit Choudhury, Kalika Bali, Sunayana Sitaram
14 Sep 2023 · ALM, LM&MA, ELM

SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects
David Ifeoluwa Adelani, Hannah Liu, Xiaoyu Shen, Nikita Vassilyev, Jesujoba Oluwadara Alabi, Yanke Mao, Haonan Gao, Annie En-Shiun Lee
14 Sep 2023 · ELM

The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
Lucas Bandarkar, Davis Liang, Benjamin Muller, Mikel Artetxe, Satya Narayan Shukla, Don Husa, Naman Goyal, Abhinandan Krishnan, Luke Zettlemoyer, Madian Khabsa
31 Aug 2023

MD3: The Multi-Dialect Dataset of Dialogues
Jacob Eisenstein, Vinodkumar Prabhakaran, Clara E. Rivera, Dorottya Demszky, D. Sharma
19 May 2023

MEGA: Multilingual Evaluation of Generative AI
Kabir Ahuja, Harshita Diddee, Rishav Hada, Millicent Ochieng, Krithika Ramesh, ..., T. Ganu, Sameer Segal, Maxamed Axmed, Kalika Bali, Sunayana Sitaram
22 Mar 2023 · LM&MA, LRM, ELM

XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages
Tahmid Hasan, Abhik Bhattacharjee, Md. Saiful Islam, Kazi Samin Mubasshir, Yuan-Fang Li, Yong-Bin Kang, M. Rahman, Rifat Shahriyar
25 Jun 2021

The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation
Naman Goyal, Cynthia Gao, Vishrav Chaudhary, Peng-Jen Chen, Guillaume Wenzek, Da Ju, Sanjan Krishnan, Marc'Aurelio Ranzato, Francisco Guzman, Angela Fan
06 Jun 2021