2008.12009
A Survey of Evaluation Metrics Used for NLG Systems
27 August 2020
Ananya B. Sai
Akash Kumar Mohankumar
Mitesh M. Khapra
ELM
Papers citing
"A Survey of Evaluation Metrics Used for NLG Systems"
38 / 38 papers shown
Enhancing the Learning Experience: Using Vision-Language Models to Generate Questions for Educational Videos
Markos Stamatakis
Joshua Berger
Christian Wartena
Ralph Ewerth
Anett Hoppe
AI4Ed
41
0
0
03 May 2025
Prompting a Weighting Mechanism into LLM-as-a-Judge in Two-Step: A Case Study
Wenwen Xie
Gray Gwizdz
Dongji Feng
85
0
0
20 Feb 2025
Consistency of Responses and Continuations Generated by Large Language Models on Social Media
Wenlu Fan
Yichen Zhu
Chenyang Wang
Bin Wang
Wentao Xu
62
1
0
14 Jan 2025
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
120
67
0
25 Nov 2024
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
Ashmal Vayani
Dinura Dissanayake
Hasindri Watawana
Noor Ahsan
Nevasini Sasikumar
...
Monojit Choudhury
Ivan Laptev
Mubarak Shah
Salman Khan
Fahad A Khan
124
8
0
25 Nov 2024
Evaluation of Retrieval-Augmented Generation: A Survey
Hao Yu
Aoran Gan
Kai Zhang
Shiwei Tong
Qi Liu
Zhaofeng Liu
3DV
62
80
0
13 May 2024
Navigating the Path of Writing: Outline-guided Text Generation with Large Language Models
Yukyung Lee
Soonwon Ka
Bokyung Son
Pilsung Kang
Jaewook Kang
LLMAG
52
6
0
22 Apr 2024
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot Interaction
Jie Xu
Hanbo Zhang
Xinghang Li
Huaping Liu
Xuguang Lan
Tao Kong
LM&Ro
35
3
0
19 Feb 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM
LM&MA
57
29
0
02 Feb 2024
The Critique of Critique
Shichao Sun
Junlong Li
Weizhe Yuan
Ruifeng Yuan
Wenjie Li
Pengfei Liu
ELM
40
0
0
09 Jan 2024
Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI
Mahyar Abbasian
Elahe Khatibi
Iman Azimi
David Oniani
Zahra Shakeri Hossein Abad
...
Bryant Lin
Olivier Gevaert
Li-Jia Li
Ramesh C. Jain
Amir M. Rahmani
LM&MA
ELM
AI4MH
37
66
0
21 Sep 2023
AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap
Q. V. Liao
J. Vaughan
38
158
0
02 Jun 2023
Rethinking Model Evaluation as Narrowing the Socio-Technical Gap
Q. V. Liao
Ziang Xiao
ALM
ELM
43
29
0
01 Jun 2023
APPLS: Evaluating Evaluation Metrics for Plain Language Summarization
Yue Guo
Tal August
Gondy Leroy
T. Cohen
Lucy Lu Wang
57
9
0
23 May 2023
GEST: the Graph of Events in Space and Time as a Common Representation between Vision and Language
Mihai Masala
Nicolae Cudlenco
Traian Rebedea
Marius Leordeanu
14
0
0
22 May 2023
Angler: Helping Machine Translation Practitioners Prioritize Model Improvements
Samantha Robertson
Zijie J. Wang
Dominik Moritz
Mary Beth Kery
Fred Hohman
32
15
0
12 Apr 2023
Evaluating NLG systems: A brief introduction
Emiel van Miltenburg
29
0
0
29 Mar 2023
MAUVE Scores for Generative Models: Theory and Practice
Krishna Pillutla
Lang Liu
John Thickstun
Sean Welleck
Swabha Swayamdipta
Rowan Zellers
Sewoong Oh
Yejin Choi
Zaïd Harchaoui
EGVM
35
21
0
30 Dec 2022
Measuring the Measuring Tools: An Automatic Evaluation of Semantic Metrics for Text Corpora
George Kour
Samuel Ackerman
Orna Raz
E. Farchi
Boaz Carmeli
Ateret Anaby-Tavor
41
10
0
29 Nov 2022
Revisiting Grammatical Error Correction Evaluation and Beyond
Peiyuan Gong
Xuebo Liu
Heyan Huang
Min Zhang
26
16
0
03 Nov 2022
Quantum Natural Language Generation on Near-Term Devices
Amin Karamlou
Marcel Pfaffhauser
James R. Wootton
45
11
0
01 Nov 2022
Universal Evasion Attacks on Summarization Scoring
Wenchuan Mu
Kwan Hui Lim
AAML
32
1
0
25 Oct 2022
Revision Transformers: Instructing Language Models to Change their Values
Felix Friedrich
Wolfgang Stammer
P. Schramowski
Kristian Kersting
KELM
30
6
0
19 Oct 2022
DATScore: Evaluating Translation with Data Augmented Translations
Moussa Kamal Eddine
Guokan Shang
Michalis Vazirgiannis
41
5
0
12 Oct 2022
Abstractive Meeting Summarization: A Survey
Virgile Rennard
Guokan Shang
Julie Hunter
Michalis Vazirgiannis
32
15
0
08 Aug 2022
Target-Guided Dialogue Response Generation Using Commonsense and Data Augmentation
Prakhar Gupta
Harsh Jhamtani
Jeffrey P. Bigham
46
12
0
19 May 2022
Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications
Kaitlyn Zhou
Su Lin Blodgett
Adam Trischler
Hal Daumé
Kaheer Suleman
Alexandra Olteanu
ELM
99
26
0
13 May 2022
CounterGeDi: A controllable approach to generate polite, detoxified and emotional counterspeech
Punyajoy Saha
Kanishk Singh
Adarsh Kumar
Binny Mathew
Animesh Mukherjee
16
35
0
09 May 2022
Generating Scientific Claims for Zero-Shot Scientific Fact Checking
Dustin Wright
David Wadden
Kyle Lo
Bailey Kuehl
Arman Cohan
Isabelle Augenstein
Lucy Lu Wang
MedIm
45
54
0
24 Mar 2022
Towards Explainable Evaluation Metrics for Natural Language Generation
Christoph Leiter
Piyawat Lertvittayakumjorn
M. Fomicheva
Wei-Ye Zhao
Yang Gao
Steffen Eger
AAML
ELM
22
20
0
21 Mar 2022
Explainable AI (XAI): A Systematic Meta-Survey of Current Challenges and Future Opportunities
Waddah Saeed
C. Omlin
XAI
36
414
0
11 Nov 2021
BEAMetrics: A Benchmark for Language Generation Evaluation Evaluation
Thomas Scialom
Felix Hill
20
7
0
18 Oct 2021
Investigating Robustness of Dialog Models to Popular Figurative Language Constructs
Harsh Jhamtani
Varun Gangal
Eduard H. Hovy
Taylor Berg-Kirkpatrick
28
21
0
01 Oct 2021
Perturbation CheckLists for Evaluating NLG Evaluation Metrics
Ananya B. Sai
Tanay Dixit
D. Y. Sheth
S. Mohan
Mitesh M. Khapra
AAML
108
57
0
13 Sep 2021
Automatic Text Evaluation through the Lens of Wasserstein Barycenters
Pierre Colombo
Guillaume Staerman
Chloé Clavel
Pablo Piantanida
27
41
0
27 Aug 2021
Is GPT-3 Text Indistinguishable from Human Text? Scarecrow: A Framework for Scrutinizing Machine Text
Yao Dou
Maxwell Forbes
Rik Koncel-Kedziorski
Noah A. Smith
Yejin Choi
DeLMO
11
126
0
02 Jul 2021
MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers
Krishna Pillutla
Swabha Swayamdipta
Rowan Zellers
John Thickstun
Sean Welleck
Yejin Choi
Zaïd Harchaoui
37
343
0
02 Feb 2021
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
152
1,464
0
06 Jun 2016