ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2008.12009
  4. Cited By
A Survey of Evaluation Metrics Used for NLG Systems

A Survey of Evaluation Metrics Used for NLG Systems

27 August 2020
Ananya B. Sai
Akash Kumar Mohankumar
Mitesh M. Khapra
    ELM
ArXivPDFHTML

Papers citing "A Survey of Evaluation Metrics Used for NLG Systems"

39 / 39 papers shown
Title
Enhancing the Learning Experience: Using Vision-Language Models to Generate Questions for Educational Videos
Enhancing the Learning Experience: Using Vision-Language Models to Generate Questions for Educational Videos
Markos Stamatakis
Joshua Berger
Christian Wartena
Ralph Ewerth
Anett Hoppe
AI4Ed
41
0
0
03 May 2025
Prompting a Weighting Mechanism into LLM-as-a-Judge in Two-Step: A Case Study
Prompting a Weighting Mechanism into LLM-as-a-Judge in Two-Step: A Case Study
Wenwen Xie
Gray Gwizdz
Dongji Feng
85
0
0
20 Feb 2025
Consistency of Responses and Continuations Generated by Large Language Models on Social Media
Consistency of Responses and Continuations Generated by Large Language Models on Social Media
Wenlu Fan
Bo Li
Chenyang Wang
Bin Wang
Wentao Xu
62
1
0
14 Jan 2025
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
120
67
0
25 Nov 2024
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
Ashmal Vayani
Dinura Dissanayake
Hasindri Watawana
Noor Ahsan
Nevasini Sasikumar
...
Monojit Choudhury
Ivan Laptev
Mubarak Shah
Salman Khan
Fahad A Khan
124
8
0
25 Nov 2024
Evaluation of Retrieval-Augmented Generation: A Survey
Evaluation of Retrieval-Augmented Generation: A Survey
Hao Yu
Aoran Gan
Kai Zhang
Shiwei Tong
Qi Liu
Zhaofeng Liu
3DV
62
82
0
13 May 2024
Navigating the Path of Writing: Outline-guided Text Generation with Large Language Models
Navigating the Path of Writing: Outline-guided Text Generation with Large Language Models
Yukyung Lee
Soonwon Ka
Bokyung Son
Pilsung Kang
Jaewook Kang
LLMAG
52
6
0
22 Apr 2024
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot
  Interaction
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot Interaction
Jie Xu
Hanbo Zhang
Xinghang Li
Huaping Liu
Xuguang Lan
Tao Kong
LM&Ro
35
3
0
19 Feb 2024
LLM-based NLG Evaluation: Current Status and Challenges
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM
LM&MA
60
29
0
02 Feb 2024
The Critique of Critique
The Critique of Critique
Shichao Sun
Junlong Li
Weizhe Yuan
Ruifeng Yuan
Wenjie Li
Pengfei Liu
ELM
40
0
0
09 Jan 2024
Foundation Metrics for Evaluating Effectiveness of Healthcare
  Conversations Powered by Generative AI
Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI
Mahyar Abbasian
Elahe Khatibi
Iman Azimi
David Oniani
Zahra Shakeri Hossein Abad
...
Bryant Lin
Olivier Gevaert
Li-Jia Li
Ramesh C. Jain
Amir M. Rahmani
LM&MA
ELM
AI4MH
37
66
0
21 Sep 2023
Measuring and Improving Chain-of-Thought Reasoning in Vision-Language
  Models
Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models
Yangyi Chen
Karan Sikka
Michael Cogswell
Heng Ji
Ajay Divakaran
LRM
36
24
0
08 Sep 2023
AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap
AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap
Q. V. Liao
J. Vaughan
38
158
0
02 Jun 2023
Rethinking Model Evaluation as Narrowing the Socio-Technical Gap
Rethinking Model Evaluation as Narrowing the Socio-Technical Gap
Q. V. Liao
Ziang Xiao
ALM
ELM
47
29
0
01 Jun 2023
APPLS: Evaluating Evaluation Metrics for Plain Language Summarization
APPLS: Evaluating Evaluation Metrics for Plain Language Summarization
Yue Guo
Tal August
Gondy Leroy
T. Cohen
Lucy Lu Wang
57
9
0
23 May 2023
GEST: the Graph of Events in Space and Time as a Common Representation
  between Vision and Language
GEST: the Graph of Events in Space and Time as a Common Representation between Vision and Language
Mihai Masala
Nicolae Cudlenco
Traian Rebedea
Marius Leordeanu
14
0
0
22 May 2023
Angler: Helping Machine Translation Practitioners Prioritize Model
  Improvements
Angler: Helping Machine Translation Practitioners Prioritize Model Improvements
Samantha Robertson
Zijie J. Wang
Dominik Moritz
Mary Beth Kery
Fred Hohman
32
15
0
12 Apr 2023
Evaluating NLG systems: A brief introduction
Evaluating NLG systems: A brief introduction
Emiel van Miltenburg
31
0
0
29 Mar 2023
MAUVE Scores for Generative Models: Theory and Practice
MAUVE Scores for Generative Models: Theory and Practice
Krishna Pillutla
Lang Liu
John Thickstun
Sean Welleck
Swabha Swayamdipta
Rowan Zellers
Sewoong Oh
Yejin Choi
Zaïd Harchaoui
EGVM
35
21
0
30 Dec 2022
Measuring the Measuring Tools: An Automatic Evaluation of Semantic
  Metrics for Text Corpora
Measuring the Measuring Tools: An Automatic Evaluation of Semantic Metrics for Text Corpora
George Kour
Samuel Ackerman
Orna Raz
E. Farchi
Boaz Carmeli
Ateret Anaby-Tavor
41
10
0
29 Nov 2022
Revisiting Grammatical Error Correction Evaluation and Beyond
Revisiting Grammatical Error Correction Evaluation and Beyond
Peiyuan Gong
Xuebo Liu
Heyan Huang
Min Zhang
28
16
0
03 Nov 2022
Quantum Natural Language Generation on Near-Term Devices
Quantum Natural Language Generation on Near-Term Devices
Amin Karamlou
Marcel Pfaffhauser
James R. Wootton
45
11
0
01 Nov 2022
Universal Evasion Attacks on Summarization Scoring
Universal Evasion Attacks on Summarization Scoring
Wenchuan Mu
Kwan Hui Lim
AAML
38
1
0
25 Oct 2022
Revision Transformers: Instructing Language Models to Change their
  Values
Revision Transformers: Instructing Language Models to Change their Values
Felix Friedrich
Wolfgang Stammer
P. Schramowski
Kristian Kersting
KELM
30
6
0
19 Oct 2022
DATScore: Evaluating Translation with Data Augmented Translations
DATScore: Evaluating Translation with Data Augmented Translations
Moussa Kamal Eddine
Guokan Shang
Michalis Vazirgiannis
44
5
0
12 Oct 2022
Abstractive Meeting Summarization: A Survey
Abstractive Meeting Summarization: A Survey
Virgile Rennard
Guokan Shang
Julie Hunter
Michalis Vazirgiannis
34
15
0
08 Aug 2022
Target-Guided Dialogue Response Generation Using Commonsense and Data
  Augmentation
Target-Guided Dialogue Response Generation Using Commonsense and Data Augmentation
Prakhar Gupta
Harsh Jhamtani
Jeffrey P. Bigham
49
12
0
19 May 2022
Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and
  Their Implications
Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications
Kaitlyn Zhou
Su Lin Blodgett
Adam Trischler
Hal Daumé
Kaheer Suleman
Alexandra Olteanu
ELM
99
26
0
13 May 2022
CounterGeDi: A controllable approach to generate polite, detoxified and
  emotional counterspeech
CounterGeDi: A controllable approach to generate polite, detoxified and emotional counterspeech
Punyajoy Saha
Kanishk Singh
Adarsh Kumar
Binny Mathew
Animesh Mukherjee
16
35
0
09 May 2022
Generating Scientific Claims for Zero-Shot Scientific Fact Checking
Generating Scientific Claims for Zero-Shot Scientific Fact Checking
Dustin Wright
David Wadden
Kyle Lo
Bailey Kuehl
Arman Cohan
Isabelle Augenstein
Lucy Lu Wang
MedIm
45
54
0
24 Mar 2022
Towards Explainable Evaluation Metrics for Natural Language Generation
Towards Explainable Evaluation Metrics for Natural Language Generation
Christoph Leiter
Piyawat Lertvittayakumjorn
M. Fomicheva
Wei-Ye Zhao
Yang Gao
Steffen Eger
AAML
ELM
24
20
0
21 Mar 2022
Explainable AI (XAI): A Systematic Meta-Survey of Current Challenges and
  Future Opportunities
Explainable AI (XAI): A Systematic Meta-Survey of Current Challenges and Future Opportunities
Waddah Saeed
C. Omlin
XAI
36
415
0
11 Nov 2021
BEAMetrics: A Benchmark for Language Generation Evaluation Evaluation
BEAMetrics: A Benchmark for Language Generation Evaluation Evaluation
Thomas Scialom
Felix Hill
20
7
0
18 Oct 2021
Investigating Robustness of Dialog Models to Popular Figurative Language
  Constructs
Investigating Robustness of Dialog Models to Popular Figurative Language Constructs
Harsh Jhamtani
Varun Gangal
Eduard H. Hovy
Taylor Berg-Kirkpatrick
28
21
0
01 Oct 2021
Perturbation CheckLists for Evaluating NLG Evaluation Metrics
Perturbation CheckLists for Evaluating NLG Evaluation Metrics
Ananya B. Sai
Tanay Dixit
D. Y. Sheth
S. Mohan
Mitesh M. Khapra
AAML
110
57
0
13 Sep 2021
Automatic Text Evaluation through the Lens of Wasserstein Barycenters
Automatic Text Evaluation through the Lens of Wasserstein Barycenters
Pierre Colombo
Guillaume Staerman
Chloé Clavel
Pablo Piantanida
27
41
0
27 Aug 2021
Is GPT-3 Text Indistinguishable from Human Text? Scarecrow: A Framework
  for Scrutinizing Machine Text
Is GPT-3 Text Indistinguishable from Human Text? Scarecrow: A Framework for Scrutinizing Machine Text
Yao Dou
Maxwell Forbes
Rik Koncel-Kedziorski
Noah A. Smith
Yejin Choi
DeLMO
11
126
0
02 Jul 2021
MAUVE: Measuring the Gap Between Neural Text and Human Text using
  Divergence Frontiers
MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers
Krishna Pillutla
Swabha Swayamdipta
Rowan Zellers
John Thickstun
Sean Welleck
Yejin Choi
Zaïd Harchaoui
37
343
0
02 Feb 2021
Multimodal Compact Bilinear Pooling for Visual Question Answering and
  Visual Grounding
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
152
1,464
0
06 Jun 2016
1