arXiv:2005.03724
SUPERT: Towards New Frontiers in Unsupervised Evaluation Metrics for Multi-Document Summarization
7 May 2020
Yang Gao
Wei-Ye Zhao
Steffen Eger
ELM
Papers citing
"SUPERT: Towards New Frontiers in Unsupervised Evaluation Metrics for Multi-Document Summarization"
50 / 78 papers shown
Title
SEval-Ex: A Statement-Level Framework for Explainable Summarization Evaluation
Tanguy Herserant
Vincent Guigue
ELM
45
0
0
04 May 2025
HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving
Avinash Kumar
Shashank Nag
Jason Clemons
L. John
Poulami Das
31
0
0
14 Apr 2025
DeepSeek vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization?
Daniil Larionov
Sotaro Takeshita
Ran Zhang
Yanran Chen
Christoph Leiter
Zhipin Wang
Christian Greisinger
Steffen Eger
ReLM
ELM
LRM
74
1
0
10 Apr 2025
Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans?
Jeremy Barnes
Naiara Perez
Alba Bonet-Jover
Begoña Altuna
62
1
0
21 Mar 2025
Reference-free Evaluation Metrics for Text Generation: A Survey
Takumi Ito
Kees van Deemter
Jun Suzuki
ELM
41
2
0
21 Jan 2025
Speech vs. Transcript: Does It Matter for Human Annotators in Speech Summarization?
Roshan S. Sharma
Suwon Shon
Mark Lindsey
Hira Dhamyal
Rita Singh
Bhiksha Raj
56
1
0
12 Aug 2024
Large Language Models as Evaluators for Scientific Synthesis
Julia Evans
Jennifer D'Souza
Sören Auer
ELM
42
4
0
03 Jul 2024
PerSEval: Assessing Personalization in Text Summarizers
Sourish Dasgupta
Ankush Chander
Parth Borad
Isha Motiyani
Tanmoy Chakraborty
40
0
0
29 Jun 2024
A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models
Haopeng Zhang
Philip S. Yu
Jiawei Zhang
37
17
0
17 Jun 2024
Favi-Score: A Measure for Favoritism in Automated Preference Ratings for Generative AI Evaluation
Pius von Daniken
Jan Deriu
Don Tuggener
Mark Cieliebak
31
1
0
03 Jun 2024
JADS: A Framework for Self-supervised Joint Aspect Discovery and Summarization
Xiaobo Guo
Jay Desai
Srinivasan H. Sengamedu
AI4TS
46
0
0
28 May 2024
Do Language Models Enjoy Their Own Stories? Prompting Large Language Models for Automatic Story Evaluation
Cyril Chhun
Fabian M. Suchanek
Chloé Clavel
LRM
42
14
0
22 May 2024
LUNA: A Framework for Language Understanding and Naturalness Assessment
Marat Saidov
A. Bakalova
Ekaterina Taktasheva
Vladislav Mikhailov
Ekaterina Artemova
ELM
39
1
0
09 Jan 2024
LLMs as Narcissistic Evaluators: When Ego Inflates Evaluation Scores
Yiqi Liu
N. Moosavi
Chenghua Lin
ELM
30
48
0
16 Nov 2023
Controllable Text Summarization: Unraveling Challenges, Approaches, and Prospects -- A Survey
Ashok Urlana
Pruthwik Mishra
Tathagato Roy
Rahul Mishra
37
8
0
15 Nov 2023
GNAT: A General Narrative Alignment Tool
T. Pial
Steven Skiena
15
4
0
07 Nov 2023
The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics
Christoph Leiter
Juri Opitz
Daniel Deutsch
Yang Gao
Rotem Dror
Steffen Eger
ALM
LRM
ELM
40
31
0
30 Oct 2023
OpinSummEval: Revisiting Automated Evaluation for Opinion Summarization
Yuchen Shen
Xiaojun Wan
38
9
0
27 Oct 2023
BooookScore: A systematic exploration of book-length summarization in the era of LLMs
Yapei Chang
Kyle Lo
Tanya Goyal
Mohit Iyyer
ALM
26
106
0
01 Oct 2023
SocREval: Large Language Models with the Socratic Method for Reference-Free Reasoning Evaluation
Hangfeng He
Hongming Zhang
Dan Roth
LRM
ELM
ReLM
30
14
0
29 Sep 2023
OpenMSD: Towards Multilingual Scientific Documents Similarity Measurement
Yang Gao
Ji Ma
I. Korotkov
Keith B. Hall
Dana Alon
Donald Metzler
13
0
0
19 Sep 2023
Automatic Personalized Impression Generation for PET Reports Using Large Language Models
Xin Tie
Muheon Shin
Ali Pirasteh
Nevein Ibrahim
Zachary Huemann
...
K. M. Kelly
John W. Garrett
Junjie Hu
Steve Y. Cho
Tyler Bradshaw
LM&MA
27
10
0
18 Sep 2023
Redundancy Aware Multi-Reference Based Gainwise Evaluation of Extractive Summarization
Mousumi Akter
Shubhra (Santu) Karmaker
23
1
0
04 Aug 2023
Cross-lingual Cross-temporal Summarization: Dataset, Models, Evaluation
Ran Zhang
Jihed Ouni
Steffen Eger
32
6
0
22 Jun 2023
MISMATCH: Fine-grained Evaluation of Machine-generated Text with Mismatch Error Types
K. Murugesan
Sarathkrishna Swaminathan
Soham Dan
Subhajit Chaudhury
Chulaka Gunasekara
...
Ibrahim Abdelaziz
Achille Fokoue
Pavan Kapanipathi
Salim Roukos
Alexander G. Gray
42
5
0
18 Jun 2023
Correction of Errors in Preference Ratings from Automated Metrics for Text Generation
Jan Deriu
Pius von Daniken
Don Tuggener
Mark Cieliebak
29
2
0
06 Jun 2023
UMSE: Unified Multi-scenario Summarization Evaluation
Shen Gao
Zhitao Yao
Chongyang Tao
Preslav Nakov
Pengjie Ren
Z. Ren
Zhumin Chen
30
5
0
26 May 2023
Evaluating Evaluation Metrics: A Framework for Analyzing NLG Evaluation Metrics using Measurement Theory
Ziang Xiao
Susu Zhang
Vivian Lai
Q. V. Liao
ELM
35
24
0
24 May 2023
Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations
Lucy Lu Wang
Yulia Otmakhova
Jay DeYoung
Thinh Hung Truong
Bailey Kuehl
Erin Bransom
Byron C. Wallace
113
20
0
23 May 2023
Sample Efficient Multimodal Semantic Augmentation for Incremental Summarization
Sumanta Bhattacharyya
R. Manuvinakurike
Sahisnu Mazumder
Saurav Sahay
VLM
21
0
0
08 Mar 2023
A comprehensive review of automatic text summarization techniques: method, data, evaluation and coding
D. Cajueiro
A. G. Nery
Igor Tavares
Maísa Kely de Melo
Silvia A. dos Reis
Weigang Li
V. R. R. Celestino
33
15
0
04 Jan 2023
DocAsRef: An Empirical Study on Repurposing Reference-Based Summary Quality Metrics Reference-Freely
F. S. Bao
Ruixuan Tu
Ge Luo
Yinfei Yang
Hebi Li
Minghui Qiu
Youbiao He
Cen Chen
21
2
0
20 Dec 2022
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation
Yixin Liu
Alexander R. Fabbri
Pengfei Liu
Yilun Zhao
Linyong Nan
...
Simeng Han
Chenyu You
Chien-Sheng Wu
Caiming Xiong
Dragomir R. Radev
ALM
26
133
0
15 Dec 2022
Towards Interpretable Summary Evaluation via Allocation of Contextual Embeddings to Reference Text Topics
Ben Schaper
Christopher Lohse
Marcell Streile
Andrea Giovannini
Richard Osuala
24
1
0
25 Oct 2022
On the Limitations of Reference-Free Evaluations of Generated Text
Daniel Deutsch
Rotem Dror
Dan Roth
40
45
0
22 Oct 2022
News Summarization and Evaluation in the Era of GPT-3
Tanya Goyal
Junyi Jessy Li
Greg Durrett
ELM
31
387
0
26 Sep 2022
Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation
Cyril Chhun
Pierre Colombo
Chloé Clavel
Fabian M. Suchanek
53
50
0
24 Aug 2022
MENLI: Robust Evaluation Metrics from Natural Language Inference
Yanran Chen
Steffen Eger
32
16
0
15 Aug 2022
SummScore: A Comprehensive Evaluation Metric for Summary Quality Based on Cross-Encoder
Wuhang Lin
Shasha Li
Chen Zhang
Bing Ji
Jie Yu
Jun Ma
Zibo Yi
11
6
0
11 Jul 2022
An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics
Huan Yee Koh
Jiaxin Ju
Ming Liu
Shirui Pan
81
122
0
03 Jul 2022
Context Matters for Image Descriptions for Accessibility: Challenges for Referenceless Evaluation Metrics
Elisa Kreiss
Cynthia L. Bennett
Shayan Hooshmand
E. Zelikman
Meredith Ringel Morris
Christopher Potts
48
27
0
21 May 2022
Repro: An Open-Source Library for Improving the Reproducibility and Usability of Publicly Available Research Code
Daniel Deutsch
Dan Roth
AI4CE
45
2
0
29 Apr 2022
Entity-driven Fact-aware Abstractive Summarization of Biomedical Literature
Amanuel Alambo
Tanvi Banerjee
K. Thirunarayan
M. Raymer
MedIm
21
7
0
30 Mar 2022
PeerSum: A Peer Review Dataset for Abstractive Multi-document Summarization
Miao Li
Jianzhong Qi
Jey Han Lau
14
2
0
03 Mar 2022
USCORE: An Effective Approach to Fully Unsupervised Evaluation Metrics for Machine Translation
Jonas Belouadi
Steffen Eger
33
20
0
21 Feb 2022
DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence
Wei-Ye Zhao
Michael Strube
Steffen Eger
27
37
0
26 Jan 2022
WIDAR -- Weighted Input Document Augmented ROUGE
Raghav Jain
Vaibhav Mavi
Anubhav Jangra
S. Saha
16
4
0
23 Jan 2022
Consistency and Coherence from Points of Contextual Similarity
Oleg V. Vasilyev
John Bohannon
HILM
33
1
0
22 Dec 2021
Global Explainability of BERT-Based Evaluation Metrics by Disentangling along Linguistic Factors
Marvin Kaster
Wei-Ye Zhao
Steffen Eger
33
24
0
08 Oct 2021
The Eval4NLP Shared Task on Explainable Quality Estimation: Overview and Results
M. Fomicheva
Piyawat Lertvittayakumjorn
Wei-Ye Zhao
Steffen Eger
Yang Gao
ELM
24
39
0
08 Oct 2021