Re-evaluating Evaluation in Text Summarization

14 October 2020
Manik Bhandari
Pranav Narayan Gour
A. Ashfaq
Pengfei Liu
Graham Neubig

Papers citing "Re-evaluating Evaluation in Text Summarization"

50 / 59 papers shown
Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans?
Jeremy Barnes
Naiara Perez
Alba Bonet-Jover
Begoña Altuna
62
1
0
21 Mar 2025
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation
Mingqi Gao
Xinyu Hu
Li Lin
Xiaojun Wan
28
1
0
28 Jan 2025
Learning to Summarize from LLM-generated Feedback
Hwanjun Song
Taewon Yun
Yuho Lee
Jihwan Oh
Gihun Lee
Jason (Jinglun) Cai
Hang Su
73
4
0
28 Jan 2025
MIRROR: A Novel Approach for the Automated Evaluation of Open-Ended Question Generation
Aniket Deroy
Subhankar Maity
Sudeshna Sarkar
LLMAG
LRM
41
3
0
16 Oct 2024
CREAM: Comparison-Based Reference-Free ELO-Ranked Automatic Evaluation for Meeting Summarization
Ziwei Gong
Lin Ai
Harshsaiprasad Deshpande
Alexander Johnson
Emmy Phung
Zehui Wu
Ahmad Emami
Julia Hirschberg
44
2
0
17 Sep 2024
Query-Guided Self-Supervised Summarization of Nursing Notes
Ya Gao
H. Moen
S. Koivusalo
M. Koskinen
Pekka Marttinen
44
1
0
04 Jul 2024
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems
Philippe Laban
Alexander R. Fabbri
Caiming Xiong
Chien-Sheng Wu
RALM
51
41
0
01 Jul 2024
FineSurE: Fine-grained Summarization Evaluation using LLMs
Hwanjun Song
Hang Su
Igor Shalyminov
Jason (Jinglun) Cai
Saab Mansour
HILM
41
31
0
01 Jul 2024
Are We Done with MMLU?
Aryo Pradipta Gema
Joshua Ong Jun Leang
Giwon Hong
Alessio Devoto
Alberto Carlo Maria Mancino
...
R. McHardy
Joshua Harris
Jean Kaddour
Emile van Krieken
Pasquale Minervini
ELM
60
31
0
06 Jun 2024
On the Role of Summary Content Units in Text Summarization Evaluation
Marcel Nawrath
Agnieszka Nowak
Tristan Ratz
Danilo C. Walenta
Juri Opitz
...
Sebastian Gehrmann
Saad Mahamood
Miruna Clinciu
Khyathi Raghavi Chandu
Yufang Hou
ELM
29
5
0
02 Apr 2024
Rethinking Scientific Summarization Evaluation: Grounding Explainable Metrics on Facet-aware Benchmark
Preslav Nakov
Tairan Wang
Qingqing Zhu
Taicheng Guo
Shen Gao
Zhiyong Lu
Xin Gao
Xiangliang Zhang
80
2
0
22 Feb 2024
Event-Keyed Summarization
William Gantt
Alexander Martin
Pavlo Kuchmiichuk
Aaron Steven White
30
1
0
10 Feb 2024
Evaluating Generative Ad Hoc Information Retrieval
Lukas Gienapp
Harrisen Scells
Niklas Deckers
Janek Bevendorff
Shuai Wang
...
Maik Fröbe
Guido Zuccon
Benno Stein
Matthias Hagen
Martin Potthast
RALM
45
11
0
08 Nov 2023
OpinSummEval: Revisiting Automated Evaluation for Opinion Summarization
Yuchen Shen
Xiaojun Wan
38
9
0
27 Oct 2023
Generative Judge for Evaluating Alignment
Junlong Li
Shichao Sun
Weizhe Yuan
Run-Ze Fan
Hai Zhao
Pengfei Liu
ELM
ALM
35
79
0
09 Oct 2023
The Trickle-down Impact of Reward (In-)consistency on RLHF
Lingfeng Shen
Sihao Chen
Linfeng Song
Lifeng Jin
Baolin Peng
Haitao Mi
Daniel Khashabi
Dong Yu
40
21
0
28 Sep 2023
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
86
607
0
23 May 2023
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks
Anas Himmi
Ekhine Irurozki
Nathan Noiry
Stéphan Clémençon
Pierre Colombo
34
5
0
17 May 2023
Discourse over Discourse: The Need for an Expanded Pragmatic Focus in Conversational AI
S. M. Seals
V. Shalin
29
4
0
27 Apr 2023
Is ChatGPT a Good NLG Evaluator? A Preliminary Study
Jiaan Wang
Yunlong Liang
Fandong Meng
Zengkui Sun
Haoxiang Shi
Zhixu Li
Jinan Xu
Jianfeng Qu
Jie Zhou
LM&MA
ELM
ALM
AI4MH
62
446
0
07 Mar 2023
Towards Interpretable and Efficient Automatic Reference-Based Summarization Evaluation
Yixin Liu
Alexander R. Fabbri
Yilun Zhao
Pengfei Liu
Chenyu You
Chien-Sheng Wu
Caiming Xiong
Dragomir R. Radev
15
27
0
07 Mar 2023
BMX: Boosting Natural Language Generation Metrics with Explainability
Christoph Leiter
Hoang-Quan Nguyen
Steffen Eger
ELM
24
0
0
20 Dec 2022
Toward Human-Like Evaluation for Natural Language Generation with Error Analysis
Qingyu Lu
Liang Ding
Liping Xie
Kanjian Zhang
Derek F. Wong
Dacheng Tao
ELM
ALM
36
14
0
20 Dec 2022
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation
Yixin Liu
Alexander R. Fabbri
Pengfei Liu
Yilun Zhao
Linyong Nan
...
Simeng Han
Chenyu You
Chien-Sheng Wu
Caiming Xiong
Dragomir R. Radev
ALM
26
133
0
15 Dec 2022
A Survey on Medical Document Summarization
Raghav Jain
Anubhav Jangra
S. Saha
Adam Jatowt
3DGS
MedIm
42
19
0
03 Dec 2022
How Far are We from Robust Long Abstractive Summarization?
Huan Yee Koh
Jiaxin Ju
He Zhang
Ming Liu
Shirui Pan
HILM
31
39
0
30 Oct 2022
Towards Interpretable Summary Evaluation via Allocation of Contextual Embeddings to Reference Text Topics
Ben Schaper
Christopher Lohse
Marcell Streile
Andrea Giovannini
Richard Osuala
24
1
0
25 Oct 2022
BERTScore is Unfair: On Social Bias in Language Model-Based Metrics for Text Generation
Tianxiang Sun
Junliang He
Xipeng Qiu
Xuanjing Huang
24
44
0
14 Oct 2022
DATScore: Evaluating Translation with Data Augmented Translations
Moussa Kamal Eddine
Guokan Shang
Michalis Vazirgiannis
44
5
0
12 Oct 2022
Readability Controllable Biomedical Document Summarization
Zheheng Luo
Qianqian Xie
Sophia Ananiadou
42
36
0
10 Oct 2022
Generative Language Models for Paragraph-Level Question Generation
Asahi Ushio
Fernando Alva-Manchego
Jose Camacho-Collados
ELM
13
45
0
08 Oct 2022
The Glass Ceiling of Automatic Evaluation in Natural Language Generation
Pierre Colombo
Maxime Peyrard
Nathan Noiry
Robert West
Pablo Piantanida
49
11
0
31 Aug 2022
Podcast Summary Assessment: A Resource for Evaluating Summary Assessment Methods
Potsawee Manakul
Mark Gales
15
5
0
28 Aug 2022
Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation
Cyril Chhun
Pierre Colombo
Chloé Clavel
Fabian M. Suchanek
53
50
0
24 Aug 2022
Abstractive Meeting Summarization: A Survey
Virgile Rennard
Guokan Shang
Julie Hunter
Michalis Vazirgiannis
40
15
0
08 Aug 2022
SMART: Sentences as Basic Units for Text Evaluation
Reinald Kim Amplayo
Peter J. Liu
Yao-Min Zhao
Shashi Narayan
38
21
0
01 Aug 2022
An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics
Huan Yee Koh
Jiaxin Ju
Ming Liu
Shirui Pan
81
122
0
03 Jul 2022
SNaC: Coherence Error Detection for Narrative Summarization
Tanya Goyal
Junyi Jessy Li
Greg Durrett
40
27
0
19 May 2022
Affective Feedback Synthesis Towards Multimodal Text and Image Data
Puneet Kumar
Gaurav Bhatt
Omkar Ingle
Daksh Goyal
Balasubramanian Raman
EGVM
36
3
0
23 Mar 2022
DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence
Wei-Ye Zhao
Michael Strube
Steffen Eger
27
37
0
26 Jan 2022
InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation
Pierre Colombo
Chloé Clavel
Pablo Piantanida
34
41
0
02 Dec 2021
BEAMetrics: A Benchmark for Language Generation Evaluation Evaluation
Thomas Scialom
Felix Hill
28
7
0
18 Oct 2021
Summarize-then-Answer: Generating Concise Explanations for Multi-hop Reading Comprehension
Naoya Inoue
H. Trivedi
Steven K. Sinha
Niranjan Balasubramanian
Kentaro Inui
58
14
0
14 Sep 2021
Automatic Text Evaluation through the Lens of Wasserstein Barycenters
Pierre Colombo
Guillaume Staerman
Chloé Clavel
Pablo Piantanida
27
41
0
27 Aug 2021
ComSum: Commit Messages Summarization and Meaning Preservation
Leshem Choshen
Idan Amit
17
4
0
23 Aug 2021
EmailSum: Abstractive Email Thread Summarization
Shiyue Zhang
Asli Celikyilmaz
Jianfeng Gao
Joey Tianyi Zhou
30
38
0
30 Jul 2021
BARTScore: Evaluating Generated Text as Text Generation
Weizhe Yuan
Graham Neubig
Pengfei Liu
50
809
0
22 Jun 2021
How well do you know your summarization datasets?
Priyam Tejaswin
Dhruv Naik
Pengfei Liu
33
26
0
21 Jun 2021
Evaluating the Efficacy of Summarization Evaluation across Languages
Fajri Koto
Jey Han Lau
Timothy Baldwin
50
19
0
02 Jun 2021
Towards Human-Free Automatic Quality Evaluation of German Summarization
Neslihan Iskender
Oleg V. Vasilyev
Tim Polzehl
John Bohannon
Sebastian Möller
29
1
0
13 May 2021