SUPERT: Towards New Frontiers in Unsupervised Evaluation Metrics for Multi-Document Summarization

7 May 2020

Papers citing "SUPERT: Towards New Frontiers in Unsupervised Evaluation Metrics for Multi-Document Summarization"

50 / 78 papers shown

Title / Authors / Tags / Counts / Date
SEval-Ex: A Statement-Level Framework for Explainable Summarization Evaluation Tanguy Herserant Vincent Guigue ELM 45 0 0 04 May 2025
HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving Avinash Kumar Shashank Nag Jason Clemons L. John Poulami Das 31 0 0 14 Apr 2025
DeepSeek vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization? Daniil Larionov Sotaro Takeshita Ran Zhang Yanran Chen Christoph Leiter Zhipin Wang Christian Greisinger Steffen Eger ReLM ELM LRM 74 1 0 10 Apr 2025
Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans? Jeremy Barnes Naiara Perez Alba Bonet-Jover Begoña Altuna 62 1 0 21 Mar 2025
Reference-free Evaluation Metrics for Text Generation: A Survey Takumi Ito Kees van Deemter Jun Suzuki ELM 41 2 0 21 Jan 2025
Speech vs. Transcript: Does It Matter for Human Annotators in Speech Summarization? Roshan S. Sharma Suwon Shon Mark Lindsey Hira Dhamyal Rita Singh Bhiksha Raj 56 1 0 12 Aug 2024
Large Language Models as Evaluators for Scientific Synthesis Julia Evans Jennifer D'Souza Sören Auer ELM 42 4 0 03 Jul 2024
PerSEval: Assessing Personalization in Text Summarizers Sourish Dasgupta Ankush Chander Parth Borad Isha Motiyani Tanmoy Chakraborty 40 0 0 29 Jun 2024
A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models Haopeng Zhang Philip S. Yu Jiawei Zhang 37 17 0 17 Jun 2024
Favi-Score: A Measure for Favoritism in Automated Preference Ratings for Generative AI Evaluation Pius von Daniken Jan Deriu Don Tuggener Mark Cieliebak 31 1 0 03 Jun 2024
JADS: A Framework for Self-supervised Joint Aspect Discovery and Summarization Xiaobo Guo Jay Desai Srinivasan H. Sengamedu AI4TS 46 0 0 28 May 2024
Do Language Models Enjoy Their Own Stories? Prompting Large Language Models for Automatic Story Evaluation Cyril Chhun Fabian M. Suchanek Chloé Clavel LRM 42 14 0 22 May 2024
LUNA: A Framework for Language Understanding and Naturalness Assessment Marat Saidov A. Bakalova Ekaterina Taktasheva Vladislav Mikhailov Ekaterina Artemova ELM 39 1 0 09 Jan 2024
LLMs as Narcissistic Evaluators: When Ego Inflates Evaluation Scores Yiqi Liu N. Moosavi Chenghua Lin ELM 30 48 0 16 Nov 2023
Controllable Text Summarization: Unraveling Challenges, Approaches, and Prospects -- A Survey Ashok Urlana Pruthwik Mishra Tathagato Roy Rahul Mishra 37 8 0 15 Nov 2023
GNAT: A General Narrative Alignment Tool T. Pial Steven Skiena 15 4 0 07 Nov 2023
The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics Christoph Leiter Juri Opitz Daniel Deutsch Yang Gao Rotem Dror Steffen Eger ALM LRM ELM 40 31 0 30 Oct 2023
OpinSummEval: Revisiting Automated Evaluation for Opinion Summarization Yuchen Shen Xiaojun Wan 38 9 0 27 Oct 2023
BooookScore: A systematic exploration of book-length summarization in the era of LLMs Yapei Chang Kyle Lo Tanya Goyal Mohit Iyyer ALM 26 106 0 01 Oct 2023
SocREval: Large Language Models with the Socratic Method for Reference-Free Reasoning Evaluation Hangfeng He Hongming Zhang Dan Roth LRM ELM ReLM 30 14 0 29 Sep 2023
OpenMSD: Towards Multilingual Scientific Documents Similarity Measurement Yang Gao Ji Ma I. Korotkov Keith B. Hall Dana Alon Donald Metzler 13 0 0 19 Sep 2023
Automatic Personalized Impression Generation for PET Reports Using Large Language Models Xin Tie Muheon Shin Ali Pirasteh Nevein Ibrahim Zachary Huemann ... K. M. Kelly John W. Garrett Junjie Hu Steve Y. Cho Tyler Bradshaw LM&MA 27 10 0 18 Sep 2023
Redundancy Aware Multi-Reference Based Gainwise Evaluation of Extractive Summarization Mousumi Akter Shubhra (Santu) Karmaker 23 1 0 04 Aug 2023
Cross-lingual Cross-temporal Summarization: Dataset, Models, Evaluation Ran Zhang Jihed Ouni Steffen Eger 32 6 0 22 Jun 2023
MISMATCH: Fine-grained Evaluation of Machine-generated Text with Mismatch Error Types K. Murugesan Sarathkrishna Swaminathan Soham Dan Subhajit Chaudhury Chulaka Gunasekara ... Ibrahim Abdelaziz Achille Fokoue Pavan Kapanipathi Salim Roukos Alexander G. Gray 42 5 0 18 Jun 2023
Correction of Errors in Preference Ratings from Automated Metrics for Text Generation Jan Deriu Pius von Daniken Don Tuggener Mark Cieliebak 29 2 0 06 Jun 2023
UMSE: Unified Multi-scenario Summarization Evaluation Shen Gao Zhitao Yao Chongyang Tao Preslav Nakov Pengjie Ren Z. Ren Zhumin Chen 30 5 0 26 May 2023
Evaluating Evaluation Metrics: A Framework for Analyzing NLG Evaluation Metrics using Measurement Theory Ziang Xiao Susu Zhang Vivian Lai Q. V. Liao ELM 35 24 0 24 May 2023
Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations Lucy Lu Wang Yulia Otmakhova Jay DeYoung Thinh Hung Truong Bailey Kuehl Erin Bransom Byron C. Wallace 113 20 0 23 May 2023
Sample Efficient Multimodal Semantic Augmentation for Incremental Summarization Sumanta Bhattacharyya R. Manuvinakurike Sahisnu Mazumder Saurav Sahay VLM 21 0 0 08 Mar 2023
A comprehensive review of automatic text summarization techniques: method, data, evaluation and coding D. Cajueiro A. G. Nery Igor Tavares Maísa Kely de Melo Silvia A. dos Reis Weigang Li V. R. R. Celestino 33 15 0 04 Jan 2023
DocAsRef: An Empirical Study on Repurposing Reference-Based Summary Quality Metrics Reference-Freely F. S. Bao Ruixuan Tu Ge Luo Yinfei Yang Hebi Li Minghui Qiu Youbiao He Cen Chen 21 2 0 20 Dec 2022
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation Yixin Liu Alexander R. Fabbri Pengfei Liu Yilun Zhao Linyong Nan ... Simeng Han Chenyu You Chien-Sheng Wu Caiming Xiong Dragomir R. Radev ALM 26 133 0 15 Dec 2022
Towards Interpretable Summary Evaluation via Allocation of Contextual Embeddings to Reference Text Topics Ben Schaper Christopher Lohse Marcell Streile Andrea Giovannini Richard Osuala 24 1 0 25 Oct 2022
On the Limitations of Reference-Free Evaluations of Generated Text Daniel Deutsch Rotem Dror Dan Roth 40 45 0 22 Oct 2022
News Summarization and Evaluation in the Era of GPT-3 Tanya Goyal Junyi Jessy Li Greg Durrett ELM 31 387 0 26 Sep 2022
Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation Cyril Chhun Pierre Colombo Chloé Clavel Fabian M. Suchanek 53 50 0 24 Aug 2022
MENLI: Robust Evaluation Metrics from Natural Language Inference Yanran Chen Steffen Eger 32 16 0 15 Aug 2022
SummScore: A Comprehensive Evaluation Metric for Summary Quality Based on Cross-Encoder Wuhang Lin Shasha Li Chen Zhang Bing Ji Jie Yu Jun Ma Zibo Yi 11 6 0 11 Jul 2022
An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics Huan Yee Koh Jiaxin Ju Ming Liu Shirui Pan 81 122 0 03 Jul 2022
Context Matters for Image Descriptions for Accessibility: Challenges for Referenceless Evaluation Metrics Elisa Kreiss Cynthia L. Bennett Shayan Hooshmand E. Zelikman Meredith Ringel Morris Christopher Potts 48 27 0 21 May 2022
Repro: An Open-Source Library for Improving the Reproducibility and Usability of Publicly Available Research Code Daniel Deutsch Dan Roth AI4CE 45 2 0 29 Apr 2022
Entity-driven Fact-aware Abstractive Summarization of Biomedical Literature Amanuel Alambo Tanvi Banerjee K. Thirunarayan M. Raymer MedIm 21 7 0 30 Mar 2022
PeerSum: A Peer Review Dataset for Abstractive Multi-document Summarization Miao Li Jianzhong Qi Jey Han Lau 14 2 0 03 Mar 2022
USCORE: An Effective Approach to Fully Unsupervised Evaluation Metrics for Machine Translation Jonas Belouadi Steffen Eger 33 20 0 21 Feb 2022
DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence Wei-Ye Zhao Michael Strube Steffen Eger 27 37 0 26 Jan 2022
WIDAR -- Weighted Input Document Augmented ROUGE Raghav Jain Vaibhav Mavi Anubhav Jangra S. Saha 16 4 0 23 Jan 2022
Consistency and Coherence from Points of Contextual Similarity Oleg V. Vasilyev John Bohannon HILM 33 1 0 22 Dec 2021
Global Explainability of BERT-Based Evaluation Metrics by Disentangling along Linguistic Factors Marvin Kaster Wei-Ye Zhao Steffen Eger 33 24 0 08 Oct 2021
The Eval4NLP Shared Task on Explainable Quality Estimation: Overview and Results M. Fomicheva Piyawat Lertvittayakumjorn Wei-Ye Zhao Steffen Eger Yang Gao ELM 24 39 0 08 Oct 2021