Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics

21 April 2022

Papers citing "Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics"

11 / 11 papers shown

Title
Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans? Jeremy Barnes Naiara Perez Alba Bonet-Jover Begoña Altuna 62 1 0 21 Mar 2025
Beyond correlation: The Impact of Human Uncertainty in Measuring the Effectiveness of Automatic Evaluation and LLM-as-a-Judge Aparna Elangovan Jongwoo Ko Lei Xu Mahsa Elyasi Ling Liu S. Bodapati Dan Roth 49 5 0 28 Jan 2025
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation Mingqi Gao Xinyu Hu Li Lin Xiaojun Wan 28 1 0 28 Jan 2025
How to Train Long-Context Language Models (Effectively) Tianyu Gao Alexander Wettig Howard Yen Danqi Chen RALM 72 38 0 03 Oct 2024
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly Howard Yen Tianyu Gao Minmin Hou Ke Ding Daniel Fleischer Peter Izsak Moshe Wasserblat Danqi Chen ALM ELM 62 25 0 03 Oct 2024
On the Challenges and Opportunities in Generative AI Laura Manduchi Kushagra Pandey Robert Bamler Ryan Cotterell Sina Daubener ... F. Wenzel Frank Wood Stephan Mandt Vincent Fortuin Vincent Fortuin 56 17 0 28 Feb 2024
Evaluating Factual Consistency of Texts with Semantic Role Labeling Jing Fan Dennis Aumiller Michael Gertz HILM 34 4 0 22 May 2023
LENS: A Learnable Evaluation Metric for Text Simplification Mounica Maddela Yao Dou David Heineman Wei-ping Xu 29 62 0 19 Dec 2022
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation Yixin Liu Alexander R. Fabbri Pengfei Liu Yilun Zhao Linyong Nan ... Simeng Han Shafiq R. Joty Chien-Sheng Wu Caiming Xiong Dragomir R. Radev ALM 24 132 0 15 Dec 2022
ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning O. Yu. Golovneva Moya Chen Spencer Poff Martin Corredor Luke Zettlemoyer Maryam Fazel-Zarandi Asli Celikyilmaz ReLM LRM 34 138 0 15 Dec 2022
Understanding the Extent to which Summarization Evaluation Metrics Measure the Information Quality of Summaries Daniel Deutsch Dan Roth 56 7 0 23 Oct 2020