Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary

1 October 2020

Papers citing "Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary"

50 / 66 papers shown

Simple and Effective Baselines for Code Summarisation Evaluation Jade Robinson Jonathan K. Kummerfeld 103 0 0 26 May 2025
Are LLM-generated plain language summaries truly understandable? A large-scale crowdsourced evaluation Yue Guo Jae Ho Sohn Gondy Leroy Trevor Cohen ELM 75 0 0 15 May 2025
AskQE: Question Answering as Automatic Evaluation for Machine Translation Dayeon Ki Kevin Duh Marine Carpuat 108 3 0 15 Apr 2025
Measuring and Mitigating Hallucinations in Vision-Language Dataset Generation for Remote Sensing Madeline Anderson Miriam Cha William T. Freeman J. Taylor Perron Nathaniel Maidel Kerri Cahoy 46 0 0 28 Jan 2025
RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs Jiaxing Wu Lin Ning Luyang Liu Harrison Lee Neo Wu Chao Wang Sushant Prakash S. O’Banion Bradley Green Jun Xie 197 1 0 20 Jan 2025
BookWorm: A Dataset for Character Description and Analysis Argyrios Papoudakis Mirella Lapata Frank Keller 61 2 0 14 Oct 2024
A Critical Look at Meta-evaluating Summarisation Evaluation Metrics Xiang Dai Sarvnaz Karimi Biaoyan Fang 66 0 0 29 Sep 2024
NovAScore: A New Automated Metric for Evaluating Document Level Novelty Lin Ai Ziwei Gong Harshsaiprasad Deshpande Alexander Johnson Emmy Phung Ahmad Emami Julia Hirschberg 45 1 0 14 Sep 2024
Addressing Topic Leakage in Cross-Topic Evaluation for Authorship Verification Jitkapat Sawatphol Can Udomcharoenchaikit Sarana Nutanong 55 0 0 27 Jul 2024
Benchmarking Complex Instruction-Following with Multiple Constraints Composition Bosi Wen Pei Ke Xiaotao Gu Lindong Wu Hao Huang ... Jiaxin Xu Yiming Liu Jie Tang Hongning Wang Minlie Huang CoGe 130 53 0 04 Jul 2024
A Comparative Study of Quality Evaluation Methods for Text Summarization Huyen Nguyen Haihua Chen Lavanya Pobbathi Junhua Ding ELM 88 6 0 30 Jun 2024
Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs Jannik Kossen Jiatong Han Muhammed Razzak Lisa Schut Shreshth A. Malik Yarin Gal HILM 115 54 0 22 Jun 2024
Linguistically Conditioned Semantic Textual Similarity Jingxuan Tu Keer Xu Liulu Yue Bingyang Ye Kyeongmin Rim James Pustejovsky 87 1 0 06 Jun 2024
Select and Summarize: Scene Saliency for Movie Script Summarization Rohit Saxena Frank Keller 77 4 0 04 Apr 2024
ACLSum: A New Dataset for Aspect-based Summarization of Scientific Publications Sotaro Takeshita Tommaso Green Ines Reinig Kai Eckert Simone Paolo Ponzetto 69 12 0 08 Mar 2024
InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification Jan Trienes Sebastian Antony Joseph Jorg Schlotterer Christin Seifert Kyle Lo Wei Xu Byron C. Wallace Junyi Jessy Li 125 7 0 29 Jan 2024
Structsum Generation for Faster Text Comprehension Parag Jain Andreea Marzoca Francesco Piccinno ReLM 72 8 0 12 Jan 2024
Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization G. Chrysostomou Zhixue Zhao Miles Williams Nikolaos Aletras HILM 74 11 0 15 Nov 2023
Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation Jaemin Cho Yushi Hu Roopal Garg Peter Anderson Ranjay Krishna Jason Baldridge Mohit Bansal Jordi Pont-Tuset Su Wang EGVM 86 81 0 27 Oct 2023
Metric Ensembles For Hallucination Detection Grant C. Forbes Parth Katlana Zeydy Ortiz HILM 48 4 0 16 Oct 2023
Calibrating Likelihoods towards Consistency in Summarization Models Polina Zablotskaia Misha Khalman Rishabh Joshi Livio Baldini Soares Shoshana Jakobovits Joshua Maynez Shashi Narayan 49 4 0 12 Oct 2023
Visual Storytelling with Question-Answer Plans Danyang Liu Mirella Lapata Frank Keller CoGe 94 9 0 08 Oct 2023
The Extractive-Abstractive Axis: Measuring Content "Borrowing" in Generative Language Models Nedelina Teneva 48 0 0 20 Jul 2023
DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering Pei Ke Fei Huang Fei Mi Yasheng Wang Qun Liu Xiaoyan Zhu Minlie Huang ReLM ELM 92 10 0 13 Jul 2023
MeetingBank: A Benchmark Dataset for Meeting Summarization Yebowen Hu Timothy Jeewun Ganter Hanieh Deilamsalehy Franck Dernoncourt H. Foroosh Fei Liu AI4TS 82 50 0 27 May 2023
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation Sewon Min Kalpesh Krishna Xinxi Lyu M. Lewis Wen-tau Yih Pang Wei Koh Mohit Iyyer Luke Zettlemoyer Hannaneh Hajishirzi HILM ALM 259 705 0 23 May 2023
ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media Kung-Hsiang Huang Hou Pong Chan Kathleen McKeown Heng Ji 95 1 0 23 May 2023
Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations Lucy Lu Wang Yulia Otmakhova Jay DeYoung Thinh Hung Truong Bailey Kuehl Erin Bransom Byron C. Wallace 169 22 0 23 May 2023
APPLS: Evaluating Evaluation Metrics for Plain Language Summarization Yue Guo Tal August Gondy Leroy T. Cohen Lucy Lu Wang 182 9 0 23 May 2023
Attributable and Scalable Opinion Summarization Tom Hosking Hao Tang Mirella Lapata 71 9 0 19 May 2023
Zero-shot Faithful Factual Error Correction Kung-Hsiang Huang Hou Pong Chan Heng Ji KELM HILM 104 32 0 13 May 2023
Towards Interpretable and Efficient Automatic Reference-Based Summarization Evaluation Yixin Liu Alexander R. Fabbri Yilun Zhao Pengfei Liu Shafiq Joty Chien-Sheng Wu Caiming Xiong Dragomir R. Radev 53 28 0 07 Mar 2023
MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization Potsawee Manakul Adian Liusie Mark Gales HILM 87 36 0 28 Jan 2023
On the State of German (Abstractive) Text Summarization Dennis Aumiller Jing Fan Michael Gertz 63 1 0 17 Jan 2023
Rethinking with Retrieval: Faithful Large Language Model Inference Hangfeng He Hongming Zhang Dan Roth KELM LRM 247 169 0 31 Dec 2022
mFACE: Multilingual Summarization with Factual Consistency Evaluation Roee Aharoni Shashi Narayan Joshua Maynez Jonathan Herzig Elizabeth Clark Mirella Lapata HILM 85 47 0 20 Dec 2022
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation Yixin Liu Alexander R. Fabbri Pengfei Liu Yilun Zhao Linyong Nan ... Simeng Han Shafiq Joty Chien-Sheng Wu Caiming Xiong Dragomir R. Radev ALM 86 134 0 15 Dec 2022
HaRiM+: Evaluating Summary Quality with Hallucination Risk Seonil Son Junsoo Park J. Hwang Junghwa Lee Hyungjong Noh Yeonsoo Lee HILM 63 8 0 22 Nov 2022
On the Limitations of Reference-Free Evaluations of Generated Text Daniel Deutsch Rotem Dror Dan Roth 122 48 0 22 Oct 2022
Shortcomings of Question Answering Based Factuality Frameworks for Error Localization Ryo Kamoi Tanya Goyal Greg Durrett HILM 92 14 0 13 Oct 2022
News Summarization and Evaluation in the Era of GPT-3 Tanya Goyal Junyi Jessy Li Greg Durrett ELM 136 412 0 26 Sep 2022
Extractive is not Faithful: An Investigation of Broad Unfaithfulness Problems in Extractive Summarization Shiyue Zhang David Wan Joey Tianyi Zhou HILM 113 31 0 08 Sep 2022
Podcast Summary Assessment: A Resource for Evaluating Summary Assessment Methods Potsawee Manakul Mark Gales 65 5 0 28 Aug 2022
MENLI: Robust Evaluation Metrics from Natural Language Inference Yanran Chen Steffen Eger 107 18 0 15 Aug 2022
SMART: Sentences as Basic Units for Text Evaluation Reinald Kim Amplayo Peter J. Liu Yao-Min Zhao Shashi Narayan 79 22 0 01 Aug 2022
QASem Parsing: Text-to-text Modeling of QA-based Semantics Ayal Klein Eran Hirsch Ron Eliav Valentina Pyatkin Avi Caciularu Ido Dagan 97 13 0 23 May 2022
Context Matters for Image Descriptions for Accessibility: Challenges for Referenceless Evaluation Metrics Elisa Kreiss Cynthia L. Bennett Shayan Hooshmand E. Zelikman Meredith Ringel Morris Christopher Potts 83 27 0 21 May 2022
PREME: Preference-based Meeting Exploration through an Interactive Questionnaire Negar Arabzadeh Ali Ahmadvand Julia Kiseleva Yang Liu Ahmed Hassan Awadallah Ming Zhong Milad Shokouhi 86 4 0 05 May 2022
Repro: An Open-Source Library for Improving the Reproducibility and Usability of Publicly Available Research Code Daniel Deutsch Dan Roth AI4CE 97 2 0 29 Apr 2022
Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics Daniel Deutsch Rotem Dror Dan Roth 75 45 0 21 Apr 2022