2010.00490
Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary
1 October 2020
Daniel Deutsch
Tania Bedrax-Weiss
Dan Roth
Papers citing "Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary"
50 / 66 papers shown
Title
Simple and Effective Baselines for Code Summarisation Evaluation
Jade Robinson
Jonathan K. Kummerfeld
103
0
0
26 May 2025
Are LLM-generated plain language summaries truly understandable? A large-scale crowdsourced evaluation
Yue Guo
Jae Ho Sohn
Gondy Leroy
Trevor Cohen
ELM
75
0
0
15 May 2025
AskQE: Question Answering as Automatic Evaluation for Machine Translation
Dayeon Ki
Kevin Duh
Marine Carpuat
108
3
0
15 Apr 2025
Measuring and Mitigating Hallucinations in Vision-Language Dataset Generation for Remote Sensing
Madeline Anderson
Miriam Cha
William T. Freeman
J. Taylor Perron
Nathaniel Maidel
Kerri Cahoy
46
0
0
28 Jan 2025
RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs
Jiaxing Wu
Lin Ning
Luyang Liu
Harrison Lee
Neo Wu
Chao Wang
Sushant Prakash
S. O’Banion
Bradley Green
Jun Xie
197
1
0
20 Jan 2025
BookWorm: A Dataset for Character Description and Analysis
Argyrios Papoudakis
Mirella Lapata
Frank Keller
61
2
0
14 Oct 2024
A Critical Look at Meta-evaluating Summarisation Evaluation Metrics
Xiang Dai
Sarvnaz Karimi
Biaoyan Fang
66
0
0
29 Sep 2024
NovAScore: A New Automated Metric for Evaluating Document Level Novelty
Lin Ai
Ziwei Gong
Harshsaiprasad Deshpande
Alexander Johnson
Emmy Phung
Ahmad Emami
Julia Hirschberg
45
1
0
14 Sep 2024
Addressing Topic Leakage in Cross-Topic Evaluation for Authorship Verification
Jitkapat Sawatphol
Can Udomcharoenchaikit
Sarana Nutanong
55
0
0
27 Jul 2024
Benchmarking Complex Instruction-Following with Multiple Constraints Composition
Bosi Wen
Pei Ke
Xiaotao Gu
Lindong Wu
Hao Huang
...
Jiaxin Xu
Yiming Liu
Jie Tang
Hongning Wang
Minlie Huang
CoGe
130
53
0
04 Jul 2024
A Comparative Study of Quality Evaluation Methods for Text Summarization
Huyen Nguyen
Haihua Chen
Lavanya Pobbathi
Junhua Ding
ELM
88
6
0
30 Jun 2024
Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs
Jannik Kossen
Jiatong Han
Muhammed Razzak
Lisa Schut
Shreshth A. Malik
Yarin Gal
HILM
115
54
0
22 Jun 2024
Linguistically Conditioned Semantic Textual Similarity
Jingxuan Tu
Keer Xu
Liulu Yue
Bingyang Ye
Kyeongmin Rim
James Pustejovsky
87
1
0
06 Jun 2024
Select and Summarize: Scene Saliency for Movie Script Summarization
Rohit Saxena
Frank Keller
77
4
0
04 Apr 2024
ACLSum: A New Dataset for Aspect-based Summarization of Scientific Publications
Sotaro Takeshita
Tommaso Green
Ines Reinig
Kai Eckert
Simone Paolo Ponzetto
69
12
0
08 Mar 2024
InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification
Jan Trienes
Sebastian Antony Joseph
Jorg Schlotterer
Christin Seifert
Kyle Lo
Wei Xu
Byron C. Wallace
Junyi Jessy Li
125
7
0
29 Jan 2024
Structsum Generation for Faster Text Comprehension
Parag Jain
Andreea Marzoca
Francesco Piccinno
ReLM
72
8
0
12 Jan 2024
Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization
G. Chrysostomou
Zhixue Zhao
Miles Williams
Nikolaos Aletras
HILM
74
11
0
15 Nov 2023
Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation
Jaemin Cho
Yushi Hu
Roopal Garg
Peter Anderson
Ranjay Krishna
Jason Baldridge
Mohit Bansal
Jordi Pont-Tuset
Su Wang
EGVM
86
81
0
27 Oct 2023
Metric Ensembles For Hallucination Detection
Grant C. Forbes
Parth Katlana
Zeydy Ortiz
HILM
48
4
0
16 Oct 2023
Calibrating Likelihoods towards Consistency in Summarization Models
Polina Zablotskaia
Misha Khalman
Rishabh Joshi
Livio Baldini Soares
Shoshana Jakobovits
Joshua Maynez
Shashi Narayan
49
4
0
12 Oct 2023
Visual Storytelling with Question-Answer Plans
Danyang Liu
Mirella Lapata
Frank Keller
CoGe
94
9
0
08 Oct 2023
The Extractive-Abstractive Axis: Measuring Content "Borrowing" in Generative Language Models
Nedelina Teneva
48
0
0
20 Jul 2023
DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering
Pei Ke
Fei Huang
Fei Mi
Yasheng Wang
Qun Liu
Xiaoyan Zhu
Minlie Huang
ReLM
ELM
92
10
0
13 Jul 2023
MeetingBank: A Benchmark Dataset for Meeting Summarization
Yebowen Hu
Timothy Jeewun Ganter
Hanieh Deilamsalehy
Franck Dernoncourt
H. Foroosh
Fei Liu
AI4TS
82
50
0
27 May 2023
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
259
705
0
23 May 2023
ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media
Kung-Hsiang Huang
Hou Pong Chan
Kathleen McKeown
Heng Ji
95
1
0
23 May 2023
Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations
Lucy Lu Wang
Yulia Otmakhova
Jay DeYoung
Thinh Hung Truong
Bailey Kuehl
Erin Bransom
Byron C. Wallace
169
22
0
23 May 2023
APPLS: Evaluating Evaluation Metrics for Plain Language Summarization
Yue Guo
Tal August
Gondy Leroy
T. Cohen
Lucy Lu Wang
182
9
0
23 May 2023
Attributable and Scalable Opinion Summarization
Tom Hosking
Hao Tang
Mirella Lapata
71
9
0
19 May 2023
Zero-shot Faithful Factual Error Correction
Kung-Hsiang Huang
Hou Pong Chan
Heng Ji
KELM
HILM
104
32
0
13 May 2023
Towards Interpretable and Efficient Automatic Reference-Based Summarization Evaluation
Yixin Liu
Alexander R. Fabbri
Yilun Zhao
Pengfei Liu
Shafiq Joty
Chien-Sheng Wu
Caiming Xiong
Dragomir R. Radev
53
28
0
07 Mar 2023
MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization
Potsawee Manakul
Adian Liusie
Mark Gales
HILM
87
36
0
28 Jan 2023
On the State of German (Abstractive) Text Summarization
Dennis Aumiller
Jing Fan
Michael Gertz
63
1
0
17 Jan 2023
Rethinking with Retrieval: Faithful Large Language Model Inference
Hangfeng He
Hongming Zhang
Dan Roth
KELM
LRM
247
169
0
31 Dec 2022
mFACE: Multilingual Summarization with Factual Consistency Evaluation
Roee Aharoni
Shashi Narayan
Joshua Maynez
Jonathan Herzig
Elizabeth Clark
Mirella Lapata
HILM
85
47
0
20 Dec 2022
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation
Yixin Liu
Alexander R. Fabbri
Pengfei Liu
Yilun Zhao
Linyong Nan
...
Simeng Han
Shafiq Joty
Chien-Sheng Wu
Caiming Xiong
Dragomir R. Radev
ALM
86
134
0
15 Dec 2022
HaRiM+: Evaluating Summary Quality with Hallucination Risk
Seonil Son
Junsoo Park
J. Hwang
Junghwa Lee
Hyungjong Noh
Yeonsoo Lee
HILM
63
8
0
22 Nov 2022
On the Limitations of Reference-Free Evaluations of Generated Text
Daniel Deutsch
Rotem Dror
Dan Roth
122
48
0
22 Oct 2022
Shortcomings of Question Answering Based Factuality Frameworks for Error Localization
Ryo Kamoi
Tanya Goyal
Greg Durrett
HILM
92
14
0
13 Oct 2022
News Summarization and Evaluation in the Era of GPT-3
Tanya Goyal
Junyi Jessy Li
Greg Durrett
ELM
136
412
0
26 Sep 2022
Extractive is not Faithful: An Investigation of Broad Unfaithfulness Problems in Extractive Summarization
Shiyue Zhang
David Wan
Joey Tianyi Zhou
HILM
113
31
0
08 Sep 2022
Podcast Summary Assessment: A Resource for Evaluating Summary Assessment Methods
Potsawee Manakul
Mark Gales
65
5
0
28 Aug 2022
MENLI: Robust Evaluation Metrics from Natural Language Inference
Yanran Chen
Steffen Eger
107
18
0
15 Aug 2022
SMART: Sentences as Basic Units for Text Evaluation
Reinald Kim Amplayo
Peter J. Liu
Yao-Min Zhao
Shashi Narayan
79
22
0
01 Aug 2022
QASem Parsing: Text-to-text Modeling of QA-based Semantics
Ayal Klein
Eran Hirsch
Ron Eliav
Valentina Pyatkin
Avi Caciularu
Ido Dagan
97
13
0
23 May 2022
Context Matters for Image Descriptions for Accessibility: Challenges for Referenceless Evaluation Metrics
Elisa Kreiss
Cynthia L. Bennett
Shayan Hooshmand
E. Zelikman
Meredith Ringel Morris
Christopher Potts
83
27
0
21 May 2022
PREME: Preference-based Meeting Exploration through an Interactive Questionnaire
Negar Arabzadeh
Ali Ahmadvand
Julia Kiseleva
Yang Liu
Ahmed Hassan Awadallah
Ming Zhong
Milad Shokouhi
86
4
0
05 May 2022
Repro: An Open-Source Library for Improving the Reproducibility and Usability of Publicly Available Research Code
Daniel Deutsch
Dan Roth
AI4CE
97
2
0
29 Apr 2022
Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics
Daniel Deutsch
Rotem Dror
Dan Roth
75
45
0
21 Apr 2022