arXiv: 2004.04228
Asking and Answering Questions to Evaluate the Factual Consistency of Summaries
8 April 2020
Alex Wang
Kyunghyun Cho
M. Lewis
HILM
Papers citing "Asking and Answering Questions to Evaluate the Factual Consistency of Summaries"
50 / 327 papers shown
Title
PEARL: Prompting Large Language Models to Plan and Execute Actions Over Long Documents
Simeng Sun
Y. Liu
Shuohang Wang
Chenguang Zhu
Mohit Iyyer
RALM
LRM
ReLM
33
52
0
23 May 2023
LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond
Philippe Laban
Wojciech Kryściński
Divyansh Agarwal
Alexander R. Fabbri
Caiming Xiong
Shafiq Joty
Chien-Sheng Wu
ALM
HILM
35
33
0
23 May 2023
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
86
607
0
23 May 2023
ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media
Kung-Hsiang Huang
Hou Pong Chan
Kathleen McKeown
Heng Ji
39
1
0
23 May 2023
Evaluating Factual Consistency of Summaries with Large Language Models
Shiqi Chen
Siyang Gao
Junxian He
ELM
LRM
HILM
37
6
0
23 May 2023
Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations
Lucy Lu Wang
Yulia Otmakhova
Jay DeYoung
Thinh Hung Truong
Bailey Kuehl
Erin Bransom
Byron C. Wallace
113
20
0
23 May 2023
Evaluating Factual Consistency of Texts with Semantic Role Labeling
Jing Fan
Dennis Aumiller
Michael Gertz
HILM
36
4
0
22 May 2023
LM vs LM: Detecting Factual Errors via Cross Examination
Roi Cohen
May Hamri
Mor Geva
Amir Globerson
HILM
41
120
0
22 May 2023
SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation
Elizabeth Clark
Shruti Rijhwani
Sebastian Gehrmann
Joshua Maynez
Roee Aharoni
Vitaly Nikolaev
Thibault Sellam
Aditya Siddhant
Dipanjan Das
Ankur P. Parikh
32
38
0
22 May 2023
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
Zhibin Gou
Zhihong Shao
Yeyun Gong
Yelong Shen
Yujiu Yang
Nan Duan
Weizhu Chen
KELM
LRM
36
358
0
19 May 2023
TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models
Zorik Gekhman
Jonathan Herzig
Roee Aharoni
Chen Elkind
Idan Szpektor
HILM
ELM
29
71
0
18 May 2023
Counterfactual Debiasing for Generating Factually Consistent Text Summaries
Chenhe Dong
Yuexiang Xie
Yaliang Li
Ying Shen
CML
HILM
29
0
0
18 May 2023
What You See is What You Read? Improving Text-Image Alignment Evaluation
Michal Yarom
Yonatan Bitton
Soravit Changpinyo
Roee Aharoni
Jonathan Herzig
Oran Lang
E. Ofek
Idan Szpektor
EGVM
59
74
0
17 May 2023
FactKB: Generalizable Factuality Evaluation using Language Models Enhanced with Factual Knowledge
Shangbin Feng
Vidhisha Balachandran
Yuyang Bai
Yulia Tsvetkov
KELM
HILM
26
52
0
14 May 2023
Zero-shot Faithful Factual Error Correction
Kung-Hsiang Huang
Hou Pong Chan
Heng Ji
KELM
HILM
30
30
0
13 May 2023
Can LMs Generalize to Future Data? An Empirical Analysis on Text Summarization
C. Cheang
Hou Pong Chan
Derek F. Wong
Xuebo Liu
Zhao Li
Yanming Sun
Shudong Liu
Lidia S. Chao
202
6
0
03 May 2023
WebBrain: Learning to Generate Factually Correct Articles for Queries by Grounding on Large Web Corpus
Hongjing Qian
Yutao Zhu
Zhicheng Dou
Haoqi Gu
Xinyu Zhang
Zheng Liu
Ruofei Lai
Bo Zhao
J. Nie
Ji-Rong Wen
38
25
0
10 Apr 2023
Human-like Summarization Evaluation with ChatGPT
Mingqi Gao
Jie Ruan
Renliang Sun
Xunjian Yin
Shiping Yang
Xiaojun Wan
ALM
AI4MH
29
125
0
05 Apr 2023
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELM
ALM
LM&MA
59
1,082
0
29 Mar 2023
ChatGPT as a Factual Inconsistency Evaluator for Text Summarization
Zheheng Luo
Qianqian Xie
Sophia Ananiadou
ELM
HILM
ALM
41
73
0
27 Mar 2023
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
Yushi Hu
Benlin Liu
Jungo Kasai
Yizhong Wang
Mari Ostendorf
Ranjay Krishna
Noah A. Smith
EGVM
41
208
0
21 Mar 2023
A Meta-Evaluation of Faithfulness Metrics for Long-Form Hospital-Course Summarization
Griffin Adams
Jason Zucker
Noémie Elhadad
54
23
0
07 Mar 2023
Models See Hallucinations: Evaluating the Factuality in Video Captioning
Hui Liu
Xiaojun Wan
HILM
34
10
0
06 Mar 2023
WiCE: Real-World Entailment for Claims in Wikipedia
Ryo Kamoi
Tanya Goyal
Juan Diego Rodriguez
Greg Durrett
41
81
0
02 Mar 2023
Factual Consistency Oriented Speech Recognition
Naoyuki Kanda
Takuya Yoshioka
Yang Liu
43
0
0
24 Feb 2023
Complex QA and language models hybrid architectures, Survey
Xavier Daull
P. Bellot
Emmanuel Bruno
Vincent Martin
Elisabeth Murisasco
ELM
28
15
0
17 Feb 2023
"Why is this misleading?": Detecting News Headline Hallucinations with Explanations
Jiaming Shen
Jialu Liu
Daniel Finnie
N. Rahmati
Michael Bendersky
Marc Najork
30
19
0
12 Feb 2023
GPTScore: Evaluate as You Desire
Jinlan Fu
See-Kiong Ng
Zhengbao Jiang
Pengfei Liu
LM&MA
ALM
ELM
15
266
0
08 Feb 2023
Do Multi-Document Summarization Models Synthesize?
Jay DeYoung
Stephanie C. Martinez
Iain J. Marshall
Byron C. Wallace
24
8
0
31 Jan 2023
LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization
Kalpesh Krishna
Erin Bransom
Bailey Kuehl
Mohit Iyyer
Pradeep Dasigi
Arman Cohan
Kyle Lo
22
90
0
30 Jan 2023
MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization
Potsawee Manakul
Adian Liusie
Mark Gales
HILM
13
35
0
28 Jan 2023
mFACE: Multilingual Summarization with Factual Consistency Evaluation
Roee Aharoni
Shashi Narayan
Joshua Maynez
Jonathan Herzig
Elizabeth Clark
Mirella Lapata
HILM
27
44
0
20 Dec 2022
Toward Human-Like Evaluation for Natural Language Generation with Error Analysis
Qingyu Lu
Liang Ding
Liping Xie
Kanjian Zhang
Derek F. Wong
Dacheng Tao
ELM
ALM
36
14
0
20 Dec 2022
WeCheck: Strong Factual Consistency Checker via Weakly Supervised Learning
Wenhao Wu
Wei Li
Xinyan Xiao
Jiachen Liu
Sujian Li
Yajuan Lv
HILM
28
4
0
20 Dec 2022
On the Blind Spots of Model-Based Evaluation Metrics for Text Generation
Tianxing He
Jingyu Zhang
Tianle Wang
Sachin Kumar
Kyunghyun Cho
James R. Glass
Yulia Tsvetkov
40
44
0
20 Dec 2022
On Improving Summarization Factual Consistency from Natural Language Feedback
Yixin Liu
Budhaditya Deb
Milagro Teruel
Aaron L Halfaker
Dragomir R. Radev
Ahmed Hassan Awadallah
HILM
29
35
0
20 Dec 2022
BUMP: A Benchmark of Unfaithful Minimal Pairs for Meta-Evaluation of Faithfulness Metrics
Liang Ma
Shuyang Cao
Robert L. Logan IV
Di Lu
Shihao Ran
Kecheng Zhang
Joel R. Tetreault
A. Jaimes
17
6
0
20 Dec 2022
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation
Yixin Liu
Alexander R. Fabbri
Pengfei Liu
Yilun Zhao
Linyong Nan
...
Simeng Han
Shafiq Joty
Chien-Sheng Wu
Caiming Xiong
Dragomir R. Radev
ALM
24
133
0
15 Dec 2022
Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation
Faeze Brahman
Baolin Peng
Michel Galley
Sudha Rao
Bill Dolan
Snigdha Chaturvedi
Jianfeng Gao
HILM
22
4
0
04 Dec 2022
CoP: Factual Inconsistency Detection by Controlling the Preference
Shuaijie She
Xiang Geng
Shujian Huang
Jiajun Chen
27
4
0
03 Dec 2022
HaRiM+: Evaluating Summary Quality with Hallucination Risk
Seonil Son
Junsoo Park
J. Hwang
Junghwa Lee
Hyungjong Noh
Yeonsoo Lee
HILM
19
8
0
22 Nov 2022
Consecutive Question Generation via Dynamic Multitask Learning
Yun Li
Sujian Li
Xing Shi
LRM
27
2
0
16 Nov 2022
ED-FAITH: Evaluating Dialogue Summarization on Faithfulness
Sicong Huang
Asli Celikyilmaz
Haoran Li
HILM
36
4
0
15 Nov 2022
Evaluating the Factual Consistency of Large Language Models Through News Summarization
Derek Tam
Anisha Mascarenhas
Shiyue Zhang
Sarah Kwan
Mohit Bansal
Colin Raffel
HILM
30
96
0
15 Nov 2022
Discharge Summary Hospital Course Summarisation of In Patient Electronic Health Record Text with Clinical Concept Guided Deep Pre-Trained Transformer Models
Thomas Searle
Zina M. Ibrahim
J. Teo
Richard J. B. Dobson
21
29
0
14 Nov 2022
Evaluating and Improving Factuality in Multimodal Abstractive Summarization
David Wan
Mohit Bansal
20
10
0
04 Nov 2022
RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question
Alireza Mohammadshahi
Thomas Scialom
Majid Yazdani
Pouya Yanki
Angela Fan
James Henderson
Marzieh Saeidi
31
20
0
02 Nov 2022
Questioning the Validity of Summarization Datasets and Improving Their Factual Consistency
Yanzhu Guo
Chloé Clavel
Moussa Kamal Eddine
Michalis Vazirgiannis
HILM
32
11
0
31 Oct 2022
How Far are We from Robust Long Abstractive Summarization?
Huan Yee Koh
Jiaxin Ju
He Zhang
Ming Liu
Shirui Pan
HILM
31
39
0
30 Oct 2022
Improving abstractive summarization with energy-based re-ranking
Diogo Pernes
Afonso Mendes
André F. T. Martins
23
6
0
27 Oct 2022