2004.04228
Asking and Answering Questions to Evaluate the Factual Consistency of Summaries
8 April 2020
Alex Wang
Kyunghyun Cho
M. Lewis
HILM
Papers citing "Asking and Answering Questions to Evaluate the Factual Consistency of Summaries"
50 / 327 papers shown
Title
FineSurE: Fine-grained Summarization Evaluation using LLMs
Hwanjun Song
Hang Su
Igor Shalyminov
Jason (Jinglun) Cai
Saab Mansour
HILM
41
31
0
01 Jul 2024
LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks
A. Bavaresco
Raffaella Bernardi
Leonardo Bertolazzi
Desmond Elliott
Raquel Fernández
...
David Schlangen
Alessandro Suglia
Aditya K Surikuchi
Ece Takmaz
A. Testoni
ALM
ELM
54
62
0
26 Jun 2024
Themis: Towards Flexible and Interpretable NLG Evaluation
Xinyu Hu
Li Lin
Mingqi Gao
Xunjian Yin
Xiaojun Wan
ELM
34
7
0
26 Jun 2024
One Thousand and One Pairs: A "novel" challenge for long-context language models
Marzena Karpinska
Katherine Thai
Kyle Lo
Tanya Goyal
Mohit Iyyer
LRM
43
41
0
24 Jun 2024
Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs
Jannik Kossen
Jiatong Han
Muhammed Razzak
Lisa Schut
Shreshth A. Malik
Yarin Gal
HILM
60
35
0
22 Jun 2024
Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics
Weijia Zhang
Mohammad Aliannejadi
Yifei Yuan
Jiahuan Pei
Jia-Hong Huang
Evangelos Kanoulas
HILM
31
12
0
21 Jun 2024
A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models
Haopeng Zhang
Philip S. Yu
Jiawei Zhang
37
17
0
17 Jun 2024
PatentEval: Understanding Errors in Patent Generation
You Zuo
Kim Gerdes
Eric Villemonte de la Clergerie
Benoît Sagot
31
1
0
05 Jun 2024
When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs
Ryo Kamoi
Yusen Zhang
Nan Zhang
Jiawei Han
Rui Zhang
LRM
50
58
0
03 Jun 2024
ANAH: Analytical Annotation of Hallucinations in Large Language Models
Ziwei Ji
Yuzhe Gu
Wenwei Zhang
Chengqi Lyu
Dahua Lin
Kai-xiang Chen
HILM
56
2
0
30 May 2024
Efficient LLM Comparative Assessment: a Product of Experts Framework for Pairwise Comparisons
Adian Liusie
Vatsal Raina
Yassir Fathullah
Mark Gales
43
9
0
09 May 2024
One vs. Many: Comprehending Accurate Information from Multiple Erroneous and Inconsistent AI Generations
Yoonjoo Lee
Kihoon Son
Tae Soo Kim
Jisu Kim
John Joon Young Chung
Eytan Adar
Juho Kim
39
11
0
09 May 2024
Utilizing GPT to Enhance Text Summarization: A Strategy to Minimize Hallucinations
Hassan Shakil
Zeydy Ortiz
Grant C. Forbes
18
3
0
07 May 2024
RepEval: Effective Text Evaluation with LLM Representation
Shuqian Sheng
Yi Xu
Tianhang Zhang
Zanwei Shen
Luoyi Fu
Jiaxin Ding
Lei Zhou
Xinbing Wang
Cheng Zhou
27
1
0
30 Apr 2024
QANA: LLM-based Question Generation and Network Analysis for Zero-shot Key Point Analysis and Beyond
Tomoki Fukuma
Koki Noda
Toshihide Ubukata
Kousuke Hoso
Yoshiharu Ichikawa
Kyosuke Kambe
Yu Masubuchi
F. Toriumi
29
0
0
29 Apr 2024
ISQA: Informative Factuality Feedback for Scientific Summarization
Zekai Li
Yanxia Qin
Qian Liu
Min-Yen Kan
HILM
37
1
0
20 Apr 2024
Can We Catch the Elephant? A Survey of the Evolvement of Hallucination Evaluation on Natural Language Generation
Siya Qi
Yulan He
Zheng Yuan
LRM
HILM
46
1
0
18 Apr 2024
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents
Liyan Tang
Philippe Laban
Greg Durrett
HILM
SyDa
43
76
0
16 Apr 2024
Automatic Generation and Evaluation of Reading Comprehension Test Items with Large Language Models
Andreas Säuberli
Simon Clematide
ELM
35
7
0
11 Apr 2024
Groundedness in Retrieval-augmented Long-form Generation: An Empirical Study
Alessandro Stolfo
RALM
HILM
34
6
0
10 Apr 2024
Know When To Stop: A Study of Semantic Drift in Text Generation
Ava Spataru
Eric Hambro
Elena Voita
Nicola Cancedda
37
3
0
08 Apr 2024
Schroedinger's Threshold: When the AUC doesn't predict Accuracy
Juri Opitz
UQCV
41
0
0
04 Apr 2024
Evaluating Document Simplification: On the Importance of Separately Assessing Simplicity and Meaning Preservation
Liam Cripwell
Joël Legrand
Claire Gardent
31
3
0
04 Apr 2024
Hallucination Diversity-Aware Active Learning for Text Summarization
Yu Xia
Xu Liu
Tong Yu
Sungchul Kim
Ryan A. Rossi
Anup B. Rao
Tung Mai
Shuai Li
HILM
40
3
0
02 Apr 2024
CheckEval: A reliable LLM-as-a-Judge framework for evaluating text generation using checklists
Yukyung Lee
Joonghoon Kim
Jaehee Kim
Hyowon Cho
Pilsung Kang
Najoung Kim
ELM
47
4
0
27 Mar 2024
Is Reference Necessary in the Evaluation of NLG Systems? When and Where?
Shuqian Sheng
Yi Xu
Luoyi Fu
Jiaxin Ding
Lei Zhou
Xinbing Wang
Cheng Zhou
43
3
0
21 Mar 2024
From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models
Kung-Hsiang Huang
Hou Pong Chan
Yi R. Fung
Haoyi Qiu
Mingyang Zhou
Chenyu You
Shih-Fu Chang
Chenhui Xu
AI4TS
72
14
0
18 Mar 2024
A Closer Look at Claim Decomposition
Miriam Wanner
Seth Ebner
Zhengping Jiang
Mark Dredze
Benjamin Van Durme
49
18
0
18 Mar 2024
SIFiD: Reassess Summary Factual Inconsistency Detection with LLM
Jiuding Yang
Hui Liu
Weidong Guo
Zhuwei Rao
Yu-Syuan Xu
Di Niu
HILM
21
0
0
12 Mar 2024
FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction
Alessandro Sciré
Karim Ghonim
Roberto Navigli
HILM
29
7
0
04 Mar 2024
Ever-Evolving Memory by Blending and Refining the Past
Seo Hyun Kim
Keummin Ka
Yohan Jo
Seung-won Hwang
Dongha Lee
Jinyoung Yeo
KELM
39
1
0
03 Mar 2024
Fine-Grained Natural Language Inference Based Faithfulness Evaluation for Diverse Summarisation Tasks
Huajian Zhang
Yumo Xu
Laura Perez-Beltrachini
HILM
34
9
0
27 Feb 2024
HD-Eval: Aligning Large Language Model Evaluators Through Hierarchical Criteria Decomposition
Yuxuan Liu
Tianchi Yang
Shaohan Huang
Zihan Zhang
Haizhen Huang
Furu Wei
Weiwei Deng
Feng Sun
Qi Zhang
34
13
0
24 Feb 2024
UFO: a Unified and Flexible Framework for Evaluating Factuality of Large Language Models
Zhaoheng Huang
Zhicheng Dou
Yutao Zhu
Ji-Rong Wen
HILM
38
1
0
22 Feb 2024
Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment
Vyas Raina
Adian Liusie
Mark Gales
AAML
ELM
32
53
0
21 Feb 2024
Factual consistency evaluation of summarization in the Era of large language models
Zheheng Luo
Qianqian Xie
Sophia Ananiadou
HILM
35
1
0
21 Feb 2024
Identifying Factual Inconsistencies in Summaries: Grounding Model Inference via Task Taxonomy
Liyan Xu
Zhenlin Su
Mo Yu
Jin Xu
Jinho D. Choi
Jie Zhou
Fei Liu
HILM
43
2
0
20 Feb 2024
A synthetic data approach for domain generalization of NLI models
Mohammad Javad Hosseini
Andrey Petrov
Alex Fabrikant
Annie Louis
SyDa
38
8
0
19 Feb 2024
Improving Factual Error Correction for Abstractive Summarization via Data Distillation and Conditional-generation Cloze
Yiyang Li
Lei Li
Dingxing Hu
Xueyi Hao
Marina Litvak
N. Vanetik
Yanquan Zhou
HILM
KELM
24
0
0
13 Feb 2024
Source Identification in Abstractive Summarization
Yoshi Suhara
Dimitris Alikaniotis
38
1
0
07 Feb 2024
Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains
S. Ramprasad
Kundan Krishna
Zachary Chase Lipton
Byron C. Wallace
HILM
52
6
0
05 Feb 2024
Unified Hallucination Detection for Multimodal Large Language Models
Xiang Chen
Chenxi Wang
Yida Xue
Ningyu Zhang
Xiaoyan Yang
Qian Li
Yue Shen
Lei Liang
Jinjie Gu
Huajun Chen
HILM
36
38
0
05 Feb 2024
CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models
Yuanjie Lyu
Zhiyu Li
Pengnian Qi
Simin Niu
Wenjin Wang
Hao Wu
Huan Liu
Tong Xu
Enhong Chen
RALM
44
32
0
30 Jan 2024
InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification
Jan Trienes
Sebastian Antony Joseph
Jorg Schlotterer
Christin Seifert
Kyle Lo
Wei Xu
Byron C. Wallace
Junyi Jessy Li
50
6
0
29 Jan 2024
Deductive Closure Training of Language Models for Coherence, Accuracy, and Updatability
Afra Feyza Akyürek
Ekin Akyürek
Leshem Choshen
Derry Wijaya
Jacob Andreas
HILM
SyDa
54
16
0
16 Jan 2024
Hallucination Detection and Hallucination Mitigation: An Investigation
Junliang Luo
Tianyu Li
Di Wu
Michael R. M. Jenkin
Steve Liu
Gregory Dudek
HILM
LLMAG
46
22
0
16 Jan 2024
Leveraging Large Language Models for NLG Evaluation: Advances and Challenges
Zhen Li
Xiaohan Xu
Tao Shen
Can Xu
Jia-Chen Gu
Yuxuan Lai
Chongyang Tao
Shuai Ma
LM&MA
ELM
39
9
0
13 Jan 2024
The Critique of Critique
Shichao Sun
Junlong Li
Weizhe Yuan
Ruifeng Yuan
Wenjie Li
Pengfei Liu
ELM
40
0
0
09 Jan 2024
DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models
Wendi Cui
Jiaxin Zhang
Zhuohang Li
Lopez Damien
Kamalika Das
Bradley Malin
Kumar Sricharan
30
2
0
04 Jan 2024
BatchEval: Towards Human-like Text Evaluation
Peiwen Yuan
Shaoxiong Feng
Yiwei Li
Xinglin Wang
Boyuan Pan
Heda Wang
Kan Li
ALM
23
11
0
31 Dec 2023