2112.08542
Cited By
QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization
16 December 2021
Alexander R. Fabbri
C. Wu
Wenhao Liu
Caiming Xiong
HILM
Papers citing
"QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization"
50 / 167 papers shown
Title
Long-Form Information Alignment Evaluation Beyond Atomic Facts
Danna Zheng
Mirella Lapata
Jeff Z. Pan
HILM
39
0
0
21 May 2025
What are they talking about? Benchmarking Large Language Models for Knowledge-Grounded Discussion Summarization
Weixiao Zhou
Junnan Zhu
Gengyao Li
Xianfu Cheng
Xinnian Liang
Feifei Zhai
Zhiyu Li
ALM
14
0
0
18 May 2025
Integrating Video and Text: A Balanced Approach to Multimodal Summary Generation and Evaluation
Galann Pennec
Zhengyuan Liu
Nicholas Asher
Philippe Muller
Nancy F. Chen
VGen
36
0
0
10 May 2025
Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards
Manveer Singh Tamber
F. S. Bao
Chenyu Xu
Ge Luo
Suleman Kazi
Minseok Bae
Miaoran Li
Ofer Mendelevitch
Renyi Qu
Jimmy J. Lin
VLM
36
0
0
07 May 2025
AskQE: Question Answering as Automatic Evaluation for Machine Translation
Dayeon Ki
Kevin Duh
Marine Carpuat
33
1
0
15 Apr 2025
Exploration of Plan-Guided Summarization for Narrative Texts: the Case of Small Language Models
Matt Grenander
Siddharth Varia
Paula Czarnowska
Yogarshi Vyas
Kishaloy Halder
Bonan Min
HILM
39
0
0
12 Apr 2025
News is More than a Collection of Facts: Moral Frame Preserving News Summarization
Enrico Liscio
Michela Lorandi
P. Murukannaiah
41
0
0
01 Apr 2025
Introducing Verification Task of Set Consistency with Set-Consistency Energy Networks
Mooho Song
Hyeryung Son
Jay-Yoon Lee
52
0
0
12 Mar 2025
AILS-NTUA at SemEval-2025 Task 3: Leveraging Large Language Models and Translation Strategies for Multilingual Hallucination Detection
Dimitra Karkani
Maria Lymperaiou
Giorgos Filandrianos
Nikolaos Spanos
Athanasios Voulodimos
Giorgos Stamou
HILM
LRM
86
0
0
04 Mar 2025
Evaluating LLMs' Assessment of Mixed-Context Hallucination Through the Lens of Summarization
Siya Qi
Rui Cao
Yulan He
Zheng Yuan
HILM
63
0
0
03 Mar 2025
Independent Mobility GPT (IDM-GPT): A Self-Supervised Multi-Agent Large Language Model Framework for Customized Traffic Mobility Analysis Using Machine Learning Models
Fengze Yang
Xiaoyue Cathy Liu
Lingjiu Lu
Bingzhang Wang
Chenxi
45
0
0
25 Feb 2025
PosterSum: A Multimodal Benchmark for Scientific Poster Summarization
Rohit Saxena
Pasquale Minervini
Frank Keller
VLM
64
0
0
24 Feb 2025
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models
Qianqi Yan
Yue Fan
Hongquan Li
Shan Jiang
Yang Zhao
Xinze Guan
Ching-Chen Kuo
Xinze Wang
VLM
LRM
95
2
0
22 Feb 2025
Rare Disease Differential Diagnosis with Large Language Models at Scale: From Abdominal Actinomycosis to Wilson's Disease
Elliot Schumacher
Dhruv Naik
Anitha Kannan
LM&MA
44
0
0
20 Feb 2025
Factual Inconsistency in Data-to-Text Generation Scales Exponentially with LLM Size: A Statistical Validation
Joy Mahapatra
Soumyajit Roy
Utpal Garain
HILM
ALM
88
0
0
17 Feb 2025
HuDEx: Integrating Hallucination Detection and Explainability for Enhancing the Reliability of LLM responses
Sujeong Lee
Hayoung Lee
Seongsoo Heo
Wonik Choi
HILM
93
0
0
12 Feb 2025
FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data
Deren Lei
Yaxi Li
Siyao Li
Mengya Hu
Rui Xu
Ken Archer
Mingyu Wang
Emily Ching
Alex Deng
SyDa
HILM
LRM
81
1
0
28 Jan 2025
Learning to Summarize from LLM-generated Feedback
Hwanjun Song
Taewon Yun
Yuho Lee
Jihwan Oh
Gihun Lee
Jason (Jinglun) Cai
Hang Su
75
4
0
28 Jan 2025
RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs
Jiaxing Wu
Lin Ning
Luyang Liu
Harrison Lee
Neo Wu
Chao Wang
Sushant Prakash
S. O’Banion
Bradley Green
Jun Xie
71
1
0
20 Jan 2025
CaseSumm: A Large-Scale Dataset for Long-Context Summarization from U.S. Supreme Court Opinions
Mourad Heddaya
Kyle MacMillan
Anup Malani
Hongyuan Mei
Chenhao Tan
AILaw
ELM
39
0
0
03 Jan 2025
PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations
Ruosen Li
Teerth Patel
Xinya Du
LLMAG
ALM
73
97
0
03 Jan 2025
Fine-grained and Explainable Factuality Evaluation for Multimodal Summarization
Liqiang Jing
Jingxuan Zuo
Yue Zhang
50
8
0
31 Dec 2024
SummExecEdit: A Factual Consistency Benchmark in Summarization with Executable Edits
Onkar Thorat
Philippe Laban
C. Wu
HILM
93
0
0
17 Dec 2024
Learning to Verify Summary Facts with Fine-Grained LLM Feedback
Jihwan Oh
J. Choi
Nicole Hee-Yeon Kim
Taewon Yun
Hwanjun Song
SyDa
ALM
HILM
83
1
0
14 Dec 2024
QAPyramid: Fine-grained Evaluation of Content Selection for Text Summarization
Shiyue Zhang
David Wan
Arie Cattan
Ayal Klein
Ido Dagan
Mohit Bansal
96
0
0
10 Dec 2024
An Extensive Evaluation of Factual Consistency in Large Language Models for Data-to-Text Generation
Joy Mahapatra
Utpal Garain
HILM
ALM
69
1
0
28 Nov 2024
Do Automatic Factuality Metrics Measure Factuality? A Critical Evaluation
S. Ramprasad
Byron C. Wallace
LLMAG
HILM
97
2
0
25 Nov 2024
Bayesian Calibration of Win Rate Estimation with LLM Evaluators
Yicheng Gao
G. Xu
Zhe Wang
Arman Cohan
38
6
0
07 Nov 2024
On Positional Bias of Faithfulness for Long-form Summarization
David Wan
Jesse Vig
Mohit Bansal
Shafiq Joty
HILM
60
5
0
31 Oct 2024
Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance
Omer Nahum
Nitay Calderon
Orgad Keller
Idan Szpektor
Roi Reichart
30
2
0
24 Oct 2024
From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization
Catarina G. Belem
Pouya Pezeshkpour
Hayate Iso
Seiji Maekawa
Nikita Bhutani
Estevam R. Hruschka
HILM
75
3
0
17 Oct 2024
T3: A Novel Zero-shot Transfer Learning Framework Iteratively Training on an Assistant Task for a Target Task
Xindi Tong
Yujin Zhu
Shijian Fan
Liang Xu
69
1
0
26 Sep 2024
Using Similarity to Evaluate Factual Consistency in Summaries
Yuxuan Ye
Edwin Simpson
Raul Santos Rodriguez
HILM
28
2
0
23 Sep 2024
LINKAGE: Listwise Ranking among Varied-Quality References for Non-Factoid QA Evaluation via LLMs
Sihui Yang
Keping Bi
Wanqing Cui
Jiafeng Guo
Xueqi Cheng
23
2
0
23 Sep 2024
NovAScore: A New Automated Metric for Evaluating Document Level Novelty
Lin Ai
Ziwei Gong
Harshsaiprasad Deshpande
Alexander Johnson
Emmy Phung
Ahmad Emami
Julia Hirschberg
23
1
0
14 Sep 2024
Measuring text summarization factuality using atomic facts entailment metrics in the context of retrieval augmented generation
N. E. Kriman
HILM
57
0
0
27 Aug 2024
Ancient Wisdom, Modern Tools: Exploring Retrieval-Augmented LLMs for Ancient Indian Philosophy
Priyanka Mandikal
RALM
VLM
45
0
0
21 Aug 2024
See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses
Yulong Chen
Yang Liu
Jianhao Yan
X. Bai
Ming Zhong
Yinghao Yang
Ziyi Yang
Chenguang Zhu
Yue Zhang
ALM
ELM
39
8
0
16 Aug 2024
Learning Fine-Grained Grounded Citations for Attributed Large Language Models
Lei Huang
Xiaocheng Feng
Weitao Ma
Yuxuan Gu
Weihong Zhong
...
Weijiang Yu
Weihua Peng
Duyu Tang
Dandan Tu
Bing Qin
HILM
27
4
0
08 Aug 2024
Zero-shot Factual Consistency Evaluation Across Domains
Raunak Agarwal
HILM
52
0
0
07 Aug 2024
DebateQA: Evaluating Question Answering on Debatable Knowledge
Rongwu Xu
Xuan Qi
Zehan Qi
Wei Xu
Zhijiang Guo
ELM
56
5
0
02 Aug 2024
Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models
Zhengxuan Wu
Yuhao Zhang
Linquan Wei
Yumo Xu
Rujun Han
Yi Liu
Jifan Chen
Bonan Min
Zhiheng Huang
38
0
0
31 Jul 2024
WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries
Wenting Zhao
Tanya Goyal
Yu Ying Chiu
Liwei Jiang
Benjamin Newman
...
Khyathi Raghavi Chandu
Ronan Le Bras
Claire Cardie
Yuntian Deng
Yejin Choi
HILM
46
7
0
24 Jul 2024
Towards Dataset-scale and Feature-oriented Evaluation of Text Summarization in Large Language Model Prompts
Sam Yu-Te Lee
Aryaman Bahukhandi
Dongyu Liu
Kwan-Liu Ma
AAML
42
5
0
16 Jul 2024
Evaluating Large Language Models with fmeval
Pola Schwöbel
Luca Franceschi
Muhammad Bilal Zafar
Keerthan Vasist
Aman Malhotra
Tomer Shenhar
Pinal Tailor
Pinar Yilmaz
Michael Diamond
Michele Donini
LM&MA
ELM
27
2
0
15 Jul 2024
Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps
Yung-Sung Chuang
Linlu Qiu
Cheng-Yu Hsieh
Ranjay Krishna
Yoon Kim
James R. Glass
HILM
18
36
0
09 Jul 2024
STORYSUMM: Evaluating Faithfulness in Story Summarization
Melanie Subbiah
Faisal Ladhak
Akankshya Mishra
Griffin Adams
Lydia B. Chilton
Kathleen McKeown
52
4
0
09 Jul 2024
Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models
Jinliang Lu
Ziliang Pang
Min Xiao
Yaochen Zhu
Rui Xia
Jiajun Zhang
MoMe
59
18
0
08 Jul 2024
Enhancing Hallucination Detection through Perturbation-Based Synthetic Data Generation in System Responses
Dongxu Zhang
Varun Gangal
B. Lattimer
Yi Yang
40
6
0
07 Jul 2024
FineSurE: Fine-grained Summarization Evaluation using LLMs
Hwanjun Song
Hang Su
Igor Shalyminov
Jason (Jinglun) Cai
Saab Mansour
HILM
41
32
0
01 Jul 2024