2004.04228
Asking and Answering Questions to Evaluate the Factual Consistency of Summaries
8 April 2020
Alex Wang
Kyunghyun Cho
M. Lewis
HILM
Papers citing
"Asking and Answering Questions to Evaluate the Factual Consistency of Summaries"
50 / 327 papers shown
What are they talking about? Benchmarking Large Language Models for Knowledge-Grounded Discussion Summarization
Weixiao Zhou
Junnan Zhu
Gengyao Li
Xianfu Cheng
Xinnian Liang
Feifei Zhai
Zhiyu Li
ALM
4
0
0
18 May 2025
Towards Better Evaluation for Generated Patent Claims
Lekang Jiang
Pascal A Scherz
Stephan Goetz
ELM
30
0
0
16 May 2025
BLEUBERI: BLEU is a surprisingly effective reward for instruction following
Yapei Chang
Yekyung Kim
Michael Krumdick
Amir Zadeh
Chuan Li
Chris Tanner
Mohit Iyyer
ALM
22
0
0
16 May 2025
SEval-Ex: A Statement-Level Framework for Explainable Summarization Evaluation
Tanguy Herserant
Vincent Guigue
ELM
42
0
0
04 May 2025
Consistency in Language Models: Current Landscape, Challenges, and Future Directions
Jekaterina Novikova
Carol Anderson
Borhane Blili-Hamelin
Subhabrata Majumdar
HILM
73
0
0
01 May 2025
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts
Hanhua Hong
Chenghao Xiao
Yang Wang
Y. Liu
Wenge Rong
Chenghua Lin
31
0
0
29 Apr 2025
Conflicts in Texts: Data, Implications and Challenges
Siyi Liu
Dan Roth
175
0
0
28 Apr 2025
Evaluating Evaluation Metrics -- The Mirage of Hallucination Detection
Atharva Kulkarni
Yuan-kang Zhang
Joel Ruben Antony Moniz
Xiou Ge
Bo-Hsiang Tseng
Dhivya Piraviperumal
Shri Kiran Srinivasan
Hong-ye Yu
HILM
86
0
0
25 Apr 2025
AskQE: Question Answering as Automatic Evaluation for Machine Translation
Dayeon Ki
Kevin Duh
Marine Carpuat
26
0
0
15 Apr 2025
C-FAITH: A Chinese Fine-Grained Benchmark for Automated Hallucination Evaluation
Xu Zhang
Zhifei Liu
Jiahao Wang
Huixuan Zhang
Fan Xu
Junzhe Zhang
Xiaojun Wan
HILM
39
0
0
14 Apr 2025
From Speech to Summary: A Comprehensive Survey of Speech Summarization
Fabian Retkowski
Maike Züfle
Andreas Sudmann
Dinah Pfau
Jan Niehues
Alexander Waibel
46
0
0
10 Apr 2025
OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs
Ivan Kartáč
Mateusz Lango
Ondrej Dusek
ELM
54
1
0
14 Mar 2025
Introducing Verification Task of Set Consistency with Set-Consistency Energy Networks
Mooho Song
Hyeryung Son
Jay-Yoon Lee
52
0
0
12 Mar 2025
Evaluating LLMs' Assessment of Mixed-Context Hallucination Through the Lens of Summarization
Siya Qi
Rui Cao
Yulan He
Zheng Yuan
HILM
61
0
0
03 Mar 2025
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models
Qianqi Yan
Yue Fan
Hongquan Li
Shan Jiang
Yang Zhao
Xinze Guan
Ching-Chen Kuo
Qing Guo
VLM
LRM
82
2
0
22 Feb 2025
LAMD: Context-driven Android Malware Detection and Classification with LLMs
Xingzhi Qian
Xinran Zheng
Yiling He
Shuo Yang
Lorenzo Cavallaro
83
2
0
18 Feb 2025
Self-Rationalization in the Wild: A Large Scale Out-of-Distribution Evaluation on NLI-related tasks
Jing Yang
Max Glockner
Anderson de Rezende Rocha
Iryna Gurevych
LRM
73
1
0
07 Feb 2025
Beyond correlation: The Impact of Human Uncertainty in Measuring the Effectiveness of Automatic Evaluation and LLM-as-a-Judge
Aparna Elangovan
Jongwoo Ko
Lei Xu
Mahsa Elyasi
Ling Liu
S. Bodapati
Dan Roth
52
6
0
28 Jan 2025
Reference-free Evaluation Metrics for Text Generation: A Survey
Takumi Ito
Kees van Deemter
Jun Suzuki
ELM
41
2
0
21 Jan 2025
SteLLA: A Structured Grading System Using LLMs with RAG
Hefei Qiu
Brian White
Ashley Ding
Reinaldo Costa
Ali Hachem
Wei Ding
Ping Chen
AI4Ed
61
0
0
17 Jan 2025
Hierarchical Divide-and-Conquer for Fine-Grained Alignment in LLM-Based Medical Evaluation
Shunfan Zheng
Xiechi Zhang
Gerard de Melo
Xiaoling Wang
Linlin Wang
LM&MA
ELM
47
0
0
12 Jan 2025
A review of faithfulness metrics for hallucination assessment in Large Language Models
Ben Malin
Tatiana Kalganova
Nikolaos Boulgouris
HILM
59
2
0
03 Jan 2025
PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations
Ruosen Li
Teerth Patel
Xinya Du
LLMAG
ALM
70
96
0
03 Jan 2025
A 2-step Framework for Automated Literary Translation Evaluation: Its Promises and Pitfalls
Sheikh Shafayat
Dongkeun Yoon
Woori Jang
Jiwoo Choi
Alice H. Oh
Seohyon Jung
94
1
0
03 Jan 2025
CaseSumm: A Large-Scale Dataset for Long-Context Summarization from U.S. Supreme Court Opinions
Mourad Heddaya
Kyle MacMillan
Anup Malani
Hongyuan Mei
Chenhao Tan
AILaw
ELM
34
0
0
03 Jan 2025
Fine-grained and Explainable Factuality Evaluation for Multimodal Summarization
Liqiang Jing
Jingxuan Zuo
Yue Zhang
50
7
0
31 Dec 2024
Attention with Dependency Parsing Augmentation for Fine-Grained Attribution
Qiang Ding
Lvzhou Luo
Yixuan Cao
Ping Luo
84
0
0
16 Dec 2024
Learning to Verify Summary Facts with Fine-Grained LLM Feedback
Jihwan Oh
J. Choi
Nicole Hee-Yeon Kim
Taewon Yun
Hwanjun Song
SyDa
ALM
HILM
78
1
0
14 Dec 2024
MST-R: Multi-Stage Tuning for Retrieval Systems and Metric Evaluation
Yash Malviya
Karan Dhingra
Maneesh Singh
77
0
0
13 Dec 2024
QAPyramid: Fine-grained Evaluation of Content Selection for Text Summarization
Shiyue Zhang
David Wan
Arie Cattan
Ayal Klein
Ido Dagan
Joey Tianyi Zhou
86
0
0
10 Dec 2024
I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token
Roi Cohen
Konstantin Dobler
Eden Biran
Gerard de Melo
95
3
0
09 Dec 2024
Do Automatic Factuality Metrics Measure Factuality? A Critical Evaluation
S. Ramprasad
Byron C. Wallace
LLMAG
HILM
92
2
0
25 Nov 2024
SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text
Reshmi Ghosh
Tianyi Yao
Lizzy Chen
Sadid Hasan
Tianwei Chen
Dario Bernal
Huitian Jiao
H M Sajjad Hossain
ELM
76
0
0
25 Nov 2024
Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions
Moran Yanuka
Assaf Ben-Kish
Yonatan Bitton
Idan Szpektor
Raja Giryes
VLM
47
2
0
13 Nov 2024
Improving Uncertainty Quantification in Large Language Models via Semantic Embeddings
Yashvir S. Grewal
Edwin V. Bonilla
Thang D. Bui
UQCV
43
4
0
30 Oct 2024
Improving Model Factuality with Fine-grained Critique-based Evaluator
Yiqing Xie
Wenxuan Zhou
Pradyot Prakash
Di Jin
Yuning Mao
...
Sinong Wang
Han Fang
Carolyn Rose
Daniel Fried
Hejia Zhang
HILM
33
6
0
24 Oct 2024
Assessment of Transformer-Based Encoder-Decoder Model for Human-Like Summarization
Sindhu Nair
Y. S. Rao
Radha Shankarmani
26
0
0
22 Oct 2024
From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization
Catarina G. Belem
Pouya Pezeshkpour
Hayate Iso
Seiji Maekawa
Nikita Bhutani
Estevam R. Hruschka
HILM
73
1
0
17 Oct 2024
A Critical Look at Meta-evaluating Summarisation Evaluation Metrics
Xiang Dai
Sarvnaz Karimi
Biaoyan Fang
36
0
0
29 Sep 2024
AXCEL: Automated eXplainable Consistency Evaluation using LLMs
P Aditya Sreekar
Sahil Verma
Suransh Chopra
Sarik Ghazarian
Abhishek Persad
Narayanan Sadagopan
LRM
36
0
0
25 Sep 2024
DiaSynth: Synthetic Dialogue Generation Framework for Low Resource Dialogue Applications
Sathya Krishnan Suresh
Wu Mengjun
Tushar Pranav
Eng Siong Chng
34
2
0
25 Sep 2024
Using Similarity to Evaluate Factual Consistency in Summaries
Yuxuan Ye
Edwin Simpson
Raul Santos Rodriguez
HILM
23
2
0
23 Sep 2024
Broadening Access to Simulations for End-Users via Large Language Models: Challenges and Opportunities
Philippe J. Giabbanelli
Jose J. Padilla
Ameeta Agrawal
30
2
0
03 Sep 2024
A Comparative Analysis of Faithfulness Metrics and Humans in Citation Evaluation
Weijia Zhang
Mohammad Aliannejadi
Jiahuan Pei
Yifei Yuan
Jia-Hong Huang
Evangelos Kanoulas
HILM
48
4
0
22 Aug 2024
CLR-Fact: Evaluating the Complex Logical Reasoning Capability of Large Language Models over Factual Knowledge
Tianshi Zheng
Jiaxin Bai
Yicheng Wang
Tianqing Fang
Yue Guo
Yauwai Yim
Yangqiu Song
ELM
LRM
34
3
0
30 Jul 2024
WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries
Wenting Zhao
Tanya Goyal
Yu Ying Chiu
Liwei Jiang
Benjamin Newman
...
Khyathi Raghavi Chandu
Ronan Le Bras
Claire Cardie
Yuntian Deng
Yejin Choi
HILM
46
7
0
24 Jul 2024
GraphEval: A Knowledge-Graph Based LLM Hallucination Evaluation Framework
Hannah Sansford
Nicholas Richardson
Hermina Petric Maretic
Juba Nait Saada
42
13
0
15 Jul 2024
ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models
Yuzhe Gu
Ziwei Ji
Wenwei Zhang
Chengqi Lyu
Dahua Lin
Kai Chen
HILM
42
5
0
05 Jul 2024
Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models
Yuyan Chen
Qiang Fu
Yichen Yuan
Zhihao Wen
Ge Fan
Dayiheng Liu
Dongmei Zhang
Zhixu Li
Yanghua Xiao
HILM
52
69
0
04 Jul 2024
Synthetic Multimodal Question Generation
Ian Wu
Sravan Jayanthi
Vijay Viswanathan
Simon Rosenberg
Sina Pakazad
Tongshuang Wu
Graham Neubig
50
2
0
02 Jul 2024