Asking and Answering Questions to Evaluate the Factual Consistency of Summaries

8 April 2020
Alex Wang
Kyunghyun Cho
M. Lewis
    HILM

Papers citing "Asking and Answering Questions to Evaluate the Factual Consistency of Summaries"

50 / 327 papers shown
What are they talking about? Benchmarking Large Language Models for Knowledge-Grounded Discussion Summarization
Weixiao Zhou
Junnan Zhu
Gengyao Li
Xianfu Cheng
Xinnian Liang
Feifei Zhai
Zhiyu Li
ALM
7
0
0
18 May 2025
Towards Better Evaluation for Generated Patent Claims
Lekang Jiang
Pascal A Scherz
Stephan Goetz
ELM
30
0
0
16 May 2025
BLEUBERI: BLEU is a surprisingly effective reward for instruction following
Yapei Chang
Yekyung Kim
Michael Krumdick
Amir Zadeh
Chuan Li
Chris Tanner
Mohit Iyyer
ALM
22
0
0
16 May 2025
SEval-Ex: A Statement-Level Framework for Explainable Summarization Evaluation
Tanguy Herserant
Vincent Guigue
ELM
45
0
0
04 May 2025
Consistency in Language Models: Current Landscape, Challenges, and Future Directions
Jekaterina Novikova
Carol Anderson
Borhane Blili-Hamelin
Subhabrata Majumdar
HILM
73
0
0
01 May 2025
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts
Hanhua Hong
Chenghao Xiao
Yang Wang
Y. Liu
Wenge Rong
Chenghua Lin
31
0
0
29 Apr 2025
Conflicts in Texts: Data, Implications and Challenges
Siyi Liu
Dan Roth
184
0
0
28 Apr 2025
Evaluating Evaluation Metrics -- The Mirage of Hallucination Detection
Atharva Kulkarni
Yuan-kang Zhang
Joel Ruben Antony Moniz
Xiou Ge
Bo-Hsiang Tseng
Dhivya Piraviperumal
Shri Kiran Srinivasan
Hong-ye Yu
HILM
86
0
0
25 Apr 2025
AskQE: Question Answering as Automatic Evaluation for Machine Translation
Dayeon Ki
Kevin Duh
Marine Carpuat
26
0
0
15 Apr 2025
C-FAITH: A Chinese Fine-Grained Benchmark for Automated Hallucination Evaluation
Xu Zhang
Zhifei Liu
Jiahao Wang
Huixuan Zhang
Fan Xu
Junzhe Zhang
Xiaojun Wan
HILM
39
0
0
14 Apr 2025
From Speech to Summary: A Comprehensive Survey of Speech Summarization
Fabian Retkowski
Maike Züfle
Andreas Sudmann
Dinah Pfau
Jan Niehues
Alexander Waibel
46
0
0
10 Apr 2025
OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs
Ivan Kartáč
Mateusz Lango
Ondrej Dusek
ELM
54
1
0
14 Mar 2025
Introducing Verification Task of Set Consistency with Set-Consistency Energy Networks
Mooho Song
Hyeryung Son
Jay-Yoon Lee
52
0
0
12 Mar 2025
Evaluating LLMs' Assessment of Mixed-Context Hallucination Through the Lens of Summarization
Siya Qi
Rui Cao
Yulan He
Zheng Yuan
HILM
61
0
0
03 Mar 2025
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models
Qianqi Yan
Yue Fan
Hongquan Li
Shan Jiang
Yang Zhao
Xinze Guan
Ching-Chen Kuo
Qing Guo
VLM
LRM
82
2
0
22 Feb 2025
LAMD: Context-driven Android Malware Detection and Classification with LLMs
Xingzhi Qian
Xinran Zheng
Yiling He
Shuo Yang
Lorenzo Cavallaro
83
2
0
18 Feb 2025
Self-Rationalization in the Wild: A Large Scale Out-of-Distribution Evaluation on NLI-related tasks
Jing Yang
Max Glockner
Anderson de Rezende Rocha
Iryna Gurevych
LRM
73
1
0
07 Feb 2025
Beyond correlation: The Impact of Human Uncertainty in Measuring the Effectiveness of Automatic Evaluation and LLM-as-a-Judge
Aparna Elangovan
Jongwoo Ko
Lei Xu
Mahsa Elyasi
Ling Liu
S. Bodapati
Dan Roth
52
6
0
28 Jan 2025
Reference-free Evaluation Metrics for Text Generation: A Survey
Takumi Ito
Kees van Deemter
Jun Suzuki
ELM
41
2
0
21 Jan 2025
SteLLA: A Structured Grading System Using LLMs with RAG
Hefei Qiu
Brian White
Ashley Ding
Reinaldo Costa
Ali Hachem
Wei Ding
Ping Chen
AI4Ed
61
0
0
17 Jan 2025
Hierarchical Divide-and-Conquer for Fine-Grained Alignment in LLM-Based Medical Evaluation
Shunfan Zheng
Xiechi Zhang
Gerard de Melo
Xiaoling Wang
Linlin Wang
LM&MA
ELM
47
0
0
12 Jan 2025
A review of faithfulness metrics for hallucination assessment in Large Language Models
Ben Malin
Tatiana Kalganova
Nikoloas Boulgouris
HILM
59
2
0
03 Jan 2025
PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations
Ruosen Li
Teerth Patel
Xinya Du
LLMAG
ALM
70
96
0
03 Jan 2025
A 2-step Framework for Automated Literary Translation Evaluation: Its Promises and Pitfalls
Sheikh Shafayat
Dongkeun Yoon
Woori Jang
Jiwoo Choi
Alice Oh
Seohyon Jung
94
1
0
03 Jan 2025
CaseSumm: A Large-Scale Dataset for Long-Context Summarization from U.S. Supreme Court Opinions
Mourad Heddaya
Kyle MacMillan
Anup Malani
Hongyuan Mei
Chenhao Tan
AILaw
ELM
34
0
0
03 Jan 2025
Fine-grained and Explainable Factuality Evaluation for Multimodal Summarization
Liqiang Jing
Jingxuan Zuo
Yue Zhang
50
7
0
31 Dec 2024
Attention with Dependency Parsing Augmentation for Fine-Grained Attribution
Qiang Ding
Lvzhou Luo
Yixuan Cao
Ping Luo
84
0
0
16 Dec 2024
Learning to Verify Summary Facts with Fine-Grained LLM Feedback
Jihwan Oh
J. Choi
Nicole Hee-Yeon Kim
Taewon Yun
Hwanjun Song
SyDa
ALM
HILM
78
1
0
14 Dec 2024
MST-R: Multi-Stage Tuning for Retrieval Systems and Metric Evaluation
Yash Malviya
Karan Dhingra
Maneesh Singh
77
0
0
13 Dec 2024
QAPyramid: Fine-grained Evaluation of Content Selection for Text Summarization
Shiyue Zhang
David Wan
Arie Cattan
Ayal Klein
Ido Dagan
Joey Tianyi Zhou
86
0
0
10 Dec 2024
I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token
Roi Cohen
Konstantin Dobler
Eden Biran
Gerard de Melo
98
3
0
09 Dec 2024
Do Automatic Factuality Metrics Measure Factuality? A Critical Evaluation
S. Ramprasad
Byron C. Wallace
LLMAG
HILM
92
2
0
25 Nov 2024
SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text
Reshmi Ghosh
Tianyi Yao
Lizzy Chen
Sadid Hasan
Tianwei Chen
Dario Bernal
Huitian Jiao
H M Sajjad Hossain
ELM
76
0
0
25 Nov 2024
Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions
Moran Yanuka
Assaf Ben-Kish
Yonatan Bitton
Idan Szpektor
Raja Giryes
VLM
47
2
0
13 Nov 2024
Improving Uncertainty Quantification in Large Language Models via Semantic Embeddings
Yashvir S. Grewal
Edwin V. Bonilla
Thang D. Bui
UQCV
43
4
0
30 Oct 2024
Improving Model Factuality with Fine-grained Critique-based Evaluator
Yiqing Xie
Wenxuan Zhou
Pradyot Prakash
Di Jin
Yuning Mao
...
Sinong Wang
Han Fang
Carolyn Rose
Daniel Fried
Hejia Zhang
HILM
33
6
0
24 Oct 2024
Assessment of Transformer-Based Encoder-Decoder Model for Human-Like Summarization
Sindhu Nair
Y. S. Rao
Radha Shankarmani
26
0
0
22 Oct 2024
From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization
Catarina G. Belem
Pouya Pezeshkpour
Hayate Iso
Seiji Maekawa
Nikita Bhutani
Estevam R. Hruschka
HILM
73
1
0
17 Oct 2024
A Critical Look at Meta-evaluating Summarisation Evaluation Metrics
Xiang Dai
Sarvnaz Karimi
Biaoyan Fang
36
0
0
29 Sep 2024
AXCEL: Automated eXplainable Consistency Evaluation using LLMs
P Aditya Sreekar
Sahil Verma
Suransh Chopra
Sarik Ghazarian
Abhishek Persad
Narayanan Sadagopan
LRM
36
0
0
25 Sep 2024
DiaSynth: Synthetic Dialogue Generation Framework for Low Resource Dialogue Applications
Sathya Krishnan Suresh
Wu Mengjun
Tushar Pranav
Eng Siong Chng
34
2
0
25 Sep 2024
Using Similarity to Evaluate Factual Consistency in Summaries
Yuxuan Ye
Edwin Simpson
Raul Santos Rodriguez
HILM
23
2
0
23 Sep 2024
Broadening Access to Simulations for End-Users via Large Language Models: Challenges and Opportunities
Philippe J. Giabbanelli
Jose J. Padilla
Ameeta Agrawal
30
2
0
03 Sep 2024
A Comparative Analysis of Faithfulness Metrics and Humans in Citation Evaluation
Weijia Zhang
Mohammad Aliannejadi
Jiahuan Pei
Yifei Yuan
Jia-Hong Huang
Evangelos Kanoulas
HILM
48
4
0
22 Aug 2024
CLR-Fact: Evaluating the Complex Logical Reasoning Capability of Large Language Models over Factual Knowledge
Tianshi Zheng
Jiaxin Bai
Yicheng Wang
Tianqing Fang
Yue Guo
Yauwai Yim
Yangqiu Song
ELM
LRM
34
3
0
30 Jul 2024
WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries
Wenting Zhao
Tanya Goyal
Yu Ying Chiu
Liwei Jiang
Benjamin Newman
...
Khyathi Raghavi Chandu
Ronan Le Bras
Claire Cardie
Yuntian Deng
Yejin Choi
HILM
46
7
0
24 Jul 2024
GraphEval: A Knowledge-Graph Based LLM Hallucination Evaluation Framework
Hannah Sansford
Nicholas Richardson
Hermina Petric Maretic
Juba Nait Saada
42
13
0
15 Jul 2024
ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models
Yuzhe Gu
Ziwei Ji
Wenwei Zhang
Chengqi Lyu
Dahua Lin
Kai Chen
HILM
42
5
0
05 Jul 2024
Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models
Yuyan Chen
Qiang Fu
Yichen Yuan
Zhihao Wen
Ge Fan
Dayiheng Liu
Dongmei Zhang
Zhixu Li
Yanghua Xiao
HILM
52
69
0
04 Jul 2024
Synthetic Multimodal Question Generation
Ian Wu
Sravan Jayanthi
Vijay Viswanathan
Simon Rosenberg
Sina Pakazad
Tongshuang Wu
Graham Neubig
50
2
0
02 Jul 2024