Evaluating the Factual Consistency of Abstractive Text Summarization

28 October 2019

Papers citing "Evaluating the Factual Consistency of Abstractive Text Summarization"

50 / 463 papers shown

Title | Authors | Tags | | Date
Integrating Video and Text: A Balanced Approach to Multimodal Summary Generation and Evaluation | Galann Pennec, Zhengyuan Liu, Nicholas Asher, Philippe Muller, Nancy F. Chen | VGen | 31 0 0 | 10 May 2025
SEval-Ex: A Statement-Level Framework for Explainable Summarization Evaluation | Tanguy Herserant, Vincent Guigue | ELM | 40 0 0 | 04 May 2025
Combining LLMs with Logic-Based Framework to Explain MCTS | Ziyan An, Xia Wang, Hendrik Baier, Zirong Chen, A. Dubey, Taylor T. Johnson, Jonathan Sprinkle, Ayan Mukhopadhyay, Meiyi Ma | | 34 1 0 | 01 May 2025
Towards Long Context Hallucination Detection | Siyi Liu, Kishaloy Halder, Zheng Qi, Wei Xiao, Nikolaos Pappas, Phu Mon Htut, Neha Anna John, Yassine Benajiba, Dan Roth | HILM | 75 0 0 | 28 Apr 2025
Conflicts in Texts: Data, Implications and Challenges | Siyi Liu, Dan Roth | | 166 0 0 | 28 Apr 2025
ScholarMate: A Mixed-Initiative Tool for Qualitative Knowledge Work and Information Sensemaking | Runlong Ye, Patrick Yung Kang Lee, Matthew Varona, Oliver Huang, Carolina Nobre | | 41 0 0 | 19 Apr 2025
Large Language Models as Span Annotators | Zdeněk Kasner, Vilém Zouhar, Patrícia Schmidtová, Ivan Kartáč, Kristýna Onderková, Ondřej Plátek, Dimitra Gkatzia, Saad Mahamood, Ondrej Dusek, Simone Balloccu | ALM | 37 0 0 | 11 Apr 2025
From Speech to Summary: A Comprehensive Survey of Speech Summarization | Fabian Retkowski, Maike Züfle, Andreas Sudmann, Dinah Pfau, Jan Niehues, Alexander Waibel | | 46 0 0 | 10 Apr 2025
CASCADE Your Datasets for Cross-Mode Knowledge Retrieval of Language Models | Runlong Zhou, Yi Zhang | RALM | 61 0 0 | 02 Apr 2025
Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans? | Jeremy Barnes, Naiara Perez, Alba Bonet-Jover, Begoña Altuna | | 62 1 0 | 21 Mar 2025
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings | Austin Xu, Srijan Bansal, Yifei Ming, Semih Yavuz, Chenyu You | ELM | 95 3 0 | 19 Mar 2025
OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs | Ivan Kartáč, Mateusz Lango, Ondrej Dusek | ELM | 51 1 0 | 14 Mar 2025
Uncertainty-Aware Decoding with Minimum Bayes Risk | Nico Daheim, Clara Meister, Thomas Möllenhoff, Iryna Gurevych | | 53 0 0 | 07 Mar 2025
Evaluating LLMs' Assessment of Mixed-Context Hallucination Through the Lens of Summarization | Siya Qi, Rui Cao, Yulan He, Zheng Yuan | HILM | 61 0 0 | 03 Mar 2025
HalCECE: A Framework for Explainable Hallucination Detection through Conceptual Counterfactuals in Image Captioning | Maria Lymperaiou, Giorgos Filandrianos, Angeliki Dimitriou, Athanasios Voulodimos, Giorgos Stamou | MLLM | 40 0 0 | 01 Mar 2025
Semantic Integrity Constraints: Declarative Guardrails for AI-Augmented Data Processing Systems | Alexander W. Lee, Justin Chan, Michael Fu, Nicolas Kim, Akshay Mehta, Deepti Raghavan, Ugur Cetintemel | | 31 0 0 | 01 Mar 2025
Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization | Ryan Barron, Maksim E. Eren, Olga M. Serafimova, Cynthia Matuszek, Boian S. Alexandrov | AILaw | 78 0 0 | 27 Feb 2025
SCOPE: A Self-supervised Framework for Improving Faithfulness in Conditional Text Generation | Song Duong, Florian Le Bronnec, Alexandre Allauzen, Vincent Guigue, Alberto Lumbreras, Laure Soulier, Patrick Gallinari | HILM | 50 0 0 | 20 Feb 2025
Hallucination Detection in Large Language Models with Metamorphic Relations | Borui Yang, Md Afif Al Mamun, Jie M. Zhang, Gias Uddin | HILM | 64 0 0 | 20 Feb 2025
Self-Rationalization in the Wild: A Large Scale Out-of-Distribution Evaluation on NLI-related tasks | Jing Yang, Max Glockner, Anderson de Rezende Rocha, Iryna Gurevych | LRM | 73 1 0 | 07 Feb 2025
FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data | Deren Lei, Yaxi Li, Siyao Li, Mengya Hu, Rui Xu, Ken Archer, Mingyu Wang, Emily Ching, Alex Deng | SyDa HILM LRM | 73 1 0 | 28 Jan 2025
Fact-Preserved Personalized News Headline Generation | Zhao Yang, Junhong Lian, Xiang Ao | | 104 1 0 | 21 Jan 2025
RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs | Jiaxing Wu, Lin Ning, Luyang Liu, Harrison Lee, Neo Wu, Chao Wang, Sushant Prakash, S. O’Banion, Bradley Green, Jun Xie | | 71 1 0 | 20 Jan 2025
A review of faithfulness metrics for hallucination assessment in Large Language Models | Ben Malin, Tatiana Kalganova, Nikoloas Boulgouris | HILM | 59 2 0 | 03 Jan 2025
PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations | Ruosen Li, Teerth Patel, Xinya Du | LLMAG ALM | 67 96 0 | 03 Jan 2025
Fine-grained and Explainable Factuality Evaluation for Multimodal Summarization | Liqiang Jing, Jingxuan Zuo, Yue Zhang | | 47 7 0 | 31 Dec 2024
A Survey of Calibration Process for Black-Box LLMs | Liangru Xie, Hui Liu, Jingying Zeng, Xianfeng Tang, Yan Han, Chen Luo, Jing Huang, Zhen Li, Suhang Wang, Qi He | | 74 1 0 | 17 Dec 2024
Learning to Verify Summary Facts with Fine-Grained LLM Feedback | Jihwan Oh, J. Choi, Nicole Hee-Yeon Kim, Taewon Yun, Hwanjun Song | SyDa ALM HILM | 76 1 0 | 14 Dec 2024
Do Automatic Factuality Metrics Measure Factuality? A Critical Evaluation | S. Ramprasad, Byron C. Wallace | LLMAG HILM | 87 2 0 | 25 Nov 2024
Domain-specific Guided Summarization for Mental Health Posts | Lu Qian, Yuqi Wang, Zehua Wang, H. Zhang, Wei Wang, Ting Yu, Anh Nguyen | AI4MH | 41 2 0 | 03 Nov 2024
Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance | Omer Nahum, Nitay Calderon, Orgad Keller, Idan Szpektor, Roi Reichart | | 27 2 0 | 24 Oct 2024
VERITAS-NLI : Validation and Extraction of Reliable Information Through Automated Scraping and Natural Language Inference | Arjun Shah, Hetansh Shah, Vedica Bafna, Charmi Khandor, Sindhu Nair | | 19 0 0 | 12 Oct 2024
Measuring the Groundedness of Legal Question-Answering Systems | Dietrich Trautmann, Natalia Ostapuk, Quentin Grail, Adrian Alan Pol, Guglielmo Bonifazi, Shang Gao, Martin Gajek | HILM AILaw ELM | 23 0 0 | 11 Oct 2024
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations | Hadas Orgad, Michael Toker, Zorik Gekhman, Roi Reichart, Idan Szpektor, Hadas Kotek, Yonatan Belinkov | HILM AIFin | 61 29 0 | 03 Oct 2024
A Critical Look at Meta-evaluating Summarisation Evaluation Metrics | Xiang Dai, Sarvnaz Karimi, Biaoyan Fang | | 36 0 0 | 29 Sep 2024
Model-based Preference Optimization in Abstractive Summarization without Human Feedback | Jaepill Choi, Kyubyung Chae, Jiwoo Song, Yohan Jo, Taesup Kim | | 24 0 0 | 27 Sep 2024
Leveraging Long-Context Large Language Models for Multi-Document Understanding and Summarization in Enterprise Applications | Aditi Godbole, Jabin Geevarghese George, Smita Shandilya | | 24 3 0 | 27 Sep 2024
Using Similarity to Evaluate Factual Consistency in Summaries | Yuxuan Ye, Edwin Simpson, Raul Santos Rodriguez | HILM | 23 2 0 | 23 Sep 2024
Can pre-trained language models generate titles for research papers? | Tohida Rehman, Debarshi Kumar Sanyal, S. Chattopadhyay | | 27 3 0 | 22 Sep 2024
A Dataset for Evaluating LLM-based Evaluation Functions for Research Question Extraction Task | Yuya Fujisaki, Shiro Takagi, Hideki Asoh, Wataru Kumagai | | 28 0 0 | 10 Sep 2024
Hallucination Detection in LLMs: Fast and Memory-Efficient Finetuned Models | Gabriel Y. Arteaga, Thomas B. Schon, Nicolas Pielawski | | 38 7 0 | 04 Sep 2024
Broadening Access to Simulations for End-Users via Large Language Models: Challenges and Opportunities | Philippe J. Giabbanelli, Jose J. Padilla, Ameeta Agrawal | | 30 2 0 | 03 Sep 2024
Measuring text summarization factuality using atomic facts entailment metrics in the context of retrieval augmented generation | N. E. Kriman | HILM | 54 0 0 | 27 Aug 2024
SLM Meets LLM: Balancing Latency, Interpretability and Consistency in Hallucination Detection | Mengya Hu, Rui Xu, Deren Lei, Yaxi Li, Mingyu Wang, Emily Ching, Eslam Kamal, Alex Deng | | 40 3 0 | 22 Aug 2024
A Comparative Analysis of Faithfulness Metrics and Humans in Citation Evaluation | Weijia Zhang, Mohammad Aliannejadi, Jiahuan Pei, Yifei Yuan, Jia-Hong Huang, Evangelos Kanoulas | HILM | 45 4 0 | 22 Aug 2024
Effective Demonstration Annotation for In-Context Learning via Language Model-Based Determinantal Point Process | Peng Wang, Xiaobin Wang, Chao Lou, Shengyu Mao, Pengjun Xie, Yong-jia Jiang | | 52 0 0 | 04 Aug 2024
Lynx: An Open Source Hallucination Evaluation Model | Selvan Sunitha Ravi, B. Mielczarek, Anand Kannappan, Douwe Kiela, Rebecca Qian | VLM RALM HILM | 56 17 0 | 11 Jul 2024
STORYSUMM: Evaluating Faithfulness in Story Summarization | Melanie Subbiah, Faisal Ladhak, Akankshya Mishra, Griffin Adams, Lydia B. Chilton, Kathleen McKeown | | 50 4 0 | 09 Jul 2024
Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs | Mihir Parmar, Hanieh Deilamsalehy, Franck Dernoncourt, Seunghyun Yoon, Ryan A. Rossi, Trung Bui | | 34 2 0 | 05 Jul 2024
Face4RAG: Factual Consistency Evaluation for Retrieval Augmented Generation in Chinese | Yunqi Xu, Tianchi Cai, Jiyan Jiang, Xierui Song | | 41 2 0 | 01 Jul 2024