With a Little Push, NLI Models can Robustly and Efficiently Predict Faithfulness

26 May 2023

Papers citing "With a Little Push, NLI Models can Robustly and Efficiently Predict Faithfulness"

Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance. Omer Nahum, Nitay Calderon, Orgad Keller, Idan Szpektor, Roi Reichart. 24 Oct 2024.
Learning to Generate Answers with Citations via Factual Consistency Models. Rami Aly, Zhiqiang Tang, Samson Tan, George Karypis. [HILM] 19 Jun 2024.
Faithful Chart Summarization with ChaTS-Pi. Syrine Krichene, Francesco Piccinno, Fangyu Liu, Julian Martin Eisenschlos. 29 May 2024.
Natural Language Processing RELIES on Linguistics. Juri Opitz, Shira Wein, Nathan Schneider. [AI4CE] 09 May 2024.
Can We Catch the Elephant? A Survey of the Evolvement of Hallucination Evaluation on Natural Language Generation. Siya Qi, Yulan He, Zheng Yuan. [LRM, HILM] 18 Apr 2024.
Schroedinger's Threshold: When the AUC doesn't predict Accuracy. Juri Opitz. [UQCV] 04 Apr 2024.
On the Role of Summary Content Units in Text Summarization Evaluation. Marcel Nawrath, Agnieszka Nowak, Tristan Ratz, Danilo C. Walenta, Juri Opitz, ..., Sebastian Gehrmann, Saad Mahamood, Miruna Clinciu, Khyathi Raghavi Chandu, Yufang Hou. [ELM] 02 Apr 2024.
Fine-Grained Natural Language Inference Based Faithfulness Evaluation for Diverse Summarisation Tasks. Huajian Zhang, Yumo Xu, Laura Perez-Beltrachini. [HILM] 27 Feb 2024.
Increasing Faithfulness in Knowledge-Grounded Dialogue with Controllable Features. Hannah Rashkin, David Reitter, Gaurav Singh Tomar, Dipanjan Das. 14 Jul 2021.
Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark. Nouha Dziri, Hannah Rashkin, Tal Linzen, David Reitter. [ALM] 30 Apr 2021.
Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics. Artidoro Pagnoni, Vidhisha Balachandran, Yulia Tsvetkov. [HILM] 27 Apr 2021.
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. [ELM] 20 Apr 2018.
Teaching Machines to Read and Comprehend. Karl Moritz Hermann, Tomáš Kočiský, Edward Grefenstette, L. Espeholt, W. Kay, Mustafa Suleyman, Phil Blunsom. 10 Jun 2015.