Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.16819
Cited By
With a Little Push, NLI Models can Robustly and Efficiently Predict Faithfulness
26 May 2023
Julius Steen
Juri Opitz
Anette Frank
K. Markert
HILM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"With a Little Push, NLI Models can Robustly and Efficiently Predict Faithfulness"
13 / 13 papers shown
Title
Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance
Omer Nahum
Nitay Calderon
Orgad Keller
Idan Szpektor
Roi Reichart
30
2
0
24 Oct 2024
Learning to Generate Answers with Citations via Factual Consistency Models
Rami Aly
Zhiqiang Tang
Samson Tan
George Karypis
HILM
42
5
0
19 Jun 2024
Faithful Chart Summarization with ChaTS-Pi
Syrine Krichene
Francesco Piccinno
Fangyu Liu
Julian Martin Eisenschlos
42
1
0
29 May 2024
Natural Language Processing RELIES on Linguistics
Juri Opitz
Shira Wein
Nathan Schneider
AI4CE
60
7
0
09 May 2024
Can We Catch the Elephant? A Survey of the Evolvement of Hallucination Evaluation on Natural Language Generation
Siya Qi
Yulan He
Zheng Yuan
LRM
HILM
51
1
0
18 Apr 2024
Schroedinger's Threshold: When the AUC doesn't predict Accuracy
Juri Opitz
UQCV
41
0
0
04 Apr 2024
On the Role of Summary Content Units in Text Summarization Evaluation
Marcel Nawrath
Agnieszka Nowak
Tristan Ratz
Danilo C. Walenta
Juri Opitz
...
Sebastian Gehrmann
Saad Mahamood
Miruna Clinciu
Khyathi Raghavi Chandu
Yufang Hou
ELM
29
5
0
02 Apr 2024
Fine-Grained Natural Language Inference Based Faithfulness Evaluation for Diverse Summarisation Tasks
Huajian Zhang
Yumo Xu
Laura Perez-Beltrachini
HILM
36
10
0
27 Feb 2024
Increasing Faithfulness in Knowledge-Grounded Dialogue with Controllable Features
Hannah Rashkin
David Reitter
Gaurav Singh Tomar
Dipanjan Das
172
101
0
14 Jul 2021
Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark
Nouha Dziri
Hannah Rashkin
Tal Linzen
David Reitter
ALM
206
79
0
30 Apr 2021
Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics
Artidoro Pagnoni
Vidhisha Balachandran
Yulia Tsvetkov
HILM
233
306
0
27 Apr 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
304
6,996
0
20 Apr 2018
Teaching Machines to Read and Comprehend
Karl Moritz Hermann
Tomás Kociský
Edward Grefenstette
L. Espeholt
W. Kay
Mustafa Suleyman
Phil Blunsom
211
3,515
0
10 Jun 2015
1