2305.11171
TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models
18 May 2023
Zorik Gekhman
Jonathan Herzig
Roee Aharoni
Chen Elkind
Idan Szpektor
HILM
ELM
Papers citing "TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models"
50 / 56 papers shown
Title
ARGUS: Hallucination and Omission Evaluation in Video-LLMs
Ruchit Rawal
Reza Shirkavand
Heng-Chiao Huang
Gowthami Somepalli
Tom Goldstein
30
0
0
09 Jun 2025
After Retrieval, Before Generation: Enhancing the Trustworthiness of Large Language Models in RAG
Xinbang Dai
Huikang Hu
Yuncheng Hua
Jiaqi Li
Yongrui Chen
Rihui Jin
Nan Hu
Guilin Qi
RALM
3DV
75
0
0
21 May 2025
AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges
Ranjan Sapkota
Konstantinos I. Roumeliotis
Manoj Karkee
AI4TS
141
6
0
15 May 2025
Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards
Manveer Singh Tamber
F. S. Bao
Chenyu Xu
Ge Luo
Suleman Kazi
Minseok Bae
Miaoran Li
Ofer Mendelevitch
Renyi Qu
Jimmy J. Lin
VLM
68
1
0
07 May 2025
ORION Grounded in Context: Retrieval-Based Method for Hallucination Detection
Assaf Gerner
Netta Madvil
Nadav Barak
Alex Zaikman
Jonatan Liberman
...
Yaron Friedman
Neal Harow
Noam Bresler
Shir Chorev
Philip Tannor
HILM
68
0
0
22 Apr 2025
Can the capability of Large Language Models be described by human ability? A Meta Study
Mingrui Zan
Yunquan Zhang
Boyang Zhang
Fangming Liu
Daning Cheng
ELM
LM&MA
84
1
0
13 Apr 2025
Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base
Linxin Song
Xuwei Ding
Jieyu Zhang
Taiwei Shi
Ryotaro Shimizu
Rahul Gupta
Yang Liu
Jian Kang
Jieyu Zhao
KELM
111
1
0
30 Mar 2025
The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input
Alon Jacovi
Andrew Wang
Chris Alberti
Connie Tao
Jon Lipovetz
...
Rachana Fellinger
Rui Wang
Zizhao Zhang
Sasha Goldshtein
Dipanjan Das
HILM
ALM
193
17
0
06 Jan 2025
LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts
Helia Hashemi
J. Eisner
Corby Rosset
Benjamin Van Durme
Chris Kedzie
143
6
0
03 Jan 2025
I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token
Roi Cohen
Konstantin Dobler
Eden Biran
Gerard de Melo
189
9
0
09 Dec 2024
Distinguishing Ignorance from Error in LLM Hallucinations
Adi Simhi
Jonathan Herzig
Idan Szpektor
Yonatan Belinkov
HILM
90
4
0
29 Oct 2024
Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance
Omer Nahum
Nitay Calderon
Orgad Keller
Idan Szpektor
Roi Reichart
59
4
0
24 Oct 2024
Scalable Influence and Fact Tracing for Large Language Model Pretraining
Tyler A. Chang
Dheeraj Rajagopal
Tolga Bolukbasi
Lucas Dixon
Ian Tenney
TDI
94
5
0
22 Oct 2024
FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs
F. S. Bao
Miaoran Li
Renyi Qu
Ge Luo
Erana Wan
...
Ruixuan Tu
Chenyu Xu
Matthew Gonzales
Ofer Mendelevitch
Amin Ahmad
VLM
HILM
89
7
0
17 Oct 2024
On A Scale From 1 to 5: Quantifying Hallucination in Faithfulness Evaluation
Xiaonan Jing
Srinivas Billa
Danny Godbout
HILM
125
0
0
16 Oct 2024
NL-Eye: Abductive NLI for Images
Mor Ventura
Michael Toker
Nitay Calderon
Zorik Gekhman
Yonatan Bitton
Roi Reichart
79
1
0
03 Oct 2024
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Hadas Orgad
Michael Toker
Zorik Gekhman
Roi Reichart
Idan Szpektor
Hadas Kotek
Yonatan Belinkov
HILM
AIFin
123
45
0
03 Oct 2024
Analysis of Plan-based Retrieval for Grounded Text Generation
Ameya Godbole
Nicholas Monath
Seungyeon Kim
A. S. Rawat
Andrew McCallum
Manzil Zaheer
RALM
99
3
0
20 Aug 2024
GraphEval: A Knowledge-Graph Based LLM Hallucination Evaluation Framework
Hannah Sansford
Nicholas Richardson
Hermina Petric Maretic
Juba Nait Saada
87
17
0
15 Jul 2024
CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses
Jing Yao
Xiaoyuan Yi
Xing Xie
ELM
ALM
92
11
0
15 Jul 2024
Boosting Zero-Shot Crosslingual Performance using LLM-Based Augmentations with Effective Data Selection
Barah Fazili
Ashish Agrawal
Preethi Jyothi
74
1
0
15 Jul 2024
FineSurE: Fine-grained Summarization Evaluation using LLMs
Hwanjun Song
Hang Su
Igor Shalyminov
Jason (Jinglun) Cai
Saab Mansour
HILM
85
36
0
01 Jul 2024
IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons
Dan Shi
Renren Jin
Tianhao Shen
Weilong Dong
Xinwei Wu
Deyi Xiong
103
11
0
26 Jun 2024
Learning to Generate Answers with Citations via Factual Consistency Models
Rami Aly
Zhiqiang Tang
Samson Tan
George Karypis
HILM
77
5
0
19 Jun 2024
Progressive Inference: Explaining Decoder-Only Sequence Classification Models Using Intermediate Predictions
Sanjay Kariyappa
Freddy Lecue
Saumitra Mishra
Christopher Pond
Daniele Magazzeni
Manuela Veloso
73
2
0
03 Jun 2024
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
Zorik Gekhman
G. Yona
Roee Aharoni
Matan Eyal
Amir Feder
Roi Reichart
Jonathan Herzig
140
137
0
09 May 2024
Efficient Data Generation for Source-grounded Information-seeking Dialogs: A Use Case for Meeting Transcripts
Lotem Golany
Filippo Galgani
Maya Mamo
Nimrod Parasol
Omer Vandsburger
Nadav Bar
Ido Dagan
79
2
0
02 May 2024
FIZZ: Factual Inconsistency Detection by Zoom-in Summary and Zoom-out Document
Joonho Yang
Seunghyun Yoon
Byeongjeong Kim
Hwanhee Lee
HILM
112
7
0
17 Apr 2024
Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs
Adi Simhi
Jonathan Herzig
Idan Szpektor
Yonatan Belinkov
HILM
115
13
0
15 Apr 2024
SmurfCat at SemEval-2024 Task 6: Leveraging Synthetic Data for Hallucination Detection
Elisei Rykov
Yana Shishkina
Kseniia Petrushina
Kseniia Titova
Sergey Petrakov
Alexander Panchenko
72
2
0
09 Apr 2024
Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers
Libo Qin
Qiguang Chen
Yuhang Zhou
Zhi Chen
Hai-Tao Zheng
Lizi Liao
Min Li
Wanxiang Che
Philip S. Yu
LRM
162
38
0
07 Apr 2024
Attribute First, then Generate: Locally-attributable Grounded Text Generation
Aviv Slobodkin
Eran Hirsch
Arie Cattan
Tal Schuster
Ido Dagan
118
27
0
25 Mar 2024
Multi-Review Fusion-in-Context
Aviv Slobodkin
Ori Shapira
Ran Levy
Ido Dagan
349
1
0
22 Mar 2024
TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale
Pengcheng Jiang
Cao Xiao
Zifeng Wang
Parminder Bhatia
Jimeng Sun
Jiawei Han
LRM
88
13
0
15 Mar 2024
Knowledge Conflicts for LLMs: A Survey
Rongwu Xu
Zehan Qi
Zhijiang Guo
Cunxiang Wang
Hongru Wang
Yue Zhang
Wei Xu
299
122
0
13 Mar 2024
A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods
Hanlei Jin
Yang Zhang
Dan Meng
Jun Wang
Jinghua Tan
249
96
0
05 Mar 2024
FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction
Alessandro Sciré
Karim Ghonim
Roberto Navigli
HILM
56
11
0
04 Mar 2024
Large Language Models and Games: A Survey and Roadmap
Roberto Gallotta
Graham Todd
Marvin Zammit
Sam Earle
Antonios Liapis
Julian Togelius
Georgios N. Yannakakis
LLMAG
LM&MA
AI4CE
LRM
131
86
0
28 Feb 2024
HD-Eval: Aligning Large Language Model Evaluators Through Hierarchical Criteria Decomposition
Yuxuan Liu
Tianchi Yang
Shaohan Huang
Zihan Zhang
Haizhen Huang
Furu Wei
Weiwei Deng
Feng Sun
Qi Zhang
73
16
0
24 Feb 2024
Identifying Factual Inconsistencies in Summaries: Grounding Model Inference via Task Taxonomy
Liyan Xu
Zhenlin Su
Mo Yu
Jin Xu
Jinho D. Choi
Jie Zhou
Fei Liu
HILM
93
2
0
20 Feb 2024
A synthetic data approach for domain generalization of NLI models
Mohammad Javad Hosseini
Andrey Petrov
Alex Fabrikant
Annie Louis
SyDa
84
10
0
19 Feb 2024
Leveraging Large Language Models for NLG Evaluation: Advances and Challenges
Zhen Li
Xiaohan Xu
Tao Shen
Can Xu
Jia-Chen Gu
Yuxuan Lai
Chongyang Tao
Shuai Ma
LM&MA
ELM
132
15
0
13 Jan 2024
DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines
Arnav Singhvi
Manish Shetty
Shangyin Tan
Christopher Potts
Koushik Sen
Matei A. Zaharia
Omar Khattab
80
21
0
20 Dec 2023
GeomVerse: A Systematic Evaluation of Large Models for Geometric Reasoning
Mehran Kazemi
Hamidreza Alvari
Ankit Anand
Jialin Wu
Xi Chen
Radu Soricut
LRM
ReLM
71
70
0
19 Dec 2023
Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus
Tianhang Zhang
Lin Qiu
Qipeng Guo
Cheng Deng
Yue Zhang
Zheng Zhang
Cheng Zhou
Xinbing Wang
Luoyi Fu
HILM
138
59
0
22 Nov 2023
AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation
Haoyi Qiu
Kung-Hsiang Huang
Jingnong Qu
Nanyun Peng
HILM
72
6
0
16 Nov 2023
ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems
Jon Saad-Falcon
Omar Khattab
Christopher Potts
Matei A. Zaharia
RALM
108
120
0
16 Nov 2023
SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency
Jiaxin Zhang
Zhuohang Li
Kamalika Das
Bradley Malin
Kumar Sricharan
HILM
LRM
69
63
0
03 Nov 2023
Benchmarking Cognitive Biases in Large Language Models as Evaluators
Ryan Koo
Minhwa Lee
Vipul Raheja
Jong Inn Park
Zae Myung Kim
Dongyeop Kang
ALM
114
87
0
29 Sep 2023
Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI
Mahyar Abbasian
Elahe Khatibi
Iman Azimi
David Oniani
Zahra Shakeri Hossein Abad
...
Bryant Lin
Olivier Gevaert
Li-Jia Li
Ramesh C. Jain
Amir M. Rahmani
LM&MA
ELM
AI4MH
139
76
0
21 Sep 2023