TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models

Zorik Gekhman, Jonathan Herzig, Roee Aharoni, Chen Elkind, Idan Szpektor. 18 May 2023.

Papers citing "TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models"

50 / 56 papers shown

ARGUS: Hallucination and Omission Evaluation in Video-LLMs
Ruchit Rawal, Reza Shirkavand, Heng-Chiao Huang, Gowthami Somepalli, Tom Goldstein. 09 Jun 2025.

After Retrieval, Before Generation: Enhancing the Trustworthiness of Large Language Models in RAG
Xinbang Dai, Huikang Hu, Yuncheng Hua, Jiaqi Li, Yongrui Chen, Rihui Jin, Nan Hu, Guilin Qi. 21 May 2025.

AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges
Ranjan Sapkota, Konstantinos I. Roumeliotis, Manoj Karkee. 15 May 2025.

Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards
Manveer Singh Tamber, F. S. Bao, Chenyu Xu, Ge Luo, Suleman Kazi, Minseok Bae, Miaoran Li, Ofer Mendelevitch, Renyi Qu, Jimmy J. Lin. 07 May 2025.

ORION Grounded in Context: Retrieval-Based Method for Hallucination Detection
Assaf Gerner, Netta Madvil, Nadav Barak, Alex Zaikman, Jonatan Liberman, ..., Yaron Friedman, Neal Harow, Noam Bresler, Shir Chorev, Philip Tannor. 22 Apr 2025.

Can the capability of Large Language Models be described by human ability? A Meta Study
Mingrui Zan, Yunquan Zhang, Boyang Zhang, Fangming Liu, Daning Cheng. 13 Apr 2025.

Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base
Linxin Song, Xuwei Ding, Jieyu Zhang, Taiwei Shi, Ryotaro Shimizu, Rahul Gupta, Yang Liu, Jian Kang, Jieyu Zhao. 30 Mar 2025.

The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input
Alon Jacovi, Andrew Wang, Chris Alberti, Connie Tao, Jon Lipovetz, ..., Rachana Fellinger, Rui Wang, Zizhao Zhang, Sasha Goldshtein, Dipanjan Das. 06 Jan 2025.

LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts
Helia Hashemi, J. Eisner, Corby Rosset, Benjamin Van Durme, Chris Kedzie. 03 Jan 2025.

I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token
Roi Cohen, Konstantin Dobler, Eden Biran, Gerard de Melo. 09 Dec 2024.

Distinguishing Ignorance from Error in LLM Hallucinations
Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov. 29 Oct 2024.

Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance
Omer Nahum, Nitay Calderon, Orgad Keller, Idan Szpektor, Roi Reichart. 24 Oct 2024.

Scalable Influence and Fact Tracing for Large Language Model Pretraining
Tyler A. Chang, Dheeraj Rajagopal, Tolga Bolukbasi, Lucas Dixon, Ian Tenney. 22 Oct 2024.

FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs
F. S. Bao, Miaoran Li, Renyi Qu, Ge Luo, Erana Wan, ..., Ruixuan Tu, Chenyu Xu, Matthew Gonzales, Ofer Mendelevitch, Amin Ahmad. 17 Oct 2024.

On A Scale From 1 to 5: Quantifying Hallucination in Faithfulness Evaluation
Xiaonan Jing, Srinivas Billa, Danny Godbout. 16 Oct 2024.

NL-Eye: Abductive NLI for Images
Mor Ventura, Michael Toker, Nitay Calderon, Zorik Gekhman, Yonatan Bitton, Roi Reichart. 03 Oct 2024.

LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Hadas Orgad, Michael Toker, Zorik Gekhman, Roi Reichart, Idan Szpektor, Hadas Kotek, Yonatan Belinkov. 03 Oct 2024.

Analysis of Plan-based Retrieval for Grounded Text Generation
Ameya Godbole, Nicholas Monath, Seungyeon Kim, A. S. Rawat, Andrew McCallum, Manzil Zaheer. 20 Aug 2024.

GraphEval: A Knowledge-Graph Based LLM Hallucination Evaluation Framework
Hannah Sansford, Nicholas Richardson, Hermina Petric Maretic, Juba Nait Saada. 15 Jul 2024.

CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses
Jing Yao, Xiaoyuan Yi, Xing Xie. 15 Jul 2024.

Boosting Zero-Shot Crosslingual Performance using LLM-Based Augmentations with Effective Data Selection
Barah Fazili, Ashish Agrawal, Preethi Jyothi. 15 Jul 2024.

FineSurE: Fine-grained Summarization Evaluation using LLMs
Hwanjun Song, Hang Su, Igor Shalyminov, Jason (Jinglun) Cai, Saab Mansour. 01 Jul 2024.

IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons
Dan Shi, Renren Jin, Tianhao Shen, Weilong Dong, Xinwei Wu, Deyi Xiong. 26 Jun 2024.

Learning to Generate Answers with Citations via Factual Consistency Models
Rami Aly, Zhiqiang Tang, Samson Tan, George Karypis. 19 Jun 2024.

Progressive Inference: Explaining Decoder-Only Sequence Classification Models Using Intermediate Predictions
Sanjay Kariyappa, Freddy Lecue, Saumitra Mishra, Christopher Pond, Daniele Magazzeni, Manuela Veloso. 03 Jun 2024.

Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
Zorik Gekhman, G. Yona, Roee Aharoni, Matan Eyal, Amir Feder, Roi Reichart, Jonathan Herzig. 09 May 2024.

Efficient Data Generation for Source-grounded Information-seeking Dialogs: A Use Case for Meeting Transcripts
Lotem Golany, Filippo Galgani, Maya Mamo, Nimrod Parasol, Omer Vandsburger, Nadav Bar, Ido Dagan. 02 May 2024.

FIZZ: Factual Inconsistency Detection by Zoom-in Summary and Zoom-out Document
Joonho Yang, Seunghyun Yoon, Byeongjeong Kim, Hwanhee Lee. 17 Apr 2024.

Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs
Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov. 15 Apr 2024.

SmurfCat at SemEval-2024 Task 6: Leveraging Synthetic Data for Hallucination Detection
Elisei Rykov, Yana Shishkina, Kseniia Petrushina, Kseniia Titova, Sergey Petrakov, Alexander Panchenko. 09 Apr 2024.

Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers
Libo Qin, Qiguang Chen, Yuhang Zhou, Zhi Chen, Hai-Tao Zheng, Lizi Liao, Min Li, Wanxiang Che, Philip S. Yu. 07 Apr 2024.

Attribute First, then Generate: Locally-attributable Grounded Text Generation
Aviv Slobodkin, Eran Hirsch, Arie Cattan, Tal Schuster, Ido Dagan. 25 Mar 2024.

Multi-Review Fusion-in-Context
Aviv Slobodkin, Ori Shapira, Ran Levy, Ido Dagan. 22 Mar 2024.

TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale
Pengcheng Jiang, Cao Xiao, Zifeng Wang, Parminder Bhatia, Jimeng Sun, Jiawei Han. 15 Mar 2024.

Knowledge Conflicts for LLMs: A Survey
Rongwu Xu, Zehan Qi, Zhijiang Guo, Cunxiang Wang, Hongru Wang, Yue Zhang, Wei Xu. 13 Mar 2024.

A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods
Hanlei Jin, Yang Zhang, Dan Meng, Jun Wang, Jinghua Tan. 05 Mar 2024.

FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction
Alessandro Sciré, Karim Ghonim, Roberto Navigli. 04 Mar 2024.

Large Language Models and Games: A Survey and Roadmap
Roberto Gallotta, Graham Todd, Marvin Zammit, Sam Earle, Antonios Liapis, Julian Togelius, Georgios N. Yannakakis. 28 Feb 2024.

HD-Eval: Aligning Large Language Model Evaluators Through Hierarchical Criteria Decomposition
Yuxuan Liu, Tianchi Yang, Shaohan Huang, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang. 24 Feb 2024.

Identifying Factual Inconsistencies in Summaries: Grounding Model Inference via Task Taxonomy
Liyan Xu, Zhenlin Su, Mo Yu, Jin Xu, Jinho D. Choi, Jie Zhou, Fei Liu. 20 Feb 2024.

A synthetic data approach for domain generalization of NLI models
Mohammad Javad Hosseini, Andrey Petrov, Alex Fabrikant, Annie Louis. 19 Feb 2024.

Leveraging Large Language Models for NLG Evaluation: Advances and Challenges
Zhen Li, Xiaohan Xu, Tao Shen, Can Xu, Jia-Chen Gu, Yuxuan Lai, Chongyang Tao, Shuai Ma. 13 Jan 2024.

DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines
Arnav Singhvi, Manish Shetty, Shangyin Tan, Christopher Potts, Koushik Sen, Matei A. Zaharia, Omar Khattab. 20 Dec 2023.

GeomVerse: A Systematic Evaluation of Large Models for Geometric Reasoning
Mehran Kazemi, Hamidreza Alvari, Ankit Anand, Jialin Wu, Xi Chen, Radu Soricut. 19 Dec 2023.

Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus
Tianhang Zhang, Lin Qiu, Qipeng Guo, Cheng Deng, Yue Zhang, Zheng Zhang, Cheng Zhou, Xinbing Wang, Luoyi Fu. 22 Nov 2023.

AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation
Haoyi Qiu, Kung-Hsiang Huang, Jingnong Qu, Nanyun Peng. 16 Nov 2023.

ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems
Jon Saad-Falcon, Omar Khattab, Christopher Potts, Matei A. Zaharia. 16 Nov 2023.

SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency
Jiaxin Zhang, Zhuohang Li, Kamalika Das, Bradley Malin, Kumar Sricharan. 03 Nov 2023.

Benchmarking Cognitive Biases in Large Language Models as Evaluators
Ryan Koo, Minhwa Lee, Vipul Raheja, Jong Inn Park, Zae Myung Kim, Dongyeop Kang. 29 Sep 2023.

Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI
Mahyar Abbasian, Elahe Khatibi, Iman Azimi, David Oniani, Zahra Shakeri Hossein Abad, ..., Bryant Lin, Olivier Gevaert, Li-Jia Li, Ramesh C. Jain, Amir M. Rahmani. 21 Sep 2023.