arXiv:2305.14251
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
23 May 2023
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
Papers citing
"FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"
50 / 455 papers shown
Atomic Consistency Preference Optimization for Long-Form Question Answering
Jingfeng Chen
Raghuveer Thirukovalluru
Junlin Wang
Kaiwei Luo
Bhuwan Dhingra
KELM
HILM
20
0
0
14 May 2025
VeriFact: Enhancing Long-Form Factuality Evaluation with Refined Fact Extraction and Reference Facts
Xin Liu
Lechen Zhang
Sheza Munir
Yiyang Gu
Lu Wang
HILM
36
0
0
14 May 2025
A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs
Artem Shelmanov
Ekaterina Fadeeva
Akim Tsvigun
Ivan Tsvigun
Zhuohan Xie
...
Caiqi Zhang
Artem Vazhentsev
Mrinmaya Sachan
Preslav Nakov
Timothy Baldwin
HILM
50
0
0
13 May 2025
Why Uncertainty Estimation Methods Fall Short in RAG: An Axiomatic Analysis
Heydar Soudani
Evangelos Kanoulas
Faegheh Hasibi
36
0
0
12 May 2025
Integrating Video and Text: A Balanced Approach to Multimodal Summary Generation and Evaluation
Galann Pennec
Zhengyuan Liu
Nicholas Asher
Philippe Muller
Nancy F. Chen
VGen
31
0
0
10 May 2025
Summarisation of German Judgments in conjunction with a Class-based Evaluation
Bianca Steffes
Nils Torben Wiedemann
Alexander Gratz
Pamela Hochreither
Jana Elina Meyer
Katharina Luise Schilke
AILaw
ELM
58
0
0
09 May 2025
Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards
Manveer Singh Tamber
F. S. Bao
Chenyu Xu
Ge Luo
Suleman Kazi
Minseok Bae
Miaoran Li
Ofer Mendelevitch
Renyi Qu
Jimmy J. Lin
VLM
33
0
0
07 May 2025
Retrieval Augmented Generation Evaluation for Health Documents
Mario Ceresa
Lorenzo Bertolini
Valentin Comte
Nicholas Spadaro
Barbara Raffael
...
Sergio Consoli
Amalia Muñoz Piñeiro
Alex Patak
Maddalena Querci
Tobias Wiesenthal
RALM
3DV
39
0
1
07 May 2025
UCSC at SemEval-2025 Task 3: Context, Models and Prompt Optimization for Automated Hallucination Detection in LLM Output
Sicong Huang
Jincheng He
Shiyuan Huang
Karthik Raja Anandan
Arkajyoti Chakraborty
Ian Lane
HILM
LRM
41
0
0
05 May 2025
Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering
Jihao Zhao
Chunlai Zhou
Biao Qin
55
0
0
05 May 2025
A Comprehensive Analysis for Visual Object Hallucination in Large Vision-Language Models
Liqiang Jing
Guiming Hardy Chen
Ehsan Aghazadeh
Xin Eric Wang
Xinya Du
55
0
0
04 May 2025
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding
Zongxia Li
Xiyang Wu
Guangyao Shi
Yubin Qin
Hongyang Du
Tianyi Zhou
Dinesh Manocha
Jordan Lee Boyd-Graber
MLLM
57
0
0
02 May 2025
Chatbot Arena Meets Nuggets: Towards Explanations and Diagnostics in the Evaluation of LLM Responses
Sahel Sharifymoghaddam
Shivani Upadhyay
Nandan Thakur
Ronak Pradeep
Jimmy Lin
RALM
27
0
0
28 Apr 2025
Towards Long Context Hallucination Detection
Siyi Liu
Kishaloy Halder
Zheng Qi
Wei Xiao
Nikolaos Pappas
Phu Mon Htut
Neha Anna John
Yassine Benajiba
Dan Roth
HILM
75
0
0
28 Apr 2025
A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage
Rui Xin
Niloofar Mireshghallah
Shuyue Stella Li
Michael Duan
Hyunwoo Kim
Yejin Choi
Yulia Tsvetkov
Sewoong Oh
Pang Wei Koh
76
2
0
28 Apr 2025
An Empirical Study of Evaluating Long-form Question Answering
Ning Xian
Yixing Fan
Ruqing Zhang
Maarten de Rijke
Jiafeng Guo
ELM
32
0
0
25 Apr 2025
HalluLens: LLM Hallucination Benchmark
Yejin Bang
Ziwei Ji
Alan Schelten
Anthony Hartshorn
Tara Fowler
Cheng Zhang
Nicola Cancedda
Pascale Fung
HILM
92
1
0
24 Apr 2025
Leveraging LLMs as Meta-Judges: A Multi-Agent Framework for Evaluating LLM Judgments
Y. Li
Jama Hussein Mohamud
Chongren Sun
Di Wu
Benoit Boulet
LLMAG
ELM
72
0
0
23 Apr 2025
Exploring the Role of Large Language Models in Cybersecurity: A Systematic Survey
Shuang Tian
Tao Zhang
Jiaheng Liu
Jiacheng Wang
Xuangou Wu
...
Ruichen Zhang
Feiyu Xiong
Zhenhui Yuan
Shiwen Mao
Dong In Kim
60
0
0
22 Apr 2025
The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models
Ronak Pradeep
Nandan Thakur
Shivani Upadhyay
Daniel Fernando Campos
Nick Craswell
Jimmy Lin
31
0
0
21 Apr 2025
Retrieval Augmented Generation Evaluation in the Era of Large Language Models: A Comprehensive Survey
Aoran Gan
Hao Yu
Kai Zhang
Qi Liu
Wenyu Yan
Zhenya Huang
Shiwei Tong
Guoping Hu
RALM
3DV
43
0
0
21 Apr 2025
Transparentize the Internal and External Knowledge Utilization in LLMs with Trustworthy Citation
Jiajun Shen
Tong Zhou
Yubo Chen
Delai Qiu
Shengping Liu
Kang Liu
Jun Zhao
HILM
RALM
86
0
0
21 Apr 2025
CoLoTa: A Dataset for Entity-based Commonsense Reasoning over Long-Tail Knowledge
Armin Toroghi
Willis Guo
Scott Sanner
RALM
LRM
31
0
0
20 Apr 2025
Efficient MAP Estimation of LLM Judgment Performance with Prior Transfer
Huaizhi Qu
Inyoung Choi
Zhen Tan
Song Wang
Sukwon Yun
Qi Long
Faizan Siddiqui
Kwonjoon Lee
Tianlong Chen
43
0
0
17 Apr 2025
Hallucination Detection in LLMs via Topological Divergence on Attention Graphs
Alexandra Bazarova
Aleksandr Yugay
Andrey Shulga
A. Ermilova
Andrei Volodichev
...
Dmitry Simakov
M. Savchenko
Andrey Savchenko
Serguei Barannikov
Alexey Zaytsev
HILM
33
0
0
14 Apr 2025
How to Detect and Defeat Molecular Mirage: A Metric-Driven Benchmark for Hallucination in LLM-based Molecular Comprehension
Hao Li
Liuzhenghao Lv
He Cao
Zijing Liu
Zhiyuan Yan
Yu Wang
Yonghong Tian
Yuan Li
Li Yuan
32
0
0
10 Apr 2025
TALE: A Tool-Augmented Framework for Reference-Free Evaluation of Large Language Models
Sher Badshah
Ali Emami
Hassan Sajjad
LLMAG
ELM
45
0
0
10 Apr 2025
Plan-and-Refine: Diverse and Comprehensive Retrieval-Augmented Generation
Alireza Salemi
Chris Samarinas
Hamed Zamani
36
0
0
10 Apr 2025
HypoEval: Hypothesis-Guided Evaluation for Natural Language Generation
Mingxuan Li
Hanchen Li
Chenhao Tan
ALM
ELM
49
0
0
09 Apr 2025
UniRVQA: A Unified Framework for Retrieval-Augmented Vision Question Answering via Self-Reflective Joint Training
Jiaqi Deng
Kaize Shi
Zonghan Wu
Huan Huo
Dingxian Wang
Guandong Xu
26
0
0
05 Apr 2025
Bonsai: Interpretable Tree-Adaptive Grounded Reasoning
Kate Sanders
Benjamin Van Durme
LRM
38
1
0
04 Apr 2025
BOOST: Bootstrapping Strategy-Driven Reasoning Programs for Program-Guided Fact-Checking
Qisheng Hu
Quanyu Long
Wenya Wang
LRM
53
0
0
03 Apr 2025
LRAGE: Legal Retrieval Augmented Generation Evaluation Tool
Minhu Park
Hongseok Oh
Eunkyung Choi
Wonseok Hwang
AILaw
RALM
ELM
115
0
0
02 Apr 2025
WikiVideo: Article Generation from Multiple Videos
Alexander Martin
Reno Kriz
William Walden
Kate Sanders
Hannah Recknor
Eugene Yang
Francis Ferraro
Benjamin Van Durme
DiffM
VGen
64
1
0
01 Apr 2025
LLMs for Explainable AI: A Comprehensive Survey
Ahsan Bilal
David Ebert
Beiyu Lin
72
1
0
31 Mar 2025
A Scalable Framework for Evaluating Health Language Models
Neil Mallinar
A. Heydari
Xin Liu
Anthony Z. Faranesh
Brent Winslow
...
Mark Malhotra
Shwetak N. Patel
Javier L. Prieto
Daniel J. McDuff
Ahmed A. Metwally
LM&MA
64
2
0
30 Mar 2025
FindTheFlaws: Annotated Errors for Detecting Flawed Reasoning and Scalable Oversight Research
Gabriel Recchia
Chatrik Singh Mangat
Issac Li
Gayatri Krishnakumar
ALM
85
0
0
29 Mar 2025
Fact-checking AI-generated news reports: Can LLMs catch their own lies?
Jiayi Yao
Haibo Sun
Nianwen Xue
HILM
57
0
0
24 Mar 2025
Safeguarding Mobile GUI Agent via Logic-based Action Verification
Jungjae Lee
Dongjae Lee
Chihun Choi
Youngmin Im
Jaeyoung Wi
Kihong Heo
Sangeun Oh
Sunjae Lee
Insik Shin
LLMAG
80
0
0
24 Mar 2025
SciClaims: An End-to-End Generative System for Biomedical Claim Analysis
Raúl Ortega
José Manuel Gómez-Pérez
68
1
0
24 Mar 2025
FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs
Albert Sawczyn
Jakub Binkowski
Denis Janiak
Bogdan Gabrys
Tomasz Kajdanowicz
HILM
LRM
58
0
0
21 Mar 2025
ProDehaze: Prompting Diffusion Models Toward Faithful Image Dehazing
Tianwen Zhou
Jing Wang
Songtao Wu
Kuanhong Xu
DiffM
51
0
0
21 Mar 2025
Extract, Match, and Score: An Evaluation Paradigm for Long Question-context-answer Triplets in Financial Analysis
Bo Hu
Han Yuan
Vlad Pandelea
Wuqiong Luo
Yingzhu Zhao
Zheng Ma
55
0
0
20 Mar 2025
Uncertainty Quantification and Confidence Calibration in Large Language Models: A Survey
Xiaoou Liu
Tiejin Chen
Longchao Da
Chacha Chen
Zhen Lin
Hua Wei
HILM
72
4
0
20 Mar 2025
Can one size fit all?: Measuring Failure in Multi-Document Summarization Domain Transfer
Alexandra DeLucia
Mark Dredze
47
0
0
20 Mar 2025
FACTS&EVIDENCE: An Interactive Tool for Transparent Fine-Grained Factual Verification of Machine-Generated Text
Varich Boonsanong
Vidhisha Balachandran
Xiaochuang Han
Shangbin Feng
Lucy Lu Wang
Yulia Tsvetkov
60
1
0
19 Mar 2025
Optimizing Decomposition for Optimal Claim Verification
Yining Lu
Noah Ziems
Hy Dang
Meng Jiang
61
0
0
19 Mar 2025
GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation
Tao Feng
Yihang Sun
Jiaxuan You
55
0
0
16 Mar 2025
AIstorian lets AI be a historian: A KG-powered multi-agent system for accurate biography generation
Fengyu Li
Yilin Li
Junhao Zhu
Lu Chen
Yanfei Zhang
Jia Zhou
Hui Zu
Jingwen Zhao
Yunjun Gao
LLMAG
54
0
0
14 Mar 2025
Evaluating open-source Large Language Models for automated fact-checking
Nicoló Fontana
Francesco Corso
Enrico Zuccolotto
Francesco Pierri
HILM
62
0
0
07 Mar 2025