arXiv:2305.14251 (v2, latest)
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
23 May 2023
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
Papers citing "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"
50 / 513 papers shown
UNCLE: Uncertainty Expressions in Long-Form Generation
Ruihan Yang
Caiqi Zhang
Zhisong Zhang
Xinting Huang
Dong Yu
Nigel Collier
Deqing Yang
ELM
69
2
0
22 May 2025
EMULATE: A Multi-Agent Framework for Determining the Veracity of Atomic Claims by Emulating Human Actions
Spencer Hong
Meng Luo
Xinyi Wan
65
0
0
22 May 2025
Advancing the Scientific Method with Large Language Models: From Hypothesis to Discovery
Yanbo Zhang
S. Khan
Adnan Mahmud
Huck Yang
Alexander Lavin
...
James A. Evans
Alan R. Bundy
Jannis Brugger
Jesper Tegner
Hector Zenil
LM&MA
100
1
0
22 May 2025
Long-Form Information Alignment Evaluation Beyond Atomic Facts
Danna Zheng
Mirella Lapata
Jeff Z. Pan
HILM
72
0
0
21 May 2025
Listen to the Context: Towards Faithful Large Language Models for Retrieval Augmented Generation on Climate Questions
David Thulke
Jakob Kemmler
Christian Dugast
Hermann Ney
RALM
HILM
31
0
0
21 May 2025
Pre-training Large Memory Language Models with Internal and External Knowledge
Linxi Zhao
Sofian Zalouk
Christian K. Belardi
Justin Lovelace
Jin Peng Zhou
Kilian Q. Weinberger
Yoav Artzi
Jennifer J. Sun
KELM
HILM
105
0
0
21 May 2025
UrduFactCheck: An Agentic Fact-Checking Framework for Urdu with Evidence Boosting and Benchmarking
Sarfraz Ahmad
Hasan Iqbal
Momina Ahsan
Numaan Naeem
Muhammad Ahsan Riaz Khan
Arham Riaz
Muhammad Arslan Manzoor
Yuxia Wang
Preslav Nakov
HILM
ELM
56
0
0
21 May 2025
Hallucinate at the Last in Long Response Generation: A Case Study on Long Document Summarization
Joonho Yang
Seunghyun Yoon
Hwan Chang
Byeongjeong Kim
Hwanhee Lee
HILM
134
0
0
21 May 2025
Pierce the Mists, Greet the Sky: Decipher Knowledge Overshadowing via Knowledge Circuit Analysis
Haoming Huang
Yibo Yan
Jiahao Huo
Xin Zou
Xinfeng Li
Kun Wang
Xuming Hu
96
0
0
20 May 2025
Think Before You Attribute: Improving the Performance of LLMs Attribution Systems
João Eduardo Batista
Emil Vatai
Mohamed Wahib
213
0
0
19 May 2025
Learning Auxiliary Tasks Improves Reference-Free Hallucination Detection in Open-Domain Long-Form Generation
Chengwei Qin
Wenxuan Zhou
Karthik Abinav Sankararaman
Nanshu Wang
Tengyu Xu
...
Aditya Tayade
Sinong Wang
Shafiq Joty
Han Fang
Hao Ma
HILM
LRM
103
0
0
18 May 2025
What are they talking about? Benchmarking Large Language Models for Knowledge-Grounded Discussion Summarization
Weixiao Zhou
Junnan Zhu
Gengyao Li
Xianfu Cheng
Xinnian Liang
Feifei Zhai
Zhiyu Li
ALM
63
0
0
18 May 2025
Search-Based Correction of Reasoning Chains for Language Models
Minsu Kim
Jean-Pierre Falet
Oliver E. Richardson
Xiaoyin Chen
Moksh Jain
Sungjin Ahn
Sungsoo Ahn
Yoshua Bengio
KELM
ReLM
LRM
87
0
0
17 May 2025
THELMA: Task Based Holistic Evaluation of Large Language Model Applications-RAG Question Answering
Udita Patel
Rutu Mulkar
Jay Roberts
Cibi Chakravarthy Senthilkumar
Sujay Gandhi
Xiaofei Zheng
Naumaan Nayyar
Parul Kalra
Rafael Castrillo
39
0
0
16 May 2025
VeriFact: Enhancing Long-Form Factuality Evaluation with Refined Fact Extraction and Reference Facts
Xin Liu
Lechen Zhang
Sheza Munir
Yiyang Gu
Lu Wang
HILM
71
0
0
14 May 2025
Atomic Consistency Preference Optimization for Long-Form Question Answering
Jingfeng Chen
Raghuveer Thirukovalluru
Junlin Wang
Kaiwei Luo
Bhuwan Dhingra
KELM
HILM
76
0
0
14 May 2025
A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs
Artem Shelmanov
Ekaterina Fadeeva
Akim Tsvigun
Ivan Tsvigun
Zhuohan Xie
...
Caiqi Zhang
Artem Vazhentsev
Mrinmaya Sachan
Preslav Nakov
Timothy Baldwin
HILM
102
2
0
13 May 2025
Why Uncertainty Estimation Methods Fall Short in RAG: An Axiomatic Analysis
Heydar Soudani
Evangelos Kanoulas
Faegheh Hasibi
133
0
0
12 May 2025
Integrating Video and Text: A Balanced Approach to Multimodal Summary Generation and Evaluation
Galann Pennec
Zhengyuan Liu
Nicholas Asher
Philippe Muller
Nancy F. Chen
VGen
79
0
0
10 May 2025
Summarisation of German Judgments in conjunction with a Class-based Evaluation
Bianca Steffes
Nils Torben Wiedemann
Alexander Gratz
Pamela Hochreither
Jana Elina Meyer
Katharina Luise Schilke
AILaw
ELM
89
0
0
09 May 2025
Retrieval Augmented Generation Evaluation for Health Documents
Mario Ceresa
Lorenzo Bertolini
Valentin Comte
Nicholas Spadaro
Barbara Raffael
...
Sergio Consoli
Amalia Muñoz Piñeiro
Alex Patak
Maddalena Querci
Tobias Wiesenthal
RALM
3DV
100
0
1
07 May 2025
Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards
Manveer Singh Tamber
F. S. Bao
Chenyu Xu
Ge Luo
Suleman Kazi
Minseok Bae
Miaoran Li
Ofer Mendelevitch
Renyi Qu
Jimmy J. Lin
VLM
72
1
0
07 May 2025
UCSC at SemEval-2025 Task 3: Context, Models and Prompt Optimization for Automated Hallucination Detection in LLM Output
Sicong Huang
Jincheng He
Shiyuan Huang
Karthik Raja Anandan
Arkajyoti Chakraborty
Ian Lane
HILM
LRM
128
1
0
05 May 2025
Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering
Jihao Zhao
Chunlai Zhou
Biao Qin
115
0
0
05 May 2025
A Comprehensive Analysis for Visual Object Hallucination in Large Vision-Language Models
Liqiang Jing
Guiming Hardy Chen
Ehsan Aghazadeh
Xin Eric Wang
Xinya Du
137
0
0
04 May 2025
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding
Zongxia Li
Xiyang Wu
Guangyao Shi
Yubin Qin
Hongyang Du
Tianyi Zhou
Dinesh Manocha
Jordan Lee Boyd-Graber
MLLM
148
0
0
02 May 2025
Chatbot Arena Meets Nuggets: Towards Explanations and Diagnostics in the Evaluation of LLM Responses
Sahel Sharifymoghaddam
Shivani Upadhyay
Nandan Thakur
Ronak Pradeep
Jimmy Lin
RALM
186
1
0
28 Apr 2025
Towards Long Context Hallucination Detection
Siyi Liu
Kishaloy Halder
Zheng Qi
Wei Xiao
Nikolaos Pappas
Phu Mon Htut
Neha Anna John
Yassine Benajiba
Dan Roth
HILM
115
2
0
28 Apr 2025
A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage
Rui Xin
Niloofar Mireshghallah
Shuyue Stella Li
Michael Duan
Hyunwoo Kim
Yejin Choi
Yulia Tsvetkov
Sewoong Oh
Pang Wei Koh
150
7
0
28 Apr 2025
An Empirical Study of Evaluating Long-form Question Answering
Ning Xian
Yixing Fan
Ruqing Zhang
Maarten de Rijke
Jiafeng Guo
ELM
58
0
0
25 Apr 2025
HalluLens: LLM Hallucination Benchmark
Yejin Bang
Ziwei Ji
Alan Schelten
Anthony Hartshorn
Tara Fowler
Cheng Zhang
Nicola Cancedda
Pascale Fung
HILM
132
5
0
24 Apr 2025
Leveraging LLMs as Meta-Judges: A Multi-Agent Framework for Evaluating LLM Judgments
Yuante Li
Jama Hussein Mohamud
Chongren Sun
Di Wu
Benoit Boulet
LLMAG
ELM
125
1
0
23 Apr 2025
Exploring the Role of Large Language Models in Cybersecurity: A Systematic Survey
Shuang Tian
Tao Zhang
Qingbin Liu
Jiacheng Wang
Xuangou Wu
...
Ruichen Zhang
Wentao Zhang
Zhenhui Yuan
Shiwen Mao
Dong In Kim
153
1
0
22 Apr 2025
The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models
Ronak Pradeep
Nandan Thakur
Shivani Upadhyay
Daniel Fernando Campos
Nick Craswell
Jimmy Lin
78
2
0
21 Apr 2025
Retrieval Augmented Generation Evaluation in the Era of Large Language Models: A Comprehensive Survey
Aoran Gan
Hao Yu
Kai Zhang
Qi Liu
Wenyu Yan
Zhenya Huang
Shiwei Tong
Guoping Hu
RALM
3DV
89
1
0
21 Apr 2025
Transparentize the Internal and External Knowledge Utilization in LLMs with Trustworthy Citation
Jiajun Shen
Tong Zhou
Yubo Chen
Delai Qiu
Shengping Liu
Kang Liu
Jun Zhao
HILM
RALM
126
0
0
21 Apr 2025
CoLoTa: A Dataset for Entity-based Commonsense Reasoning over Long-Tail Knowledge
Armin Toroghi
Willis Guo
Scott Sanner
RALM
LRM
70
0
0
20 Apr 2025
Efficient MAP Estimation of LLM Judgment Performance with Prior Transfer
Huaizhi Qu
Inyoung Choi
Zhen Tan
Song Wang
Sukwon Yun
Qi Long
Faizan Siddiqui
Kwonjoon Lee
Tianlong Chen
76
0
0
17 Apr 2025
How to Detect and Defeat Molecular Mirage: A Metric-Driven Benchmark for Hallucination in LLM-based Molecular Comprehension
Hao Li
Liuzhenghao Lv
He Cao
Zijing Liu
Zhiyuan Yan
Yu Wang
Yonghong Tian
Yuezun Li
Li Yuan
144
2
0
10 Apr 2025
TALE: A Tool-Augmented Framework for Reference-Free Evaluation of Large Language Models
Sher Badshah
Ali Emami
Hassan Sajjad
LLMAG
ELM
103
0
0
10 Apr 2025
Plan-and-Refine: Diverse and Comprehensive Retrieval-Augmented Generation
Alireza Salemi
Chris Samarinas
Hamed Zamani
78
0
0
10 Apr 2025
HypoEval: Hypothesis-Guided Evaluation for Natural Language Generation
Mingxuan Li
Hanchen Li
Chenhao Tan
ALM
ELM
130
0
0
09 Apr 2025
Enabling Collaborative Parametric Knowledge Calibration for Retrieval-Augmented Vision Question Answering
Jiaqi Deng
Kaize Shi
Zonghan Wu
Huan Huo
Dingxian Wang
Guandong Xu
44
0
0
05 Apr 2025
Bonsai: Interpretable Tree-Adaptive Grounded Reasoning
Kate Sanders
Benjamin Van Durme
LRM
139
1
0
04 Apr 2025
BOOST: Bootstrapping Strategy-Driven Reasoning Programs for Program-Guided Fact-Checking
Qisheng Hu
Quanyu Long
Wenya Wang
LRM
95
1
0
03 Apr 2025
LRAGE: Legal Retrieval Augmented Generation Evaluation Tool
Minhu Park
Hongseok Oh
Eunkyung Choi
Wonseok Hwang
AILaw
RALM
ELM
172
0
0
02 Apr 2025
WikiVideo: Article Generation from Multiple Videos
Alexander Martin
Reno Kriz
William Walden
Kate Sanders
Hannah Recknor
Eugene Yang
Francis Ferraro
Benjamin Van Durme
DiffM
VGen
154
2
0
01 Apr 2025
LLMs for Explainable AI: A Comprehensive Survey
Ahsan Bilal
David Ebert
Beiyu Lin
188
7
0
31 Mar 2025
A Scalable Framework for Evaluating Health Language Models
Neil Mallinar
A. Heydari
Xin Liu
Anthony Z. Faranesh
Brent Winslow
...
Mark Malhotra
Shwetak N. Patel
Javier L. Prieto
Daniel J. McDuff
Ahmed A. Metwally
LM&MA
100
2
0
30 Mar 2025
FindTheFlaws: Annotated Errors for Detecting Flawed Reasoning and Scalable Oversight Research
Gabriel Recchia
Chatrik Singh Mangat
Issac Li
Gayatri Krishnakumar
ALM
177
0
0
29 Mar 2025