2305.14251
Cited By
v1
v2 (latest)
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
23 May 2023
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
Papers citing
"FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"
50 / 513 papers shown
Title
SciClaims: An End-to-End Generative System for Biomedical Claim Analysis
Raúl Ortega
José Manuel Gómez-Pérez
141
1
0
24 Mar 2025
Safeguarding Mobile GUI Agent via Logic-based Action Verification
Jungjae Lee
Dongjae Lee
Chihun Choi
Youngmin Im
Jaeyoung Wi
Kihong Heo
Sangeun Oh
Sunjae Lee
Insik Shin
LLMAG
134
2
0
24 Mar 2025
Fact-checking AI-generated news reports: Can LLMs catch their own lies?
Jiayi Yao
Haibo Sun
Nianwen Xue
HILM
78
0
0
24 Mar 2025
ProDehaze: Prompting Diffusion Models Toward Faithful Image Dehazing
Tianwen Zhou
Jing Wang
Songtao Wu
Kuanhong Xu
DiffM
83
0
0
21 Mar 2025
FactSelfCheck: Fact-Level Black-Box Hallucination Detection for LLMs
Albert Sawczyn
Jakub Binkowski
Denis Janiak
Bogdan Gabrys
Tomasz Kajdanowicz
HILM
LRM
150
0
0
21 Mar 2025
Extract, Match, and Score: An Evaluation Paradigm for Long Question-context-answer Triplets in Financial Analysis
Bo Hu
Han Yuan
Vlad Pandelea
Wuqiong Luo
Yingzhu Zhao
Zheng Ma
90
0
0
20 Mar 2025
Can one size fit all?: Measuring Failure in Multi-Document Summarization Domain Transfer
Alexandra DeLucia
Mark Dredze
79
0
0
20 Mar 2025
Uncertainty Quantification and Confidence Calibration in Large Language Models: A Survey
Xiaoou Liu
Tiejin Chen
Longchao Da
Chacha Chen
Zhen Lin
Hua Wei
HILM
146
8
0
20 Mar 2025
FACTS&EVIDENCE: An Interactive Tool for Transparent Fine-Grained Factual Verification of Machine-Generated Text
Varich Boonsanong
Vidhisha Balachandran
Xiaochuang Han
Shangbin Feng
Lucy Lu Wang
Yulia Tsvetkov
97
1
0
19 Mar 2025
Optimizing Decomposition for Optimal Claim Verification
Yining Lu
Noah Ziems
Hy Dang
Meng Jiang
130
1
0
19 Mar 2025
GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation
Tao Feng
Yihang Sun
Jiaxuan You
163
1
0
16 Mar 2025
AIstorian lets AI be a historian: A KG-powered multi-agent system for accurate biography generation
Fengyu Li
Yilin Li
Junhao Zhu
Lu Chen
Yanfei Zhang
Jia Zhou
Hui Zu
Jingwen Zhao
Yunjun Gao
LLMAG
100
0
0
14 Mar 2025
Odysseus Navigates the Sirens' Song: Dynamic Focus Decoding for Factual and Diverse Open-Ended Text Generation
Wen Luo
Feifan Song
Wei Li
Guangyue Peng
Shaohang Wei
Houfeng Wang
AI4CE
96
0
0
11 Mar 2025
Evaluating open-source Large Language Models for automated fact-checking
Nicoló Fontana
Francesco Corso
Enrico Zuccolotto
Francesco Pierri
HILM
110
2
0
07 Mar 2025
Uncovering Gaps in How Humans and LLMs Interpret Subjective Language
Erik Jones
Arjun Patrawala
Jacob Steinhardt
72
1
0
06 Mar 2025
Benchmarking Large Language Models on Multiple Tasks in Bioinformatics NLP with Prompting
Jiyue Jiang
Pengan Chen
Jinqiao Wang
Dongchen He
Ziqin Wei
...
Yimin Fan
Xiangyu Shi
Jimeng Sun
Chuan Wu
Yuan Li
LM&MA
121
3
0
06 Mar 2025
DSVD: Dynamic Self-Verify Decoding for Faithful Generation in Large Language Models
Y. Guo
Yuchen Yang
Zhe Chen
Pingjie Wang
Yusheng Liao
Yize Zhang
Yanfeng Wang
Yu Wang
HILM
101
1
0
05 Mar 2025
Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs
Yuzhe Gu
Wentao Zhang
Chengqi Lyu
Dahua Lin
Kai Chen
124
2
0
04 Mar 2025
AILS-NTUA at SemEval-2025 Task 3: Leveraging Large Language Models and Translation Strategies for Multilingual Hallucination Detection
Dimitra Karkani
Maria Lymperaiou
Giorgos Filandrianos
Nikolaos Spanos
Athanasios Voulodimos
Giorgos Stamou
HILM
LRM
130
0
0
04 Mar 2025
LLM as a Broken Telephone: Iterative Generation Distorts Information
Amr Mohamed
Mingmeng Geng
Michalis Vazirgiannis
Guokan Shang
150
2
0
27 Feb 2025
Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles
Kuang Wang
Xianrui Li
Steve Yang
Li Zhou
Feng Jiang
Haoyang Li
99
0
0
26 Feb 2025
Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in QA Agents
A. Lewis
Michael White
Jing Liu
T. Koike-Akino
K. Parsons
Yanjie Wang
HILM
169
0
0
26 Feb 2025
Conformal Linguistic Calibration: Trading-off between Factuality and Specificity
Zhengping Jiang
Anqi Liu
Benjamin Van Durme
174
3
0
26 Feb 2025
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Hao Peng
Yunjia Qi
Xiaozhi Wang
Zijun Yao
Bin Xu
Lei Hou
Juanzi Li
ALM
LRM
101
7
0
26 Feb 2025
Uncertainty Quantification in Retrieval Augmented Question Answering
Laura Perez-Beltrachini
Mirella Lapata
RALM
160
0
0
25 Feb 2025
Faster, Cheaper, Better: Multi-Objective Hyperparameter Optimization for LLM and RAG Systems
Matthew Barker
Andrew Bell
Evan Thomas
James Carr
Thomas Andrews
Umang Bhatt
167
2
0
25 Feb 2025
FactReasoner: A Probabilistic Approach to Long-Form Factuality Assessment for Large Language Models
Radu Marinescu
D. Bhattacharjya
Junkyu Lee
T. Tchrakian
Javier Carnerero-Cano
Yufang Hou
Elizabeth M. Daly
Alessandra Pascale
HILM
LRM
83
0
0
25 Feb 2025
PosterSum: A Multimodal Benchmark for Scientific Poster Summarization
Rohit Saxena
Pasquale Minervini
Frank Keller
VLM
106
2
0
24 Feb 2025
Beyond Translation: LLM-Based Data Generation for Multilingual Fact-Checking
Yi-Ling Chung
Aurora Cobo
Pablo Serna
SyDa
HILM
107
1
0
24 Feb 2025
Is Relevance Propagated from Retriever to Generator in RAG?
Fangzheng Tian
Debasis Ganguly
Craig Macdonald
RALM
94
2
0
24 Feb 2025
Is Free Self-Alignment Possible?
Dyah Adila
Changho Shin
Yijing Zhang
Frederic Sala
MoMe
201
2
0
24 Feb 2025
Grounded Persuasive Language Generation for Automated Marketing
Jibang Wu
Chenghao Yang
Simon Mahns
Chaoqi Wang
Hao Zhu
Fei Fang
Haifeng Xu
93
3
0
24 Feb 2025
GraphCheck: Breaking Long-Term Text Barriers with Extracted Knowledge Graph-Powered Fact-Checking
Yingjian Chen
Haoran Liu
Yinhong Liu
Rui Yang
Han Yuan
...
Peng Yuan Zhou
Qingyu Chen
James Caverlee
Irene Li
HILM
133
0
0
23 Feb 2025
How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild
Saad Obaid ul Islam
Anne Lauscher
Goran Glavaš
HILM
LRM
184
3
0
21 Feb 2025
Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation
SeongYeub Chu
JongWoo Kim
MunYong Yi
142
4
0
21 Feb 2025
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
Zekun Xi
Wenbiao Yin
Jizhan Fang
Jialong Wu
Runnan Fang
Ningyu Zhang
Jiang Yong
Pengjun Xie
Fei Huang
SyDa
LRM
194
8
0
21 Feb 2025
Hallucination Detection in Large Language Models with Metamorphic Relations
Borui Yang
Md Afif Al Mamun
Jie M. Zhang
Gias Uddin
HILM
161
2
0
20 Feb 2025
Rare Disease Differential Diagnosis with Large Language Models at Scale: From Abdominal Actinomycosis to Wilson's Disease
Elliot Schumacher
Dhruv Naik
Anitha Kannan
LM&MA
66
0
0
20 Feb 2025
Can Your Uncertainty Scores Detect Hallucinated Entity?
Min-Hsuan Yeh
Max Kamachee
Seongheon Park
Yixuan Li
HILM
148
3
0
17 Feb 2025
STRIVE: Structured Reasoning for Self-Improvement in Claim Verification
Haisong Gong
Jing Li
Junfei Wu
Qiang Liu
Shu Wu
Liang Wang
LRM
80
0
0
17 Feb 2025
Optimizing Knowledge Integration in Retrieval-Augmented Generation with Self-Selection
Yan Weng
Fengbin Zhu
Tong Ye
Haoyan Liu
Fuli Feng
Tat-Seng Chua
RALM
165
2
0
10 Feb 2025
OverThink: Slowdown Attacks on Reasoning LLMs
A. Kumar
Jaechul Roh
A. Naseh
Marzena Karpinska
Mohit Iyyer
Amir Houmansadr
Eugene Bagdasarian
LRM
159
25
0
04 Feb 2025
Context-Aware Hierarchical Merging for Long Document Summarization
Litu Ou
Mirella Lapata
MoMe
535
1
0
03 Feb 2025
Learning to Explore and Select for Coverage-Conditioned Retrieval-Augmented Generation
Takyoung Kim
Kyungjae Lee
Y. Jang
Ji Yong Cho
Gangwoo Kim
Minseok Cho
Moontae Lee
290
1
0
28 Jan 2025
OnionEval: An Unified Evaluation of Fact-conflicting Hallucination for Small-Large Language Models
Chongren Sun
Yuante Li
Di Wu
Benoit Boulet
HILM
LRM
137
2
0
22 Jan 2025
Iterative Tree Analysis for Medical Critics
Zenan Huang
Mingwei Li
Zheng Zhou
Youxin Jiang
401
0
0
18 Jan 2025
Enhancing Retrieval-Augmented Generation: A Study of Best Practices
Siran Li
Linus Stenzel
Carsten Eickhoff
Seyed Ali Bahrainian
RALM
3DV
114
8
0
13 Jan 2025
Lived Experience Not Found: LLMs Struggle to Align with Experts on Addressing Adverse Drug Reactions from Psychiatric Medication Use
Mohit Chandra
Siddharth Sriraman
Gaurav Verma
Harneet Singh Khanuja
Jose Suarez Campayo
Zihang Li
Michael L. Birnbaum
M. D. Choudhury
AI4MH
111
7
0
08 Jan 2025
Hallucination Detox: Sensitivity Dropout (SenD) for Large Language Model Training
Shahrad Mohammadzadeh
Juan D. Guerra
Marco Bonizzato
Reihaneh Rabbany
Golnoosh Farnadi
HILM
100
0
0
08 Jan 2025
The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input
Alon Jacovi
Andrew Wang
Chris Alberti
Connie Tao
Jon Lipovetz
...
Rachana Fellinger
Rui Wang
Zizhao Zhang
Sasha Goldshtein
Dipanjan Das
HILM
ALM
195
17
0
06 Jan 2025