arXiv:2305.14251, v2 (latest)
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
23 May 2023
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
Papers citing
"FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"
50 / 513 papers shown
HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation
Wen Luo
Tianshu Shen
Wei Li
Guangyue Peng
Richeng Xuan
Houfeng Wang
Xi Yang
HILM
111
12
0
11 Jun 2024
Post-Hoc Answer Attribution for Grounded and Trustworthy Long Document Comprehension: Task, Insights, and Challenges
Abhilasha Sancheti
Koustava Goswami
Balaji Vasan Srinivasan
RALM
92
1
0
11 Jun 2024
A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation
Bairu Hou
Yang Zhang
Jacob Andreas
Shiyu Chang
161
7
0
11 Jun 2024
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
Joongwon Kim
Bhargavi Paranjape
Tushar Khot
Hannaneh Hajishirzi
LM&Ro
ELM
LLMAG
LRM
85
9
0
10 Jun 2024
Verifiable Generation with Subsentence-Level Fine-Grained Citations
Shuyang Cao
Lu Wang
94
7
0
10 Jun 2024
Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation
Neeraj Varshney
Satyam Raj
Venkatesh Mishra
Agneet Chatterjee
Ritika Sarkar
Amir Saeidi
Chitta Baral
LRM
96
11
0
08 Jun 2024
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
Bill Yuchen Lin
Yuntian Deng
Khyathi Chandu
Faeze Brahman
Abhilasha Ravichander
Valentina Pyatkin
Nouha Dziri
Ronan Le Bras
Yejin Choi
108
82
0
07 Jun 2024
MAIRA-2: Grounded Radiology Report Generation
Shruthi Bannur
Kenza Bouzid
Daniel Coelho De Castro
Anton Schwaighofer
Sam Bond-Taylor
...
Anja Thieme
M. Lungren
Maria T. A. Wetscherek
Javier Alvarez-Valle
Stephanie L. Hyland
82
44
0
06 Jun 2024
PaCE: Parsimonious Concept Engineering for Large Language Models
Jinqi Luo
Tianjiao Ding
Kwan Ho Ryan Chan
D. Thaker
Aditya Chattopadhyay
Chris Callison-Burch
René Vidal
CVBM
98
12
0
06 Jun 2024
AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways
Zehang Deng
Yongjian Guo
Changzhou Han
Wanlun Ma
Junwu Xiong
Sheng Wen
Yang Xiang
157
49
0
04 Jun 2024
Safeguarding Large Language Models: A Survey
Yi Dong
Ronghui Mu
Yanghao Zhang
Siqi Sun
Tianle Zhang
...
Yi Qi
Jinwei Hu
Jie Meng
Saddek Bensalem
Xiaowei Huang
OffRL
KELM
AILaw
99
26
0
03 Jun 2024
When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs
Ryo Kamoi
Yusen Zhang
Nan Zhang
Jiawei Han
Rui Zhang
LRM
173
85
0
03 Jun 2024
CtrlA: Adaptive Retrieval-Augmented Generation via Probe-Guided Control
Huanshuo Liu
Hao Zhang
Zhijiang Guo
Kuicai Dong
Xiangyang Li
Yi Quan Lee
Cong Zhang
Yong Liu
3DV
94
1
0
29 May 2024
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
Minghan Li
Xilun Chen
Ari Holtzman
Beidi Chen
Jimmy Lin
Wen-tau Yih
Xi Lin
RALM
BDL
240
14
0
29 May 2024
TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models
Jaewoo Ahn
Taehyun Lee
Junyoung Lim
Jin-Hwa Kim
Sangdoo Yun
Hwaran Lee
Gunhee Kim
LLMAG
HILM
88
14
0
28 May 2024
GRAG: Graph Retrieval-Augmented Generation
Yuntong Hu
Zhihan Lei
Zhengwu Zhang
Bo Pan
Chen Ling
Liang Zhao
121
31
0
26 May 2024
Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection
Yun Zhu
Jia-Chen Gu
Caitlin Sikora
Ho Ko
Yinxiao Liu
...
Lei Shu
Liangchen Luo
Lei Meng
Bang Liu
Jindong Chen
RALM
99
19
0
25 May 2024
Certifiably Robust RAG against Retrieval Corruption
Chong Xiang
Tong Wu
Zexuan Zhong
David Wagner
Danqi Chen
Prateek Mittal
SILM
99
58
0
24 May 2024
AGRaME: Any-Granularity Ranking with Multi-Vector Embeddings
R. Reddy
Omar Attia
Yunyao Li
Heng Ji
Saloni Potdar
61
1
0
23 May 2024
RefChecker: Reference-based Fine-grained Hallucination Checker and Benchmark for Large Language Models
Xiangkun Hu
Dongyu Ru
Lin Qiu
Qipeng Guo
Tianhang Zhang
Yang Xu
Yun Luo
Pengfei Liu
Yue Zhang
Zheng Zhang
HILM
LRM
98
9
0
23 May 2024
Can LLMs Solve longer Math Word Problems Better?
Xin Xu
Tong Xiao
Zitong Chao
Zhenya Huang
Can Yang
Yang Wang
173
14
0
23 May 2024
CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models
Guangzhi Sun
Potsawee Manakul
Adian Liusie
Kunat Pipatanakul
Chao Zhang
P. Woodland
Mark Gales
HILM
MLLM
84
9
0
22 May 2024
Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation
Gauthier Guinet
Behrooz Omidvar-Tehrani
Anoop Deoras
Laurent Callot
RALM
117
23
0
22 May 2024
Atomic Self-Consistency for Better Long Form Generations
Raghuveer Thirukovalluru
Yukun Huang
Bhuwan Dhingra
80
5
0
21 May 2024
Large Language Models Meet NLP: A Survey
Libo Qin
Qiguang Chen
Xiachong Feng
Yang Wu
Yongheng Zhang
Hai-Tao Zheng
Min Li
Wanxiang Che
Philip S. Yu
ALM
LM&MA
ELM
LRM
123
59
0
21 May 2024
OLAPH: Improving Factuality in Biomedical Long-form Question Answering
Minbyul Jeong
Hyeon Hwang
Chanwoong Yoon
Taewhoo Lee
Jaewoo Kang
MedIm
HILM
LM&MA
123
12
0
21 May 2024
Question-Based Retrieval using Atomic Units for Enterprise RAG
Vatsal Raina
Mark Gales
74
14
0
20 May 2024
SciQAG: A Framework for Auto-Generated Science Question Answering Dataset with Fine-grained Evaluation
Yuwei Wan
Yixuan Liu
Aswathy Ajith
Clara Grazian
B. Hoex
Wenjie Zhang
Chunyu Kit
Tong Xie
Ian Foster
95
10
0
16 May 2024
LLMs can learn self-restraint through iterative self-reflection
Alexandre Piché
Aristides Milios
Dzmitry Bahdanau
Chris Pal
74
6
0
15 May 2024
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
Zorik Gekhman
G. Yona
Roee Aharoni
Matan Eyal
Amir Feder
Roi Reichart
Jonathan Herzig
146
137
0
09 May 2024
OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs
Yuxia Wang
Minghan Wang
Hasan Iqbal
Georgi Georgiev
Jiahui Geng
Preslav Nakov
HILM
107
2
0
09 May 2024
One vs. Many: Comprehending Accurate Information from Multiple Erroneous and Inconsistent AI Generations
Yoonjoo Lee
Kihoon Son
Tae Soo Kim
Jisu Kim
John Joon Young Chung
Eytan Adar
Juho Kim
93
13
0
09 May 2024
Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training
Zexuan Zhong
Mengzhou Xia
Danqi Chen
Mike Lewis
MoE
110
19
0
06 May 2024
Recall Them All: Retrieval-Augmented Language Models for Long Object List Extraction from Long Documents
Sneha Singhania
Simon Razniewski
Gerhard Weikum
RALM
129
1
0
04 May 2024
FLAME: Factuality-Aware Alignment for Large Language Models
Sheng-Chieh Lin
Luyu Gao
Barlas Oğuz
Wenhan Xiong
Jimmy Lin
Wen-tau Yih
Xilun Chen
HILM
95
20
0
02 May 2024
On the Evaluation of Machine-Generated Reports
James Mayfield
Eugene Yang
Dawn J Lawrie
Sean MacAvaney
Paul McNamee
...
Orion Weller
Efsun Kayi
Kate Sanders
Marc Mason
Noah Hibbler
ALM
184
17
0
02 May 2024
GRAMMAR: Grounded and Modular Methodology for Assessment of Closed-Domain Retrieval-Augmented Language Model
Xinzhe Li
Ming Liu
Shang Gao
RALM
103
0
0
30 Apr 2024
From Matching to Generation: A Survey on Generative Information Retrieval
Xiaoxi Li
Jiajie Jin
Yujia Zhou
Yuyao Zhang
Peitian Zhang
Yutao Zhu
Zhicheng Dou
3DV
212
61
0
23 Apr 2024
ISQA: Informative Factuality Feedback for Scientific Summarization
Zekai Li
Yanxia Qin
Qian Liu
Min-Yen Kan
HILM
60
1
0
20 Apr 2024
AmbigDocs: Reasoning across Documents on Different Entities under the Same Name
Yoonsang Lee
Xi Ye
Eunsol Choi
79
14
0
18 Apr 2024
Unifying Bias and Unfairness in Information Retrieval: A Survey of Challenges and Opportunities with Large Language Models
Sunhao Dai
Chen Xu
Shicheng Xu
Liang Pang
Zhenhua Dong
Jun Xu
117
83
0
17 Apr 2024
FIZZ: Factual Inconsistency Detection by Zoom-in Summary and Zoom-out Document
Joonho Yang
Seunghyun Yoon
Byeongjeong Kim
Hwanhee Lee
HILM
114
7
0
17 Apr 2024
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents
Liyan Tang
Philippe Laban
Greg Durrett
HILM
SyDa
86
103
0
16 Apr 2024
NoticIA: A Clickbait Article Summarization Dataset in Spanish
Iker García-Ferrero
Begoña Altuna
93
2
0
11 Apr 2024
Best Practices and Lessons Learned on Synthetic Data for Language Models
Ruibo Liu
Jerry W. Wei
Fangyu Liu
Chenglei Si
Yanzhe Zhang
...
Steven Zheng
Daiyi Peng
Diyi Yang
Denny Zhou
Andrew M. Dai
SyDa
EgoV
126
96
0
11 Apr 2024
Pitfalls of Conversational LLMs on News Debiasing
Ipek Baris Schlicht
Defne Altiok
Maryanne Taouk
Lucie Flek
95
4
0
09 Apr 2024
Characterizing Multimodal Long-form Summarization: A Case Study on Financial Reports
Tianyu Cao
Natraj Raman
Danial Dervovic
Chenhao Tan
62
5
0
09 Apr 2024
Know When To Stop: A Study of Semantic Drift in Text Generation
Ava Spataru
Eric Hambro
Elena Voita
Nicola Cancedda
61
3
0
08 Apr 2024
FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback
Liqiang Jing
Xinya Du
187
18
0
07 Apr 2024
PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics
Derui Zhu
Dingfan Chen
Qing Li
Zongxiong Chen
Lei Ma
Jens Grossklags
Mario Fritz
HILM
89
14
0
06 Apr 2024