2305.14251
Cited By
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
23 May 2023
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
Papers citing
"FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"
50 / 513 papers shown
Title
Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data
Jingyu Zhang
Marc Marone
Tianjian Li
Benjamin Van Durme
Daniel Khashabi
193
9
0
05 Apr 2024
Evaluating LLMs at Detecting Errors in LLM Responses
Ryo Kamoi
Sarkar Snigdha Sarathi Das
Renze Lou
Jihyun Janice Ahn
Yilun Zhao
...
Salika Dave
Shaobo Qin
Arman Cohan
Wenpeng Yin
Rui Zhang
86
25
0
04 Apr 2024
PRobELM: Plausibility Ranking Evaluation for Language Models
Moy Yuan
Chenxi Whitehouse
Eric Chamoun
Rami Aly
Andreas Vlachos
189
5
0
04 Apr 2024
HyperCLOVA X Technical Report
Kang Min Yoo
Jaegeun Han
Sookyo In
Heewon Jeon
Jisu Jeong
...
Hyunkyung Noh
Se-Eun Choi
Sang-Woo Lee
Jung Hwa Lim
Nako Sung
VLM
88
9
0
02 Apr 2024
On the Role of Summary Content Units in Text Summarization Evaluation
Marcel Nawrath
Agnieszka Nowak
Tristan Ratz
Danilo C. Walenta
Juri Opitz
...
Sebastian Gehrmann
Saad Mahamood
Miruna Clinciu
Khyathi Chandu
Yufang Hou
ELM
82
5
0
02 Apr 2024
AILS-NTUA at SemEval-2024 Task 6: Efficient model tuning for hallucination detection and analysis
Natalia Grigoriadou
Maria Lymperaiou
Giorgos Filandrianos
Giorgos Stamou
VLM
69
0
0
01 Apr 2024
Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks
Hyunjae Kim
Hyeon Hwang
Jiwoo Lee
Sihyeon Park
Dain Kim
Taewhoo Lee
Chanwoong Yoon
Jiwoong Sohn
Donghee Choi
Jaewoo Kang
ELM
AI4MH
LRM
118
22
0
30 Mar 2024
Is Factuality Decoding a Free Lunch for LLMs? Evaluation on Knowledge Editing Benchmark
Baolong Bi
Shenghua Liu
Yiwei Wang
Lingrui Mei
Xueqi Cheng
KELM
51
13
0
30 Mar 2024
LUQ: Long-text Uncertainty Quantification for LLMs
Caiqi Zhang
Fangyu Liu
Marco Basaldella
Nigel Collier
HILM
86
40
0
29 Mar 2024
FACTOID: FACtual enTailment fOr hallucInation Detection
Vipula Rawte
S. M. Towhidul
Krishnav Rajbangshi
Shravani Nag
Aman Chadha
Amit P. Sheth
Amitava Das
HILM
86
4
0
28 Mar 2024
Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback
Hongshen Xu
Zichen Zhu
Situo Zhang
Da Ma
Shuai Fan
Lu Chen
Kai Yu
HILM
105
45
0
27 Mar 2024
CheckEval: A reliable LLM-as-a-Judge framework for evaluating text generation using checklists
Yukyung Lee
Joonghoon Kim
Jaehee Kim
Hyowon Cho
Pilsung Kang
Najoung Kim
ELM
83
5
0
27 Mar 2024
Attribute First, then Generate: Locally-attributable Grounded Text Generation
Aviv Slobodkin
Eran Hirsch
Arie Cattan
Tal Schuster
Ido Dagan
118
27
0
25 Mar 2024
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
Neeloy Chakraborty
Melkior Ornik
Katherine Driggs-Campbell
LRM
246
12
0
25 Mar 2024
Reinforcement Learning from Reflective Feedback (RLRF): Aligning and Improving LLMs via Fine-Grained Self-Reflection
Kyungjae Lee
Dasol Hwang
Sunghyun Park
Youngsoo Jang
Moontae Lee
66
8
0
21 Mar 2024
A Closer Look at Claim Decomposition
Miriam Wanner
Seth Ebner
Zhengping Jiang
Mark Dredze
Benjamin Van Durme
99
24
0
18 Mar 2024
TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale
Pengcheng Jiang
Cao Xiao
Zifeng Wang
Parminder Bhatia
Jimeng Sun
Jiawei Han
LRM
88
13
0
15 Mar 2024
Think Twice Before Trusting: Self-Detection for Large Language Models through Comprehensive Answer Reflection
Moxin Li
Wenjie Wang
Fuli Feng
Fengbin Zhu
Qifan Wang
Tat-Seng Chua
HILM
LRM
115
23
0
15 Mar 2024
ClaimVer: Explainable Claim-Level Verification and Evidence Attribution of Text Through Knowledge Graphs
Preetam Prabhu Srikar Dammu
Himanshu Naidu
Mouly Dewan
YoungMin Kim
Tanya Roosta
Aman Chadha
Chirag Shah
72
8
0
12 Mar 2024
Truth-Aware Context Selection: Mitigating Hallucinations of Large Language Models Being Misled by Untruthful Contexts
Tian Yu
Shaolei Zhang
Yang Feng
HILM
71
7
0
12 Mar 2024
Unfamiliar Finetuning Examples Control How Language Models Hallucinate
Katie Kang
Eric Wallace
Claire Tomlin
Aviral Kumar
Sergey Levine
HILM
LRM
106
58
0
08 Mar 2024
ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models
Jio Oh
Soyeon Kim
Junseok Seo
Jindong Wang
Ruochen Xu
Xing Xie
Steven Euijong Whang
76
4
0
08 Mar 2024
Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification
Ekaterina Fadeeva
Aleksandr Rubashevskii
Artem Shelmanov
Sergey Petrakov
Haonan Li
...
Gleb Kuzmin
Alexander Panchenko
Timothy Baldwin
Preslav Nakov
Maxim Panov
HILM
102
56
0
07 Mar 2024
FaaF: Facts as a Function for the evaluation of generated text
Vasileios Katranidis
Gabor Barany
HILM
RALM
74
4
0
06 Mar 2024
A Modular Approach for Multimodal Summarization of TV Shows
Louis Mahon
Mirella Lapata
86
10
0
06 Mar 2024
Multimodal Large Language Models to Support Real-World Fact-Checking
Jiahui Geng
Yova Kementchedjhieva
Preslav Nakov
Iryna Gurevych
LRM
120
15
0
06 Mar 2024
Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem
Yuhong Sun
Zhangyue Yin
Qipeng Guo
Jiawen Wu
Xipeng Qiu
Hui Zhao
65
19
0
06 Mar 2024
Reliable, Adaptable, and Attributable Language Models with Retrieval
Akari Asai
Zexuan Zhong
Danqi Chen
Pang Wei Koh
Luke Zettlemoyer
Hanna Hajishirzi
Wen-tau Yih
KELM
RALM
116
63
0
05 Mar 2024
FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction
Alessandro Sciré
Karim Ghonim
Roberto Navigli
HILM
56
11
0
04 Mar 2024
WebCiteS: Attributed Query-Focused Summarization on Chinese Web Search Results with Citations
Haolin Deng
Chang Wang
Xin Li
Dezhang Yuan
Junlang Zhan
Tianhua Zhou
Jin Ma
Jun Gao
Ruifeng Xu
HILM
89
2
0
04 Mar 2024
SyllabusQA: A Course Logistics Question Answering Dataset
Nigel Fernandez
Alexander Scarlatos
Andrew Lan
44
6
0
03 Mar 2024
Right for Right Reasons: Large Language Models for Verifiable Commonsense Knowledge Graph Question Answering
Armin Toroghi
Willis Guo
Mohammad Mahdi Torabi pour
Scott Sanner
LRM
101
10
0
03 Mar 2024
A Survey of AI-generated Text Forensic Systems: Detection, Attribution, and Characterization
Tharindu Kumarage
Garima Agrawal
Paras Sheth
Raha Moraffah
Aman Chadha
Joshua Garland
Huan Liu
DeLMO
72
13
0
02 Mar 2024
Reading Subtext: Evaluating Large Language Models on Short Story Summarization with Writers
Melanie Subbiah
Sean Zhang
Lydia B. Chilton
Kathleen McKeown
111
15
0
02 Mar 2024
Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries
Zelalem Gero
Chandan Singh
Yiqing Xie
Sheng Zhang
Tristan Naumann
Jianfeng Gao
Hoifung Poon
ELM
ALM
61
4
0
01 Mar 2024
Do Zombies Understand? A Choose-Your-Own-Adventure Exploration of Machine Cognition
Ariel Goldstein
Gabriel Stanovsky
61
1
0
01 Mar 2024
Whispers that Shake Foundations: Analyzing and Mitigating False Premise Hallucinations in Large Language Models
Hongbang Yuan
Pengfei Cao
Zhuoran Jin
Yubo Chen
Daojian Zeng
Kang Liu
Jun Zhao
HILM
88
4
0
29 Feb 2024
Multi-FAct: Assessing Multilingual LLMs' Multi-Regional Knowledge using FActScore
Sheikh Shafayat
Eunsu Kim
Juhyun Oh
Alice Oh
HILM
88
8
0
28 Feb 2024
Collaborative decoding of critical tokens for boosting factuality of large language models
Lifeng Jin
Baolin Peng
Linfeng Song
Haitao Mi
Ye Tian
Dong Yu
HILM
49
9
0
28 Feb 2024
Evaluating Very Long-Term Conversational Memory of LLM Agents
A. Maharana
Dong-Ho Lee
Sergey Tulyakov
Mohit Bansal
Francesco Barbieri
Yuwei Fang
LLMAG
86
81
0
27 Feb 2024
Case-Based or Rule-Based: How Do Transformers Do the Math?
Yi Hu
Xiaojuan Tang
Haotong Yang
Muhan Zhang
LRM
105
25
0
27 Feb 2024
Fine-Grained Natural Language Inference Based Faithfulness Evaluation for Diverse Summarisation Tasks
Huajian Zhang
Yumo Xu
Laura Perez-Beltrachini
HILM
81
13
0
27 Feb 2024
Re-Ex: Revising after Explanation Reduces the Factual Errors in LLM Responses
Juyeon Kim
Jeongeun Lee
Yoonho Chang
Chanyeol Choi
Junseong Kim
Jy-yong Sohn
KELM
LRM
156
2
0
27 Feb 2024
HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs
Cem Uluoglakci
T. Taşkaya-Temizel
HILM
64
3
0
25 Feb 2024
Evaluating Robustness of Generative Search Engine on Adversarial Factual Questions
Xuming Hu
Xiaochuan Li
Junzhe Chen
Hai-Tao Zheng
Yangning Li
...
Yasheng Wang
Qun Liu
Lijie Wen
Philip S. Yu
Zhijiang Guo
AAML
ELM
81
4
0
25 Feb 2024
HD-Eval: Aligning Large Language Model Evaluators Through Hierarchical Criteria Decomposition
Yuxuan Liu
Tianchi Yang
Shaohan Huang
Zihan Zhang
Haizhen Huang
Furu Wei
Weiwei Deng
Feng Sun
Qi Zhang
73
16
0
24 Feb 2024
Fine-Grained Self-Endorsement Improves Factuality and Reasoning
Ante Wang
Linfeng Song
Baolin Peng
Ye Tian
Lifeng Jin
Haitao Mi
Jinsong Su
Dong Yu
HILM
LRM
66
7
0
23 Feb 2024
Fast Adversarial Attacks on Language Models In One GPU Minute
Vinu Sankar Sadasivan
Shoumik Saha
Gaurang Sriramanan
Priyatham Kattakinda
Atoosa Malemir Chegini
Soheil Feizi
MIALM
106
42
0
23 Feb 2024
Faithful Temporal Question Answering over Heterogeneous Sources
Zhen Jia
Philipp Christmann
Gerhard Weikum
74
10
0
23 Feb 2024
UFO: a Unified and Flexible Framework for Evaluating Factuality of Large Language Models
Zhaoheng Huang
Zhicheng Dou
Yutao Zhu
Ji-Rong Wen
HILM
54
2
0
22 Feb 2024