Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.14251
Cited By
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
23 May 2023
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"
50 / 456 papers shown
Title
Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback
Hongshen Xu
Zichen Zhu
Situo Zhang
Da Ma
Shuai Fan
Lu Chen
Kai Yu
HILM
39
34
0
27 Mar 2024
CheckEval: A reliable LLM-as-a-Judge framework for evaluating text generation using checklists
Yukyung Lee
Joonghoon Kim
Jaehee Kim
Hyowon Cho
Pilsung Kang
Pilsung Kang
Najoung Kim
ELM
47
4
0
27 Mar 2024
Attribute First, then Generate: Locally-attributable Grounded Text Generation
Aviv Slobodkin
Eran Hirsch
Arie Cattan
Tal Schuster
Ido Dagan
73
20
0
25 Mar 2024
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
Neeloy Chakraborty
Melkior Ornik
Katherine Driggs-Campbell
LRM
57
9
0
25 Mar 2024
Reinforcement Learning from Reflective Feedback (RLRF): Aligning and Improving LLMs via Fine-Grained Self-Reflection
Kyungjae Lee
Dasol Hwang
Sunghyun Park
Youngsoo Jang
Moontae Lee
46
8
0
21 Mar 2024
A Closer Look at Claim Decomposition
Miriam Wanner
Seth Ebner
Zhengping Jiang
Mark Dredze
Benjamin Van Durme
49
18
0
18 Mar 2024
TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale
Pengcheng Jiang
Cao Xiao
Zifeng Wang
Parminder Bhatia
Jimeng Sun
Jiawei Han
LRM
26
10
0
15 Mar 2024
Think Twice Before Trusting: Self-Detection for Large Language Models through Comprehensive Answer Reflection
Moxin Li
Wenjie Wang
Fuli Feng
Fengbin Zhu
Qifan Wang
Tat-Seng Chua
HILM
LRM
46
13
0
15 Mar 2024
ClaimVer: Explainable Claim-Level Verification and Evidence Attribution of Text Through Knowledge Graphs
Preetam Prabhu Srikar Dammu
Himanshu Naidu
Mouly Dewan
YoungMin Kim
Tanya Roosta
Aman Chadha
Chirag Shah
46
6
0
12 Mar 2024
Truth-Aware Context Selection: Mitigating Hallucinations of Large Language Models Being Misled by Untruthful Contexts
Tian Yu
Shaolei Zhang
Yang Feng
HILM
42
7
0
12 Mar 2024
Unfamiliar Finetuning Examples Control How Language Models Hallucinate
Katie Kang
Eric Wallace
Claire Tomlin
Aviral Kumar
Sergey Levine
HILM
LRM
46
49
0
08 Mar 2024
ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models
Jio Oh
Soyeon Kim
Junseok Seo
Jindong Wang
Ruochen Xu
Xing Xie
Steven Euijong Whang
41
1
0
08 Mar 2024
Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification
Ekaterina Fadeeva
Aleksandr Rubashevskii
Artem Shelmanov
Sergey Petrakov
Haonan Li
...
Gleb Kuzmin
Alexander Panchenko
Timothy Baldwin
Preslav Nakov
Maxim Panov
HILM
45
40
0
07 Mar 2024
FaaF: Facts as a Function for the evaluation of generated text
Vasileios Katranidis
Gabor Barany
HILM
RALM
47
4
0
06 Mar 2024
A Modular Approach for Multimodal Summarization of TV Shows
Louis Mahon
Mirella Lapata
26
10
0
06 Mar 2024
Multimodal Large Language Models to Support Real-World Fact-Checking
Jiahui Geng
Yova Kementchedjhieva
Preslav Nakov
Iryna Gurevych
LRM
32
11
0
06 Mar 2024
Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem
Yuhong Sun
Zhangyue Yin
Qipeng Guo
Jiawen Wu
Xipeng Qiu
Hui Zhao
33
14
0
06 Mar 2024
Reliable, Adaptable, and Attributable Language Models with Retrieval
Akari Asai
Zexuan Zhong
Danqi Chen
Pang Wei Koh
Luke Zettlemoyer
Hanna Hajishirzi
Wen-tau Yih
KELM
RALM
49
54
0
05 Mar 2024
FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction
Alessandro Sciré
Karim Ghonim
Roberto Navigli
HILM
29
7
0
04 Mar 2024
WebCiteS: Attributed Query-Focused Summarization on Chinese Web Search Results with Citations
Haolin Deng
Chang Wang
Xin Li
Dezhang Yuan
Junlang Zhan
Tianhua Zhou
Jin Ma
Jun Gao
Ruifeng Xu
HILM
66
2
0
04 Mar 2024
SyllabusQA: A Course Logistics Question Answering Dataset
Nigel Fernandez
Alexander Scarlatos
Andrew S. Lan
16
4
0
03 Mar 2024
Right for Right Reasons: Large Language Models for Verifiable Commonsense Knowledge Graph Question Answering
Armin Toroghi
Willis Guo
Mohammad Mahdi Torabi pour
Scott Sanner
LRM
31
8
0
03 Mar 2024
A Survey of AI-generated Text Forensic Systems: Detection, Attribution, and Characterization
Tharindu Kumarage
Garima Agrawal
Paras Sheth
Raha Moraffah
Amanat Chadha
Joshua Garland
Huan Liu
DeLMO
36
11
0
02 Mar 2024
Reading Subtext: Evaluating Large Language Models on Short Story Summarization with Writers
Melanie Subbiah
Sean Zhang
Lydia B. Chilton
Kathleen McKeown
54
14
0
02 Mar 2024
Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries
Zelalem Gero
Chandan Singh
Yiqing Xie
Sheng Zhang
Tristan Naumann
Jianfeng Gao
Hoifung Poon
ELM
ALM
39
4
0
01 Mar 2024
Do Zombies Understand? A Choose-Your-Own-Adventure Exploration of Machine Cognition
Ariel Goldstein
Gabriel Stanovsky
37
1
0
01 Mar 2024
Whispers that Shake Foundations: Analyzing and Mitigating False Premise Hallucinations in Large Language Models
Hongbang Yuan
Pengfei Cao
Zhuoran Jin
Yubo Chen
Daojian Zeng
Kang Liu
Jun Zhao
HILM
37
3
0
29 Feb 2024
Multi-FAct: Assessing Multilingual LLMs' Multi-Regional Knowledge using FActScore
Sheikh Shafayat
Eunsu Kim
Juhyun Oh
Alice H. Oh
HILM
46
6
0
28 Feb 2024
Collaborative decoding of critical tokens for boosting factuality of large language models
Lifeng Jin
Baolin Peng
Linfeng Song
Haitao Mi
Ye Tian
Dong Yu
HILM
32
6
0
28 Feb 2024
Evaluating Very Long-Term Conversational Memory of LLM Agents
A. Maharana
Dong-Ho Lee
Sergey Tulyakov
Mohit Bansal
Francesco Barbieri
Yuwei Fang
LLMAG
24
68
0
27 Feb 2024
Case-Based or Rule-Based: How Do Transformers Do the Math?
Yi Hu
Xiaojuan Tang
Haotong Yang
Muhan Zhang
LRM
27
20
0
27 Feb 2024
Fine-Grained Natural Language Inference Based Faithfulness Evaluation for Diverse Summarisation Tasks
Huajian Zhang
Yumo Xu
Laura Perez-Beltrachini
HILM
34
9
0
27 Feb 2024
Re-Ex: Revising after Explanation Reduces the Factual Errors in LLM Responses
Juyeon Kim
Jeongeun Lee
Yoonho Chang
Chanyeol Choi
Junseong Kim
Jy-yong Sohn
KELM
LRM
56
2
0
27 Feb 2024
HypoTermQA: Hypothetical Terms Dataset for Benchmarking Hallucination Tendency of LLMs
Cem Uluoglakci
T. Taşkaya-Temizel
HILM
35
2
0
25 Feb 2024
Evaluating Robustness of Generative Search Engine on Adversarial Factual Questions
Xuming Hu
Xiaochuan Li
Junzhe Chen
Hai-Tao Zheng
Yangning Li
...
Yasheng Wang
Qun Liu
Lijie Wen
Philip S. Yu
Zhijiang Guo
AAML
ELM
32
5
0
25 Feb 2024
HD-Eval: Aligning Large Language Model Evaluators Through Hierarchical Criteria Decomposition
Yuxuan Liu
Tianchi Yang
Shaohan Huang
Zihan Zhang
Haizhen Huang
Furu Wei
Weiwei Deng
Feng Sun
Qi Zhang
34
13
0
24 Feb 2024
Fine-Grained Self-Endorsement Improves Factuality and Reasoning
Ante Wang
Linfeng Song
Baolin Peng
Ye Tian
Lifeng Jin
Haitao Mi
Jinsong Su
Dong Yu
HILM
LRM
23
6
0
23 Feb 2024
Fast Adversarial Attacks on Language Models In One GPU Minute
Vinu Sankar Sadasivan
Shoumik Saha
Gaurang Sriramanan
Priyatham Kattakinda
Atoosa Malemir Chegini
S. Feizi
MIALM
43
34
0
23 Feb 2024
Faithful Temporal Question Answering over Heterogeneous Sources
Zhen Jia
Philipp Christmann
Gerhard Weikum
33
10
0
23 Feb 2024
UFO: a Unified and Flexible Framework for Evaluating Factuality of Large Language Models
Zhaoheng Huang
Zhicheng Dou
Yutao Zhu
Ji-Rong Wen
HILM
38
1
0
22 Feb 2024
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
Yijia Shao
Yucheng Jiang
Theodore A. Kanell
Peter Xu
Omar Khattab
Monica S. Lam
LLMAG
KELM
44
35
0
22 Feb 2024
RefuteBench: Evaluating Refuting Instruction-Following for Large Language Models
Jianhao Yan
Yun Luo
Yue Zhang
ALM
LRM
38
7
0
21 Feb 2024
Factual consistency evaluation of summarization in the Era of large language models
Zheheng Luo
Qianqian Xie
Sophia Ananiadou
HILM
35
1
0
21 Feb 2024
Identifying Factual Inconsistencies in Summaries: Grounding Model Inference via Task Taxonomy
Liyan Xu
Zhenlin Su
Mo Yu
Jin Xu
Jinho D. Choi
Jie Zhou
Fei Liu
HILM
43
2
0
20 Feb 2024
GenAudit: Fixing Factual Errors in Language Model Outputs with Evidence
Kundan Krishna
S. Ramprasad
Prakhar Gupta
Byron C. Wallace
Zachary Chase Lipton
Jeffrey P. Bigham
HILM
KELM
SyDa
52
9
0
19 Feb 2024
Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs
Jiejun Tan
Zhicheng Dou
Yutao Zhu
Peidong Guo
Kun Fang
Ji-Rong Wen
47
24
0
19 Feb 2024
Ask Optimal Questions: Aligning Large Language Models with Retriever's Preference in Conversational Search
Chanwoong Yoon
Gangwoo Kim
Byeongguk Jeon
Sungdong Kim
Yohan Jo
Jaewoo Kang
RALM
KELM
39
12
0
19 Feb 2024
FactPICO: Factuality Evaluation for Plain Language Summarization of Medical Evidence
Sebastian Antony Joseph
Lily Chen
Jan Trienes
Hannah Louisa Göke
Monika Coers
Wei Xu
Byron C. Wallace
Junyi Jessy Li
LM&MA
HILM
26
10
0
18 Feb 2024
KnowTuning: Knowledge-aware Fine-tuning for Large Language Models
Yougang Lyu
Lingyong Yan
Shuaiqiang Wang
Haibo Shi
Dawei Yin
Pengjie Ren
Zhumin Chen
Maarten de Rijke
Zhaochun Ren
24
5
0
17 Feb 2024
GenRES: Rethinking Evaluation for Generative Relation Extraction in the Era of Large Language Models
Pengcheng Jiang
Jiacheng Lin
Zifeng Wang
Jimeng Sun
Jiawei Han
28
3
0
16 Feb 2024
Previous
1
2
3
...
10
6
7
8
9
Next