Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2411.15993
Cited By
Investigating Factuality in Long-Form Text Generation: The Roles of Self-Known and Self-Unknown
24 November 2024
Lifu Tu
Rui Meng
Shafiq Joty
Yingbo Zhou
Semih Yavuz
HILM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Investigating Factuality in Long-Form Text Generation: The Roles of Self-Known and Self-Unknown"
21 / 21 papers shown
Title
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack
Yuri Kuratov
Aydar Bulatov
Petr Anokhin
Ivan Rodkin
Dmitry Sorokin
Artyom Sorokin
Andrey Kravchenko
RALM
ALM
LRM
ReLM
ELM
85
82
0
14 Jun 2024
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
Yushi Bai
Xin Lv
Jiajie Zhang
Hong Lyu
Jiankai Tang
...
Aohan Zeng
Lei Hou
Yuxiao Dong
Jie Tang
Juanzi Li
LLMAG
RALM
97
603
0
28 Aug 2023
L-Eval: Instituting Standardized Evaluation for Long Context Language Models
Chen An
Shansan Gong
Ming Zhong
Xingjian Zhao
Mukai Li
Jun Zhang
Lingpeng Kong
Xipeng Qiu
ELM
ALM
101
156
0
20 Jul 2023
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs
Miao Xiong
Zhiyuan Hu
Xinyang Lu
Yifei Li
Jie Fu
Junxian He
Bryan Hooi
210
451
0
22 Jun 2023
Do Large Language Models Know What They Don't Know?
Zhangyue Yin
Qiushi Sun
Qipeng Guo
Jiawen Wu
Xipeng Qiu
Xuanjing Huang
ELM
AI4MH
78
163
0
29 May 2023
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
144
702
0
23 May 2023
ZeroSCROLLS: A Zero-Shot Benchmark for Long Text Understanding
Uri Shaham
Maor Ivgi
Avia Efrat
Jonathan Berant
Omer Levy
VLM
92
140
0
23 May 2023
HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models
Junyi Li
Xiaoxue Cheng
Wayne Xin Zhao
J. Nie
Ji-Rong Wen
HILM
VLM
90
250
0
19 May 2023
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Potsawee Manakul
Adian Liusie
Mark Gales
HILM
LRM
193
445
0
15 Mar 2023
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation
Yixin Liu
Alexander R. Fabbri
Pengfei Liu
Yilun Zhao
Linyong Nan
...
Simeng Han
Shafiq Joty
Chien-Sheng Wu
Caiming Xiong
Dragomir R. Radev
ALM
74
134
0
15 Dec 2022
Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation
Nuno M. Guerreiro
Elena Voita
André F. T. Martins
HILM
65
57
0
10 Aug 2022
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Aarohi Srivastava
Abhinav Rastogi
Abhishek Rao
Abu Awal Md Shoeb
Abubakar Abid
...
Zhuoye Zhao
Zijian Wang
Zijie J. Wang
Zirui Wang
Ziyi Wu
ELM
211
1,777
0
09 Jun 2022
Generating Literal and Implied Subquestions to Fact-check Complex Claims
Jifan Chen
Aniruddh Sriram
Eunsol Choi
Greg Durrett
HILM
79
66
0
14 May 2022
Generating Scientific Claims for Zero-Shot Scientific Fact Checking
Dustin Wright
David Wadden
Kyle Lo
Bailey Kuehl
Arman Cohan
Isabelle Augenstein
Lucy Lu Wang
MedIm
103
57
0
24 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
886
13,207
0
04 Mar 2022
TruthfulQA: Measuring How Models Mimic Human Falsehoods
Stephanie C. Lin
Jacob Hilton
Owain Evans
HILM
149
1,942
0
08 Sep 2021
A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation
Tianyu Liu
Yizhe Zhang
Chris Brockett
Yi Mao
Zhifang Sui
Weizhu Chen
W. Dolan
HILM
276
148
0
18 Apr 2021
Generating Fact Checking Briefs
Angela Fan
Aleksandra Piktus
Fabio Petroni
Guillaume Wenzek
Marzieh Saeidi
Andreas Vlachos
Antoine Bordes
Sebastian Riedel
HILM
99
59
0
10 Nov 2020
Know What You Don't Know: Unanswerable Questions for SQuAD
Pranav Rajpurkar
Robin Jia
Percy Liang
RALM
ELM
292
2,854
0
11 Jun 2018
FEVER: a large-scale dataset for Fact Extraction and VERification
James Thorne
Andreas Vlachos
Christos Christodoulopoulos
Arpit Mittal
HILM
167
1,667
0
14 Mar 2018
NewsQA: A Machine Comprehension Dataset
Adam Trischler
Tong Wang
Xingdi Yuan
Justin Harris
Alessandro Sordoni
Philip Bachman
Kaheer Suleman
108
893
0
29 Nov 2016
1