Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2301.13298
Cited By
LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization
30 January 2023
Kalpesh Krishna
Erin Bransom
Bailey Kuehl
Mohit Iyyer
Pradeep Dasigi
Arman Cohan
Kyle Lo
Re-assign community
ArXiv
PDF
HTML
Papers citing
"LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization"
21 / 21 papers shown
Title
Integrating Video and Text: A Balanced Approach to Multimodal Summary Generation and Evaluation
Galann Pennec
Zhengyuan Liu
Nicholas Asher
Philippe Muller
Nancy F. Chen
VGen
31
0
0
10 May 2025
Context-Aware Hierarchical Merging for Long Document Summarization
Litu Ou
Mirella Lapata
MoMe
195
1
0
03 Feb 2025
Learning to Explore and Select for Coverage-Conditioned Retrieval-Augmented Generation
Takyoung Kim
Kyungjae Lee
Y. Jang
Ji Yong Cho
Gangwoo Kim
Minseok Cho
Moontae Lee
151
0
0
28 Jan 2025
Verify with Caution: The Pitfalls of Relying on Imperfect Factuality Metrics
Ameya Godbole
Robin Jia
HILM
53
1
0
24 Jan 2025
From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization
Catarina G. Belem
Pouya Pezeskhpour
Hayate Iso
Seiji Maekawa
Nikita Bhutani
Estevam R. Hruschka
HILM
70
1
0
17 Oct 2024
How to Train Long-Context Language Models (Effectively)
Tianyu Gao
Alexander Wettig
Howard Yen
Danqi Chen
RALM
72
38
0
03 Oct 2024
CoverBench: A Challenging Benchmark for Complex Claim Verification
Alon Jacovi
Moran Ambar
Eyal Ben-David
Uri Shaham
Amir Feder
Mor Geva
Dror Marcus
Avi Caciularu
LMTD
49
3
0
06 Aug 2024
STORYSUMM: Evaluating Faithfulness in Story Summarization
Melanie Subbiah
Faisal Ladhak
Akankshya Mishra
Griffin Adams
Lydia B. Chilton
Kathleen McKeown
48
4
0
09 Jul 2024
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems
Philippe Laban
Alexander R. Fabbri
Caiming Xiong
Chien-Sheng Wu
RALM
51
41
0
01 Jul 2024
A Modular Approach for Multimodal Summarization of TV Shows
Louis Mahon
Mirella Lapata
26
9
0
06 Mar 2024
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
Yijia Shao
Yucheng Jiang
Theodore A. Kanell
Peter Xu
Omar Khattab
Monica S. Lam
LLMAG
KELM
44
35
0
22 Feb 2024
Rethinking Scientific Summarization Evaluation: Grounding Explainable Metrics on Facet-aware Benchmark
Xiuying Chen
Tairan Wang
Qingqing Zhu
Taicheng Guo
Shen Gao
Zhiyong Lu
Xin Gao
Xiangliang Zhang
80
2
0
22 Feb 2024
SportsMetrics: Blending Text and Numerical Data to Understand Information Fusion in LLMs
Yebowen Hu
Kaiqiang Song
Sangwoo Cho
Xiaoyang Wang
H. Foroosh
Dong Yu
Fei Liu
23
8
0
15 Feb 2024
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Zeqiu Wu
Yushi Hu
Weijia Shi
Nouha Dziri
Alane Suhr
Prithviraj Ammanabrolu
Noah A. Smith
Mari Ostendorf
Hannaneh Hajishirzi
ALM
30
304
0
02 Jun 2023
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
56
606
0
23 May 2023
APPLS: Evaluating Evaluation Metrics for Plain Language Summarization
Yue Guo
Tal August
Gondy Leroy
T. Cohen
Lucy Lu Wang
57
9
0
23 May 2023
SQuALITY: Building a Long-Document Summarization Dataset the Hard Way
Alex Jinpeng Wang
Richard Yuanzhe Pang
Angelica Chen
Jason Phang
Samuel R. Bowman
74
44
0
23 May 2022
Leveraging Information Bottleneck for Scientific Document Summarization
Jiaxin Ju
Ming Liu
Huan Yee Koh
Yuan Jin
Lan Du
Shirui Pan
59
13
0
04 Oct 2021
Finding a Balanced Degree of Automation for Summary Evaluation
Shiyue Zhang
Joey Tianyi Zhou
52
43
0
23 Sep 2021
The Perils of Using Mechanical Turk to Evaluate Open-Ended Text Generation
Marzena Karpinska
Nader Akoury
Mohit Iyyer
220
106
0
14 Sep 2021
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
282
2,015
0
28 Jul 2020
1