Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2212.07981
Cited By
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation
15 December 2022
Yixin Liu
Alexander R. Fabbri
Pengfei Liu
Yilun Zhao
Linyong Nan
Ruilin Han
Simeng Han
Shafiq R. Joty
Chien-Sheng Wu
Caiming Xiong
Dragomir R. Radev
ALM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation"
50 / 118 papers shown
Title
WebCiteS: Attributed Query-Focused Summarization on Chinese Web Search Results with Citations
Haolin Deng
Chang Wang
Xin Li
Dezhang Yuan
Junlang Zhan
Tianhua Zhou
Jin Ma
Jun Gao
Ruifeng Xu
HILM
60
2
0
04 Mar 2024
Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries
Zelalem Gero
Chandan Singh
Yiqing Xie
Sheng Zhang
Tristan Naumann
Jianfeng Gao
Hoifung Poon
ELM
ALM
33
4
0
01 Mar 2024
How Much Annotation is Needed to Compare Summarization Models?
Chantal Shaib
Joe Barrow
Alexa F. Siu
Byron C. Wallace
A. Nenkova
53
2
0
28 Feb 2024
Rethinking Scientific Summarization Evaluation: Grounding Explainable Metrics on Facet-aware Benchmark
Xiuying Chen
Tairan Wang
Qingqing Zhu
Taicheng Guo
Shen Gao
Zhiyong Lu
Xin Gao
Xiangliang Zhang
80
2
0
22 Feb 2024
KnowTuning: Knowledge-aware Fine-tuning for Large Language Models
Yougang Lyu
Lingyong Yan
Shuaiqiang Wang
Haibo Shi
Dawei Yin
Pengjie Ren
Zhumin Chen
Maarten de Rijke
Zhaochun Ren
21
5
0
17 Feb 2024
Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation
Xiaoying Zhang
Baolin Peng
Ye Tian
Jingyan Zhou
Lifeng Jin
Linfeng Song
Haitao Mi
Helen Meng
HILM
42
43
0
14 Feb 2024
Calibrating Long-form Generations from Large Language Models
Yukun Huang
Yixin Liu
Raghuveer Thirukovalluru
Arman Cohan
Bhuwan Dhingra
19
7
0
09 Feb 2024
GUMsley: Evaluating Entity Salience in Summarization for 12 English Genres
Jessica Lin
Amir Zeldes
33
4
0
31 Jan 2024
InfoLossQA: Characterizing and Recovering Information Loss in Text Simplification
Jan Trienes
Sebastian Antony Joseph
Jorg Schlotterer
Christin Seifert
Kyle Lo
Wei Xu
Byron C. Wallace
Junyi Jessy Li
44
6
0
29 Jan 2024
LLMs as Narcissistic Evaluators: When Ego Inflates Evaluation Scores
Yiqi Liu
N. Moosavi
Chenghua Lin
ELM
27
48
0
16 Nov 2023
P^3SUM: Preserving Author's Perspective in News Summarization with Diffusion Language Models
Yuhan Liu
Shangbin Feng
Xiaochuang Han
Vidhisha Balachandran
Chan Young Park
Sachin Kumar
Yulia Tsvetkov
DiffM
38
2
0
16 Nov 2023
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization
Yixin Liu
Alexander R. Fabbri
Jiawen Chen
Yilun Zhao
Simeng Han
Shafiq R. Joty
Pengfei Liu
Dragomir R. Radev
Chien-Sheng Wu
Arman Cohan
ELM
46
57
0
15 Nov 2023
How Well Do Large Language Models Truly Ground?
Hyunji Lee
Se June Joo
Chaeeun Kim
Joel Jang
Doyoung Kim
Kyoung-Woon On
Minjoon Seo
HILM
30
6
0
15 Nov 2023
Factcheck-Bench: Fine-Grained Evaluation Benchmark for Automatic Fact-checkers
Yuxia Wang
Revanth Gangi Reddy
Zain Muhammad Mujahid
Arnav Arora
Aleksandr Rubashevskii
...
Nadav Borenstein
Aditya Pillai
Isabelle Augenstein
Iryna Gurevych
Preslav Nakov
HILM
34
33
0
15 Nov 2023
Fair Abstractive Summarization of Diverse Perspectives
Yusen Zhang
Nan Zhang
Yixin Liu
Alexander R. Fabbri
Junru Liu
...
Caiming Xiong
Jieyu Zhao
Dragomir R. Radev
Kathleen McKeown
Rui Zhang
28
8
0
14 Nov 2023
Automated Annotation of Scientific Texts for ML-based Keyphrase Extraction and Validation
O. Amusat
Harshad B. Hegde
Christopher J. Mungall
Anna Giannakou
Neil Byers
Dan Gunter
Kjiersten Fagnan
Lavanya Ramakrishnan
16
2
0
08 Nov 2023
Evaluating Generative Ad Hoc Information Retrieval
Lukas Gienapp
Harrisen Scells
Niklas Deckers
Janek Bevendorff
Shuai Wang
...
Maik Frobe
Guide Zucoon
Benno Stein
Matthias Hagen
Martin Potthast
RALM
37
11
0
08 Nov 2023
FaMeSumm: Investigating and Improving Faithfulness of Medical Summarization
Nan Zhang
Yusen Zhang
Wu Guo
P. Mitra
Rui Zhang
HILM
35
4
0
03 Nov 2023
Hint-enhanced In-Context Learning wakes Large Language Models up for knowledge-intensive tasks
Yifan Wang
Qingyan Guo
Xinzhe Ni
Chufan Shi
Lemao Liu
Haiyun Jiang
Yujiu Yang
ReLM
RALM
25
8
0
03 Nov 2023
The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics
Christoph Leiter
Juri Opitz
Daniel Deutsch
Yang Gao
Rotem Dror
Steffen Eger
ALM
LRM
ELM
37
31
0
30 Oct 2023
TarGEN: Targeted Data Generation with Large Language Models
Himanshu Gupta
Kevin Scaria
Ujjwala Anantheswaran
Shreyas Verma
Mihir Parmar
Saurabh Arjun Sawant
Chitta Baral
Swaroop Mishra
SyDa
30
4
0
27 Oct 2023
On Context Utilization in Summarization with Large Language Models
Mathieu Ravaut
Aixin Sun
Nancy F. Chen
Shafiq R. Joty
31
13
0
16 Oct 2023
Towards Better Evaluation of Instruction-Following: A Case-Study in Summarization
Ondrej Skopek
Rahul Aralikatte
Sian Gooding
Victor Carbune
ELM
39
18
0
12 Oct 2023
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity
Cunxiang Wang
Xiaoze Liu
Yuanhao Yue
Xiangru Tang
Tianhang Zhang
...
Linyi Yang
Jindong Wang
Xing Xie
Zheng-Wei Zhang
Yue Zhang
HILM
KELM
51
184
0
11 Oct 2023
Hierarchical Evaluation Framework: Best Practices for Human Evaluation
I. Bojić
Jessica Chen
Si Yuan Chang
Qi Chwen Ong
Shafiq R. Joty
Josip Car
35
5
0
03 Oct 2023
AutoCast++: Enhancing World Event Prediction with Zero-shot Ranking-based Context Retrieval
Qi Yan
Raihan Seraj
Jiawei He
Li Meng
Tristan Sylvain
14
9
0
03 Oct 2023
BooookScore: A systematic exploration of book-length summarization in the era of LLMs
Yapei Chang
Kyle Lo
Tanya Goyal
Mohit Iyyer
ALM
21
106
0
01 Oct 2023
Human Feedback is not Gold Standard
Tom Hosking
Phil Blunsom
Max Bartolo
ALM
24
49
0
28 Sep 2023
Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles
Kung-Hsiang Huang
Philippe Laban
Alexander R. Fabbri
Prafulla Kumar Choubey
Shafiq R. Joty
Caiming Xiong
Chien-Sheng Wu
16
25
0
17 Sep 2023
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
Griffin Adams
Alexander R. Fabbri
Faisal Ladhak
Eric Lehman
Noémie Elhadad
29
52
0
08 Sep 2023
Redundancy Aware Multi-Reference Based Gainwise Evaluation of Extractive Summarization
Mousumi Akter
Shubhra (Santu) Karmaker
21
1
0
04 Aug 2023
FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios
Ethan Chern
Steffi Chern
Shiqi Chen
Weizhe Yuan
Kehua Feng
Chunting Zhou
Junxian He
Graham Neubig
Pengfei Liu
HILM
19
190
0
25 Jul 2023
L-Eval: Instituting Standardized Evaluation for Long Context Language Models
Chen An
Shansan Gong
Ming Zhong
Xingjian Zhao
Mukai Li
Jun Zhang
Lingpeng Kong
Xipeng Qiu
ELM
ALM
40
132
0
20 Jul 2023
Revisiting Cross-Lingual Summarization: A Corpus-based Study and A New Benchmark with Improved Annotation
Yulong Chen
Huajian Zhang
Yijie Zhou
Xuefeng Bai
Yueguan Wang
...
Jianhao Yan
Yafu Li
Judy Li
Xianchao Zhu
Yue Zhang
37
6
0
08 Jul 2023
GUMSum: Multi-Genre Data and Evaluation for English Abstractive Summarization
Yang Liu
Amir Zeldes
ELM
21
2
0
20 Jun 2023
A Critical Evaluation of Evaluations for Long-form Question Answering
Fangyuan Xu
Yixiao Song
Mohit Iyyer
Eunsol Choi
ELM
37
96
0
29 May 2023
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets
Md Tahmid Rahman Laskar
M Saiful Bari
Mizanur Rahman
Md Amran Hossen Bhuiyan
Shafiq R. Joty
J. Huang
LM&MA
ELM
ALM
41
179
0
29 May 2023
Generating EDU Extracts for Plan-Guided Summary Re-Ranking
Griffin Adams
Alexander R. Fabbri
Faisal Ladhak
Kathleen McKeown
Noémie Elhadad
18
10
0
28 May 2023
Evaluating Evaluation Metrics: A Framework for Analyzing NLG Evaluation Metrics using Measurement Theory
Ziang Xiao
Susu Zhang
Vivian Lai
Q. V. Liao
ELM
30
24
0
24 May 2023
UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning
Ahmed Masry
P. Kavehzadeh
Do Xuan Long
Enamul Hoque
Shafiq R. Joty
LRM
27
100
0
24 May 2023
DecipherPref: Analyzing Influential Factors in Human Preference Judgments via GPT-4
Ye Hu
Kaiqiang Song
Sangwoo Cho
Xiaoyang Wang
H. Foroosh
Fei Liu
21
11
0
24 May 2023
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
56
601
0
23 May 2023
On Learning to Summarize with Large Language Models as References
Yixin Liu
Kejian Shi
Katherine S He
Longtian Ye
Alexander R. Fabbri
Pengfei Liu
Dragomir R. Radev
Arman Cohan
ELM
28
70
0
23 May 2023
Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations
Lucy Lu Wang
Yulia Otmakhova
Jay DeYoung
Thinh Hung Truong
Bailey Kuehl
Erin Bransom
Byron C. Wallace
107
20
0
23 May 2023
Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method
Yiming Wang
Zhuosheng Zhang
Rui Wang
33
78
0
22 May 2023
SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation
Elizabeth Clark
Shruti Rijhwani
Sebastian Gehrmann
Joshua Maynez
Roee Aharoni
Vitaly Nikolaev
Thibault Sellam
Aditya Siddhant
Dipanjan Das
Ankur P. Parikh
24
38
0
22 May 2023
Complex Claim Verification with Evidence Retrieved in the Wild
Jifan Chen
Grace Kim
Aniruddh Sriram
Greg Durrett
Eunsol Choi
HILM
22
68
0
19 May 2023
FactKB: Generalizable Factuality Evaluation using Language Models Enhanced with Factual Knowledge
Shangbin Feng
Vidhisha Balachandran
Yuyang Bai
Yulia Tsvetkov
KELM
HILM
21
51
0
14 May 2023
The Current State of Summarization
Fabian Retkowski
23
6
0
08 May 2023
Towards Interpretable and Efficient Automatic Reference-Based Summarization Evaluation
Yixin Liu
Alexander R. Fabbri
Yilun Zhao
Pengfei Liu
Shafiq R. Joty
Chien-Sheng Wu
Caiming Xiong
Dragomir R. Radev
15
27
0
07 Mar 2023
Previous
1
2
3
Next