2305.13693
Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations
23 May 2023
Lucy Lu Wang
Yulia Otmakhova
Jay DeYoung
Thinh Hung Truong
Bailey Kuehl
Erin Bransom
Byron C. Wallace
Papers citing
"Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations"
44 / 44 papers shown
Title
Standardizing the Measurement of Text Diversity: A Tool and a Comparative Analysis of Scores
Chantal Shaib
Joe Barrow
Jiuding Sun
Alexa F. Siu
Byron C. Wallace
A. Nenkova
110
37
0
01 Mar 2024
Rethinking Scientific Summarization Evaluation: Grounding Explainable Metrics on Facet-aware Benchmark
Preslav Nakov
Tairan Wang
Qingqing Zhu
Taicheng Guo
Shen Gao
Zhiyong Lu
Xin Gao
Xiangliang Zhang
152
4
0
22 Feb 2024
LitSumm: Large language models for literature summarisation of non-coding RNAs
Andrew Green
C. Ribas
Nancy Ontiveros-Palacios
Sam Griffiths-Jones
Anton I. Petrov
Alex Bateman
Blake Sweeney
67
4
0
06 Nov 2023
APPLS: Evaluating Evaluation Metrics for Plain Language Summarization
Yue Guo
Tal August
Gondy Leroy
T. Cohen
Lucy Lu Wang
84
9
0
23 May 2023
Summarizing, Simplifying, and Synthesizing Medical Evidence Using GPT-3 (with Varying Success)
Chantal Shaib
Millicent Li
Sebastian Antony Joseph
Iain J. Marshall
Junyi Jessy Li
Byron C. Wallace
LM&MA
ELM
53
67
0
10 May 2023
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation
Yixin Liu
Alexander R. Fabbri
Pengfei Liu
Yilun Zhao
Linyong Nan
...
Simeng Han
Shafiq Joty
Chien-Sheng Wu
Caiming Xiong
Dragomir R. Radev
ALM
59
133
0
15 Dec 2022
SciFact-Open: Towards open-domain scientific claim verification
David Wadden
Kyle Lo
Bailey Kuehl
Arman Cohan
Iz Beltagy
Lucy Lu Wang
Hannaneh Hajishirzi
LRM
64
61
0
25 Oct 2022
How "Multi" is Multi-Document Summarization?
Ruben Wolhandler
Arie Cattan
Ori Ernst
Ido Dagan
99
12
0
23 Oct 2022
Self-Repetition in Abstractive Neural Summarizers
Nikita Salkar
T. Trikalinos
Byron C. Wallace
A. Nenkova
43
12
0
14 Oct 2022
News Summarization and Evaluation in the Era of GPT-3
Tanya Goyal
Junyi Jessy Li
Greg Durrett
ELM
88
401
0
26 Sep 2022
LED down the rabbit hole: exploring the potential of global attention for biomedical multi-document summarisation
Yulia Otmakhova
Thinh Hung Truong
Timothy Baldwin
Trevor Cohn
Karin Verspoor
Jey Han Lau
100
6
0
19 Sep 2022
Multi-LexSum: Real-World Summaries of Civil Rights Lawsuits at Multiple Granularities
Zejiang Shen
Kyle Lo
L. Yu
N. Dahlberg
Margo Schlanger
Doug Downey
ELM
AILaw
59
47
0
22 Jun 2022
Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics
Daniel Deutsch
Rotem Dror
Dan Roth
35
45
0
21 Apr 2022
LinkBERT: Pretraining Language Models with Document Links
Michihiro Yasunaga
J. Leskovec
Percy Liang
KELM
82
359
0
29 Mar 2022
PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization
Wen Xiao
Iz Beltagy
Giuseppe Carenini
Arman Cohan
CVBM
107
117
0
16 Oct 2021
BARTScore: Evaluating Generated Text as Text Generation
Weizhe Yuan
Graham Neubig
Pengfei Liu
95
841
0
22 Jun 2021
MS2: Multi-Document Summarization of Medical Studies
Jay DeYoung
Iz Beltagy
Madeleine van Zuylen
Bailey Kuehl
Lucy Lu Wang
61
112
0
13 Apr 2021
Multi-XScience: A Large-scale Dataset for Extreme Multi-document Summarization of Scientific Articles
Yao Lu
Yue Dong
Laurent Charlin
AILaw
60
120
0
27 Oct 2020
Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary
Daniel Deutsch
Tania Bedrax-Weiss
Dan Roth
56
112
0
01 Oct 2020
Generating (Factual?) Narrative Summaries of RCTs: Experiments with Neural Multi-Document Summarization
Byron C. Wallace
Sayantani Saha
Frank Soboczenski
Iain J. Marshall
HILM
48
78
0
25 Aug 2020
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
502
2,074
0
28 Jul 2020
SummEval: Re-evaluating Summarization Evaluation
Alexander R. Fabbri
Wojciech Kryściński
Bryan McCann
Caiming Xiong
R. Socher
Dragomir R. Radev
HILM
90
710
0
24 Jul 2020
SacreROUGE: An Open-Source Library for Using and Developing Summarization Evaluation Metrics
Daniel Deutsch
Dan Roth
71
25
0
10 Jul 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
680
41,736
0
28 May 2020
A Large-Scale Multi-Document Summarization Dataset from the Wikipedia Current Events Portal
D. Ghalandari
Chris Hokamp
N. Pham
John Glover
Georgiana Ifrim
40
109
0
20 May 2020
Evidence Inference 2.0: More Data, Better Models
Jay DeYoung
Eric P. Lehman
Benjamin E. Nye
Iain J. Marshall
Byron C. Wallace
89
68
0
08 May 2020
FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization
Esin Durmus
He He
Mona T. Diab
HILM
83
392
0
07 May 2020
SUPERT: Towards New Frontiers in Unsupervised Evaluation Metrics for Multi-Document Summarization
Yang Gao
Wei Zhao
Steffen Eger
ELM
68
125
0
07 May 2020
Fact or Fiction: Verifying Scientific Claims
David Wadden
Shanchuan Lin
Kyle Lo
Lucy Lu Wang
Madeleine van Zuylen
Arman Cohan
Hannaneh Hajishirzi
HAI
113
450
0
30 Apr 2020
Longformer: The Long-Document Transformer
Iz Beltagy
Matthew E. Peters
Arman Cohan
RALM
VLM
131
4,048
0
10 Apr 2020
BLEURT: Learning Robust Metrics for Text Generation
Thibault Sellam
Dipanjan Das
Ankur P. Parikh
81
1,489
0
09 Apr 2020
Asking and Answering Questions to Evaluate the Factual Consistency of Summaries
Alex Wang
Kyunghyun Cho
M. Lewis
HILM
77
480
0
08 Apr 2020
Sparse Text Generation
Pedro Henrique Martins
Zita Marinho
André F. T. Martins
MoE
39
39
0
06 Apr 2020
Fill in the BLANC: Human-free quality estimation of document summaries
Oleg V. Vasilyev
Vedant Dharnidharka
John Bohannon
3DH
75
117
0
23 Feb 2020
Optimizing the Factual Correctness of a Summary: A Study of Summarizing Radiology Reports
Yuhao Zhang
Derek Merck
E. Tsai
Christopher D. Manning
C. Langlotz
MedIm
HILM
54
185
0
06 Nov 2019
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
M. Lewis
Yinhan Liu
Naman Goyal
Marjan Ghazvininejad
Abdel-rahman Mohamed
Omer Levy
Veselin Stoyanov
Luke Zettlemoyer
AIMat
VLM
219
10,792
0
29 Oct 2019
Evaluating the Factual Consistency of Abstractive Text Summarization
Wojciech Kryściński
Bryan McCann
Caiming Xiong
R. Socher
HILM
101
742
0
28 Oct 2019
MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
Wei Zhao
Maxime Peyrard
Fei Liu
Yang Gao
Christian M. Meyer
Steffen Eger
159
595
0
05 Sep 2019
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers
Iryna Gurevych
1.1K
12,129
0
27 Aug 2019
Multi-News: a Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model
Alexander R. Fabbri
Irene Li
Tianwei She
Suyi Li
Dragomir R. Radev
74
583
0
04 Jun 2019
Hierarchical Transformers for Multi-Document Summarization
Yang Liu
Mirella Lapata
108
297
0
30 May 2019
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
283
5,764
0
21 Apr 2019
Inferring Which Medical Treatments Work from Reports of Clinical Trials
Eric P. Lehman
Jay DeYoung
Regina Barzilay
Byron C. Wallace
77
116
0
02 Apr 2019
A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature
Benjamin E. Nye
Junyi Jessy Li
Roma Patel
Yinfei Yang
Iain J. Marshall
A. Nenkova
Byron C. Wallace
49
220
0
11 Jun 2018