arXiv:2305.14341 (v4, latest)
APPLS: Evaluating Evaluation Metrics for Plain Language Summarization
23 May 2023
Yue Guo
Tal August
Gondy Leroy
T. Cohen
Lucy Lu Wang
Papers citing
"APPLS: Evaluating Evaluation Metrics for Plain Language Summarization"
42 / 42 papers shown
Are LLM-generated plain language summaries truly understandable? A large-scale crowdsourced evaluation
Yue Guo
Jae Ho Sohn
Gondy Leroy
Trevor Cohen
ELM
65
0
0
15 May 2025
Explainable AI for Clinical Outcome Prediction: A Survey of Clinician Perceptions and Preferences
Jun Hou
Lucy Lu Wang
91
0
0
27 Feb 2025
Generating Summaries with Controllable Readability Levels
Leonardo F. R. Ribeiro
Mohit Bansal
Markus Dreyer
119
19
0
16 Oct 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
299
11,894
0
18 Jul 2023
Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations
Lucy Lu Wang
Yulia Otmakhova
Jay DeYoung
Thinh Hung Truong
Bailey Kuehl
Erin Bransom
Byron C. Wallace
149
22
0
23 May 2023
Human-like Summarization Evaluation with ChatGPT
Mingqi Gao
Jie Ruan
Renliang Sun
Xunjian Yin
Shiping Yang
Xiaojun Wan
ALM
AI4MH
55
132
0
05 Apr 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.4K
14,359
0
15 Mar 2023
LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization
Kalpesh Krishna
Erin Bransom
Bailey Kuehl
Mohit Iyyer
Pradeep Dasigi
Arman Cohan
Kyle Lo
61
95
0
30 Jan 2023
On the Blind Spots of Model-Based Evaluation Metrics for Text Generation
Tianxing He
Jingyu Zhang
Tianle Wang
Sachin Kumar
Kyunghyun Cho
James R. Glass
Yulia Tsvetkov
103
44
0
20 Dec 2022
LENS: A Learnable Evaluation Metric for Text Simplification
Mounica Maddela
Yao Dou
David Heineman
Wei Xu
49
65
0
19 Dec 2022
A Survey on Medical Document Summarization
Raghav Jain
Anubhav Jangra
S. Saha
Adam Jatowt
3DGS
MedIm
74
19
0
03 Dec 2022
Retrieval augmentation of large language models for lay language generation
Yue Guo
Wei Qiu
Gondy Leroy
Sheng Wang
T. Cohen
RALM
LRM
69
45
0
07 Nov 2022
A Dataset for Plain Language Adaptation of Biomedical Abstracts
Kush Attal
Brian D. Ondov
Dina Demner-Fushman
60
24
0
21 Oct 2022
News Summarization and Evaluation in the Era of GPT-3
Tanya Goyal
Junyi Jessy Li
Greg Durrett
ELM
110
406
0
26 Sep 2022
Principled Paraphrase Generation with Parallel Corpora
Aitor Ormazabal
Mikel Artetxe
Aitor Soroa Etxabe
Gorka Labaka
Eneko Agirre
64
9
0
24 May 2022
Generating Scientific Claims for Zero-Shot Scientific Fact Checking
Dustin Wright
David Wadden
Kyle Lo
Bailey Kuehl
Arman Cohan
Isabelle Augenstein
Lucy Lu Wang
MedIm
100
57
0
24 Mar 2022
Improving Meta-learning for Low-resource Text Classification and Generation via Memory Imitation
Ying Zhao
Zhiliang Tian
Huaxiu Yao
Yinhe Zheng
Dongkyu Lee
Yiping Song
Jian Sun
N. Zhang
42
20
0
22 Mar 2022
Chart-to-Text: A Large-Scale Benchmark for Chart Summarization
Shankar Kanthara
Rixie Tiffany Ko Leong
Xiang Lin
Ahmed Masry
Megh Thakkar
Enamul Hoque
Shafiq Joty
78
147
0
12 Mar 2022
Perturbation CheckLists for Evaluating NLG Evaluation Metrics
Ananya B. Sai
Tanay Dixit
D. Y. Sheth
S. Mohan
Mitesh M. Khapra
AAML
138
58
0
13 Sep 2021
All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text
Elizabeth Clark
Tal August
Sofia Serrano
Nikita Haduong
Suchin Gururangan
Noah A. Smith
DeLMO
109
410
0
30 Jun 2021
DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts
Alisa Liu
Maarten Sap
Ximing Lu
Swabha Swayamdipta
Chandra Bhagavatula
Noah A. Smith
Yejin Choi
MU
107
372
0
07 May 2021
Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics
Artidoro Pagnoni
Vidhisha Balachandran
Yulia Tsvetkov
HILM
273
310
0
27 Apr 2021
Paragraph-level Simplification of Medical Texts
Ashwin Devaraj
Iain J. Marshall
Byron C. Wallace
Junjie Li
MedIm
53
92
0
12 Apr 2021
Automated Lay Language Summarization of Biomedical Scientific Reviews
Yue Guo
Weijian Qiu
Yizhong Wang
T. Cohen
69
78
0
23 Dec 2020
FFCI: A Framework for Interpretable Automatic Evaluation of Summarization
Fajri Koto
Timothy Baldwin
Jey Han Lau
HILM
75
37
0
27 Nov 2020
GO FIGURE: A Meta Evaluation of Factuality in Summarization
Saadia Gabriel
Asli Celikyilmaz
Rahul Jha
Yejin Choi
Jianfeng Gao
HILM
269
96
0
24 Oct 2020
Elaborative Simplification: Content Addition and Explanation Generation in Text Simplification
Neha Srikanth
Junyi Jessy Li
56
44
0
20 Oct 2020
Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary
Daniel Deutsch
Tania Bedrax-Weiss
Dan Roth
65
112
0
01 Oct 2020
A Survey of Evaluation Metrics Used for NLG Systems
Ananya B. Sai
Akash Kumar Mohankumar
Mitesh M. Khapra
ELM
87
236
0
27 Aug 2020
Generating (Factual?) Narrative Summaries of RCTs: Experiments with Neural Multi-Document Summarization
Byron C. Wallace
Sayantani Saha
Frank Soboczenski
Iain J. Marshall
HILM
48
78
0
25 Aug 2020
SummEval: Re-evaluating Summarization Evaluation
Alexander R. Fabbri
Wojciech Kryściński
Bryan McCann
Caiming Xiong
R. Socher
Dragomir R. Radev
HILM
97
713
0
24 Jul 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
795
42,055
0
28 May 2020
Expertise Style Transfer: A New Task Towards Better Communication between Experts and Laymen
Yixin Cao
Ruihao Shui
Liangming Pan
Min-Yen Kan
Zhiyuan Liu
Tat-Seng Chua
58
76
0
02 May 2020
Don't Stop Pretraining: Adapt Language Models to Domains and Tasks
Suchin Gururangan
Ana Marasović
Swabha Swayamdipta
Kyle Lo
Iz Beltagy
Doug Downey
Noah A. Smith
VLM
AI4CE
CLL
152
2,428
0
23 Apr 2020
Assessing the Benchmarking Capacity of Machine Reading Comprehension Datasets
Saku Sugawara
Pontus Stenetorp
Kentaro Inui
Akiko Aizawa
45
86
0
21 Nov 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
439
20,181
0
23 Oct 2019
EASSE: Easier Automatic Sentence Simplification Evaluation
Fernando Alva-Manchego
Louis Martin
Carolina Scarton
Lucia Specia
58
130
0
13 Aug 2019
HighRES: Highlight-based Reference-less Evaluation of Summarization
Hardy Hardy
Shashi Narayan
Andreas Vlachos
52
62
0
04 Jun 2019
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
326
5,814
0
21 Apr 2019
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
R. Thomas McCoy
Ellie Pavlick
Tal Linzen
131
1,239
0
04 Feb 2019
Domain Agnostic Real-Valued Specificity Prediction
Wei-Jen Ko
Greg Durrett
Junyi Jessy Li
45
31
0
13 Nov 2018
BLEU is Not Suitable for the Evaluation of Text Simplification
Elior Sulem
Omri Abend
A. Rappoport
53
193
0
14 Oct 2018