ResearchTrend.AI
APPLS: Evaluating Evaluation Metrics for Plain Language Summarization

23 May 2023
Yue Guo
Tal August
Gondy Leroy
T. Cohen
Lucy Lu Wang

Papers citing "APPLS: Evaluating Evaluation Metrics for Plain Language Summarization"

42 / 42 papers shown
Are LLM-generated plain language summaries truly understandable? A large-scale crowdsourced evaluation
Yue Guo
Jae Ho Sohn
Gondy Leroy
Trevor Cohen
ELM
65
0
0
15 May 2025
Explainable AI for Clinical Outcome Prediction: A Survey of Clinician Perceptions and Preferences
Jun Hou
Lucy Lu Wang
91
0
0
27 Feb 2025
Generating Summaries with Controllable Readability Levels
Leonardo F. R. Ribeiro
Mohit Bansal
Markus Dreyer
119
19
0
16 Oct 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
299
11,894
0
18 Jul 2023
Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations
Lucy Lu Wang
Yulia Otmakhova
Jay DeYoung
Thinh Hung Truong
Bailey Kuehl
Erin Bransom
Byron C. Wallace
149
22
0
23 May 2023
Human-like Summarization Evaluation with ChatGPT
Mingqi Gao
Jie Ruan
Renliang Sun
Xunjian Yin
Shiping Yang
Xiaojun Wan
ALM
AI4MH
55
132
0
05 Apr 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.4K
14,359
0
15 Mar 2023
LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization
Kalpesh Krishna
Erin Bransom
Bailey Kuehl
Mohit Iyyer
Pradeep Dasigi
Arman Cohan
Kyle Lo
61
95
0
30 Jan 2023
On the Blind Spots of Model-Based Evaluation Metrics for Text Generation
Tianxing He
Jingyu Zhang
Tianle Wang
Sachin Kumar
Kyunghyun Cho
James R. Glass
Yulia Tsvetkov
100
44
0
20 Dec 2022
LENS: A Learnable Evaluation Metric for Text Simplification
Mounica Maddela
Yao Dou
David Heineman
Wei Xu
49
65
0
19 Dec 2022
A Survey on Medical Document Summarization
Raghav Jain
Anubhav Jangra
S. Saha
Adam Jatowt
3DGS
MedIm
74
19
0
03 Dec 2022
Retrieval augmentation of large language models for lay language generation
Yue Guo
Wei Qiu
Gondy Leroy
Sheng Wang
T. Cohen
RALM
LRM
69
45
0
07 Nov 2022
A Dataset for Plain Language Adaptation of Biomedical Abstracts
Kush Attal
Brian D. Ondov
Dina Demner-Fushman
60
24
0
21 Oct 2022
News Summarization and Evaluation in the Era of GPT-3
Tanya Goyal
Junyi Jessy Li
Greg Durrett
ELM
107
406
0
26 Sep 2022
Principled Paraphrase Generation with Parallel Corpora
Aitor Ormazabal
Mikel Artetxe
Aitor Soroa Etxabe
Gorka Labaka
Eneko Agirre
64
9
0
24 May 2022
Generating Scientific Claims for Zero-Shot Scientific Fact Checking
Dustin Wright
David Wadden
Kyle Lo
Bailey Kuehl
Arman Cohan
Isabelle Augenstein
Lucy Lu Wang
MedIm
100
57
0
24 Mar 2022
Improving Meta-learning for Low-resource Text Classification and Generation via Memory Imitation
Ying Zhao
Zhiliang Tian
Huaxiu Yao
Yinhe Zheng
Dongkyu Lee
Yiping Song
Jian Sun
N. Zhang
42
20
0
22 Mar 2022
Chart-to-Text: A Large-Scale Benchmark for Chart Summarization
Shankar Kanthara
Rixie Tiffany Ko Leong
Xiang Lin
Ahmed Masry
Megh Thakkar
Enamul Hoque
Shafiq Joty
78
147
0
12 Mar 2022
Perturbation CheckLists for Evaluating NLG Evaluation Metrics
Ananya B. Sai
Tanay Dixit
D. Y. Sheth
S. Mohan
Mitesh M. Khapra
AAML
138
58
0
13 Sep 2021
All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text
Elizabeth Clark
Tal August
Sofia Serrano
Nikita Haduong
Suchin Gururangan
Noah A. Smith
DeLMO
106
410
0
30 Jun 2021
DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts
Alisa Liu
Maarten Sap
Ximing Lu
Swabha Swayamdipta
Chandra Bhagavatula
Noah A. Smith
Yejin Choi
MU
107
372
0
07 May 2021
Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics
Artidoro Pagnoni
Vidhisha Balachandran
Yulia Tsvetkov
HILM
273
310
0
27 Apr 2021
Paragraph-level Simplification of Medical Texts
Ashwin Devaraj
Iain J. Marshall
Byron C. Wallace
Junyi Jessy Li
MedIm
53
92
0
12 Apr 2021
Automated Lay Language Summarization of Biomedical Scientific Reviews
Yue Guo
Weijian Qiu
Yizhong Wang
T. Cohen
69
78
0
23 Dec 2020
FFCI: A Framework for Interpretable Automatic Evaluation of Summarization
Fajri Koto
Timothy Baldwin
Jey Han Lau
HILM
75
37
0
27 Nov 2020
GO FIGURE: A Meta Evaluation of Factuality in Summarization
Saadia Gabriel
Asli Celikyilmaz
Rahul Jha
Yejin Choi
Jianfeng Gao
HILM
264
96
0
24 Oct 2020
Elaborative Simplification: Content Addition and Explanation Generation in Text Simplification
Neha Srikanth
Junyi Jessy Li
56
44
0
20 Oct 2020
Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary
Daniel Deutsch
Tania Bedrax-Weiss
Dan Roth
65
112
0
01 Oct 2020
A Survey of Evaluation Metrics Used for NLG Systems
Ananya B. Sai
Akash Kumar Mohankumar
Mitesh M. Khapra
ELM
87
236
0
27 Aug 2020
Generating (Factual?) Narrative Summaries of RCTs: Experiments with Neural Multi-Document Summarization
Byron C. Wallace
Sayantani Saha
Frank Soboczenski
Iain J. Marshall
HILM
48
78
0
25 Aug 2020
SummEval: Re-evaluating Summarization Evaluation
Alexander R. Fabbri
Wojciech Kryściński
Bryan McCann
Caiming Xiong
R. Socher
Dragomir R. Radev
HILM
97
713
0
24 Jul 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
795
42,055
0
28 May 2020
Expertise Style Transfer: A New Task Towards Better Communication between Experts and Laymen
Yixin Cao
Ruihao Shui
Liangming Pan
Min-Yen Kan
Zhiyuan Liu
Tat-Seng Chua
58
76
0
02 May 2020
Don't Stop Pretraining: Adapt Language Models to Domains and Tasks
Suchin Gururangan
Ana Marasović
Swabha Swayamdipta
Kyle Lo
Iz Beltagy
Doug Downey
Noah A. Smith
VLM
AI4CE
CLL
152
2,428
0
23 Apr 2020
Assessing the Benchmarking Capacity of Machine Reading Comprehension Datasets
Saku Sugawara
Pontus Stenetorp
Kentaro Inui
Akiko Aizawa
42
86
0
21 Nov 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
439
20,181
0
23 Oct 2019
EASSE: Easier Automatic Sentence Simplification Evaluation
Fernando Alva-Manchego
Louis Martin
Carolina Scarton
Lucia Specia
58
130
0
13 Aug 2019
HighRES: Highlight-based Reference-less Evaluation of Summarization
Hardy Hardy
Shashi Narayan
Andreas Vlachos
52
62
0
04 Jun 2019
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
324
5,814
0
21 Apr 2019
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
R. Thomas McCoy
Ellie Pavlick
Tal Linzen
131
1,239
0
04 Feb 2019
Domain Agnostic Real-Valued Specificity Prediction
Wei-Jen Ko
Greg Durrett
Junyi Jessy Li
45
31
0
13 Nov 2018
BLEU is Not Suitable for the Evaluation of Text Simplification
Elior Sulem
Omri Abend
A. Rappoport
53
193
0
14 Oct 2018