arXiv:2106.11520
BARTScore: Evaluating Generated Text as Text Generation
22 June 2021
Weizhe Yuan
Graham Neubig
Pengfei Liu
Papers citing "BARTScore: Evaluating Generated Text as Text Generation"
50 / 535 papers shown
Complex QA and language models hybrid architectures, Survey
Xavier Daull
P. Bellot
Emmanuel Bruno
Vincent Martin
Elisabeth Murisasco
ELM
28
15
0
17 Feb 2023
GPTScore: Evaluate as You Desire
Jinlan Fu
See-Kiong Ng
Zhengbao Jiang
Pengfei Liu
LM&MA
ALM
ELM
15
266
0
08 Feb 2023
Leveraging Summary Guidance on Medical Report Summarization
Yunqi Zhu
Xuebing Yang
Yuanyuan Wu
Wensheng Zhang
24
9
0
08 Feb 2023
Learning Translation Quality Evaluation on Low Resource Languages from Large Language Models
Amirkeivan Mohtashami
M. Verzetti
Paul Kishan Rubenstein
27
4
0
07 Feb 2023
Benchmarking Large Language Models for News Summarization
Tianyi Zhang
Faisal Ladhak
Esin Durmus
Percy Liang
Kathleen McKeown
Tatsunori B. Hashimoto
ELM
43
485
0
31 Jan 2023
LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization
Kalpesh Krishna
Erin Bransom
Bailey Kuehl
Mohit Iyyer
Pradeep Dasigi
Arman Cohan
Kyle Lo
22
90
0
30 Jan 2023
SWING: Balancing Coverage and Faithfulness for Dialogue Summarization
Kung-Hsiang Huang
Siffi Singh
Xiaofei Ma
Wei Xiao
Nicholas Dingwall
William Yang Wang
Kathleen McKeown
HILM
35
13
0
25 Jan 2023
The Next Chapter: A Study of Large Language Models in Storytelling
Zhuohan Xie
Trevor Cohn
Jey Han Lau
36
43
0
24 Jan 2023
Commentary Generation from Data Records of Multiplayer Strategy Esports Game
Zihan Wang
Naoki Yoshinaga
18
0
0
21 Dec 2022
Contrastive Error Attribution for Finetuned Language Models
Faisal Ladhak
Esin Durmus
Tatsunori Hashimoto
HILM
32
9
0
21 Dec 2022
Open Domain Multi-document Summarization: A Comprehensive Study of Model Brittleness under Retrieval
John Giorgi
Luca Soldaini
Bo Wang
Gary D. Bader
Kyle Lo
Lucy Lu Wang
Arman Cohan
13
17
0
20 Dec 2022
Transformers Go for the LOLs: Generating (Humourous) Titles from Scientific Abstracts End-to-End
Yanran Chen
Steffen Eger
26
16
0
20 Dec 2022
BMX: Boosting Natural Language Generation Metrics with Explainability
Christoph Leiter
Hoang-Quan Nguyen
Steffen Eger
ELM
24
0
0
20 Dec 2022
Geographic and Geopolitical Biases of Language Models
Fahim Faisal
Antonios Anastasopoulos
22
20
0
20 Dec 2022
Toward Human-Like Evaluation for Natural Language Generation with Error Analysis
Qingyu Lu
Liang Ding
Liping Xie
Kanjian Zhang
Derek F. Wong
Dacheng Tao
ELM
ALM
36
14
0
20 Dec 2022
WeCheck: Strong Factual Consistency Checker via Weakly Supervised Learning
Wenhao Wu
Wei Li
Xinyan Xiao
Jiachen Liu
Sujian Li
Yajuan Lv
HILM
28
4
0
20 Dec 2022
On the Blind Spots of Model-Based Evaluation Metrics for Text Generation
Tianxing He
Jingyu Zhang
Tianle Wang
Sachin Kumar
Kyunghyun Cho
James R. Glass
Yulia Tsvetkov
40
44
0
20 Dec 2022
BUMP: A Benchmark of Unfaithful Minimal Pairs for Meta-Evaluation of Faithfulness Metrics
Liang Ma
Shuyang Cao
Robert L. Logan IV
Di Lu
Shihao Ran
Kecheng Zhang
Joel R. Tetreault
A. Jaimes
17
6
0
20 Dec 2022
Improving Faithfulness of Abstractive Summarization by Controlling Confounding Effect of Irrelevant Sentences
Asish Ghoshal
Arash Einolghozati
A. Arun
Haoran Li
L. Yu
Vera Gor
Yashar Mehdad
Scott Yih
Asli Celikyilmaz
HILM
29
1
0
19 Dec 2022
Explanation Regeneration via Information Bottleneck
Qintong Li
Zhiyong Wu
Lingpeng Kong
Wei Bi
30
3
0
19 Dec 2022
SESCORE2: Learning Text Generation Evaluation via Synthesizing Realistic Mistakes
Wenda Xu
Xian Qian
Mingxuan Wang
Lei Li
William Yang Wang
23
10
0
19 Dec 2022
PromptBoosting: Black-Box Text Classification with Ten Forward Passes
Bairu Hou
J. O'Connor
Jacob Andreas
Shiyu Chang
Yang Zhang
VLM
19
44
0
19 Dec 2022
Rainproof: An Umbrella To Shield Text Generators From Out-Of-Distribution Data
Maxime Darrin
Pablo Piantanida
Pierre Colombo
OODD
53
13
0
18 Dec 2022
RISE: Leveraging Retrieval Techniques for Summarization Evaluation
David C. Uthus
Jianmo Ni
RALM
19
0
0
17 Dec 2022
Enhancing Multi-modal and Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation
Qian Yang
Qian Chen
Wen Wang
Baotian Hu
Min Zhang
37
24
0
16 Dec 2022
MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text Generation
Swarnadeep Saha
Xinyan Velocity Yu
Joey Tianyi Zhou
Ramakanth Pasunuru
Asli Celikyilmaz
ReLM
LRM
25
10
0
16 Dec 2022
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation
Yixin Liu
Alexander R. Fabbri
Pengfei Liu
Yilun Zhao
Linyong Nan
...
Simeng Han
Chenyu You
Chien-Sheng Wu
Caiming Xiong
Dragomir R. Radev
ALM
24
133
0
15 Dec 2022
ROSCOE: A Suite of Metrics for Scoring Step-by-Step Reasoning
O. Yu. Golovneva
Moya Chen
Spencer Poff
Martin Corredor
Luke Zettlemoyer
Maryam Fazel-Zarandi
Asli Celikyilmaz
ReLM
LRM
34
139
0
15 Dec 2022
T5Score: Discriminative Fine-tuning of Generative Evaluation Metrics
Yiwei Qin
Weizhe Yuan
Graham Neubig
Pengfei Liu
17
23
0
12 Dec 2022
MOPRD: A multidisciplinary open peer review dataset
Jialiang Lin
Jiaxin Song
Zhangping Zhou
Yidong Chen
X. Shi
31
12
0
09 Dec 2022
SpeechLMScore: Evaluating speech generation using speech language model
Soumi Maiti
Yifan Peng
Takaaki Saeki
Shinji Watanabe
ALM
26
30
0
08 Dec 2022
CoP: Factual Inconsistency Detection by Controlling the Preference
Shuaijie She
Xiang Geng
Shujian Huang
Jiajun Chen
27
4
0
03 Dec 2022
Credit Assignment for Trained Neural Networks Based on Koopman Operator Theory
Zhen Liang
Changyuan Zhao
Wanwei Liu
Bai Xue
Wenjing Yang
Zhengbin Pang
34
1
0
02 Dec 2022
BudgetLongformer: Can we Cheaply Pretrain a SotA Legal Language Model From Scratch?
Joel Niklaus
Daniele Giofré
33
11
0
30 Nov 2022
Arguments to Key Points Mapping with Prompt-based Learning
Ahnaf Mozib Samin
Behrooz Nikandish
Jingyan Chen
AAML
19
2
0
28 Nov 2022
AutoReply: Detecting Nonsense in Dialogue Introspectively with Discriminative Replies
Weiyan Shi
Emily Dinan
Adithya Renduchintala
Daniel Fried
Athul Paul Jacob
Zhou Yu
M. Lewis
AAML
28
2
0
22 Nov 2022
HaRiM+: Evaluating Summary Quality with Hallucination Risk
Seonil Son
Junsoo Park
J. Hwang
Junghwa Lee
Hyungjong Noh
Yeonsoo Lee
HILM
19
8
0
22 Nov 2022
Consecutive Question Generation via Dynamic Multitask Learning
Yun Li
Sujian Li
Xing Shi
LRM
24
2
0
16 Nov 2022
ED-FAITH: Evaluating Dialogue Summarization on Faithfulness
Sicong Huang
Asli Celikyilmaz
Haoran Li
HILM
36
4
0
15 Nov 2022
PromptCap: Prompt-Guided Task-Aware Image Captioning
Yushi Hu
Hang Hua
Zhengyuan Yang
Weijia Shi
Noah A. Smith
Jiebo Luo
51
101
0
15 Nov 2022
Follow the Wisdom of the Crowd: Effective Text Generation via Minimum Bayes Risk Decoding
Mirac Suzgun
Luke Melas-Kyriazi
Dan Jurafsky
30
43
0
14 Nov 2022
Large Language Models Are Human-Level Prompt Engineers
Yongchao Zhou
Andrei Ioan Muresanu
Ziwen Han
Keiran Paster
Silviu Pitis
Harris Chan
Jimmy Ba
ALM
LLMAG
21
834
0
03 Nov 2022
Revisiting Grammatical Error Correction Evaluation and Beyond
Peiyuan Gong
Xuebo Liu
Heyan Huang
Min Zhang
34
16
0
03 Nov 2022
RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question
Alireza Mohammadshahi
Thomas Scialom
Majid Yazdani
Pouya Yanki
Angela Fan
James Henderson
Marzieh Saeidi
31
20
0
02 Nov 2022
Questioning the Validity of Summarization Datasets and Improving Their Factual Consistency
Yanzhu Guo
Chloé Clavel
Moussa Kamal Eddine
Michalis Vazirgiannis
HILM
32
11
0
31 Oct 2022
How Far are We from Robust Long Abstractive Summarization?
Huan Yee Koh
Jiaxin Ju
He Zhang
Ming Liu
Shirui Pan
HILM
28
39
0
30 Oct 2022
He Said, She Said: Style Transfer for Shifting the Perspective of Dialogues
Amanda Bertsch
Graham Neubig
Matthew R. Gormley
48
5
0
27 Oct 2022
TRScore: A Novel GPT-based Readability Scorer for ASR Segmentation and Punctuation model evaluation and selection
Piyush Behre
S.S. Tan
A. Shah
Harini Kesavamoorthy
Shuangyu Chang
Fei Zuo
C. Basoglu
Sayan D. Pathak
21
0
0
27 Oct 2022
Is MultiWOZ a Solved Task? An Interactive TOD Evaluation Framework with User Simulator
Qinyuan Cheng
Linyang Li
Guofeng Quan
Feng Gao
Xiaofeng Mou
Xipeng Qiu
16
12
0
26 Oct 2022
DEMETR: Diagnosing Evaluation Metrics for Translation
Marzena Karpinska
N. Raj
Katherine Thai
Yixiao Song
Ankita Gupta
Mohit Iyyer
29
38
0
25 Oct 2022