1904.09675
BERTScore: Evaluating Text Generation with BERT
21 April 2019
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
Papers citing "BERTScore: Evaluating Text Generation with BERT"
50 / 3,522 papers shown
Title
A Question Answering Framework for Decontextualizing User-facing Snippets from Scientific Documents
Benjamin Newman
Luca Soldaini
Raymond Fok
Arman Cohan
Kyle Lo
RALM
55
18
0
24 May 2023
Psychological Metrics for Dialog System Evaluation
Salvatore Giorgi
Shreya Havaldar
Farhan S. Ahmed
Zuhaib Akhtar
Shalaka Vaidya
Gary Pan
Pallavi V. Kulkarni
H. Andrew Schwartz
Joao Sedoc
94
2
0
24 May 2023
Don't Take This Out of Context! On the Need for Contextual Models and Evaluations for Stylistic Rewriting
Akhila Yerukola
Xuhui Zhou
Elizabeth Clark
Maarten Sap
71
7
0
24 May 2023
Gender Biases in Automatic Evaluation Metrics for Image Captioning
Haoyi Qiu
Zi-Yi Dou
Tianlu Wang
Asli Celikyilmaz
Nanyun Peng
EGVM
119
16
0
24 May 2023
DecipherPref: Analyzing Influential Factors in Human Preference Judgments via GPT-4
Ye Hu
Kaiqiang Song
Sangwoo Cho
Xiaoyang Wang
H. Foroosh
Fei Liu
99
13
0
24 May 2023
Evaluate What You Can't Evaluate: Unassessable Quality for Generated Response
Yongkang Liu
Shi Feng
Daling Wang
Yifei Zhang
Hinrich Schütze
ALM
ELM
93
1
0
24 May 2023
Scientific Opinion Summarization: Paper Meta-review Generation Dataset, Methods, and Evaluation
Qi Zeng
Mankeerat Sidhu
Ansel Blume
Hou Pong Chan
Lu Wang
Heng Ji
91
11
0
24 May 2023
COMET-M: Reasoning about Multiple Events in Complex Sentences
Sahithya Ravi
R. Ng
Vered Shwartz
LRM
ReLM
72
3
0
24 May 2023
OpenPI2.0: An Improved Dataset for Entity Tracking in Texts
Li Zhang
Hainiu Xu
Abhinav Kommula
Chris Callison-Burch
Niket Tandon
65
7
0
24 May 2023
Unraveling ChatGPT: A Critical Analysis of AI-Generated Goal-Oriented Dialogues and Annotations
Tiziano Labruna
Sofia Brenna
Andrea Zaninello
Bernardo Magnini
50
15
0
23 May 2023
MathDial: A Dialogue Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems
Jakub Macina
Nico Daheim
Sankalan Pal Chowdhury
Tanmay Sinha
Manu Kapur
Iryna Gurevych
Mrinmaya Sachan
LRM
129
68
0
23 May 2023
How to Choose How to Choose Your Chatbot: A Massively Multi-System MultiReference Data Set for Dialog Metric Evaluation
Huda Khayrallah
Zuhaib Akhtar
Edward Cohen
João Sedoc
61
2
0
23 May 2023
Sociocultural Norm Similarities and Differences via Situational Alignment and Explainable Textual Entailment
Sky CH-Wang
Arkadiy Saakyan
Aochong Li
Zhou Yu
Smaranda Muresan
110
17
0
23 May 2023
Language Model Self-improvement by Reinforcement Learning Contemplation
Jing-Cheng Pang
Pengyuan Wang
Kaiyuan Li
Xiong-Hui Chen
Jiacheng Xu
Zongzhang Zhang
Yang Yu
LRM
KELM
64
52
0
23 May 2023
Advancing Precise Outline-Conditioned Text Generation with Task Duality and Explicit Outline Control
Yunzhe Li
Qian Chen
Weixiang Yan
Wen Wang
Qinglin Zhang
Hari Sundaram
77
3
0
23 May 2023
Dancing Between Success and Failure: Edit-level Simplification Evaluation using SALSA
David Heineman
Yao Dou
Mounica Maddela
Wei Xu
102
17
0
23 May 2023
Schema-Driven Information Extraction from Heterogeneous Tables
Fan Bai
Junmo Kang
Gabriel Stanovsky
Dayne Freitag
Alan Ritter
LMTD
89
14
0
23 May 2023
QTSumm: Query-Focused Summarization over Tabular Data
Yilun Zhao
Zhenting Qi
Linyong Nan
Boyu Mi
Yixin Liu
...
Ruizhe Chen
Xiangru Tang
Yumo Xu
Dragomir R. Radev
Arman Cohan
RALM
LMTD
90
1
0
23 May 2023
Evaluation of African American Language Bias in Natural Language Generation
Nicholas Deas
Jessica A. Grieser
Shana Kleiner
D. Patton
Elsbeth Turcan
Kathleen McKeown
65
31
0
23 May 2023
INSTRUCTSCORE: Explainable Text Generation Evaluation with Finegrained Feedback
Wenda Xu
Danqing Wang
Liangming Pan
Zhenqiao Song
Markus Freitag
Wenjie Wang
Lei Li
ALM
ELM
93
19
0
23 May 2023
SciMON: Scientific Inspiration Machines Optimized for Novelty
Qingyun Wang
Doug Downey
Heng Ji
Tom Hope
LLMAG
164
81
0
23 May 2023
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
259
705
0
23 May 2023
Modeling Empathic Similarity in Personal Narratives
Jocelyn Shen
Maarten Sap
Pedro Colon-Hernandez
Hae Won Park
C. Breazeal
91
15
0
23 May 2023
On Learning to Summarize with Large Language Models as References
Yixin Liu
Kejian Shi
Katherine S He
Longtian Ye
Alexander R. Fabbri
Pengfei Liu
Dragomir R. Radev
Arman Cohan
ELM
119
82
0
23 May 2023
Towards Graph-hop Retrieval and Reasoning in Complex Question Answering over Textual Database
Minjun Zhu
Yixuan Weng
Shizhu He
Kang Liu
Jun Zhao
RALM
LRM
99
1
0
23 May 2023
HumBEL: A Human-in-the-Loop Approach for Evaluating Demographic Factors of Language Models in Human-Machine Conversations
Anthony Sicilia
Jennifer C. Gates
Malihe Alikhani
57
8
0
23 May 2023
Let's Think Frame by Frame with VIP: A Video Infilling and Prediction Dataset for Evaluating Video Chain-of-Thought
Vaishnavi Himakunthala
Andy Ouyang
Daniel Philip Rose
Ryan He
Alex Mei
Yujie Lu
Chinmay Sonar
Michael Stephen Saxon
William Y. Wang
MLLM
LRM
86
2
0
23 May 2023
NarrativeXL: A Large-scale Dataset For Long-Term Memory Models
A. Moskvichev
Ky-Vinh Mai
RALM
62
1
0
23 May 2023
Reducing Sensitivity on Speaker Names for Text Generation from Dialogues
Qi Jia
Haifeng Tang
Kenny Q. Zhu
60
2
0
23 May 2023
Asking Clarification Questions to Handle Ambiguity in Open-Domain QA
Dongryeol Lee
Segwang Kim
Minwoo Lee
Hwanhee Lee
Joonsuk Park
Sang-Woo Lee
Kyomin Jung
UQLM
93
14
0
23 May 2023
Counterspeeches up my sleeve! Intent Distribution Learning and Persistent Fusion for Intent-Conditioned Counterspeech Generation
Rishabh Gupta
Shaily Desai
Manvi Goel
Anil Bandhakavi
Tanmoy Chakraborty
Md. Shad Akhtar
67
23
0
23 May 2023
LLM-Eval: Unified Multi-Dimensional Automatic Evaluation for Open-Domain Conversations with Large Language Models
Yen-Ting Lin
Yun-Nung Chen
85
94
0
23 May 2023
MemeCap: A Dataset for Captioning and Interpreting Memes
EunJeong Hwang
Vered Shwartz
VLM
84
38
0
23 May 2023
Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations
Lucy Lu Wang
Yulia Otmakhova
Jay DeYoung
Thinh Hung Truong
Bailey Kuehl
Erin Bransom
Byron C. Wallace
169
22
0
23 May 2023
Prompting and Evaluating Large Language Models for Proactive Dialogues: Clarification, Target-guided, and Non-collaboration
Yang Deng
Lizi Liao
Liang Chen
Hongru Wang
Wenqiang Lei
Tat-Seng Chua
141
88
0
23 May 2023
APPLS: Evaluating Evaluation Metrics for Plain Language Summarization
Yue Guo
Tal August
Gondy Leroy
T. Cohen
Lucy Lu Wang
182
9
0
23 May 2023
CEO: Corpus-based Open-Domain Event Ontology Induction
Nan Xu
Hongming Zhang
Jianshu Chen
136
2
0
22 May 2023
Neural Machine Translation for Code Generation
K. Dharma
Clayton T. Morrison
122
4
0
22 May 2023
Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method
Yiming Wang
Zhuosheng Zhang
Rui Wang
117
88
0
22 May 2023
Evaluating Factual Consistency of Texts with Semantic Role Labeling
Jing Fan
Dennis Aumiller
Michael Gertz
HILM
123
4
0
22 May 2023
Towards Unsupervised Recognition of Token-level Semantic Differences in Related Documents
Jannis Vamvas
Rico Sennrich
59
2
0
22 May 2023
Training Diffusion Models with Reinforcement Learning
Kevin Black
Michael Janner
Yilun Du
Ilya Kostrikov
Sergey Levine
EGVM
171
379
0
22 May 2023
SPARSEFIT: Few-shot Prompting with Sparse Fine-tuning for Jointly Generating Predictions and Natural Language Explanations
Jesus Solano
Oana-Maria Camburu
Pasquale Minervini
66
1
0
22 May 2023
SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation
Elizabeth Clark
Shruti Rijhwani
Sebastian Gehrmann
Joshua Maynez
Roee Aharoni
Vitaly Nikolaev
Thibault Sellam
Aditya Siddhant
Dipanjan Das
Ankur P. Parikh
95
41
0
22 May 2023
Large Language Models are Not Yet Human-Level Evaluators for Abstractive Summarization
Chenhui Shen
Liying Cheng
Xuan-Phi Nguyen
Yang You
Lidong Bing
ELM
ALM
107
72
0
22 May 2023
MaNtLE: Model-agnostic Natural Language Explainer
Rakesh R Menon
Kerem Zaman
Shashank Srivastava
FAtt
LRM
85
2
0
22 May 2023
GEST: the Graph of Events in Space and Time as a Common Representation between Vision and Language
Mihai Masala
Nicolae Cudlenco
Traian Rebedea
Marius Leordeanu
75
0
0
22 May 2023
Enhancing Coherence of Extractive Summarization with Multitask Learning
Renlong Jie
Xiaojun Meng
Lifeng Shang
Xin Jiang
Qun Liu
56
1
0
22 May 2023
D^2TV: Dual Knowledge Distillation and Target-oriented Vision Modeling for Many-to-Many Multimodal Summarization
Yunlong Liang
Fandong Meng
Jiaan Wang
Jinan Xu
Jie Zhou
VLM
72
11
0
22 May 2023
Kanbun-LM: Reading and Translating Classical Chinese in Japanese Methods by Language Models
Hao Wang
Hirofumi Shimizu
Daisuke Kawahara
77
1
0
22 May 2023