BERTScore: Evaluating Text Generation with BERT

21 April 2019
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi

Papers citing "BERTScore: Evaluating Text Generation with BERT"

50 / 3,522 papers shown
A Question Answering Framework for Decontextualizing User-facing Snippets from Scientific Documents
Benjamin Newman
Luca Soldaini
Raymond Fok
Arman Cohan
Kyle Lo
RALM
55
18
0
24 May 2023
Psychological Metrics for Dialog System Evaluation
Salvatore Giorgi
Shreya Havaldar
Farhan S. Ahmed
Zuhaib Akhtar
Shalaka Vaidya
Gary Pan
Pallavi V. Kulkarni
H. Andrew Schwartz
João Sedoc
94
2
0
24 May 2023
Don't Take This Out of Context! On the Need for Contextual Models and Evaluations for Stylistic Rewriting
Akhila Yerukola
Xuhui Zhou
Elizabeth Clark
Maarten Sap
71
7
0
24 May 2023
Gender Biases in Automatic Evaluation Metrics for Image Captioning
Haoyi Qiu
Zi-Yi Dou
Tianlu Wang
Asli Celikyilmaz
Nanyun Peng
EGVM
119
16
0
24 May 2023
DecipherPref: Analyzing Influential Factors in Human Preference Judgments via GPT-4
Ye Hu
Kaiqiang Song
Sangwoo Cho
Xiaoyang Wang
H. Foroosh
Fei Liu
99
13
0
24 May 2023
Evaluate What You Can't Evaluate: Unassessable Quality for Generated Response
Yongkang Liu
Shi Feng
Daling Wang
Yifei Zhang
Hinrich Schütze
ALM ELM
93
1
0
24 May 2023
Scientific Opinion Summarization: Paper Meta-review Generation Dataset, Methods, and Evaluation
Qi Zeng
Mankeerat Sidhu
Ansel Blume
Hou Pong Chan
Lu Wang
Heng Ji
91
11
0
24 May 2023
COMET-M: Reasoning about Multiple Events in Complex Sentences
Sahithya Ravi
R. Ng
Vered Shwartz
LRM ReLM
72
3
0
24 May 2023
OpenPI2.0: An Improved Dataset for Entity Tracking in Texts
Li Zhang
Hainiu Xu
Abhinav Kommula
Chris Callison-Burch
Niket Tandon
65
7
0
24 May 2023
Unraveling ChatGPT: A Critical Analysis of AI-Generated Goal-Oriented Dialogues and Annotations
Tiziano Labruna
Sofia Brenna
Andrea Zaninello
Bernardo Magnini
50
15
0
23 May 2023
MathDial: A Dialogue Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems
Jakub Macina
Nico Daheim
Sankalan Pal Chowdhury
Tanmay Sinha
Manu Kapur
Iryna Gurevych
Mrinmaya Sachan
LRM
129
68
0
23 May 2023
How to Choose How to Choose Your Chatbot: A Massively Multi-System MultiReference Data Set for Dialog Metric Evaluation
Huda Khayrallah
Zuhaib Akhtar
Edward Cohen
João Sedoc
61
2
0
23 May 2023
Sociocultural Norm Similarities and Differences via Situational Alignment and Explainable Textual Entailment
Sky CH-Wang
Arkadiy Saakyan
Aochong Li
Zhou Yu
Smaranda Muresan
110
17
0
23 May 2023
Language Model Self-improvement by Reinforcement Learning Contemplation
Jing-Cheng Pang
Pengyuan Wang
Kaiyuan Li
Xiong-Hui Chen
Jiacheng Xu
Zongzhang Zhang
Yang Yu
LRM KELM
64
52
0
23 May 2023
Advancing Precise Outline-Conditioned Text Generation with Task Duality and Explicit Outline Control
Yunzhe Li
Qian Chen
Weixiang Yan
Wen Wang
Qinglin Zhang
Hari Sundaram
77
3
0
23 May 2023
Dancing Between Success and Failure: Edit-level Simplification Evaluation using SALSA
David Heineman
Yao Dou
Mounica Maddela
Wei Xu
102
17
0
23 May 2023
Schema-Driven Information Extraction from Heterogeneous Tables
Fan Bai
Junmo Kang
Gabriel Stanovsky
Dayne Freitag
Alan Ritter
LMTD
89
14
0
23 May 2023
QTSumm: Query-Focused Summarization over Tabular Data
Yilun Zhao
Zhenting Qi
Linyong Nan
Boyu Mi
Yixin Liu
...
Ruizhe Chen
Xiangru Tang
Yumo Xu
Dragomir R. Radev
Arman Cohan
RALM LMTD
90
1
0
23 May 2023
Evaluation of African American Language Bias in Natural Language Generation
Nicholas Deas
Jessica A. Grieser
Shana Kleiner
D. Patton
Elsbeth Turcan
Kathleen McKeown
65
31
0
23 May 2023
INSTRUCTSCORE: Explainable Text Generation Evaluation with Finegrained Feedback
Wenda Xu
Danqing Wang
Liangming Pan
Zhenqiao Song
Markus Freitag
Wenjie Wang
Lei Li
ALM ELM
93
19
0
23 May 2023
SciMON: Scientific Inspiration Machines Optimized for Novelty
Qingyun Wang
Doug Downey
Heng Ji
Tom Hope
LLMAG
164
81
0
23 May 2023
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM ALM
259
705
0
23 May 2023
Modeling Empathic Similarity in Personal Narratives
Jocelyn Shen
Maarten Sap
Pedro Colon-Hernandez
Hae Won Park
C. Breazeal
91
15
0
23 May 2023
On Learning to Summarize with Large Language Models as References
Yixin Liu
Kejian Shi
Katherine S He
Longtian Ye
Alexander R. Fabbri
Pengfei Liu
Dragomir R. Radev
Arman Cohan
ELM
119
82
0
23 May 2023
Towards Graph-hop Retrieval and Reasoning in Complex Question Answering over Textual Database
Minjun Zhu
Yixuan Weng
Shizhu He
Kang Liu
Jun Zhao
RALM LRM
99
1
0
23 May 2023
HumBEL: A Human-in-the-Loop Approach for Evaluating Demographic Factors of Language Models in Human-Machine Conversations
Anthony Sicilia
Jennifer C. Gates
Malihe Alikhani
57
8
0
23 May 2023
Let's Think Frame by Frame with VIP: A Video Infilling and Prediction Dataset for Evaluating Video Chain-of-Thought
Vaishnavi Himakunthala
Andy Ouyang
Daniel Philip Rose
Ryan He
Alex Mei
Yujie Lu
Chinmay Sonar
Michael Stephen Saxon
William Y. Wang
MLLM LRM
86
2
0
23 May 2023
NarrativeXL: A Large-scale Dataset For Long-Term Memory Models
A. Moskvichev
Ky-Vinh Mai
RALM
62
1
0
23 May 2023
Reducing Sensitivity on Speaker Names for Text Generation from Dialogues
Qi Jia
Haifeng Tang
Kenny Q. Zhu
60
2
0
23 May 2023
Asking Clarification Questions to Handle Ambiguity in Open-Domain QA
Dongryeol Lee
Segwang Kim
Minwoo Lee
Hwanhee Lee
Joonsuk Park
Sang-Woo Lee
Kyomin Jung
UQLM
93
14
0
23 May 2023
Counterspeeches up my sleeve! Intent Distribution Learning and Persistent Fusion for Intent-Conditioned Counterspeech Generation
Rishabh Gupta
Shaily Desai
Manvi Goel
Anil Bandhakavi
Tanmoy Chakraborty
Md. Shad Akhtar
67
23
0
23 May 2023
LLM-Eval: Unified Multi-Dimensional Automatic Evaluation for Open-Domain Conversations with Large Language Models
Yen-Ting Lin
Yun-Nung Chen
85
94
0
23 May 2023
MemeCap: A Dataset for Captioning and Interpreting Memes
EunJeong Hwang
Vered Shwartz
VLM
84
38
0
23 May 2023
Automated Metrics for Medical Multi-Document Summarization Disagree with Human Evaluations
Lucy Lu Wang
Yulia Otmakhova
Jay DeYoung
Thinh Hung Truong
Bailey Kuehl
Erin Bransom
Byron C. Wallace
169
22
0
23 May 2023
Prompting and Evaluating Large Language Models for Proactive Dialogues: Clarification, Target-guided, and Non-collaboration
Yang Deng
Lizi Liao
Liang Chen
Hongru Wang
Wenqiang Lei
Tat-Seng Chua
141
88
0
23 May 2023
APPLS: Evaluating Evaluation Metrics for Plain Language Summarization
Yue Guo
Tal August
Gondy Leroy
T. Cohen
Lucy Lu Wang
182
9
0
23 May 2023
CEO: Corpus-based Open-Domain Event Ontology Induction
Nan Xu
Hongming Zhang
Jianshu Chen
136
2
0
22 May 2023
Neural Machine Translation for Code Generation
K. Dharma
Clayton T. Morrison
122
4
0
22 May 2023
Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method
Yiming Wang
Zhuosheng Zhang
Rui Wang
117
88
0
22 May 2023
Evaluating Factual Consistency of Texts with Semantic Role Labeling
Jing Fan
Dennis Aumiller
Michael Gertz
HILM
123
4
0
22 May 2023
Towards Unsupervised Recognition of Token-level Semantic Differences in Related Documents
Jannis Vamvas
Rico Sennrich
59
2
0
22 May 2023
Training Diffusion Models with Reinforcement Learning
Kevin Black
Michael Janner
Yilun Du
Ilya Kostrikov
Sergey Levine
EGVM
171
379
0
22 May 2023
SPARSEFIT: Few-shot Prompting with Sparse Fine-tuning for Jointly Generating Predictions and Natural Language Explanations
Jesus Solano
Oana-Maria Camburu
Pasquale Minervini
66
1
0
22 May 2023
SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation
Elizabeth Clark
Shruti Rijhwani
Sebastian Gehrmann
Joshua Maynez
Roee Aharoni
Vitaly Nikolaev
Thibault Sellam
Aditya Siddhant
Dipanjan Das
Ankur P. Parikh
95
41
0
22 May 2023
Large Language Models are Not Yet Human-Level Evaluators for Abstractive Summarization
Chenhui Shen
Liying Cheng
Xuan-Phi Nguyen
Yang You
Lidong Bing
ELM ALM
107
72
0
22 May 2023
MaNtLE: Model-agnostic Natural Language Explainer
Rakesh R Menon
Kerem Zaman
Shashank Srivastava
FAtt LRM
85
2
0
22 May 2023
GEST: the Graph of Events in Space and Time as a Common Representation between Vision and Language
Mihai Masala
Nicolae Cudlenco
Traian Rebedea
Marius Leordeanu
75
0
0
22 May 2023
Enhancing Coherence of Extractive Summarization with Multitask Learning
Renlong Jie
Xiaojun Meng
Lifeng Shang
Xin Jiang
Qun Liu
56
1
0
22 May 2023
D$^2$TV: Dual Knowledge Distillation and Target-oriented Vision Modeling for Many-to-Many Multimodal Summarization
Yunlong Liang
Fandong Meng
Jiaan Wang
Jinan Xu
Jie Zhou
VLM
72
11
0
22 May 2023
Kanbun-LM: Reading and Translating Classical Chinese in Japanese Methods by Language Models
Hao Wang
Hirofumi Shimizu
Daisuke Kawahara
77
1
0
22 May 2023