ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

BERTScore: Evaluating Text Generation with BERT

21 April 2019
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
arXiv: 1904.09675

Papers citing "BERTScore: Evaluating Text Generation with BERT"

50 / 3,520 papers shown
How Much Annotation is Needed to Compare Summarization Models?
Chantal Shaib
Joe Barrow
Alexa F. Siu
Byron C. Wallace
A. Nenkova
88
2
0
28 Feb 2024
Is Crowdsourcing Breaking Your Bank? Cost-Effective Fine-Tuning of Pre-trained Language Models with Proximal Policy Optimization
Shuo Yang
Gjergji Kasneci
ALM
57
3
0
28 Feb 2024
A Survey on Neural Question Generation: Methods, Applications, and Prospects
Shasha Guo
Lizi Liao
Cuiping Li
Tat-Seng Chua
82
2
0
28 Feb 2024
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada
Kanta Kaneda
Daichi Saito
Komei Sugiura
89
30
0
28 Feb 2024
SynArtifact: Classifying and Alleviating Artifacts in Synthetic Images via Vision-Language Model
Bin Cao
Jianhao Yuan
Yexin Liu
Jian Li
Shuyang Sun
Jing Liu
Bo Zhao
DiffM
108
9
0
28 Feb 2024
A Sentiment Consolidation Framework for Meta-Review Generation
Miao Li
Jey Han Lau
Eduard Hovy
54
5
0
28 Feb 2024
Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction
Koki Maeda
Shuhei Kurita
Taiki Miyanishi
Naoaki Okazaki
59
2
0
28 Feb 2024
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions
Hanjie Chen
Zhouxiang Fang
Yash Singla
Mark Dredze
ELM, AI4MH
145
43
0
28 Feb 2024
A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision-Language Models
Xiujie Song
Mengyue Wu
Ke Zhu
Chunhao Zhang
Yanyi Chen
LRM, ELM
134
3
0
28 Feb 2024
Self-Refinement of Language Models from External Proxy Metrics Feedback
Keshav Ramji
Young-Suk Lee
Ramón Fernandez Astudillo
M. Sultan
Tahira Naseem
Asim Munawar
Radu Florian
Salim Roukos
HILM
77
7
0
27 Feb 2024
Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems
Zhenting Qi
Hanlin Zhang
Eric Xing
Sham Kakade
Hima Lakkaraju
SILM
123
25
0
27 Feb 2024
Evaluating Very Long-Term Conversational Memory of LLM Agents
A. Maharana
Dong-Ho Lee
Sergey Tulyakov
Mohit Bansal
Francesco Barbieri
Yuwei Fang
LLMAG
86
81
0
27 Feb 2024
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web
Raghav Kapoor
Y. Butala
M. Russak
Jing Yu Koh
Kiran Kamble
Waseem Alshikh
Ruslan Salakhutdinov
LLMAG
157
57
0
27 Feb 2024
SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition
Shuangrui Ding
Zihan Liu
Xiao-wen Dong
Pan Zhang
Rui Qian
Junhao Huang
Conghui He
Jiaqi Wang
128
23
0
27 Feb 2024
Towards Explainability and Fairness in Swiss Judgement Prediction: Benchmarking on a Multilingual Dataset
Santosh T.Y.S.S
Nina Baumgartner
Matthias Stürmer
Matthias Grabmair
Joel Niklaus
ELM, AILaw
108
8
0
26 Feb 2024
Long Dialog Summarization: An Analysis
Ankan Mullick
Ayan Kumar Bhowmick
R. Raghav
Ravi Kokku
Prasenjit Dey
Pawan Goyal
Niloy Ganguly
44
1
0
26 Feb 2024
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Mikayel Samvelyan
Sharath Chandra Raparthy
Andrei Lupu
Eric Hambro
Aram H. Markosyan
...
Minqi Jiang
Jack Parker-Holder
Jakob Foerster
Tim Rocktäschel
Roberta Raileanu
SyDa
117
89
0
26 Feb 2024
HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization
Qiwei Peng
Yekun Chai
Xuhong Li
ELM, LM&MA
99
45
0
26 Feb 2024
Unveiling Vulnerability of Self-Attention
Khai Jiet Liong
Hongqiu Wu
Haizhen Zhao
69
0
0
26 Feb 2024
Prompt Perturbation Consistency Learning for Robust Language Models
Yao Qiang
Subhrangshu Nandi
Ninareh Mehrabi
Greg Ver Steeg
Anoop Kumar
Anna Rumshisky
Aram Galstyan
135
10
0
24 Feb 2024
HD-Eval: Aligning Large Language Model Evaluators Through Hierarchical Criteria Decomposition
Yuxuan Liu
Tianchi Yang
Shaohan Huang
Zihan Zhang
Haizhen Huang
Furu Wei
Weiwei Deng
Feng Sun
Qi Zhang
73
16
0
24 Feb 2024
Fine-Grained Self-Endorsement Improves Factuality and Reasoning
Ante Wang
Linfeng Song
Baolin Peng
Ye Tian
Lifeng Jin
Haitao Mi
Jinsong Su
Dong Yu
HILM, LRM
66
7
0
23 Feb 2024
A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models
S. Hegselmann
Zejiang Shen
Florian Gierse
Monica Agrawal
David Sontag
Xiaoyi Jiang
HILM, VLM
92
6
0
23 Feb 2024
Generative Models are Self-Watermarked: Declaring Model Authentication through Re-Generation
Aditya Desu
Xuanli He
Xingliang Yuan
Wei Lu
WIGM
61
1
0
23 Feb 2024
KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
Zhuohao Yu
Chang Gao
Wenjin Yao
Yidong Wang
Wei Ye
Jindong Wang
Xing Xie
Yue Zhang
Shikun Zhang
90
28
0
23 Feb 2024
UFO: a Unified and Flexible Framework for Evaluating Factuality of Large Language Models
Zhaoheng Huang
Zhicheng Dou
Yutao Zhu
Ji-Rong Wen
HILM
54
2
0
22 Feb 2024
The Impact of Word Splitting on the Semantic Content of Contextualized Word Representations
Aina Garí Soler
Matthieu Labeau
Chloé Clavel
VLM
75
2
0
22 Feb 2024
Should We Respect LLMs? A Cross-Lingual Study on the Influence of Prompt Politeness on LLM Performance
Ziqi Yin
Hao Wang
Kaito Horio
Daisuke Kawahara
Satoshi Sekine
116
29
0
22 Feb 2024
Does the Generator Mind its Contexts? An Analysis of Generative Model Faithfulness under Context Transfer
Xinshuo Hu
Baotian Hu
Dongfang Li
Xiaoguang Li
Lifeng Shang
HILM
76
1
0
22 Feb 2024
Do LLMs Implicitly Determine the Suitable Text Difficulty for Users?
Seiji Gobara
Hidetaka Kamigaito
Taro Watanabe
79
4
0
22 Feb 2024
Qsnail: A Questionnaire Dataset for Sequential Question Generation
Yan Lei
Liang Pang
Yuanzhuo Wang
Huawei Shen
Xueqi Cheng
60
0
0
22 Feb 2024
Rethinking Scientific Summarization Evaluation: Grounding Explainable Metrics on Facet-aware Benchmark
Preslav Nakov
Tairan Wang
Qingqing Zhu
Taicheng Guo
Shen Gao
Zhiyong Lu
Xin Gao
Xiangliang Zhang
214
4
0
22 Feb 2024
LLMs with Industrial Lens: Deciphering the Challenges and Prospects -- A Survey
Ashok Urlana
Charaka Vinayak Kumar
Ajeet Kumar Singh
B. Garlapati
S. Chalamala
Rahul Mishra
122
8
0
22 Feb 2024
MM-Soc: Benchmarking Multimodal Large Language Models in Social Media Platforms
Yiqiao Jin
Minje Choi
Gaurav Verma
Jindong Wang
Srijan Kumar
103
23
0
21 Feb 2024
Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment
Vyas Raina
Adian Liusie
Mark Gales
AAML, ELM
94
63
0
21 Feb 2024
Towards Building Multilingual Language Model for Medicine
Pengcheng Qiu
Chaoyi Wu
Xiaoman Zhang
Weixiong Lin
Haicheng Wang
Ya Zhang
Yanfeng Wang
Weidi Xie
LM&MA, ELM
121
90
0
21 Feb 2024
A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models
Ashutosh Sathe
Prachi Jain
Sunayana Sitaram
108
2
0
21 Feb 2024
A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation
Yunxin Li
Baotian Hu
Wenhan Luo
Lin Ma
Yuxin Ding
Min Zhang
125
1
0
21 Feb 2024
Are LLMs Effective Negotiators? Systematic Evaluation of the Multifaceted Capabilities of LLMs in Negotiation Dialogues
Deuksin Kwon
Emily Weiss
Tara Kulshrestha
Kushal Chawla
Gale M. Lucas
Jonathan Gratch
91
11
0
21 Feb 2024
Ranking Large Language Models without Ground Truth
Amit Dhurandhar
Rahul Nair
Moninder Singh
Elizabeth M. Daly
Karthikeyan N. Ramamurthy
HILM, ALM, ELM
110
7
0
21 Feb 2024
Reliable LLM-based User Simulator for Task-Oriented Dialogue Systems
Ivan Sekulić
Silvia Terragni
Victor Guimaraes
Nghia Khau
Bruna Guedes
Modestas Filipavicius
A. Manso
Roland Mathis
53
7
0
20 Feb 2024
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Liyan Tang
Igor Shalyminov
Amy Wing-mei Wong
Jon Burnsky
Jake W. Vincent
...
Hang Su
Lijia Sun
Yi Zhang
Saab Mansour
Kathleen McKeown
HILM
78
54
0
20 Feb 2024
Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations
Guan-Ting Lin
Cheng-Han Chiang
Hung-yi Lee
113
29
0
20 Feb 2024
Acknowledgment of Emotional States: Generating Validating Responses for Empathetic Dialogue
Zi Haur Pang
Yahui Fu
Divesh Lala
Keiko Ochi
K. Inoue
Tatsuya Kawahara
53
1
0
20 Feb 2024
Me LLaMA: Foundation Large Language Models for Medical Applications
Qianqian Xie
Qingyu Chen
Aokun Chen
C.A.I. Peng
Yan Hu
...
Huan He
Lucila Ohno-Machado
Yonghui Wu
Hua Xu
Jiang Bian
LM&MA, AI4MH
131
4
0
20 Feb 2024
A Dual-Prompting for Interpretable Mental Health Language Models
Hyolim Jeon
Dongje Yoo
Daeun Lee
Sejung Son
Seungbae Kim
Jinyoung Han
AI4MH
90
5
0
20 Feb 2024
FinBen: A Holistic Financial Benchmark for Large Language Models
Qianqian Xie
Weiguang Han
Zhengyu Chen
Ruoyu Xiang
Xiao Zhang
...
Yanzhao Lai
Hao Wang
Min Peng
Sophia Ananiadou
Jimin Huang
AIFin
130
48
0
20 Feb 2024
IMBUE: Improving Interpersonal Effectiveness through Simulation and Just-in-time Feedback with Human-Language Model Interaction
Inna Wanyin Lin
Ashish Sharma
Christopher Rytting
Adam S. Miner
Jina Suh
Tim Althoff
88
15
0
19 Feb 2024
TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness
Danna Zheng
Danyang Liu
Mirella Lapata
Jeff Z. Pan
HILM
87
7
0
19 Feb 2024
LangXAI: Integrating Large Vision Models for Generating Textual Explanations to Enhance Explainability in Visual Perception Tasks
Truong Thanh Hung Nguyen
Tobias Clement
Phuc Truong Loc Nguyen
Nils Kemmerzell
Van Binh Truong
V. Nguyen
Mohamed Abdelaal
Hung Cao
VLM
89
10
0
19 Feb 2024