ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1904.09675
  4. Cited By
BERTScore: Evaluating Text Generation with BERT

BERTScore: Evaluating Text Generation with BERT

21 April 2019
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
ArXivPDFHTML

Papers citing "BERTScore: Evaluating Text Generation with BERT"

50 / 1,209 papers shown
Title
Context Selection and Rewriting for Video-based Educational Question Generation
Context Selection and Rewriting for Video-based Educational Question Generation
Mengxia Yu
Bang Nguyen
Olivia Zino
Meng Jiang
39
0
0
28 Apr 2025
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text
Jiageng Wu
Bowen Gu
Ren Zhou
Kevin Xie
Doug Snyder
...
Shri Kiran Srinivasan
Jonathan H. Chen
Santiago Romero-Brufau
K. J. Lin
Jie Yang
LM&MA
ELM
103
0
0
28 Apr 2025
Knowledge Distillation of Domain-adapted LLMs for Question-Answering in Telecom
Knowledge Distillation of Domain-adapted LLMs for Question-Answering in Telecom
Rishika Sen
Sujoy Roychowdhury
Sumit Soman
H. G. Ranjani
Srikhetra Mohanty
68
0
0
28 Apr 2025
Enhancing Surgical Documentation through Multimodal Visual-Temporal Transformers and Generative AI
Enhancing Surgical Documentation through Multimodal Visual-Temporal Transformers and Generative AI
Hugo Georgenthum
Cristian Cosentino
Fabrizio Marozzo
Pietro Liò
MedIm
225
0
0
28 Apr 2025
Can LLMs Be Trusted for Evaluating RAG Systems? A Survey of Methods and Datasets
Can LLMs Be Trusted for Evaluating RAG Systems? A Survey of Methods and Datasets
Lorenz Brehme
Thomas Ströhle
Ruth Breu
65
0
0
28 Apr 2025
ClimaEmpact: Domain-Aligned Small Language Models and Datasets for Extreme Weather Analytics
ClimaEmpact: Domain-Aligned Small Language Models and Datasets for Extreme Weather Analytics
Deeksha Varshney
Keane Ong
Rui Mao
Min Zhang
G. Mengaldo
47
0
0
27 Apr 2025
Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers
Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers
Dylan Bouchard
Mohit Singh Chauhan
HILM
84
0
0
27 Apr 2025
Explanatory Summarization with Discourse-Driven Planning
Explanatory Summarization with Discourse-Driven Planning
Dongqi Liu
Xi Yu
Vera Demberg
Mirella Lapata
55
0
0
27 Apr 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Xuzhao Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Tianwei Zhang
ALM
ELM
98
2
0
26 Apr 2025
Evaluating Evaluation Metrics -- The Mirage of Hallucination Detection
Evaluating Evaluation Metrics -- The Mirage of Hallucination Detection
Atharva Kulkarni
Yuan-kang Zhang
Joel Ruben Antony Moniz
Xiou Ge
Bo-Hsiang Tseng
Dhivya Piraviperumal
Shri Kiran Srinivasan
Hong-ye Yu
HILM
86
0
0
25 Apr 2025
The Ultimate Cookbook for Invisible Poison: Crafting Subtle Clean-Label Text Backdoors with Style Attributes
The Ultimate Cookbook for Invisible Poison: Crafting Subtle Clean-Label Text Backdoors with Style Attributes
Wencong You
Daniel Lowd
39
0
0
24 Apr 2025
Evaluating and Mitigating Bias in AI-Based Medical Text Generation
Evaluating and Mitigating Bias in AI-Based Medical Text Generation
Xiuying Chen
Tairan Wang
Juexiao Zhou
Zirui Song
Xin Gao
Jiahui Geng
MedIm
49
1
0
24 Apr 2025
How Effective are Generative Large Language Models in Performing Requirements Classification?
How Effective are Generative Large Language Models in Performing Requirements Classification?
Waad Alhoshan
Alessio Ferrari
Liping Zhao
30
0
0
23 Apr 2025
ConTextual: Improving Clinical Text Summarization in LLMs with Context-preserving Token Filtering and Knowledge Graphs
ConTextual: Improving Clinical Text Summarization in LLMs with Context-preserving Token Filtering and Knowledge Graphs
Fahmida Liza Piya
Rahmatollah Beheshti
136
0
0
23 Apr 2025
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Ziqiao Ma
Jing Ding
Xuejun Zhang
Dezhi Luo
Jiahe Ding
Sihan Xu
Yuchen Huang
Run Peng
Joyce Chai
56
0
0
22 Apr 2025
The Viability of Crowdsourcing for RAG Evaluation
The Viability of Crowdsourcing for RAG Evaluation
Lukas Gienapp
Tim Hagen
Maik Fröbe
Matthias Hagen
Benno Stein
Martin Potthast
Harrisen Scells
26
0
0
22 Apr 2025
aiXamine: Simplified LLM Safety and Security
aiXamine: Simplified LLM Safety and Security
Fatih Deniz
Dorde Popovic
Yazan Boshmaf
Euisuh Jeong
M. Ahmad
Sanjay Chawla
Issa M. Khalil
ELM
80
0
0
21 Apr 2025
Med-CoDE: Medical Critique based Disagreement Evaluation Framework
Med-CoDE: Medical Critique based Disagreement Evaluation Framework
Mohit Gupta
Akiko Aizawa
R. Shah
LM&MA
ELM
35
0
0
21 Apr 2025
HF4Rec: Human-Like Feedback-Driven Optimization Framework for Explainable Recommendation
HF4Rec: Human-Like Feedback-Driven Optimization Framework for Explainable Recommendation
Jiakai Tang
Jingsen Zhang
Zihang Tian
Xueyang Feng
Lei Wang
Xu Chen
OffRL
216
0
0
19 Apr 2025
FocusedAD: Character-centric Movie Audio Description
FocusedAD: Character-centric Movie Audio Description
Xiaojun Ye
C. Wang
Yiren Song
Sheng Zhou
Liangcheng Li
Jiajun Bu
VGen
61
0
0
16 Apr 2025
Evaluation Under Imperfect Benchmarks and Ratings: A Case Study in Text Simplification
Evaluation Under Imperfect Benchmarks and Ratings: A Case Study in Text Simplification
Joseph Liu
Yoonsoo Nam
Xinyue Cui
Swabha Swayamdipta
56
0
0
13 Apr 2025
Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input
Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input
Jian Wang
Rishabh Dabral
D. Luvizon
Zhe Cao
Lingjie Liu
Thabo Beeler
Christian Theobalt
EgoV
57
0
0
11 Apr 2025
Leveraging LLMs for Utility-Focused Annotation: Reducing Manual Effort for Retrieval and RAG
Leveraging LLMs for Utility-Focused Annotation: Reducing Manual Effort for Retrieval and RAG
Hengran Zhang
Minghao Tang
Keping Bi
J. Guo
Shihao Liu
Daiting Shi
Dawei Yin
Xueqi Cheng
21
0
0
07 Apr 2025
NoveltyBench: Evaluating Language Models for Humanlike Diversity
NoveltyBench: Evaluating Language Models for Humanlike Diversity
Yiming Zhang
Harshita Diddee
Susan Holm
Hanchen Liu
Xinyue Liu
Vinay Samuel
Barry Wang
Daphne Ippolito
37
1
0
07 Apr 2025
Regional Tiny Stories: Using Small Models to Compare Language Learning and Tokenizer Performance
Regional Tiny Stories: Using Small Models to Compare Language Learning and Tokenizer Performance
Nirvan Patil
Malhar Abhay Inamdar
Agnivo Gosai
Guruprasad Pathak
Anish Joshi
Aryan Sagavekar
Anish Joshirao
Raj Abhijit Dandekar
Rajat Dandekar
Sreedath Panat
46
0
0
07 Apr 2025
A Perplexity and Menger Curvature-Based Approach for Similarity Evaluation of Large Language Models
A Perplexity and Menger Curvature-Based Approach for Similarity Evaluation of Large Language Models
Yuantao Zhang
Zhankui Yang
AAML
38
0
0
05 Apr 2025
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
45
0
0
03 Apr 2025
A Scalable Framework for Evaluating Health Language Models
A Scalable Framework for Evaluating Health Language Models
Neil Mallinar
A. Heydari
Xin Liu
Anthony Z. Faranesh
Brent Winslow
...
Mark Malhotra
Shwetak N. Patel
Javier L. Prieto
Daniel J. McDuff
Ahmed A. Metwally
LM&MA
64
2
0
30 Mar 2025
An evaluation of LLMs and Google Translate for translation of selected Indian languages via sentiment and semantic analyses
An evaluation of LLMs and Google Translate for translation of selected Indian languages via sentiment and semantic analyses
Rohitash Chandra
Aryan Chaudhary
Yeshwanth Rayavarapu
49
0
0
27 Mar 2025
SPHERE: An Evaluation Card for Human-AI Systems
SPHERE: An Evaluation Card for Human-AI Systems
Qianou Ma
Dora Zhao
Xinran Zhao
Chenglei Si
Chenyang Yang
Ryan Louie
Ehud Reiter
Diyi Yang
Tongshuang Wu
ALM
68
1
0
24 Mar 2025
Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans?
Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans?
Jeremy Barnes
Naiara Perez
Alba Bonet-Jover
Begoña Altuna
62
1
0
21 Mar 2025
ECLAIR: Enhanced Clarification for Interactive Responses
ECLAIR: Enhanced Clarification for Interactive Responses
John Murzaku
Zifan Liu
Md Mehrab Tanjim
Vaishnavi Muppala
Xiang Chen
Yunyao Li
44
0
0
19 Mar 2025
JuDGE: Benchmarking Judgment Document Generation for Chinese Legal System
JuDGE: Benchmarking Judgment Document Generation for Chinese Legal System
Weihang Su
Baoqing Yue
Qingyao Ai
Yiran Hu
Jiaqi Li
C. Wang
Kaiyuan Zhang
Yueyue Wu
Yixiao Liu
AILaw
ELM
108
0
0
18 Mar 2025
ConSCompF: Consistency-focused Similarity Comparison Framework for Generative Large Language Models
ConSCompF: Consistency-focused Similarity Comparison Framework for Generative Large Language Models
Alexey Karev
Dong Xu
58
0
0
18 Mar 2025
HICD: Hallucination-Inducing via Attention Dispersion for Contrastive Decoding to Mitigate Hallucinations in Large Language Models
HICD: Hallucination-Inducing via Attention Dispersion for Contrastive Decoding to Mitigate Hallucinations in Large Language Models
Xinyan Jiang
Hang Ye
Yongxin Zhu
Xiaoying Zheng
Zikang Chen
Jun Gong
51
0
0
17 Mar 2025
Unequal Opportunities: Examining the Bias in Geographical Recommendations by Large Language Models
Unequal Opportunities: Examining the Bias in Geographical Recommendations by Large Language Models
Shiran Dudy
Thulasi Tholeti
R. Ramachandranpillai
Muhammad Ali
Toby Jia-Jun Li
Ricardo Baeza-Yates
33
0
0
16 Mar 2025
CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning
CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning
Hao Cui
Zahra Shamsi
Gowoon Cheon
Xuejian Ma
Shutong Li
...
Eun-Ah Kim
M. Brenner
Viren Jain
Sameera Ponda
Subhashini Venugopalan
ELM
LRM
59
0
0
14 Mar 2025
Leveraging Retrieval Augmented Generative LLMs For Automated Metadata Description Generation to Enhance Data Catalogs
Mayank Singh
Abhijeet Kumar
Sasidhar Donaparthi
Gayatri Karambelkar
48
0
0
12 Mar 2025
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
Yucheng Suo
Fan Ma
Kaixin Shen
Linchao Zhu
Yi Yang
VLM
54
0
0
12 Mar 2025
NSF-SciFy: Mining the NSF Awards Database for Scientific Claims
NSF-SciFy: Mining the NSF Awards Database for Scientific Claims
D. Rao
Weiqiu You
Eric Wong
Chris Callison-Burch
66
0
0
11 Mar 2025
Statistical Deficiency for Task Inclusion Estimation
Loïc Fosse
Frédéric Béchet
Benoit Favre
Géraldine Damnati
Gwénolé Lecorvé
Maxime Darrin
Philippe Formont
Pablo Piantanida
219
0
0
07 Mar 2025
Development and Enhancement of Text-to-Image Diffusion Models
Rajdeep Roshan Sahu
VLM
74
0
0
07 Mar 2025
Dynamic Knowledge Integration for Evidence-Driven Counter-Argument Generation with Large Language Models
Anar Yeginbergen
Maite Oronoz
Rodrigo Agerri
57
0
0
07 Mar 2025
Tgea: An error-annotated dataset and benchmark tasks for text generation from pretrained language models
Jie He
Bo Peng
Yi-Lun Liao
Qun Liu
Deyi Xiong
68
8
0
06 Mar 2025
SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction
SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction
Lu Dai
Yijie Xu
Jinhui Ye
Hao Liu
Hui Xiong
3DV
RALM
86
2
0
03 Mar 2025
Enabling AI Scientists to Recognize Innovation: A Domain-Agnostic Algorithm for Assessing Novelty
Yao Wang
Mingxuan Cui
Arthur Jiang
77
0
0
03 Mar 2025
Arabizi vs LLMs: Can the Genie Understand the Language of Aladdin?
Arabizi vs LLMs: Can the Genie Understand the Language of Aladdin?
Perla Al Almaoui
Pierrette Bouillon
Simon Hengchen
62
0
0
28 Feb 2025
A Pilot Empirical Study on When and How to Use Knowledge Graphs as Retrieval Augmented Generation
A Pilot Empirical Study on When and How to Use Knowledge Graphs as Retrieval Augmented Generation
Xujie Yuan
Yongxu Liu
Shimin Di
Shiwen Wu
Libin Zheng
Rui Meng
Lei Chen
Xiaofang Zhou
Jian Yin
38
0
0
28 Feb 2025
Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing
Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing
Juntai Cao
Xiang Zhang
Raymond Li
Chuyuan Li
Chenyu You
Shafiq Joty
Giuseppe Carenini
59
1
0
27 Feb 2025
LLM as a Broken Telephone: Iterative Generation Distorts Information
LLM as a Broken Telephone: Iterative Generation Distorts Information
Amr Mohamed
Mingmeng Geng
Michalis Vazirgiannis
Guokan Shang
78
1
0
27 Feb 2025
Previous
12345...232425
Next