ResearchTrend.AI
BERTScore: Evaluating Text Generation with BERT

21 April 2019
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi

Papers citing "BERTScore: Evaluating Text Generation with BERT"

50 / 3,519 papers shown
The Feasibility of Topic-Based Watermarking on Academic Peer Reviews
Alexander Nemecek
Yuzhou Jiang
Erman Ayday
WaLM
33
0
0
27 May 2025
Lunguage: A Benchmark for Structured and Sequential Chest X-ray Interpretation
Jong Hak Moon
Geon Choi
Paloma Rabaey
Min Gwan Kim
Hyuk Gi Hong
...
J. Kim
Harshita Sharma
Daniel Coelho De Castro
Javier Alvarez-Valle
Edward Choi
LM&MA
40
0
0
27 May 2025
LeCoDe: A Benchmark Dataset for Interactive Legal Consultation Dialogue Evaluation
Weikang Yuan
Kaisong Song
Zhuoren Jiang
Junjie Cao
Y. Zhang
Jun Lin
Kun Kuang
Ji Zhang
Xiaozhong Liu
AILaw
ELM
16
0
0
26 May 2025
FinLoRA: Benchmarking LoRA Methods for Fine-Tuning LLMs on Financial Datasets
Dannong Wang
Jaisal Patel
Daochen Zha
Steve Yang
Xiao-Yang Liu
31
0
0
26 May 2025
Graceful Forgetting in Generative Language Models
Chunyang Jiang
Chi-Min Chan
Yiyang Cai
Yulong Liu
Wei Xue
Yike Guo
MoMe
CLL
KELM
29
0
0
26 May 2025
DoctorRAG: Medical RAG Fusing Knowledge with Patient Analogy through Textual Gradients
Yuxing Lu
Gecheng Fu
Wei Wu
Xukai Zhao
Sin Yee Goi
Jinzhuo Wang
123
0
0
26 May 2025
Estimating LLM Consistency: A User Baseline vs Surrogate Metrics
Xiaoyuan Wu
Weiran Lin
Omer Akgul
Lujo Bauer
HILM
24
0
0
26 May 2025
Robust fine-tuning of speech recognition models via model merging: application to disordered speech
Alexandre Ducorroy
Rachid Riad
MoMe
23
0
0
26 May 2025
Evaluating Machine Translation Models for English-Hindi Language Pairs: A Comparative Analysis
Ahan Prasannakumar Shetty
ELM
10
0
0
26 May 2025
Does Rationale Quality Matter? Enhancing Mental Disorder Detection via Selective Reasoning Distillation
Hoyun Song
Huije Lee
Jisu Shin
Sukmin Cho
Changgeon Ko
Jong C. Park
AI4MH
LRM
58
1
0
26 May 2025
Exploring Generative Error Correction for Dysarthric Speech Recognition
Moreno La Quatra
Alkis Koudounas
Valerio Mario Salerno
Sabato Marco Siniscalchi
36
0
0
26 May 2025
Monocle: Hybrid Local-Global In-Context Evaluation for Long-Text Generation with Uncertainty-Based Active Learning
Xiaorong Wang
Ting Yang
Zhu Zhang
Shuo Wang
Zihan Zhou
Liner Yang
Zhiyuan Liu
Maosong Sun
38
0
0
26 May 2025
TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos
Fanheng Kong
Jingyuan Zhang
Hongzhi Zhang
Shi Feng
Daling Wang
Linhao Yu
Xingguang Ji
Yu Tian
Qi Wang
Fuzheng Zhang
38
1
0
26 May 2025
gec-metrics: A Unified Library for Grammatical Error Correction Evaluation
Takumi Goto
Yusuke Sakai
Taro Watanabe
43
3
0
26 May 2025
SpokenNativQA: Multilingual Everyday Spoken Queries for LLMs
Firoj Alam
Md. Arid Hasan
Shammur A. Chowdhury
63
0
0
25 May 2025
Hypercube-RAG: Hypercube-Based Retrieval-Augmented Generation for In-domain Scientific Question-Answering
Jimeng Shi
Sizhe Zhou
Bowen Jin
Wei Hu
Shaowen Wang
Giri Narasimhan
Jiawei Han
48
0
0
25 May 2025
From Reddit to Generative AI: Evaluating Large Language Models for Anxiety Support Fine-tuned on Social Media Data
Ugur Kursuncu
Trilok Padhi
Gaurav Sinha
Abdulkadir Erol
Jaya Krishna Mandivarapu
Christopher R. Larrison
AI4MH
43
1
0
24 May 2025
GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains
C. Wang
Xiaoran Pan
Zihao Pan
Haofan Wang
Yiren Song
LRM
134
0
0
24 May 2025
Sci-LoRA: Mixture of Scientific LoRAs for Cross-Domain Lay Paraphrasing
Ming Cheng
Jiaying Gong
Hoda Eldardiry
AI4CE
46
0
0
24 May 2025
Smoothie: Smoothing Diffusion on Token Embeddings for Text Generation
Alexander Shabalin
Viacheslav Meshchaninov
Dmitry Vetrov
44
0
0
24 May 2025
Signal, Image, or Symbolic: Exploring the Best Input Representation for Electrocardiogram-Language Models Through a Unified Framework
William Jongwon Han
Chaojing Duan
Zhepeng Cen
Yihang Yao
Xiaoyu Song
Atharva Mhaskar
Dylan Leong
Michael A. Rosenberg
Emerson Liu
Ding Zhao
61
0
0
24 May 2025
Teaching with Lies: Curriculum DPO on Synthetic Negatives for Hallucination Detection
Shrey Pandit
Ashwin Vinod
Liu Leqi
Ying Ding
HILM
72
0
0
23 May 2025
ReqBrain: Task-Specific Instruction Tuning of LLMs for AI-Assisted Requirements Generation
Mohammad Kasra Habib
Daniel Graziotin
Stefan Wagner
194
0
0
23 May 2025
U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding
Anjie Le
Henan Liu
Yue Wang
Zhenyu Liu
Rongkun Zhu
...
Alison Noble
Jacques Souquet
Xiaoqing Guo
Manxi Lin
Hongcheng Guo
LM&MA
ELM
VLM
58
0
0
23 May 2025
Retrieval Augmented Generation-based Large Language Models for Bridging Transportation Cybersecurity Legal Knowledge Gaps
Khandakar Ashrafi Akbar
Md Nahiyan Uddin
Latifur Khan
Trayce Hockstad
Mizanur Rahman
M. Chowdhury
B. Thuraisingham
AILaw
RALM
242
0
0
23 May 2025
Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs
Wafa Alghallabi
Ritesh Thawkar
Sara Ghaboura
Ketan More
Omkar Thawakar
Hisham Cholakkal
Salman Khan
Rao Muhammad Anwer
152
0
0
23 May 2025
CLEAR: A Clinically-Grounded Tabular Framework for Radiology Report Evaluation
Yuyang Jiang
Chacha Chen
Shengyuan Wang
Feng Li
Zecong Tang
...
Lydia Chelala
Christopher M. Straus
Reve Chahine
Samuel G. Armato III
Chenhao Tan
34
0
0
22 May 2025
Exploring the Relationship Between Diversity and Quality in Ad Text Generation
Yoichi Aoki
Soichiro Murakami
Ukyo Honda
Akihiko Kato
96
0
0
22 May 2025
Don't Judge Code by Its Cover: Exploring Biases in LLM Judges for Code Evaluation
Jiwon Moon
Yerin Hwang
Dongryeol Lee
Taegwan Kang
Yongil Kim
Kyomin Jung
ELM
51
0
0
22 May 2025
HiMATE: A Hierarchical Multi-Agent Framework for Machine Translation Evaluation
Shijie Zhang
Renhao Li
Songsheng Wang
Philipp Koehn
Min Yang
Derek F. Wong
27
0
0
22 May 2025
ScholarBench: A Bilingual Benchmark for Abstraction, Comprehension, and Reasoning Evaluation in Academic Contexts
Dongwon Noh
Donghyeok Koh
Junghun Yuk
Gyuwan Kim
Jaeyong Lee
Kyungtae Lim
Cheoneum Park
ELM
71
0
0
22 May 2025
T1: A Tool-Oriented Conversational Dataset for Multi-Turn Agentic Planning
Amartya Chakraborty
Paresh Dashore
Nadia Bathaee
Anmol Jain
Anirban Das
Shi-Xiong Zhang
Sambit Sahu
Milind Naphade
Genta Indra Winata
LLMAG
105
0
0
22 May 2025
Resource for Error Analysis in Text Simplification: New Taxonomy and Test Collection
Benjamin Vendeville
Liana Ermakova
Pierre De Loor
38
0
0
22 May 2025
ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark
Sara Ghaboura
Ketan More
Wafa Alghallabi
Omkar Thawakar
Jorma T. Laaksonen
Hisham Cholakkal
Salman Khan
Rao Muhammad Anwer
VLM
LRM
59
0
0
22 May 2025
From Generic Empathy to Personalized Emotional Support: A Self-Evolution Framework for User Preference Alignment
Jing Ye
Lu Xiang
Yaping Zhang
Chengqing Zong
73
0
0
22 May 2025
LLMs Are Not Scorers: Rethinking MT Evaluation with Generation-Based Methods
Hyang Cui
LRM
112
0
0
22 May 2025
BAGELS: Benchmarking the Automated Generation and Extraction of Limitations from Scholarly Text
Ibrahim Al Azher
Miftahul Jannat Mokarrama
Zhishuai Guo
Sagnik Ray Choudhury
Hamed Alhoori
94
0
0
22 May 2025
Continually Self-Improving Language Models for Bariatric Surgery Question-Answering
Yash Kumar Atri
Thomas H Shin
Thomas Hartvigsen
82
1
0
22 May 2025
Redemption Score: An Evaluation Framework to Rank Image Captions While Redeeming Image Semantics and Language Pragmatics
Ashim Dahal
Ankit Ghimire
Saydul Akbar Murad
Nick Rahimi
54
0
0
22 May 2025
MuseRAG: Idea Originality Scoring At Scale
Ali Sarosh Bangash
Krish Veera
Ishfat Abrar Islam
Raiyan Abdul Baten
LRM
56
0
0
22 May 2025
The Pursuit of Empathy: Evaluating Small Language Models for PTSD Dialogue Support
Suhas BN
Yash Mahajan
Dominik Mattioli
Rosa I. Arriaga
Chris W. Wiese
Saeed Abdullah
AI4MH
80
1
0
21 May 2025
Long-Form Information Alignment Evaluation Beyond Atomic Facts
Danna Zheng
Mirella Lapata
Jeff Z. Pan
HILM
70
0
0
21 May 2025
A Qualitative Investigation into LLM-Generated Multilingual Code Comments and Automatic Evaluation Metrics
Jonathan Katzy
Yongcheng Huang
Gopal-Raj Panchu
Maksym Ziemlewski
Paris Loizides
Sander Vermeulen
Arie van Deursen
Maliheh Izadi
ELM
57
1
0
21 May 2025
Multi-Hop Question Generation via Dual-Perspective Keyword Guidance
Maodong Li
Longyin Zhang
Fang Kong
47
0
0
21 May 2025
Can Large Language Models be Effective Online Opinion Miners?
Ryang Heo
Yongsik Seo
Junseong Lee
Dongha Lee
50
0
0
21 May 2025
Can Large Language Models Understand Internet Buzzwords Through User-Generated Content
Chen Huang
Junkai Luo
Xinzuo Wang
Wenqiang Lei
Jiancheng Lv
68
0
0
21 May 2025
Aug2Search: Enhancing Facebook Marketplace Search with LLM-Generated Synthetic Data Augmentation
Ruijie Xi
He Ba
Hao Yuan
Rishu Agrawal
Arul Prakash
Ruoyan Long
Arul T. Prakash
SyDa
60
0
0
21 May 2025
SLMEval: Entropy-Based Calibration for Human-Aligned Evaluation of Large Language Models
Roland Daynauth
Christopher Clarke
Krisztian Flautner
Lingjia Tang
Jason Mars
ALM
26
0
0
21 May 2025
TransBench: Benchmarking Machine Translation for Industrial-Scale Applications
Haijun Li
Tianqi Shi
Zifu Shang
Yuxuan Han
Xueyu Zhao
...
Longyue Wang
Gongbo Tang
Weihua Luo
Zhao Xu
Kaifu Zhang
ELM
53
0
0
20 May 2025
QA-prompting: Improving Summarization with Large Language Models using Question-Answering
Neelabh Sinha
RALM
LRM
105
0
0
20 May 2025