ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1904.09675
  4. Cited By
BERTScore: Evaluating Text Generation with BERT
v1v2v3 (latest)

BERTScore: Evaluating Text Generation with BERT

21 April 2019
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
ArXiv (abs)PDFHTML

Papers citing "BERTScore: Evaluating Text Generation with BERT"

50 / 3,519 papers shown
Title
SEval-Ex: A Statement-Level Framework for Explainable Summarization Evaluation
SEval-Ex: A Statement-Level Framework for Explainable Summarization Evaluation
Tanguy Herserant
Vincent Guigue
ELM
54
0
0
04 May 2025
LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning
LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning
Joy Lim Jia Yin
Daniel Zhang-Li
Jifan Yu
Haoyang Li
Shangqing Tu
...
Zhiyuan Liu
Huiqin Liu
Lei Hou
Juanzi Li
Bin Xu
77
0
0
04 May 2025
An LLM-Empowered Low-Resolution Vision System for On-Device Human Behavior Understanding
An LLM-Empowered Low-Resolution Vision System for On-Device Human Behavior Understanding
Siyang Jiang
Bufang Yang
Lilin Xu
Mu Yuan
Yeerzhati Abudunuer
...
Liekang Zeng
Hongkai Chen
Zhenyu Yan
Xiaofan Jiang
Guoliang Xing
VLM
372
0
0
03 May 2025
CAMOUFLAGE: Exploiting Misinformation Detection Systems Through LLM-driven Adversarial Claim Transformation
CAMOUFLAGE: Exploiting Misinformation Detection Systems Through LLM-driven Adversarial Claim Transformation
Mazal Bethany
Nishant Vishwamitra
Cho-Yu Chiang
Peyman Najafirad
AAML
56
0
0
03 May 2025
ReLI: A Language-Agnostic Approach to Human-Robot Interaction
ReLI: A Language-Agnostic Approach to Human-Robot Interaction
Linus Nwankwo
Bjoern Ellensohn
Ozan Özdenizci
Elmar Rueckert
LM&Ro
239
0
0
03 May 2025
LookAlike: Consistent Distractor Generation in Math MCQs
LookAlike: Consistent Distractor Generation in Math MCQs
Nisarg Parikh
Nigel Fernandez
Alexander Scarlatos
Simon Woodhead
Andrew Lan
112
0
0
03 May 2025
Enhancing the Learning Experience: Using Vision-Language Models to Generate Questions for Educational Videos
Enhancing the Learning Experience: Using Vision-Language Models to Generate Questions for Educational Videos
Markos Stamatakis
Joshua Berger
Christian Wartena
Ralph Ewerth
Anett Hoppe
AI4Ed
118
0
0
03 May 2025
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding
Zongxia Li
Xiyang Wu
Guangyao Shi
Yubin Qin
Hongyang Du
Tianyi Zhou
Dinesh Manocha
Jordan Lee Boyd-Graber
MLLM
138
0
0
02 May 2025
CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning
CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning
Tsai-Ning Wang
Lin-Lin Chen
Neil Zeghidour
Aaqib Saeed
AuLLMLM&MA
413
0
0
02 May 2025
Combining LLMs with Logic-Based Framework to Explain MCTS
Combining LLMs with Logic-Based Framework to Explain MCTS
Ziyan An
Xia Wang
Hendrik Baier
Zirong Chen
A. Dubey
Taylor T. Johnson
Jonathan Sprinkle
Ayan Mukhopadhyay
Meiyi Ma
78
2
0
01 May 2025
How Real Are Synthetic Therapy Conversations? Evaluating Fidelity in Prolonged Exposure Dialogues
How Real Are Synthetic Therapy Conversations? Evaluating Fidelity in Prolonged Exposure Dialogues
Suhas BN
Dominik Mattioli
Saeed Abdullah
Rosa I. Arriaga
Saeed Abdullah
Andrew M. Sherrill
89
3
0
30 Apr 2025
SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding
SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding
Yiming Lei
Chenkai Zhang
Ziqiang Liu
Haitao Leng
Shaoguo Liu
Tingting Gao
Qingjie Liu
Yunhong Wang
AI4TS
149
0
0
30 Apr 2025
Synergy-CLIP: Extending CLIP with Multi-modal Integration for Robust Representation Learning
Synergy-CLIP: Extending CLIP with Multi-modal Integration for Robust Representation Learning
Sangyeon Cho
Jangyeong Jeon
Mingi Kim
Junyeong Kim
CLIPVLM
239
0
0
30 Apr 2025
ConSens: Assessing context grounding in open-book question answering
ConSens: Assessing context grounding in open-book question answering
Ivan Vankov
Matyo Ivanov
Adriana Correia
Victor Botev
ELM
182
0
0
30 Apr 2025
Investigating the Effect of Parallel Data in the Cross-Lingual Transfer for Vision-Language Encoders
Investigating the Effect of Parallel Data in the Cross-Lingual Transfer for Vision-Language Encoders
Andrei-Alexandru Manea
Jindřich Libovický
VLM
121
0
0
30 Apr 2025
AKIBoards: A Structure-Following Multiagent System for Predicting Acute Kidney Injury
AKIBoards: A Structure-Following Multiagent System for Predicting Acute Kidney Injury
David L. Gordon
P. Petousis
S. Nicholas
Alex A. T. Bui
FAtt
99
0
0
29 Apr 2025
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts
Hanhua Hong
Chenghao Xiao
Yang Wang
Y. Liu
Wenge Rong
Chenghua Lin
85
0
0
29 Apr 2025
UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities
UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities
Woongyeong Yeo
Kangsan Kim
Soyeong Jeong
Jinheon Baek
Sung Ju Hwang
148
1
0
29 Apr 2025
Can LLMs Be Trusted for Evaluating RAG Systems? A Survey of Methods and Datasets
Can LLMs Be Trusted for Evaluating RAG Systems? A Survey of Methods and Datasets
Lorenz Brehme
Thomas Ströhle
Ruth Breu
210
0
0
28 Apr 2025
Knowledge Distillation of Domain-adapted LLMs for Question-Answering in Telecom
Knowledge Distillation of Domain-adapted LLMs for Question-Answering in Telecom
Rishika Sen
Sujoy Roychowdhury
Sumit Soman
H. G. Ranjani
Srikhetra Mohanty
126
0
0
28 Apr 2025
Context Selection and Rewriting for Video-based Educational Question Generation
Context Selection and Rewriting for Video-based Educational Question Generation
Mengxia Yu
Bang Nguyen
Olivia Zino
Meng Jiang
174
0
0
28 Apr 2025
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text
Jiageng Wu
Bowen Gu
Ren Zhou
Kevin Xie
Doug Snyder
...
Siyang Song
Jonathan H. Chen
Santiago Romero-Brufau
K. J. Lin
Jie Yang
LM&MAELM
193
2
0
28 Apr 2025
Large Language Models are Qualified Benchmark Builders: Rebuilding Pre-Training Datasets for Advancing Code Intelligence Tasks
Large Language Models are Qualified Benchmark Builders: Rebuilding Pre-Training Datasets for Advancing Code Intelligence Tasks
Kang Yang
Xinjun Mao
Shangwen Wang
Yanjie Wang
Tanghaoran Zhang
Bo Lin
Yihao Qin
Zhang Zhang
Yao Lu
Kamal Al-Sabahi
ALM
283
1
0
28 Apr 2025
Enhancing Surgical Documentation through Multimodal Visual-Temporal Transformers and Generative AI
Enhancing Surgical Documentation through Multimodal Visual-Temporal Transformers and Generative AI
Hugo Georgenthum
Cristian Cosentino
Fabrizio Marozzo
Pietro Liò
MedIm
443
0
0
28 Apr 2025
ClimaEmpact: Domain-Aligned Small Language Models and Datasets for Extreme Weather Analytics
ClimaEmpact: Domain-Aligned Small Language Models and Datasets for Extreme Weather Analytics
Deeksha Varshney
Keane Ong
Rui Mao
Min Zhang
G. Mengaldo
70
1
0
27 Apr 2025
Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers
Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers
Dylan Bouchard
Mohit Singh Chauhan
HILM
153
0
0
27 Apr 2025
Explanatory Summarization with Discourse-Driven Planning
Explanatory Summarization with Discourse-Driven Planning
Dongqi Liu
Xi Yu
Vera Demberg
Mirella Lapata
120
1
0
27 Apr 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Xuzhao Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Tianwei Zhang
ALMELM
253
7
0
26 Apr 2025
An Empirical Study of Evaluating Long-form Question Answering
An Empirical Study of Evaluating Long-form Question Answering
Ning Xian
Yixing Fan
Ruqing Zhang
Maarten de Rijke
Jiafeng Guo
ELM
53
0
0
25 Apr 2025
Evaluating Evaluation Metrics -- The Mirage of Hallucination Detection
Evaluating Evaluation Metrics -- The Mirage of Hallucination Detection
Atharva Kulkarni
Yuan-kang Zhang
Joel Ruben Antony Moniz
Xiou Ge
Bo-Hsiang Tseng
Dhivya Piraviperumal
Siyang Song
Hong-ye Yu
HILM
114
0
0
25 Apr 2025
The Ultimate Cookbook for Invisible Poison: Crafting Subtle Clean-Label Text Backdoors with Style Attributes
The Ultimate Cookbook for Invisible Poison: Crafting Subtle Clean-Label Text Backdoors with Style Attributes
Wencong You
Daniel Lowd
94
0
0
24 Apr 2025
CoheMark: A Novel Sentence-Level Watermark for Enhanced Text Quality
CoheMark: A Novel Sentence-Level Watermark for Enhanced Text Quality
Junyan Zhang
Shuliang Liu
Aiwei Liu
Yubo Gao
Jiajun Li
Xiaojie Gu
Xuming Hu
WaLM
111
3
0
24 Apr 2025
An Empirical Study on Prompt Compression for Large Language Models
An Empirical Study on Prompt Compression for Large Language Models
Zhenru Zhang
Jinyi Li
Yihuai Lan
Xinze Wang
Hao Wang
MQ
84
0
0
24 Apr 2025
Evaluating and Mitigating Bias in AI-Based Medical Text Generation
Evaluating and Mitigating Bias in AI-Based Medical Text Generation
Xiuying Chen
Tairan Wang
Juexiao Zhou
Zirui Song
Xin Gao
Wei Wei
MedIm
79
3
0
24 Apr 2025
How Effective are Generative Large Language Models in Performing Requirements Classification?
How Effective are Generative Large Language Models in Performing Requirements Classification?
Waad Alhoshan
Alessio Ferrari
Liping Zhao
75
0
0
23 Apr 2025
Leveraging LLMs as Meta-Judges: A Multi-Agent Framework for Evaluating LLM Judgments
Leveraging LLMs as Meta-Judges: A Multi-Agent Framework for Evaluating LLM Judgments
Yuante Li
Jama Hussein Mohamud
Chongren Sun
Di Wu
Benoit Boulet
LLMAGELM
123
1
0
23 Apr 2025
Planning with Diffusion Models for Target-Oriented Dialogue Systems
Planning with Diffusion Models for Target-Oriented Dialogue Systems
Hanwen Du
Bo Peng
Xia Ning
74
0
0
23 Apr 2025
ConTextual: Improving Clinical Text Summarization in LLMs with Context-preserving Token Filtering and Knowledge Graphs
ConTextual: Improving Clinical Text Summarization in LLMs with Context-preserving Token Filtering and Knowledge Graphs
Fahmida Liza Piya
Rahmatollah Beheshti
289
0
0
23 Apr 2025
CiteFix: Enhancing RAG Accuracy Through Post-Processing Citation Correction
CiteFix: Enhancing RAG Accuracy Through Post-Processing Citation Correction
Harsh Maheshwari
Srikanth Tenneti
Alwarappan Nakkiran
3DV
102
1
0
22 Apr 2025
The Viability of Crowdsourcing for RAG Evaluation
The Viability of Crowdsourcing for RAG Evaluation
Lukas Gienapp
Tim Hagen
Maik Fröbe
Matthias Hagen
Benno Stein
Martin Potthast
Harrisen Scells
121
0
0
22 Apr 2025
Automated Creativity Evaluation for Large Language Models: A Reference-Based Approach
Automated Creativity Evaluation for Large Language Models: A Reference-Based Approach
Ruizhe Li
Chiwei Zhu
Benfeng Xu
Xiaorui Wang
Zhendong Mao
79
2
0
22 Apr 2025
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Ziqiao Ma
Jing Ding
Xuejun Zhang
Dezhi Luo
Jiahe Ding
Sihan Xu
Yuchen Huang
Run Peng
Joyce Chai
235
0
0
22 Apr 2025
aiXamine: Simplified LLM Safety and Security
aiXamine: Simplified LLM Safety and Security
Fatih Deniz
Dorde Popovic
Yazan Boshmaf
Euisuh Jeong
M. Ahmad
Sanjay Chawla
Issa M. Khalil
ELM
337
0
0
21 Apr 2025
Retrieval Augmented Generation Evaluation in the Era of Large Language Models: A Comprehensive Survey
Retrieval Augmented Generation Evaluation in the Era of Large Language Models: A Comprehensive Survey
Aoran Gan
Hao Yu
Kai Zhang
Qi Liu
Wenyu Yan
Zhenya Huang
Shiwei Tong
Guoping Hu
RALM3DV
87
1
0
21 Apr 2025
Med-CoDE: Medical Critique based Disagreement Evaluation Framework
Med-CoDE: Medical Critique based Disagreement Evaluation Framework
Mohit Gupta
Akiko Aizawa
R. Shah
LM&MAELM
56
1
0
21 Apr 2025
Automatic Evaluation Metrics for Document-level Translation: Overview, Challenges and Trends
Automatic Evaluation Metrics for Document-level Translation: Overview, Challenges and Trends
Jiaxin Guo
Xiaoyu Chen
Zhiqiang Rao
Jinlong Yang
Zongyao Li
Hengchao Shang
Daimeng Wei
Hao Yang
77
0
0
21 Apr 2025
Stay Hungry, Stay Foolish: On the Extended Reading Articles Generation with LLMs
Stay Hungry, Stay Foolish: On the Extended Reading Articles Generation with LLMs
Yow-Fu Liou
Yu-Chien Tang
An-Zi Yen
AI4Ed
109
0
0
21 Apr 2025
Translation Analytics for Freelancers: I. Introduction, Data Preparation, Baseline Evaluations
Translation Analytics for Freelancers: I. Introduction, Data Preparation, Baseline Evaluations
Yuri Balashov
Alex Balashov
Shiho Fukuda Koski
89
0
0
20 Apr 2025
A Hierarchical Framework for Measuring Scientific Paper Innovation via Large Language Models
A Hierarchical Framework for Measuring Scientific Paper Innovation via Large Language Models
Hongming Tan
Shaoxiong Zhan
Fengwei Jia
Hai-Tao Zheng
Wai Kin Victor Chan
77
0
0
20 Apr 2025
PROMPTEVALS: A Dataset of Assertions and Guardrails for Custom Production Large Language Model Pipelines
PROMPTEVALS: A Dataset of Assertions and Guardrails for Custom Production Large Language Model Pipelines
Reya Vir
Shreya Shankar
Harrison Chase
Will Fu-Hinthorn
Aditya G. Parameswaran
AI4TS
85
0
0
20 Apr 2025
Previous
123456...697071
Next