arXiv:1904.09675
BERTScore: Evaluating Text Generation with BERT
21 April 2019
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
Papers citing "BERTScore: Evaluating Text Generation with BERT"
50 / 3,519 papers shown
Title
SEval-Ex: A Statement-Level Framework for Explainable Summarization Evaluation
Tanguy Herserant
Vincent Guigue
ELM
54
0
0
04 May 2025
LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning
Joy Lim Jia Yin
Daniel Zhang-Li
Jifan Yu
Haoyang Li
Shangqing Tu
...
Zhiyuan Liu
Huiqin Liu
Lei Hou
Juanzi Li
Bin Xu
77
0
0
04 May 2025
An LLM-Empowered Low-Resolution Vision System for On-Device Human Behavior Understanding
Siyang Jiang
Bufang Yang
Lilin Xu
Mu Yuan
Yeerzhati Abudunuer
...
Liekang Zeng
Hongkai Chen
Zhenyu Yan
Xiaofan Jiang
Guoliang Xing
VLM
372
0
0
03 May 2025
CAMOUFLAGE: Exploiting Misinformation Detection Systems Through LLM-driven Adversarial Claim Transformation
Mazal Bethany
Nishant Vishwamitra
Cho-Yu Chiang
Peyman Najafirad
AAML
56
0
0
03 May 2025
ReLI: A Language-Agnostic Approach to Human-Robot Interaction
Linus Nwankwo
Bjoern Ellensohn
Ozan Özdenizci
Elmar Rueckert
LM&Ro
239
0
0
03 May 2025
LookAlike: Consistent Distractor Generation in Math MCQs
Nisarg Parikh
Nigel Fernandez
Alexander Scarlatos
Simon Woodhead
Andrew Lan
112
0
0
03 May 2025
Enhancing the Learning Experience: Using Vision-Language Models to Generate Questions for Educational Videos
Markos Stamatakis
Joshua Berger
Christian Wartena
Ralph Ewerth
Anett Hoppe
AI4Ed
118
0
0
03 May 2025
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding
Zongxia Li
Xiyang Wu
Guangyao Shi
Yubin Qin
Hongyang Du
Tianyi Zhou
Dinesh Manocha
Jordan Lee Boyd-Graber
MLLM
138
0
0
02 May 2025
CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning
Tsai-Ning Wang
Lin-Lin Chen
Neil Zeghidour
Aaqib Saeed
AuLLM
LM&MA
413
0
0
02 May 2025
Combining LLMs with Logic-Based Framework to Explain MCTS
Ziyan An
Xia Wang
Hendrik Baier
Zirong Chen
A. Dubey
Taylor T. Johnson
Jonathan Sprinkle
Ayan Mukhopadhyay
Meiyi Ma
78
2
0
01 May 2025
How Real Are Synthetic Therapy Conversations? Evaluating Fidelity in Prolonged Exposure Dialogues
Suhas BN
Dominik Mattioli
Saeed Abdullah
Rosa I. Arriaga
Andrew M. Sherrill
89
3
0
30 Apr 2025
SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding
Yiming Lei
Chenkai Zhang
Ziqiang Liu
Haitao Leng
Shaoguo Liu
Tingting Gao
Qingjie Liu
Yunhong Wang
AI4TS
149
0
0
30 Apr 2025
Synergy-CLIP: Extending CLIP with Multi-modal Integration for Robust Representation Learning
Sangyeon Cho
Jangyeong Jeon
Mingi Kim
Junyeong Kim
CLIP
VLM
239
0
0
30 Apr 2025
ConSens: Assessing context grounding in open-book question answering
Ivan Vankov
Matyo Ivanov
Adriana Correia
Victor Botev
ELM
182
0
0
30 Apr 2025
Investigating the Effect of Parallel Data in the Cross-Lingual Transfer for Vision-Language Encoders
Andrei-Alexandru Manea
Jindřich Libovický
VLM
121
0
0
30 Apr 2025
AKIBoards: A Structure-Following Multiagent System for Predicting Acute Kidney Injury
David L. Gordon
P. Petousis
S. Nicholas
Alex A. T. Bui
FAtt
99
0
0
29 Apr 2025
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts
Hanhua Hong
Chenghao Xiao
Yang Wang
Y. Liu
Wenge Rong
Chenghua Lin
85
0
0
29 Apr 2025
UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities
Woongyeong Yeo
Kangsan Kim
Soyeong Jeong
Jinheon Baek
Sung Ju Hwang
148
1
0
29 Apr 2025
Can LLMs Be Trusted for Evaluating RAG Systems? A Survey of Methods and Datasets
Lorenz Brehme
Thomas Ströhle
Ruth Breu
210
0
0
28 Apr 2025
Knowledge Distillation of Domain-adapted LLMs for Question-Answering in Telecom
Rishika Sen
Sujoy Roychowdhury
Sumit Soman
H. G. Ranjani
Srikhetra Mohanty
126
0
0
28 Apr 2025
Context Selection and Rewriting for Video-based Educational Question Generation
Mengxia Yu
Bang Nguyen
Olivia Zino
Meng Jiang
174
0
0
28 Apr 2025
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text
Jiageng Wu
Bowen Gu
Ren Zhou
Kevin Xie
Doug Snyder
...
Siyang Song
Jonathan H. Chen
Santiago Romero-Brufau
K. J. Lin
Jie Yang
LM&MA
ELM
193
2
0
28 Apr 2025
Large Language Models are Qualified Benchmark Builders: Rebuilding Pre-Training Datasets for Advancing Code Intelligence Tasks
Kang Yang
Xinjun Mao
Shangwen Wang
Yanjie Wang
Tanghaoran Zhang
Bo Lin
Yihao Qin
Zhang Zhang
Yao Lu
Kamal Al-Sabahi
ALM
283
1
0
28 Apr 2025
Enhancing Surgical Documentation through Multimodal Visual-Temporal Transformers and Generative AI
Hugo Georgenthum
Cristian Cosentino
Fabrizio Marozzo
Pietro Liò
MedIm
443
0
0
28 Apr 2025
ClimaEmpact: Domain-Aligned Small Language Models and Datasets for Extreme Weather Analytics
Deeksha Varshney
Keane Ong
Rui Mao
Min Zhang
G. Mengaldo
70
1
0
27 Apr 2025
Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers
Dylan Bouchard
Mohit Singh Chauhan
HILM
153
0
0
27 Apr 2025
Explanatory Summarization with Discourse-Driven Planning
Dongqi Liu
Xi Yu
Vera Demberg
Mirella Lapata
120
1
0
27 Apr 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Xuzhao Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Tianwei Zhang
ALM
ELM
253
7
0
26 Apr 2025
An Empirical Study of Evaluating Long-form Question Answering
Ning Xian
Yixing Fan
Ruqing Zhang
Maarten de Rijke
Jiafeng Guo
ELM
53
0
0
25 Apr 2025
Evaluating Evaluation Metrics -- The Mirage of Hallucination Detection
Atharva Kulkarni
Yuan-kang Zhang
Joel Ruben Antony Moniz
Xiou Ge
Bo-Hsiang Tseng
Dhivya Piraviperumal
Siyang Song
Hong-ye Yu
HILM
114
0
0
25 Apr 2025
The Ultimate Cookbook for Invisible Poison: Crafting Subtle Clean-Label Text Backdoors with Style Attributes
Wencong You
Daniel Lowd
94
0
0
24 Apr 2025
CoheMark: A Novel Sentence-Level Watermark for Enhanced Text Quality
Junyan Zhang
Shuliang Liu
Aiwei Liu
Yubo Gao
Jiajun Li
Xiaojie Gu
Xuming Hu
WaLM
111
3
0
24 Apr 2025
An Empirical Study on Prompt Compression for Large Language Models
Zhenru Zhang
Jinyi Li
Yihuai Lan
Xinze Wang
Hao Wang
MQ
84
0
0
24 Apr 2025
Evaluating and Mitigating Bias in AI-Based Medical Text Generation
Xiuying Chen
Tairan Wang
Juexiao Zhou
Zirui Song
Xin Gao
Wei Wei
MedIm
79
3
0
24 Apr 2025
How Effective are Generative Large Language Models in Performing Requirements Classification?
Waad Alhoshan
Alessio Ferrari
Liping Zhao
75
0
0
23 Apr 2025
Leveraging LLMs as Meta-Judges: A Multi-Agent Framework for Evaluating LLM Judgments
Yuante Li
Jama Hussein Mohamud
Chongren Sun
Di Wu
Benoit Boulet
LLMAG
ELM
123
1
0
23 Apr 2025
Planning with Diffusion Models for Target-Oriented Dialogue Systems
Hanwen Du
Bo Peng
Xia Ning
74
0
0
23 Apr 2025
ConTextual: Improving Clinical Text Summarization in LLMs with Context-preserving Token Filtering and Knowledge Graphs
Fahmida Liza Piya
Rahmatollah Beheshti
289
0
0
23 Apr 2025
CiteFix: Enhancing RAG Accuracy Through Post-Processing Citation Correction
Harsh Maheshwari
Srikanth Tenneti
Alwarappan Nakkiran
3DV
102
1
0
22 Apr 2025
The Viability of Crowdsourcing for RAG Evaluation
Lukas Gienapp
Tim Hagen
Maik Fröbe
Matthias Hagen
Benno Stein
Martin Potthast
Harrisen Scells
121
0
0
22 Apr 2025
Automated Creativity Evaluation for Large Language Models: A Reference-Based Approach
Ruizhe Li
Chiwei Zhu
Benfeng Xu
Xiaorui Wang
Zhendong Mao
79
2
0
22 Apr 2025
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Ziqiao Ma
Jing Ding
Xuejun Zhang
Dezhi Luo
Jiahe Ding
Sihan Xu
Yuchen Huang
Run Peng
Joyce Chai
235
0
0
22 Apr 2025
aiXamine: Simplified LLM Safety and Security
Fatih Deniz
Dorde Popovic
Yazan Boshmaf
Euisuh Jeong
M. Ahmad
Sanjay Chawla
Issa M. Khalil
ELM
337
0
0
21 Apr 2025
Retrieval Augmented Generation Evaluation in the Era of Large Language Models: A Comprehensive Survey
Aoran Gan
Hao Yu
Kai Zhang
Qi Liu
Wenyu Yan
Zhenya Huang
Shiwei Tong
Guoping Hu
RALM
3DV
87
1
0
21 Apr 2025
Med-CoDE: Medical Critique based Disagreement Evaluation Framework
Mohit Gupta
Akiko Aizawa
R. Shah
LM&MA
ELM
56
1
0
21 Apr 2025
Automatic Evaluation Metrics for Document-level Translation: Overview, Challenges and Trends
Jiaxin Guo
Xiaoyu Chen
Zhiqiang Rao
Jinlong Yang
Zongyao Li
Hengchao Shang
Daimeng Wei
Hao Yang
77
0
0
21 Apr 2025
Stay Hungry, Stay Foolish: On the Extended Reading Articles Generation with LLMs
Yow-Fu Liou
Yu-Chien Tang
An-Zi Yen
AI4Ed
109
0
0
21 Apr 2025
Translation Analytics for Freelancers: I. Introduction, Data Preparation, Baseline Evaluations
Yuri Balashov
Alex Balashov
Shiho Fukuda Koski
89
0
0
20 Apr 2025
A Hierarchical Framework for Measuring Scientific Paper Innovation via Large Language Models
Hongming Tan
Shaoxiong Zhan
Fengwei Jia
Hai-Tao Zheng
Wai Kin Victor Chan
77
0
0
20 Apr 2025
PROMPTEVALS: A Dataset of Assertions and Guardrails for Custom Production Large Language Model Pipelines
Reya Vir
Shreya Shankar
Harrison Chase
Will Fu-Hinthorn
Aditya G. Parameswaran
AI4TS
85
0
0
20 Apr 2025