1904.09675
BERTScore: Evaluating Text Generation with BERT
21 April 2019
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
Papers citing
"BERTScore: Evaluating Text Generation with BERT"
50 / 3,519 papers shown
HERA: Hybrid Edge-cloud Resource Allocation for Cost-Efficient AI Agents
Shiyi Liu
Haiying Shen
Shuai Che
Mahdi Ghandi
Mingqin Li
LLMAG
176
0
0
01 Apr 2025
CyberBOT: Towards Reliable Cybersecurity Education via Ontology-Grounded Retrieval Augmented Generation
Chengshuai Zhao
Riccardo De Maria
Tharindu Kumarage
Kumar Satvik Chaudhary
Garima Agrawal
Yiwen Li
Jongchan Park
Yuli Deng
Yiran Chen
Huan Liu
82
0
0
01 Apr 2025
Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation
Junyu Xie
Tengda Han
Max Bain
Arsha Nagrani
Eshika Khandelwal
Gül Varol
Weidi Xie
Andrew Zisserman
DiffM
VGen
111
0
0
01 Apr 2025
Does "Reasoning" with Large Language Models Improve Recognizing, Generating, and Reframing Unhelpful Thoughts?
Yilin Qi
Dong Won Lee
C. Breazeal
Hae Won Park
LRM
88
0
0
31 Mar 2025
Fair Dynamic Spectrum Access via Fully Decentralized Multi-Agent Reinforcement Learning
Yubo Zhang
Pedro Botelho
Trevor Gordon
Gil Zussman
I. Kadota
84
0
0
31 Mar 2025
A Scalable Framework for Evaluating Health Language Models
Neil Mallinar
A. Heydari
Xin Liu
Anthony Z. Faranesh
Brent Winslow
...
Mark Malhotra
Shwetak N. Patel
Javier L. Prieto
Daniel J. McDuff
Ahmed A. Metwally
LM&MA
89
2
0
30 Mar 2025
The Impact of Code-switched Synthetic Data Quality is Task Dependent: Insights from MT and ASR
Injy Hamed
Ngoc Thang Vu
Nizar Habash
74
0
0
30 Mar 2025
Improving the Context Length and Efficiency of Code Retrieval for Tracing Security Vulnerability Fixes
Xueqing Liu
Jiangrui Zheng
Guanqun Yang
Siyan Wen
Qiushi Liu
Xiaoyin Wang
92
0
0
29 Mar 2025
Penrose Tiled Low-Rank Compression and Section-Wise Q&A Fine-Tuning: A General Framework for Domain-Specific Large Language Model Adaptation
Chuan-Wei Kuo
Siyu Chen
Chenqi Yan
Yu Liu
103
0
0
28 Mar 2025
An evaluation of LLMs and Google Translate for translation of selected Indian languages via sentiment and semantic analyses
Rohitash Chandra
Aryan Chaudhary
Yeshwanth Rayavarapu
124
0
0
27 Mar 2025
JEEM: Vision-Language Understanding in Four Arabic Dialects
Karima Kadaoui
Hanin Atwany
Hamdan Al-Ali
Abdelrahman Mohamed
Ali Mekky
Sergei Tilga
Natalia Fedorova
Ekaterina Artemova
Hanan Aldarmaki
Yova Kementchedjhieva
VLM
93
4
0
27 Mar 2025
Evaluating book summaries from internal knowledge in Large Language Models: a cross-model and semantic consistency approach
Javier Coronado-Blázquez
HILM
ELM
114
0
0
27 Mar 2025
Low-resource Information Extraction with the European Clinical Case Corpus
Soumitra Ghosh
Begona Altuna
Saeed Farzi
Pietro Ferrazzi
A. Lavelli
Giulia Mezzanotte
Manuela Speranza
Bernardo Magnini
83
1
0
26 Mar 2025
StableToolBench-MirrorAPI: Modeling Tool Environments as Mirrors of 7,000+ Real-World APIs
Zhicheng Guo
Sijie Cheng
Yuchen Niu
Hao Wang
Sicheng Zhou
Wenbing Huang
Yang Liu
CLL
OffRL
206
0
0
26 Mar 2025
TN-Eval: Rubric and Evaluation Protocols for Measuring the Quality of Behavioral Therapy Notes
Raj Sanjay Shah
Lei Xu
Qianchu Liu
Jon Burnsky
Drew Bertagnolli
Chaitanya P. Shivade
LM&MA
141
0
0
26 Mar 2025
Iterative Hypothesis Generation for Scientific Discovery with Monte Carlo Nash Equilibrium Self-Refining Trees
Gollam Rabby
Diyana Muhammed
Prasenjit Mitra
Sören Auer
79
2
0
25 Mar 2025
Contextual Metric Meta-Evaluation by Measuring Local Metric Accuracy
Athiya Deviyani
Fernando Diaz
85
0
0
25 Mar 2025
ImageSet2Text: Describing Sets of Images through Text
Piera Riccio
F. Galati
Kajetan Schweighofer
Noa Garcia
Nuria Oliver
VLM
CoGe
114
0
0
25 Mar 2025
Understanding and Improving Information Preservation in Prompt Compression for LLMs
Weronika Łajewska
Momchil Hardalov
Laura Aina
Neha Anna John
Hang Su
Lluís Marquez
160
1
0
24 Mar 2025
SPHERE: An Evaluation Card for Human-AI Systems
Qianou Ma
Dora Zhao
Xinran Zhao
Chenglei Si
Chenyang Yang
Ryan Louie
Ehud Reiter
Diyi Yang
Tongshuang Wu
ALM
130
2
0
24 Mar 2025
A Survey of Large Language Model Agents for Question Answering
Murong Yue
LLMAG
LM&MA
ELM
112
5
0
24 Mar 2025
Understanding the Effects of RLHF on the Quality and Detectability of LLM-Generated Texts
Beining Xu
Arkaitz Zubiaga
DeLMO
119
0
0
23 Mar 2025
AGIR: Assessing 3D Gait Impairment with Reasoning based on LLMs
Diwei Wang
Cédric Bobenrieth
Hyewon Seo
LRM
65
0
0
23 Mar 2025
MedPlan: A Two-Stage RAG-Based System for Personalized Medical Plan Generation
Hsin-Ling Hsu
Cong-Tinh Dao
Luning Wang
Zitao Shuai
T. M. Phan
...
Dongsheng Luo
Wen-Chih Peng
Feng Liu
Fang-Ming Hung
Chenwei Wu
120
1
0
23 Mar 2025
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
Wenxuan Zhu
Bing Li
Cheng Zheng
Jinjie Mai
Jun-Cheng Chen
...
Abdullah Hamdi
Sara Rojas Martinez
Chia-Wen Lin
Mohamed Elhoseiny
Bernard Ghanem
VLM
102
1
0
22 Mar 2025
Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans?
Jeremy Barnes
Naiara Perez
Alba Bonet-Jover
Begoña Altuna
101
2
0
21 Mar 2025
CoKe: Customizable Fine-Grained Story Evaluation via Chain-of-Keyword Rationalization
Brihi Joshi
Sriram Venkatapathy
Mohit Bansal
Nanyun Peng
Haw-Shiuan Chang
LRM
128
0
0
21 Mar 2025
Extract, Match, and Score: An Evaluation Paradigm for Long Question-context-answer Triplets in Financial Analysis
Bo Hu
Han Yuan
Vlad Pandelea
Wuqiong Luo
Yingzhu Zhao
Zheng Ma
90
0
0
20 Mar 2025
Typed-RAG: Type-aware Multi-Aspect Decomposition for Non-Factoid Question Answering
DongGeon Lee
Ahjeong Park
Hyeri Lee
Hyeonseo Nam
Yunho Maeng
62
2
0
20 Mar 2025
Automatically Generating Chinese Homophone Words to Probe Machine Translation Estimation Systems
Shenbin Qian
Constantin Orasan
Diptesh Kanojia
Félix do Carmo
89
0
0
20 Mar 2025
Can one size fit all?: Measuring Failure in Multi-Document Summarization Domain Transfer
Alexandra DeLucia
Mark Dredze
79
0
0
20 Mar 2025
Towards Lighter and Robust Evaluation for Retrieval Augmented Generation
Alex-Razvan Ispas
Charles-Elie Simon
Fabien Caspani
Vincent Guigue
RALM
102
0
0
20 Mar 2025
FutureGen: LLM-RAG Approach to Generate the Future Work of Scientific Article
Ibrahim Al Azher
Miftahul Jannat Mokarrama
Zhishuai Guo
Sagnik Ray Choudhury
Hamed Alhoori
LLMAG
99
2
0
20 Mar 2025
Uncertainty Quantification and Confidence Calibration in Large Language Models: A Survey
Xiaoou Liu
Tiejin Chen
Longchao Da
Chacha Chen
Zhen Lin
Hua Wei
HILM
136
8
0
20 Mar 2025
EmpathyAgent: Can Embodied Agents Conduct Empathetic Actions?
Xinyan Chen
Jiaxin Ge
Hongming Dai
Qiang Zhou
Qiuxuan Feng
Jingtong Hu
Yun Wang
Jiaming Liu
Shanghang Zhang
LM&Ro
97
0
0
19 Mar 2025
ECLAIR: Enhanced Clarification for Interactive Responses
John Murzaku
Zifan Liu
Md Mehrab Tanjim
Vaishnavi Muppala
Xiang Chen
Yunyao Li
71
0
0
19 Mar 2025
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
Austin Xu
Srijan Bansal
Yifei Ming
Semih Yavuz
Shafiq Joty
ELM
156
3
0
19 Mar 2025
Am I eligible? Natural Language Inference for Clinical Trial Patient Recruitment: the Patient's Point of View
Mathilde Aguiar
Pierre Zweigenbaum
Nona Naderi
LM&MA
134
0
0
19 Mar 2025
Inspecting the Representation Manifold of Differentially-Private Text
Stefan Arnold
68
0
0
19 Mar 2025
SceneEval: Evaluating Semantic Coherence in Text-Conditioned 3D Indoor Scene Synthesis
Hou In Ivan Tam
Hou In Derek Pun
Austin T. Wang
Angel X. Chang
Manolis Savva
105
1
0
18 Mar 2025
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives
Sara Sarto
Marcella Cornia
Rita Cucchiara
84
1
0
18 Mar 2025
JuDGE: Benchmarking Judgment Document Generation for Chinese Legal System
Weihang Su
Baoqing Yue
Qingyao Ai
Yiran Hu
Jiaqi Li
C. Wang
Kaiyuan Zhang
Yueyue Wu
Yixiao Liu
AILaw
ELM
168
3
0
18 Mar 2025
ConSCompF: Consistency-focused Similarity Comparison Framework for Generative Large Language Models
Alexey Karev
Dong Xu
144
0
0
18 Mar 2025
ExDDV: A New Dataset for Explainable Deepfake Detection in Video
Vlad Hondru
Eduard Hogea
Darian M. Onchis
Radu Tudor Ionescu
147
2
0
18 Mar 2025
Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions
Wan Ju Kang
Eunki Kim
Na Min An
Sangryul Kim
Haemin Choi
Ki Hoon Kwak
James Thorne
85
0
0
17 Mar 2025
A Survey on Transformer Context Extension: Approaches and Evaluation
Yijun Liu
Jinzheng Yu
Yang Xu
Zhongyang Li
Qingfu Zhu
LLMAG
128
3
0
17 Mar 2025
Overview of the NTCIR-18 Automatic Evaluation of LLMs (AEOLLM) Task
Junjie Chen
Haitao Li
Zhumin Chu
Yixiao Liu
Qingyao Ai
50
0
0
17 Mar 2025
HICD: Hallucination-Inducing via Attention Dispersion for Contrastive Decoding to Mitigate Hallucinations in Large Language Models
Xinyan Jiang
Hang Ye
Yongxin Zhu
Xiaoying Zheng
Zikang Chen
Jun Gong
110
0
0
17 Mar 2025
Unequal Opportunities: Examining the Bias in Geographical Recommendations by Large Language Models
Shiran Dudy
Thulasi Tholeti
R. Ramachandranpillai
Muhammad Ali
Toby Jia-Jun Li
Ricardo Baeza-Yates
115
1
0
16 Mar 2025
MT-RewardTree: A Comprehensive Framework for Advancing LLM-Based Machine Translation via Reward Modeling
Zhaopeng Feng
Jiahan Ren
Jiayuan Su
Jiamei Zheng
Zhihang Tang
Hongwei Wang
Zuozhu Liu
LRM
152
2
0
15 Mar 2025