1904.09675
Cited By
v1
v2
v3 (latest)
BERTScore: Evaluating Text Generation with BERT
21 April 2019
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
Papers citing
"BERTScore: Evaluating Text Generation with BERT"
50 / 3,519 papers shown
Seeing Sarcasm Through Different Eyes: Analyzing Multimodal Sarcasm Perception in Large Vision-Language Models
Junjie Chen
Xuyang Liu
Subin Huang
Linfeng Zhang
Hang Yu
103
0
0
15 Mar 2025
Cross-Modal Learning for Music-to-Music-Video Description Generation
Zhuoyuan Mao
Mengjie Zhao
Qiyu Wu
Zhi-Wei Zhong
Wei-Hsiang Liao
Hiromi Wakaki
Yuki Mitsufuji
DiffM
VGen
113
0
0
14 Mar 2025
RESPONSE: Benchmarking the Ability of Language Models to Undertake Commonsense Reasoning in Crisis Situation
Aissatou Diallo
Antonis Bikakis
Luke Dickens
Anthony Hunter
Rob Miller
ReLM
LRM
151
1
0
14 Mar 2025
OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs
Ivan Kartáč
Mateusz Lango
Ondrej Dusek
ELM
85
1
0
14 Mar 2025
RONA: Pragmatically Diverse Image Captioning with Coherence Relations
Aashish Anantha Ramakrishnan
Aadarsh Anantha Ramakrishnan
Dongwon Lee
111
1
0
14 Mar 2025
CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning
Hao Cui
Zahra Shamsi
Gowoon Cheon
Xuejian Ma
Shutong Li
...
Eun-Ah Kim
M. Brenner
Viren Jain
Sameera Ponda
Subhashini Venugopalan
ELM
LRM
140
4
0
14 Mar 2025
SCE: Scalable Consistency Ensembles Make Blackbox Large Language Model Generation More Reliable
Jiaxin Zhang
Zechao Li
Wendi Cui
Kamalika Das
Bradley Malin
Sricharan Kumar
111
0
0
13 Mar 2025
Ensemble Learning for Large Language Models in Text and Code Generation: A Survey
Mari Ashiga
Wei Jie
Fan Wu
Vardan K. Voskanyan
Fateme Dinmohammadi
P. Brookes
Jingzhi Gong
Zheng Wang
102
0
0
13 Mar 2025
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
Yucheng Suo
Fan Ma
Kaixin Shen
Linchao Zhu
Yi Yang
VLM
86
0
0
12 Mar 2025
Prompt Inference Attack on Distributed Large Language Model Inference Frameworks
Xinjian Luo
Ting Yu
X. Xiao
AAML
SILM
146
1
0
12 Mar 2025
Leveraging Retrieval Augmented Generative LLMs For Automated Metadata Description Generation to Enhance Data Catalogs
Mayank Singh
Abhijeet Kumar
Sasidhar Donaparthi
Gayatri Karambelkar
119
0
0
12 Mar 2025
Exploring the Word Sense Disambiguation Capabilities of Large Language Models
Pierpaolo Basile
Lucia Siciliani
Elio Musacchio
Giovanni Semeraro
68
0
0
11 Mar 2025
NSF-SciFy: Mining the NSF Awards Database for Scientific Claims
D. Rao
Weiqiu You
Eric Wong
Chris Callison-Burch
102
0
0
11 Mar 2025
DAFE: LLM-Based Evaluation Through Dynamic Arbitration for Free-Form Question-Answering
Sher Badshah
Hassan Sajjad
134
1
0
11 Mar 2025
Odysseus Navigates the Sirens' Song: Dynamic Focus Decoding for Factual and Diverse Open-Ended Text Generation
Wen Luo
Feifan Song
Wei Li
Guangyue Peng
Shaohang Wei
Houfeng Wang
AI4CE
94
0
0
11 Mar 2025
Fine-Tuning LLMs for Report Summarization: Analysis on Supervised and Unsupervised Data
Swati Rallapalli
Shannon Gallagher
Andrew O. Mellinger
Jasmine Ratchford
Anusha Sinha
Tyler Brooks
William R. Nichols
Nick Winski
Bryan Brown
78
0
0
10 Mar 2025
A Novel Trustworthy Video Summarization Algorithm Through a Mixture of LoRA Experts
Wenzhuo Du
G. Wang
Guancheng Chen
Hang Zhao
Xiaochen Li
Jian Gao
483
0
0
08 Mar 2025
Unlocking Pretrained LLMs for Motion-Related Multimodal Generation: A Fine-Tuning Approach to Unify Diffusion and Next-Token Prediction
Shinichi Tanaka
Zhao Wang
Yoichi Kato
Jun Ohya
DiffM
82
0
0
08 Mar 2025
CUPCase: Clinically Uncommon Patient Cases and Diagnoses Dataset
Oriel Perets
Ofir Ben Shoham
Nir Grinberg
Nadav Rappoport
ELM
64
0
0
08 Mar 2025
Mitigating Memorization in LLMs using Activation Steering
Manan Suri
Nishit Anand
Amisha Bhaskar
LLMSV
117
2
0
08 Mar 2025
RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs
Zhongzhan Huang
Guoming Ling
Vincent S. Liang
Yupei Lin
Yandong Chen
Shanshan Zhong
Hefeng Wu
LRM
206
7
0
08 Mar 2025
Statistical Deficiency for Task Inclusion Estimation
Loïc Fosse
Frédéric Béchet
Benoit Favre
Géraldine Damnati
Gwénolé Lecorvé
Maxime Darrin
Philippe Formont
Pablo Piantanida
528
0
0
07 Mar 2025
QG-SMS: Enhancing Test Item Analysis via Student Modeling and Simulation
Bang Nguyen
Tingting Du
Mengxia Yu
Lawrence Angrave
Meng Jiang
AI4Ed
111
0
0
07 Mar 2025
Dynamic Knowledge Integration for Evidence-Driven Counter-Argument Generation with Large Language Models
Anar Yeginbergen
Maite Oronoz
Rodrigo Agerri
138
0
0
07 Mar 2025
Development and Enhancement of Text-to-Image Diffusion Models
Rajdeep Roshan Sahu
VLM
158
0
0
07 Mar 2025
Learning and generalization of robotic dual-arm manipulation of boxes from demonstrations via Gaussian Mixture Models (GMMs)
Qian Ying Lee
Suhas Raghavendra Kulkarni
Kenzhi Iskandar Wong
Lin Yang
Bernardo Noronha
Yongjun Wee
Tzu-Yi Hung
Domenico Campolo
83
0
0
07 Mar 2025
GEMA-Score: Granular Explainable Multi-Agent Score for Radiology Report Evaluation
Zhenxuan Zhang
Kinhei Lee
Weihang Deng
Huichi Zhou
Zihao Jin
Jiahao Huang
Zhifan Gao
D. C. Marshall
Yingying Fang
G. Yang
MedIm
81
1
0
07 Mar 2025
RocketEval: Efficient Automated LLM Evaluation via Grading Checklist
Tianjun Wei
Wei Wen
Ruizhi Qiao
Xing Sun
Jianghong Ma
ALM
ELM
75
2
0
07 Mar 2025
DETQUS: Decomposition-Enhanced Transformers for QUery-focused Summarization
Yasir Khan
Xinlei Wu
Sangpil Youm
Justin Ho
Aryaan Shaikh
Jairo Garciga
Rohan Sharma
Bonnie J. Dorr
LMTD
123
0
0
07 Mar 2025
Uncertainty-Aware Decoding with Minimum Bayes Risk
Nico Daheim
Clara Meister
Thomas Möllenhoff
Iryna Gurevych
104
4
0
07 Mar 2025
Benchmarking Large Language Models on Multiple Tasks in Bioinformatics NLP with Prompting
Jiyue Jiang
Pengan Chen
Jinqiao Wang
Dongchen He
Ziqin Wei
...
Yimin Fan
Xiangyu Shi
Jimeng Sun
Chuan Wu
Yuan Li
LM&MA
121
3
0
06 Mar 2025
ToFu: Visual Tokens Reduction via Fusion for Multi-modal, Multi-patch, Multi-image Task
Vittorio Pippi
Matthieu Guillaumin
S. Cascianelli
Rita Cucchiara
M. Jaritz
Loris Bazzani
106
0
0
06 Mar 2025
Tgea: An error-annotated dataset and benchmark tasks for text generation from pretrained language models
Jie He
Bo Peng
Yi-Lun Liao
Qun Liu
Deyi Xiong
109
8
0
06 Mar 2025
TIMER: Temporal Instruction Modeling and Evaluation for Longitudinal Clinical Records
Hejie Cui
Alyssa Unell
Bowen Chen
Jason Alan Fries
Emily Alsentzer
Sanmi Koyejo
N. Shah
135
3
0
06 Mar 2025
Developing and Utilizing a Large-Scale Cantonese Dataset for Multi-Tasking in Large Language Models
Jiyue Jiang
Alfred Kar Yin Truong
Yuxiao Chen
Qinghang Bao
Sheng Wang
Pengan Chen
Jinqiao Wang
Dianbo Sui
Yu Li
Chuan Wu
ALM
85
0
0
05 Mar 2025
Rewarding Doubt: A Reinforcement Learning Approach to Calibrated Confidence Expression of Large Language Models
Paul Stangel
David Bani-Harouni
Chantal Pellegrini
Ege Özsoy
Kamilia Zaripova
Matthias Keicher
Nassir Navab
72
0
0
04 Mar 2025
MedHEval: Benchmarking Hallucinations and Mitigation Strategies in Medical Large Vision-Language Models
Aofei Chang
Le Huang
Parminder Bhatia
Taha A. Kass-Hout
Fenglong Ma
Cao Xiao
VLM
121
0
0
04 Mar 2025
Personalized Generation In Large Model Era: A Survey
Yiyan Xu
Jinghao Zhang
Alireza Salemi
Xinting Hu
Wenjie Wang
Fuli Feng
Hamed Zamani
Xiangnan He
Tat-Seng Chua
3DV
186
8
0
04 Mar 2025
Abn-BLIP: Abnormality-aligned Bootstrapping Language-Image Pre-training for Pulmonary Embolism Diagnosis and Report Generation from CTPA
Z. Zhong
Yuli Wang
Lulu Bi
Zhuoqi Ma
S. H. Ahn
...
Webster Stayman
Todd M. Kolb
I. Kamel
Harrison X. Bai
Zhicheng Jiao
LM&MA
93
0
0
03 Mar 2025
SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction
Lu Dai
Yijie Xu
Jinhui Ye
Hao Liu
Hui Xiong
3DV
RALM
203
3
0
03 Mar 2025
Enabling AI Scientists to Recognize Innovation: A Domain-Agnostic Algorithm for Assessing Novelty
Yao Wang
Mingxuan Cui
Arthur Jiang
133
0
0
03 Mar 2025
Learning to Generate Long-term Future Narrations Describing Activities of Daily Living
Ramanathan Rajendiran
Debaditya Roy
Basura Fernando
VGen
122
0
0
03 Mar 2025
SwiLTra-Bench: The Swiss Legal Translation Benchmark
Joel Niklaus
Jakob Merane
Luka Nenadic
Sina Ahmadi
Yingqiang Gao
...
Matthew Guillod
Robin Mamié
Daniel Brunner
Julio Pereyra
Niko Grupen
AILaw
ELM
115
4
0
03 Mar 2025
Argument Summarization and its Evaluation in the Era of Large Language Models
Moritz Altemeyer
Steffen Eger
Johannes Daxenberger
Yanran Chen
Tim Altendorf
Philipp Cimiano
Benjamin Schiller
LM&MA
ELM
LRM
120
1
0
02 Mar 2025
Instructor-Worker Large Language Model System for Policy Recommendation: a Case Study on Air Quality Analysis of the January 2025 Los Angeles Wildfires
K. Gao
Dening Lu
Liangzhi Li
Nan Chen
Hongjie He
Linlin Xu
Jonathan Li
91
1
0
01 Mar 2025
Embracing Diversity: A Multi-Perspective Approach with Soft Labels
Benedetta Muscato
Praveen Bushipaka
Gizem Gezici
Lucia Passaro
F. Giannotti
Tommaso Cucinotta
105
0
0
01 Mar 2025
BadJudge: Backdoor Vulnerabilities of LLM-as-a-Judge
Terry Tong
Fei Wang
Zhe Zhao
Mengzhao Chen
AAML
ELM
94
3
0
01 Mar 2025
A Survey of Uncertainty Estimation Methods on Large Language Models
Zhiqiu Xia
Jinxuan Xu
Yuqian Zhang
Hang Liu
101
3
0
28 Feb 2025
LexRAG: Benchmarking Retrieval-Augmented Generation in Multi-Turn Legal Consultation Conversation
Haitao Li
Yexin Chen
Yiran Hu
Qingyao Ai
Junjie Chen
Xiaoyu Yang
J. Yang
Yueyue Wu
Zeyang Liu
Yang Liu
AILaw
RALM
ELM
110
0
0
28 Feb 2025
Contextualizing biological perturbation experiments through language
Menghua Wu
Russell Littman
Jacob Levine
Lin Qiu
Tommaso Biancalani
David Richmond
Jan-Christian Huetter
69
0
0
28 Feb 2025