1904.09675
Cited By
BERTScore: Evaluating Text Generation with BERT
21 April 2019
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
Papers citing
"BERTScore: Evaluating Text Generation with BERT"
50 / 1,209 papers shown
Title
Beyond correlation: The Impact of Human Uncertainty in Measuring the Effectiveness of Automatic Evaluation and LLM-as-a-Judge
Aparna Elangovan
Jongwoo Ko
Lei Xu
Mahsa Elyasi
Ling Liu
S. Bodapati
Dan Roth
52
6
0
28 Jan 2025
Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs
Nicolas Boizard
Kevin El Haddad
Céline Hudelot
Pierre Colombo
83
15
0
28 Jan 2025
SedarEval: Automated Evaluation using Self-Adaptive Rubrics
Zhiyuan Fan
Weinong Wang
Xing Wu
Debing Zhang
41
1
0
28 Jan 2025
RELexED: Retrieval-Enhanced Legal Summarization with Exemplar Diversity
T. Y. S. S. Santosh
Chen Jia
Patrick Goroncy
Matthias Grabmair
AILaw
54
1
0
23 Jan 2025
Hypothesis Generation for Materials Discovery and Design Using Goal-Driven and Constraint-Guided LLM Agents
Shrinidhi Kumbhar
Venkatesh Mishra
Kevin Coutinho
Divij Handa
Ashif Iquebal
Chitta Baral
89
5
0
23 Jan 2025
Open or Closed LLM for Lesser-Resourced Languages? Lessons from Greek
John Pavlopoulos
Juli Bakagianni
K. Pouli
M. Gavriilidou
61
0
0
22 Jan 2025
Vision-Language Models for Automated Chest X-ray Interpretation: Leveraging ViT and GPT-2
Md. Rakibul Islam
Md. Zahid Hossain
Mustofa Ahmed
Most. Sharmin Sultana Samu
LM&MA
MedIm
40
0
0
21 Jan 2025
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Yinhong Liu
Han Zhou
Zhijiang Guo
Ehsan Shareghi
Ivan Vulić
Anna Korhonen
Nigel Collier
ALM
141
70
0
20 Jan 2025
Exploring Iterative Enhancement for Improving Learnersourced Multiple-Choice Question Explanations with Large Language Models
Qiming Bao
Juho Leinonen
A. Peng
Wanjun Zhong
Gaël Gendron
Tim Pistotti
Alice Huang
Paul Denny
Michael Witbrock
Jing Liu
AI4Ed
LRM
181
1
0
20 Jan 2025
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Kaifeng Lyu
Haoyu Zhao
Xinran Gu
Dingli Yu
Anirudh Goyal
Sanjeev Arora
ALM
82
46
0
20 Jan 2025
Decoupled Sequence and Structure Generation for Realistic Antibody Design
Nayoung Kim
Minsu Kim
Sungsoo Ahn
Jinkyoo Park
54
0
0
20 Jan 2025
BoK: Introducing Bag-of-Keywords Loss for Interpretable Dialogue Response Generation
Suvodip Dey
M. Desarkar
OffRL
46
0
0
20 Jan 2025
RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs
Jiaxing Wu
Lin Ning
Luyang Liu
Harrison Lee
Neo Wu
Chao Wang
Sushant Prakash
S. O’Banion
Bradley Green
Jun Xie
71
1
0
20 Jan 2025
Zero-shot and Few-shot Learning with Instruction-following LLMs for Claim Matching in Automated Fact-checking
Dina Pisarevskaya
Arkaitz Zubiaga
55
0
0
18 Jan 2025
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
Junyi Ao
Yuancheng Wang
Xiaohai Tian
Dekun Chen
Jingyang Zhang
Lu Lu
Yansen Wang
Haizhou Li
Zhikai Wu
AuLLM
90
19
0
17 Jan 2025
ASTRID -- An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems
Mohita Chowdhury
Yajie Vera He
Aisling Higham
Ernest Lim
60
1
0
14 Jan 2025
RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment
Difei Gu
Yunhe Gao
Yang Zhou
Mu Zhou
Dimitris N. Metaxas
LM&MA
55
2
0
13 Jan 2025
Scaling Down Semantic Leakage: Investigating Associative Bias in Smaller Language Models
Veronika Smilga
45
0
0
11 Jan 2025
A Novel Approach to Scalable and Automatic Topic-Controlled Question Generation in Education
Ziqing Li
Mutlu Cukurova
Sahan Bulathwela
36
3
0
10 Jan 2025
IDEAL: Leveraging Infinite and Dynamic Characterizations of Large Language Models for Query-focused Summarization
Jie Cao
Dian Jiao
Qiang Yan
Wenqiao Zhang
Siliang Tang
Yueting Zhuang
50
1
0
08 Jan 2025
Semantic Captioning: Benchmark Dataset and Graph-Aware Few-Shot In-Context Learning for SQL2Text
Ali Al-Lawati
Jason Lucas
Prasenjit Mitra
LMTD
48
0
0
06 Jan 2025
PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations
Ruosen Li
Teerth Patel
Xinya Du
LLMAG
ALM
73
96
0
03 Jan 2025
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
Ziyan Jiang
Rui Meng
Xinyi Yang
Semih Yavuz
Yingbo Zhou
Wenhu Chen
MLLM
VLM
59
20
0
03 Jan 2025
A 2-step Framework for Automated Literary Translation Evaluation: Its Promises and Pitfalls
Sheikh Shafayat
Dongkeun Yoon
Woori Jang
Jiwoo Choi
Alice Oh
Seohyon Jung
94
1
0
03 Jan 2025
Dynamic Attention-Guided Context Decoding for Mitigating Context Faithfulness Hallucinations in Large Language Models
Yanwen Huang
Yong Zhang
Ning Cheng
Zhitao Li
Shaojun Wang
Jing Xiao
91
0
0
02 Jan 2025
LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation
Eunsu Kim
Juyoung Suk
Seungone Kim
Niklas Muennighoff
Dongkwan Kim
Alice Oh
ELM
96
1
0
31 Dec 2024
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models
Yulei Qin
Yuncheng Yang
Pengcheng Guo
Gang Li
Hang Shao
Yuchen Shi
Zihan Xu
Yun Gu
Ke Li
Xing Sun
ALM
96
12
0
31 Dec 2024
Fine-grained and Explainable Factuality Evaluation for Multimodal Summarization
Liqiang Jing
Jingxuan Zuo
Yue Zhang
50
8
0
31 Dec 2024
MapExplorer: New Content Generation from Low-Dimensional Visualizations
Xingjian Zhang
Ziyang Xiong
Shixuan Liu
Yutong Xie
Tolga Ergen
Dongsub Shim
Hua Xu
Honglak Lee
Qiaozhu Mei
44
0
0
24 Dec 2024
Investigating Length Issues in Document-level Machine Translation
Ziqian Peng
Rachel Bawden
François Yvon
71
1
0
23 Dec 2024
Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization
Yue Zhang
Liqiang Jing
Vibhav Gogate
116
2
0
19 Dec 2024
Towards Automatic Evaluation for Image Transcreation
Simran Khanuja
Vivek Iyer
Claire He
Graham Neubig
ViT
98
1
0
18 Dec 2024
EventSum: A Large-Scale Event-Centric Summarization Dataset for Chinese Multi-News Documents
Mengna Zhu
Kaisheng Zeng
Mao Wang
Kaiming Xiao
Lei Hou
Hongbin Huang
Juanzi Li
271
1
0
16 Dec 2024
Neptune: The Long Orbit to Benchmarking Long Video Understanding
Arsha Nagrani
Ruotong Wang
Ramin Mehran
Rachel Hornung
N. B. Gundavarapu
...
Boqing Gong
Cordelia Schmid
Mikhail Sirotenko
Yukun Zhu
Tobias Weyand
108
4
0
12 Dec 2024
CoMA: Compositional Human Motion Generation with Multi-modal Agents
Shanlin Sun
Gabriel De Araujo
Jiaqi Xu
S. Kevin Zhou
Hanwen Zhang
Ziheng Huang
Chenyu You
Xiaohui Xie
102
4
0
10 Dec 2024
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor
Jiali Chen
Xusen Hei
Yuqi Xue
Yuancheng Wei
Jiayuan Xie
Yi Cai
Qing Li
MLLM
LRM
86
4
0
08 Dec 2024
OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones?
Z. Chen
Tingzhu Chen
Wenjun Zhang
Guangtao Zhai
101
3
0
02 Dec 2024
Libra: Leveraging Temporal Images for Biomedical Radiology Analysis
Xi Zhang
Zaiqiao Meng
Jake Lever
Edmond S. L. Ho
MedIm
101
1
0
28 Nov 2024
AMPS: ASR with Multimodal Paraphrase Supervision
Amruta Parulekar
Abhishek Gupta
Sameep Chattopadhyay
Preethi Jyothi
75
0
0
27 Nov 2024
Can LLMs be Good Graph Judge for Knowledge Graph Construction?
Haoyu Huang
Chong Chen
Zeang Sheng
Yang Li
Wentao Zhang
84
1
0
26 Nov 2024
TechCoach: Towards Technical-Point-Aware Descriptive Action Coaching
Yuan-Ming Li
An-Lan Wang
Kun-Yu Lin
Yu-Ming Tang
Ling-an Zeng
Jian-Fang Hu
Wei-Shi Zheng
98
6
0
26 Nov 2024
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
131
73
0
25 Nov 2024
CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning
Duo Wu
Yufei Guo
Yuan Meng
Yanning Zhang
Le Sun
Zhi Wang
258
0
0
25 Nov 2024
GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis
Bo Liu
K. Zou
Liming Zhan
Zexin Lu
Xiaoyu Dong
Yidi Chen
Chengqiang Xie
Jiannong Cao
Xiao-Ming Wu
Huazhu Fu
134
0
0
25 Nov 2024
AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset
Tobi Olatunji
Charles Nimo
A. Owodunni
Tassallah Abdullahi
Emmanuel Ayodele
...
Michael Best
Irfan Essa
Stephen E. Moore
Chris Fourie
M. Asiedu
LM&MA
86
3
0
23 Nov 2024
Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate Reward
Zhiwei Jia
Yuesong Nan
Huixi Zhao
Gengdai Liu
EGVM
94
0
0
22 Nov 2024
Human-In-the-Loop Software Development Agents
Wannita Takerngsaksiri
Jirat Pasuksmit
Patanamon Thongtanunam
Chakkrit Tantithamthavorn
Ruixiong Zhang
Fan Jiang
Jing Li
Evan Cook
K. Chen
Ming Wu
LLMAG
108
2
0
19 Nov 2024
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model
Dongyoung Go
Taesun Whang
Chanhee Lee
Hwayeon Kim
Sunghoon Park
Seunghwan Ji
Dongchan Kim
Young-Bum Kim
LRM
252
1
0
19 Nov 2024
Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions
Moran Yanuka
Assaf Ben-Kish
Yonatan Bitton
Idan Szpektor
Raja Giryes
VLM
47
2
0
13 Nov 2024
Benchmarking LLMs' Judgments with No Gold Standard
Shengwei Xu
Yuxuan Lu
Grant Schoenebeck
Yuqing Kong
36
1
0
11 Nov 2024