BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, Yoav Artzi
arXiv 1904.09675 (v3, latest) | 21 April 2019
Papers citing "BERTScore: Evaluating Text Generation with BERT"
50 / 3,519 papers shown
PerSEval: Assessing Personalization in Text Summarizers | Sourish Dasgupta, Ankush Chander, Parth Borad, Isha Motiyani, Tanmoy Chakraborty | 85 · 1 · 0 | 29 Jun 2024
SHADE: Semantic Hypernym Annotator for Domain-specific Entities -- DnD Domain Use Case | Akila Peiris, Nisansa de Silva | 47 · 1 · 0 | 29 Jun 2024
Detection and Measurement of Syntactic Templates in Generated Text | Chantal Shaib, Yanai Elazar, Junyi Jessy Li, Byron C. Wallace | 90 · 20 · 0 | 28 Jun 2024
IDT: Dual-Task Adversarial Attacks for Privacy Protection | Pedro Faustini, Shakila Mahjabin Tonni, Annabelle McIver, Xingliang Yuan, Mark Dras | SILM, AAML | 88 · 0 · 0 | 28 Jun 2024
CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for Foundation Models | Zhong-Zhi Li, Ming-Liang Zhang, Fei Yin, Zhi-Long Ji, Jin-Feng Bai, Zhen-Ru Pan, Fan-Hu Zeng, Jian Xu, Jia-Xin Zhang, Cheng-Lin Liu | ELM | 98 · 14 · 0 | 28 Jun 2024
Are Generative Language Models Multicultural? A Study on Hausa Culture and Emotions using ChatGPT | Ibrahim Said Ahmad, Shiran Dudy, R. Ramachandranpillai, Kenneth Church | 86 · 6 · 0 | 27 Jun 2024
Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks | Ibrahim Abdelaziz, Kinjal Basu, Mayank Agarwal, Yara Rizk, Matthew Stallone, ..., Merve Unuvar, David D. Cox, Salim Roukos, Luis A. Lastras, Pavan Kapanipathi | LLMAG | 92 · 24 · 0 | 27 Jun 2024
DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions | Nigel Fernandez, Alexander Scarlatos, Simon Woodhead, Andrew Lan | AAML | 95 · 4 · 0 | 27 Jun 2024
Two-Pronged Human Evaluation of ChatGPT Self-Correction in Radiology Report Simplification | Ziyu Yang, Santhosh Cherian, Slobodan Vucetic | MedIm | 96 · 0 · 0 | 27 Jun 2024
Can Large Language Models Generate High-quality Patent Claims? | Lekang Jiang, Caiqi Zhang, Pascal A Scherz, Stephan Goetz | ELM | 125 · 7 · 0 | 27 Jun 2024
Themis: Towards Flexible and Interpretable NLG Evaluation | Xinyu Hu, Li Lin, Mingqi Gao, Xunjian Yin, Xiaojun Wan | ELM | 91 · 8 · 0 | 26 Jun 2024
Weak Reward Model Transforms Generative Models into Robust Causal Event Extraction Systems | Italo Luis da Silva, Hanqi Yan, Lin Gui, Yulan He | CML | 109 · 0 · 0 | 26 Jun 2024
Selective Prompting Tuning for Personalized Conversations with LLMs | Qiushi Huang, Xubo Liu, Tom Ko, Bo Wu, Wenwu Wang, Yu Zhang, Lilian H. Y. Tang | 87 · 8 · 0 | 26 Jun 2024
ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs | Ahmed Heakl, Youssef Zaghloul, Mennatullah Ali, Rania Hossam, Walid Gomaa | 38 · 4 · 0 | 26 Jun 2024
Evaluating Quality of Answers for Retrieval-Augmented Generation: A Strong LLM Is All You Need | Yang Wang, Alberto Garcia Hernandez, Roman Kyslyi, Nicholas S. Kersting | 101 · 3 · 0 | 26 Jun 2024
PaCoST: Paired Confidence Significance Testing for Benchmark Contamination Detection in Large Language Models | Huixuan Zhang, Yun Lin, Xiaojun Wan | 144 · 0 · 0 | 26 Jun 2024
CTBench: A Comprehensive Benchmark for Evaluating Language Model Capabilities in Clinical Trial Design | Nafis Neehal, Bowen Wang, Shayom Debopadhaya, Soham Dan, K. Murugesan, Vibha Anand, Kristin P. Bennett | LM&MA, ELM | 101 · 2 · 0 | 25 Jun 2024
Disce aut Deficere: Evaluating LLMs Proficiency on the INVALSI Italian Benchmark | Fabio Mercorio, Mario Mezzanzanica, Daniele Potertì, Antonio Serino, Andrea Seveso | 113 · 5 · 0 | 25 Jun 2024
Native Design Bias: Studying the Impact of English Nativeness on Language Model Performance | Manon Reusens, Philipp Borchert, Jochen De Weerdt, Bart Baesens | 161 · 2 · 0 | 25 Jun 2024
CausalScore: An Automatic Reference-Free Metric for Assessing Response Relevance in Open-Domain Dialogue Systems | Tao Feng, Zhuang Li, Xiaoxi Kang, Gholamreza Haffari | 67 · 1 · 0 | 25 Jun 2024
CaLMQA: Exploring culturally specific long-form question answering across 23 languages | Shane Arora, Marzena Karpinska, Hung-Ting Chen, Ipsita Bhattacharjee, Mohit Iyyer, Eunsol Choi | HILM | 148 · 14 · 0 | 25 Jun 2024
Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts | Aditya Sharma, Michael Saxon, William Yang Wang | VLM | 70 · 2 · 0 | 24 Jun 2024
RaTEScore: A Metric for Radiology Report Generation | W. Zhao, Chaoyi Wu, Xiechi Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie | 105 · 12 · 0 | 24 Jun 2024
Exploring the Capability of Mamba in Speech Applications | Koichi Miyazaki, Yoshiki Masuyama, Masato Murata | Mamba | 102 · 15 · 0 | 24 Jun 2024
The GPT-WritingPrompts Dataset: A Comparative Analysis of Character Portrayal in Short Stories | Xi Yu Huang, Krishnapriya Vishnubhotla, Frank Rudzicz | 74 · 3 · 0 | 24 Jun 2024
Data Augmentation of Multi-turn Psychological Dialogue via Knowledge-driven Progressive Thought Prompting | Jiyue Jiang, Liheng Chen, Sheng Wang, Lingpeng Kong, Yu Li, Chuan Wu | 85 · 0 · 0 | 24 Jun 2024
Carrot and Stick: Inducing Self-Motivation with Positive & Negative Feedback | Jimin Sohn, Jeihee Cho, Junyong Lee, Songmu Heo, Ji-Eun Han, David R. Mortensen | LRM | 87 · 0 · 0 | 24 Jun 2024
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt | Deng Cai, Huayang Li, Tingchen Fu, Siheng Li, Weiwen Xu, ..., Leyang Cui, Yan Wang, Lemao Liu, Taro Watanabe, Shuming Shi | KELM | 78 · 2 · 0 | 24 Jun 2024
AND: Audio Network Dissection for Interpreting Deep Acoustic Models | Tung-Yu Wu, Yu-Xiang Lin, Tsui-Wei Weng | 108 · 2 · 0 | 24 Jun 2024
LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing | Jiangshu Du, Yibo Wang, Wenting Zhao, Zhongfen Deng, Shuaiqi Liu, ..., Eduardo Blanco, Yixin Cao, Rui Zhang, Philip S. Yu, Wenpeng Yin | 88 · 34 · 0 | 24 Jun 2024
PlagBench: Exploring the Duality of Large Language Models in Plagiarism Generation and Detection | Jooyoung Lee, Toshini Agrawal, Adaku Uchendu, Thai V. Le, Jinghui Chen, Dongwon Lee | 183 · 1 · 0 | 24 Jun 2024
FastMem: Fast Memorization of Prompt Improves Context Awareness of Large Language Models | Junyi Zhu, Shuochen Liu, Yu Yu, Bo Tang, Yibo Yan, Zhiyu Li, Feiyu Xiong, Tong Xu, Matthew B. Blaschko | 86 · 5 · 0 | 23 Jun 2024
Uncovering Hidden Intentions: Exploring Prompt Recovery for Deeper Insights into Generated Texts | Louis Give, Timo Zaoral, Maria Antonietta Bruno | 52 · 1 · 0 | 22 Jun 2024
Evaluating Diversity in Automatic Poetry Generation | Yanran Chen, Hannes Groner, Sina Zarrieß, Steffen Eger | 98 · 11 · 0 | 21 Jun 2024
Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics | Weijia Zhang, Mohammad Aliannejadi, Yifei Yuan, Jiahuan Pei, Jia-Hong Huang, Evangelos Kanoulas | HILM | 88 · 13 · 0 | 21 Jun 2024
A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation | I. Zubiaga, A. Soroa, Rodrigo Agerri | 74 · 6 · 0 | 21 Jun 2024
Harnessing Knowledge Retrieval with Large Language Models for Clinical Report Error Correction | Jinge Wu, Zhaolong Wu, Abul Hasan, Yunsoo Kim, Jason PY Cheung, Teng Zhang, Honghan Wu | 3DV | 60 · 0 · 0 | 21 Jun 2024
AgriLLM: Harnessing Transformers for Farmer Queries | Krish Didwania, Pratinav Seth, Aditya Kasliwal, Amit Agarwal | 66 · 0 · 0 | 21 Jun 2024
LatentExplainer: Explaining Latent Representations in Deep Generative Models with Multimodal Large Language Models | Mengdan Zhu, Raasikh Kanjiani, Jiahui Lu, Andrew Choi, Qirui Ye, Liang Zhao | DiffM | 90 · 1 · 0 | 21 Jun 2024
Is this a bad table? A Closer Look at the Evaluation of Table Generation from Text | Pritika Ramu, Aparna Garimella, Sambaran Bandyopadhyay | LMTD | 72 · 3 · 0 | 21 Jun 2024
Word Matters: What Influences Domain Adaptation in Summarization? | Yinghao Li, Siyu Miao, Heyan Huang, Yang Gao | 87 · 4 · 0 | 21 Jun 2024
xCOMET-lite: Bridging the Gap Between Efficiency and Quality in Learned MT Evaluation Metrics | Daniil Larionov, Mikhail Seleznyov, Vasiliy Viskov, Alexander Panchenko, Steffen Eger | 46 · 3 · 0 | 20 Jun 2024
Improving Expert Radiology Report Summarization by Prompting Large Language Models with a Layperson Summary | Xingmeng Zhao, Tongnian Wang, Anthony Rios | LM&MA | 117 · 2 · 0 | 20 Jun 2024
Step-Back Profiling: Distilling User History for Personalized Scientific Writing | Xiangru Tang, Xingyao Zhang, Yanjun Shao, Jie Wu, Yilun Zhao, Arman Cohan, Ming Gong, Dongmei Zhang, Mark B. Gerstein | 128 · 3 · 0 | 20 Jun 2024
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics | Seungbeen Lee, Seungwon Lim, Seungju Han, Giyeong Oh, Hyungjoo Chae, ..., Beong-woo Kwak, Yeonsoo Lee, Dongha Lee, Jinyoung Yeo, Youngjae Yu | 101 · 16 · 0 | 20 Jun 2024
ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World | Weixiang Yan, Haitian Liu, Tengxiao Wu, Qian Chen, Wen Wang, ..., Jiayi Wang, Weishan Zhao, Yixin Zhang, Renjun Zhang, Li Zhu | LM&MA | 88 · 13 · 0 | 19 Jun 2024
LIT: Large Language Model Driven Intention Tracking for Proactive Human-Robot Collaboration -- A Robot Sous-Chef Application | Zhe Huang, John Pohovey, Ananya Yammanuru, Katherine Driggs-Campbell | LM&Ro | 81 · 2 · 0 | 19 Jun 2024
MMTE: Corpus and Metrics for Evaluating Machine Translation Quality of Metaphorical Language | Shun Wang, Ge Zhang, Han Wu, Tyler Loakman, Wenhao Huang, Chenghua Lin | 59 · 2 · 0 | 19 Jun 2024
Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models | Akchay Srivastava, Atif Memon | ELM | 85 · 1 · 0 | 19 Jun 2024
AgentReview: Exploring Peer Review Dynamics with LLM Agents | Yiqiao Jin, Qinlin Zhao, Yiyang Wang, Hao Chen, Kaijie Zhu, Yijia Xiao, Jindong Wang | LLMAG | 129 · 27 · 0 | 18 Jun 2024