1904.09675
Cited By
v1
v2
v3 (latest)
BERTScore: Evaluating Text Generation with BERT
21 April 2019
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
Papers citing "BERTScore: Evaluating Text Generation with BERT"
50 / 3,520 papers shown
Title
Minimizing Factual Inconsistency and Hallucination in Large Language Models
Muneeswaran Irulandi
Shreya Saxena
Siva Prasad
M. V. Sai Prakash
Advaith Shankar
V. Varun
Vishal Vaddina
Saisubramaniam Gopalakrishnan
HILM
75
6
0
23 Nov 2023
A Cross Attention Approach to Diagnostic Explainability using Clinical Practice Guidelines for Depression
Sumit Dalal
Deepa Tilwani
Kaushik Roy
Manas Gaur
Sarika Jain
V. Shalin
Amit P. Sheth
84
7
0
23 Nov 2023
Comparative Experimentation of Accuracy Metrics in Automated Medical Reporting: The Case of Otitis Consultations
Wouter Faber
Renske Eline Bootsma
Tom Huibers
S. Dulmen
S. Brinkkemper
23
1
0
22 Nov 2023
Attribution and Alignment: Effects of Local Context Repetition on Utterance Production and Comprehension in Dialogue
Aron Molnar
Jaap Jumelet
Mario Giulianelli
Arabella J. Sinclair
68
2
0
21 Nov 2023
Evaluation Metrics of Language Generation Models for Synthetic Traffic Generation Tasks
Simone Filice
J. Choi
Giuseppe Castellucci
Eugene Agichtein
Oleg Rokhlenko
38
0
0
21 Nov 2023
From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation
Jiaxin Ge
Sanjay Subramanian
Trevor Darrell
Boyi Li
LRM
104
4
0
21 Nov 2023
Unifying Corroborative and Contributive Attributions in Large Language Models
Theodora Worledge
Judy Hanwen Shen
Nicole Meister
Caleb Winston
Carlos Guestrin
TDI
92
13
0
20 Nov 2023
Human Learning by Model Feedback: The Dynamics of Iterative Prompting with Midjourney
Shachar Don-Yehiya
Leshem Choshen
Omri Abend
71
7
0
20 Nov 2023
Automatic Analysis of Substantiation in Scientific Peer Reviews
Yanzhu Guo
Guokan Shang
Virgile Rennard
Michalis Vazirgiannis
Chloé Clavel
80
8
0
20 Nov 2023
Adapt in Contexts: Retrieval-Augmented Domain Adaptation via In-Context Learning
Quanyu Long
Wenya Wang
Sinno Jialin Pan
109
14
0
20 Nov 2023
Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation
Nurbanu Aksoy
Serge Sharoff
Selçuk Başer
Nishant Ravikumar
Alejandro F Frangi
MedIm
59
5
0
18 Nov 2023
Countering Misinformation via Emotional Response Generation
Daniel Russo
Shane P. Kaszefski-Yaschuk
Jacopo Staiano
Marco Guerini
OffRL
82
10
0
17 Nov 2023
The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation
Ilaria Manco
Benno Weck
Seungheon Doh
Minz Won
Yixiao Zhang
...
Philip Tovstogan
Emmanouil Benetos
Elio Quinton
Gyorgy Fazekas
Juhan Nam
83
30
0
16 Nov 2023
Language Generation from Brain Recordings
Ziyi Ye
Qingyao Ai
Yiqun Liu
Maarten de Rijke
Min Zhang
Christina Lioma
Tuukka Ruotsalo
36
0
0
16 Nov 2023
PELMS: Pre-training for Effective Low-Shot Multi-Document Summarization
Joseph Peper
Wenzhao Qiu
Lu Wang
56
0
0
16 Nov 2023
Dial BeInfo for Faithfulness: Improving Factuality of Information-Seeking Dialogue via Behavioural Fine-Tuning
E. Razumovskaia
Ivan Vulić
Pavle Marković
Tomasz Cichy
Qian Zheng
Tsung-Hsien Wen
Paweł Budzianowski
HILM
95
10
0
16 Nov 2023
How Far Can We Extract Diverse Perspectives from Large Language Models?
Shirley Anugrah Hayati
Minhwa Lee
Dheeraj Rajagopal
Dongyeop Kang
97
11
0
16 Nov 2023
LLMs as Narcissistic Evaluators: When Ego Inflates Evaluation Scores
Yiqi Liu
N. Moosavi
Chenghua Lin
ELM
114
55
0
16 Nov 2023
Improving the Generation Quality of Watermarked Large Language Models via Word Importance Scoring
Yuhang Li
Yihan Wang
Zhouxing Shi
Cho-Jui Hsieh
WaLM
56
7
0
16 Nov 2023
Event Causality Is Key to Computational Story Understanding
Yidan Sun
Qin Chao
Boyang Albert Li
73
9
0
16 Nov 2023
GistScore: Learning Better Representations for In-Context Example Selection with Gist Bottlenecks
Shivanshu Gupta
Clemens Rosenbaum
Ethan R. Elenberg
LRM
77
8
0
16 Nov 2023
Prompt-based Pseudo-labeling Strategy for Sample-Efficient Semi-Supervised Extractive Summarization
Gaurav Sahu
Olga Vechtomova
I. Laradji
77
1
0
16 Nov 2023
Think While You Write: Hypothesis Verification Promotes Faithful Knowledge-to-Text Generation
Yifu Qiu
Varun R. Embar
Shay B. Cohen
Benjamin Han
61
4
0
16 Nov 2023
MacGyver: Are Large Language Models Creative Problem Solvers?
Yufei Tian
Abhilasha Ravichander
Lianhui Qin
Ronan Le Bras
Raja Marjieh
Nanyun Peng
Yejin Choi
Thomas Griffiths
Faeze Brahman
AI4CE
LLMAG
117
14
0
16 Nov 2023
Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations
Wenjie Mo
Lyne Tchapmi
Qin Liu
Jiong Wang
Jun Yan
Chaowei Xiao
Muhao Chen
AAML
146
20
0
16 Nov 2023
Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization
G. Chrysostomou
Zhixue Zhao
Miles Williams
Nikolaos Aletras
HILM
74
11
0
15 Nov 2023
Controllable Text Summarization: Unraveling Challenges, Approaches, and Prospects -- A Survey
Ashok Urlana
Pruthwik Mishra
Tathagato Roy
Rahul Mishra
78
11
0
15 Nov 2023
Fusion-Eval: Integrating Assistant Evaluators with LLMs
Lei Shu
Nevan Wichers
Liangchen Luo
Yun Zhu
Yinxiao Liu
Jindong Chen
Lei Meng
ELM
79
4
0
15 Nov 2023
Towards Verifiable Text Generation with Symbolic References
Lucas Torroba Hennigen
Zejiang Shen
Aniruddha Nrusimha
Bernhard Gapp
David Sontag
Yoon Kim
103
14
0
15 Nov 2023
PEARL: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers
Sheshera Mysore
Zhuoran Lu
Mengting Wan
Longqi Yang
Steve Menezes
Tina Baghaee
Emmanuel Barajas Gonzalez
Jennifer Neville
Tara Safavi
RALM
140
43
0
15 Nov 2023
Exploring the Potential of Large Language Models in Computational Argumentation
Guizhen Chen
Liying Cheng
Anh Tuan Luu
Lidong Bing
LLMAG
LRM
61
30
0
15 Nov 2023
X-Eval: Generalizable Multi-aspect Text Evaluation via Augmented Instruction Tuning with Auxiliary Evaluation Aspects
Minqian Liu
Ying Shen
Zhiyang Xu
Yixin Cao
Eunah Cho
Vaibhav Kumar
Reza Ghanadan
Lifu Huang
ELM
LM&MA
ALM
144
30
0
15 Nov 2023
Deep Representation Learning for Open Vocabulary Electroencephalography-to-Text Decoding
H. Amrani
D. Micucci
Paolo Napoletano
83
6
0
15 Nov 2023
Evaluating Robustness of Dialogue Summarization Models in the Presence of Naturally Occurring Variations
Ankita Gupta
Chulaka Gunasekara
H. Wan
Jatin Ganhotra
Sachindra Joshi
Marina Danilevsky
71
0
0
15 Nov 2023
CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation
Weixiang Yan
Haitian Liu
Yunkun Wang
Yunzhe Li
Qian Chen
...
Tingyu Lin
Weishan Zhao
Li Zhu
Hari Sundaram
Shuiguang Deng
ELM
LRM
144
37
0
14 Nov 2023
UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations
Wenting Zhao
Justin T Chiu
Jena D. Hwang
Faeze Brahman
Jack Hessel
Sanjiban Choudhury
Yejin Choi
Xiang Lorraine Li
Alane Suhr
LRM
ReLM
114
12
0
14 Nov 2023
A Ship of Theseus: Curious Cases of Paraphrasing in LLM-Generated Texts
Nafis Irtiza Tripto
Saranya Venkatraman
Dominik Macko
Robert Moro
Ivan Srba
Adaku Uchendu
Thai V. Le
Dongwon Lee
DeLMO
102
21
0
14 Nov 2023
Extrinsically-Focused Evaluation of Omissions in Medical Summarization
Elliot Schumacher
Daniel Rosenthal
Varun Nair
Luladay Price
Geoffrey Tso
Anitha Kannan
44
2
0
14 Nov 2023
Workflow-Guided Response Generation for Task-Oriented Dialogue
Do June Min
Paloma Sodhi
Ramya Ramakrishnan
72
0
0
14 Nov 2023
VERVE: Template-based ReflectiVE Rewriting for MotiVational IntErviewing
Do June Min
Verónica Pérez-Rosas
Kenneth Resnicow
Rada Mihalcea
OffRL
43
9
0
14 Nov 2023
A Survey of Confidence Estimation and Calibration in Large Language Models
Jiahui Geng
Fengyu Cai
Yuxia Wang
Heinz Koeppl
Preslav Nakov
Iryna Gurevych
UQCV
150
82
0
14 Nov 2023
Eval-GCSC: A New Metric for Evaluating ChatGPT's Performance in Chinese Spelling Correction
Kunting Li
Yong Hu
Shaolei Wang
Hanhan Ma
Liang He
Fandong Meng
Jie Zhou
107
1
0
14 Nov 2023
Fair Abstractive Summarization of Diverse Perspectives
Yusen Zhang
Nan Zhang
Yixin Liu
Alexander R. Fabbri
Junru Liu
...
Caiming Xiong
Jieyu Zhao
Dragomir R. Radev
Kathleen McKeown
Rui Zhang
79
11
0
14 Nov 2023
Bring Your Own KG: Self-Supervised Program Synthesis for Zero-Shot KGQA
Dhruv Agarwal
Rajarshi Das
Sopan Khosla
Rashmi Gangadharaiah
OffRL
73
8
0
14 Nov 2023
GreekT5: A Series of Greek Sequence-to-Sequence Models for News Summarization
Nikolaos Giarelis
Charalampos Mastrokostas
N. Karacapilidis
73
3
0
13 Nov 2023
Using Natural Language Explanations to Improve Robustness of In-context Learning
Xuanli He
Yuxiang Wu
Oana-Maria Camburu
Pasquale Minervini
Pontus Stenetorp
AAML
75
1
0
13 Nov 2023
Finding and Editing Multi-Modal Neurons in Pre-Trained Transformers
Haowen Pan
Yixin Cao
Xiaozhi Wang
Xun Yang
Meng Wang
KELM
108
27
0
13 Nov 2023
InCA: Rethinking In-Car Conversational System Assessment Leveraging Large Language Models
Ken E. Friedl
Abbas Goher Khan
S. Sahoo
Md. Rony
Jana Germies
Christian Süß
72
3
0
13 Nov 2023
ChartCheck: Explainable Fact-Checking over Real-World Chart Images
Mubashara Akhtar
Nikesh Subedi
Vivek Gupta
Sahar Tahmasebi
O. Cocarascu
Elena Simperl
HAI
106
7
0
13 Nov 2023
LM-Polygraph: Uncertainty Estimation for Language Models
Ekaterina Fadeeva
Roman Vashurin
Akim Tsvigun
Artem Vazhentsev
Sergey Petrakov
...
Elizaveta Goncharova
Alexander Panchenko
Maxim Panov
Timothy Baldwin
Artem Shelmanov
62
69
0
13 Nov 2023