2106.11520
BARTScore: Evaluating Generated Text as Text Generation
22 June 2021
Weizhe Yuan
Graham Neubig
Pengfei Liu
Papers citing
"BARTScore: Evaluating Generated Text as Text Generation"
50 / 535 papers shown
Title
Towards Artificial General or Personalized Intelligence? A Survey on Foundation Models for Personalized Federated Intelligence
Yu Qiao
Huy Q. Le
Avi Deb Raha
Phuong-Nam Tran
Apurba Adhikary
Mengchun Zhang
Loc X. Nguyen
Eui-nam Huh
Dusit Niyato
Choong Seon Hong
AI4CE
31
0
0
11 May 2025
Summarisation of German Judgments in conjunction with a Class-based Evaluation
Bianca Steffes
Nils Torben Wiedemann
Alexander Gratz
Pamela Hochreither
Jana Elina Meyer
Katharina Luise Schilke
AILaw
ELM
58
0
0
09 May 2025
Adaptive Stress Testing Black-Box LLM Planners
Neeloy Chakraborty
John Pohovey
Melkior Ornik
Katherine Driggs-Campbell
28
0
0
08 May 2025
ConSens: Assessing context grounding in open-book question answering
Ivan Vankov
Matyo Ivanov
Adriana Correia
Victor Botev
ELM
69
0
0
30 Apr 2025
Robust Misinformation Detection by Visiting Potential Commonsense Conflict
Bing Wang
Ximing Li
C. Li
Bingrui Zhao
Bo Fu
Renchu Guan
Shengsheng Wang
53
0
0
30 Apr 2025
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts
Hanhua Hong
Chenghao Xiao
Yang Wang
Y. Liu
Wenge Rong
Chenghua Lin
31
0
0
29 Apr 2025
Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers
Dylan Bouchard
Mohit Singh Chauhan
HILM
84
0
0
27 Apr 2025
Automated Creativity Evaluation for Large Language Models: A Reference-Based Approach
Ruizhe Li
Chiwei Zhu
Benfeng Xu
Xiaorui Wang
Zhendong Mao
27
0
0
22 Apr 2025
aiXamine: Simplified LLM Safety and Security
Fatih Deniz
Dorde Popovic
Yazan Boshmaf
Euisuh Jeong
M. Ahmad
Sanjay Chawla
Issa M. Khalil
ELM
80
0
0
21 Apr 2025
MusFlow: Multimodal Music Generation via Conditional Flow Matching
Jiahao Song
Yuzhao Wang
37
0
0
18 Apr 2025
Summarization of Multimodal Presentations with Vision-Language Models: Study of the Effect of Modalities and Structure
Théo Gigant
Camille Guinaudeau
Frédéric Dufaux
29
0
0
14 Apr 2025
From Punchlines to Predictions: A Metric to Assess LLM Performance in Identifying Humor in Stand-Up Comedy
Adrianna Romanowski
Pedro Valois
Kazuhiro Fukui
36
0
0
12 Apr 2025
Large Language Models as Span Annotators
Zdeněk Kasner
Vilém Zouhar
Patrícia Schmidtová
Ivan Kartáč
Kristýna Onderková
Ondřej Plátek
Dimitra Gkatzia
Saad Mahamood
Ondrej Dusek
Simone Balloccu
ALM
37
0
0
11 Apr 2025
LLM for Comparative Narrative Analysis
Leo Kampen
Carlos Rabat Villarreal
Louis Yu
Santu Karmaker
Dongji Feng
25
0
0
11 Apr 2025
VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering
Qi Zhi Lim
C. Lee
K. Lim
Kalaiarasi Sonai Muthu Anbananthen
31
0
0
11 Apr 2025
From Speech to Summary: A Comprehensive Survey of Speech Summarization
Fabian Retkowski
Maike Züfle
Andreas Sudmann
Dinah Pfau
Jan Niehues
Alexander Waibel
46
0
0
10 Apr 2025
Toward Holistic Evaluation of Recommender Systems Powered by Generative Models
Yashar Deldjoo
Nikhil Mehta
M. Sathiamoorthy
Shuai Zhang
Pablo Castells
Julian McAuley
EGVM
ELM
72
1
0
09 Apr 2025
HypoEval: Hypothesis-Guided Evaluation for Natural Language Generation
Mingxuan Li
Hanchen Li
Chenhao Tan
ALM
ELM
49
0
0
09 Apr 2025
CoKe: Customizable Fine-Grained Story Evaluation via Chain-of-Keyword Rationalization
Brihi Joshi
Sriram Venkatapathy
Mohit Bansal
Nanyun Peng
Haw-Shiuan Chang
LRM
49
0
0
21 Mar 2025
Can one size fit all?: Measuring Failure in Multi-Document Summarization Domain Transfer
Alexandra DeLucia
Mark Dredze
47
0
0
20 Mar 2025
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings
Austin Xu
Srijan Bansal
Yifei Ming
Semih Yavuz
Chenyu You
ELM
95
3
0
19 Mar 2025
Inspecting the Representation Manifold of Differentially-Private Text
Stefan Arnold
42
0
0
19 Mar 2025
A Survey on Transformer Context Extension: Approaches and Evaluation
Yijun Liu
Jinzheng Yu
Yang Xu
Zhongyang Li
Qingfu Zhu
LLMAG
68
0
0
17 Mar 2025
OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs
Ivan Kartáč
Mateusz Lango
Ondrej Dusek
ELM
51
1
0
14 Mar 2025
Ensemble Learning for Large Language Models in Text and Code Generation: A Survey
Mari Ashiga
Wei Jie
Fan Wu
Vardan K. Voskanyan
Fateme Dinmohammadi
P. Brookes
Jingzhi Gong
Zheng Wang
44
0
0
13 Mar 2025
RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs
Zhongzhan Huang
Guoming Ling
Vincent S. Liang
Yupei Lin
Yandong Chen
Shanshan Zhong
Hefeng Wu
LRM
54
2
0
08 Mar 2025
RocketEval: Efficient Automated LLM Evaluation via Grading Checklist
Tianjun Wei
Wei Wen
Ruizhi Qiao
Xing Sun
Jianghong Ma
ALM
ELM
50
1
0
07 Mar 2025
SINdex: Semantic INconsistency Index for Hallucination Detection in LLMs
Samir Abdaljalil
Hasan Kurban
Parichit Sharma
Erchin Serpedin
Rachad Atat
HILM
58
0
0
07 Mar 2025
Argument Summarization and its Evaluation in the Era of Large Language Models
Moritz Altemeyer
Steffen Eger
Johannes Daxenberger
Tim Altendorf
Philipp Cimiano
Benjamin Schiller
LM&MA
ELM
LRM
67
0
0
02 Mar 2025
Towards Efficient Educational Chatbots: Benchmarking RAG Frameworks
Umar Ali Khan
Ekram Khan
Fiza Khan
A. A. Moinuddin
48
0
0
02 Mar 2025
Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing
Juntai Cao
Xiang Zhang
Raymond Li
Chuyuan Li
Chenyu You
Shafiq Joty
Giuseppe Carenini
59
1
0
27 Feb 2025
MultiOCR-QA: Dataset for Evaluating Robustness of LLMs in Question Answering on Multilingual OCR Texts
Bhawna Piryani
Jamshid Mozafari
Abdelrahman Abdallah
Antoine Doucet
Adam Jatowt
47
1
0
24 Feb 2025
OrderSum: Semantic Sentence Ordering for Extractive Summarization
Taewan Kwon
Sangyong Lee
46
0
0
22 Feb 2025
IPAD: Inverse Prompt for AI Detection -- A Robust and Explainable LLM-Generated Text Detector
Zheng Chen
Yushi Feng
Changyang He
Yue Deng
Hongxi Pu
Bo-wen Li
DeLMO
49
1
0
21 Feb 2025
Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation
SeongYeub Chu
JongWoo Kim
MunYong Yi
60
3
0
21 Feb 2025
Prompting a Weighting Mechanism into LLM-as-a-Judge in Two-Step: A Case Study
Wenwen Xie
Gray Gwizdz
Dongji Feng
85
0
0
20 Feb 2025
G-Refer: Graph Retrieval-Augmented Large Language Model for Explainable Recommendation
Yuhan Li
Xinni Zhang
Linhao Luo
Heng Chang
Yuxiang Ren
Irwin King
Jiajian Li
60
3
0
18 Feb 2025
Smoothing Out Hallucinations: Mitigating LLM Hallucination with Smoothed Knowledge Distillation
Hieu Nguyen
Zihao He
Shoumik Atul Gandre
Ujjwal Pasupulety
Sharanya Kumari Shivakumar
Kristina Lerman
HILM
59
1
0
16 Feb 2025
Accelerating Unbiased LLM Evaluation via Synthetic Feedback
Zhaoyi Zhou
Yuda Song
Andrea Zanette
ALM
73
0
0
14 Feb 2025
Learning to Substitute Words with Model-based Score Ranking
Hongye Liu
Ricardo Henao
43
0
0
09 Feb 2025
Evaluating Small Language Models for News Summarization: Implications and Factors Influencing Performance
Borui Xu
Yao Chen
Zeyi Wen
Weiguo Liu
Bingsheng He
79
1
0
02 Feb 2025
Learning to Explore and Select for Coverage-Conditioned Retrieval-Augmented Generation
Takyoung Kim
Kyungjae Lee
Y. Jang
Ji Yong Cho
Gangwoo Kim
Minseok Cho
Moontae Lee
156
0
0
28 Jan 2025
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation
Mingqi Gao
Xinyu Hu
Li Lin
Xiaojun Wan
28
1
0
28 Jan 2025
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Yinhong Liu
Han Zhou
Zhijiang Guo
Ehsan Shareghi
Ivan Vulić
Anna Korhonen
Nigel Collier
ALM
132
69
0
20 Jan 2025
Clinical Insights: A Comprehensive Review of Language Models in Medicine
Nikita Neveditsin
Pawan Lingras
V. Mago
LM&MA
58
4
0
08 Jan 2025
CaseSumm: A Large-Scale Dataset for Long-Context Summarization from U.S. Supreme Court Opinions
Mourad Heddaya
Kyle MacMillan
Anup Malani
Hongyuan Mei
Chenhao Tan
AILaw
ELM
34
0
0
03 Jan 2025
Evaluate Summarization in Fine-Granularity: Auto Evaluation with LLM
Dong Yuan
Eti Rastogi
Fen Zhao
Sagar Goyal
Gautam Naik
Sree Prasanna Rajagopal
44
0
0
31 Dec 2024
Towards Automatic Evaluation for Image Transcreation
Simran Khanuja
Vivek Iyer
Claire He
Graham Neubig
ViT
90
1
0
18 Dec 2024
Coverage-based Fairness in Multi-document Summarization
Haoyuan Li
Yusen Zhang
Rui Zhang
Snigdha Chaturvedi
80
0
0
11 Dec 2024
QAPyramid: Fine-grained Evaluation of Content Selection for Text Summarization
Shiyue Zhang
David Wan
Arie Cattan
Ayal Klein
Ido Dagan
Joey Tianyi Zhou
86
0
0
10 Dec 2024