ResearchTrend.AI
BARTScore: Evaluating Generated Text as Text Generation

22 June 2021
Weizhe Yuan
Graham Neubig
Pengfei Liu

Papers citing "BARTScore: Evaluating Generated Text as Text Generation"

50 / 535 papers shown
Is my Meeting Summary Good? Estimating Quality with a Multi-LLM Evaluator
Frederic Kirstein
Terry Ruas
Bela Gipp
87
2
0
27 Nov 2024
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
123
70
0
25 Nov 2024
DiffServe: Efficiently Serving Text-to-Image Diffusion Models with Query-Aware Model Scaling
Sohaib Ahmad
Qizheng Yang
Haoliang Wang
Ramesh K. Sitaraman
Hui Guan
78
1
0
22 Nov 2024
Benchmarking LLMs' Judgments with No Gold Standard
Shengwei Xu
Yuxuan Lu
Grant Schoenebeck
Yuqing Kong
34
1
0
11 Nov 2024
Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings
Miguel Moura Ramos
Tomás Almeida
Daniel Vareta
Filipe Azevedo
Sweta Agrawal
Patrick Fernandes
André F. T. Martins
31
1
0
08 Nov 2024
IdeaBench: Benchmarking Large Language Models for Research Idea Generation
Sikun Guo
Amir Hassan Shariatmadari
Guangzhi Xiong
Albert Huang
Eric Xie
Stefan Bekiranov
Aidong Zhang
LM&MA
40
8
0
31 Oct 2024
Bridging the Gap between Expert and Language Models: Concept-guided Chess Commentary Generation and Evaluation
Jaechang Kim
Jinmin Goh
Inseok Hwang
Jaewoong Cho
Jungseul Ok
ELM
33
1
0
28 Oct 2024
CalibraEval: Calibrating Prediction Distribution to Mitigate Selection Bias in LLMs-as-Judges
Haitao Li
Junjie Chen
Qingyao Ai
Zhumin Chu
Yujia Zhou
Qian Dong
Yiqun Liu
46
8
0
20 Oct 2024
RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training
Muhe Ding
Yang Ma
Pengda Qin
Jianlong Wu
Yuhong Li
Liqiang Nie
23
1
0
18 Oct 2024
Enabling Scalable Evaluation of Bias Patterns in Medical LLMs
Hamed Fayyaz
Raphael Poulain
Rahmatollah Beheshti
40
1
0
18 Oct 2024
Can We Predict Performance of Large Models across Vision-Language Tasks?
Qinyu Zhao
Ming Xu
Kartik Gupta
Akshay Asthana
Liang Zheng
Stephen Gould
39
0
0
14 Oct 2024
4-LEGS: 4D Language Embedded Gaussian Splatting
Gal Fiebelman
Tamir Cohen
Ayellet Morgenstern
Peter Hedman
Hadar Averbuch-Elor
3DGS
46
3
0
14 Oct 2024
EasyJudge: an Easy-to-use Tool for Comprehensive Response Evaluation of LLMs
Yijie Li
Yuan Sun
ELM
31
0
0
13 Oct 2024
Audio Description Generation in the Era of LLMs and VLMs: A Review of Transferable Generative AI Technologies
Yingqiang Gao
Lukas Fischer
Alexa Lintner
Sarah Ebling
36
0
0
11 Oct 2024
SocialGaze: Improving the Integration of Human Social Norms in Large Language Models
Anvesh Rao Vijjini
Rakesh R Menon
Jiayi Fu
Shashank Srivastava
Snigdha Chaturvedi
ALM
34
0
0
11 Oct 2024
JurEE not Judges: safeguarding llm interactions with small, specialised Encoder Ensembles
Dom Nasrabadi
31
1
0
11 Oct 2024
Multi-Facet Counterfactual Learning for Content Quality Evaluation
Jiasheng Zheng
Hongyu Lin
Boxi Cao
M. Liao
Yunfan LU
Xianpei Han
Le Sun
32
0
0
10 Oct 2024
AuditWen:An Open-Source Large Language Model for Audit
Jiajia Huang
Haoran Zhu
Chao Xu
Tianming Zhan
Qianqian Xie
J. Huang
20
0
0
09 Oct 2024
Mitigating the Impact of Reference Quality on Evaluation of Summarization Systems with Reference-Free Metrics
Théo Gigant
Camille Guinaudeau
Marc Decombas
Frédéric Dufaux
50
1
0
08 Oct 2024
CodeUnlearn: Amortized Zero-Shot Machine Unlearning in Language Models Using Discrete Concept
YuXuan Wu
Bonaventure F. P. Dossou
Dianbo Liu
MU
26
0
0
08 Oct 2024
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References
Qiyuan Zhang
Yufei Wang
Tiezheng YU
Yuxin Jiang
Chuhan Wu
...
Xin Jiang
Lifeng Shang
Ruiming Tang
Fuyuan Lyu
Chen Ma
31
4
0
07 Oct 2024
CodeJudge: Evaluating Code Generation with Large Language Models
Weixi Tong
Tianyi Zhang
ELM
ALM
39
8
0
03 Oct 2024
MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences
Genta Indra Winata
David Anugraha
Lucky Susanto
Garry Kuwanto
Derry Wijaya
42
7
0
03 Oct 2024
Better Instruction-Following Through Minimum Bayes Risk
Ian Wu
Patrick Fernandes
Amanda Bertsch
Seungone Kim
Sina Pakazad
Graham Neubig
48
9
0
03 Oct 2024
CALF: Benchmarking Evaluation of LFQA Using Chinese Examinations
Yuchen Fan
Xin Zhong
Heng Zhou
Yuchen Zhang
Mingyu Liang
Chengxing Xie
Ermo Hua
Ning Ding
Bowen Zhou
ALM
ELM
31
0
0
02 Oct 2024
DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning
Kazuki Matsuda
Yuiga Wada
Komei Sugiura
31
1
0
28 Sep 2024
Model-based Preference Optimization in Abstractive Summarization without Human Feedback
Jaepill Choi
Kyubyung Chae
Jiwoo Song
Yohan Jo
Taesup Kim
24
0
0
27 Sep 2024
Evaluation of Large Language Models for Summarization Tasks in the Medical Domain: A Narrative Review
Emma Croxford
Yanjun Gao
Nicholas Pellegrino
Karen K. Wong
Graham Wills
Elliot First
Frank J. Liao
Cherodeep Goswami
Brian Patterson
Majid Afshar
HILM
ELM
LM&MA
37
1
0
26 Sep 2024
Zero-Shot Detection of LLM-Generated Text using Token Cohesiveness
Shixuan Ma
Quan Wang
40
2
0
25 Sep 2024
Pre-trained Language Models Return Distinguishable Probability Distributions to Unfaithfully Hallucinated Texts
Taehun Cha
Donghun Lee
HILM
29
1
0
25 Sep 2024
Local Explanations and Self-Explanations for Assessing Faithfulness in black-box LLMs
Christos Fragkathoulas
Odysseas S. Chlapanis
LRM
25
0
0
18 Sep 2024
Extract-and-Abstract: Unifying Extractive and Abstractive Summarization within Single Encoder-Decoder Framework
Yuping Wu
Hao Li
Hongbo Zhu
Goran Nenadic
Xiao-Jun Zeng
31
0
0
18 Sep 2024
CREAM: Comparison-Based Reference-Free ELO-Ranked Automatic Evaluation for Meeting Summarization
Ziwei Gong
Lin Ai
Harshsaiprasad Deshpande
Alexander Johnson
Emmy Phung
Zehui Wu
Ahmad Emami
Julia Hirschberg
44
2
0
17 Sep 2024
ReflectDiffu:Reflect between Emotion-intent Contagion and Mimicry for Empathetic Response Generation via a RL-Diffusion Framework
Jiahao Yuan
Zixiang Di
Zhiqing Cui
Guisong Yang
Usman Naseem
68
0
0
16 Sep 2024
Gaps or Hallucinations? Gazing into Machine-Generated Legal Analysis for Fine-grained Text Evaluations
Abe Bohan Hou
William Jurayj
Nils Holzenberger
Andrew Blair-Stanek
Benjamin Van Durme
ELM
28
0
0
16 Sep 2024
NovAScore: A New Automated Metric for Evaluating Document Level Novelty
Lin Ai
Ziwei Gong
Harshsaiprasad Deshpande
Alexander Johnson
Emmy Phung
Ahmad Emami
Julia Hirschberg
18
1
0
14 Sep 2024
Cross-Refine: Improving Natural Language Explanation Generation by Learning in Tandem
Qianli Wang
Tatiana Anikina
Nils Feldhus
Simon Ostermann
Sebastian Möller
Vera Schmitt
LRM
38
0
0
11 Sep 2024
What is the Role of Small Models in the LLM Era: A Survey
Lihu Chen
Gaël Varoquaux
ALM
63
23
0
10 Sep 2024
CLUE: Concept-Level Uncertainty Estimation for Large Language Models
Yu-Hsiang Wang
Andrew Bai
Che-Ping Tsai
Cho-Jui Hsieh
LRM
34
0
0
04 Sep 2024
What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation
Dingyi Yang
Qin Jin
46
5
0
26 Aug 2024
A Comparative Analysis of Faithfulness Metrics and Humans in Citation Evaluation
Weijia Zhang
Mohammad Aliannejadi
Jiahuan Pei
Yifei Yuan
Jia-Hong Huang
Evangelos Kanoulas
HILM
45
4
0
22 Aug 2024
CT-AGRG: Automated Abnormality-Guided Report Generation from 3D Chest CT Volumes
Theo Di Piazza
32
0
0
21 Aug 2024
Summarizing long regulatory documents with a multi-step pipeline
Mika Sie
Ruby Beek
Michiel Bots
S. Brinkkemper
Albert Gatt
AILaw
ELM
29
1
0
19 Aug 2024
Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices
Patrícia Schmidtová
Saad Mahamood
Simone Balloccu
Ondřej Dušek
Albert Gatt
Dimitra Gkatzia
David M. Howcroft
Ondřej Plátek
Adarsa Sivaprasad
45
3
0
17 Aug 2024
Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization
Yuxin Jiang
Bo Huang
Yufei Wang
Xingshan Zeng
Liangyou Li
Yasheng Wang
Xin Jiang
Lifeng Shang
Ruiming Tang
Wei Wang
44
5
0
14 Aug 2024
Speech vs. Transcript: Does It Matter for Human Annotators in Speech Summarization?
Roshan S. Sharma
Suwon Shon
Mark Lindsey
Hira Dhamyal
Rita Singh
Bhiksha Raj
56
1
0
12 Aug 2024
Zero-shot Factual Consistency Evaluation Across Domains
Raunak Agarwal
HILM
47
0
0
07 Aug 2024
'Finance Wizard' at the FinLLM Challenge Task: Financial Text Summarization
Meisin Lee
Soon Lay-Ki
34
2
0
07 Aug 2024
DebateQA: Evaluating Question Answering on Debatable Knowledge
Rongwu Xu
Xuan Qi
Zehan Qi
Wei Xu
Zhijiang Guo
ELM
53
5
0
02 Aug 2024
VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks
Juhwan Choi
Junehyoung Kwon
Jungmin Yun
Seunguk Yu
Youngbin Kim
46
1
0
29 Jul 2024