ResearchTrend.AI
BARTScore: Evaluating Generated Text as Text Generation

22 June 2021
Weizhe Yuan
Graham Neubig
Pengfei Liu

Papers citing "BARTScore: Evaluating Generated Text as Text Generation"

50 / 537 papers shown
Title
Evaluating Generative Ad Hoc Information Retrieval
Lukas Gienapp
Harrisen Scells
Niklas Deckers
Janek Bevendorff
Shuai Wang
...
Maik Fröbe
Guido Zuccon
Benno Stein
Matthias Hagen
Martin Potthast
RALM
42
11
0
08 Nov 2023
Sam-Guided Enhanced Fine-Grained Encoding with Mixed Semantic Learning for Medical Image Captioning
Zhenyu Zhang
Benlu Wang
Weijie Liang
Yizhi Li
Xuechen Guo
Guanhong Wang
Shiyan Li
Gaoang Wang
MedIm
LM&MA
24
7
0
02 Nov 2023
Defining a New NLP Playground
Sha Li
Chi Han
Pengfei Yu
Carl Edwards
Manling Li
...
Yi R. Fung
Charles Yu
Joel R. Tetreault
Eduard H. Hovy
Heng Ji
41
5
0
31 Oct 2023
The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics
Christoph Leiter
Juri Opitz
Daniel Deutsch
Yang Gao
Rotem Dror
Steffen Eger
ALM
LRM
ELM
40
31
0
30 Oct 2023
Are NLP Models Good at Tracing Thoughts: An Overview of Narrative Understanding
Lixing Zhu
Runcong Zhao
Lin Gui
Yulan He
52
4
0
28 Oct 2023
OpinSummEval: Revisiting Automated Evaluation for Opinion Summarization
Yuchen Shen
Xiaojun Wan
38
9
0
27 Oct 2023
Automatic Logical Forms improve fidelity in Table-to-Text generation
Iñigo Alonso
Eneko Agirre
LMTD
22
3
0
26 Oct 2023
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Lianghui Zhu
Xinggang Wang
Xinlong Wang
ELM
ALM
62
110
0
26 Oct 2023
RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models
Zefan Wang
Zichuan Liu
Yingying Zhang
Aoxiao Zhong
Lunting Fan
Lingfei Wu
Qingsong Wen
43
24
0
25 Oct 2023
DiQAD: A Benchmark Dataset for End-to-End Open-domain Dialogue Assessment
Yukun Zhao
Lingyong Yan
Weiwei Sun
Chong Meng
Shuaiqiang Wang
Zhicong Cheng
Zhaochun Ren
Dawei Yin
ELM
20
0
0
25 Oct 2023
Enhancing Biomedical Lay Summarisation with External Knowledge Graphs
Tomas Goldsack
Zhihao Zhang
Chen Tang
Carolina Scarton
Chenghua Lin
18
9
0
24 Oct 2023
Improving Biomedical Abstractive Summarisation with Knowledge Aggregation from Citation Papers
Chen Tang
Shunyu Wang
Tomas Goldsack
Chenghua Lin
32
17
0
24 Oct 2023
A Communication Theory Perspective on Prompting Engineering Methods for Large Language Models
Yuanfeng Song
Yuanqin He
Xuefang Zhao
Hanlin Gu
Di Jiang
Haijun Yang
Lixin Fan
Qiang Yang
40
3
0
24 Oct 2023
Reference Free Domain Adaptation for Translation of Noisy Questions with Question Specific Rewards
Baban Gain
Ramakrishna Appicharla
Soumya Chennabasavaraj
Nikesh Garera
Asif Ekbal
M. Chelliah
30
0
0
23 Oct 2023
Language Models Hallucinate, but May Excel at Fact Verification
Jian Guan
Jesse Dodge
David Wadden
Minlie Huang
Hao Peng
LRM
HILM
34
28
0
23 Oct 2023
Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation
M. Boubdir
Edward Kim
Beyza Ermis
Marzieh Fadaee
Sara Hooker
ALM
33
18
0
22 Oct 2023
AITA Generating Moral Judgements of the Crowd with Reasoning
Osama Bsher
Ameer Sabri
14
0
0
21 Oct 2023
Tuna: Instruction Tuning using Feedback from Large Language Models
Haoran Li
Yiran Liu
Xingxing Zhang
Wei Lu
Furu Wei
ALM
38
3
0
20 Oct 2023
Fast and Accurate Factual Inconsistency Detection Over Long Documents
B. Lattimer
Patrick Chen
Xinyuan Zhang
Yi Yang
HILM
6
18
0
19 Oct 2023
Zero-shot Faithfulness Evaluation for Text Summarization with Foundation Language Model
Qi Jia
Siyu Ren
Yizhu Liu
Kenny Q. Zhu
ALM
HILM
33
16
0
18 Oct 2023
Exploring Automatic Evaluation Methods based on a Decoder-based LLM for Text Generation
Tomohito Kasahara
Daisuke Kawahara
33
2
0
17 Oct 2023
Towards reducing hallucination in extracting information from financial reports using Large Language Models
Bhaskarjit Sarmah
Tianjie Zhu
Dhagash Mehta
Stefano Pasquali
RALM
13
11
0
16 Oct 2023
Progressive Evidence Refinement for Open-domain Multimodal Retrieval Question Answering
Shuwen Yang
Anran Wu
Xingjiao Wu
Luwei Xiao
Tianlong Ma
Cheng Jin
Liang He
27
2
0
15 Oct 2023
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Seungone Kim
Jamin Shin
Yejin Cho
Joel Jang
Shayne Longpre
...
Sangdoo Yun
Seongjin Shin
Sungdong Kim
James Thorne
Minjoon Seo
ALM
LM&MA
ELM
37
213
0
12 Oct 2023
Towards Better Evaluation of Instruction-Following: A Case-Study in Summarization
Ondrej Skopek
Rahul Aralikatte
Sian Gooding
Victor Carbune
ELM
47
18
0
12 Oct 2023
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity
Cunxiang Wang
Xiaoze Liu
Yuanhao Yue
Xiangru Tang
Tianhang Zhang
...
Linyi Yang
Jindong Wang
Xing Xie
Zheng-Wei Zhang
Yue Zhang
HILM
KELM
51
184
0
11 Oct 2023
Generative Judge for Evaluating Alignment
Junlong Li
Shichao Sun
Weizhe Yuan
Run-Ze Fan
Hai Zhao
Pengfei Liu
ELM
ALM
35
79
0
09 Oct 2023
Learning Personalized Alignment for Evaluating Open-ended Text Generation
Danqing Wang
Kevin Kaichuang Yang
Hanlin Zhu
Xiaomeng Yang
Andrew Cohen
Lei Li
Yuandong Tian
ALM
LM&MA
20
8
0
05 Oct 2023
A Survey of GPT-3 Family Large Language Models Including ChatGPT and GPT-4
Katikapalli Subramanyam Kalyan
LM&MA
AI4CE
LRM
AILaw
ELM
43
224
0
04 Oct 2023
Instances Need More Care: Rewriting Prompts for Instances with LLMs in the Loop Yields Better Zero-Shot Performance
Saurabh Srivastava
Chengyue Huang
Weiguo Fan
Ziyu Yao
LLMAG
28
5
0
03 Oct 2023
Fusing Models with Complementary Expertise
Hongyi Wang
Felipe Maia Polo
Yuekai Sun
Souvik Kundu
Eric Xing
Mikhail Yurochkin
FedML
MoMe
28
26
0
02 Oct 2023
TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks
Dongfu Jiang
Yishan Li
Ge Zhang
Wenhao Huang
Bill Yuchen Lin
Wenhu Chen
ALM
32
58
0
01 Oct 2023
Overview of the BioLaySumm 2023 Shared Task on Lay Summarization of Biomedical Research Articles
Tomas Goldsack
Jiancheng Yang
Qianqian Xie
Carolina Scarton
Matthew Shardlow
Sophia Ananiadou
Chenghua Lin
38
16
0
29 Sep 2023
Large Language Model Routing with Benchmark Datasets
Tal Shnitzer
Anthony Ou
Mírian Silva
Kate Soule
Yuekai Sun
Justin Solomon
Neil Thompson
Mikhail Yurochkin
RALM
16
58
0
27 Sep 2023
Ragas: Automated Evaluation of Retrieval Augmented Generation
ES Shahul
Jithin James
Luis Espinosa-Anke
Steven Schockaert
91
177
0
26 Sep 2023
ALLURE: Auditing and Improving LLM-based Evaluation of Text using Iterative In-Context-Learning
Hosein Hasanbeig
Hiteshi Sharma
Leo Betthauser
Felipe Vieira Frujeri
Ida Momennejad
38
15
0
24 Sep 2023
MentaLLaMA: Interpretable Mental Health Analysis on Social Media with Large Language Models
Kailai Yang
Tianlin Zhang
Zi-Zhou Kuang
Qianqian Xie
Jimin Huang
Sophia Ananiadou
AI4MH
38
47
0
24 Sep 2023
Calibrating LLM-Based Evaluator
Yuxuan Liu
Tianchi Yang
Shaohan Huang
Zihan Zhang
Haizhen Huang
Furu Wei
Weiwei Deng
Feng Sun
Qi Zhang
49
31
0
23 Sep 2023
LongDocFACTScore: Evaluating the Factuality of Long Document Abstractive Summarisation
Jennifer A Bishop
Qianqian Xie
Sophia Ananiadou
HILM
22
9
0
21 Sep 2023
Reranking for Natural Language Generation from Logical Forms: A Study based on Large Language Models
Levon Haroutunian
Zhuang Li
Lucian Galescu
Philip R. Cohen
Raj Tumuluri
Gholamreza Haffari
LRM
31
1
0
21 Sep 2023
Automatic Personalized Impression Generation for PET Reports Using Large Language Models
Xin Tie
Muheon Shin
Ali Pirasteh
Nevein Ibrahim
Zachary Huemann
...
K. M. Kelly
John W. Garrett
Junjie Hu
Steve Y. Cho
Tyler Bradshaw
LM&MA
27
10
0
18 Sep 2023
Less is More for Long Document Summary Evaluation by LLMs
Yunshu Wu
Hayate Iso
Pouya Pezeshkpour
Nikita Bhutani
Estevam R. Hruschka
24
34
0
14 Sep 2023
Cognitive Mirage: A Review of Hallucinations in Large Language Models
Hongbin Ye
Tong Liu
Aijia Zhang
Wei Hua
Weiqiang Jia
HILM
48
77
0
13 Sep 2023
Automating Behavioral Testing in Machine Translation
Javier Ferrando
Matthias Sperber
Hendra Setiawan
Dominic Telaar
Saša Hasan
30
2
0
05 Sep 2023
Socratis: Are large multimodal models emotionally aware?
Katherine Deng
Arijit Ray
Reuben Tan
Saadia Gabriel
Bryan A. Plummer
Kate Saenko
32
4
0
31 Aug 2023
Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations
Kilichbek Haydarov
Xiaoqian Shen
Avinash Madasu
Mahmoud Salem
Jia Li
Gamaleldin F. Elsayed
Mohamed Elhoseiny
39
4
0
30 Aug 2023
Optimizing Factual Accuracy in Text Generation through Dynamic Knowledge Selection
Hongjin Qian
Zhicheng Dou
Jiejun Tan
Haonan Chen
Haoqi Gu
Ruofei Lai
Xinyu Zhang
Bo Zhao
Ji-Rong Wen
29
2
0
30 Aug 2023
WorldSmith: Iterative and Expressive Prompting for World Building with a Generative AI
Hai Dang
Frederik Brudy
G. Fitzmaurice
Fraser Anderson
24
24
0
25 Aug 2023
Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers
Jiawen Xie
Pengyu Cheng
Xiao Liang
Yong Dai
Nan Du
40
7
0
25 Aug 2023
GPTEval: A Survey on Assessments of ChatGPT and GPT-4
Rui Mao
Guanyi Chen
Xulang Zhang
Frank Guerin
Min Zhang
ELM
LM&MA
38
101
0
24 Aug 2023