ResearchTrend.AI
Is ChatGPT a Good NLG Evaluator? A Preliminary Study
arXiv: 2303.04048

7 March 2023
Jiaan Wang
Yunlong Liang
Fandong Meng
Zengkui Sun
Haoxiang Shi
Zhixu Li
Jinan Xu
Jianfeng Qu
Jie Zhou
    LM&MA
    ELM
    ALM
    AI4MH

Papers citing "Is ChatGPT a Good NLG Evaluator? A Preliminary Study"

50 / 289 papers shown
Is ChatGPT a Good Multi-Party Conversation Solver?
Chao-Hong Tan
Jia-Chen Gu
Zhen-Hua Ling
19
9
0
25 Oct 2023
CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation
Minzhi Li
Taiwei Shi
Caleb Ziems
Min-Yen Kan
Nancy F. Chen
Zhengyuan Liu
Diyi Yang
29
68
0
24 Oct 2023
Language Models Hallucinate, but May Excel at Fact Verification
Jian-Yu Guan
Jesse Dodge
David Wadden
Minlie Huang
Hao Peng
LRM
HILM
31
28
0
23 Oct 2023
QUDEVAL: The Evaluation of Questions Under Discussion Discourse Parsing
Yating Wu
Ritika Mangla
Greg Durrett
Junyi Jessy Li
39
12
0
23 Oct 2023
Zero-shot Faithfulness Evaluation for Text Summarization with Foundation Language Model
Qi Jia
Siyu Ren
Yizhu Liu
Kenny Q. Zhu
ALM
HILM
33
16
0
18 Oct 2023
On Context Utilization in Summarization with Large Language Models
Mathieu Ravaut
Aixin Sun
Nancy F. Chen
Shafiq R. Joty
36
13
0
16 Oct 2023
How Good is ChatGPT in Giving Advice on Your Visualization Design?
Nam Wook Kim
Grace Myers
Benjamin Bach
28
20
0
14 Oct 2023
VizAbility: Enhancing Chart Accessibility with LLM-based Conversational Interaction
Joshua Gorniak
Yoon Kim
Donglai Wei
Nam Wook Kim
32
8
0
14 Oct 2023
Well Begun is Half Done: Generator-agnostic Knowledge Pre-Selection for Knowledge-Grounded Dialogue
Lang Qin
Yao Zhang
Hongru Liang
Jun Wang
Zhenglu Yang
29
3
0
11 Oct 2023
Evaluating Large Language Models at Evaluating Instruction Following
Zhiyuan Zeng
Jiatong Yu
Tianyu Gao
Yu Meng
Tanya Goyal
Danqi Chen
ELM
ALM
44
166
0
11 Oct 2023
A Closer Look into Automatic Evaluation Using Large Language Models
Cheng-Han Chiang
Hunghuei Lee
ELM
ALM
LM&MA
35
13
0
09 Oct 2023
CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation
Weixiang Yan
Yuchen Tian
Yunzhe Li
Qian Chen
Wen Wang
34
35
0
08 Oct 2023
EcoAssistant: Using LLM Assistant More Affordably and Accurately
Jieyu Zhang
Ranjay Krishna
Ahmed Hassan Awadallah
Chi Wang
30
34
0
03 Oct 2023
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
Long Chen
Oleg Sinavski
Jan Hünermann
Alice Karnsund
Andrew James Willmott
Danny Birch
Daniel Maund
Jamie Shotton
MLLM
15
180
0
03 Oct 2023
BooookScore: A systematic exploration of book-length summarization in the era of LLMs
Yapei Chang
Kyle Lo
Tanya Goyal
Mohit Iyyer
ALM
21
106
0
01 Oct 2023
SocREval: Large Language Models with the Socratic Method for Reference-Free Reasoning Evaluation
Hangfeng He
Hongming Zhang
Dan Roth
LRM
ELM
ReLM
28
13
0
29 Sep 2023
Benchmarking Cognitive Biases in Large Language Models as Evaluators
Ryan Koo
Minhwa Lee
Vipul Raheja
Jong Inn Park
Zae Myung Kim
Dongyeop Kang
ALM
43
75
0
29 Sep 2023
Curriculum-Driven Edubot: A Framework for Developing Language Learning Chatbots Through Synthesizing Conversational Data
Yu Li
Shang Qu
Jili Shen
Shangchao Min
Zhou Yu
47
16
0
28 Sep 2023
Question-Answering Approach to Evaluating Legal Summaries
Huihui Xu
Kevin D. Ashley
AILaw
ELM
27
3
0
26 Sep 2023
Ragas: Automated Evaluation of Retrieval Augmented Generation
ES Shahul
Jithin James
Luis Espinosa-Anke
Steven Schockaert
91
177
0
26 Sep 2023
ALLURE: Auditing and Improving LLM-based Evaluation of Text using Iterative In-Context-Learning
Hosein Hasanbeig
Hiteshi Sharma
Leo Betthauser
Felipe Vieira Frujeri
Ida Momennejad
38
15
0
24 Sep 2023
BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models
Zican Dong
Tianyi Tang
Junyi Li
Wayne Xin Zhao
Ji-Rong Wen
RALM
ALM
30
33
0
23 Sep 2023
Calibrating LLM-Based Evaluator
Yuxuan Liu
Tianchi Yang
Shaohan Huang
Zihan Zhang
Haizhen Huang
Furu Wei
Weiwei Deng
Feng Sun
Qi Zhang
49
31
0
23 Sep 2023
Automatic Answerability Evaluation for Question Generation
Zifan Wang
Kotaro Funakoshi
Manabu Okumura
34
2
0
22 Sep 2023
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
117
301
0
19 Sep 2023
Summarization is (Almost) Dead
Xiao Pu
Mingqi Gao
Xiaojun Wan
HILM
81
39
0
18 Sep 2023
Cross-Lingual Knowledge Editing in Large Language Models
Jiaan Wang
Yunlong Liang
Zengkui Sun
Yu Cao
Jiarong Xu
Fandong Meng
KELM
30
11
0
16 Sep 2023
Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?
Rishav Hada
Varun Gumma
Adrian de Wynter
Harshita Diddee
Mohamed Ahmed
Monojit Choudhury
Kalika Bali
Sunayana Sitaram
ALM
LM&MA
ELM
35
61
0
14 Sep 2023
BHASA: A Holistic Southeast Asian Linguistic and Cultural Evaluation Suite for Large Language Models
Wei Qi Leong
Jian Gang Ngui
Yosephine Susanto
Hamsawardhini Rengarajan
Kengatharaiyer Sarveswaran
William-Chandra Tjhi
26
9
0
12 Sep 2023
FaNS: a Facet-based Narrative Similarity Metric
Mousumi Akter
Shubhra (Santu) Karmaker
25
1
0
09 Sep 2023
EPA: Easy Prompt Augmentation on Large Language Models via Multiple Sources and Multiple Targets
Hongyuan Lu
Wai Lam
21
1
0
09 Sep 2023
Translate Meanings, Not Just Words: IdiomKB's Role in Optimizing Idiomatic Translation with Language Models
Shuang Li
Jiangjie Chen
Siyu Yuan
Xinyi Wu
Hao Yang
Shimin Tao
Yanghua Xiao
48
15
0
26 Aug 2023
GPTEval: A Survey on Assessments of ChatGPT and GPT-4
Rui Mao
Guanyi Chen
Xulang Zhang
Frank Guerin
Erik Cambria
ELM
LM&MA
33
101
0
24 Aug 2023
Instruction Position Matters in Sequence Generation with Large Language Models
Yanjun Liu
Xianfeng Zeng
Fandong Meng
Jie Zhou
LRM
54
8
0
23 Aug 2023
Discrete Prompt Compression with Reinforcement Learning
Hoyoun Jung
Kyung-Joong Kim
29
24
0
17 Aug 2023
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
Chi-Min Chan
Weize Chen
Yusheng Su
Jianxuan Yu
Wei Xue
Shan Zhang
Jie Fu
Zhiyuan Liu
ELM
LLMAG
ALM
29
446
0
14 Aug 2023
Generating Faithful Text From a Knowledge Graph with Noisy Reference Text
Tahsina Hashem
Weiqing Wang
Derry Wijaya
Mohammed Eunus Ali
Yuan-Fang Li
29
3
0
12 Aug 2023
AspectMMKG: A Multi-modal Knowledge Graph with Aspect-aware Entities
Jingdan Zhang
Jiaan Wang
Xiaodan Wang
Zhixu Li
Yanghua Xiao
31
10
0
09 Aug 2023
AgentSims: An Open-Source Sandbox for Large Language Model Evaluation
Jiaju Lin
Haoran Zhao
Aochi Zhang
Yiting Wu
Huqiuyue Ping
Qin Chen
ELM
LLMAG
32
58
0
08 Aug 2023
Learning Evaluation Models from Large Language Models for Sequence Generation
Chenglong Wang
Hang Zhou
Kai-Chun Chang
Tongran Liu
Chunliang Zhang
Quan Du
Tong Xiao
Yue Zhang
Jingbo Zhu
ELM
43
3
0
08 Aug 2023
Towards Multiple References Era -- Addressing Data Leakage and Limited Reference Diversity in NLG Evaluation
Xianfeng Zeng
Yanjun Liu
Fandong Meng
Jie Zhou
24
0
0
06 Aug 2023
Towards Consistency Filtering-Free Unsupervised Learning for Dense Retrieval
Haoxiang Shi
Sumio Fujita
Tetsuya Sakai
21
0
0
05 Aug 2023
Wider and Deeper LLM Networks are Fairer LLM Evaluators
Xinghua Zhang
Yu Bowen
Haiyang Yu
Yangyu Lv
Tingwen Liu
Fei Huang
Hongbo Xu
Yongbin Li
ALM
50
83
0
03 Aug 2023
LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise Comparisons using Large Language Models
Adian Liusie
Potsawee Manakul
Mark J. F. Gales
ELM
24
35
0
15 Jul 2023
Is ChatGPT a Good Personality Recognizer? A Preliminary Study
Yuzhe Ji
Wen Wu
Hong Zheng
Yiqiang Hu
Xi Chen
Liang He
AI4MH
34
24
0
08 Jul 2023
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM
LM&MA
ALM
72
1,513
0
06 Jul 2023
FunQA: Towards Surprising Video Comprehension
Binzhu Xie
Sicheng Zhang
Zitang Zhou
Bo-wen Li
Yuanhan Zhang
Jack Hessel
Jingkang Yang
Ziwei Liu
39
20
0
26 Jun 2023
Towards Explainable Evaluation Metrics for Machine Translation
Christoph Leiter
Piyawat Lertvittayakumjorn
M. Fomicheva
Wei-Ye Zhao
Yang Gao
Steffen Eger
ELM
30
13
0
22 Jun 2023
Snowman: A Million-scale Chinese Commonsense Knowledge Graph Distilled from Foundation Model
Jiaan Wang
Jianfeng Qu
Yunlong Liang
Zhixu Li
An Liu
Guanfeng Liu
Xin Zheng
30
2
0
17 Jun 2023
Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks
V. Veselovsky
Manoel Horta Ribeiro
Robert West
28
130
0
13 Jun 2023