Is ChatGPT a Good NLG Evaluator? A Preliminary Study

7 March 2023
Jiaan Wang
Yunlong Liang
Fandong Meng
Zengkui Sun
Haoxiang Shi
Zhixu Li
Jinan Xu
Jianfeng Qu
Jie Zhou
LM&MA, ELM, ALM, AI4MH

Papers citing "Is ChatGPT a Good NLG Evaluator? A Preliminary Study"

50 / 307 papers shown
Title
EPA: Easy Prompt Augmentation on Large Language Models via Multiple Sources and Multiple Targets
Hongyuan Lu
Wai Lam
67
1
0
09 Sep 2023
Translate Meanings, Not Just Words: IdiomKB's Role in Optimizing Idiomatic Translation with Language Models
Shuang Li
Jiangjie Chen
Siyu Yuan
Xinyi Wu
Hao Yang
Shimin Tao
Yanghua Xiao
86
20
0
26 Aug 2023
GPTEval: A Survey on Assessments of ChatGPT and GPT-4
Rui Mao
Guanyi Chen
Xulang Zhang
Frank Guerin
Min Zhang
ELM, LM&MA
85
112
0
24 Aug 2023
Instruction Position Matters in Sequence Generation with Large Language Models
Yanjun Liu
Xianfeng Zeng
Fandong Meng
Jie Zhou
LRM
107
9
0
23 Aug 2023
Discrete Prompt Compression with Reinforcement Learning
Hoyoun Jung
Kyung-Joong Kim
101
29
0
17 Aug 2023
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
Chi-Min Chan
Weize Chen
Yusheng Su
Jianxuan Yu
Wei Xue
Shan Zhang
Jie Fu
Zhiyuan Liu
ELM, LLMAG, ALM
99
504
0
14 Aug 2023
Generating Faithful Text From a Knowledge Graph with Noisy Reference Text
Tahsina Hashem
Weiqing Wang
Derry Wijaya
Mohammed Eunus Ali
Yuan-Fang Li
57
3
0
12 Aug 2023
AspectMMKG: A Multi-modal Knowledge Graph with Aspect-aware Entities
Jingdan Zhang
Jiaan Wang
Xiaodan Wang
Zhixu Li
Yanghua Xiao
91
10
0
09 Aug 2023
AgentSims: An Open-Source Sandbox for Large Language Model Evaluation
Jiaju Lin
Haoran Zhao
Aochi Zhang
Yiting Wu
Huqiuyue Ping
Qin Chen
ELM, LLMAG
104
68
0
08 Aug 2023
Learning Evaluation Models from Large Language Models for Sequence Generation
Chenglong Wang
Hang Zhou
Kai-Chun Chang
Tongran Liu
Chunliang Zhang
Quan Du
Tong Xiao
Yue Zhang
Jingbo Zhu
ELM
154
4
0
08 Aug 2023
Towards Multiple References Era -- Addressing Data Leakage and Limited Reference Diversity in NLG Evaluation
Xianfeng Zeng
Yanjun Liu
Fandong Meng
Jie Zhou
55
0
0
06 Aug 2023
Towards Consistency Filtering-Free Unsupervised Learning for Dense Retrieval
Haoxiang Shi
Sumio Fujita
Tetsuya Sakai
57
0
0
05 Aug 2023
Wider and Deeper LLM Networks are Fairer LLM Evaluators
Xinghua Zhang
Yu Bowen
Haiyang Yu
Yangyu Lv
Tingwen Liu
Fei Huang
Hongbo Xu
Yongbin Li
ALM
146
90
0
03 Aug 2023
LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise Comparisons using Large Language Models
Adian Liusie
Potsawee Manakul
Mark Gales
ELM
88
40
0
15 Jul 2023
Is ChatGPT a Good Personality Recognizer? A Preliminary Study
Yuzhe Ji
Wen Wu
Hong Zheng
Yiqiang Hu
Xi Chen
Liang He
AI4MH
67
26
0
08 Jul 2023
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM, LM&MA, ALM
223
1,764
0
06 Jul 2023
FunQA: Towards Surprising Video Comprehension
Binzhu Xie
Sicheng Zhang
Zitang Zhou
Yue Liu
Yuanhan Zhang
Jack Hessel
Jingkang Yang
Ziwei Liu
143
24
0
26 Jun 2023
Towards Explainable Evaluation Metrics for Machine Translation
Christoph Leiter
Piyawat Lertvittayakumjorn
M. Fomicheva
Wei Zhao
Yang Gao
Steffen Eger
ELM
104
15
0
22 Jun 2023
Snowman: A Million-scale Chinese Commonsense Knowledge Graph Distilled from Foundation Model
Jiaan Wang
Jianfeng Qu
Yunlong Liang
Zhixu Li
An Liu
Guanfeng Liu
Xin Zheng
82
2
0
17 Jun 2023
Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks
V. Veselovsky
Manoel Horta Ribeiro
Robert West
75
135
0
13 Jun 2023
Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation
Zhouhong Gu
Xiaoxuan Zhu
Haoning Ye
Lin Zhang
Jianchen Wang
...
Zili Wang
Shusen Wang
Weiguo Zheng
Hongwei Feng
Yanghua Xiao
ALM, ELM
148
60
0
09 Jun 2023
Benchmarking Foundation Models with Language-Model-as-an-Examiner
Yushi Bai
Jiahao Ying
Yixin Cao
Xin Lv
Yuze He
...
Yijia Xiao
Haozhe Lyu
Jiayin Zhang
Juanzi Li
Lei Hou
ALM, ELM
107
149
0
07 Jun 2023
Hybrid Long Document Summarization using C2F-FAR and ChatGPT: A Practical Study
Guang Lu
Sylvia B. Larcher
Tu-Anh Tran
51
9
0
01 Jun 2023
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets
Md Tahmid Rahman Laskar
M Saiful Bari
Mizanur Rahman
Md Amran Hossen Bhuiyan
Shafiq Joty
J. Huang
LM&MA, ELM, ALM
125
193
0
29 May 2023
Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References
Tianyi Tang
Hongyuan Lu
Yuchen Eleanor Jiang
Haoyang Huang
Dongdong Zhang
Wayne Xin Zhao
Tom Kocmi
Furu Wei
58
7
0
24 May 2023
Generating Faithful Synthetic Data with Large Language Models: A Case Study in Computational Social Science
V. Veselovsky
Manoel Horta Ribeiro
Akhil Arora
Martin Josifoski
Ashton Anderson
Robert West
SyDa, HILM
82
35
0
24 May 2023
Is GPT-4 a Good Data Analyst?
Liying Cheng
Xingxuan Li
Lidong Bing
LM&MA, ELM
122
101
0
24 May 2023
ChatAgri: Exploring Potentials of ChatGPT on Cross-linguistic Agricultural Text Classification
Biao Zhao
Weiqiang Jin
Javier Del Ser
Guangyao Yang
86
67
0
24 May 2023
Unlocking Temporal Question Answering for Large Language Models Using Code Execution
Xingxuan Li
Liying Cheng
Qingyu Tan
Hwee Tou Ng
Shafiq Joty
Lidong Bing
LRM, AI4CE
81
0
0
24 May 2023
GRACE: Discriminator-Guided Chain-of-Thought Reasoning
Muhammad Khalifa
Lajanugen Logeswaran
Moontae Lee
Ho Hin Lee
Lu Wang
LRM
72
42
0
24 May 2023
Using Natural Language Explanations to Rescale Human Judgments
Manya Wadhwa
Jifan Chen
Junyi Jessy Li
Greg Durrett
80
8
0
24 May 2023
Evaluate What You Can't Evaluate: Unassessable Quality for Generated Response
Yongkang Liu
Shi Feng
Daling Wang
Yifei Zhang
Hinrich Schütze
ALM, ELM
89
1
0
24 May 2023
On Learning to Summarize with Large Language Models as References
Yixin Liu
Kejian Shi
Katherine S He
Longtian Ye
Alexander R. Fabbri
Pengfei Liu
Dragomir R. Radev
Arman Cohan
ELM
117
82
0
23 May 2023
Large Language Models are Not Yet Human-Level Evaluators for Abstractive Summarization
Chenhui Shen
Liying Cheng
Xuan-Phi Nguyen
Yang You
Lidong Bing
ELM, ALM
107
72
0
22 May 2023
ChatGPT to Replace Crowdsourcing of Paraphrases for Intent Classification: Higher Diversity and Comparable Model Robustness
Ján Cegin
Jakub Simko
Peter Brusilovsky
103
48
0
22 May 2023
Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs
Hongru Wang
Rui Wang
Fei Mi
Yang Deng
Zezhong Wang
Bin Liang
Ruifeng Xu
Kam-Fai Wong
LRM
84
68
0
19 May 2023
TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models
Zorik Gekhman
Jonathan Herzig
Roee Aharoni
Chen Elkind
Idan Szpektor
HILM, ELM
140
79
0
18 May 2023
Chain-of-Dictionary Prompting Elicits Translation in Large Language Models
Hongyuan Lu
Haoran Yang
Haoyang Huang
Dongdong Zhang
Wai Lam
Furu Wei
LRM, AI4CE
106
18
0
11 May 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM, LM&MA
300
633
0
03 May 2023
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
Jingfeng Yang
Hongye Jin
Ruixiang Tang
Xiaotian Han
Qizhang Feng
Haoming Jiang
Bing Yin
Helen Zhou
LM&MA
214
682
0
26 Apr 2023
Multidimensional Evaluation for Text Style Transfer Using ChatGPT
Huiyuan Lai
Antonio Toral
Malvina Nissim
97
17
0
26 Apr 2023
Safety Assessment of Chinese Large Language Models
Hao Sun
Zhexin Zhang
Jiawen Deng
Jiale Cheng
Minlie Huang
ALM, ELM
85
77
0
20 Apr 2023
Can ChatGPT Reproduce Human-Generated Labels? A Study of Social Computing Tasks
Yiming Zhu
Peixian Zhang
Ehsan-ul Haq
Pan Hui
Gareth Tyson
DeLMO, ALM, AI4MH
87
127
0
20 Apr 2023
Learning to Compress Prompts with Gist Tokens
Jesse Mu
Xiang Lisa Li
Noah D. Goodman
VLM
131
227
0
17 Apr 2023
ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning
Viet Dac Lai
Nghia Trung Ngo
Amir Pouran Ben Veyseh
Hieu Man
Franck Dernoncourt
Trung Bui
Thien Huu Nguyen
ELM, LM&MA
69
290
0
12 Apr 2023
Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language Understanding
Yuqing Wang
Yun Zhao
Linda R. Petzold
AI4MH, LM&MA, ELM
95
53
0
09 Apr 2023
Human-like Summarization Evaluation with ChatGPT
Mingqi Gao
Jie Ruan
Renliang Sun
Xunjian Yin
Shiping Yang
Xiaojun Wan
ALM, AI4MH
69
135
0
05 Apr 2023
Summary of ChatGPT-Related Research and Perspective Towards the Future of Large Language Models
Yi-Hsien Liu
Tianle Han
Siyuan Ma
Jia-Yu Zhang
Yuanyu Yang
...
Xiang Li
Ning Qiang
Dingang Shen
Tianming Liu
Bao Ge
ALM, ELM, AI4CE, LM&MA, LLMAG
93
511
0
04 Apr 2023
Exploring the Use of Large Language Models for Reference-Free Text Quality Evaluation: An Empirical Study
Yi Chen
Rui Wang
Haiyun Jiang
Shuming Shi
Ruifeng Xu
LM&MA
122
87
0
03 Apr 2023
CQSumDP: A ChatGPT-Annotated Resource for Query-Focused Abstractive Summarization Based on Debatepedia
Md Tahmid Rahman Laskar
Mizanur Rahman
Israt Jahan
Enamul Hoque
J. Huang
84
9
0
31 Mar 2023