ResearchTrend.AI
Is ChatGPT a Good NLG Evaluator? A Preliminary Study
arXiv: 2303.04048

7 March 2023
Jiaan Wang
Yunlong Liang
Fandong Meng
Zengkui Sun
Haoxiang Shi
Zhixu Li
Jinan Xu
Jianfeng Qu
Jie Zhou
    LM&MA
    ELM
    ALM
    AI4MH

Papers citing "Is ChatGPT a Good NLG Evaluator? A Preliminary Study"

39 / 289 papers shown
Title
Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation
Zhouhong Gu
Xiaoxuan Zhu
Haoning Ye
Lin Zhang
Jianchen Wang
...
Zili Wang
Shusen Wang
Weiguo Zheng
Hongwei Feng
Yanghua Xiao
ALM
ELM
30
58
0
09 Jun 2023
Benchmarking Foundation Models with Language-Model-as-an-Examiner
Yushi Bai
Jiahao Ying
Yixin Cao
Xin Lv
Yuze He
...
Yijia Xiao
Haozhe Lyu
Jiayin Zhang
Juanzi Li
Lei Hou
ALM
ELM
33
136
0
07 Jun 2023
Hybrid Long Document Summarization using C2F-FAR and ChatGPT: A Practical Study
Guang Lu
Sylvia B. Larcher
Tu-Anh Tran
23
9
0
01 Jun 2023
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets
Md Tahmid Rahman Laskar
M Saiful Bari
Mizanur Rahman
Md Amran Hossen Bhuiyan
Chenyu You
J. Huang
LM&MA
ELM
ALM
49
179
0
29 May 2023
Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References
Tianyi Tang
Hongyuan Lu
Yuchen Eleanor Jiang
Haoyang Huang
Dongdong Zhang
Wayne Xin Zhao
Tom Kocmi
Furu Wei
20
5
0
24 May 2023
Generating Faithful Synthetic Data with Large Language Models: A Case Study in Computational Social Science
V. Veselovsky
Manoel Horta Ribeiro
Akhil Arora
Martin Josifoski
Ashton Anderson
Robert West
SyDa
HILM
35
31
0
24 May 2023
Is GPT-4 a Good Data Analyst?
Liying Cheng
Xingxuan Li
Lidong Bing
LM&MA
ELM
27
95
0
24 May 2023
ChatAgri: Exploring Potentials of ChatGPT on Cross-linguistic Agricultural Text Classification
Biao Zhao
Weiqiang Jin
Javier Del Ser
Guangyao Yang
38
60
0
24 May 2023
Unlocking Temporal Question Answering for Large Language Models Using Code Execution
Xingxuan Li
Liying Cheng
Qingyu Tan
Hwee Tou Ng
Chenyu You
Lidong Bing
LRM
AI4CE
30
0
0
24 May 2023
GRACE: Discriminator-Guided Chain-of-Thought Reasoning
Muhammad Khalifa
Lajanugen Logeswaran
Moontae Lee
Ho Hin Lee
Lu Wang
LRM
32
37
0
24 May 2023
Using Natural Language Explanations to Rescale Human Judgments
Manya Wadhwa
Jifan Chen
Junyi Jessy Li
Greg Durrett
43
8
0
24 May 2023
Evaluate What You Can't Evaluate: Unassessable Quality for Generated Response
Yongkang Liu
Shi Feng
Daling Wang
Yifei Zhang
Hinrich Schütze
ALM
ELM
39
1
0
24 May 2023
On Learning to Summarize with Large Language Models as References
Yixin Liu
Kejian Shi
Katherine S He
Longtian Ye
Alexander R. Fabbri
Pengfei Liu
Dragomir R. Radev
Arman Cohan
ELM
31
71
0
23 May 2023
Large Language Models are Not Yet Human-Level Evaluators for Abstractive Summarization
Chenhui Shen
Liying Cheng
Xuan-Phi Nguyen
Yang You
Lidong Bing
ELM
ALM
47
64
0
22 May 2023
ChatGPT to Replace Crowdsourcing of Paraphrases for Intent Classification: Higher Diversity and Comparable Model Robustness
Ján Cegin
Jakub Simko
Peter Brusilovsky
39
42
0
22 May 2023
Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs
Hongru Wang
Rui Wang
Fei Mi
Yang Deng
Zezhong Wang
Bin Liang
Ruifeng Xu
Kam-Fai Wong
LRM
41
55
0
19 May 2023
TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models
Zorik Gekhman
Jonathan Herzig
Roee Aharoni
Chen Elkind
Idan Szpektor
HILM
ELM
29
71
0
18 May 2023
Chain-of-Dictionary Prompting Elicits Translation in Large Language Models
Hongyuan Lu
Haoran Yang
Haoyang Huang
Dongdong Zhang
Wai Lam
Furu Wei
LRM
AI4CE
38
15
0
11 May 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
229
574
0
03 May 2023
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
Jingfeng Yang
Hongye Jin
Ruixiang Tang
Xiaotian Han
Qizhang Feng
Haoming Jiang
Bing Yin
Xia Hu
LM&MA
139
626
0
26 Apr 2023
Multidimensional Evaluation for Text Style Transfer Using ChatGPT
Huiyuan Lai
Antonio Toral
Malvina Nissim
34
17
0
26 Apr 2023
Safety Assessment of Chinese Large Language Models
Hao Sun
Zhexin Zhang
Jiawen Deng
Jiale Cheng
Minlie Huang
ALM
ELM
32
75
0
20 Apr 2023
Can ChatGPT Reproduce Human-Generated Labels? A Study of Social Computing Tasks
Yiming Zhu
Peixian Zhang
Ehsan-ul Haq
Pan Hui
Gareth Tyson
DeLMO
ALM
AI4MH
41
123
0
20 Apr 2023
Learning to Compress Prompts with Gist Tokens
Jesse Mu
Xiang Lisa Li
Noah D. Goodman
VLM
47
206
0
17 Apr 2023
ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning
Viet Dac Lai
Nghia Trung Ngo
Amir Pouran Ben Veyseh
Hieu Man
Franck Dernoncourt
Trung Bui
Thien Huu Nguyen
ELM
LM&MA
30
268
0
12 Apr 2023
Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language Understanding
Yuqing Wang
Yun Zhao
Linda R. Petzold
AI4MH
LM&MA
ELM
32
50
0
09 Apr 2023
Human-like Summarization Evaluation with ChatGPT
Mingqi Gao
Jie Ruan
Renliang Sun
Xunjian Yin
Shiping Yang
Xiaojun Wan
ALM
AI4MH
23
125
0
05 Apr 2023
Summary of ChatGPT-Related Research and Perspective Towards the Future of Large Language Models
Yi-Hsien Liu
Tianle Han
Siyuan Ma
Jia-Yu Zhang
Yuanyu Yang
...
Xiang Li
Ning Qiang
Dingang Shen
Tianming Liu
Bao Ge
ALM
ELM
AI4CE
LM&MA
LLMAG
38
464
0
04 Apr 2023
Exploring the Use of Large Language Models for Reference-Free Text Quality Evaluation: An Empirical Study
Yi Chen
Rui Wang
Haiyun Jiang
Shuming Shi
Ruifeng Xu
LM&MA
35
75
0
03 Apr 2023
CQSumDP: A ChatGPT-Annotated Resource for Query-Focused Abstractive Summarization Based on Debatepedia
Md Tahmid Rahman Laskar
Mizanur Rahman
Israt Jahan
Enamul Hoque
J. Huang
43
8
0
31 Mar 2023
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELM
ALM
LM&MA
53
1,078
0
29 Mar 2023
ChatGPT as a Factual Inconsistency Evaluator for Text Summarization
Zheheng Luo
Qianqian Xie
Sophia Ananiadou
ELM
HILM
ALM
41
73
0
27 Mar 2023
Exploring ChatGPT's Ability to Rank Content: A Preliminary Study on Consistency with Human Preferences
Yunjie Ji
Yan Gong
Yiping Peng
Chao Ni
Peiyan Sun
Dongyu Pan
Baochang Ma
Xiangang Li
ELM
ALM
AI4MH
30
37
0
14 Mar 2023
Zero-Shot Cross-Lingual Summarization via Large Language Models
Jiaan Wang
Yunlong Liang
Fandong Meng
Beiqi Zou
Zhixu Li
Jianfeng Qu
Jie Zhou
ELM
29
28
0
28 Feb 2023
ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports
Katharina Jeblick
B. Schachtner
Jakob Dexl
Andreas Mittermeier
Anna Theresa Stüber
...
Tobias Weber
Philipp Wesp
B. Sabel
J. Ricke
Michael Ingrisch
LM&MA
MedIm
126
373
0
30 Dec 2022
DocAsRef: An Empirical Study on Repurposing Reference-Based Summary Quality Metrics Reference-Freely
F. S. Bao
Ruixuan Tu
Ge Luo
Yinfei Yang
Hebi Li
Minghui Qiu
Youbiao He
Cen Chen
18
2
0
20 Dec 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
339
12,003
0
04 Mar 2022
Exploring Dense Retrieval for Dialogue Response Selection
Tian Lan
Deng Cai
Yan Wang
Yixuan Su
Heyan Huang
Xian-Ling Mao
120
16
0
13 Oct 2021
Teaching Machines to Read and Comprehend
Karl Moritz Hermann
Tomás Kociský
Edward Grefenstette
L. Espeholt
W. Kay
Mustafa Suleyman
Phil Blunsom
190
3,513
0
10 Jun 2015