Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.04048
Cited By
v1
v2
v3 (latest)
Is ChatGPT a Good NLG Evaluator? A Preliminary Study
7 March 2023
Jiaan Wang
Yunlong Liang
Fandong Meng
Zengkui Sun
Haoxiang Shi
Zhixu Li
Jinan Xu
Jianfeng Qu
Jie Zhou
LM&MA
ELM
ALM
AI4MH
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Is ChatGPT a Good NLG Evaluator? A Preliminary Study"
50 / 307 papers shown
Title
EPA: Easy Prompt Augmentation on Large Language Models via Multiple Sources and Multiple Targets
Hongyuan Lu
Wai Lam
67
1
0
09 Sep 2023
Translate Meanings, Not Just Words: IdiomKB's Role in Optimizing Idiomatic Translation with Language Models
Shuang Li
Jiangjie Chen
Siyu Yuan
Xinyi Wu
Hao Yang
Shimin Tao
Yanghua Xiao
86
20
0
26 Aug 2023
GPTEval: A Survey on Assessments of ChatGPT and GPT-4
Rui Mao
Guanyi Chen
Xulang Zhang
Frank Guerin
Min Zhang
ELM
LM&MA
85
112
0
24 Aug 2023
Instruction Position Matters in Sequence Generation with Large Language Models
Yanjun Liu
Xianfeng Zeng
Fandong Meng
Jie Zhou
LRM
107
9
0
23 Aug 2023
Discrete Prompt Compression with Reinforcement Learning
Hoyoun Jung
Kyung-Joong Kim
101
29
0
17 Aug 2023
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
Chi-Min Chan
Weize Chen
Yusheng Su
Jianxuan Yu
Wei Xue
Shan Zhang
Jie Fu
Zhiyuan Liu
ELM
LLMAG
ALM
99
504
0
14 Aug 2023
Generating Faithful Text From a Knowledge Graph with Noisy Reference Text
Tahsina Hashem
Weiqing Wang
Derry Wijaya
Mohammed Eunus Ali
Yuan-Fang Li
57
3
0
12 Aug 2023
AspectMMKG: A Multi-modal Knowledge Graph with Aspect-aware Entities
Jingdan Zhang
Jiaan Wang
Xiaodan Wang
Zhixu Li
Yanghua Xiao
91
10
0
09 Aug 2023
AgentSims: An Open-Source Sandbox for Large Language Model Evaluation
Jiaju Lin
Haoran Zhao
Aochi Zhang
Yiting Wu
Huqiuyue Ping
Qin Chen
ELM
LLMAG
104
68
0
08 Aug 2023
Learning Evaluation Models from Large Language Models for Sequence Generation
Chenglong Wang
Hang Zhou
Kai-Chun Chang
Tongran Liu
Chunliang Zhang
Quan Du
Tong Xiao
Yue Zhang
Jingbo Zhu
ELM
154
4
0
08 Aug 2023
Towards Multiple References Era -- Addressing Data Leakage and Limited Reference Diversity in NLG Evaluation
Xianfeng Zeng
Yanjun Liu
Fandong Meng
Jie Zhou
55
0
0
06 Aug 2023
Towards Consistency Filtering-Free Unsupervised Learning for Dense Retrieval
Haoxiang Shi
Sumio Fujita
Tetsuya Sakai
57
0
0
05 Aug 2023
Wider and Deeper LLM Networks are Fairer LLM Evaluators
Xinghua Zhang
Yu Bowen
Haiyang Yu
Yangyu Lv
Tingwen Liu
Fei Huang
Hongbo Xu
Yongbin Li
ALM
146
90
0
03 Aug 2023
LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise Comparisons using Large Language Models
Adian Liusie
Potsawee Manakul
Mark Gales
ELM
88
40
0
15 Jul 2023
Is ChatGPT a Good Personality Recognizer? A Preliminary Study
Yuzhe Ji
Wen Wu
Hong Zheng
Yiqiang Hu
Xi Chen
Liang He
AI4MH
67
26
0
08 Jul 2023
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM
LM&MA
ALM
223
1,764
0
06 Jul 2023
FunQA: Towards Surprising Video Comprehension
Binzhu Xie
Sicheng Zhang
Zitang Zhou
Yue Liu
Yuanhan Zhang
Jack Hessel
Jingkang Yang
Ziwei Liu
143
24
0
26 Jun 2023
Towards Explainable Evaluation Metrics for Machine Translation
Christoph Leiter
Piyawat Lertvittayakumjorn
M. Fomicheva
Wei Zhao
Yang Gao
Steffen Eger
ELM
104
15
0
22 Jun 2023
Snowman: A Million-scale Chinese Commonsense Knowledge Graph Distilled from Foundation Model
Jiaan Wang
Jianfeng Qu
Yunlong Liang
Zhixu Li
An Liu
Guanfeng Liu
Xin Zheng
82
2
0
17 Jun 2023
Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks
V. Veselovsky
Manoel Horta Ribeiro
Robert West
75
135
0
13 Jun 2023
Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation
Zhouhong Gu
Xiaoxuan Zhu
Haoning Ye
Lin Zhang
Jianchen Wang
...
Zili Wang
Shusen Wang
Weiguo Zheng
Hongwei Feng
Yanghua Xiao
ALM
ELM
148
60
0
09 Jun 2023
Benchmarking Foundation Models with Language-Model-as-an-Examiner
Yushi Bai
Jiahao Ying
Yixin Cao
Xin Lv
Yuze He
...
Yijia Xiao
Haozhe Lyu
Jiayin Zhang
Juanzi Li
Lei Hou
ALM
ELM
107
149
0
07 Jun 2023
Hybrid Long Document Summarization using C2F-FAR and ChatGPT: A Practical Study
Guang Lu
Sylvia B. Larcher
Tu-Anh Tran
51
9
0
01 Jun 2023
A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets
Md Tahmid Rahman Laskar
M Saiful Bari
Mizanur Rahman
Md Amran Hossen Bhuiyan
Shafiq Joty
J. Huang
LM&MA
ELM
ALM
125
193
0
29 May 2023
Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References
Tianyi Tang
Hongyuan Lu
Yuchen Eleanor Jiang
Haoyang Huang
Dongdong Zhang
Wayne Xin Zhao
Tom Kocmi
Furu Wei
58
7
0
24 May 2023
Generating Faithful Synthetic Data with Large Language Models: A Case Study in Computational Social Science
V. Veselovsky
Manoel Horta Ribeiro
Akhil Arora
Martin Josifoski
Ashton Anderson
Robert West
SyDa
HILM
82
35
0
24 May 2023
Is GPT-4 a Good Data Analyst?
Liying Cheng
Xingxuan Li
Lidong Bing
LM&MA
ELM
122
101
0
24 May 2023
ChatAgri: Exploring Potentials of ChatGPT on Cross-linguistic Agricultural Text Classification
Biao Zhao
Weiqiang Jin
Javier Del Ser
Guangyao Yang
86
67
0
24 May 2023
Unlocking Temporal Question Answering for Large Language Models Using Code Execution
Xingxuan Li
Liying Cheng
Qingyu Tan
Hwee Tou Ng
Shafiq Joty
Lidong Bing
LRM
AI4CE
81
0
0
24 May 2023
GRACE: Discriminator-Guided Chain-of-Thought Reasoning
Muhammad Khalifa
Lajanugen Logeswaran
Moontae Lee
Ho Hin Lee
Lu Wang
LRM
72
42
0
24 May 2023
Using Natural Language Explanations to Rescale Human Judgments
Manya Wadhwa
Jifan Chen
Junyi Jessy Li
Greg Durrett
80
8
0
24 May 2023
Evaluate What You Can't Evaluate: Unassessable Quality for Generated Response
Yongkang Liu
Shi Feng
Daling Wang
Yifei Zhang
Hinrich Schütze
ALM
ELM
89
1
0
24 May 2023
On Learning to Summarize with Large Language Models as References
Yixin Liu
Kejian Shi
Katherine S He
Longtian Ye
Alexander R. Fabbri
Pengfei Liu
Dragomir R. Radev
Arman Cohan
ELM
117
82
0
23 May 2023
Large Language Models are Not Yet Human-Level Evaluators for Abstractive Summarization
Chenhui Shen
Liying Cheng
Xuan-Phi Nguyen
Yang You
Lidong Bing
ELM
ALM
107
72
0
22 May 2023
ChatGPT to Replace Crowdsourcing of Paraphrases for Intent Classification: Higher Diversity and Comparable Model Robustness
Ján Cegin
Jakub Simko
Peter Brusilovsky
103
48
0
22 May 2023
Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs
Hongru Wang
Rui Wang
Fei Mi
Yang Deng
Zezhong Wang
Bin Liang
Ruifeng Xu
Kam-Fai Wong
LRM
84
68
0
19 May 2023
TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models
Zorik Gekhman
Jonathan Herzig
Roee Aharoni
Chen Elkind
Idan Szpektor
HILM
ELM
140
79
0
18 May 2023
Chain-of-Dictionary Prompting Elicits Translation in Large Language Models
Hongyuan Lu
Haoran Yang
Haoyang Huang
Dongdong Zhang
Wai Lam
Furu Wei
LRM
AI4CE
106
18
0
11 May 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
300
633
0
03 May 2023
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
Jingfeng Yang
Hongye Jin
Ruixiang Tang
Xiaotian Han
Qizhang Feng
Haoming Jiang
Bing Yin
Helen Zhou
LM&MA
214
682
0
26 Apr 2023
Multidimensional Evaluation for Text Style Transfer Using ChatGPT
Huiyuan Lai
Antonio Toral
Malvina Nissim
97
17
0
26 Apr 2023
Safety Assessment of Chinese Large Language Models
Hao Sun
Zhexin Zhang
Jiawen Deng
Jiale Cheng
Minlie Huang
ALM
ELM
85
77
0
20 Apr 2023
Can ChatGPT Reproduce Human-Generated Labels? A Study of Social Computing Tasks
Yiming Zhu
Peixian Zhang
Ehsan-ul Haq
Pan Hui
Gareth Tyson
DeLMO
ALM
AI4MH
87
127
0
20 Apr 2023
Learning to Compress Prompts with Gist Tokens
Jesse Mu
Xiang Lisa Li
Noah D. Goodman
VLM
131
227
0
17 Apr 2023
ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning
Viet Dac Lai
Nghia Trung Ngo
Amir Pouran Ben Veyseh
Hieu Man
Franck Dernoncourt
Trung Bui
Thien Huu Nguyen
ELM
LM&MA
69
290
0
12 Apr 2023
Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language Understanding
Yuqing Wang
Yun Zhao
Linda R. Petzold
AI4MH
LM&MA
ELM
95
53
0
09 Apr 2023
Human-like Summarization Evaluation with ChatGPT
Mingqi Gao
Jie Ruan
Renliang Sun
Xunjian Yin
Shiping Yang
Xiaojun Wan
ALM
AI4MH
69
135
0
05 Apr 2023
Summary of ChatGPT-Related Research and Perspective Towards the Future of Large Language Models
Yi-Hsien Liu
Tianle Han
Siyuan Ma
Jia-Yu Zhang
Yuanyu Yang
...
Xiang Li
Ning Qiang
Dingang Shen
Tianming Liu
Bao Ge
ALM
ELM
AI4CE
LM&MA
LLMAG
93
511
0
04 Apr 2023
Exploring the Use of Large Language Models for Reference-Free Text Quality Evaluation: An Empirical Study
Yi Chen
Rui Wang
Haiyun Jiang
Shuming Shi
Ruifeng Xu
LM&MA
122
87
0
03 Apr 2023
CQSumDP: A ChatGPT-Annotated Resource for Query-Focused Abstractive Summarization Based on Debatepedia
Md Tahmid Rahman Laskar
Mizanur Rahman
Israt Jahan
Enamul Hoque
J. Huang
84
9
0
31 Mar 2023
Previous
1
2
3
4
5
6
7
Next