Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.04048
Cited By
v1
v2
v3 (latest)
Is ChatGPT a Good NLG Evaluator? A Preliminary Study
7 March 2023
Jiaan Wang
Yunlong Liang
Fandong Meng
Zengkui Sun
Haoxiang Shi
Zhixu Li
Jinan Xu
Jianfeng Qu
Jie Zhou
LM&MA
ELM
ALM
AI4MH
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Is ChatGPT a Good NLG Evaluator? A Preliminary Study"
50 / 307 papers shown
Title
AlignBench: Benchmarking Chinese Alignment of Large Language Models
Xiao Liu
Xuanyu Lei
Sheng-Ping Wang
Yue Huang
Zhuoer Feng
...
Hongning Wang
Jing Zhang
Minlie Huang
Yuxiao Dong
Jie Tang
ELM
LM&MA
ALM
187
50
0
30 Nov 2023
CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation
Pei Ke
Bosi Wen
Andrew Feng
Xiao-Yang Liu
Xuanyu Lei
...
Aohan Zeng
Yuxiao Dong
Hongning Wang
Jie Tang
Minlie Huang
ELM
ALM
129
35
0
30 Nov 2023
Unveiling the Implicit Toxicity in Large Language Models
Jiaxin Wen
Pei Ke
Hao Sun
Zhexin Zhang
Chengfei Li
Jinfeng Bai
Minlie Huang
75
31
0
29 Nov 2023
Universal Self-Consistency for Large Language Model Generation
Xinyun Chen
Renat Aksitov
Uri Alon
Jie Jessie Ren
Kefan Xiao
Pengcheng Yin
Sushant Prakash
Charles Sutton
Xuezhi Wang
Denny Zhou
LRM
95
75
0
29 Nov 2023
Exploring Prompting Large Language Models as Explainable Metrics
Ghazaleh Mahmoudi
LRM
60
4
0
20 Nov 2023
FollowEval: A Multi-Dimensional Benchmark for Assessing the Instruction-Following Capability of Large Language Models
Yimin Jing
Renren Jin
Jiahao Hu
Huishi Qiu
Xiaohua Wang
Peng Wang
Deyi Xiong
LRM
ELM
67
3
0
16 Nov 2023
Event Causality Is Key to Computational Story Understanding
Yidan Sun
Qin Chao
Boyang Albert Li
73
9
0
16 Nov 2023
AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation
Haoyi Qiu
Kung-Hsiang Huang
Jingnong Qu
Nanyun Peng
HILM
72
6
0
16 Nov 2023
ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems
Jon Saad-Falcon
Omar Khattab
Christopher Potts
Matei A. Zaharia
RALM
108
120
0
16 Nov 2023
Fusion-Eval: Integrating Assistant Evaluators with LLMs
Lei Shu
Nevan Wichers
Liangchen Luo
Yun Zhu
Yinxiao Liu
Jindong Chen
Lei Meng
ELM
75
4
0
15 Nov 2023
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization
Yixin Liu
Alexander R. Fabbri
Jiawen Chen
Yilun Zhao
Simeng Han
Shafiq Joty
Pengfei Liu
Dragomir R. Radev
Chien-Sheng Wu
Arman Cohan
ELM
107
64
0
15 Nov 2023
Exploring the Potential of Large Language Models in Computational Argumentation
Guizhen Chen
Liying Cheng
Anh Tuan Luu
Lidong Bing
LLMAG
LRM
59
30
0
15 Nov 2023
Eval-GCSC: A New Metric for Evaluating ChatGPT's Performance in Chinese Spelling Correction
Kunting Li
Yong Hu
Shaolei Wang
Hanhan Ma
Liang He
Fandong Meng
Jie Zhou
107
1
0
14 Nov 2023
Just Ask One More Time! Self-Agreement Improves Reasoning of Language Models in (Almost) All Scenarios
Lei Lin
Jiayi Fu
Pengli Liu
Qingyang Li
Yan Gong
Junchen Wan
Fuzheng Zhang
Zhongyuan Wang
Di Zhang
Kun Gai
LRM
53
7
0
14 Nov 2023
Explain-then-Translate: An Analysis on Improving Program Translation with Self-generated Explanations
Zilu Tang
Mayank Agarwal
Alex Shypula
Bailin Wang
Derry Wijaya
Jie Chen
Yoon Kim
LRM
115
16
0
13 Nov 2023
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions
Lei Huang
Weijiang Yu
Weitao Ma
Weihong Zhong
Zhangyin Feng
...
Qianglong Chen
Weihua Peng
Xiaocheng Feng
Bing Qin
Ting Liu
LRM
HILM
142
935
0
09 Nov 2023
Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources
Yerin Hwang
Yongi-Mi Kim
Hyunkyung Bae
Jeesoo Bang
Hwanhee Lee
Kyomin Jung
68
6
0
09 Nov 2023
AI-TA: Towards an Intelligent Question-Answer Teaching Assistant using Open-Source LLMs
Yann Hicke
Anmol Agarwal
Qianou Ma
Paul Denny
AI4Ed
88
24
0
05 Nov 2023
InstructCoder: Instruction Tuning Large Language Models for Code Editing
Kaixin Li
Qisheng Hu
Xu Zhao
Hui Chen
Yuxi Xie
Tiedong Liu
Qizhe Xie
Junxian He
ALM
SyDa
92
15
0
31 Oct 2023
OpinSummEval: Revisiting Automated Evaluation for Opinion Summarization
Yuchen Shen
Xiaojun Wan
84
9
0
27 Oct 2023
Is ChatGPT a Good Multi-Party Conversation Solver?
Chao-Hong Tan
Jia-Chen Gu
Zhen-Hua Ling
101
11
0
25 Oct 2023
CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation
Minzhi Li
Taiwei Shi
Caleb Ziems
Min-Yen Kan
Nancy F. Chen
Zhengyuan Liu
Diyi Yang
123
77
0
24 Oct 2023
Language Models Hallucinate, but May Excel at Fact Verification
Jian Guan
Jesse Dodge
David Wadden
Minlie Huang
Hao Peng
LRM
HILM
111
33
0
23 Oct 2023
QUDEVAL: The Evaluation of Questions Under Discussion Discourse Parsing
Yating Wu
Ritika Mangla
Greg Durrett
Junyi Jessy Li
89
14
0
23 Oct 2023
Zero-shot Faithfulness Evaluation for Text Summarization with Foundation Language Model
Qi Jia
Siyu Ren
Yizhu Liu
Kenny Q. Zhu
ALM
HILM
87
17
0
18 Oct 2023
On Context Utilization in Summarization with Large Language Models
Mathieu Ravaut
Aixin Sun
Nancy F. Chen
Shafiq Joty
91
14
0
16 Oct 2023
How Good is ChatGPT in Giving Advice on Your Visualization Design?
Nam Wook Kim
Grace Myers
Benjamin Bach
83
21
0
14 Oct 2023
VizAbility: Enhancing Chart Accessibility with LLM-based Conversational Interaction
Joshua Gorniak
Yoon Kim
Donglai Wei
Nam Wook Kim
82
10
0
14 Oct 2023
Well Begun is Half Done: Generator-agnostic Knowledge Pre-Selection for Knowledge-Grounded Dialogue
Lang Qin
Yao Zhang
Hongru Liang
Jun Wang
Zhenglu Yang
52
3
0
11 Oct 2023
Evaluating Large Language Models at Evaluating Instruction Following
Zhiyuan Zeng
Jiatong Yu
Tianyu Gao
Yu Meng
Tanya Goyal
Danqi Chen
ELM
ALM
144
192
0
11 Oct 2023
A Closer Look into Automatic Evaluation Using Large Language Models
Cheng-Han Chiang
Hunghuei Lee
ELM
ALM
LM&MA
90
13
0
09 Oct 2023
CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation
Weixiang Yan
Yuchen Tian
Yunzhe Li
Qian Chen
Wen Wang
119
42
0
08 Oct 2023
EcoAssistant: Using LLM Assistant More Affordably and Accurately
Jieyu Zhang
Ranjay Krishna
Ahmed Hassan Awadallah
Chi Wang
86
40
0
03 Oct 2023
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
Long Chen
Oleg Sinavski
Jan Hünermann
Alice Karnsund
Andrew James Willmott
Danny Birch
Daniel Maund
Jamie Shotton
MLLM
127
210
0
03 Oct 2023
BooookScore: A systematic exploration of book-length summarization in the era of LLMs
Yapei Chang
Kyle Lo
Tanya Goyal
Mohit Iyyer
ALM
148
117
0
01 Oct 2023
SocREval: Large Language Models with the Socratic Method for Reference-Free Reasoning Evaluation
Hangfeng He
Hongming Zhang
Dan Roth
LRM
ELM
ReLM
119
15
0
29 Sep 2023
Benchmarking Cognitive Biases in Large Language Models as Evaluators
Ryan Koo
Minhwa Lee
Vipul Raheja
Jong Inn Park
Zae Myung Kim
Dongyeop Kang
ALM
114
87
0
29 Sep 2023
Curriculum-Driven Edubot: A Framework for Developing Language Learning Chatbots Through Synthesizing Conversational Data
Yu Li
Shang Qu
Jili Shen
Shangchao Min
Zhou Yu
84
18
0
28 Sep 2023
Question-Answering Approach to Evaluating Legal Summaries
Huihui Xu
Kevin D. Ashley
AILaw
ELM
39
4
0
26 Sep 2023
Ragas: Automated Evaluation of Retrieval Augmented Generation
ES Shahul
Jithin James
Luis Espinosa-Anke
Steven Schockaert
145
201
0
26 Sep 2023
ALLURE: Auditing and Improving LLM-based Evaluation of Text using Iterative In-Context-Learning
Hosein Hasanbeig
Hiteshi Sharma
Leo Betthauser
Felipe Vieira Frujeri
Ida Momennejad
110
16
0
24 Sep 2023
BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models
Zican Dong
Tianyi Tang
Junyi Li
Wayne Xin Zhao
Ji-Rong Wen
RALM
ALM
152
39
0
23 Sep 2023
Calibrating LLM-Based Evaluator
Yuxuan Liu
Tianchi Yang
Shaohan Huang
Zihan Zhang
Haizhen Huang
Furu Wei
Weiwei Deng
Feng Sun
Qi Zhang
114
33
0
23 Sep 2023
Automatic Answerability Evaluation for Question Generation
Zifan Wang
Kotaro Funakoshi
Manabu Okumura
78
3
0
22 Sep 2023
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
230
353
0
19 Sep 2023
Summarization is (Almost) Dead
Xiao Pu
Mingqi Gao
Xiaojun Wan
HILM
113
45
0
18 Sep 2023
Cross-Lingual Knowledge Editing in Large Language Models
Jiaan Wang
Yunlong Liang
Zengkui Sun
Yu Cao
Jiarong Xu
Fandong Meng
KELM
83
12
0
16 Sep 2023
Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?
Rishav Hada
Varun Gumma
Adrian de Wynter
Harshita Diddee
Mohamed Ahmed
Monojit Choudhury
Kalika Bali
Sunayana Sitaram
ALM
LM&MA
ELM
93
69
0
14 Sep 2023
BHASA: A Holistic Southeast Asian Linguistic and Cultural Evaluation Suite for Large Language Models
Wei Qi Leong
Jian Gang Ngui
Yosephine Susanto
Hamsawardhini Rengarajan
Kengatharaiyer Sarveswaran
William-Chandra Tjhi
71
9
0
12 Sep 2023
FaNS: a Facet-based Narrative Similarity Metric
Mousumi Akter
Shubhra (Santu) Karmaker
75
1
0
09 Sep 2023
Previous
1
2
3
4
5
6
7
Next