2303.04048
Is ChatGPT a Good NLG Evaluator? A Preliminary Study
7 March 2023
Jiaan Wang
Yunlong Liang
Fandong Meng
Zengkui Sun
Haoxiang Shi
Zhixu Li
Jinan Xu
Jianfeng Qu
Jie Zhou
LM&MA
ELM
ALM
AI4MH
Papers citing
"Is ChatGPT a Good NLG Evaluator? A Preliminary Study"
50 / 289 papers shown
Title
Is ChatGPT a Good Multi-Party Conversation Solver?
Chao-Hong Tan
Jia-Chen Gu
Zhen-Hua Ling
19
9
0
25 Oct 2023
CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data Annotation
Minzhi Li
Taiwei Shi
Caleb Ziems
Min-Yen Kan
Nancy F. Chen
Zhengyuan Liu
Diyi Yang
29
68
0
24 Oct 2023
Language Models Hallucinate, but May Excel at Fact Verification
Jian-Yu Guan
Jesse Dodge
David Wadden
Minlie Huang
Hao Peng
LRM
HILM
31
28
0
23 Oct 2023
QUDEVAL: The Evaluation of Questions Under Discussion Discourse Parsing
Yating Wu
Ritika Mangla
Greg Durrett
Junyi Jessy Li
39
12
0
23 Oct 2023
Zero-shot Faithfulness Evaluation for Text Summarization with Foundation Language Model
Qi Jia
Siyu Ren
Yizhu Liu
Kenny Q. Zhu
ALM
HILM
33
16
0
18 Oct 2023
On Context Utilization in Summarization with Large Language Models
Mathieu Ravaut
Aixin Sun
Nancy F. Chen
Shafiq R. Joty
36
13
0
16 Oct 2023
How Good is ChatGPT in Giving Advice on Your Visualization Design?
Nam Wook Kim
Grace Myers
Benjamin Bach
28
20
0
14 Oct 2023
VizAbility: Enhancing Chart Accessibility with LLM-based Conversational Interaction
Joshua Gorniak
Yoon Kim
Donglai Wei
Nam Wook Kim
32
8
0
14 Oct 2023
Well Begun is Half Done: Generator-agnostic Knowledge Pre-Selection for Knowledge-Grounded Dialogue
Lang Qin
Yao Zhang
Hongru Liang
Jun Wang
Zhenglu Yang
29
3
0
11 Oct 2023
Evaluating Large Language Models at Evaluating Instruction Following
Zhiyuan Zeng
Jiatong Yu
Tianyu Gao
Yu Meng
Tanya Goyal
Danqi Chen
ELM
ALM
44
166
0
11 Oct 2023
A Closer Look into Automatic Evaluation Using Large Language Models
Cheng-Han Chiang
Hung-yi Lee
ELM
ALM
LM&MA
35
13
0
09 Oct 2023
CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation
Weixiang Yan
Yuchen Tian
Yunzhe Li
Qian Chen
Wen Wang
34
35
0
08 Oct 2023
EcoAssistant: Using LLM Assistant More Affordably and Accurately
Jieyu Zhang
Ranjay Krishna
Ahmed Hassan Awadallah
Chi Wang
30
34
0
03 Oct 2023
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
Long Chen
Oleg Sinavski
Jan Hünermann
Alice Karnsund
Andrew James Willmott
Danny Birch
Daniel Maund
Jamie Shotton
MLLM
15
180
0
03 Oct 2023
BooookScore: A systematic exploration of book-length summarization in the era of LLMs
Yapei Chang
Kyle Lo
Tanya Goyal
Mohit Iyyer
ALM
21
106
0
01 Oct 2023
SocREval: Large Language Models with the Socratic Method for Reference-Free Reasoning Evaluation
Hangfeng He
Hongming Zhang
Dan Roth
LRM
ELM
ReLM
28
13
0
29 Sep 2023
Benchmarking Cognitive Biases in Large Language Models as Evaluators
Ryan Koo
Minhwa Lee
Vipul Raheja
Jong Inn Park
Zae Myung Kim
Dongyeop Kang
ALM
43
75
0
29 Sep 2023
Curriculum-Driven Edubot: A Framework for Developing Language Learning Chatbots Through Synthesizing Conversational Data
Yu Li
Shang Qu
Jili Shen
Shangchao Min
Zhou Yu
47
16
0
28 Sep 2023
Question-Answering Approach to Evaluating Legal Summaries
Huihui Xu
Kevin D. Ashley
AILaw
ELM
27
3
0
26 Sep 2023
Ragas: Automated Evaluation of Retrieval Augmented Generation
ES Shahul
Jithin James
Luis Espinosa-Anke
Steven Schockaert
91
177
0
26 Sep 2023
ALLURE: Auditing and Improving LLM-based Evaluation of Text using Iterative In-Context-Learning
Hosein Hasanbeig
Hiteshi Sharma
Leo Betthauser
Felipe Vieira Frujeri
Ida Momennejad
38
15
0
24 Sep 2023
BAMBOO: A Comprehensive Benchmark for Evaluating Long Text Modeling Capacities of Large Language Models
Zican Dong
Tianyi Tang
Junyi Li
Wayne Xin Zhao
Ji-Rong Wen
RALM
ALM
30
33
0
23 Sep 2023
Calibrating LLM-Based Evaluator
Yuxuan Liu
Tianchi Yang
Shaohan Huang
Zihan Zhang
Haizhen Huang
Furu Wei
Weiwei Deng
Feng Sun
Qi Zhang
49
31
0
23 Sep 2023
Automatic Answerability Evaluation for Question Generation
Zifan Wang
Kotaro Funakoshi
Manabu Okumura
34
2
0
22 Sep 2023
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
117
301
0
19 Sep 2023
Summarization is (Almost) Dead
Xiao Pu
Mingqi Gao
Xiaojun Wan
HILM
81
39
0
18 Sep 2023
Cross-Lingual Knowledge Editing in Large Language Models
Jiaan Wang
Yunlong Liang
Zengkui Sun
Yu Cao
Jiarong Xu
Fandong Meng
KELM
30
11
0
16 Sep 2023
Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?
Rishav Hada
Varun Gumma
Adrian de Wynter
Harshita Diddee
Mohamed Ahmed
Monojit Choudhury
Kalika Bali
Sunayana Sitaram
ALM
LM&MA
ELM
35
61
0
14 Sep 2023
BHASA: A Holistic Southeast Asian Linguistic and Cultural Evaluation Suite for Large Language Models
Wei Qi Leong
Jian Gang Ngui
Yosephine Susanto
Hamsawardhini Rengarajan
Kengatharaiyer Sarveswaran
William-Chandra Tjhi
26
9
0
12 Sep 2023
FaNS: a Facet-based Narrative Similarity Metric
Mousumi Akter
Shubhra (Santu) Karmaker
25
1
0
09 Sep 2023
EPA: Easy Prompt Augmentation on Large Language Models via Multiple Sources and Multiple Targets
Hongyuan Lu
Wai Lam
21
1
0
09 Sep 2023
Translate Meanings, Not Just Words: IdiomKB's Role in Optimizing Idiomatic Translation with Language Models
Shuang Li
Jiangjie Chen
Siyu Yuan
Xinyi Wu
Hao Yang
Shimin Tao
Yanghua Xiao
48
15
0
26 Aug 2023
GPTEval: A Survey on Assessments of ChatGPT and GPT-4
Rui Mao
Guanyi Chen
Xulang Zhang
Frank Guerin
Erik Cambria
ELM
LM&MA
33
101
0
24 Aug 2023
Instruction Position Matters in Sequence Generation with Large Language Models
Yanjun Liu
Xianfeng Zeng
Fandong Meng
Jie Zhou
LRM
54
8
0
23 Aug 2023
Discrete Prompt Compression with Reinforcement Learning
Hoyoun Jung
Kyung-Joong Kim
29
24
0
17 Aug 2023
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
Chi-Min Chan
Weize Chen
Yusheng Su
Jianxuan Yu
Wei Xue
Shan Zhang
Jie Fu
Zhiyuan Liu
ELM
LLMAG
ALM
29
446
0
14 Aug 2023
Generating Faithful Text From a Knowledge Graph with Noisy Reference Text
Tahsina Hashem
Weiqing Wang
Derry Wijaya
Mohammed Eunus Ali
Yuan-Fang Li
29
3
0
12 Aug 2023
AspectMMKG: A Multi-modal Knowledge Graph with Aspect-aware Entities
Jingdan Zhang
Jiaan Wang
Xiaodan Wang
Zhixu Li
Yanghua Xiao
31
10
0
09 Aug 2023
AgentSims: An Open-Source Sandbox for Large Language Model Evaluation
Jiaju Lin
Haoran Zhao
Aochi Zhang
Yiting Wu
Huqiuyue Ping
Qin Chen
ELM
LLMAG
32
58
0
08 Aug 2023
Learning Evaluation Models from Large Language Models for Sequence Generation
Chenglong Wang
Hang Zhou
Kai-Chun Chang
Tongran Liu
Chunliang Zhang
Quan Du
Tong Xiao
Yue Zhang
Jingbo Zhu
ELM
43
3
0
08 Aug 2023
Towards Multiple References Era -- Addressing Data Leakage and Limited Reference Diversity in NLG Evaluation
Xianfeng Zeng
Yanjun Liu
Fandong Meng
Jie Zhou
24
0
0
06 Aug 2023
Towards Consistency Filtering-Free Unsupervised Learning for Dense Retrieval
Haoxiang Shi
Sumio Fujita
Tetsuya Sakai
21
0
0
05 Aug 2023
Wider and Deeper LLM Networks are Fairer LLM Evaluators
Xinghua Zhang
Yu Bowen
Haiyang Yu
Yangyu Lv
Tingwen Liu
Fei Huang
Hongbo Xu
Yongbin Li
ALM
50
83
0
03 Aug 2023
LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise Comparisons using Large Language Models
Adian Liusie
Potsawee Manakul
Mark J. F. Gales
ELM
24
35
0
15 Jul 2023
Is ChatGPT a Good Personality Recognizer? A Preliminary Study
Yuzhe Ji
Wen Wu
Hong Zheng
Yiqiang Hu
Xi Chen
Liang He
AI4MH
34
24
0
08 Jul 2023
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM
LM&MA
ALM
72
1,513
0
06 Jul 2023
FunQA: Towards Surprising Video Comprehension
Binzhu Xie
Sicheng Zhang
Zitang Zhou
Bo-wen Li
Yuanhan Zhang
Jack Hessel
Jingkang Yang
Ziwei Liu
39
20
0
26 Jun 2023
Towards Explainable Evaluation Metrics for Machine Translation
Christoph Leiter
Piyawat Lertvittayakumjorn
M. Fomicheva
Wei-Ye Zhao
Yang Gao
Steffen Eger
ELM
30
13
0
22 Jun 2023
Snowman: A Million-scale Chinese Commonsense Knowledge Graph Distilled from Foundation Model
Jiaan Wang
Jianfeng Qu
Yunlong Liang
Zhixu Li
An Liu
Guanfeng Liu
Xin Zheng
30
2
0
17 Jun 2023
Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks
V. Veselovsky
Manoel Horta Ribeiro
Robert West
28
130
0
13 Jun 2023