2303.04048
Is ChatGPT a Good NLG Evaluator? A Preliminary Study
7 March 2023
Jiaan Wang
Yunlong Liang
Fandong Meng
Zengkui Sun
Haoxiang Shi
Zhixu Li
Jinan Xu
Jianfeng Qu
Jie Zhou
LM&MA
ELM
ALM
AI4MH
Papers citing
"Is ChatGPT a Good NLG Evaluator? A Preliminary Study"
50 / 289 papers shown
Unveiling the Safety of GPT-4o: An Empirical Study using Jailbreak Attacks
Zonghao Ying
Aishan Liu
Xianglong Liu
Dacheng Tao
62
16
0
10 Jun 2024
On Subjective Uncertainty Quantification and Calibration in Natural Language Generation
Ziyu Wang
Chris Holmes
UQLM
50
4
0
07 Jun 2024
Large Language Models as Evaluators for Recommendation Explanations
Xiaoyu Zhang
Yishan Li
Jiayin Wang
Bowen Sun
Weizhi Ma
Peijie Sun
Min Zhang
LRM
ELM
48
12
0
05 Jun 2024
Item-Language Model for Conversational Recommendation
Li Yang
Anushya Subbiah
Hardik Patel
Judith Yue Li
Yanwei Song
Reza Mirghaderi
Vikram Aggarwal
Qifan Wang
KELM
47
4
0
05 Jun 2024
XRec: Large Language Models for Explainable Recommendation
Qiyao Ma
Xubin Ren
Chao Huang
LRM
34
18
0
04 Jun 2024
Guiding ChatGPT to Generate Salient Domain Summaries
Jun Gao
Ziqiang Cao
Shaoyao Huang
Luozheng Qin
Chunhui Ai
41
0
0
03 Jun 2024
Multi-Dimensional Optimization for Text Summarization via Reinforcement Learning
Sangwon Ryu
Heejin Do
Yunsu Kim
Gary Geunbae Lee
Jungseul Ok
28
3
0
01 Jun 2024
Uncovering Bias in Large Vision-Language Models at Scale with Counterfactuals
Phillip Howard
Kathleen C. Fraser
Anahita Bhiwandiwalla
S. Kiritchenko
52
9
0
30 May 2024
A Full-duplex Speech Dialogue Scheme Based On Large Language Models
Peng Wang
Songshuo Lu
Yaohua Tang
Sijie Yan
Yuanjun Xiong
Wei Xia
AuLLM
31
10
0
29 May 2024
Unifying Demonstration Selection and Compression for In-Context Learning
Jun Gao
Ziqiang Cao
Wenjie Li
43
3
0
27 May 2024
SelfCP: Compressing Over-Limit Prompt via the Frozen Large Language Model Itself
Jun Gao
Ziqiang Cao
Wenjie Li
27
4
0
27 May 2024
CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling
Chenhao Zhang
Renhao Li
Minghuan Tan
Min Yang
Jingwei Zhu
Di Yang
Jiahao Zhao
Guancheng Ye
Chengming Li
Xiping Hu
38
19
0
26 May 2024
GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases
Zhizheng Wang
Qiao Jin
Chih-Hsuan Wei
Shubo Tian
Po-Ting Lai
Qingqing Zhu
Chi-Ping Day
Christina Ross
Zhiyong Lu
LLMAG
32
8
0
25 May 2024
SLIDE: A Framework Integrating Small and Large Language Models for Open-Domain Dialogues Evaluation
Kun Zhao
Bohao Yang
Chen Tang
Chenghua Lin
Liang Zhan
43
5
0
24 May 2024
Organic Data-Driven Approach for Turkish Grammatical Error Correction and LLMs
Asim Ersoy
O. T. Yildiz
41
0
0
24 May 2024
CHARP: Conversation History AwaReness Probing for Knowledge-grounded Dialogue Systems
Abbas Ghaddar
David Alfonso-Hermelo
Philippe Langlais
Mehdi Rezagholizadeh
Boxing Chen
Prasanna Parthasarathi
39
0
0
24 May 2024
Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models
Zhangyue Yin
Qiushi Sun
Qipeng Guo
Zhiyuan Zeng
Xiaonan Li
...
Qinyuan Cheng
Ding Wang
Xiaofeng Mou
Xipeng Qiu
XuanJing Huang
LRM
46
4
0
21 May 2024
Adversarial DPO: Harnessing Harmful Data for Reducing Toxicity with Minimal Impact on Coherence and Evasiveness in Dialogue Agents
San Kim
Gary Geunbae Lee
AAML
43
3
0
21 May 2024
CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models
Haoxiang Shi
Jiaan Wang
Jiarong Xu
Cen Wang
Tetsuya Sakai
LMTD
28
0
0
20 May 2024
Selective Annotation via Data Allocation: These Data Should Be Triaged to Experts for Annotation Rather Than the Model
Chen Huang
Yang Deng
Wenqiang Lei
Jiancheng Lv
Ido Dagan
48
4
0
20 May 2024
Language Models can Evaluate Themselves via Probability Discrepancy
Tingyu Xia
Bowen Yu
Yuan Wu
Yi-Ju Chang
Chang Zhou
ELM
37
4
0
17 May 2024
DEBATE: Devil's Advocate-Based Assessment and Text Evaluation
Alex G. Kim
Keonwoo Kim
Sangwon Yoon
ELM
32
5
0
16 May 2024
LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play
Li-Chun Lu
Shou-Jen Chen
Tsung-Min Pai
Chan-Hung Yu
Hung-yi Lee
Shao-Hua Sun
LLMAG
56
39
0
10 May 2024
Efficient LLM Comparative Assessment: a Product of Experts Framework for Pairwise Comparisons
Adian Liusie
Vatsal Raina
Yassir Fathullah
Mark J. F. Gales
43
9
0
09 May 2024
Evaluating Students' Open-ended Written Responses with LLMs: Using the RAG Framework for GPT-3.5, GPT-4, Claude-3, and Mistral-Large
Jussi S. Jauhiainen
Agustín Garagorry Guerra
37
5
0
08 May 2024
MIDGARD: Self-Consistency Using Minimum Description Length for Structured Commonsense Reasoning
Inderjeet Nair
Lu Wang
LRM
21
1
0
08 May 2024
Assessing and Verifying Task Utility in LLM-Powered Applications
Negar Arabzadeh
Siqing Huo
Nikhil Mehta
Qingyun Wu
Chi Wang
Ahmed Hassan Awadallah
Charles L. A. Clarke
Julia Kiseleva
38
10
0
03 May 2024
Large Language Models are Inconsistent and Biased Evaluators
Rickard Stureborg
Dimitris Alikaniotis
Yoshi Suhara
ALM
45
51
0
02 May 2024
CEval: A Benchmark for Evaluating Counterfactual Text Generation
Van Bach Nguyen
Jörg Schlötterer
Christin Seifert
33
5
0
26 Apr 2024
FedEval-LLM: Federated Evaluation of Large Language Models on Downstream Tasks with Collective Wisdom
Yuanqin He
Yan Kang
Lixin Fan
Qiang Yang
35
3
0
18 Apr 2024
FIZZ: Factual Inconsistency Detection by Zoom-in Summary and Zoom-out Document
Joonho Yang
Seunghyun Yoon
Byeongjeong Kim
Hwanhee Lee
HILM
34
3
0
17 Apr 2024
Accelerating Inference in Large Language Models with a Unified Layer Skipping Strategy
Yijin Liu
Fandong Meng
Jie Zhou
AI4CE
27
7
0
10 Apr 2024
Concept -- An Evaluation Protocol on Conversational Recommender Systems with System-centric and User-centric Factors
Chen Huang
Peixin Qin
Yang Deng
Wenqiang Lei
Jiancheng Lv
Tat-Seng Chua
39
6
0
04 Apr 2024
METAL: Towards Multilingual Meta-Evaluation
Rishav Hada
Varun Gumma
Mohamed Ahmed
Kalika Bali
Sunayana Sitaram
ELM
43
2
0
02 Apr 2024
CoUDA: Coherence Evaluation via Unified Data Augmentation
Dawei Zhu
Wenhao Wu
Yifan Song
Fangwei Zhu
Ziqiang Cao
Sujian Li
28
0
0
31 Mar 2024
MATEval: A Multi-Agent Discussion Framework for Advancing Open-Ended Text Evaluation
Yu Li
Shenyu Zhang
Rui Wu
Xiutian Huang
Yongrui Chen
Wenhao Xu
Guilin Qi
Dehai Min
LLMAG
16
9
0
28 Mar 2024
STRUM-LLM: Attributed and Structured Contrastive Summarization
Beliz Gunel
James Bradley Wendt
Jing Xie
Yichao Zhou
Nguyen Vo
Zachary Fisher
Sandeep Tata
25
4
0
25 Mar 2024
Community Needs and Assets: A Computational Analysis of Community Conversations
Towhid Chowdhury
Naveen Sharma
Ashiqur R. KhudaBukhsh
33
0
0
20 Mar 2024
LMStyle Benchmark: Evaluating Text Style Transfer for Chatbots
Jianlin Chen
43
4
0
13 Mar 2024
Debatrix: Multi-dimensional Debate Judge with Iterative Chronological Analysis Based on LLM
Jingcong Liang
Rong Ye
Meng Han
Ruofei Lai
Xinyu Zhang
Xuanjing Huang
Zhongyu Wei
45
5
0
12 Mar 2024
Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation
Juan Manuel Zambrano Chaves
Shih-Cheng Huang
Yanbo Xu
Hanwen Xu
Naoto Usuyama
...
Akshay S. Chaudhari
Serena Yeung-Levy
Curtis P. Langlotz
Sheng Wang
Hoifung Poon
VLM
LM&MA
67
10
0
12 Mar 2024
MP2D: An Automated Topic Shift Dialogue Generation Framework Leveraging Knowledge Graphs
Yerin Hwang
Yongi-Mi Kim
Yunah Jang
Jeesoo Bang
Hyunkyung Bae
Kyomin Jung
43
2
0
09 Mar 2024
LLM4Decompile: Decompiling Binary Code with Large Language Models
Hanzhuo Tan
Qi Luo
Jing Li
Yuqun Zhang
SyDa
ELM
50
17
0
08 Mar 2024
FaaF: Facts as a Function for the evaluation of generated text
Vasileios Katranidis
Gabor Barany
HILM
RALM
42
4
0
06 Mar 2024
Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimization
Shitong Duan
Xiaoyuan Yi
Peng Zhang
T. Lu
Xing Xie
Ning Gu
40
4
0
06 Mar 2024
FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction
Alessandro Sciré
Karim Ghonim
Roberto Navigli
HILM
26
6
0
04 Mar 2024
DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models
Kedi Chen
Qin Chen
Jie Zhou
Yishen He
Liang He
HILM
38
1
0
01 Mar 2024
A Regularization-based Transfer Learning Method for Information Extraction via Instructed Graph Decoder
Kedi Chen
Jie Zhou
Qin Chen
Shunyu Liu
Liang He
38
2
0
01 Mar 2024
MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery
Feihong Lu
Weiqi Wang
Yangyifei Luo
Ziqin Zhu
Qingyun Sun
...
Haochen Shi
Shiqi Gao
Qian Li
Yangqiu Song
Jianxin Li
VLM
34
2
0
28 Feb 2024
Consistency Matters: Explore LLMs Consistency From a Black-Box Perspective
Fufangchen Zhao
Guoqiang Jin
Jiaheng Huang
Rui Zhao
Fei Tan
32
1
0
27 Feb 2024