2303.12767
Can we trust the evaluation on ChatGPT?
22 March 2023
Rachith Aiyappa
Jisun An
Haewoon Kwak
Yong-Yeol Ahn
ELM
ALM
LLMAG
AI4MH
LRM
Papers citing
"Can we trust the evaluation on ChatGPT?"
10 / 10 papers shown
Generative Evaluation of Complex Reasoning in Large Language Models
Haowei Lin
X. Wang
Ruilin Yan
Baizhou Huang
Haotian Ye
Jianhua Zhu
Zihao Wang
James Y. Zou
Jianzhu Ma
Yitao Liang
ReLM
ELM
LRM
154
0
0
03 Apr 2025
PiCO: Peer Review in LLMs based on the Consistency Optimization
Kun-Peng Ning
Shuo Yang
Yu-Yang Liu
Jia-Yu Yao
Zhen-Hui Liu
Yu Wang
Ming Pang
Li Yuan
ALM
69
8
0
24 Feb 2025
German also Hallucinates! Inconsistency Detection in News Summaries with the Absinth Dataset
Laura Mascarell
Ribin Chalumattu
Annette Rios
HILM
33
0
0
06 Mar 2024
Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs
Simone Balloccu
Patrícia Schmidtová
Mateusz Lango
Ondrej Dusek
SILM
ELM
PILM
21
156
0
06 Feb 2024
A Survey of Graph Meets Large Language Model: Progress and Future Directions
Yuhan Li
Zhixun Li
Peisong Wang
Jia Li
Xiangguo Sun
Hongtao Cheng
Jeffrey Xu Yu
38
55
0
21 Nov 2023
A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation
Xiaowei Huang
Wenjie Ruan
Wei Huang
Gao Jin
Yizhen Dong
...
Sihao Wu
Peipei Xu
Dengyu Wu
André Freitas
Mustafa A. Mustafa
ALM
32
82
0
19 May 2023
AI, write an essay for me: A large-scale comparison of human-written versus ChatGPT-generated essays
Steffen Herbold
Annette Hautli-Janisz
Ute Heuer
Zlata Kikteva
Alexander Trautsch
DeLMO
74
23
0
24 Apr 2023
Can ChatGPT Reproduce Human-Generated Labels? A Study of Social Computing Tasks
Yiming Zhu
Peixian Zhang
Ehsan-ul Haq
Pan Hui
Gareth Tyson
DeLMO
ALM
AI4MH
24
123
0
20 Apr 2023
Large language models effectively leverage document-level context for literary translation, but critical errors persist
Marzena Karpinska
Mohit Iyyer
31
81
0
06 Apr 2023
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
290
1,814
0
14 Dec 2020