Can we trust the evaluation on ChatGPT?

22 March 2023

Rachith Aiyappa

Papers citing "Can we trust the evaluation on ChatGPT?"

Title | Authors | Tags | Counts | Date
Generative Evaluation of Complex Reasoning in Large Language Models | Haowei Lin, X. Wang, Ruilin Yan, Baizhou Huang, Haotian Ye, Jianhua Zhu, Zihao Wang, James Y. Zou, Jianzhu Ma, Yitao Liang | ReLM ELM LRM | 154 0 0 | 03 Apr 2025
PiCO: Peer Review in LLMs based on the Consistency Optimization | Kun-Peng Ning, Shuo Yang, Yu-Yang Liu, Jia-Yu Yao, Zhen-Hui Liu, Yu Wang, Ming Pang, Li Yuan | ALM | 69 8 0 | 24 Feb 2025
German also Hallucinates! Inconsistency Detection in News Summaries with the Absinth Dataset | Laura Mascarell, Ribin Chalumattu, Annette Rios | HILM | 33 0 0 | 06 Mar 2024
Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs | Simone Balloccu, Patrícia Schmidtová, Mateusz Lango, Ondrej Dusek | SILM ELM PILM | 21 156 0 | 06 Feb 2024
A Survey of Graph Meets Large Language Model: Progress and Future Directions | Yuhan Li, Zhixun Li, Peisong Wang, Jia Li, Xiangguo Sun, Hongtao Cheng, Jeffrey Xu Yu | | 38 55 0 | 21 Nov 2023
A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation | Xiaowei Huang, Wenjie Ruan, Wei Huang, Gao Jin, Yizhen Dong, ..., Sihao Wu, Peipei Xu, Dengyu Wu, André Freitas, Mustafa A. Mustafa | ALM | 32 82 0 | 19 May 2023
AI, write an essay for me: A large-scale comparison of human-written versus ChatGPT-generated essays | Steffen Herbold, Annette Hautli-Janisz, Ute Heuer, Zlata Kikteva, Alexander Trautsch | DeLMO | 74 23 0 | 24 Apr 2023
Can ChatGPT Reproduce Human-Generated Labels? A Study of Social Computing Tasks | Yiming Zhu, Peixian Zhang, Ehsan-ul Haq, Pan Hui, Gareth Tyson | DeLMO ALM AI4MH | 24 123 0 | 20 Apr 2023
Large language models effectively leverage document-level context for literary translation, but critical errors persist | Marzena Karpinska, Mohit Iyyer | | 31 81 0 | 06 Apr 2023
Extracting Training Data from Large Language Models | Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, ..., Tom B. Brown, D. Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel | MLAU SILM | 290 1,814 0 | 14 Dec 2020