All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text

30 June 2021

Papers citing "All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text"

50 / 224 papers shown

Title
Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability Yusuke Sakai Hidetaka Kamigaito Taro Watanabe LRM 29 0 0 18 Jun 2025
Min-p, Max Exaggeration: A Critical Analysis of Min-p Sampling in Language Models Rylan Schaeffer Joshua Kazdan Yegor Denisov-Blanch 31 0 0 16 Jun 2025
Labelling Data with Unknown References Adrian de Wynter 69 0 0 03 Jun 2025
Intuitionistic Fuzzy Sets for Large Language Model Data Annotation: A Novel Approach to Side-by-Side Preference Labeling Yimin Du 32 0 0 30 May 2025
Domain Gating Ensemble Networks for AI-Generated Text Detection Arihant Tripathi Liam Dugan Charis Gao Maggie Huan Emma Jin Peter Zhang David Zhang Julia Zhao Chris Callison-Burch VLM 61 0 0 20 May 2025
Humans can learn to detect AI-generated texts, or at least learn when they can't Jiří Milička Anna Marklová Ondřej Drobil Eva Pospíšilová DeLMO 78 0 0 03 May 2025
The Viability of Crowdsourcing for RAG Evaluation Lukas Gienapp Tim Hagen Maik Fröbe Matthias Hagen Benno Stein Martin Potthast Harrisen Scells 121 0 0 22 Apr 2025
MEQA: A Meta-Evaluation Framework for Question & Answer LLM Benchmarks Jaime Raldua Veuthey Zainab Ali Majid Suhas Hariharan Jacob Haimes ELM 72 0 0 18 Apr 2025
Labeling Messages as AI-Generated Does Not Reduce Their Persuasive Effects Isabel O. Gallegos Chen Shani Weiyan Shi Federico Bianchi Izzy Gainsburg Dan Jurafsky Robb Willer 81 2 0 14 Apr 2025
Explorer: Robust Collection of Interactable GUI Elements Iason Chaimalas Arnas Vyšniauskas Gabriel Brostow 56 0 0 12 Apr 2025
Can postgraduate translation students identify machine-generated text? Michael Farrell DeLMO 76 0 0 12 Apr 2025
TALE: A Tool-Augmented Framework for Reference-Free Evaluation of Large Language Models Sher Badshah Ali Emami Hassan Sajjad LLMAG ELM 101 0 0 10 Apr 2025
Summarizing Speech: A Comprehensive Survey Fabian Retkowski Maike Züfle Andreas Sudmann Dinah Pfau Jan Niehues Alexander Waibel 112 0 0 10 Apr 2025
Toward Holistic Evaluation of Recommender Systems Powered by Generative Models Yashar Deldjoo Nikhil Mehta M. Sathiamoorthy Shuai Zhang Pablo Castells Julian McAuley EGVM ELM 131 2 0 09 Apr 2025
Rubrik's Cube: Testing a New Rubric for Evaluating Explanations on the CUBE dataset Diana Galván-Sosa Gabrielle Gaudeau Pride Kavumba Yunmeng Li Hongyi Gu Zheng Yuan Keisuke Sakaguchi P. Buttery LRM 133 0 0 31 Mar 2025
Did ChatGPT or Copilot use alter the style of internet news headlines? A time series regression analysis Chris Brogly Connor McElroy KELM 49 1 0 31 Mar 2025
SCORE: Story Coherence and Retrieval Enhancement for AI Narratives Qiang Yi Yangfan He Jing Wang Xinyuan Song Shiyao Qian ... Kuan Lu Menghao Huo Jiaqi Chen Tianyu Shi RALM 162 17 0 30 Mar 2025
Local Normalization Distortion and the Thermodynamic Formalism of Decoding Strategies for Large Language Models Tom Kempton Stuart Burrell 70 0 0 27 Mar 2025
Feature Extraction and Analysis for GPT-Generated Text A. Selvioğlu V. Adanova M. Atagoziev DeLMO 122 0 0 17 Mar 2025
LAG-MMLU: Benchmarking Frontier LLM Understanding in Latvian and Giriama Naome A. Etori Kevin Lu Randu Karisa Arturs Kanepajs LRM ELM 479 0 0 14 Mar 2025
DAFE: LLM-Based Evaluation Through Dynamic Arbitration for Free-Form Question-Answering Sher Badshah Hassan Sajjad 134 1 0 11 Mar 2025
Detection Avoidance Techniques for Large Language Models Sinclair Schneider Florian Steuber João A. G. Schneider Gabi Dreo Rodosek DeLMO 113 0 0 10 Mar 2025
Collaborative Evaluation of Deepfake Text with Deliberation-Enhancing Dialogue Systems Jooyoung Lee Xiaochen Zhu Georgi Karadzhov Tom Stafford Andreas Vlachos Dongwon Lee 72 0 0 06 Mar 2025
When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning Yijiang River Dong Tiancheng Hu Yinhong Liu Ahmet Üstün Nigel Collier 124 1 0 26 Feb 2025
Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks Rylan Schaeffer Punit Singh Koura Binh Tang R. Subramanian Aaditya K. Singh ... Vedanuj Goswami Sergey Edunov Dieuwke Hupkes Sanmi Koyejo Sharan Narang ALM 146 1 0 24 Feb 2025
Can AI mimic the human ability to define neologisms? Georgios P. Georgiou 64 1 0 18 Feb 2025
From Text to Trust: Empowering AI-assisted Decision Making with Adaptive LLM-powered Analysis Zhuoyan Li Hangxiao Zhu Zhuoran Lu Ziang Xiao Ming Yin 110 1 0 17 Feb 2025
Mind the Gap! Choice Independence in Using Multilingual LLMs for Persuasive Co-Writing Tasks in Different Languages Shreyan Biswas Alexander Erlei U. Gadiraju 166 4 0 13 Feb 2025
Reference-free Evaluation Metrics for Text Generation: A Survey Takumi Ito Kees van Deemter Jun Suzuki ELM 123 2 0 21 Jan 2025
Using Machine Learning to Distinguish Human-written from Machine-generated Creative Fiction Andrea Cristina McGlinchey Peter J Barclay DeLMO 124 0 0 15 Dec 2024
QAPyramid: Fine-grained Evaluation of Content Selection for Text Summarization Shiyue Zhang David Wan Arie Cattan Ayal Klein Ido Dagan Joey Tianyi Zhou 126 0 0 10 Dec 2024
Challenges in Trustworthy Human Evaluation of Chatbots Wenting Zhao Alexander M. Rush Tanya Goyal ALM 117 3 0 05 Dec 2024
The Vulnerability of Language Model Benchmarks: Do They Accurately Reflect True LLM Performance? Sourav Banerjee Ayushi Agarwal Eishkaran Singh ELM 105 3 0 02 Dec 2024
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework Yew Ken Chia Liying Cheng Hou Pong Chan Chaoqun Liu Maojia Song Sharifah Mahani Aljunied Soujanya Poria Lidong Bing RALM VLM 112 6 0 09 Nov 2024
How Performance Pressure Influences AI-Assisted Decision Making Nikita Haduong Noah A. Smith 57 0 0 21 Oct 2024
4-LEGS: 4D Language Embedded Gaussian Splatting Gal Fiebelman Tamir Cohen Ayellet Morgenstern Peter Hedman Hadar Averbuch-Elor 3DGS 148 1 0 14 Oct 2024
Reverse Modeling in Large Language Models S. Yu Yuanchen Xu Cunxiao Du Yanying Zhou Minghui Qiu Q. Sun Hao Zhang Jiawei Wu 157 2 0 13 Oct 2024
The Moral Turing Test: Evaluating Human-LLM Alignment in Moral Decision-Making Basile Garcia Crystal Qian Stefano Palminteri ELM 110 6 0 09 Oct 2024
Conversate: Supporting Reflective Learning in Interview Practice Through Interactive Simulation and Dialogic Feedback Taufiq Daryanto Xiaohan Ding Lance T Wilhelm Sophia Stil Kirk McInnis Knutsen Eugenia H Rho 69 3 0 08 Oct 2024
How Does the Disclosure of AI Assistance Affect the Perceptions of Writing? Zhuoyan Li Chen Liang Jing Peng Ming Yin 43 1 0 06 Oct 2024
Trying to be human: Linguistic traces of stochastic empathy in language models Bennett Kleinberg Jari Zegers Jonas Festor Stefana Vida Julian Präsent Riccardo Loconte Sanne Peereboom 79 1 0 02 Oct 2024
Generative AI and Perceptual Harms: Who's Suspected of using LLMs? Kowe Kadoma D. Metaxa Mor Naaman 83 4 0 01 Oct 2024
From Deception to Detection: The Dual Roles of Large Language Models in Fake News Dorsaf Sallami Yuan-Chen Chang Esma Aïmeur 63 6 0 25 Sep 2024
Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation Jasper Dekoninck Maximilian Baader Martin Vechev ALM 189 0 0 01 Sep 2024
SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists Raoyuan Zhao Abdullatif Köksal Yihong Liu Leonie Weissweiler Anna Korhonen Hinrich Schütze SyDa 73 1 0 30 Aug 2024
What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation Dingyi Yang Qin Jin 130 7 0 26 Aug 2024
CPS-TaskForge: Generating Collaborative Problem Solving Environments for Diverse Communication Tasks Nikita Haduong Irene Wang Bo-Ru Lu Prithviraj Ammanabrolu Noah A. Smith 83 1 0 16 Aug 2024
Risks and NLP Design: A Case Study on Procedural Document QA Nikita Haduong Alice Gao Noah A. Smith 87 4 0 16 Aug 2024
The Oscars of AI Theater: A Survey on Role-Playing with Language Models Nuo Chen Yan Wang Yang Deng Jia Li 120 21 0 16 Jul 2024
Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures? Yingming Pu Liping Huang Tao Lin Hongyu Chen ELM 41 0 0 12 Jul 2024