All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text
arXiv:2107.00061 (v2, latest)

30 June 2021
Elizabeth Clark
Tal August
Sofia Serrano
Nikita Haduong
Suchin Gururangan
Noah A. Smith
    DeLMO

Papers citing "All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text"

50 / 224 papers shown
Title
Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability
Yusuke Sakai
Hidetaka Kamigaito
Taro Watanabe
LRM
29
0
0
18 Jun 2025
Min-p, Max Exaggeration: A Critical Analysis of Min-p Sampling in Language Models
Rylan Schaeffer
Joshua Kazdan
Yegor Denisov-Blanch
31
0
0
16 Jun 2025
Labelling Data with Unknown References
Adrian de Wynter
69
0
0
03 Jun 2025
Intuitionistic Fuzzy Sets for Large Language Model Data Annotation: A Novel Approach to Side-by-Side Preference Labeling
Yimin Du
32
0
0
30 May 2025
Domain Gating Ensemble Networks for AI-Generated Text Detection
Arihant Tripathi
Liam Dugan
Charis Gao
Maggie Huan
Emma Jin
Peter Zhang
David Zhang
Julia Zhao
Chris Callison-Burch
VLM
61
0
0
20 May 2025
Humans can learn to detect AI-generated texts, or at least learn when they can't
Jiří Milička
Anna Marklová
Ondřej Drobil
Eva Pospíšilová
DeLMO
78
0
0
03 May 2025
The Viability of Crowdsourcing for RAG Evaluation
Lukas Gienapp
Tim Hagen
Maik Fröbe
Matthias Hagen
Benno Stein
Martin Potthast
Harrisen Scells
121
0
0
22 Apr 2025
MEQA: A Meta-Evaluation Framework for Question & Answer LLM Benchmarks
Jaime Raldua Veuthey
Zainab Ali Majid
Suhas Hariharan
Jacob Haimes
ELM
72
0
0
18 Apr 2025
Labeling Messages as AI-Generated Does Not Reduce Their Persuasive Effects
Isabel O. Gallegos
Chen Shani
Weiyan Shi
Federico Bianchi
Izzy Gainsburg
Dan Jurafsky
Robb Willer
81
2
0
14 Apr 2025
Explorer: Robust Collection of Interactable GUI Elements
Iason Chaimalas
Arnas Vyšniauskas
Gabriel Brostow
56
0
0
12 Apr 2025
Can postgraduate translation students identify machine-generated text?
Michael Farrell
DeLMO
76
0
0
12 Apr 2025
TALE: A Tool-Augmented Framework for Reference-Free Evaluation of Large Language Models
Sher Badshah
Ali Emami
Hassan Sajjad
LLMAG ELM
101
0
0
10 Apr 2025
Summarizing Speech: A Comprehensive Survey
Fabian Retkowski
Maike Züfle
Andreas Sudmann
Dinah Pfau
Jan Niehues
Alexander H. Waibel
112
0
0
10 Apr 2025
Toward Holistic Evaluation of Recommender Systems Powered by Generative Models
Yashar Deldjoo
Nikhil Mehta
M. Sathiamoorthy
Shuai Zhang
Pablo Castells
Julian McAuley
EGVM ELM
131
2
0
09 Apr 2025
Rubrik's Cube: Testing a New Rubric for Evaluating Explanations on the CUBE dataset
Diana Galván-Sosa
Gabrielle Gaudeau
Pride Kavumba
Yunmeng Li
Hongyi Gu
Zheng Yuan
Keisuke Sakaguchi
P. Buttery
LRM
133
0
0
31 Mar 2025
Did ChatGPT or Copilot use alter the style of internet news headlines? A time series regression analysis
Chris Brogly
Connor McElroy
KELM
49
1
0
31 Mar 2025
SCORE: Story Coherence and Retrieval Enhancement for AI Narratives
Qiang Yi
Yangfan He
Jing Wang
Xinyuan Song
Shiyao Qian
...
Kuan Lu
Menghao Huo
Jiaqi Chen
Tianyu Shi
RALM
162
17
0
30 Mar 2025
Local Normalization Distortion and the Thermodynamic Formalism of Decoding Strategies for Large Language Models
Tom Kempton
Stuart Burrell
70
0
0
27 Mar 2025
Feature Extraction and Analysis for GPT-Generated Text
A. Selvioğlu
V. Adanova
M. Atagoziev
DeLMO
122
0
0
17 Mar 2025
LAG-MMLU: Benchmarking Frontier LLM Understanding in Latvian and Giriama
Naome A. Etori
Kevin Lu
Randu Karisa
Arturs Kanepajs
LRM ELM
479
0
0
14 Mar 2025
DAFE: LLM-Based Evaluation Through Dynamic Arbitration for Free-Form Question-Answering
Sher Badshah
Hassan Sajjad
134
1
0
11 Mar 2025
Detection Avoidance Techniques for Large Language Models
Sinclair Schneider
Florian Steuber
João A. G. Schneider
Gabi Dreo Rodosek
DeLMO
113
0
0
10 Mar 2025
Collaborative Evaluation of Deepfake Text with Deliberation-Enhancing Dialogue Systems
Jooyoung Lee
Xiaochen Zhu
Georgi Karadzhov
Tom Stafford
Andreas Vlachos
Dongwon Lee
72
0
0
06 Mar 2025
When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning
Yijiang River Dong
Tiancheng Hu
Yinhong Liu
Ahmet Üstün
Nigel Collier
124
1
0
26 Feb 2025
Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
Rylan Schaeffer
Punit Singh Koura
Binh Tang
R. Subramanian
Aaditya K. Singh
...
Vedanuj Goswami
Sergey Edunov
Dieuwke Hupkes
Sanmi Koyejo
Sharan Narang
ALM
146
1
0
24 Feb 2025
Can AI mimic the human ability to define neologisms?
Georgios P. Georgiou
64
1
0
18 Feb 2025
From Text to Trust: Empowering AI-assisted Decision Making with Adaptive LLM-powered Analysis
Zhuoyan Li
Hangxiao Zhu
Zhuoran Lu
Ziang Xiao
Ming Yin
110
1
0
17 Feb 2025
Mind the Gap! Choice Independence in Using Multilingual LLMs for Persuasive Co-Writing Tasks in Different Languages
Shreyan Biswas
Alexander Erlei
U. Gadiraju
166
4
0
13 Feb 2025
Reference-free Evaluation Metrics for Text Generation: A Survey
Takumi Ito
Kees van Deemter
Jun Suzuki
ELM
123
2
0
21 Jan 2025
Using Machine Learning to Distinguish Human-written from Machine-generated Creative Fiction
Andrea Cristina McGlinchey
Peter J Barclay
DeLMO
124
0
0
15 Dec 2024
QAPyramid: Fine-grained Evaluation of Content Selection for Text Summarization
Shiyue Zhang
David Wan
Arie Cattan
Ayal Klein
Ido Dagan
Joey Tianyi Zhou
126
0
0
10 Dec 2024
Challenges in Trustworthy Human Evaluation of Chatbots
Wenting Zhao
Alexander M. Rush
Tanya Goyal
ALM
117
3
0
05 Dec 2024
The Vulnerability of Language Model Benchmarks: Do They Accurately Reflect True LLM Performance?
Sourav Banerjee
Ayushi Agarwal
Eishkaran Singh
ELM
105
3
0
02 Dec 2024
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework
Yew Ken Chia
Liying Cheng
Hou Pong Chan
Chaoqun Liu
Maojia Song
Sharifah Mahani Aljunied
Soujanya Poria
Lidong Bing
RALM VLM
112
6
0
09 Nov 2024
How Performance Pressure Influences AI-Assisted Decision Making
Nikita Haduong
Noah A. Smith
57
0
0
21 Oct 2024
4-LEGS: 4D Language Embedded Gaussian Splatting
Gal Fiebelman
Tamir Cohen
Ayellet Morgenstern
Peter Hedman
Hadar Averbuch-Elor
3DGS
148
1
0
14 Oct 2024
Reverse Modeling in Large Language Models
S. Yu
Yuanchen Xu
Cunxiao Du
Yanying Zhou
Minghui Qiu
Q. Sun
Hao Zhang
Jiawei Wu
157
2
0
13 Oct 2024
The Moral Turing Test: Evaluating Human-LLM Alignment in Moral Decision-Making
Basile Garcia
Crystal Qian
Stefano Palminteri
ELM
110
6
0
09 Oct 2024
Conversate: Supporting Reflective Learning in Interview Practice Through Interactive Simulation and Dialogic Feedback
Taufiq Daryanto
Xiaohan Ding
Lance T Wilhelm
Sophia Stil
Kirk McInnis Knutsen
Eugenia H Rho
69
3
0
08 Oct 2024
How Does the Disclosure of AI Assistance Affect the Perceptions of Writing?
Zhuoyan Li
Chen Liang
Jing Peng
Ming Yin
43
1
0
06 Oct 2024
Trying to be human: Linguistic traces of stochastic empathy in language models
Bennett Kleinberg
Jari Zegers
Jonas Festor
Stefana Vida
Julian Präsent
Riccardo Loconte
Sanne Peereboom
79
1
0
02 Oct 2024
Generative AI and Perceptual Harms: Who's Suspected of using LLMs?
Kowe Kadoma
D. Metaxa
Mor Naaman
83
4
0
01 Oct 2024
From Deception to Detection: The Dual Roles of Large Language Models in Fake News
Dorsaf Sallami
Yuan-Chen Chang
Esma Aïmeur
63
6
0
25 Sep 2024
Polyrating: A Cost-Effective and Bias-Aware Rating System for LLM Evaluation
Jasper Dekoninck
Maximilian Baader
Martin Vechev
ALM
189
0
0
01 Sep 2024
SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists
Raoyuan Zhao
Abdullatif Köksal
Yihong Liu
Leonie Weissweiler
Anna Korhonen
Hinrich Schütze
SyDa
73
1
0
30 Aug 2024
What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation
Dingyi Yang
Qin Jin
130
7
0
26 Aug 2024
CPS-TaskForge: Generating Collaborative Problem Solving Environments for Diverse Communication Tasks
Nikita Haduong
Irene Wang
Bo-Ru Lu
Prithviraj Ammanabrolu
Noah A. Smith
83
1
0
16 Aug 2024
Risks and NLP Design: A Case Study on Procedural Document QA
Nikita Haduong
Alice Gao
Noah A. Smith
87
4
0
16 Aug 2024
The Oscars of AI Theater: A Survey on Role-Playing with Language Models
Nuo Chen
Yan Wang
Yang Deng
Jia Li
120
21
0
16 Jul 2024
Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures?
Yingming Pu
Liping Huang
Tao Lin
Hongyu Chen
ELM
41
0
0
12 Jul 2024