All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text

30 June 2021
Elizabeth Clark
Tal August
Sofia Serrano
Nikita Haduong
Suchin Gururangan
Noah A. Smith
    DeLMO
arXiv: 2107.00061

Papers citing "All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text"

24 / 224 papers shown
On the probability-quality paradox in language generation
Clara Meister
Gian Wiher
Tiago Pimentel
Ryan Cotterell
99
14
0
31 Mar 2022
On Decoding Strategies for Neural Text Generators
Gian Wiher
Clara Meister
Ryan Cotterell
89
71
0
29 Mar 2022
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges
Shikib Mehri
Jinho Choi
L. F. D’Haro
Jan Deriu
M. Eskénazi
...
David Traum
Yi-Ting Yeh
Zhou Yu
Yizhe Zhang
Chen Zhang
99
22
0
18 Mar 2022
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
Thomas Hartvigsen
Saadia Gabriel
Hamid Palangi
Maarten Sap
Dipankar Ray
Ece Kamar
92
392
0
17 Mar 2022
Do Language Models Plagiarize?
Jooyoung Lee
Thai Le
Jinghui Chen
Dongwon Lee
103
76
0
15 Mar 2022
Probing BERT's priors with serial reproduction chains
Takateru Yamakoshi
Thomas Griffiths
Robert D. Hawkins
84
13
0
24 Feb 2022
Synthetic Disinformation Attacks on Automated Fact Verification Systems
Y. Du
Antoine Bosselut
Christopher D. Manning
AAML, OffRL
103
38
0
18 Feb 2022
Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text
Sebastian Gehrmann
Elizabeth Clark
Thibault Sellam
ELM, AI4CE
157
193
0
14 Feb 2022
A Benchmark Corpus for the Detection of Automatically Generated Text in Academic Publications
Vijini Liyanage
Davide Buscaldi
A. Nazarenko
DeLMO
60
26
0
04 Feb 2022
Towards Coherent and Consistent Use of Entities in Narrative Generation
Pinelopi Papalampidi
Kris Cao
Tomáš Kočiský
HILM
61
13
0
03 Feb 2022
WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation
Alisa Liu
Swabha Swayamdipta
Noah A. Smith
Yejin Choi
202
221
0
16 Jan 2022
Imagined versus Remembered Stories: Quantifying Differences in Narrative Flow
Maarten Sap
A. Jafarpour
Yejin Choi
Noah A. Smith
J. Pennebaker
Eric Horvitz
41
1
0
07 Jan 2022
Dynamic Human Evaluation for Relative Model Comparisons
Thórhildur Thorleiksdóttir
Cédric Renggli
Nora Hollenstein
Ce Zhang
75
2
0
15 Dec 2021
Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Lavinia Dunagan
Jacob Morrison
Alexander R. Fabbri
Yejin Choi
Noah A. Smith
97
40
0
08 Dec 2021
Modelling Direct Messaging Networks with Multiple Recipients for Cyber Deception
Kristen Moore
Cody James Christopher
David Liebowitz
Surya Nepal
R. Selvey
89
4
0
21 Nov 2021
Transparent Human Evaluation for Image Captioning
Jungo Kasai
Keisuke Sakaguchi
Lavinia Dunagan
Jacob Morrison
Ronan Le Bras
Yejin Choi
Noah A. Smith
82
49
0
17 Nov 2021
Unsupervised and Distributional Detection of Machine-Generated Text
Matthias Gallé
Jos Rozen
Germán Kruszewski
Hady ElSahar
DeLMO
72
28
0
04 Nov 2021
A Systematic Investigation of Commonsense Knowledge in Large Language Models
Xiang Lorraine Li
A. Kuncoro
Jordan Hoffmann
Cyprien de Masson d'Autume
Phil Blunsom
Aida Nematzadeh
LRM
101
59
0
31 Oct 2021
Attacking Open-domain Question Answering by Injecting Misinformation
Liangming Pan
Wenhu Chen
Min-Yen Kan
Wenjie Wang
HILM, AAML
289
28
0
15 Oct 2021
Leveraging Generative Models for Covert Messaging: Challenges and Tradeoffs for "Dead-Drop" Deployments
L. A. Bauer
James K. Howes IV
Sam A. Markelon
Vincent Bindschaedler
Thomas Shrimpton
35
2
0
13 Oct 2021
The Perils of Using Mechanical Turk to Evaluate Open-Ended Text Generation
Marzena Karpinska
Nader Akoury
Mohit Iyyer
292
108
0
14 Sep 2021
Is GPT-3 Text Indistinguishable from Human Text? Scarecrow: A Framework for Scrutinizing Machine Text
Yao Dou
Maxwell Forbes
Rik Koncel-Kedziorski
Noah A. Smith
Yejin Choi
DeLMO
130
129
0
02 Jul 2021
MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers
Krishna Pillutla
Swabha Swayamdipta
Rowan Zellers
John Thickstun
Sean Welleck
Yejin Choi
Zaïd Harchaoui
156
364
0
02 Feb 2021
GENIE: Toward Reproducible and Standardized Human Evaluation for Text Generation
Daniel Khashabi
Gabriel Stanovsky
Jonathan Bragg
Nicholas Lourie
Jungo Kasai
Yejin Choi
Noah A. Smith
Daniel S. Weld
131
21
0
17 Jan 2021