Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2109.06835
Cited By
The Perils of Using Mechanical Turk to Evaluate Open-Ended Text Generation
14 September 2021
Marzena Karpinska
Nader Akoury
Mohit Iyyer
Re-assign community
ArXiv
PDF
HTML
Papers citing
"The Perils of Using Mechanical Turk to Evaluate Open-Ended Text Generation"
26 / 26 papers shown
Title
SPHERE: An Evaluation Card for Human-AI Systems
Qianou Ma
Dora Zhao
Xinran Zhao
Chenglei Si
Chenyang Yang
Ryan Louie
Ehud Reiter
Diyi Yang
Tongshuang Wu
ALM
50
0
0
24 Mar 2025
M-MAD: Multidimensional Multi-Agent Debate for Advanced Machine Translation Evaluation
Zhaopeng Feng
Jiayuan Su
Jiamei Zheng
Jiahan Ren
Yan Zhang
Jian Wu
Hongwei Wang
Zuozhu Liu
ELM
203
0
0
21 Feb 2025
Investigating Non-Transitivity in LLM-as-a-Judge
Yi Xu
Laura Ruis
Tim Rocktaschel
Robert Kirk
40
0
0
19 Feb 2025
Economics of Sourcing Human Data
Sebastin Santy
Prasanta Bhattacharya
Manoel Horta Ribeiro
Kelsey Allen
Sewoong Oh
71
0
0
11 Feb 2025
A Collection of Question Answering Datasets for Norwegian
Vladislav Mikhailov
Petter Mæhlum
Victoria Ovedie Chruickshank Langø
Erik Velldal
Lilja Øvrelid
RALM
41
4
0
19 Jan 2025
MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation
Yan Ma
Yu Qiao
Pengfei Liu
32
5
0
09 Jun 2024
Natural Language Processing RELIES on Linguistics
Juri Opitz
Shira Wein
Nathan Schneider
AI4CE
52
7
0
09 May 2024
"They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations
Preetam Prabhu Srikar Dammu
Hayoung Jung
Anjali Singh
Monojit Choudhury
Tanushree Mitra
32
8
0
08 May 2024
Evaluating Optimal Reference Translations
Vilém Zouhar
Vvera Kloudová
Martin Popel
Ondrej Bojar
29
2
0
28 Nov 2023
A Confederacy of Models: a Comprehensive Evaluation of LLMs on Creative Writing
Carlos Gómez-Rodríguez
Paul Williams
29
65
0
12 Oct 2023
Learning Personalized Alignment for Evaluating Open-ended Text Generation
Danqing Wang
Kevin Kaichuang Yang
Hanlin Zhu
Xiaomeng Yang
Andrew Cohen
Lei Li
Yuandong Tian
ALM
LM&MA
15
8
0
05 Oct 2023
Thresh: A Unified, Customizable and Deployable Platform for Fine-Grained Text Evaluation
David Heineman
Yao Dou
Wei-ping Xu
26
7
0
14 Aug 2023
GIO: Gradient Information Optimization for Training Dataset Selection
Dante Everaert
Christopher Potts
21
3
0
20 Jun 2023
Revisiting the Architectures like Pointer Networks to Efficiently Improve the Next Word Distribution, Summarization Factuality, and Beyond
Haw-Shiuan Chang
Zonghai Yao
Alolika Gon
Hong-ye Yu
Andrew McCallum
43
10
0
20 May 2023
Large language models effectively leverage document-level context for literary translation, but critical errors persist
Marzena Karpinska
Mohit Iyyer
31
81
0
06 Apr 2023
Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation
Mayu Otani
Riku Togashi
Yu Sawai
Ryosuke Ishigami
Yuta Nakashima
Esa Rahtu
J. Heikkilä
Shiníchi Satoh
33
62
0
04 Apr 2023
In BLOOM: Creativity and Affinity in Artificial Lyrics and Art
Evan Crothers
H. Viktor
Nathalie Japkowicz
30
3
0
13 Jan 2023
MAUVE Scores for Generative Models: Theory and Practice
Krishna Pillutla
Lang Liu
John Thickstun
Sean Welleck
Swabha Swayamdipta
Rowan Zellers
Sewoong Oh
Yejin Choi
Zaïd Harchaoui
EGVM
35
21
0
30 Dec 2022
Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation
Cyril Chhun
Pierre Colombo
Chloé Clavel
Fabian M. Suchanek
53
50
0
24 Aug 2022
Findings of the The RuATD Shared Task 2022 on Artificial Text Detection in Russian
T. Shamardina
Vladislav Mikhailov
Daniil Chernianskii
Alena Fenogenova
Marat Saidov
A. Valeeva
Tatiana Shavrina
I. Smurov
E. Tutubalina
Ekaterina Artemova
DeLMO
16
30
0
03 Jun 2022
RankGen: Improving Text Generation with Large Ranking Models
Kalpesh Krishna
Yapei Chang
John Wieting
Mohit Iyyer
AIMat
18
68
0
19 May 2022
SNaC: Coherence Error Detection for Narrative Summarization
Tanya Goyal
Junyi Jessy Li
Greg Durrett
27
27
0
19 May 2022
Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications
Kaitlyn Zhou
Su Lin Blodgett
Adam Trischler
Hal Daumé
Kaheer Suleman
Alexandra Olteanu
ELM
94
26
0
13 May 2022
HydraSum: Disentangling Stylistic Features in Text Summarization using Multi-Decoder Models
Tanya Goyal
Nazneen Rajani
Wenhao Liu
Wojciech Kry'sciñski
AI4CE
15
12
0
08 Oct 2021
MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers
Krishna Pillutla
Swabha Swayamdipta
Rowan Zellers
John Thickstun
Sean Welleck
Yejin Choi
Zaïd Harchaoui
37
341
0
02 Feb 2021
With Little Power Comes Great Responsibility
Dallas Card
Peter Henderson
Urvashi Khandelwal
Robin Jia
Kyle Mahowald
Dan Jurafsky
227
115
0
13 Oct 2020
1