arXiv:2109.06835
The Perils of Using Mechanical Turk to Evaluate Open-Ended Text Generation
14 September 2021
Marzena Karpinska
Nader Akoury
Mohit Iyyer
Papers citing "The Perils of Using Mechanical Turk to Evaluate Open-Ended Text Generation"
23 / 23 papers shown
SPHERE: An Evaluation Card for Human-AI Systems
Qianou Ma
Dora Zhao
Xinran Zhao
Chenglei Si
Chenyang Yang
Ryan Louie
Ehud Reiter
Diyi Yang
Tongshuang Wu
ALM
50
0
0
24 Mar 2025
M-MAD: Multidimensional Multi-Agent Debate for Advanced Machine Translation Evaluation
Zhaopeng Feng
Jiayuan Su
Jiamei Zheng
Jiahan Ren
Yan Zhang
Jian Wu
Hongwei Wang
Zuozhu Liu
ELM
203
0
0
21 Feb 2025
Investigating Non-Transitivity in LLM-as-a-Judge
Yi Xu
Laura Ruis
Tim Rocktäschel
Robert Kirk
38
0
0
19 Feb 2025
Economics of Sourcing Human Data
Sebastin Santy
Prasanta Bhattacharya
Manoel Horta Ribeiro
Kelsey Allen
Sewoong Oh
69
0
0
11 Feb 2025
A Collection of Question Answering Datasets for Norwegian
Vladislav Mikhailov
Petter Mæhlum
Victoria Ovedie Chruickshank Langø
Erik Velldal
Lilja Øvrelid
RALM
41
4
0
19 Jan 2025
Natural Language Processing RELIES on Linguistics
Juri Opitz
Shira Wein
Nathan Schneider
AI4CE
52
7
0
09 May 2024
"They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations
Preetam Prabhu Srikar Dammu
Hayoung Jung
Anjali Singh
Monojit Choudhury
Tanushree Mitra
32
8
0
08 May 2024
Evaluating Optimal Reference Translations
Vilém Zouhar
Věra Kloudová
Martin Popel
Ondřej Bojar
29
2
0
28 Nov 2023
A Confederacy of Models: a Comprehensive Evaluation of LLMs on Creative Writing
Carlos Gómez-Rodríguez
Paul Williams
29
65
0
12 Oct 2023
Thresh: A Unified, Customizable and Deployable Platform for Fine-Grained Text Evaluation
David Heineman
Yao Dou
Wei-ping Xu
22
7
0
14 Aug 2023
GIO: Gradient Information Optimization for Training Dataset Selection
Dante Everaert
Christopher Potts
21
3
0
20 Jun 2023
Revisiting the Architectures like Pointer Networks to Efficiently Improve the Next Word Distribution, Summarization Factuality, and Beyond
Haw-Shiuan Chang
Zonghai Yao
Alolika Gon
Hong-ye Yu
Andrew McCallum
43
10
0
20 May 2023
Large language models effectively leverage document-level context for literary translation, but critical errors persist
Marzena Karpinska
Mohit Iyyer
31
81
0
06 Apr 2023
Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation
Mayu Otani
Riku Togashi
Yu Sawai
Ryosuke Ishigami
Yuta Nakashima
Esa Rahtu
J. Heikkilä
Shin'ichi Satoh
33
62
0
04 Apr 2023
In BLOOM: Creativity and Affinity in Artificial Lyrics and Art
Evan Crothers
H. Viktor
Nathalie Japkowicz
30
3
0
13 Jan 2023
MAUVE Scores for Generative Models: Theory and Practice
Krishna Pillutla
Lang Liu
John Thickstun
Sean Welleck
Swabha Swayamdipta
Rowan Zellers
Sewoong Oh
Yejin Choi
Zaïd Harchaoui
EGVM
31
21
0
30 Dec 2022
Of Human Criteria and Automatic Metrics: A Benchmark of the Evaluation of Story Generation
Cyril Chhun
Pierre Colombo
Chloé Clavel
Fabian M. Suchanek
51
50
0
24 Aug 2022
RankGen: Improving Text Generation with Large Ranking Models
Kalpesh Krishna
Yapei Chang
John Wieting
Mohit Iyyer
AIMat
16
68
0
19 May 2022
SNaC: Coherence Error Detection for Narrative Summarization
Tanya Goyal
Junyi Jessy Li
Greg Durrett
24
27
0
19 May 2022
Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications
Kaitlyn Zhou
Su Lin Blodgett
Adam Trischler
Hal Daumé
Kaheer Suleman
Alexandra Olteanu
ELM
94
26
0
13 May 2022
HydraSum: Disentangling Stylistic Features in Text Summarization using Multi-Decoder Models
Tanya Goyal
Nazneen Rajani
Wenhao Liu
Wojciech Kryściński
AI4CE
15
12
0
08 Oct 2021
MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers
Krishna Pillutla
Swabha Swayamdipta
Rowan Zellers
John Thickstun
Sean Welleck
Yejin Choi
Zaïd Harchaoui
37
341
0
02 Feb 2021
With Little Power Comes Great Responsibility
Dallas Card
Peter Henderson
Urvashi Khandelwal
Robin Jia
Kyle Mahowald
Dan Jurafsky
225
115
0
13 Oct 2020