arXiv:2012.05345
Data and its (dis)contents: A survey of dataset development and use in machine learning research
9 December 2020
Amandalynne Paullada
Inioluwa Deborah Raji
Emily M. Bender
Emily L. Denton
A. Hanna
Papers citing "Data and its (dis)contents: A survey of dataset development and use in machine learning research"
28 / 78 papers shown
Data Smells: Categories, Causes and Consequences, and Detection of Suspicious Data in AI-based Systems
Harald Foidl
Michael Felderer
Rudolf Ramler
13
31
0
19 Mar 2022
Sex Trouble: Common pitfalls in incorporating sex/gender in medical machine learning and how to avoid them
Kendra Albert
Maggie K. Delano
FaML
21
11
0
15 Mar 2022
FairLex: A Multilingual Benchmark for Evaluating Fairness in Legal Text Processing
Ilias Chalkidis
Tommaso Pasini
Shenmin Zhang
Letizia Tomada
Sebastian Felix Schwemer
Anders Søgaard
AILaw
40
54
0
14 Mar 2022
A streamable large-scale clinical EEG dataset for Deep Learning
Dung Truong
Manisha Sinha
K. Venkataraju
M. Milham
Arnaud Delorme
24
4
0
04 Mar 2022
3D Common Corruptions and Data Augmentation
Oğuzhan Fatih Kar
Teresa Yeo
Andrei Atanov
Amir Zamir
3DPC
45
107
0
02 Mar 2022
Language technology practitioners as language managers: arbitrating data bias and predictive bias in ASR
Nina Markl
S. McNulty
24
9
0
25 Feb 2022
The craft and coordination of data curation: complicating "workflow" views of data science
A. Thomer
Dharma Akmon
J. York
Allison R. B. Tyler
Faye O. Polasek
Sara Lafia
Libby Hemphill
E. Yakel
16
20
0
09 Feb 2022
Post-Hoc Explanations Fail to Achieve their Purpose in Adversarial Contexts
Sebastian Bordt
Michèle Finck
Eric Raidl
U. V. Luxburg
AILaw
39
77
0
25 Jan 2022
There is an elephant in the room: Towards a critique on the use of fairness in biometrics
Ana Valdivia
Júlia Corbera Serrajòrdia
Aneta Swianiewicz
21
14
0
16 Dec 2021
Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research
Bernard Koch
Emily L. Denton
A. Hanna
J. Foster
41
140
0
03 Dec 2021
RedCaps: web-curated image-text data created by the people, for the people
Karan Desai
Gaurav Kaul
Zubin Aysola
Justin Johnson
22
162
0
22 Nov 2021
Who Decides if AI is Fair? The Labels Problem in Algorithmic Auditing
Abhilash Mishra
Yash Gorana
19
3
0
16 Nov 2021
Building Legal Datasets
Jerrold Soh
ELM
AILaw
22
3
0
03 Nov 2021
A survey on datasets for fairness-aware machine learning
Tai Le Quy
Arjun Roy
Vasileios Iosifidis
Wenbin Zhang
Eirini Ntoutsi
FaML
11
240
0
01 Oct 2021
PASS: An ImageNet replacement for self-supervised pretraining without humans
Yuki M. Asano
Christian Rupprecht
Andrew Zisserman
Andrea Vedaldi
VLM
SSL
21
57
0
27 Sep 2021
Studying Up Machine Learning Data: Why Talk About Bias When We Mean Power?
Milagros Miceli
Julian Posada
Tianling Yang
22
60
0
16 Sep 2021
'Just What do You Think You're Doing, Dave?' A Checklist for Responsible Data Use in NLP
Anna Rogers
Timothy Baldwin
Kobi Leins
104
64
0
14 Sep 2021
Retiring Adult: New Datasets for Fair Machine Learning
Frances Ding
Moritz Hardt
John Miller
Ludwig Schmidt
57
428
0
10 Aug 2021
Mitigating Dataset Harms Requires Stewardship: Lessons from 1000 Papers
Kenny Peng
Arunesh Mathur
Arvind Narayanan
99
93
0
06 Aug 2021
How to avoid machine learning pitfalls: a guide for academic researchers
M. Lones
VLM
FaML
OnRL
62
77
0
05 Aug 2021
Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets
Irene Solaiman
Christy Dennison
30
222
0
18 Jun 2021
Understanding and Evaluating Racial Biases in Image Captioning
Dora Zhao
Angelina Wang
Olga Russakovsky
24
134
0
16 Jun 2021
A Study of Face Obfuscation in ImageNet
Kaiyu Yang
Jacqueline Yau
Li Fei-Fei
Jia Deng
Olga Russakovsky
PICV
CVBM
30
144
0
10 Mar 2021
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
290
1,815
0
14 Dec 2020
Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets
Mor Geva
Yoav Goldberg
Jonathan Berant
242
320
0
21 Aug 2019
Improving fairness in machine learning systems: What do industry practitioners need?
Kenneth Holstein
Jennifer Wortman Vaughan
Hal Daumé
Miroslav Dudík
Hanna M. Wallach
FaML
HAI
192
742
0
13 Dec 2018
Hypothesis Only Baselines in Natural Language Inference
Adam Poliak
Jason Naradowsky
Aparajita Haldar
Rachel Rudinger
Benjamin Van Durme
190
576
0
02 May 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,959
0
20 Apr 2018