Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research

3 December 2021

Papers citing "Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research"

Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation. Maria Eriksson, Erasmo Purificato, Arman Noroozian, Joao Vinagre, Guillaume Chaslot, Emilia Gomez, David Fernandez-Llorca. 10 Feb 2025.
AI and the Everything in the Whole Wide World Benchmark. Inioluwa Deborah Raji, Emily M. Bender, Amandalynne Paullada, Emily L. Denton, A. Hanna. 26 Nov 2021.
Do Datasets Have Politics? Disciplinary Values in Computer Vision Dataset Development. M. Scheuerman, Emily L. Denton, A. Hanna. 09 Aug 2021.
Data and its (dis)contents: A survey of dataset development and use in machine learning research. Amandalynne Paullada, Inioluwa Deborah Raji, Emily M. Bender, Emily L. Denton, A. Hanna. 09 Dec 2020.
Underspecification Presents Challenges for Credibility in Modern Machine Learning. Alexander D'Amour, Katherine A. Heller, D. Moldovan, Ben Adlam, B. Alipanahi, ..., Kellie Webster, Steve Yadlowsky, T. Yun, Xiaohua Zhai, D. Sculley. 06 Nov 2020.
Targeting the Benchmark: On Methodology in Current Natural Language Processing Research. David Schlangen. 07 Jul 2020.
Large image datasets: A pyrrhic win for computer vision? Vinay Uday Prabhu, Abeba Birhane. 24 Jun 2020.
Are we done with ImageNet? Lucas Beyer, Olivier J. Hénaff, Alexander Kolesnikov, Xiaohua Zhai, Aaron van den Oord. 12 Jun 2020.
From ImageNet to Image Classification: Contextualizing Progress on Benchmarks. Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Andrew Ilyas, Aleksander Madry. 22 May 2020.
Shortcut Learning in Deep Neural Networks. Robert Geirhos, J. Jacobsen, Claudio Michaelis, R. Zemel, Wieland Brendel, Matthias Bethge, Felix Wichmann. 16 Apr 2020.
Garbage In, Garbage Out? Do Machine Learning Application Papers in Social Computing Report Where Human-Labeled Training Data Comes From? R. Geiger, Kevin Yu, Yanlai Yang, Mindy Dai, Jie Qiu, Rebekah Tang, Jenny Huang. 17 Dec 2019.
Value-laden Disciplinary Shifts in Machine Learning. Ravit Dotan, S. Milli. 03 Dec 2019.
Show Your Work: Improved Reporting of Experimental Results. Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith. 06 Sep 2019.
A Style-Based Generator Architecture for Generative Adversarial Networks. Tero Karras, S. Laine, Timo Aila. 12 Dec 2018.
Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods. Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, Kai-Wei Chang. 18 Apr 2018.
Datasheets for Datasets. Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna M. Wallach, Hal Daumé, Kate Crawford. 23 Mar 2018.
No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World. S. Shankar, Yoni Halpern, Eric Breck, James Atwood, Jimbo Wilson, D. Sculley. 22 Nov 2017.
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. Han Xiao, Kashif Rasul, Roland Vollgraf. 25 Aug 2017.
MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition. Yandong Guo, Lei Zhang, Yuxiao Hu, Xiaodong He, Jianfeng Gao. 27 Jul 2016.
The MegaFace Benchmark: 1 Million Faces for Recognition at Scale. Ira Kemelmacher-Shlizerman, S. M. Seitz, Daniel Miller, Evan Brossard. 02 Dec 2015.