Do We Train on Test Data? Purging CIFAR of Near-Duplicates

1 February 2019

Papers citing "Do We Train on Test Data? Purging CIFAR of Near-Duplicates"

Title
Impact of Data Duplication on Deep Neural Network-Based Image Classifiers: Robust vs. Standard Models Alireza Aghabagherloo Aydin Abadi Sumanta Sarkar Vishnu Asutosh Dasu Bart Preneel AAML 65 0 0 01 Apr 2025
Repetita Iuvant: Data Repetition Allows SGD to Learn High-Dimensional Multi-Index Functions Luca Arnaboldi Yatin Dandi Florent Krzakala Luca Pesce Ludovic Stephan 70 12 0 24 May 2024
Automated Program Repair: Emerging trends pose and expose problems for benchmarks J. Renzullo Pemma Reiter Westley Weimer Stephanie Forrest 42 1 0 08 May 2024
Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets Lorenzo Brigato Stavroula Mougiakakou 45 0 0 08 Mar 2024
Benchmarking Pretrained Vision Embeddings for Near- and Duplicate Detection in Medical Images Tuan Truong Farnaz Khun Jush Matthias Lenga 34 2 0 12 Dec 2023
Data-Efficient Energy-Aware Participant Selection for UAV-Enabled Federated Learning Youssra Cheriguene Wael Jaafar Kerrache Chaker Abdelaziz H. Yanikomeroglu Fatima Zohra Bousbaa N. Lagraa FedML 44 1 0 14 Aug 2023
On Evaluation of Document Classification using RVL-CDIP Stefan Larson Gordon Lim Kevin Leach 39 3 0 21 Jun 2023
How Effective Are Neural Networks for Fixing Security Vulnerabilities Yi Wu Nan Jiang H. Pham Thibaud Lutellier Jordan Davis Lin Tan Petr Babkin Sameena Shah AAML 21 79 0 29 May 2023
Infinite Class Mixup Thomas Mensink Pascal Mettes 31 2 0 17 May 2023
Do We Train on Test Data? The Impact of Near-Duplicates on License Plate Recognition Rayson Laroca Valter Estevam A. Britto Rodrigo Minetto David Menotti 30 10 0 10 Apr 2023
Image Classification with Small Datasets: Overview and Benchmark Lorenzo Brigato Björn Barz Luca Iocchi Joachim Denzler VLM 30 17 0 23 Dec 2022
A Pareto-optimal compositional energy-based model for sampling and optimization of protein sequences Nataša Tagasovska Nathan C. Frey Andreas Loukas I. Hotzel J. Lafrance-Vanasse ... A. Rajpal Richard Bonneau Kyunghyun Cho Stephen Ra Vladimir Gligorijević 18 11 0 19 Oct 2022
Unsupervised visualization of image datasets using contrastive learning Jan Boehm Philipp Berens D. Kobak SSL 26 15 0 18 Oct 2022
Bugs in the Data: How ImageNet Misrepresents Biodiversity A. Luccioni David Rolnick 21 43 0 24 Aug 2022
When does dough become a bagel? Analyzing the remaining mistakes on ImageNet Vijay Vasudevan Benjamin Caine Raphael Gontijo-Lopes Sara Fridovich-Keil Rebecca Roelofs VLM UQCV 48 57 0 09 May 2022
A Siren Song of Open Source Reproducibility Edward Raff Andrew L. Farris 16 9 0 09 Apr 2022
Datamodels: Predicting Predictions from Training Data Andrew Ilyas Sung Min Park Logan Engstrom Guillaume Leclerc A. Madry TDI 47 131 0 01 Feb 2022
A Framework for Cluster and Classifier Evaluation in the Absence of Reference Labels R. Joyce Edward Raff Charles K. Nicholas 48 16 0 23 Sep 2021
Tune It or Don't Use It: Benchmarking Data-Efficient Image Classification Lorenzo Brigato Björn Barz Luca Iocchi Joachim Denzler 35 16 0 30 Aug 2021
On Memorization in Probabilistic Deep Generative Models G. V. D. Burg Christopher K. I. Williams TDI 25 59 0 06 Jun 2021
FSD50K: An Open Dataset of Human-Labeled Sound Events Eduardo Fonseca Xavier Favory Jordi Pons F. Font Xavier Serra 26 436 0 01 Oct 2020
What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation Vitaly Feldman Chiyuan Zhang TDI 46 441 0 09 Aug 2020
Are we done with ImageNet? Lucas Beyer Olivier J. Hénaff Alexander Kolesnikov Xiaohua Zhai Aaron van den Oord VLM 19 397 0 12 Jun 2020
The Curious Case of Convex Neural Networks S. Sivaprasad Ankur Singh Naresh Manwani Vineet Gandhi 46 27 0 09 Jun 2020
Self-Distillation as Instance-Specific Label Smoothing Zhilu Zhang M. Sabuncu 20 116 0 09 Jun 2020
Big Transfer (BiT): General Visual Representation Learning Alexander Kolesnikov Lucas Beyer Xiaohua Zhai J. Puigcerver Jessica Yung Sylvain Gelly N. Houlsby MQ 106 1,183 0 24 Dec 2019