Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks

26 March 2021

Papers citing "Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks"

50 / 92 papers shown

Title | Authors | Tags | Counts | Date
The Pitfalls of Benchmarking in Algorithm Selection: What We Are Getting Wrong | G. Petelin Gjorgjina Cenikj | - | 34 0 0 | 12 May 2025
Adversarial Robustness of Deep Learning Models for Inland Water Body Segmentation from SAR Images | Siddharth Kothari Srinivasan Murali Sankalp Kothari Ujjwal Verma Jaya Sreevalsan-Nair | - | 57 0 0 | 03 May 2025
When Dynamic Data Selection Meets Data Augmentation | Steve Yang Peng Ye Furao Shen Dongzhan Zhou | - | 42 0 0 | 02 May 2025
Hide and Seek in Noise Labels: Noise-Robust Collaborative Active Learning with LLM-Powered Assistance | Bo Yuan Yulin Chen Yin Zhang Wei Jiang | NoLa | 40 6 0 | 03 Apr 2025
Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models | Jonathan Bourne | - | 77 0 0 | 24 Feb 2025
DEUCE: Dual-diversity Enhancement and Uncertainty-awareness for Cold-start Active Learning | Jiaxin Guo Cheng Chen Shuzhen Li Tianze Zhang | - | 63 0 0 | 01 Feb 2025
CSA: Data-efficient Mapping of Unimodal Features to Multimodal Features | Po-han Li Sandeep Chinchali Ufuk Topcu | - | 36 1 0 | 10 Oct 2024
Label Convergence: Defining an Upper Performance Bound in Object Recognition through Contradictory Annotations | David Tschirschwitz Volker Rodehorst | - | 31 1 0 | 14 Sep 2024
Assistive Image Annotation Systems with Deep Learning and Natural Language Capabilities: A Review | Moseli Motsóehli | VLM 3DV | 37 0 0 | 28 Jun 2024
Data Valuation by Leveraging Global and Local Statistical Information | Xiaoling Zhou Ou Wu Michael K. Ng Hao Jiang | TDI | 30 0 0 | 23 May 2024
Are large language models superhuman chemists? | Adrian Mirza Nawaf Alampara Sreekanth Kunchapu Benedict Emoekabu Aswanth Krishnan ... Leanne M. Stafast Dinga Wonanke Michael Pieler P. Schwaller Kevin Maik Jablonka | ELM AI4MH LRM LM&MA | 31 5 0 | 01 Apr 2024
Better than classical? The subtle art of benchmarking quantum machine learning models | Joseph Bowles Shahnawaz Ahmed Maria Schuld | - | 42 65 0 | 11 Mar 2024
Corrective Machine Unlearning | Shashwat Goel Ameya Prabhu Philip Torr Ponnurangam Kumaraguru Amartya Sanyal | OnRL | 42 14 0 | 21 Feb 2024
Leveraging Human-Machine Interactions for Computer Vision Dataset Quality Enhancement | Esla Timothy Anzaku Hyesoo Hong Jin-Woo Park Wonjun Yang Kangmin Kim Jongbum Won Deshika Vinoshani Kumari Herath Arnout Van Messem W. D. Neve | - | 20 0 0 | 31 Jan 2024
Investigating the Quality of DermaMNIST and Fitzpatrick17k Dermatological Image Datasets | Kumar Abhishek Aditi Jain Ghassan Hamarneh | - | 49 3 0 | 25 Jan 2024
Towards Reliable Dermatology Evaluation Benchmarks | Fabian Gröger Simone Lionetti Philippe Gottfrois Alvaro Gonzalez-Jimenez Matthew Groh Roxana Daneshjou Labelling Consortium Alexander A. Navarini Marc Pouly | - | 33 5 0 | 13 Sep 2023
Adaptive conformal classification with noisy labels | Matteo Sesia Y. X. R. Wang Xin Tong | - | 24 13 0 | 10 Sep 2023
FPR Estimation for Fraud Detection in the Presence of Class-Conditional Label Noise | Justin Tittelfitz | - | 31 0 0 | 04 Aug 2023
From Attachments to SEO: Click Here to Learn More about Clickbait PDFs! | Giada Stivala Sahar Abdelnabi Andrea Mengascini Mariano Graziano Mario Fritz Giancarlo Pellegrino | - | 24 1 0 | 02 Aug 2023
LUCID-GAN: Conditional Generative Models to Locate Unfairness | Andres Algaba Carmen Mazijn Carina E. A. Prunkl J. Danckaert Vincent Ginis | SyDa | 42 1 0 | 28 Jul 2023
On Evaluation of Document Classification using RVL-CDIP | Stefan Larson Gordon Lim Kevin Leach | - | 39 3 0 | 21 Jun 2023
Quantifying lottery tickets under label noise: accuracy, calibration, and complexity | V. Arora Daniele Irto Sebastian Goldt G. Sanguinetti | - | 38 2 0 | 21 Jun 2023
Rapid Image Labeling via Neuro-Symbolic Learning | Yifeng Wang Zhi Tu Yiwen Xiang Shiyuan Zhou Xiyuan Chen Bingxuan Li Tianyi Zhang | VLM | 37 6 0 | 18 Jun 2023
AI-Supported Assessment of Load Safety | Julius Schöning Niklas Kruse | - | 22 0 0 | 06 Jun 2023
MultiTurnCleanup: A Benchmark for Multi-Turn Spoken Conversational Transcript Cleanup | Hua Shen Vicky Zayats Johann C. Rocholl D. D. Walker Dirk Padfield | - | 47 3 0 | 19 May 2023
NoisywikiHow: A Benchmark for Learning with Real-world Noisy Labels in Natural Language Processing | Tingting Wu Xiao Ding Minji Tang Haotian Zhang Bing Qin Ting Liu | NoLa | 34 10 0 | 18 May 2023
Fairness and Bias in Truth Discovery Algorithms: An Experimental Analysis | Simone Lazier Saravanan Thirumuruganathan Hadis Anahideh | - | 29 3 0 | 25 Apr 2023
Improved Naive Bayes with Mislabeled Data | Qianhan Zeng Yingqiu Zhu Xuening Zhu Feifei Wang Weichen Zhao Shuning Sun Meng Su Hansheng Wang | NoLa | 13 2 0 | 13 Apr 2023
Evaluation of Confidence-based Ensembling in Deep Learning Image Classification | Rafael Rosales Peter Popov Michael Paulitsch | UQCV | 8 2 0 | 03 Mar 2023
Towards Unbounded Machine Unlearning | M. Kurmanji Peter Triantafillou Jamie Hayes Eleni Triantafillou | MU | 28 123 0 | 20 Feb 2023
ActiveLab: Active Learning with Re-Labeling by Multiple Annotators | Hui Wen Goh Jonas W. Mueller | - | 29 3 0 | 27 Jan 2023
Look Beyond Bias with Entropic Adversarial Data Augmentation | Thomas Duboudin Emmanuel Dellandrea Corentin Abgrall Gilles Hénaff Liming Chen | CML | 35 4 0 | 10 Jan 2023
Learning from Training Dynamics: Identifying Mislabeled Data Beyond Manually Designed Features | Qingrui Jia Xuhong Li Lei Yu Jiang Bian Penghao Zhao Shupeng Li Haoyi Xiong Dejing Dou | NoLa | 35 5 0 | 19 Dec 2022
Convergence Analysis for Training Stochastic Neural Networks via Stochastic Gradient Descent | Richard Archibald F. Bao Yanzhao Cao Hui-Jie Sun | - | 52 2 0 | 17 Dec 2022
Azimuth: Systematic Error Analysis for Text Classification | Gabrielle Gauthier Melançon Orlando Marquez Ayala Lindsay D. Brin Chris Tyler Frederic Branchaud-Charron Joseph Marinier Karine Grande Dieu-Thu Le | - | 16 3 0 | 16 Dec 2022
Measuring Annotator Agreement Generally across Complex Structured, Multi-object, and Free-text Annotation Tasks | Alexander Braylan Omar Alonso Matthew Lease | - | 8 17 0 | 15 Dec 2022
The Grind for Good Data: Understanding ML Practitioners' Struggles and Aspirations in Making Good Data | Inha Cha Juhyun Oh Cheul Young Park Jiyoon Han Hwalsuk Lee | - | 29 2 0 | 28 Nov 2022
Combating noisy labels in object detection datasets | K. Chachula Jakub Lyskawa Bartlomiej Olber Piotr Fratczak A. Popowicz Krystian Radlak | NoLa | 31 4 0 | 25 Nov 2022
Identifying Incorrect Annotations in Multi-Label Classification Data | Aditya Thyagarajan Elías Snorrason Curtis G. Northcutt Jonas W. Mueller | - | 37 10 0 | 25 Nov 2022
Quantifying the Impact of Label Noise on Federated Learning | Shuqi Ke Chao Huang Xin Liu | FedML | 28 7 0 | 15 Nov 2022
DC-Check: A Data-Centric AI checklist to guide the development of reliable machine learning systems | Nabeel Seedat F. Imrie M. Schaar | - | 27 12 0 | 09 Nov 2022
Seeing the Unseen: Errors and Bias in Visual Datasets | Hongrui Jin | - | 29 0 0 | 03 Nov 2022
Unsupervised visualization of image datasets using contrastive learning | Jan Boehm Philipp Berens D. Kobak | SSL | 26 15 0 | 18 Oct 2022
CROWDLAB: Supervised learning to infer consensus labels and quality scores for data with multiple annotators | Hui Wen Goh Ulyana Tkachenko Jonas W. Mueller | - | 19 10 0 | 13 Oct 2022
Detecting Label Errors in Token Classification Data | Wei-Chen Wang Jonas W. Mueller | - | 27 13 0 | 08 Oct 2022
Annealing Optimization for Progressive Learning with Stochastic Approximation | Christos N. Mavridis John S. Baras | - | 28 10 0 | 06 Sep 2022
Efficient Methods for Natural Language Processing: A Survey | Marcos Vinícius Treviso Ji-Ung Lee Tianchu Ji Betty van Aken Qingqing Cao ... Emma Strubell Niranjan Balasubramanian Leon Derczynski Iryna Gurevych Roy Schwartz | - | 33 109 0 | 31 Aug 2022
Bugs in the Data: How ImageNet Misrepresents Biodiversity | A. Luccioni David Rolnick | - | 21 43 0 | 24 Aug 2022
The Bearable Lightness of Big Data: Towards Massive Public Datasets in Scientific Machine Learning | Wai Tong Chung Kihoon Jung Jacqueline H. Chen M. Ihme | AI4CE | 24 3 0 | 25 Jul 2022
POP: Mining POtential Performance of new fashion products via webly cross-modal query expansion | Christian Joppi Geri Skenderi Marco Cristani | - | 18 3 0 | 22 Jul 2022