The Ladder: A Reliable Leaderboard for Machine Learning Competitions

16 February 2015

Papers citing "The Ladder: A Reliable Leaderboard for Machine Learning Competitions"

28 / 28 papers shown

Title
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates Xiaosen Zheng Tianyu Pang Chao Du Qian Liu Jing Jiang Min-Bin Lin 47 8 0 09 Oct 2024
Computational modeling of semantic change Nina Tahmasebi Haim Dubossarsky 34 6 0 13 Apr 2023
Attention is Not Always What You Need: Towards Efficient Classification of Domain-Specific Text Yasmen Wahba N. Madhavji John Steinbacher 33 0 0 31 Mar 2023
Making Progress Based on False Discoveries Roi Livni 38 0 0 19 Apr 2022
Sequential algorithmic modification with test data reuse Jean Feng Gene Pennello N. Petrick B. Sahiner Romain Pirracchio Alexej Gossmann 19 4 0 21 Mar 2022
An Uncommon Task: Participatory Design in Legal AI Fernando A. Delgado Solon Barocas K. Levy 15 34 0 08 Mar 2022
The Benchmark Lottery Mostafa Dehghani Yi Tay A. Gritsenko Zhe Zhao N. Houlsby Fernando Diaz Donald Metzler Oriol Vinyals 42 89 0 14 Jul 2021
How Robust are Model Rankings: A Leaderboard Customization Approach for Equitable Evaluation Swaroop Mishra Anjana Arunkumar 31 24 0 10 Jun 2021
RATT: Leveraging Unlabeled Data to Guarantee Generalization Saurabh Garg Sivaraman Balakrishnan J. Zico Kolter Zachary Chase Lipton 30 30 0 01 May 2021
A Data Quality-Driven View of MLOps Cédric Renggli Luka Rimanic Nezihe Merve Gürel Bojan Karlaš Wentao Wu Ce Zhang AI4TS 22 65 0 15 Feb 2021
Utility is in the Eye of the User: A Critique of NLP Leaderboards Kawin Ethayarajh Dan Jurafsky ELM 24 51 0 29 Sep 2020
Approval policies for modifications to Machine Learning-Based Software as a Medical Device: A study of bio-creep Jean Feng S. Emerson N. Simon 13 20 0 28 Dec 2019
A Rademacher Complexity Based Method for Controlling Power and Confidence Level in Adaptive Statistical Analysis L. Stefani E. Upfal 19 8 0 04 Oct 2019
Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions Matthew Faw Rajat Sen Karthikeyan Shanmugam C. Caramanis Sanjay Shakkottai 33 3 0 23 Jul 2019
Model Similarity Mitigates Test Set Overuse Horia Mania John Miller Ludwig Schmidt Moritz Hardt Benjamin Recht 20 50 0 29 May 2019
The advantages of multiple classes for reducing overfitting from test set reuse Vitaly Feldman Roy Frostig Moritz Hardt 27 29 0 24 May 2019
Continuous Integration of Machine Learning Models with ease.ml/ci: Towards a Rigorous Yet Practical Treatment Cédric Renggli Bojan Karlas Bolin Ding Feng Liu Kevin Schawinski Wentao Wu Ce Zhang VLM 9 47 0 01 Mar 2019
Do ImageNet Classifiers Generalize to ImageNet? Benjamin Recht Rebecca Roelofs Ludwig Schmidt Vaishaal Shankar OOD SSeg VLM 40 1,650 0 13 Feb 2019
How to Host a Data Competition: Statistical Advice for Design and Analysis of a Data Competition C. Anderson‐Cook Kary L. Myers Lu Lu M. Fugate K. Quinlan N. Pawley 14 11 0 16 Jan 2019
Asynchronous Online Testing of Multiple Hypotheses Tijana Zrnic Aaditya Ramdas Michael I. Jordan 14 30 0 12 Dec 2018
Do CIFAR-10 Classifiers Generalize to CIFAR-10? Benjamin Recht Rebecca Roelofs Ludwig Schmidt Vaishaal Shankar OOD FedML ELM 19 405 0 01 Jun 2018
The Everlasting Database: Statistical Validity at a Fair Price Blake E. Woodworth Vitaly Feldman Saharon Rosset Nathan Srebro 32 2 0 12 Mar 2018
Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the LUNA16 challenge A. Setio A. Traverso Thomas de Bel Moira S. N. Berens C. V. D. Bogaard ... Jef Vandemeulebroucke N. Walasek G. Zuidhof Bram van Ginneken Colin Jacobs 40 1,057 0 23 Dec 2016
Neural Network Matrix Factorization Gintare Karolina Dziugaite Daniel M. Roy 14 175 0 19 Nov 2015
How much does your data exploration overfit? Controlling bias via information usage D. Russo James Zou 14 185 0 16 Nov 2015
Generalization in Adaptive Data Analysis and Holdout Reuse Cynthia Dwork Vitaly Feldman Moritz Hardt T. Pitassi Omer Reingold Aaron Roth 16 228 0 08 Jun 2015
Learning with Differential Privacy: Stability, Learnability and the Sufficiency and Necessity of ERM Principle Yu-Xiang Wang Jing Lei S. Fienberg 33 103 0 23 Feb 2015
Preserving Statistical Validity in Adaptive Data Analysis Cynthia Dwork Vitaly Feldman Moritz Hardt T. Pitassi Omer Reingold Aaron Roth 34 375 0 10 Nov 2014