The Ladder: A Reliable Leaderboard for Machine Learning Competitions

16 February 2015
Avrim Blum
Moritz Hardt
arXiv: 1502.04585

Papers citing "The Ladder: A Reliable Leaderboard for Machine Learning Competitions"

28 / 28 papers shown
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
Xiaosen Zheng
Tianyu Pang
Chao Du
Qian Liu
Jing Jiang
Min-Bin Lin
47
8
0
09 Oct 2024
Computational modeling of semantic change
Nina Tahmasebi
Haim Dubossarsky
34
6
0
13 Apr 2023
Attention is Not Always What You Need: Towards Efficient Classification of Domain-Specific Text
Yasmen Wahba
N. Madhavji
John Steinbacher
33
0
0
31 Mar 2023
Making Progress Based on False Discoveries
Roi Livni
38
0
0
19 Apr 2022
Sequential algorithmic modification with test data reuse
Jean Feng
Gene Pennello
N. Petrick
B. Sahiner
Romain Pirracchio
Alexej Gossmann
19
4
0
21 Mar 2022
An Uncommon Task: Participatory Design in Legal AI
Fernando A. Delgado
Solon Barocas
K. Levy
15
34
0
08 Mar 2022
The Benchmark Lottery
Mostafa Dehghani
Yi Tay
A. Gritsenko
Zhe Zhao
N. Houlsby
Fernando Diaz
Donald Metzler
Oriol Vinyals
42
89
0
14 Jul 2021
How Robust are Model Rankings: A Leaderboard Customization Approach for Equitable Evaluation
Swaroop Mishra
Anjana Arunkumar
31
24
0
10 Jun 2021
RATT: Leveraging Unlabeled Data to Guarantee Generalization
Saurabh Garg
Sivaraman Balakrishnan
J. Zico Kolter
Zachary Chase Lipton
30
30
0
01 May 2021
A Data Quality-Driven View of MLOps
Cédric Renggli
Luka Rimanic
Nezihe Merve Gürel
Bojan Karlaš
Wentao Wu
Ce Zhang
AI4TS
22
65
0
15 Feb 2021
Utility is in the Eye of the User: A Critique of NLP Leaderboards
Kawin Ethayarajh
Dan Jurafsky
ELM
24
51
0
29 Sep 2020
Approval policies for modifications to Machine Learning-Based Software as a Medical Device: A study of bio-creep
Jean Feng
S. Emerson
N. Simon
13
20
0
28 Dec 2019
A Rademacher Complexity Based Method for Controlling Power and Confidence Level in Adaptive Statistical Analysis
L. Stefani
E. Upfal
19
8
0
04 Oct 2019
Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions
Matthew Faw
Rajat Sen
Karthikeyan Shanmugam
C. Caramanis
Sanjay Shakkottai
33
3
0
23 Jul 2019
Model Similarity Mitigates Test Set Overuse
Horia Mania
John Miller
Ludwig Schmidt
Moritz Hardt
Benjamin Recht
20
50
0
29 May 2019
The advantages of multiple classes for reducing overfitting from test set reuse
Vitaly Feldman
Roy Frostig
Moritz Hardt
27
29
0
24 May 2019
Continuous Integration of Machine Learning Models with ease.ml/ci: Towards a Rigorous Yet Practical Treatment
Cédric Renggli
Bojan Karlaš
Bolin Ding
Feng Liu
Kevin Schawinski
Wentao Wu
Ce Zhang
VLM
9
47
0
01 Mar 2019
Do ImageNet Classifiers Generalize to ImageNet?
Benjamin Recht
Rebecca Roelofs
Ludwig Schmidt
Vaishaal Shankar
OOD
SSeg
VLM
40
1,650
0
13 Feb 2019
How to Host a Data Competition: Statistical Advice for Design and Analysis of a Data Competition
C. Anderson‐Cook
Kary L. Myers
Lu Lu
M. Fugate
K. Quinlan
N. Pawley
14
11
0
16 Jan 2019
Asynchronous Online Testing of Multiple Hypotheses
Tijana Zrnic
Aaditya Ramdas
Michael I. Jordan
14
30
0
12 Dec 2018
Do CIFAR-10 Classifiers Generalize to CIFAR-10?
Benjamin Recht
Rebecca Roelofs
Ludwig Schmidt
Vaishaal Shankar
OOD
FedML
ELM
19
405
0
01 Jun 2018
The Everlasting Database: Statistical Validity at a Fair Price
Blake E. Woodworth
Vitaly Feldman
Saharon Rosset
Nathan Srebro
32
2
0
12 Mar 2018
Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the LUNA16 challenge
A. Setio
A. Traverso
Thomas de Bel
Moira S. N. Berens
C. V. D. Bogaard
...
Jef Vandemeulebroucke
N. Walasek
G. Zuidhof
Bram van Ginneken
Colin Jacobs
40
1,057
0
23 Dec 2016
Neural Network Matrix Factorization
Gintare Karolina Dziugaite
Daniel M. Roy
14
175
0
19 Nov 2015
How much does your data exploration overfit? Controlling bias via information usage
D. Russo
James Zou
14
185
0
16 Nov 2015
Generalization in Adaptive Data Analysis and Holdout Reuse
Cynthia Dwork
Vitaly Feldman
Moritz Hardt
T. Pitassi
Omer Reingold
Aaron Roth
16
228
0
08 Jun 2015
Learning with Differential Privacy: Stability, Learnability and the Sufficiency and Necessity of ERM Principle
Yu-Xiang Wang
Jing Lei
S. Fienberg
33
103
0
23 Feb 2015
Preserving Statistical Validity in Adaptive Data Analysis
Cynthia Dwork
Vitaly Feldman
Moritz Hardt
T. Pitassi
Omer Reingold
Aaron Roth
34
375
0
10 Nov 2014