Accounting for Variance in Machine Learning Benchmarks


1 March 2021
Xavier Bouthillier
Pierre Delaunay
Mirko Bronzi
Assya Trofimov
Brennan Nichyporuk
Justin Szeto
Naz Sepah
Edward Raff
Kanika Madan
Vikram S. Voleti
Samira Ebrahimi Kahou
Vincent Michalski
Dmitriy Serdyuk
Tal Arbel
C. Pal
Gaël Varoquaux
Pascal Vincent

Papers citing "Accounting for Variance in Machine Learning Benchmarks"

50 / 89 papers shown
Title
On the Mistaken Assumption of Interchangeable Deep Reinforcement Learning Implementations
Rajdeep Singh Hundal
Yan Xiao
Xiaochun Cao
Jin Song Dong
Manuel Rigger
48
0
0
28 Mar 2025
Crash Severity Analysis of Child Bicyclists using Arm-Net and MambaNet
Shriyank Somvanshi
Rohit Chakraborty
Subasish Das
Anandi K Dutta
40
1
0
14 Mar 2025
The Curious Case of Arbitrariness in Machine Learning
Prakhar Ganesh
Afaf Taik
G. Farnadi
59
2
0
28 Jan 2025
Pfungst and Clever Hans: Identifying the unintended cues in a widely used Alzheimer's disease MRI dataset using explainable deep learning
C. Tinauer
Maximilian Sackl
Rudolf Stollberger
Stefan Ropele
C. Langkammer
AAML
40
0
0
27 Jan 2025
Benchmark Data Repositories for Better Benchmarking
Rachel Longjohn
Markelle Kelly
Sameer Singh
Padhraic Smyth
46
0
0
31 Oct 2024
Public Domain 12M: A Highly Aesthetic Image-Text Dataset with Novel Governance Mechanisms
Jordan Meyer
Nick Padgett
Cullen Miller
Laura Exline
31
4
0
30 Oct 2024
Measuring and Controlling Solution Degeneracy across Task-Trained Recurrent Neural Networks
Ann Huang
Satpreet H. Singh
Kanaka Rajan
24
0
0
04 Oct 2024
Confidence intervals uncovered: Are we ready for real-world medical imaging AI?
Evangelia Christodoulou
Annika Reinke
Rola Houhou
P. Kalinowski
Selen Erkan
...
Paul F. Jäger
Annette Kopp-Schneider
Gaël Varoquaux
O. Colliot
Lena Maier-Hein
OOD
31
3
0
26 Sep 2024
Revisiting Static Feature-Based Android Malware Detection
Md Tanvirul Alam
Dipkamal Bhusal
Nidhi Rastogi
AAML
35
0
0
11 Sep 2024
IIFE: Interaction Information Based Automated Feature Engineering
Tom Overman
Diego Klabjan
J. Utke
24
1
0
07 Sep 2024
The Influence of Faulty Labels in Data Sets on Human Pose Estimation
Arnold Schwarz
Levente Hernadi
Felix Bießmann
Kristian Hildebrand
42
0
0
05 Sep 2024
Deep Learning for Network Anomaly Detection under Data Contamination: Evaluating Robustness and Mitigating Performance Degradation
D'Jeff K. Nkashama
Jordan Masakuna Félicien
Arian Soltani
Jean-Charles Verdier
Pierre Martin Tardif
Marc Frappier
F. Kabanza
AAML
32
1
0
11 Jul 2024
Generalizability of experimental studies
Federico Matteucci
Vadim Arzamasov
Jose Cribeiro-Ramallo
Marco Heyden
Konstantin Ntounas
Klemens Boehm
50
0
0
25 Jun 2024
CoDreamer: Communication-Based Decentralised World Models
Edan Toledo
Amanda Prorok
43
0
0
19 Jun 2024
Learning from Uncertain Data: From Possible Worlds to Possible Models
Jiongli Zhu
Su Feng
Boris Glavic
Babak Salimi
37
0
0
28 May 2024
Reshuffling Resampling Splits Can Improve Generalization of Hyperparameter Optimization
Thomas Nagler
Lennart Schneider
B. Bischl
Matthias Feurer
45
2
0
24 May 2024
Reinforcing Language Agents via Policy Optimization with Action Decomposition
Muning Wen
Bo Liu
Weinan Zhang
Jun Wang
Ying Wen
46
8
0
23 May 2024
Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models
Katherine Xu
Lingzhi Zhang
Jianbo Shi
43
12
0
23 May 2024
Position: Why We Must Rethink Empirical Research in Machine Learning
Moritz Herrmann
F. J. D. Lange
Katharina Eggensperger
Giuseppe Casalicchio
Marcel Wever
Matthias Feurer
David Rügamer
Eyke Hüllermeier
A. Boulesteix
Bernd Bischl
55
6
0
03 May 2024
Explainable concept mappings of MRI: Revealing the mechanisms underlying deep learning-based brain disease classification
C. Tinauer
A. Damulina
Maximilian Sackl
M. Soellradl
Reduan Achtibat
...
Sebastian Lapuschkin
Reinhold Schmidt
Stefan Ropele
Wojciech Samek
C. Langkammer
FAtt
35
1
0
16 Apr 2024
On Sensitivity of Learning with Limited Labelled Data to the Effects of Randomness: Impact of Interactions and Systematic Choices
Branislav Pecher
Ivan Srba
Maria Bielikova
69
3
0
20 Feb 2024
SPO: Sequential Monte Carlo Policy Optimisation
Matthew Macfarlane
Edan Toledo
Donal Byrne
Paul Duckworth
Alexandre Laterre
30
1
0
12 Feb 2024
Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement
Muning Wen
Junwei Liao
Cheng Deng
Jun Wang
Weinan Zhang
Ying Wen
28
1
0
09 Feb 2024
Calibration-then-Calculation: A Variance Reduced Metric Framework in Deep Click-Through Rate Prediction Models
Yewen Fan
Nian Si
Xiangchen Song
Kun Zhang
18
0
0
30 Jan 2024
Evaluation of pseudo-healthy image reconstruction for anomaly detection with deep generative models: Application to brain FDG PET
Ravi Hassanaly
Camille Brianceau
Maelys Solal
O. Colliot
Ninon Burgos
MedIm
38
5
0
29 Jan 2024
Faster ISNet for Background Bias Mitigation on Deep Neural Networks
P. R. Bassi
S. Decherchi
Andrea Cavalli
25
0
0
16 Jan 2024
Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits!
Tirth Patel
Fred Lu
Edward Raff
Charles K. Nicholas
Cynthia Matuszek
James Holt
35
3
0
25 Dec 2023
On The Fairness Impacts of Hardware Selection in Machine Learning
Sree Harsha Nelaturu
Nishaanth Kanna Ravichandran
Cuong Tran
Sara Hooker
Ferdinando Fioretto
53
2
0
06 Dec 2023
Elo Uncovered: Robustness and Best Practices in Language Model Evaluation
M. Boubdir
Edward Kim
Beyza Ermis
Sara Hooker
Marzieh Fadaee
ELM
27
35
0
29 Nov 2023
Reproducibility in Multiple Instance Learning: A Case For Algorithmic Unit Tests
Edward Raff
James Holt
27
3
0
27 Oct 2023
Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model
Karsten Roth
Lukas Thede
Almut Sophia Koepke
Oriol Vinyals
Olivier J. Hénaff
Zeynep Akata
AAML
30
12
0
26 Oct 2023
Variance of ML-based software fault predictors: are we really improving fault prediction?
Xhulja Shahini
Domenic Bubel
Andreas Metzger
AAML
16
1
0
26 Oct 2023
Target Variable Engineering
Jessica Clark
35
0
0
13 Oct 2023
Robust Nonparametric Hypothesis Testing to Understand Variability in Training Neural Networks
Sinjini Banerjee
Reilly Cannon
Tim Marrinan
Tony Chiang
Anand D. Sarwate
OOD
26
0
0
01 Oct 2023
REFORMS: Reporting Standards for Machine Learning Based Science
Sayash Kapoor
Emily F. Cantrell
Kenny Peng
Thanh Hien Pham
C. Bail
...
Matthew J. Salganik
Marta Serra-Garcia
Brandon M Stewart
Gilles Vandewiele
Arvind Narayanan
10
19
0
15 Aug 2023
Model Reporting for Certifiable AI: A Proposal from Merging EU Regulation into AI Development
Danilo Brajovic
Niclas Renner
Vincent Philipp Goebels
Philipp Wagner
Benjamin Frész
M. Biller
Mara Klaeb
Janika Kutz
Jens Neuhuettler
Marco F. Huber
19
9
0
21 Jul 2023
Confidence Intervals for Performance Estimates in Brain MRI Segmentation
Rosana El Jurdi
Gaël Varoquaux
O. Colliot
14
1
0
20 Jul 2023
A benchmark of categorical encoders for binary classification
Federico Matteucci
Vadim Arzamasov
Klemens Boehm
ELM
26
4
0
17 Jul 2023
GADBench: Revisiting and Benchmarking Supervised Graph Anomaly Detection
Jianheng Tang
Fengrui Hua
Zi-Chao Gao
P. Zhao
Jia Li
27
55
0
21 Jun 2023
Improving Convergence and Generalization Using Parameter Symmetries
Bo Zhao
Robert Mansel Gower
Robin Walters
Rose Yu
MoMe
33
13
0
22 May 2023
A benchmark for computational analysis of animal behavior, using animal-borne tags
Benjamin Hoffman
M. Cusimano
V. Baglione
D. Canestrari
D. Chevallier
...
O. Vainio
A. Vehkaoja
Ken Yoda
Katie Zacarian
A. Friedlaender
25
7
0
18 May 2023
Large Language Models for Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering
Noah Hollmann
Samuel G. Müller
Frank Hutter
45
53
0
05 May 2023
Time Series Classification for Detecting Parkinson's Disease from Wrist Motions
Cedric Donié
Neha Das
Satoshi Endo
Sandra Hirche
18
1
0
21 Apr 2023
The Dataset Multiplicity Problem: How Unreliable Data Impacts Predictions
Anna P. Meyer
Aws Albarghouthi
Loris D'Antoni
30
13
0
20 Apr 2023
On the Variance of Neural Network Training with respect to Test Sets and Distributions
Keller Jordan
OOD
24
11
0
04 Apr 2023
Can Fairness be Automated? Guidelines and Opportunities for Fairness-aware AutoML
Hilde J. P. Weerts
Florian Pfisterer
Matthias Feurer
Katharina Eggensperger
Eddie Bergman
Noor H. Awad
Joaquin Vanschoren
Mykola Pechenizkiy
B. Bischl
Frank Hutter
FaML
41
18
0
15 Mar 2023
Accounting for multiplicity in machine learning benchmark performance
Kajsa Møllersen
Einar J. Holsbø
9
2
0
10 Mar 2023
Towards Inferential Reproducibility of Machine Learning Research
Michael Hagmann
Philipp Meier
Stefan Riezler
29
2
0
08 Feb 2023
How to select predictive models for causal inference?
M. Doutreligne
Gaël Varoquaux
ELM
CML
29
2
0
01 Feb 2023
A Coreset Learning Reality Check
Fred Lu
Edward Raff
James Holt
24
5
0
15 Jan 2023