2207.10062
Cited By
DataPerf: Benchmarks for Data-Centric AI Development
20 July 2022
Mark Mazumder
Colby R. Banbury
Xiaozhe Yao
Bojan Karlaš
W. G. Rojas
Sudnya Diamos
G. Diamos
Lynn He
Alicia Parrish
Hannah Rose Kirk
Jessica Quaye
Charvi Rastogi
Douwe Kiela
David Jurado
David Kanter
Rafael Mosquera
Juan Ciro
Lora Aroyo
Bilge Acun
Lingjiao Chen
Mehul Smriti Raje
Max Bartolo
Sabri Eyuboglu
Amirata Ghorbani
E. Goodman
Oana Inel
Tariq Kane
Christine R. Kirkpatrick
Tzu-Sheng Kuo
Jonas W. Mueller
Tristan Thrush
Joaquin Vanschoren
Margaret J. Warren
Adina Williams
Serena Yeung
Newsha Ardalani
Praveen K. Paritosh
Lilith Bat-Leah
Ce Zhang
James Y. Zou
Carole-Jean Wu
Cody Coleman
Andrew Y. Ng
Peter Mattson
Vijay Janapa Reddi
VLM
Papers citing
"DataPerf: Benchmarks for Data-Centric AI Development"
22 / 22 papers shown
Title
NeurIPS 2024 Ariel Data Challenge: Characterisation of Exoplanetary Atmospheres Using a Data-Centric Approach
Jeremie Blanchard
Lisa Casino
Jordan Gierschendorf
14
0
0
13 May 2025
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt
Aaron Mueller
Leshem Choshen
E. Wilcox
Chengxu Zhuang
...
Rafael Mosquera
Bhargavi Paranjape
Adina Williams
Tal Linzen
Ryan Cotterell
38
108
0
10 Apr 2025
Fast Data Aware Neural Architecture Search via Supernet Accelerated Evaluation
Emil Njor
Colby R. Banbury
Xenofon Fafoutis
72
0
0
18 Feb 2025
Continual Learning: Less Forgetting, More OOD Generalization via Adaptive Contrastive Replay
Hossein Rezaei
Mohammad Sabokrou
CLL
26
0
0
09 Oct 2024
Advancing Post-OCR Correction: A Comparative Study of Synthetic Data
Shuhao Guan
Derek Greene
34
6
0
05 Aug 2024
CHG Shapley: Efficient Data Valuation and Selection towards Trustworthy Machine Learning
Huaiguang Cai
FedML
TDI
56
1
0
17 Jun 2024
Representation Debiasing of Generated Data Involving Domain Experts
Aditya Bhattacharya
Simone Stumpf
K. Verbert
36
2
0
17 May 2024
Combining X-Vectors and Bayesian Batch Active Learning: Two-Stage Active Learning Pipeline for Speech Recognition
O. Kundacina
V. Vincan
D. Mišković
BDL
101
0
0
03 May 2024
Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation
Jessica Quaye
Alicia Parrish
Oana Inel
Charvi Rastogi
Hannah Rose Kirk
...
Nathan Clement
Rafael Mosquera
Juan Ciro
Vijay Janapa Reddi
Lora Aroyo
31
7
0
14 Feb 2024
ShadowNet for Data-Centric Quantum System Learning
Yuxuan Du
Yibo Yang
Tongliang Liu
Zhouchen Lin
Bernard Ghanem
Dacheng Tao
34
6
0
22 Aug 2023
A Bag-of-Prototypes Representation for Dataset-Level Applications
Wei-Chih Tu
Weijian Deng
Tom Gedeon
Liang Zheng
38
9
0
23 Mar 2023
Data-centric AI: Perspectives and Challenges
Daochen Zha
Zaid Pervaiz Bhat
Kwei-Herng Lai
Fan Yang
Xia Hu
19
67
0
12 Jan 2023
DMOps: Data Management Operation and Recipes
E. Choi
Chanjun Park
29
7
0
02 Jan 2023
Learning from Training Dynamics: Identifying Mislabeled Data Beyond Manually Designed Features
Qingrui Jia
Xuhong Li
Lei Yu
Jiang Bian
Penghao Zhao
Shupeng Li
Haoyi Xiong
Dejing Dou
NoLa
32
5
0
19 Dec 2022
The Grind for Good Data: Understanding ML Practitioners' Struggles and Aspirations in Making Good Data
Inha Cha
Juhyun Oh
Cheul Young Park
Jiyoon Han
Hwalsuk Lee
29
2
0
28 Nov 2022
Data Quality Over Quantity: Pitfalls and Guidelines for Process Analytics
Lim C. Siang
Shams Elnawawi
L. Rippon
Daniel L. O'Connor
R. Bhushan Gopaluni
8
2
0
11 Nov 2022
Edge Impulse: An MLOps Platform for Tiny Machine Learning
Shawn Hymel
Colby R. Banbury
Daniel Situnayake
A. Elium
Carl Ward
...
Louis Moreau
Dmitry Maslov
A. Beavis
Jan Jongboom
Vijay Janapa Reddi
VLM
LRM
40
95
0
02 Nov 2022
Red-Teaming the Stable Diffusion Safety Filter
Javier Rando
Daniel Paleka
David Lindner
Lennard Heim
Florian Tramèr
DiffM
124
183
0
03 Oct 2022
tf.data: A Machine Learning Data Processing Framework
D. Murray
Jiří Šimša
Ana Klimovic
Ihor Indyk
PINN
AI4CE
LMTD
39
87
0
28 Jan 2021
A Survey on Bias and Fairness in Machine Learning
Ninareh Mehrabi
Fred Morstatter
N. Saxena
Kristina Lerman
Aram Galstyan
SyDa
FaML
323
4,212
0
23 Aug 2019
Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets
Mor Geva
Yoav Goldberg
Jonathan Berant
242
320
0
21 Aug 2019
Hypothesis Only Baselines in Natural Language Inference
Adam Poliak
Jason Naradowsky
Aparajita Haldar
Rachel Rudinger
Benjamin Van Durme
190
576
0
02 May 2018