1906.11829
Cited By
Selection via Proxy: Efficient Data Selection for Deep Learning
26 June 2019
Cody Coleman
Christopher Yeh
Stephen Mussmann
Baharan Mirzasoleiman
Peter Bailis
Percy Liang
J. Leskovec
Matei A. Zaharia
Papers citing
"Selection via Proxy: Efficient Data Selection for Deep Learning"
46 / 46 papers shown
Title
Diversity-Oriented Data Augmentation with Large Language Models
Zaitian Wang
Jinghan Zhang
Xinhao Zhang
Kunpeng Liu
Pengfei Wang
Yuanchun Zhou
80
1
0
17 Feb 2025
Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-Training of Deep Networks
S. Joshi
Jiayi Ni
Baharan Mirzasoleiman
DD
72
2
0
03 Oct 2024
Structural-Entropy-Based Sample Selection for Efficient and Effective Learning
Tianchi Xie
Jiangning Zhu
Guozu Ma
Minzhi Lin
Wei Chen
Weikai Yang
Shixia Liu
30
0
0
03 Oct 2024
CADC: Encoding User-Item Interactions for Compressing Recommendation Model Training Data
Hossein Entezari Zarch
Abdulla Alshabanah
Chaoyi Jiang
Murali Annavaram
25
1
0
11 Jul 2024
Diversified Batch Selection for Training Acceleration
Feng Hong
Yueming Lyu
Jiangchao Yao
Ya Zhang
Ivor W. Tsang
Yanfeng Wang
42
4
0
07 Jun 2024
SAVA: Scalable Learning-Agnostic Data Valuation
Samuel Kessler
Tam Le
Vu Nguyen
TDI
61
0
0
03 Jun 2024
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
Zachary Ankner
Cody Blakeney
Kartik K. Sreenivasan
Max Marion
Matthew L. Leavitt
Mansheej Paul
43
24
0
30 May 2024
AnchorAL: Computationally Efficient Active Learning for Large and Imbalanced Datasets
Pietro Lesci
Andreas Vlachos
35
2
0
08 Apr 2024
An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models
Gantavya Bhatt
Yifang Chen
Arnav M. Das
Jifan Zhang
Sang T. Truong
...
Jeff Bilmes
S. Du
Kevin G. Jamieson
Jordan T. Ash
Robert D. Nowak
42
14
0
12 Jan 2024
Generative Deduplication For Social Media Data Selection
Xianming Li
Jing Li
29
2
0
11 Jan 2024
Effective pruning of web-scale datasets based on complexity of concept clusters
Amro Abbas
E. Rusak
Kushal Tirumala
Wieland Brendel
Kamalika Chaudhuri
Ari S. Morcos
VLM
CLIP
34
22
0
09 Jan 2024
M3D: Dataset Condensation by Minimizing Maximum Mean Discrepancy
Hansong Zhang
Shikun Li
Pengju Wang
Dan Zeng
Shiming Ge
DD
19
21
0
26 Dec 2023
Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
Talfan Evans
Shreya Pathak
Hamza Merzic
Jonathan Schwarz
Ryutaro Tanno
Olivier J. Hénaff
18
16
0
08 Dec 2023
REDUCR: Robust Data Downsampling Using Class Priority Reweighting
William Bankes
George Hughes
Ilija Bogunovic
Zi Wang
34
3
0
01 Dec 2023
Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning
Xin Zhang
Jiawei Du
Yunsong Li
Weiying Xie
Qiufeng Wang
37
7
0
22 Nov 2023
Soft Random Sampling: A Theoretical and Empirical Analysis
Xiaodong Cui
Ashish R. Mittal
Songtao Lu
Wei Zhang
G. Saon
Brian Kingsbury
48
1
0
21 Nov 2023
Fair Wasserstein Coresets
Zikai Xiong
Niccolò Dalmasso
Shubham Sharma
Freddy Lecue
Daniele Magazzeni
Vamsi K. Potluru
T. Balch
Manuela Veloso
34
2
0
09 Nov 2023
Farzi Data: Autoregressive Data Distillation
Noveen Sachdeva
Zexue He
Wang-Cheng Kang
Jianmo Ni
D. Cheng
Julian McAuley
DD
23
3
0
15 Oct 2023
D2 Pruning: Message Passing for Balancing Diversity and Difficulty in Data Pruning
A. Maharana
Prateek Yadav
Mohit Bansal
27
28
0
11 Oct 2023
FTFT: Efficient and Robust Fine-Tuning by Transferring Training Dynamics
Yupei Du
Albert Gatt
Dong Nguyen
31
1
0
10 Oct 2023
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour
Oscar Key
Piotr Nawrot
Pasquale Minervini
Matt J. Kusner
22
41
0
12 Jul 2023
LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning
Jifan Zhang
Yifang Chen
Gregory H. Canal
Stephen Mussmann
Arnav M. Das
...
Yinglun Zhu
Jeffrey Bilmes
S. Du
Kevin G. Jamieson
Robert D. Nowak
VLM
33
10
0
16 Jun 2023
Training-Free Neural Active Learning with Initialization-Robustness Guarantees
Apivich Hemachandra
Zhongxiang Dai
Jasraj Singh
See-Kiong Ng
K. H. Low
AAML
36
6
0
07 Jun 2023
NLU on Data Diets: Dynamic Data Subset Selection for NLP Classification Tasks
Jean-Michel Attendu
Jean-Philippe Corbeil
33
15
0
05 Jun 2023
Selective Pre-training for Private Fine-tuning
Da Yu
Sivakanth Gopi
Janardhan Kulkarni
Zinan Lin
Saurabh Naik
Tomasz Religa
Jian Yin
Huishuai Zhang
38
19
0
23 May 2023
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
Sang Michael Xie
Hieu H. Pham
Xuanyi Dong
Nan Du
Hanxiao Liu
Yifeng Lu
Percy Liang
Quoc V. Le
Tengyu Ma
Adams Wei Yu
MoMe
MoE
56
177
0
17 May 2023
Data Efficient Contrastive Learning in Histopathology using Active Sampling
Tahsin Reasat
David S. Smith
MedIm
23
0
0
28 Mar 2023
Provable Data Subset Selection For Efficient Neural Network Training
M. Tukan
Samson Zhou
Alaa Maalouf
Daniela Rus
Vladimir Braverman
Dan Feldman
MLT
25
9
0
09 Mar 2023
Less is More: Data Pruning for Faster Adversarial Training
Yize Li
Pu Zhao
X. Lin
B. Kailkhura
Ryan Goldh
AAML
15
9
0
23 Feb 2023
Gaussian Switch Sampling: A Second Order Approach to Active Learning
Ryan Benkert
Mohit Prabhushankar
Ghassan Al-Regib
Armin Pacharmi
E. Corona
AAML
23
9
0
16 Feb 2023
A Comprehensive Survey of Dataset Distillation
Shiye Lei
Dacheng Tao
DD
31
87
0
13 Jan 2023
Data Distillation: A Survey
Noveen Sachdeva
Julian McAuley
DD
45
73
0
11 Jan 2023
Coresets for Relational Data and The Applications
Jiaxiang Chen
Qingyuan Yang
Ru Huang
Hu Ding
21
4
0
09 Oct 2022
Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging
Jean Kaddour
MoMe
3DH
24
39
0
29 Sep 2022
DC-BENCH: Dataset Condensation Benchmark
Justin Cui
Ruochen Wang
Si Si
Cho-Jui Hsieh
DD
40
72
0
20 Jul 2022
Queried Unlabeled Data Improves and Robustifies Class-Incremental Learning
Tianlong Chen
Sijia Liu
Shiyu Chang
Lisa Amini
Zhangyang Wang
CLL
26
4
0
15 Jun 2022
Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt
Sören Mindermann
J. Brauner
Muhammed Razzak
Mrinank Sharma
Andreas Kirsch
...
Benedikt Höltgen
Aidan Gomez
Adrien Morisot
Sebastian Farquhar
Y. Gal
60
148
0
14 Jun 2022
ORCA: Interpreting Prompted Language Models via Locating Supporting Data Evidence in the Ocean of Pretraining Data
Xiaochuang Han
Yulia Tsvetkov
24
27
0
25 May 2022
Datamodels: Predicting Predictions from Training Data
Andrew Ilyas
Sung Min Park
Logan Engstrom
Guillaume Leclerc
A. Madry
TDI
47
131
0
01 Feb 2022
Optimizing Active Learning for Low Annotation Budgets
Umang Aggarwal
Adrian Daniel Popescu
Céline Hudelot
35
1
0
18 Jan 2022
A Novel Sequential Coreset Method for Gradient Descent Algorithms
Jiawei Huang
Ru Huang
Wenjie Liu
N. Freris
Huihua Ding
29
16
0
05 Dec 2021
Data Summarization via Bilevel Optimization
Zalan Borsos
Mojmír Mutný
Marco Tagliasacchi
Andreas Krause
30
8
0
26 Sep 2021
Is Simple Uniform Sampling Effective for Center-Based Clustering with Outliers: When and Why?
Jiawei Huang
Wenjie Liu
Hu Ding
25
1
0
28 Feb 2021
GRAD-MATCH: Gradient Matching based Data Subset Selection for Efficient Deep Model Training
Krishnateja Killamsetty
D. Sivasubramanian
Ganesh Ramakrishnan
A. De
Rishabh K. Iyer
OOD
91
188
0
27 Feb 2021
Coresets via Bilevel Optimization for Continual Learning and Streaming
Zalan Borsos
Mojmír Mutný
Andreas Krause
CLL
38
226
0
06 Jun 2020
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
Y. Gal
Zoubin Ghahramani
UQCV
BDL
285
9,138
0
06 Jun 2015