ResearchTrend.AI
Selection via Proxy: Efficient Data Selection for Deep Learning


26 June 2019
Cody Coleman
Christopher Yeh
Stephen Mussmann
Baharan Mirzasoleiman
Peter Bailis
Percy Liang
J. Leskovec
Matei A. Zaharia

Papers citing "Selection via Proxy: Efficient Data Selection for Deep Learning"

50 / 51 papers shown
Diversity-Oriented Data Augmentation with Large Language Models
Zaitian Wang
Jinghan Zhang
Xinhao Zhang
Kunpeng Liu
Pengfei Wang
Yuanchun Zhou
80
1
0
17 Feb 2025
MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation
S. Joshi
Besmira Nushi
Vidhisha Balachandran
Varun Chandrasekaran
Vibhav Vineet
Neel Joshi
Baharan Mirzasoleiman
MLLM
VLM
49
0
0
07 Jan 2025
Structural-Entropy-Based Sample Selection for Efficient and Effective Learning
Tianchi Xie
Jiangning Zhu
Guozu Ma
Minzhi Lin
Wei Chen
Weikai Yang
Shixia Liu
30
0
0
03 Oct 2024
Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-Training of Deep Networks
S. Joshi
Jiayi Ni
Baharan Mirzasoleiman
DD
72
2
0
03 Oct 2024
CADC: Encoding User-Item Interactions for Compressing Recommendation Model Training Data
Hossein Entezari Zarch
Abdulla Alshabanah
Chaoyi Jiang
Murali Annavaram
25
1
0
11 Jul 2024
Diversified Batch Selection for Training Acceleration
Feng Hong
Yueming Lyu
Jiangchao Yao
Ya Zhang
Ivor W. Tsang
Yanfeng Wang
42
4
0
07 Jun 2024
SAVA: Scalable Learning-Agnostic Data Valuation
Samuel Kessler
Tam Le
Vu Nguyen
TDI
61
0
0
03 Jun 2024
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
Zachary Ankner
Cody Blakeney
Kartik K. Sreenivasan
Max Marion
Matthew L. Leavitt
Mansheej Paul
43
24
0
30 May 2024
AnchorAL: Computationally Efficient Active Learning for Large and Imbalanced Datasets
Pietro Lesci
Andreas Vlachos
35
2
0
08 Apr 2024
An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models
Gantavya Bhatt
Yifang Chen
Arnav M. Das
Jifan Zhang
Sang T. Truong
...
Jeff Bilmes
S. Du
Kevin G. Jamieson
Jordan T. Ash
Robert D. Nowak
42
14
0
12 Jan 2024
Generative Deduplication For Socia Media Data Selection
Xianming Li
Jing Li
29
2
0
11 Jan 2024
Effective pruning of web-scale datasets based on complexity of concept clusters
Amro Abbas
E. Rusak
Kushal Tirumala
Wieland Brendel
Kamalika Chaudhuri
Ari S. Morcos
VLM
CLIP
34
22
0
09 Jan 2024
M3D: Dataset Condensation by Minimizing Maximum Mean Discrepancy
Hansong Zhang
Shikun Li
Pengju Wang
Dan Zeng
Shiming Ge
DD
19
22
0
26 Dec 2023
Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
Talfan Evans
Shreya Pathak
Hamza Merzic
Jonathan Schwarz
Ryutaro Tanno
Olivier J. Hénaff
18
16
0
08 Dec 2023
REDUCR: Robust Data Downsampling Using Class Priority Reweighting
William Bankes
George Hughes
Ilija Bogunovic
Zi Wang
34
3
0
01 Dec 2023
Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning
Xin Zhang
Jiawei Du
Yunsong Li
Weiying Xie
Qiufeng Wang
37
7
0
22 Nov 2023
Soft Random Sampling: A Theoretical and Empirical Analysis
Xiaodong Cui
Ashish R. Mittal
Songtao Lu
Wei Zhang
G. Saon
Brian Kingsbury
48
1
0
21 Nov 2023
Fair Wasserstein Coresets
Zikai Xiong
Niccolò Dalmasso
Shubham Sharma
Freddy Lecue
Daniele Magazzeni
Vamsi K. Potluru
T. Balch
Manuela Veloso
34
2
0
09 Nov 2023
Farzi Data: Autoregressive Data Distillation
Noveen Sachdeva
Zexue He
Wang-Cheng Kang
Jianmo Ni
D. Cheng
Julian McAuley
DD
23
3
0
15 Oct 2023
D2 Pruning: Message Passing for Balancing Diversity and Difficulty in Data Pruning
A. Maharana
Prateek Yadav
Mohit Bansal
27
28
0
11 Oct 2023
FTFT: Efficient and Robust Fine-Tuning by Transferring Training Dynamics
Yupei Du
Albert Gatt
Dong Nguyen
31
1
0
10 Oct 2023
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Jean Kaddour
Oscar Key
Piotr Nawrot
Pasquale Minervini
Matt J. Kusner
22
41
0
12 Jul 2023
LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning
Jifan Zhang
Yifang Chen
Gregory H. Canal
Stephen Mussmann
Arnav M. Das
...
Yinglun Zhu
Jeffrey Bilmes
S. Du
Kevin G. Jamieson
Robert D. Nowak
VLM
33
10
0
16 Jun 2023
Training-Free Neural Active Learning with Initialization-Robustness Guarantees
Apivich Hemachandra
Zhongxiang Dai
Jasraj Singh
See-Kiong Ng
K. H. Low
AAML
36
6
0
07 Jun 2023
NLU on Data Diets: Dynamic Data Subset Selection for NLP Classification Tasks
Jean-Michel Attendu
Jean-Philippe Corbeil
35
15
0
05 Jun 2023
Selective Pre-training for Private Fine-tuning
Da Yu
Sivakanth Gopi
Janardhan Kulkarni
Zinan Lin
Saurabh Naik
Tomasz Religa
Jian Yin
Huishuai Zhang
38
19
0
23 May 2023
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
Sang Michael Xie
Hieu H. Pham
Xuanyi Dong
Nan Du
Hanxiao Liu
Yifeng Lu
Percy Liang
Quoc V. Le
Tengyu Ma
Adams Wei Yu
MoMe
MoE
56
178
0
17 May 2023
Data Efficient Contrastive Learning in Histopathology using Active Sampling
Tahsin Reasat
David S. Smith
MedIm
23
0
0
28 Mar 2023
Provable Data Subset Selection For Efficient Neural Network Training
M. Tukan
Samson Zhou
Alaa Maalouf
Daniela Rus
Vladimir Braverman
Dan Feldman
MLT
25
9
0
09 Mar 2023
Finding Support Examples for In-Context Learning
Xiaonan Li
Xipeng Qiu
24
88
0
27 Feb 2023
Less is More: Data Pruning for Faster Adversarial Training
Yize Li
Pu Zhao
X. Lin
B. Kailkhura
Ryan Goldh
AAML
15
9
0
23 Feb 2023
Gaussian Switch Sampling: A Second Order Approach to Active Learning
Ryan Benkert
Mohit Prabhushankar
Ghassan Al-Regib
Armin Pacharmi
E. Corona
AAML
26
9
0
16 Feb 2023
A Comprehensive Survey of Dataset Distillation
Shiye Lei
Dacheng Tao
DD
31
88
0
13 Jan 2023
Data Distillation: A Survey
Noveen Sachdeva
Julian McAuley
DD
45
73
0
11 Jan 2023
Partitioned Gradient Matching-based Data Subset Selection for Compute-Efficient Robust ASR Training
Ashish R. Mittal
D. Sivasubramanian
Rishabh K. Iyer
P. Jyothi
Ganesh Ramakrishnan
19
3
0
30 Oct 2022
Coresets for Relational Data and The Applications
Jiaxiang Chen
Qingyuan Yang
Ru Huang
Hu Ding
21
4
0
09 Oct 2022
Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging
Jean Kaddour
MoMe
3DH
24
39
0
29 Sep 2022
DC-BENCH: Dataset Condensation Benchmark
Justin Cui
Ruochen Wang
Si Si
Cho-Jui Hsieh
DD
40
73
0
20 Jul 2022
Queried Unlabeled Data Improves and Robustifies Class-Incremental Learning
Tianlong Chen
Sijia Liu
Shiyu Chang
Lisa Amini
Zhangyang Wang
CLL
26
4
0
15 Jun 2022
Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt
Sören Mindermann
J. Brauner
Muhammed Razzak
Mrinank Sharma
Andreas Kirsch
...
Benedikt Höltgen
Aidan Gomez
Adrien Morisot
Sebastian Farquhar
Y. Gal
62
149
0
14 Jun 2022
ORCA: Interpreting Prompted Language Models via Locating Supporting Data Evidence in the Ocean of Pretraining Data
Xiaochuang Han
Yulia Tsvetkov
24
27
0
25 May 2022
Datamodels: Predicting Predictions from Training Data
Andrew Ilyas
Sung Min Park
Logan Engstrom
Guillaume Leclerc
A. Madry
TDI
47
131
0
01 Feb 2022
Minority Class Oriented Active Learning for Imbalanced Datasets
Umang Aggarwal
Adrian Daniel Popescu
Céline Hudelot
34
11
0
01 Feb 2022
Optimizing Active Learning for Low Annotation Budgets
Umang Aggarwal
Adrian Daniel Popescu
Céline Hudelot
38
1
0
18 Jan 2022
On Sampling Collaborative Filtering Datasets
Noveen Sachdeva
Carole-Jean Wu
Julian McAuley
21
16
0
13 Jan 2022
A Novel Sequential Coreset Method for Gradient Descent Algorithms
Jiawei Huang
Ru Huang
Wenjie Liu
N. Freris
Huihua Ding
29
16
0
05 Dec 2021
Data Summarization via Bilevel Optimization
Zalan Borsos
Mojmír Mutný
Marco Tagliasacchi
Andreas Krause
30
8
0
26 Sep 2021
Is Simple Uniform Sampling Effective for Center-Based Clustering with Outliers: When and Why?
Jiawei Huang
Wenjie Liu
Hu Ding
25
1
0
28 Feb 2021
GRAD-MATCH: Gradient Matching based Data Subset Selection for Efficient Deep Model Training
Krishnateja Killamsetty
D. Sivasubramanian
Ganesh Ramakrishnan
A. De
Rishabh K. Iyer
OOD
94
189
0
27 Feb 2021
Coresets via Bilevel Optimization for Continual Learning and Streaming
Zalan Borsos
Mojmír Mutný
Andreas Krause
CLL
38
226
0
06 Jun 2020