2310.07931
D2 Pruning: Message Passing for Balancing Diversity and Difficulty in Data Pruning
11 October 2023
A. Maharana
Prateek Yadav
Mohit Bansal
Papers citing
"D2 Pruning: Message Passing for Balancing Diversity and Difficulty in Data Pruning"
26 / 26 papers shown
Title
When Dynamic Data Selection Meets Data Augmentation
Steve Yang
Peng Ye
Furao Shen
Dongzhan Zhou
26
0
0
02 May 2025
TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models
Jaewoo Lee
Keyang Xuan
Chanakya Ekbote
Sandeep Polisetty
Yi Ren Fung
Paul Pu Liang
VLM
37
0
0
14 Apr 2025
Squeeze Out Tokens from Sample for Finer-Grained Data Governance
Weixiong Lin
Chen Ju
Haicheng Wang
Shengchao Hu
Shuai Xiao
...
Yuheng Jiao
Mingshuai Yao
Jinsong Lan
Qingwen Liu
Ying Chen
50
0
0
18 Mar 2025
MUSS: Multilevel Subset Selection for Relevance and Diversity
Vu Nguyen
Andrey Kan
53
0
0
14 Mar 2025
Finding the Muses: Identifying Coresets through Loss Trajectories
M. Nagaraj
Deepak Ravikumar
Efstathia Soufleri
Kaushik Roy
41
0
0
12 Mar 2025
Model-agnostic Coreset Selection via LLM-based Concept Bottlenecks
Akshay Mehra
Trisha Mittal
Subhadra Gopalakrishnan
Joshua Kimball
45
0
0
23 Feb 2025
Annotation Efficiency: Identifying Hard Samples via Blocked Sparse Linear Bandits
Adit Jain
Soumyabrata Pal
Sunav Choudhary
Ramasuri Narayanam
Vikram Krishnamurthy
21
1
0
26 Oct 2024
A CLIP-Powered Framework for Robust and Generalizable Data Selection
Steve Yang
Peng Ye
Wanli Ouyang
Dongzhan Zhou
Furao Shen
29
1
0
15 Oct 2024
Adapt-∞: Scalable Continual Multimodal Instruction Tuning via Dynamic Data Selection
A. Maharana
Jaehong Yoon
Tianlong Chen
Joey Tianyi Zhou
34
1
0
14 Oct 2024
Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content
Qiuheng Wang
Yukai Shi
Jiarong Ou
R. J. Chen
Ke Lin
...
Mingwu Zheng
Xin Tao
Fei Yang
Pengfei Wan
Di Zhang
VGen
88
18
0
10 Oct 2024
Structural-Entropy-Based Sample Selection for Efficient and Effective Learning
Tianchi Xie
Jiangning Zhu
Guozu Ma
Minzhi Lin
Wei Chen
Weikai Yang
Shixia Liu
28
0
0
03 Oct 2024
Deep Model Interpretation with Limited Data: A Coreset-based Approach
Hamed Behzadi-Khormouji
José Oramas
SLR
25
0
0
01 Oct 2024
Diversity-Driven Synthesis: Enhancing Dataset Distillation through Directed Weight Adjustment
Jiawei Du
Xin Zhang
Juncheng Hu
Wenxin Huang
Joey Tianyi Zhou
DD
33
6
0
26 Sep 2024
Not All Samples Should Be Utilized Equally: Towards Understanding and Improving Dataset Distillation
Shaobo Wang
Yantai Yang
Qilong Wang
Kaixin Li
Linfeng Zhang
Junchi Yan
DD
51
4
0
22 Aug 2024
P3: A Policy-Driven, Pace-Adaptive, and Diversity-Promoted Framework for Optimizing LLM Training
Yingxuan Yang
Huayi Wang
Muning Wen
Weinan Zhang
44
0
0
10 Aug 2024
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
Zhen Qin
Daoyuan Chen
Wenhao Zhang
Liuyi Yao
Yilun Huang
Bolin Ding
Yaliang Li
Shuiguang Deng
57
5
0
11 Jul 2024
Concept-skill Transferability-based Data Selection for Large Vision-Language Models
Jaewoo Lee
Boyang Li
Sung Ju Hwang
VLM
43
8
0
16 Jun 2024
CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning
Yiping Wang
Yifang Chen
Wendan Yan
Alex Fang
Wenjing Zhou
Kevin G. Jamieson
S. Du
36
7
0
29 May 2024
FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication
Eric Slyman
Stefan Lee
Scott D. Cohen
Kushal Kafle
VLM
41
5
0
24 Apr 2024
LongWanjuan: Towards Systematic Measurement for Long Text Quality
Kai Lv
Xiaoran Liu
Qipeng Guo
Hang Yan
Conghui He
Xipeng Qiu
Dahua Lin
33
4
0
21 Feb 2024
Your Vision-Language Model Itself Is a Strong Filter: Towards High-Quality Instruction Tuning with Data Selection
Ruibo Chen
Yihan Wu
Lichang Chen
Guodong Liu
Qi He
Tianyi Xiong
Chenxi Liu
Junfeng Guo
Heng-Chiao Huang
VLM
23
17
0
19 Feb 2024
Variance Alignment Score: A Simple But Tough-to-Beat Data Selection Method for Multimodal Contrastive Learning
Yiping Wang
Yifang Chen
Wendan Yan
Kevin G. Jamieson
S. Du
28
5
0
03 Feb 2024
Data Diversity Matters for Robust Instruction Tuning
Alexander Bukharin
Tuo Zhao
77
35
0
21 Nov 2023
Dataset Pruning: Reducing Training Data by Examining Generalization Influence
Shuo Yang
Zeke Xie
Hanyu Peng
Minjing Xu
Mingming Sun
P. Li
DD
152
106
0
19 May 2022
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
253
1,989
0
31 Dec 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
243
4,469
0
23 Jan 2020