Beyond neural scaling laws: beating power law scaling via data pruning

29 June 2022
Ben Sorscher
Robert Geirhos
Shashank Shekhar
Surya Ganguli
Ari S. Morcos
arXiv: 2206.14486

Papers citing "Beyond neural scaling laws: beating power law scaling via data pruning"

50 / 77 papers shown
RICo: Refined In-Context Contribution for Automatic Instruction-Tuning Data Selection
Yixin Yang
Qingxiu Dong
Linli Yao
Fangwei Zhu
Zhifang Sui
48
0
0
08 May 2025
When Dynamic Data Selection Meets Data Augmentation
Steve Yang
Peng Ye
Furao Shen
Dongzhan Zhou
29
0
0
02 May 2025
Position: Enough of Scaling LLMs! Lets Focus on Downscaling
Ayan Sengupta
Yash Goel
Tanmoy Chakraborty
34
0
0
02 May 2025
A Model Zoo on Phase Transitions in Neural Networks
Konstantin Schurholt
Léo Meynent
Yefan Zhou
Haiquan Lu
Yaoqing Yang
Damian Borth
68
0
0
25 Apr 2025
AdaParse: An Adaptive Parallel PDF Parsing and Resource Scaling Engine
Carlo Siebenschuh
Kyle Hippe
Ozan Gokdemir
Alexander Brace
A. Khan
...
V. Vishwanath
R. Stevens
Arvind Ramanathan
Ian Foster
Robert Underwood
MoE
49
0
0
23 Apr 2025
Efficient Self-Supervised Learning for Earth Observation via Dynamic Dataset Curation
Thomas Kerdreux
A. Tuel
Quentin Febvre
A. Mouche
Bertrand Chapron
75
0
0
09 Apr 2025
Geometric Median Matching for Robust k-Subset Selection from Noisy Data
Anish Acharya
Sujay Sanghavi
Alexandros G. Dimakis
Inderjit S Dhillon
AAML
62
0
0
01 Apr 2025
Severing Spurious Correlations with Data Pruning
Varun Mulchandani
Jung-Eun Kim
150
0
0
24 Mar 2025
Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling
Yiwen Ding
Zhiheng Xi
Wei He
Zhuoyuan Li
Yitao Zhai
Xiaowei Shi
Xunliang Cai
Tao Gui
Qi Zhang
Xuanjing Huang
LRM
75
3
0
24 Feb 2025
ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval
Guanqi Zhan
Yuanpei Liu
Kai Han
Weidi Xie
Andrew Zisserman
VLM
171
0
0
21 Feb 2025
Physics of Skill Learning
Ziming Liu
Yizhou Liu
Eric J. Michaud
Jeff Gore
Max Tegmark
46
1
0
21 Jan 2025
Geometric Median (GM) Matching for Robust Data Pruning
Anish Acharya
Inderjit S Dhillon
Sujay Sanghavi
AAML
59
0
0
20 Jan 2025
FED: Fast and Efficient Dataset Deduplication Framework with GPU Acceleration
Youngjun Son
Chaewon Kim
Jaejin Lee
50
0
0
02 Jan 2025
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models
Yulei Qin
Yuncheng Yang
Pengcheng Guo
Gang Li
Hang Shao
Yuchen Shi
Zihan Xu
Yun Gu
Ke Li
Xing Sun
ALM
93
12
0
31 Dec 2024
Demystifying CLIP Data
Hu Xu
Saining Xie
Xiaoqing Ellen Tan
Po-Yao (Bernie) Huang
Russell Howes
Vasu Sharma
Shang-Wen Li
Gargi Ghosh
Luke Zettlemoyer
Christoph Feichtenhofer
VLM
CLIP
47
108
0
31 Dec 2024
Data Pruning Can Do More: A Comprehensive Data Pruning Approach for Object Re-identification
Zi Yang
Haojin Yang
Soumajit Majumder
Jorge M. Cardoso
Guillermo Gallego
MoMe
VLM
106
1
0
13 Dec 2024
Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding
Zilin Du
Haoxin Li
Jianfei Yu
Boyang Li
155
0
0
01 Dec 2024
DELIFT: Data Efficient Language model Instruction Fine Tuning
Ishika Agarwal
Krishnateja Killamsetty
Lucian Popa
Marina Danilevsky
ALM
VLM
58
3
0
07 Nov 2024
High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
M. E. Ildiz
Halil Alperen Gozeten
Ege Onur Taga
Marco Mondelli
Samet Oymak
56
2
0
24 Oct 2024
ZIP-FIT: Embedding-Free Data Selection via Compression-Based Alignment
Elyas Obbad
Iddah Mlauzi
Brando Miranda
Rylan Schaeffer
Kamal Obbad
Suhana Bedi
Sanmi Koyejo
CVBM
53
0
0
23 Oct 2024
Data Selection via Optimal Control for Language Models
Yuxian Gu
Li Dong
Hongning Wang
Y. Hao
Qingxiu Dong
Furu Wei
Minlie Huang
AI4CE
55
5
0
09 Oct 2024
Structural-Entropy-Based Sample Selection for Efficient and Effective Learning
Tianchi Xie
Jiangning Zhu
Guozu Ma
Minzhi Lin
Wei Chen
Weikai Yang
Shixia Liu
30
0
0
03 Oct 2024
Targeted synthetic data generation for tabular data via hardness characterization
Tommaso Ferracci
Leonie Goldmann
Anton Hinel
Francesco Sanna Passino
135
0
0
01 Oct 2024
How Feature Learning Can Improve Neural Scaling Laws
Blake Bordelon
Alexander B. Atanasov
Cengiz Pehlevan
57
12
0
26 Sep 2024
Unsupervised Domain Adaptation Via Data Pruning
Andrea Napoli
Paul White
36
1
0
18 Sep 2024
Breaking Neural Network Scaling Laws with Modularity
Akhilan Boopathy
Sunshine Jiang
William Yue
Jaedong Hwang
Abhiram Iyer
Ila Fiete
OOD
41
2
0
09 Sep 2024
Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks
Quang H. Nguyen
Nguyen Ngoc-Hieu
The-Anh Ta
Thanh Nguyen-Tang
Kok-Seng Wong
Hoang Thanh-Tung
Khoa D. Doan
AAML
33
2
0
15 Jul 2024
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
USVSN Sai Prashanth
Alvin Deng
Kyle O'Brien
Jyothir S V
Mohammad Aflah Khan
...
Jacob Ray Fuehne
Stella Biderman
Tracy Ke
Katherine Lee
Naomi Saphra
63
12
0
25 Jun 2024
Concept-skill Transferability-based Data Selection for Large Vision-Language Models
Jaewoo Lee
Boyang Li
Sung Ju Hwang
VLM
43
8
0
16 Jun 2024
SAVA: Scalable Learning-Agnostic Data Valuation
Samuel Kessler
Tam Le
Vu Nguyen
TDI
61
0
0
03 Jun 2024
Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning
E. Chimoto
Jay Gala
Orevaoghene Ahia
Julia Kreutzer
Bruce A. Bassett
Sara Hooker
VLM
42
4
0
29 May 2024
SelMatch: Effectively Scaling Up Dataset Distillation via Selection-Based Initialization and Partial Updates by Trajectory Matching
Yongmin Lee
Hye Won Chung
31
6
0
28 May 2024
Smaller Language Models are capable of selecting Instruction-Tuning Training Data for Larger Language Models
Dheeraj Mekala
Alex Nguyen
Jingbo Shang
ALM
30
18
0
16 Feb 2024
Effective pruning of web-scale datasets based on complexity of concept clusters
Amro Abbas
E. Rusak
Kushal Tirumala
Wieland Brendel
Kamalika Chaudhuri
Ari S. Morcos
VLM
CLIP
34
22
0
09 Jan 2024
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
Nikhil Sardana
Jacob P. Portes
Sasha Doubov
Jonathan Frankle
LRM
240
69
0
31 Dec 2023
M3D: Dataset Condensation by Minimizing Maximum Mean Discrepancy
Hansong Zhang
Shikun Li
Pengju Wang
Dan Zeng
Shiming Ge
DD
19
21
0
26 Dec 2023
Revisiting Knowledge Distillation under Distribution Shift
Songming Zhang
Ziyu Lyu
Xiaofeng Chen
32
1
0
25 Dec 2023
Mixed Distillation Helps Smaller Language Model Better Reasoning
Chenglin Li
Qianglong Chen
Liangyue Li
Wang Caiyu
Yicheng Li
Zhang Yin
Yin Zhang
LRM
41
11
0
17 Dec 2023
TinyGSM: achieving >80% on GSM8k with small language models
Bingbin Liu
Sébastien Bubeck
Ronen Eldan
Janardhan Kulkarni
Yuanzhi Li
Anh Nguyen
Rachel A. Ward
Yi Zhang
ALM
27
47
0
14 Dec 2023
Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
Talfan Evans
Shreya Pathak
Hamza Merzic
Jonathan Schwarz
Ryutaro Tanno
Olivier J. Hénaff
18
16
0
08 Dec 2023
REDUCR: Robust Data Downsampling Using Class Priority Reweighting
William Bankes
George Hughes
Ilija Bogunovic
Zi Wang
34
3
0
01 Dec 2023
Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning
Xin Zhang
Jiawei Du
Yunsong Li
Weiying Xie
Qiufeng Wang
37
7
0
22 Nov 2023
The Universal Statistical Structure and Scaling Laws of Chaos and Turbulence
Noam Levi
Yaron Oz
AI4CE
32
1
0
02 Nov 2023
DEFT: Data Efficient Fine-Tuning for Pre-Trained Language Models via Unsupervised Core-Set Selection
Devleena Das
Vivek Khetan
26
0
0
25 Oct 2023
Farzi Data: Autoregressive Data Distillation
Noveen Sachdeva
Zexue He
Wang-Cheng Kang
Jianmo Ni
D. Cheng
Julian McAuley
DD
23
3
0
15 Oct 2023
D2 Pruning: Message Passing for Balancing Diversity and Difficulty in Data Pruning
A. Maharana
Prateek Yadav
Mohit Bansal
27
28
0
11 Oct 2023
FTFT: Efficient and Robust Fine-Tuning by Transferring Training Dynamics
Yupei Du
Albert Gatt
Dong Nguyen
31
1
0
10 Oct 2023
Towards a statistical theory of data selection under weak supervision
Germain Kolossov
Andrea Montanari
Pulkit Tandon
21
14
0
25 Sep 2023
D3: Data Diversity Design for Systematic Generalization in Visual Question Answering
Amir Rahimi
Vanessa D’Amario
Moyuru Yamada
Kentaro Takemoto
Tomotake Sasaki
Xavier Boix
36
1
0
15 Sep 2023
Uncovering Neural Scaling Laws in Molecular Representation Learning
Dingshuo Chen
Yanqiao Zhu
Jieyu Zhang
Yuanqi Du
Zhixun Li
Qiang Liu
Shu Wu
Liang Wang
32
16
0
15 Sep 2023