Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck

7 September 2023
Benjamin L. Edelman
Surbhi Goel
Sham Kakade
Eran Malach
Cyril Zhang

Papers citing "Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck"

22 / 22 papers shown
A Tale of Two Circuits: Grokking as Competition of Sparse and Dense Subnetworks
William Merrill
Nikolaos Tsilivis
Aman Shukla
50
52
0
21 Mar 2023
Learning Single-Index Models with Shallow Neural Networks
A. Bietti
Joan Bruna
Clayton Sanford
M. Song
184
71
0
27 Oct 2022
Omnigrok: Grokking Beyond Algorithmic Data
Ziming Liu
Eric J. Michaud
Max Tegmark
85
82
0
03 Oct 2022
Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit
Boaz Barak
Benjamin L. Edelman
Surbhi Goel
Sham Kakade
Eran Malach
Cyril Zhang
93
132
0
18 Jul 2022
High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation
Jimmy Ba
Murat A. Erdogdu
Taiji Suzuki
Zhichao Wang
Denny Wu
Greg Yang
MLT
85
127
0
03 May 2022
Training Compute-Optimal Large Language Models
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
...
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
AI4TS
197
1,946
0
29 Mar 2022
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Greg Yang
J. E. Hu
Igor Babuschkin
Szymon Sidor
Xiaodong Liu
David Farhi
Nick Ryder
J. Pachocki
Weizhu Chen
Jianfeng Gao
85
162
0
07 Mar 2022
Random Feature Amplification: Feature Learning and Generalization in Neural Networks
Spencer Frei
Niladri S. Chatterji
Peter L. Bartlett
MLT
60
29
0
15 Feb 2022
Deep Neural Networks and Tabular Data: A Survey
V. Borisov
Tobias Leemann
Kathrin Seßler
Johannes Haug
Martin Pawelczyk
Gjergji Kasneci
LMTD
107
685
0
05 Oct 2021
Revisiting Deep Learning Models for Tabular Data
Yu. V. Gorishniy
Ivan Rubachev
Valentin Khrulkov
Artem Babenko
LMTD
112
749
0
22 Jun 2021
MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin
N. Houlsby
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
...
Andreas Steiner
Daniel Keysers
Jakob Uszkoreit
Mario Lucic
Alexey Dosovitskiy
415
2,673
0
04 May 2021
Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels
Eran Malach
Pritish Kamath
Emmanuel Abbe
Nathan Srebro
54
39
0
01 Mar 2021
Explaining Neural Scaling Laws
Yasaman Bahri
Ethan Dyer
Jared Kaplan
Jaehoon Lee
Utkarsh Sharma
62
261
0
12 Feb 2021
Learning Curve Theory
Marcus Hutter
200
62
0
08 Feb 2021
Learning Parities with Neural Networks
Amit Daniely
Eran Malach
58
78
0
18 Feb 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
602
4,801
0
23 Jan 2020
Theoretical Limitations of Self-Attention in Neural Sequence Models
Michael Hahn
68
271
0
16 Jun 2019
Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel
Colin Wei
Jason D. Lee
Qiang Liu
Tengyu Ma
193
244
0
12 Oct 2018
Gradient Descent Provably Optimizes Over-parameterized Neural Networks
S. Du
Xiyu Zhai
Barnabás Póczós
Aarti Singh
MLT
ODL
214
1,272
0
04 Oct 2018
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
Jonathan Frankle
Michael Carbin
228
3,463
0
09 Mar 2018
Failures of Gradient-Based Deep Learning
Shai Shalev-Shwartz
Ohad Shamir
Shaked Shammah
ODL
UQCV
88
201
0
23 Mar 2017
Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes
Ohad Shamir
Tong Zhang
148
574
0
08 Dec 2012