Tensor Programs VI: Feature Learning in Infinite-Depth Neural Networks

3 October 2023
Greg Yang
Dingli Yu
Chen Zhu
Soufiane Hayou
    MLT

Papers citing "Tensor Programs VI: Feature Learning in Infinite-Depth Neural Networks"

26 / 26 papers shown
Deep Neural Nets as Hamiltonians
Mike Winer
Boris Hanin
151
0
0
31 Mar 2025
Global Convergence and Rich Feature Learning in L-Layer Infinite-Width Neural Networks under μP Parametrization
Zixiang Chen
Greg Yang
Qingyue Zhao
Q. Gu
MLT
55
0
0
12 Mar 2025
Function-Space Learning Rates
Edward Milsom
Ben Anson
Laurence Aitchison
67
1
0
24 Feb 2025
Feature Learning Beyond the Edge of Stability
Dávid Terjék
MLT
46
0
0
18 Feb 2025
Deep Linear Network Training Dynamics from Random Initialization: Data, Width, Depth, and Hyperparameter Transfer
Blake Bordelon
Cengiz Pehlevan
AI4CE
64
1
0
04 Feb 2025
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
Oleg Filatov
Jan Ebert
Jiangtao Wang
Stefan Kesselheim
41
3
0
10 Jan 2025
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
Pengxiang Li
Lu Yin
Shiwei Liu
70
4
0
18 Dec 2024
On the Ability of Deep Networks to Learn Symmetries from Data: A Neural Kernel Theory
Andrea Perin
Stéphane Deny
93
1
0
16 Dec 2024
How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang
Depen Morwani
Nikhil Vyas
Jingfeng Wu
Difan Zou
Udaya Ghai
Dean Phillips Foster
Sham Kakade
80
8
0
29 Oct 2024
Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models
Siqi Wang
Zhengyu Chen
Bei Li
Keqing He
Min Zhang
Jingang Wang
36
2
0
08 Oct 2024
The Optimization Landscape of SGD Across the Feature Learning Strength
Alexander B. Atanasov
Alexandru Meterez
James B. Simon
Cengiz Pehlevan
43
2
0
06 Oct 2024
Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization
Xinhao Yao
Hongjin Qian
Xiaolin Hu
Gengze Xu
Wei Liu
Jian Luan
Bin Wang
Yong-Jin Liu
48
0
0
03 Oct 2024
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler
Songlin Yang
Matthew Stallone
Mayank Mishra
Gaoyuan Zhang
Shawn Tan
Aditya Prasad
Adriana Meza Soria
David D. Cox
Yikang Shen
39
11
0
23 Aug 2024
A Mean Field Ansatz for Zero-Shot Weight Transfer
Xingyuan Chen
Wenwei Kuang
Lei Deng
Wei Han
Bo Bai
Goncalo dos Reis
39
1
0
16 Aug 2024
u-μP: The Unit-Scaled Maximal Update Parametrization
Charlie Blake
C. Eichenberg
Josef Dean
Lukas Balles
Luke Y. Prince
Bjorn Deiseroth
Andres Felipe Cruz Salinas
Carlo Luschi
Samuel Weinbach
Douglas Orr
58
9
0
24 Jul 2024
The Impact of Initialization on LoRA Finetuning Dynamics
Soufiane Hayou
Nikhil Ghosh
Bin Yu
AI4CE
36
11
0
12 Jun 2024
Compute Better Spent: Replacing Dense Layers with Structured Matrices
Shikai Qiu
Andres Potapczynski
Marc Finzi
Micah Goldblum
Andrew Gordon Wilson
40
11
0
10 Jun 2024
μLO: Compute-Efficient Meta-Generalization of Learned Optimizers
Benjamin Thérien
Charles-Étienne Joseph
Boris Knyazev
Edouard Oyallon
Irina Rish
Eugene Belilovsky
AI4CE
40
1
0
31 May 2024
Understanding and Minimising Outlier Features in Neural Network Training
Bobby He
Lorenzo Noci
Daniele Paliotta
Imanol Schlag
Thomas Hofmann
39
3
0
29 May 2024
Scalable Optimization in the Modular Norm
Tim Large
Yang Liu
Minyoung Huh
Hyojin Bahng
Phillip Isola
Jeremy Bernstein
47
11
0
23 May 2024
Deep linear networks for regression are implicitly regularized towards flat minima
Pierre Marion
Lénaic Chizat
ODL
34
5
0
22 May 2024
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Shengding Hu
Yuge Tu
Xu Han
Chaoqun He
Ganqu Cui
...
Chaochao Jia
Guoyang Zeng
Dahai Li
Zhiyuan Liu
Maosong Sun
MoE
51
283
0
09 Apr 2024
The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov
Kushal Tirumala
Hassan Shapourian
Paolo Glorioso
Daniel A. Roberts
52
81
0
26 Mar 2024
LoRA+: Efficient Low Rank Adaptation of Large Models
Soufiane Hayou
Nikhil Ghosh
Bin Yu
AI4CE
37
141
0
19 Feb 2024
Stable ResNet
Soufiane Hayou
Eugenio Clerico
Bo He
George Deligiannidis
Arnaud Doucet
Judith Rousseau
ODL
SSeg
46
51
0
24 Oct 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
245
1,826
0
17 Sep 2019