Characterizing Implicit Bias in Terms of Optimization Geometry

22 February 2018
Suriya Gunasekar
Jason D. Lee
Daniel Soudry
Nathan Srebro
    AI4CE

Papers citing "Characterizing Implicit Bias in Terms of Optimization Geometry"

50 / 290 papers shown
Mechanics of Next Token Prediction with Self-Attention
Yingcong Li
Yixiao Huang
M. E. Ildiz
A. S. Rawat
Samet Oymak
66
31
0
12 Mar 2024
Last Iterate Convergence of Incremental Methods and Applications in Continual Learning
Xu Cai
Jelena Diakonikolas
86
6
0
11 Mar 2024
Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks
Hristo Papazov
Scott Pesme
Nicolas Flammarion
79
7
0
08 Mar 2024
Failures and Successes of Cross-Validation for Early-Stopped Gradient Descent
Pratik Patil
Yuchen Wu
Robert Tibshirani
129
5
0
26 Feb 2024
One-Bit Quantization and Sparsification for Multiclass Linear Classification via Regularized Regression
Reza Ghane
D. Akhtiamov
Babak Hassibi
51
1
0
16 Feb 2024
Which Frequencies do CNNs Need? Emergent Bottleneck Structure in Feature Learning
Yuxiao Wen
Arthur Jacot
141
7
0
12 Feb 2024
A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models
Namjoon Suh
Guang Cheng
MedIm
109
14
0
14 Jan 2024
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
Kaifeng Lyu
Jikai Jin
Zhiyuan Li
Simon S. Du
Jason D. Lee
Wei Hu
AI4CE
92
38
0
30 Nov 2023
Critical Influence of Overparameterization on Sharpness-aware Minimization
Sungbin Shin
Dongyeop Lee
Maksym Andriushchenko
Namhoon Lee
AAML
162
2
0
29 Nov 2023
Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling
Mingze Wang
Zeping Min
Lei Wu
86
3
0
24 Nov 2023
Acceleration and Implicit Regularization in Gaussian Phase Retrieval
Tyler Maunu
M. Molina-Fructuoso
82
0
0
21 Nov 2023
A Challenge in Reweighting Data with Bilevel Optimization
Anastasia Ivanova
Pierre Ablin
115
1
0
26 Oct 2023
A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time
Yeqi Gao
Zhao Song
Weixin Wang
Junze Yin
114
29
0
14 Sep 2023
Stochastic Gradient Descent outperforms Gradient Descent in recovering a high-dimensional signal in a glassy energy landscape
Persia Jana Kamali
Pierfrancesco Urbani
75
6
0
09 Sep 2023
On the Implicit Bias of Adam
M. D. Cattaneo
Jason M. Klusowski
Boris Shigida
82
18
0
31 Aug 2023
Transformers as Support Vector Machines
Davoud Ataee Tarzanagh
Yingcong Li
Christos Thrampoulidis
Samet Oymak
133
49
0
31 Aug 2023
Six Lectures on Linearized Neural Networks
Theodor Misiakiewicz
Andrea Montanari
137
13
0
25 Aug 2023
The Effect of SGD Batch Size on Autoencoder Learning: Sparsity, Sharpness, and Feature Learning
Nikhil Ghosh
Spencer Frei
Wooseok Ha
Ting Yu
MLT
61
3
0
06 Aug 2023
Learning fixed points of recurrent neural networks by reparameterizing the network model
Vicky Zhu
Robert Rosenbaum
59
2
0
13 Jul 2023
Abide by the Law and Follow the Flow: Conservation Laws for Gradient Flows
Sibylle Marcotte
Rémi Gribonval
Gabriel Peyré
117
19
0
30 Jun 2023
The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks
Mor Shpigel Nacson
Rotem Mulayoff
Greg Ongie
T. Michaeli
Daniel Soudry
84
13
0
30 Jun 2023
A Unified Approach to Controlling Implicit Regularization via Mirror Descent
Haoyuan Sun
Khashayar Gatmiry
Kwangjun Ahn
Navid Azizan
AI4CE
74
13
0
24 Jun 2023
Max-Margin Token Selection in Attention Mechanism
Davoud Ataee Tarzanagh
Yingcong Li
Xuechen Zhang
Samet Oymak
111
45
0
23 Jun 2023
Precise Asymptotic Generalization for Multiclass Classification with Overparameterized Linear Models
David X. Wu
A. Sahai
105
3
0
23 Jun 2023
The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks
Yuan Cao
Difan Zou
Yuan-Fang Li
Quanquan Gu
MLT
102
5
0
20 Jun 2023
InRank: Incremental Low-Rank Learning
Jiawei Zhao
Yifei Zhang
Beidi Chen
F. Schafer
Anima Anandkumar
71
7
0
20 Jun 2023
Unraveling Projection Heads in Contrastive Learning: Insights from Expansion and Shrinkage
Yu Gui
Cong Ma
Yiqiao Zhong
65
8
0
06 Jun 2023
Synaptic Weight Distributions Depend on the Geometry of Plasticity
Roman Pogodin
Jonathan H. Cornford
Arna Ghosh
Gauthier Gidel
Guillaume Lajoie
Blake A. Richards
63
5
0
30 May 2023
Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff
Arthur Jacot
MLT
120
14
0
30 May 2023
Faster Margin Maximization Rates for Generic and Adversarially Robust Optimization Methods
Guanghui Wang
Zihao Hu
Claudio Gentile
Vidya Muthukumar
Jacob D. Abernethy
101
0
0
27 May 2023
Representation Transfer Learning via Multiple Pre-trained models for Linear Regression
Navjot Singh
Suhas Diggavi
90
1
0
25 May 2023
Implicit bias of SGD in $L_{2}$-regularized linear DNNs: One-way jumps from high to low rank
Zihan Wang
Arthur Jacot
93
21
0
25 May 2023
Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability
Jingfeng Wu
Vladimir Braverman
Jason D. Lee
65
21
0
19 May 2023
Exploring the Complexity of Deep Neural Networks through Functional Equivalence
Guohao Shen
103
4
0
19 May 2023
Deep ReLU Networks Have Surprisingly Simple Polytopes
Fenglei Fan
Wei Huang
Xiang-yu Zhong
Lecheng Ruan
T. Zeng
Huan Xiong
Fei Wang
112
5
0
16 May 2023
Robust Implicit Regularization via Weight Normalization
H. Chou
Holger Rauhut
Rachel A. Ward
88
7
0
09 May 2023
General Loss Functions Lead to (Approximate) Interpolation in High Dimensions
Kuo-Wei Lai
Vidya Muthukumar
63
5
0
13 Mar 2023
On the Implicit Bias of Linear Equivariant Steerable Networks
Ziyu Chen
Wei-wei Zhu
89
3
0
07 Mar 2023
Benign Overfitting in Linear Classifiers and Leaky ReLU Networks from KKT Conditions for Margin Maximization
Spencer Frei
Gal Vardi
Peter L. Bartlett
Nathan Srebro
86
23
0
02 Mar 2023
High-dimensional analysis of double descent for linear regression with random projections
Francis R. Bach
93
36
0
02 Mar 2023
On the Training Instability of Shuffling SGD with Batch Normalization
David Wu
Chulhee Yun
S. Sra
83
5
0
24 Feb 2023
mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization
Kayhan Behdin
Qingquan Song
Aman Gupta
S. Keerthi
Ayan Acharya
Borja Ocejo
Gregory Dexter
Rajiv Khanna
D. Durfee
Rahul Mazumder
AAML
68
7
0
19 Feb 2023
The Generalization Error of Stochastic Mirror Descent on Over-Parametrized Linear Models
D. Akhtiamov
B. Hassibi
47
0
0
18 Feb 2023
(S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability
Mathieu Even
Scott Pesme
Suriya Gunasekar
Nicolas Flammarion
83
18
0
17 Feb 2023
Theory on Forgetting and Generalization of Continual Learning
Sen Lin
Peizhong Ju
Yitao Liang
Ness B. Shroff
CLL
88
47
0
12 Feb 2023
Sketched Ridgeless Linear Regression: The Role of Downsampling
Xin Chen
Yicheng Zeng
Siyue Yang
Qiang Sun
26
7
0
02 Feb 2023
Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression
Mo Zhou
Rong Ge
121
2
0
01 Feb 2023
Generalization on the Unseen, Logic Reasoning and Degree Curriculum
Emmanuel Abbe
Samy Bengio
Aryo Lotfi
Kevin Rizk
LRM
97
55
0
30 Jan 2023
Implicit Regularization for Group Sparsity
Jiangyuan Li
Thanh Van Nguyen
Chinmay Hegde
Raymond K. W. Wong
96
9
0
29 Jan 2023
Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing
Jikai Jin
Zhiyuan Li
Kaifeng Lyu
S. Du
Jason D. Lee
MLT
113
37
0
27 Jan 2023