Characterizing Implicit Bias in Terms of Optimization Geometry

22 February 2018

Papers citing "Characterizing Implicit Bias in Terms of Optimization Geometry"

50 / 290 papers shown

Title
Mechanics of Next Token Prediction with Self-Attention Yingcong Li Yixiao Huang M. E. Ildiz A. S. Rawat Samet Oymak 66 31 0 12 Mar 2024
Last Iterate Convergence of Incremental Methods and Applications in Continual Learning Xu Cai Jelena Diakonikolas 86 6 0 11 Mar 2024
Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks Hristo Papazov Scott Pesme Nicolas Flammarion 79 7 0 08 Mar 2024
Failures and Successes of Cross-Validation for Early-Stopped Gradient Descent Pratik Patil Yuchen Wu Robert Tibshirani 129 5 0 26 Feb 2024
One-Bit Quantization and Sparsification for Multiclass Linear Classification via Regularized Regression Reza Ghane D. Akhtiamov Babak Hassibi 51 1 0 16 Feb 2024
Which Frequencies do CNNs Need? Emergent Bottleneck Structure in Feature Learning Yuxiao Wen Arthur Jacot 141 7 0 12 Feb 2024
A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models Namjoon Suh Guang Cheng MedIm 109 14 0 14 Jan 2024
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking Kaifeng Lyu Jikai Jin Zhiyuan Li Simon S. Du Jason D. Lee Wei Hu AI4CE 92 38 0 30 Nov 2023
Critical Influence of Overparameterization on Sharpness-aware Minimization Sungbin Shin Dongyeop Lee Maksym Andriushchenko Namhoon Lee AAML 162 2 0 29 Nov 2023
Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling Mingze Wang Zeping Min Lei Wu 86 3 0 24 Nov 2023
Acceleration and Implicit Regularization in Gaussian Phase Retrieval Tyler Maunu M. Molina-Fructuoso 82 0 0 21 Nov 2023
A Challenge in Reweighting Data with Bilevel Optimization Anastasia Ivanova Pierre Ablin 115 1 0 26 Oct 2023
A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time Yeqi Gao Zhao Song Weixin Wang Junze Yin 114 29 0 14 Sep 2023
Stochastic Gradient Descent outperforms Gradient Descent in recovering a high-dimensional signal in a glassy energy landscape Persia Jana Kamali Pierfrancesco Urbani 75 6 0 09 Sep 2023
On the Implicit Bias of Adam M. D. Cattaneo Jason M. Klusowski Boris Shigida 82 18 0 31 Aug 2023
Transformers as Support Vector Machines Davoud Ataee Tarzanagh Yingcong Li Christos Thrampoulidis Samet Oymak 133 49 0 31 Aug 2023
Six Lectures on Linearized Neural Networks Theodor Misiakiewicz Andrea Montanari 137 13 0 25 Aug 2023
The Effect of SGD Batch Size on Autoencoder Learning: Sparsity, Sharpness, and Feature Learning Nikhil Ghosh Spencer Frei Wooseok Ha Ting Yu MLT 61 3 0 06 Aug 2023
Learning fixed points of recurrent neural networks by reparameterizing the network model Vicky Zhu Robert Rosenbaum 59 2 0 13 Jul 2023
Abide by the Law and Follow the Flow: Conservation Laws for Gradient Flows Sibylle Marcotte Rémi Gribonval Gabriel Peyré 117 19 0 30 Jun 2023
The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks Mor Shpigel Nacson Rotem Mulayoff Greg Ongie T. Michaeli Daniel Soudry 84 13 0 30 Jun 2023
A Unified Approach to Controlling Implicit Regularization via Mirror Descent Haoyuan Sun Khashayar Gatmiry Kwangjun Ahn Navid Azizan AI4CE 74 13 0 24 Jun 2023
Max-Margin Token Selection in Attention Mechanism Davoud Ataee Tarzanagh Yingcong Li Xuechen Zhang Samet Oymak 111 45 0 23 Jun 2023
Precise Asymptotic Generalization for Multiclass Classification with Overparameterized Linear Models David X. Wu A. Sahai 105 3 0 23 Jun 2023
The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks Yuan Cao Difan Zou Yuan-Fang Li Quanquan Gu MLT 102 5 0 20 Jun 2023
InRank: Incremental Low-Rank Learning Jiawei Zhao Yifei Zhang Beidi Chen F. Schafer Anima Anandkumar 71 7 0 20 Jun 2023
Unraveling Projection Heads in Contrastive Learning: Insights from Expansion and Shrinkage Yu Gui Cong Ma Yiqiao Zhong 65 8 0 06 Jun 2023
Synaptic Weight Distributions Depend on the Geometry of Plasticity Roman Pogodin Jonathan H. Cornford Arna Ghosh Gauthier Gidel Guillaume Lajoie Blake A. Richards 63 5 0 30 May 2023
Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff Arthur Jacot MLT 120 14 0 30 May 2023
Faster Margin Maximization Rates for Generic and Adversarially Robust Optimization Methods Guanghui Wang Zihao Hu Claudio Gentile Vidya Muthukumar Jacob D. Abernethy 101 0 0 27 May 2023
Representation Transfer Learning via Multiple Pre-trained models for Linear Regression Navjot Singh Suhas Diggavi 90 1 0 25 May 2023
Implicit bias of SGD in $L_{2}$-regularized linear DNNs: One-way jumps from high to low rank Zihan Wang Arthur Jacot 93 21 0 25 May 2023
Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability Jingfeng Wu Vladimir Braverman Jason D. Lee 65 21 0 19 May 2023
Exploring the Complexity of Deep Neural Networks through Functional Equivalence Guohao Shen 103 4 0 19 May 2023
Deep ReLU Networks Have Surprisingly Simple Polytopes Fenglei Fan Wei Huang Xiang-yu Zhong Lecheng Ruan T. Zeng Huan Xiong Fei Wang 112 5 0 16 May 2023
Robust Implicit Regularization via Weight Normalization H. Chou Holger Rauhut Rachel A. Ward 88 7 0 09 May 2023
General Loss Functions Lead to (Approximate) Interpolation in High Dimensions Kuo-Wei Lai Vidya Muthukumar 63 5 0 13 Mar 2023
On the Implicit Bias of Linear Equivariant Steerable Networks Ziyu Chen Wei-wei Zhu 89 3 0 07 Mar 2023
Benign Overfitting in Linear Classifiers and Leaky ReLU Networks from KKT Conditions for Margin Maximization Spencer Frei Gal Vardi Peter L. Bartlett Nathan Srebro 86 23 0 02 Mar 2023
High-dimensional analysis of double descent for linear regression with random projections Francis R. Bach 93 36 0 02 Mar 2023
On the Training Instability of Shuffling SGD with Batch Normalization David Wu Chulhee Yun S. Sra 83 5 0 24 Feb 2023
mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization Kayhan Behdin Qingquan Song Aman Gupta S. Keerthi Ayan Acharya Borja Ocejo Gregory Dexter Rajiv Khanna D. Durfee Rahul Mazumder AAML 68 7 0 19 Feb 2023
The Generalization Error of Stochastic Mirror Descent on Over-Parametrized Linear Models D. Akhtiamov B. Hassibi 47 0 0 18 Feb 2023
(S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability Mathieu Even Scott Pesme Suriya Gunasekar Nicolas Flammarion 83 18 0 17 Feb 2023
Theory on Forgetting and Generalization of Continual Learning Sen Lin Peizhong Ju Yitao Liang Ness B. Shroff CLL 88 47 0 12 Feb 2023
Sketched Ridgeless Linear Regression: The Role of Downsampling Xin Chen Yicheng Zeng Siyue Yang Qiang Sun 26 7 0 02 Feb 2023
Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression Mo Zhou Rong Ge 121 2 0 01 Feb 2023
Generalization on the Unseen, Logic Reasoning and Degree Curriculum Emmanuel Abbe Samy Bengio Aryo Lotfi Kevin Rizk LRM 97 55 0 30 Jan 2023
Implicit Regularization for Group Sparsity Jiangyuan Li Thanh Van Nguyen Chinmay Hegde Raymond K. W. Wong 96 9 0 29 Jan 2023
Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing Jikai Jin Zhiyuan Li Kaifeng Lyu S. Du Jason D. Lee MLT 113 37 0 27 Jan 2023