The Implicit Bias of Gradient Descent on Separable Data

27 October 2017

Papers citing "The Implicit Bias of Gradient Descent on Separable Data"

50 / 244 papers shown

Title
Embedding principle of homogeneous neural network for classification problem Jiahan Zhang Tao Luo Yaoyu Zhang 9 0 0 18 May 2025
An Analytical Characterization of Sloppiness in Neural Networks: Insights from Linear Models Jialin Mao Itay Griniasty Yan Sun Mark K. Transtrum James P. Sethna Pratik Chaudhari 29 0 0 13 May 2025
Entropic Mirror Descent for Linear Systems: Polyak's Stepsize and Implicit Bias Yura Malitsky Alexander Posch 27 0 0 05 May 2025
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias Ruiquan Huang Yingbin Liang Jing Yang 55 0 0 02 May 2025
Gradient Descent as a Shrinkage Operator for Spectral Bias Simon Lucey 38 0 0 25 Apr 2025
Weight Ensembling Improves Reasoning in Language Models Xingyu Dang Christina Baek Kaiyue Wen Zico Kolter Aditi Raghunathan MoMe LRM 65 1 0 14 Apr 2025
Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks Chenyang Zhang Peifeng Gao Difan Zou Yuan Cao OOD MLT 68 0 0 11 Apr 2025
Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes Ruiqi Zhang Jingfeng Wu Licong Lin Peter L. Bartlett 33 0 0 05 Apr 2025
Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations Yize Zhao Tina Behnia V. Vakilian Christos Thrampoulidis 68 9 0 20 Feb 2025
The late-stage training dynamics of (stochastic) subgradient descent on homogeneous neural networks Sholom Schechtman Nicolas Schreuder 212 0 0 08 Feb 2025
Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture Sajad Movahedi Antonio Orvieto Seyed-Mohsen Moosavi-Dezfooli AI4CE AAML 198 0 0 15 Oct 2024
Variational Search Distributions Daniel M. Steinberg Rafael Oliveira Cheng Soon Ong Edwin V. Bonilla 33 0 0 10 Sep 2024
Input Space Mode Connectivity in Deep Neural Networks Jakub Vrabel Ori Shem-Ur Yaron Oz David Krueger 58 1 0 09 Sep 2024
Bias of Stochastic Gradient Descent or the Architecture: Disentangling the Effects of Overparameterization of Neural Networks Amit Peleg Matthias Hein 39 0 0 04 Jul 2024
Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning Subhojyoti Mukherjee Josiah P. Hanna Qiaomin Xie Robert Nowak 84 2 0 07 Jun 2024
When does compositional structure yield compositional generalization? A kernel theory Samuel Lippl Kim Stachenfeld NAI CoGe 73 6 0 26 May 2024
Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization Shuo Xie Zhiyuan Li OffRL 50 13 0 05 Apr 2024
Information-Theoretic Generalization Bounds for Deep Neural Networks Haiyun He Christina Lee Yu 38 5 0 04 Apr 2024
Understanding the Double Descent Phenomenon in Deep Learning Marc Lafon Alexandre Thomas 25 2 0 15 Mar 2024
Neural Redshift: Random Networks are not Random Functions Damien Teney A. Nicolicioiu Valentin Hartmann Ehsan Abbasnejad 103 19 0 04 Mar 2024
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models Frederik Kunstner Robin Yadav Alan Milligan Mark Schmidt Alberto Bietti 39 25 0 29 Feb 2024
Supervised Contrastive Representation Learning: Landscape Analysis with Unconstrained Features Tina Behnia Christos Thrampoulidis SSL 39 0 0 29 Feb 2024
For Better or For Worse? Learning Minimum Variance Features With Label Augmentation Muthuraman Chidambaram Rong Ge AAML 35 0 0 10 Feb 2024
Implicit Bias and Fast Convergence Rates for Self-attention Bhavya Vasudeva Puneesh Deora Christos Thrampoulidis 37 15 0 08 Feb 2024
Fast and Exact Enumeration of Deep Networks Partitions Regions Randall Balestriero Yann LeCun 18 5 0 20 Jan 2024
An extended asymmetric sigmoid with Perceptron (SIGTRON) for imbalanced linear classification Hyenkyun Woo 20 0 0 26 Dec 2023
Towards Demystifying the Generalization Behaviors When Neural Collapse Emerges Peifeng Gao Qianqian Xu Yibo Yang Peisong Wen Huiyang Shao Zhiyong Yang Guohao Li Qingming Huang AAML 31 3 0 12 Oct 2023
Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods A. Ma Yangchen Pan Amir-massoud Farahmand AAML 25 5 0 13 Aug 2023
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization Kaiyue Wen Zhiyuan Li Tengyu Ma FAtt 38 26 0 20 Jul 2023
Deconstructing Data Reconstruction: Multiclass, Weight Decay and General Losses G. Buzaglo Niv Haim Gilad Yehudai Gal Vardi Yakir Oz Yaniv Nikankin Michal Irani 34 10 0 04 Jul 2023
Precise Asymptotic Generalization for Multiclass Classification with Overparameterized Linear Models David X. Wu A. Sahai 29 2 0 23 Jun 2023
The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks Yuan Cao Difan Zou Yuan-Fang Li Quanquan Gu MLT 37 5 0 20 Jun 2023
Unraveling Projection Heads in Contrastive Learning: Insights from Expansion and Shrinkage Yu Gui Cong Ma Yiqiao Zhong 25 7 0 06 Jun 2023
Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff Arthur Jacot MLT 26 13 0 30 May 2023
Estimating class separability of text embeddings with persistent homology Kostis Gourgoulias Najah F. Ghalyan Maxime Labonne Yash Satsangi Sean J. Moran Joseph Sabelja 40 0 0 24 May 2023
Fast Convergence in Learning Two-Layer Neural Networks with Separable Data Hossein Taheri Christos Thrampoulidis MLT 16 3 0 22 May 2023
Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability Jingfeng Wu Vladimir Braverman Jason D. Lee 32 17 0 19 May 2023
Robust Implicit Regularization via Weight Normalization H. Chou Holger Rauhut Rachel A. Ward 38 7 0 09 May 2023
Do deep neural networks have an inbuilt Occam's razor? Chris Mingard Henry Rees Guillermo Valle Pérez A. Louis UQCV BDL 21 16 0 13 Apr 2023
Saddle-to-Saddle Dynamics in Diagonal Linear Networks Scott Pesme Nicolas Flammarion 33 35 0 02 Apr 2023
On the Stepwise Nature of Self-Supervised Learning James B. Simon Maksis Knutins Liu Ziyin Daniel Geisz Abraham J. Fetterman Joshua Albrecht SSL 37 30 0 27 Mar 2023
General Loss Functions Lead to (Approximate) Interpolation in High Dimensions Kuo-Wei Lai Vidya Muthukumar 31 5 0 13 Mar 2023
Penalising the biases in norm regularisation enforces sparsity Etienne Boursier Nicolas Flammarion 40 14 0 02 Mar 2023
Implicit regularization in Heavy-ball momentum accelerated stochastic gradient descent Avrajit Ghosh He Lyu Xitong Zhang Rongrong Wang 53 21 0 02 Feb 2023
Generalization on the Unseen, Logic Reasoning and Degree Curriculum Emmanuel Abbe Samy Bengio Aryo Lotfi Kevin Rizk LRM 41 49 0 30 Jan 2023
Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing Jikai Jin Zhiyuan Li Kaifeng Lyu S. Du Jason D. Lee MLT 54 34 0 27 Jan 2023
A Stability Analysis of Fine-Tuning a Pre-Trained Model Z. Fu Anthony Man-Cho So Nigel Collier 23 3 0 24 Jan 2023
Convergence beyond the over-parameterized regime using Rayleigh quotients David A. R. Robin Kevin Scaman Marc Lelarge 27 3 0 19 Jan 2023
Understanding Difficulty-based Sample Weighting with a Universal Difficulty Measure Xiaoling Zhou Ou Wu Weiyao Zhu Ziyang Liang 44 2 0 12 Jan 2023
Iterative regularization in classification via hinge loss diagonal descent Vassilis Apidopoulos T. Poggio Lorenzo Rosasco S. Villa 29 2 0 24 Dec 2022