2107.00595
Fast Margin Maximization via Dual Acceleration
1 July 2021
Ziwei Ji
Nathan Srebro
Matus Telgarsky
Papers citing "Fast Margin Maximization via Dual Acceleration"
26 / 26 papers shown
Title
How Transformers Learn Regular Language Recognition: A Theoretical Study on Training Dynamics and Implicit Bias
Ruiquan Huang
Yingbin Liang
Jing Yang
55
0
0
02 May 2025
Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes
Ruiqi Zhang
Jingfeng Wu
Licong Lin
Peter L. Bartlett
33
0
0
05 Apr 2025
The Implicit Bias of Gradient Descent on Separable Multiclass Data
Hrithik Ravi
Clayton Scott
Daniel Soudry
Yutong Wang
45
2
0
02 Nov 2024
Non-asymptotic Convergence of Training Transformers for Next-token Prediction
Ruiquan Huang
Yingbin Liang
Jing Yang
37
5
0
25 Sep 2024
Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization
Yuhang Cai
Jingfeng Wu
Song Mei
Michael Lindsey
Peter L. Bartlett
34
2
0
12 Jun 2024
Improving Generalization and Convergence by Enhancing Implicit Regularization
Mingze Wang
Haotian He
Jinbo Wang
Zilin Wang
Guanhua Huang
Zhiyu Li
Weinan E
Lei Wu
45
6
0
31 May 2024
Implicit Regularization of Gradient Flow on One-Layer Softmax Attention
Heejune Sheen
Siyu Chen
Tianhao Wang
Harrison H. Zhou
MLT
41
10
0
13 Mar 2024
Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva
Puneesh Deora
Christos Thrampoulidis
37
13
0
08 Feb 2024
Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling
Mingze Wang
Zeping Min
Lei Wu
33
3
0
24 Nov 2023
A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time
Yeqi Gao
Zhao Song
Weixin Wang
Junze Yin
22
25
0
14 Sep 2023
Transformers as Support Vector Machines
Davoud Ataee Tarzanagh
Yingcong Li
Christos Thrampoulidis
Samet Oymak
48
43
0
31 Aug 2023
A Unified Approach to Controlling Implicit Regularization via Mirror Descent
Haoyuan Sun
Khashayar Gatmiry
Kwangjun Ahn
Navid Azizan
AI4CE
21
11
0
24 Jun 2023
Max-Margin Token Selection in Attention Mechanism
Davoud Ataee Tarzanagh
Yingcong Li
Xuechen Zhang
Samet Oymak
40
38
0
23 Jun 2023
Faster Margin Maximization Rates for Generic and Adversarially Robust Optimization Methods
Guanghui Wang
Zihao Hu
Claudio Gentile
Vidya Muthukumar
Jacob D. Abernethy
35
0
0
27 May 2023
Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability
Jingfeng Wu
Vladimir Braverman
Jason D. Lee
32
17
0
19 May 2023
Benign Overfitting in Linear Classifiers and Leaky ReLU Networks from KKT Conditions for Margin Maximization
Spencer Frei
Gal Vardi
Peter L. Bartlett
Nathan Srebro
30
22
0
02 Mar 2023
Iterative regularization in classification via hinge loss diagonal descent
Vassilis Apidopoulos
T. Poggio
Lorenzo Rosasco
S. Villa
29
2
0
24 Dec 2022
On Accelerated Perceptrons and Beyond
Guanghui Wang
Rafael Hanashiro
E. Guha
Jacob D. Abernethy
23
7
0
17 Oct 2022
On Generalization of Decentralized Learning with Separable Data
Hossein Taheri
Christos Thrampoulidis
FedML
42
11
0
15 Sep 2022
On the Implicit Bias in Deep-Learning Algorithms
Gal Vardi
FedML
AI4CE
34
72
0
26 Aug 2022
Kernel Memory Networks: A Unifying Framework for Memory Modeling
Georgios Iatropoulos
Johanni Brea
W. Gerstner
18
9
0
19 Aug 2022
Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent
Zhiyuan Li
Tianhao Wang
Jason D. Lee
Sanjeev Arora
42
27
0
08 Jul 2022
Mirror Descent Maximizes Generalized Margin and Can Be Implemented Efficiently
Haoyuan Sun
Kwangjun Ahn
Christos Thrampoulidis
Navid Azizan
OOD
11
21
0
25 May 2022
On the Optimization of Margin Distribution
Meng-Zhang Qian
Zheng Ai
Teng Zhang
Wei Gao
13
1
0
29 Apr 2022
Does Momentum Change the Implicit Regularization on Separable Data?
Bohan Wang
Qi Meng
Huishuai Zhang
Ruoyu Sun
Wei Chen
Zhirui Ma
Tie-Yan Liu
47
15
0
08 Oct 2021
Properties of the After Kernel
Philip M. Long
24
29
0
21 May 2021