Understanding AdamW through Proximal Methods and Scale-Freeness

31 January 2022

Papers citing "Understanding AdamW through Proximal Methods and Scale-Freeness"

Title
Symmetry in Neural Network Parameter Spaces Bo Zhao Robin Walters Rose Yu 18 0 0 16 Jun 2025
FPDANet: A Multi-Section Classification Model for Intelligent Screening of Fetal Ultrasound Minglang Chen Jie He Caixu Xu Bocheng Liang Shengli Li Guannan He Xiongjie Tao MedIm 45 0 0 06 Jun 2025
Why Gradients Rapidly Increase Near the End of Training Aaron Defazio 23 0 0 02 Jun 2025
BrAIcht, a theatrical agent that speaks like Bertolt Brecht's characters Baz Roland Kristina Malyseva Anna Pappa Tristan Cazenave 112 0 0 29 Apr 2025
Understanding Gradient Orthogonalization for Deep Learning via Non-Euclidean Trust-Region Optimization Dmitry Kovalev 132 5 0 16 Mar 2025
Generative Adversarial Networks for High-Dimensional Item Factor Analysis: A Deep Adversarial Learning Algorithm Nanyu Luo Feng Ji DRL 93 0 0 15 Feb 2025
Harnessing Loss Decomposition for Long-Horizon Wave Predictions via Deep Neural Networks Indu Kant Deo R. Jaiman 101 1 0 04 Dec 2024
An Attribute-Enriched Dataset and Auto-Annotated Pipeline for Open Detection Pengfei Qi Yifei Zhang Wenqiang Li Youwen Hu Kunlong Bai ObjD 78 0 0 10 Sep 2024
FoldGPT: Simple and Effective Large Language Model Compression Scheme Songwei Liu Chao Zeng Lianqiang Li Chenqian Yan Lean Fu Xing Mei Fangmin Chen 86 5 0 01 Jul 2024
Large Batch Analysis for Adagrad Under Anisotropic Smoothness Yuxing Liu Boyao Wang Tong Zhang 57 6 0 21 Jun 2024
Prototypical Reward Network for Data-Efficient RLHF Jinghan Zhang Xiting Wang Yiqiao Jin Changyu Chen Xinhao Zhang Kunpeng Liu ALM 88 22 0 06 Jun 2024
ExplainableDetector: Exploring Transformer-based Language Modeling Approach for SMS Spam Detection with Explainability Analysis Mohammad Amaz Uddin Muhammad Nazrul Islam Leandros A. Maglaras Helge Janicke Iqbal H. Sarker 70 3 0 12 May 2024
Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization Shuo Xie Zhiyuan Li OffRL 78 23 0 05 Apr 2024
EEGDiR: Electroencephalogram denoising network for temporal information storage and global modeling through Retentive Network Bin Wang Fei Deng Peifan Jiang 59 9 0 20 Mar 2024
TAPTR: Tracking Any Point with Transformers as Detection Hongyang Li Hao Zhang Shilong Liu Zhaoyang Zeng Tianhe Ren Feng Li Lei Zhang 86 20 0 19 Mar 2024
An Explainable Transformer-based Model for Phishing Email Detection: A Large Language Model Approach Mohammad Amaz Uddin Iqbal H. Sarker 75 18 0 21 Feb 2024
GA-SmaAt-GNet: Generative Adversarial Small Attention GNet for Extreme Precipitation Nowcasting Eloy Reulen S. Mehrkanoon 74 4 0 18 Jan 2024
SANIA: Polyak-type Optimization Framework Leads to Scale Invariant Stochastic Algorithms Farshed Abdukhakimov Chulu Xiang Dmitry Kamzolov Robert Mansel Gower Martin Takáč 82 2 0 28 Dec 2023
An Improved Transformer-based Model for Detecting Phishing, Spam, and Ham: A Large Language Model Approach Suhaima Jamal H. Wimmer 90 21 0 01 Nov 2023
Adam-family Methods with Decoupled Weight Decay in Deep Learning Kuang-Yu Ding Nachuan Xiao Kim-Chuan Toh 64 3 0 13 Oct 2023
Transformer-based classification of user queries for medical consultancy with respect to expert specialization Dmitry Lyutkin A. Soloviev Dmitry V. Zhukov Denis Pozdnyakov Muhammad Shahid Iqbal Malik D. Ignatov MedIm 64 0 0 26 Sep 2023
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale Hao-Jun Michael Shi Tsung-Hsien Lee Shintaro Iwasaki Jose Gallego-Posada Zhijing Li Kaushik Rangadurai Dheevatsa Mudigere Michael Rabbat ODL 91 27 0 12 Sep 2023
On the Uses of Large Language Models to Interpret Ambiguous Cyberattack Descriptions Reza Fayyazi S. Yang 75 15 0 24 Jun 2023
Regex-augmented Domain Transfer Topic Classification based on a Pre-trained Language Model: An application in Financial Domain Vanessa Liao Syed Shariyar Murtaza Yifan Nie Jimmy J. Lin 50 0 0 23 May 2023
MoMo: Momentum Models for Adaptive Learning Rates Fabian Schaipp Ruben Ohana Michael Eickenberg Aaron Defazio Robert Mansel Gower 74 13 0 12 May 2023
A Stochastic Proximal Polyak Step Size Fabian Schaipp Robert Mansel Gower M. Ulbrich 64 12 0 12 Jan 2023
Robustness to Unbounded Smoothness of Generalized SignSGD M. Crawshaw Mingrui Liu Francesco Orabona Wei Zhang Zhenxun Zhuang AAML 110 74 0 23 Aug 2022
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models Xingyu Xie Pan Zhou Huan Li Zhouchen Lin Shuicheng Yan ODL 94 169 0 13 Aug 2022