ResearchTrend.AI
Understanding AdamW through Proximal Methods and Scale-Freeness


31 January 2022
Zhenxun Zhuang
Mingrui Liu
Ashok Cutkosky
Francesco Orabona

Papers citing "Understanding AdamW through Proximal Methods and Scale-Freeness"

28 / 28 papers shown
Symmetry in Neural Network Parameter Spaces
Bo Zhao
Robin Walters
Rose Yu
18
0
0
16 Jun 2025
FPDANet: A Multi-Section Classification Model for Intelligent Screening of Fetal Ultrasound
Minglang Chen
Jie He
Caixu Xu
Bocheng Liang
Shengli Li
Guannan He
Xiongjie Tao
MedIm
45
0
0
06 Jun 2025
Why Gradients Rapidly Increase Near the End of Training
Aaron Defazio
23
0
0
02 Jun 2025
BrAIcht, a theatrical agent that speaks like Bertolt Brecht's characters
Baz Roland
Kristina Malyseva
Anna Pappa
Tristan Cazenave
112
0
0
29 Apr 2025
Understanding Gradient Orthogonalization for Deep Learning via Non-Euclidean Trust-Region Optimization
Dmitry Kovalev
132
5
0
16 Mar 2025
Generative Adversarial Networks for High-Dimensional Item Factor Analysis: A Deep Adversarial Learning Algorithm
Nanyu Luo
Feng Ji
DRL
93
0
0
15 Feb 2025
Harnessing Loss Decomposition for Long-Horizon Wave Predictions via Deep Neural Networks
Indu Kant Deo
R. Jaiman
101
1
0
04 Dec 2024
An Attribute-Enriched Dataset and Auto-Annotated Pipeline for Open Detection
Pengfei Qi
Yifei Zhang
Wenqiang Li
Youwen Hu
Kunlong Bai
ObjD
78
0
0
10 Sep 2024
FoldGPT: Simple and Effective Large Language Model Compression Scheme
Songwei Liu
Chao Zeng
Lianqiang Li
Chenqian Yan
Lean Fu
Xing Mei
Fangmin Chen
86
5
0
01 Jul 2024
Large Batch Analysis for Adagrad Under Anisotropic Smoothness
Yuxing Liu
Boyao Wang
Tong Zhang
57
6
0
21 Jun 2024
Prototypical Reward Network for Data-Efficient RLHF
Jinghan Zhang
Xiting Wang
Yiqiao Jin
Changyu Chen
Xinhao Zhang
Kunpeng Liu
ALM
88
22
0
06 Jun 2024
ExplainableDetector: Exploring Transformer-based Language Modeling Approach for SMS Spam Detection with Explainability Analysis
Mohammad Amaz Uddin
Muhammad Nazrul Islam
Leandros A. Maglaras
Helge Janicke
Iqbal H. Sarker
70
3
0
12 May 2024
Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization
Shuo Xie
Zhiyuan Li
OffRL
78
23
0
05 Apr 2024
EEGDiR: Electroencephalogram denoising network for temporal information storage and global modeling through Retentive Network
Bin Wang
Fei Deng
Peifan Jiang
59
9
0
20 Mar 2024
TAPTR: Tracking Any Point with Transformers as Detection
Hongyang Li
Hao Zhang
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Lei Zhang
86
20
0
19 Mar 2024
An Explainable Transformer-based Model for Phishing Email Detection: A Large Language Model Approach
Mohammad Amaz Uddin
Iqbal H. Sarker
75
18
0
21 Feb 2024
GA-SmaAt-GNet: Generative Adversarial Small Attention GNet for Extreme Precipitation Nowcasting
Eloy Reulen
S. Mehrkanoon
74
4
0
18 Jan 2024
SANIA: Polyak-type Optimization Framework Leads to Scale Invariant Stochastic Algorithms
Farshed Abdukhakimov
Chulu Xiang
Dmitry Kamzolov
Robert Mansel Gower
Martin Takáč
82
2
0
28 Dec 2023
An Improved Transformer-based Model for Detecting Phishing, Spam, and Ham: A Large Language Model Approach
Suhaima Jamal
H. Wimmer
90
21
0
01 Nov 2023
Adam-family Methods with Decoupled Weight Decay in Deep Learning
Kuang-Yu Ding
Nachuan Xiao
Kim-Chuan Toh
64
3
0
13 Oct 2023
Transformer-based classification of user queries for medical consultancy with respect to expert specialization
Dmitry Lyutkin
A. Soloviev
Dmitry V. Zhukov
Denis Pozdnyakov
Muhammad Shahid Iqbal Malik
D. Ignatov
MedIm
64
0
0
26 Sep 2023
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Hao-Jun Michael Shi
Tsung-Hsien Lee
Shintaro Iwasaki
Jose Gallego-Posada
Zhijing Li
Kaushik Rangadurai
Dheevatsa Mudigere
Michael Rabbat
ODL
91
27
0
12 Sep 2023
On the Uses of Large Language Models to Interpret Ambiguous Cyberattack Descriptions
Reza Fayyazi
S. Yang
75
15
0
24 Jun 2023
Regex-augmented Domain Transfer Topic Classification based on a Pre-trained Language Model: An application in Financial Domain
Vanessa Liao
Syed Shariyar Murtaza
Yifan Nie
Jimmy J. Lin
50
0
0
23 May 2023
MoMo: Momentum Models for Adaptive Learning Rates
Fabian Schaipp
Ruben Ohana
Michael Eickenberg
Aaron Defazio
Robert Mansel Gower
74
13
0
12 May 2023
A Stochastic Proximal Polyak Step Size
Fabian Schaipp
Robert Mansel Gower
M. Ulbrich
64
12
0
12 Jan 2023
Robustness to Unbounded Smoothness of Generalized SignSGD
M. Crawshaw
Mingrui Liu
Francesco Orabona
Wei Zhang
Zhenxun Zhuang
AAML
110
74
0
23 Aug 2022
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Xingyu Xie
Pan Zhou
Huan Li
Zhouchen Lin
Shuicheng Yan
ODL
94
169
0
13 Aug 2022