What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec, Felix Dangel, Sidak Pal Singh
14 October 2024
arXiv: 2410.10986
Papers citing "What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis" (4 papers)
Towards Quantifying the Hessian Structure of Neural Networks
Zhaorui Dong, Yushun Zhang, Zhi-Quan Luo, Jianfeng Yao, Ruoyu Sun
05 May 2025
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
Zhijian Zhuo, Yutao Zeng, Ya Wang, Sijun Zhang, Jian Yang, Xiaoqing Li, Xun Zhou, Jinwen Ma
06 Mar 2025
The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training
Jinbo Wang, Mingze Wang, Zhanpeng Zhou, Junchi Yan, Weinan E, Lei Wu
26 Feb 2025
Understanding Why Adam Outperforms SGD: Gradient Heterogeneity in Transformers
Akiyoshi Tomihari, Issei Sato
ODL
31 Jan 2025