2505.22014
Cited By
Learning in Compact Spaces with Approximately Normalized Transformers
28 May 2025
Jörg Franke, Urs Spiegelhalter, Marianna Nezhurina, J. Jitsev, Frank Hutter, Michael Hefenbrock
Papers citing "Learning in Compact Spaces with Approximately Normalized Transformers"
3 / 3 papers shown
Transformers without Normalization
Jiachen Zhu, Xinlei Chen, Kaiming He, Yann LeCun, Zhuang Liu
OffRL, ViT
160, 20, 0
13 Mar 2025
The Curse of Depth in Large Language Models
Wenfang Sun, Xinyuan Song, Pengxiang Li, Lu Yin, Yefeng Zheng, Shiwei Liu
126, 7, 0
09 Feb 2025
Resolving Discrepancies in Compute-Optimal Scaling of Language Models
Tomer Porian, Mitchell Wortsman, J. Jitsev, Ludwig Schmidt, Y. Carmon
173, 26, 0
27 Jun 2024