Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism
Shulun Wang, Bin Liu, Feng Liu
arXiv 2108.07153 · 16 August 2021
Papers citing "Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism" (7 of 7 papers shown)
1. Quantum Doubly Stochastic Transformers. Jannis Born, Filip Skogh, Kahn Rhrissorrakrai, Filippo Utro, Nico Wagner, Aleksandros Sobczyk. 22 Apr 2025.
2. Understanding Why Adam Outperforms SGD: Gradient Heterogeneity in Transformers. Akiyoshi Tomihari, Issei Sato. 31 Jan 2025.
3. Rethinking Attention: Polynomial Alternatives to Softmax in Transformers. Hemanth Saratchandran, Jianqiao Zheng, Yiping Ji, Wenbo Zhang, Simon Lucey. 24 Oct 2024.
4. Generalized Probabilistic Attention Mechanism in Transformers. DongNyeong Heo, Heeyoul Choi. 21 Oct 2024.
5. What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis. Weronika Ormaniec, Felix Dangel, Sidak Pal Singh. 14 Oct 2024.
6. Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems. David T. Hoffmann, Simon Schrodi, Jelena Bratulić, Nadine Behrmann, Volker Fischer, Thomas Brox. 19 Oct 2023.
7. Simplicial Embeddings in Self-Supervised Learning and Downstream Classification. Samuel Lavoie, Christos Tsirigotis, Max Schwarzer, Ankit Vani, Michael Noukhovitch, Kenji Kawaguchi, Aaron C. Courville. 01 Apr 2022.