Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism
Shulun Wang, Bin Liu, Feng Liu
arXiv 2108.07153 · 16 August 2021
Papers citing "Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism" (7 of 7 papers shown)
1. Quantum Doubly Stochastic Transformers. Jannis Born, Filip Skogh, Kahn Rhrissorrakrai, Filippo Utro, Nico Wagner, Aleksandros Sobczyk. 22 Apr 2025.
2. Understanding Why Adam Outperforms SGD: Gradient Heterogeneity in Transformers. Akiyoshi Tomihari, Issei Sato. 31 Jan 2025.
3. Rethinking Attention: Polynomial Alternatives to Softmax in Transformers. Hemanth Saratchandran, Jianqiao Zheng, Yiping Ji, Wenbo Zhang, Simon Lucey. 24 Oct 2024.
4. Generalized Probabilistic Attention Mechanism in Transformers. DongNyeong Heo, Heeyoul Choi. 21 Oct 2024.
5. What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis. Weronika Ormaniec, Felix Dangel, Sidak Pal Singh. 14 Oct 2024.
6. Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems. David T. Hoffmann, Simon Schrodi, Jelena Bratulić, Nadine Behrmann, Volker Fischer, Thomas Brox. 19 Oct 2023.
7. Simplicial Embeddings in Self-Supervised Learning and Downstream Classification. Samuel Lavoie, Christos Tsirigotis, Max Schwarzer, Ankit Vani, Michael Noukhovitch, Kenji Kawaguchi, Aaron C. Courville. 01 Apr 2022.