ResearchTrend.AI
Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism

16 August 2021
Shulun Wang
Bin Liu
Feng Liu

Papers citing "Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism"

7 / 7 papers shown
Quantum Doubly Stochastic Transformers
Jannis Born
Filip Skogh
Kahn Rhrissorrakrai
Filippo Utro
Nico Wagner
Aleksandros Sobczyk
27
0
0
22 Apr 2025
Understanding Why Adam Outperforms SGD: Gradient Heterogeneity in Transformers
Akiyoshi Tomihari
Issei Sato
ODL
61
1
0
31 Jan 2025
Rethinking Attention: Polynomial Alternatives to Softmax in Transformers
Hemanth Saratchandran
Jianqiao Zheng
Yiping Ji
Wenbo Zhang
Simon Lucey
31
4
0
24 Oct 2024
Generalized Probabilistic Attention Mechanism in Transformers
DongNyeong Heo
Heeyoul Choi
56
0
0
21 Oct 2024
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec
Felix Dangel
Sidak Pal Singh
35
6
0
14 Oct 2024
Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems
David T. Hoffmann
Simon Schrodi
Jelena Bratulić
Nadine Behrmann
Volker Fischer
Thomas Brox
30
5
0
19 Oct 2023
Simplicial Embeddings in Self-Supervised Learning and Downstream Classification
Samuel Lavoie
Christos Tsirigotis
Max Schwarzer
Ankit Vani
Michael Noukhovitch
Kenji Kawaguchi
Aaron C. Courville
SSL
24
17
0
01 Apr 2022