Scalable-Softmax Is Superior for Attention

31 January 2025

Papers citing "Scalable-Softmax Is Superior for Attention"


Scale-invariant Attention. Ben Anson, Xi Wang, Laurence Aitchison. 20 May 2025.
Continuity and Isolation Lead to Doubts or Dilemmas in Large Language Models. Hector Pasten, Felipe Urrutia, Hector Jimenez, Cristian B. Calderon, Cristóbal Rojas, Alexander Kozachinskiy. 15 May 2025.
Multi-Token Attention. O. Yu. Golovneva, Tianlu Wang, Jason Weston, Sainbayar Sukhbaatar. 01 Apr 2025.