cosFormer: Rethinking Softmax in Attention
arXiv 2202.08791, 17 February 2022
Zhen Qin, Weixuan Sun, Huicai Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong
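For orientation before the citation list, here is a minimal non-causal sketch of the cosine re-weighting mechanism the cosFormer paper proposes: softmax is replaced by a ReLU feature map, and a cos(pi*(i-j)/(2M)) locality weight is split with the identity cos(a-b) = cos(a)cos(b) + sin(a)sin(b), so attention can be computed in linear time and memory without materializing the M x M attention matrix. The code below is an illustrative sketch under those assumptions, not the authors' released implementation; the function name and tensor shapes are my own.

```python
import torch
import torch.nn.functional as F

def cosformer_attention(q, k, v):
    """Non-causal cos-reweighted linear attention in the spirit of cosFormer.

    q, k, v: tensors of shape (batch, seq_len, dim).
    Similarity: ReLU(q_i) . ReLU(k_j) * cos(pi * (i - j) / (2M)), decomposed so
    the cost stays linear in sequence length M.
    """
    b, m, d = q.shape
    q, k = F.relu(q), F.relu(k)

    # Position-dependent weights cos(pi*i/(2M)) and sin(pi*i/(2M)).
    idx = torch.arange(m, device=q.device, dtype=q.dtype)
    cos_w = torch.cos(torch.pi * idx / (2 * m)).view(1, m, 1)
    sin_w = torch.sin(torch.pi * idx / (2 * m)).view(1, m, 1)

    q_cos, q_sin = q * cos_w, q * sin_w
    k_cos, k_sin = k * cos_w, k * sin_w

    # Compute (K^T V) first: O(M * d^2) instead of O(M^2 * d).
    kv_cos = torch.einsum("bmd,bme->bde", k_cos, v)
    kv_sin = torch.einsum("bmd,bme->bde", k_sin, v)
    num = (torch.einsum("bmd,bde->bme", q_cos, kv_cos)
           + torch.einsum("bmd,bde->bme", q_sin, kv_sin))

    # Normalizer: row sums of the never-materialized attention matrix.
    den = (torch.einsum("bmd,bd->bm", q_cos, k_cos.sum(dim=1))
           + torch.einsum("bmd,bd->bm", q_sin, k_sin.sum(dim=1)))
    return num / den.clamp(min=1e-6).unsqueeze(-1)
```

As a quick check, q = k = v = torch.randn(2, 128, 64) yields an output of the same (2, 128, 64) shape while never forming a 128 x 128 attention matrix.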
Papers citing "cosFormer: Rethinking Softmax in Attention" (50 of 139 shown)
Probability Consistency in Large Language Models: Theoretical Foundations Meet Empirical Discrepancies
Xiaoliang Luo, Xinyi Xu, Michael Ramscar, Bradley C. Love (13 May 2025)

Always Skip Attention
Yiping Ji, Hemanth Saratchandran, Peyman Moghaddam, Simon Lucey (04 May 2025)

Mitigating Degree Bias in Graph Representation Learning with Learnable Structural Augmentation and Structural Self-Attention
Van Thuy Hoang, Hyeon-Ju Jeon, O-Joun Lee (21 Apr 2025)

HRMedSeg: Unlocking High-resolution Medical Image segmentation via Memory-efficient Attention Modeling
Qing Xu, Zhenye Lou, Chenxin Li, Xiangjian He, Rong Qu, Tesema Fiseha Berhanu, Yi Wang, Wenting Duan, Zhen Chen (08 Apr 2025) [MedIm]

FMDConv: Fast Multi-Attention Dynamic Convolution via Speed-Accuracy Trade-off
Tianyu Zhang, Fan Wan, Haoran Duan, Kevin W. Tong, Jingjing Deng, Yang Long (21 Mar 2025)

iFlame: Interleaving Full and Linear Attention for Efficient Mesh Generation
Hanxiao Wang, Biao Zhang, Weize Quan, Dong-ming Yan, Peter Wonka (20 Mar 2025)

LLM Inference Acceleration via Efficient Operation Fusion
Mahsa Salmani, I. Soloveychik (24 Feb 2025)

PolaFormer: Polarity-aware Linear Attention for Vision Transformers
Weikang Meng, Yadan Luo, Xin Li, D. Jiang, Zheng Zhang (25 Jan 2025)

ZETA: Leveraging Z-order Curves for Efficient Top-k Attention
Qiuhao Zeng, Jerry Huang, Peng Lu, Gezheng Xu, Boxing Chen, Charles Ling, Boyu Wang (24 Jan 2025)

Bridging the Divide: Reconsidering Softmax and Linear Attention
Dongchen Han, Yifan Pu, Zhuofan Xia, Yizeng Han, Xuran Pan, Xiu Li, Jiwen Lu, Shiji Song, Gao Huang (09 Dec 2024)

On the importance of local and global feature learning for automated measurable residual disease detection in flow cytometry data
Lisa Weijler, Michael Reiter, Pedro Hermosilla, Margarita Maurer-Granofszky, Michael N. Dworzak (23 Nov 2024)

MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers
Ning Ding, Yehui Tang, Haochen Qin, Zhenli Zhou, Chao Xu, Lin Li, Kai Han, Heng Liao, Yunhe Wang (20 Nov 2024)

MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map
Yuhong Chou, Man Yao, Kexin Wang, Yuqi Pan, Ruijie Zhu, Yiran Zhong, Yu Qiao, Jian Wu, Bo Xu, Guoqi Li (16 Nov 2024)

Kernel Approximation using Analog In-Memory Computing
Julian Büchel, Giacomo Camposampiero, A. Vasilopoulos, Corey Lammie, Manuel Le Gallo, Abbas Rahimi, Abu Sebastian (05 Nov 2024)

Taipan: Efficient and Expressive State Space Language Models with Selective Attention
Chien Van Nguyen, Huy Huu Nguyen, Thang M. Pham, Ruiyi Zhang, Hanieh Deilamsalehy, ..., Ryan A. Rossi, Trung Bui, Viet Dac Lai, Franck Dernoncourt, Thien Huu Nguyen (24 Oct 2024) [Mamba, RALM]

Rethinking Attention: Polynomial Alternatives to Softmax in Transformers
Hemanth Saratchandran, Jianqiao Zheng, Yiping Ji, Wenbo Zhang, Simon Lucey (24 Oct 2024)

SLaNC: Static LayerNorm Calibration
Mahsa Salmani, Nikita Trukhanov, I. Soloveychik (14 Oct 2024) [MQ]

Towards Better Multi-head Attention via Channel-wise Sample Permutation
Shen Yuan, Hongteng Xu (14 Oct 2024)

Cluster-wise Graph Transformer with Dual-granularity Kernelized Attention
Siyuan Huang, Yunchong Song, Jiayue Zhou, Zhouhan Lin (09 Oct 2024)

Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
Zhihao He, Hang Yu, Zi Gong, Shizhan Liu, J. Li, Weiyao Lin (09 Oct 2024) [VLM]

Attention layers provably solve single-location regression
P. Marion, Raphael Berthier, Gérard Biau, Claire Boyer (02 Oct 2024)

Intelligent Fish Detection System with Similarity-Aware Transformer
Shengchen Li, Haobo Zuo, Changhong Fu, Zhiyong Wang, Zhiqiang Xu (28 Sep 2024) [ViT]

Cottention: Linear Transformers With Cosine Attention
Gabriel Mongaras, Trevor Dohm, Eric C. Larson (27 Sep 2024)

Expanding Expressivity in Transformer Models with MöbiusAttention
Anna-Maria Halacheva, M. Nayyeri, Steffen Staab (08 Sep 2024)

Attention is a smoothed cubic spline
Zehua Lai, Lek-Heng Lim, Yucong Liu (19 Aug 2024)

VSSD: Vision Mamba with Non-Causal State Space Duality
Yuheng Shi, Minjing Dong, Mingjia Li, Chang Xu (26 Jul 2024) [Mamba]

Compute-Efficient Medical Image Classification with Softmax-Free Transformers and Sequence Normalization
Firas Khader, Omar S. M. El Nahhas, T. Han, Gustav Muller-Franzes, S. Nebelung, Jakob Nikolas Kather, Daniel Truhn (03 Jun 2024) [MedIm]

You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet
Zhen Qin, Yuxin Mao, Xuyang Shen, Dong Li, Jing Zhang, Yuchao Dai, Yiran Zhong (31 May 2024)

Automatic Channel Pruning for Multi-Head Attention
Eunho Lee, Youngbae Hwang (31 May 2024) [ViT]

Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective
Zhen Qin, Xuyang Shen, Weigao Sun, Dong Li, Stanley T. Birchfield, Richard I. Hartley, Yiran Zhong (27 May 2024)

Demystify Mamba in Vision: A Linear Attention Perspective
Dongchen Han, Ziyi Wang, Zhuofan Xia, Yizeng Han, Yifan Pu, Chunjiang Ge, Jun Song, Shiji Song, Bo Zheng, Gao Huang (26 May 2024) [Mamba]

LAM3D: Large Image-Point-Cloud Alignment Model for 3D Reconstruction from Single Image
Ruikai Cui, Xibin Song, Weixuan Sun, Senbo Wang, Weizhe Liu, ..., Taizhang Shang, Yang Li, Nick Barnes, Hongdong Li, Pan Ji (24 May 2024) [3DV]

Linearizing Large Language Models
Jean-Pierre Mercat, Igor Vasiljevic, Sedrick Scott Keh, Kushal Arora, Achal Dave, Adrien Gaidon, Thomas Kollar (10 May 2024)

Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Badri N. Patro, Vijay Srinivas Agneeswaran (24 Apr 2024) [Mamba]

HGRN2: Gated Linear RNNs with State Expansion
Zhen Qin, Aaron Courville, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong (11 Apr 2024) [LRM]

Linear Attention Sequence Parallelism
Weigao Sun, Zhen Qin, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong (03 Apr 2024)

Scene Adaptive Sparse Transformer for Event-based Object Detection
Yansong Peng, Hebei Li, Yueyi Zhang, Xiaoyan Sun, Feng Wu (02 Apr 2024) [ViT]

DiJiang: Efficient Large Language Models through Compact Kernelization
Hanting Chen, Zhicheng Liu, Xutao Wang, Yuchuan Tian, Yunhe Wang (29 Mar 2024) [VLM]

NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation
Ruikai Cui, Weizhe Liu, Weixuan Sun, Senbo Wang, Taizhang Shang, ..., Han Yan, Zhennan Wu, Shenzhou Chen, Hongdong Li, Pan Ji (27 Mar 2024)

CipherFormer: Efficient Transformer Private Inference with Low Round Complexity
Weize Wang, Yi Kuang (25 Mar 2024)

Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks
Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens (14 Mar 2024) [MoE]

Polynormer: Polynomial-Expressive Graph Transformer in Linear Time
Chenhui Deng, Zichao Yue, Zhiru Zhang (02 Mar 2024)

Interactive Multi-Head Self-Attention with Linear Complexity
Hankyul Kang, Ming-Hsuan Yang, Jongbin Ryu (27 Feb 2024)

The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
Michael Zhang, Kush S. Bhatia, Hermann Kumbong, Christopher Ré (06 Feb 2024)

Unification of Symmetries Inside Neural Networks: Transformer, Feedforward and Neural ODE
Koji Hashimoto, Yuji Hirono, Akiyoshi Sannai (04 Feb 2024) [AI4CE]

Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models
Xindi Wang, Mahsa Salmani, Parsa Omidi, Xiangyu Ren, Mehdi Rezagholizadeh, A. Eshaghi (03 Feb 2024) [LRM]

Theoretical Understanding of In-Context Learning in Shallow Transformers with Unstructured Data
Yue Xing, Xiaofeng Lin, Chenheng Xu, Namjoon Suh, Qifan Song, Guang Cheng (01 Feb 2024)

Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition
Lei Liu, Li Liu, Haizhou Li (31 Jan 2024)

CO2: Efficient Distributed Training with Full Communication-Computation Overlap
Weigao Sun, Zhen Qin, Weixuan Sun, Shidi Li, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong (29 Jan 2024) [OffRL]

Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket
Zhaokun Zhou, Kaiwei Che, Wei Fang, Keyu Tian, Yuesheng Zhu, Shuicheng Yan, Yonghong Tian, Liuliang Yuan (04 Jan 2024) [ViT]