cosFormer: Rethinking Softmax in Attention

17 February 2022 · arXiv:2202.08791
Zhen Qin, Weixuan Sun, Huicai Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong
ArXiv · PDF · HTML
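
For context on the paper these works cite: cosFormer drops the softmax from attention, pairing a ReLU feature map with a position-dependent cosine re-weighting that decomposes via cos(a - b) = cos(a)cos(b) + sin(a)sin(b) into two linear-attention terms, so attention runs in linear rather than quadratic time in sequence length. Below is a minimal non-causal sketch of that idea in PyTorch; it is a reconstruction from the paper's stated mechanism, not the authors' released code, and the function name, variable names, and the choice M = sequence length are our own assumptions.

    import math
    import torch
    import torch.nn.functional as F

    def cosformer_attention(q, k, v, eps=1e-6):
        # Sketch of cosFormer-style non-causal linear attention (not the
        # authors' code). q, k, v: (batch, seq_len, dim).
        n = q.size(1)
        # ReLU feature map keeps similarity scores non-negative without softmax.
        q, k = F.relu(q), F.relu(k)
        # Angles pi * i / (2M); here we assume M = n, the sequence length.
        angle = (math.pi / 2) * torch.arange(n, device=q.device, dtype=q.dtype) / n
        cos_w = torch.cos(angle)[None, :, None]
        sin_w = torch.sin(angle)[None, :, None]
        # cos(a - b) = cos(a)cos(b) + sin(a)sin(b) splits the re-weighted
        # scores into two terms, so the n x n matrix is never formed.
        q_cos, q_sin = q * cos_w, q * sin_w
        k_cos, k_sin = k * cos_w, k * sin_w
        kv_cos = torch.einsum("bnd,bne->bde", k_cos, v)  # O(n d^2), not O(n^2 d)
        kv_sin = torch.einsum("bnd,bne->bde", k_sin, v)
        num = torch.einsum("bnd,bde->bne", q_cos, kv_cos) \
            + torch.einsum("bnd,bde->bne", q_sin, kv_sin)
        # Row-wise normalizer, also computed in linear time.
        den = torch.einsum("bnd,bd->bn", q_cos, k_cos.sum(dim=1)) \
            + torch.einsum("bnd,bd->bn", q_sin, k_sin.sum(dim=1))
        return num / (den.unsqueeze(-1) + eps)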

Papers citing "cosFormer: Rethinking Softmax in Attention"

50 / 139 papers shown

Linear Attention via Orthogonal Memory
Jun Zhang, Shuyang Jiang, Jiangtao Feng, Lin Zheng, Lingpeng Kong
18 Dec 2023

DYAD: A Descriptive Yet Abjuring Density efficient approximation to linear neural network layers
S. Chandy, Varun Gangal, Yi Yang, Gabriel Maggiotti
11 Dec 2023

SARA-RT: Scaling up Robotics Transformers with Self-Adaptive Robust Attention
Isabel Leal, Krzysztof Choromanski, Deepali Jain, Kumar Avinava Dubey, Jake Varley, ..., Q. Vuong, Tamás Sarlós, Kenneth Oslund, Karol Hausman, Kanishka Rao
04 Dec 2023

Recent Advances in Scalable Energy-Efficient and Trustworthy Spiking Neural networks: from Algorithms to Technology
Souvik Kundu, Rui-jie Zhu, Akhilesh R. Jaiswal, P. Beerel
02 Dec 2023

FRUITS: Feature Extraction Using Iterated Sums for Time Series Classification
Joscha Diehl, Richard Krieg
AI4TS
24 Nov 2023

Hierarchically Gated Recurrent Neural Network for Sequence Modeling
Zhen Qin, Aaron Courville, Yiran Zhong
08 Nov 2023

Sliceformer: Make Multi-head Attention as Simple as Sorting in Discriminative Tasks
Shen Yuan, Hongteng Xu
26 Oct 2023

Fast Multipole Attention: A Divide-and-Conquer Attention Mechanism for Long Sequences
Yanming Kang, Giang Tran, H. Sterck
18 Oct 2023

Attentive Multi-Layer Perceptron for Non-autoregressive Generation
Shuyang Jiang, Jinchao Zhang, Jiangtao Feng, Lin Zheng, Lingpeng Kong
14 Oct 2023

Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention
Huiyin Xue, Nikolaos Aletras
11 Oct 2023

PriViT: Vision Transformers for Fast Private Inference
Naren Dhyani, Jianqiao Mo, Minsu Cho, Ameya Joshi, Siddharth Garg, Brandon Reagen, Chinmay Hegde
06 Oct 2023

How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation
Josh Alman, Zhao Song
06 Oct 2023

FiGURe: Simple and Efficient Unsupervised Node Representations with Filter Augmentations
C. Ekbote, Ajinkya Deshpande, Arun Shankar Iyer, Ramakrishna Bairi, Sundararajan Sellamanickam
SSL
03 Oct 2023

SEA: Sparse Linear Attention with Estimated Attention Mask
Heejun Lee, Jina Kim, Jeffrey Willette, Sung Ju Hwang
03 Oct 2023

Transformer-VQ: Linear-Time Transformers via Vector Quantization
Albert Mohwald
28 Sep 2023

GAFlow: Incorporating Gaussian Attention into Optical Flow
Ao Luo, Fan Yang, Xin Li, Lang Nie, Chunyu Lin, Haoqiang Fan, Shuaicheng Liu
28 Sep 2023

CINFormer: Transformer network with multi-stage CNN feature injection for surface defect segmentation
Xiaoheng Jiang, Kaiyi Guo, Yang Lu, Feng Yan, Hao Liu, Jiale Cao, Mingliang Xu, Dacheng Tao
MedIm, ViT, UQCV
22 Sep 2023

Long-Range Transformer Architectures for Document Understanding
Thibault Douzon, S. Duffner, Christophe Garcia, Jérémy Espinas
VLM
11 Sep 2023

MB-TaylorFormer: Multi-branch Efficient Transformer Expanded by Taylor Formula for Image Dehazing
Yuwei Qiu, Kaihao Zhang, Chenxi Wang, Wenhan Luo, Hongdong Li, Zhi Jin
ViT
27 Aug 2023

All-pairs Consistency Learning for Weakly Supervised Semantic Segmentation
Weixuan Sun, Yanhao Zhang, Zhen Qin, Zheyuan Liu, Lin Cheng, Fanyi Wang, Yiran Zhong, Nick Barnes
ViT
08 Aug 2023

Attention-free Spikformer: Mixing Spike Sequences with Simple Linear Transforms
Qingyu Wang, Duzhen Zhang, Tielin Zhang, Bo Xu
02 Aug 2023

FLatten Transformer: Vision Transformer using Focused Linear Attention
Dongchen Han, Xuran Pan, Yizeng Han, Shiji Song, Gao Huang
01 Aug 2023

TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer
Zhen Qin, Dong Li, Weigao Sun, Weixuan Sun, Xuyang Shen, ..., Yunshen Wei, Baohong Lv, Xiao Luo, Yu Qiao, Yiran Zhong
27 Jul 2023

Explainable Techniques for Analyzing Flow Cytometry Cell Transformers
Florian Kowarsch, Lisa Weijler, Florian Kleber, Matthias Wödlinger, Michael Reiter, Margarita Maurer-Granofszky, Michael N. Dworzak
MedIm
27 Jul 2023

Exploring Transformer Extrapolation
Zhen Qin, Yiran Zhong, Huiyuan Deng
19 Jul 2023

Linearized Relative Positional Encoding
Zhen Qin, Weixuan Sun, Kaiyue Lu, Huizhong Deng, Dong Li, Xiaodong Han, Yuchao Dai, Lingpeng Kong, Yiran Zhong
18 Jul 2023

A Survey of Techniques for Optimizing Transformer Inference
Krishna Teja Chitty-Venkata, Sparsh Mittal, M. Emani, V. Vishwanath, Arun Somani
16 Jul 2023

LEST: Large-scale LiDAR Semantic Segmentation with Transformer
Chuanyu Luo, Nuo Cheng, Sikun Ma, Han Li, Xiaohan Li, Shengguang Lei, Pu Li
3DPC, ViT
14 Jul 2023

Scaling In-Context Demonstrations with Structured Attention
Tianle Cai, Kaixuan Huang, Jason D. Lee, Mengdi Wang
LRM
05 Jul 2023

Spike-driven Transformer
Man Yao, Jiakui Hu, Zhaokun Zhou, Liuliang Yuan, Yonghong Tian, Boxing Xu, Guoqi Li
04 Jul 2023

LongCoder: A Long-Range Pre-trained Language Model for Code Completion
Daya Guo, Canwen Xu, Nan Duan, Jian Yin, Julian McAuley
26 Jun 2023

ESTISR: Adapting Efficient Scene Text Image Super-resolution for Real-Scenes
Minghao Fu, Xin Man, Yihan Xu, Jie Shao
04 Jun 2023

Faster Causal Attention Over Large Sequences Through Sparse Flash Attention
Matteo Pagliardini, Daniele Paliotta, Martin Jaggi, François Fleuret
LRM
01 Jun 2023

Auto-Spikformer: Spikformer Architecture Search
Kaiwei Che, Zhaokun Zhou, Zhengyu Ma, Wei Fang, Yanqing Chen, Shuaijie Shen, Liuliang Yuan, Yonghong Tian
01 Jun 2023

Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation
Yingyi Chen, Qinghua Tao, F. Tonin, Johan A. K. Suykens
31 May 2023

T-former: An Efficient Transformer for Image Inpainting
Ye Deng, Siqi Hui, Sanping Zhou, Deyu Meng, Jinjun Wang
ViT
12 May 2023

Toeplitz Neural Network for Sequence Modeling
Zhen Qin, Xiaodong Han, Weixuan Sun, Bowen He, Dong Li, Dongxu Li, Yuchao Dai, Lingpeng Kong, Yiran Zhong
AI4TS, ViT
08 May 2023

Improving Autoregressive NLP Tasks via Modular Linearized Attention
Victor Agostinelli, Lizhong Chen
17 Apr 2023

EGformer: Equirectangular Geometry-biased Transformer for 360 Depth Estimation
Ilwi Yun, Chanyong Shin, Hyunku Lee, Hyuk-Jae Lee, Chae-Eun Rhee
ViT, MDE
16 Apr 2023

Fine-grained Audible Video Description
Xuyang Shen, Dong Li, Jinxing Zhou, Zhen Qin, Bowen He, ..., Yuchao Dai, Lingpeng Kong, Meng Wang, Yu Qiao, Yiran Zhong
VGen
27 Mar 2023

HYBRIDFORMER: improving SqueezeFormer with hybrid attention and NSR mechanism
Yuguang Yang, Y. Pan, Jingjing Yin, Jiangyu Han, Lei Ma, Heng Lu
15 Mar 2023

Efficient Attention via Control Variates
Lin Zheng, Jianbo Yuan, Chong-Jun Wang, Lingpeng Kong
09 Feb 2023

Single Cells Are Spatial Tokens: Transformers for Spatial Transcriptomic Data Imputation
Haifang Wen, Wenzhuo Tang, Wei Jin, Jiayuan Ding, Renming Liu, Xinnan Dai, Feng Shi, Lulu Shang, Jiliang Tang, Yuying Xie
06 Feb 2023

Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers
K. Choromanski, Shanda Li, Valerii Likhosherstov, Kumar Avinava Dubey, Shengjie Luo, Di He, Yiming Yang, Tamás Sarlós, Thomas Weingarten, Adrian Weller
03 Feb 2023

LMEC: Learnable Multiplicative Absolute Position Embedding Based Conformer for Speech Recognition
Yuguang Yang, Y. Pan, Jingjing Yin, Heng Lu
05 Dec 2022

Lightweight Structure-Aware Attention for Visual Understanding
Heeseung Kwon, F. M. Castro, M. Marín-Jiménez, N. Guil, Alahari Karteek
29 Nov 2022

MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision Transformer with Heterogeneous Attention
Wenyuan Zeng, Meng Li, Wenjie Xiong, Tong Tong, Wen-jie Lu, Jin Tan, Runsheng Wang, Ru Huang
25 Nov 2022

DBA: Efficient Transformer with Dynamic Bilinear Low-Rank Attention
Bosheng Qin, Juncheng Li, Siliang Tang, Yueting Zhuang
24 Nov 2022

Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference
Haoran You, Yunyang Xiong, Xiaoliang Dai, Bichen Wu, Peizhao Zhang, Haoqi Fan, Peter Vajda, Yingyan Lin
18 Nov 2022

Hyperbolic Cosine Transformer for LiDAR 3D Object Detection
Jigang Tong, Fanhang Yang, Sen Yang, Enzeng Dong, Shengzhi Du, Xing-jun Wang, Xianlin Yi
3DPC, ViT
10 Nov 2022