cosFormer: Rethinking Softmax in Attention

17 February 2022
Zhen Qin, Weixuan Sun, Huicai Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong
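For context on the paper the list below cites: cosFormer replaces the softmax in attention with a ReLU feature map plus a cosine-based re-weighting of query-key similarities, which decomposes via cos(a - b) = cos(a)cos(b) + sin(a)sin(b) into a linear-complexity form. Below is a minimal NumPy sketch of the non-causal variant, assuming single-head inputs of shape (seq_len, d); the function name and the stabilizing epsilon are illustrative, not taken from the authors' released code.

```python
# Minimal sketch of cosFormer-style linear attention (non-causal), assuming
# single-head inputs Q, K, V of shape (seq_len, d). Illustrative only; the
# paper's released implementation may differ in details.
import numpy as np

def cosformer_attention(Q, K, V, eps=1e-6):
    N, d = Q.shape
    M = N  # re-weighting horizon; the paper uses a value >= the sequence length
    # ReLU feature map keeps similarities non-negative, replacing softmax.
    Qr, Kr = np.maximum(Q, 0.0), np.maximum(K, 0.0)
    # cos(pi/2 * (i-j)/M) = cos(pi*i/2M)cos(pi*j/2M) + sin(pi*i/2M)sin(pi*j/2M),
    # so the locality bias factorizes into per-position weights.
    idx = np.arange(N)
    cw = np.cos(np.pi * idx / (2 * M))[:, None]
    sw = np.sin(np.pi * idx / (2 * M))[:, None]
    Qc, Qs = Qr * cw, Qr * sw
    Kc, Ks = Kr * cw, Kr * sw
    # Associativity gives O(N * d^2) cost: compute the K-side summaries once
    # and never materialize the N x N attention matrix.
    num = Qc @ (Kc.T @ V) + Qs @ (Ks.T @ V)
    den = Qc @ Kc.sum(axis=0)[:, None] + Qs @ Ks.sum(axis=0)[:, None]
    return num / (den + eps)
```

For example, cosformer_attention(np.random.rand(128, 64), np.random.rand(128, 64), np.random.rand(128, 64)) returns a (128, 64) array at cost linear in the sequence length.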

Papers citing "cosFormer: Rethinking Softmax in Attention"

39 of 139 papers shown

How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers
Michael Hassid, Hao Peng, Daniel Rotem, Jungo Kasai, Ivan Montero, Noah A. Smith, Roy Schwartz
07 Nov 2022

XNOR-FORMER: Learning Accurate Approximations in Long Speech Transformers
Roshan S. Sharma, Bhiksha Raj
29 Oct 2022

The Devil in Linear Transformer
Zhen Qin, Xiaodong Han, Weixuan Sun, Dongxu Li, Lingpeng Kong, Nick Barnes, Yiran Zhong
19 Oct 2022

What Makes Convolutional Models Great on Long Sequence Modeling?
Yuhong Li, Tianle Cai, Yi Zhang, De-huai Chen, Debadeepta Dey
17 Oct 2022 · VLM

Linear Video Transformer with Feature Fixation
Kaiyue Lu, Zexia Liu, Jianyuan Wang, Weixuan Sun, Zhen Qin, ..., Xuyang Shen, Huizhong Deng, Xiaodong Han, Yuchao Dai, Yiran Zhong
15 Oct 2022

CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling
Jinchao Zhang, Shuyang Jiang, Jiangtao Feng, Lin Zheng, Lingpeng Kong
14 Oct 2022 · 3DV

DARTFormer: Finding The Best Type Of Attention
Jason Brown, Yiren Zhao, Ilia Shumailov, Robert D. Mullins
02 Oct 2022

Wide Attention Is The Way Forward For Transformers?
Jason Brown, Yiren Zhao, Ilia Shumailov, Robert D. Mullins
02 Oct 2022

Spikformer: When Spiking Neural Network Meets Transformer
Zhaokun Zhou, Yuesheng Zhu, Chao He, Yaowei Wang, Shuicheng Yan, Yonghong Tian, Liuliang Yuan
29 Sep 2022

Label Distribution Learning via Implicit Distribution Representation
Zhuoran Zheng, Xiuyi Jia
28 Sep 2022

Stateful Memory-Augmented Transformers for Efficient Dialogue Modeling
Qingyang Wu, Zhou Yu
15 Sep 2022 · RALM

Sparse Attentive Memory Network for Click-through Rate Prediction with Long Sequences
Qianying Lin, Wen-Ji Zhou, Yanshi Wang, Qing Da, Qingguo Chen, Bing Wang
08 Aug 2022 · VLM

Neural Architecture Search on Efficient Transformers and Beyond
Zexiang Liu, Dong Li, Kaiyue Lu, Zhen Qin, Weixuan Sun, Jiacheng Xu, Yiran Zhong
28 Jul 2022

Deep Laparoscopic Stereo Matching with Transformers
Xuelian Cheng, Yiran Zhong, Mehrtash Harandi, Tom Drummond, Zhiyong Wang, Zongyuan Ge
25 Jul 2022 · ViT, MedIm

Distance Matters in Human-Object Interaction Detection
Guangzhi Wang, Yangyang Guo, Yongkang Wong, Mohan S. Kankanhalli
05 Jul 2022

Rethinking Query-Key Pairwise Interactions in Vision Transformers
Cheng-rong Li, Yangxin Liu
01 Jul 2022

Long Range Language Modeling via Gated State Spaces
Harsh Mehta, Ankit Gupta, Ashok Cutkosky, Behnam Neyshabur
27 Jun 2022 · Mamba

Vicinity Vision Transformer
Weixuan Sun, Zhen Qin, Huiyuan Deng, Jianyuan Wang, Yi Zhang, Kaihao Zhang, Nick Barnes, Stan Birchfield, Lingpeng Kong, Yiran Zhong
21 Jun 2022 · ViT

SimA: Simple Softmax-free Attention for Vision Transformers
Soroush Abbasi Koohpayegani, Hamed Pirsiavash
17 Jun 2022

ChordMixer: A Scalable Neural Attention Model for Sequences with Different Lengths
Ruslan Khalitov, Tong Yu, Lei Cheng, Zhirong Yang
12 Jun 2022

Indirect-Instant Attention Optimization for Crowd Counting in Dense Scenes
Suyu Han, Guodong Wang, Donghua Liu
12 Jun 2022

Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives
Jun Li, Junyu Chen, Yucheng Tang, Ce Wang, Bennett A. Landman, S. K. Zhou
02 Jun 2022 · ViT, OOD, MedIm

Transformers from an Optimization Perspective
Yongyi Yang, Zengfeng Huang, David Wipf
27 May 2022

HCFormer: Unified Image Segmentation with Hierarchical Clustering
Teppei Suzuki
20 May 2022

KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation
Ta-Chung Chi, Ting-Han Fan, Peter J. Ramadge, Alexander I. Rudnicky
20 May 2022

Supplementary Material: Implementation and Experiments for GAU-based Model
Zhenjie Liu
12 May 2022

Deep Non-rigid Structure-from-Motion: A Sequence-to-Sequence Translation Perspective
Huizhong Deng, Tong Zhang, Yuchao Dai, Jiawei Shi, Yiran Zhong, Hongdong Li
10 Apr 2022

Locality Matters: A Locality-Biased Linear Attention for Automatic Speech Recognition
J. Sun, Guiping Zhong, Dinghao Zhou, Baoxiang Li, Yiran Zhong
29 Mar 2022

Implicit Motion Handling for Video Camouflaged Object Detection
Xuelian Cheng, Huan Xiong, Deng-Ping Fan, Yiran Zhong, Mehrtash Harandi, Tom Drummond, Zongyuan Ge
14 Mar 2022

Flowformer: Linearizing Transformers with Conservation Flows
Haixu Wu, Jialong Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long
13 Feb 2022

Structure-Aware Transformer for Graph Representation Learning
Dexiong Chen, Leslie O’Bray, Karsten M. Borgwardt
07 Feb 2022

On Learning the Transformer Kernel
Sankalan Pal Chowdhury, Adamos Solomou, Kumar Avinava Dubey, Mrinmaya Sachan
15 Oct 2021 · ViT

Long-Range Transformers for Dynamic Spatiotemporal Forecasting
J. E. Grigsby, Zhe Wang, Nam Nguyen, Yanjun Qi
24 Sep 2021 · AI4TS

Deciphering Environmental Air Pollution with Large Scale City Data
Mayukh Bhattacharyya, Sayan Nag, Udita Ghosh
09 Sep 2021 · AI4CE

From block-Toeplitz matrices to differential equations on graphs: towards a general theory for scalable masked Transformers
K. Choromanski, Han Lin, Haoxian Chen, Tianyi Zhang, Arijit Sehanobish, Valerii Likhosherstov, Jack Parker-Holder, Tamás Sarlós, Adrian Weller, Thomas Weingarten
16 Jul 2021

Content-Augmented Feature Pyramid Network with Light Linear Spatial Transformers for Object Detection
Yongxiang Gu, Xiaolin Qin, Yuncong Peng, Lu Li
20 May 2021 · ViT

Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
28 Jul 2020 · VLM

Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier
12 Mar 2020 · MoE

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
20 Apr 2018 · ELM