On the Relationship between Self-Attention and Convolutional Layers
Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi
arXiv:1911.03584 · 8 November 2019

Papers citing "On the Relationship between Self-Attention and Convolutional Layers" (50 of 269 papers shown):
Chasing Sparsity in Vision Transformers: An End-to-End Exploration
  Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang · ViT · 216 citations · 08 Jun 2021
On the Connection between Local Attention and Dynamic Depth-wise Convolution
  Qi Han, Zejia Fan, Qi Dai, Lei-huan Sun, Ming-Ming Cheng, Jiaying Liu, Jingdong Wang · ViT · 105 citations · 08 Jun 2021
On the Expressive Power of Self-Attention Matrices
  Valerii Likhosherstov, K. Choromanski, Adrian Weller · 34 citations · 07 Jun 2021
Convolutional Neural Networks with Gated Recurrent Connections
  Jianfeng Wang, Xiaolin Hu · ObjD · 40 citations · 05 Jun 2021
Detect the Interactions that Matter in Matter: Geometric Attention for Many-Body Systems
  Thorben Frank, Stefan Chmiela · 3 citations · 04 Jun 2021
X-volution: On the unification of convolution and self-attention
  Xuanhong Chen, Hang Wang, Bingbing Ni · ViT · 24 citations · 04 Jun 2021
Transformers are Deep Infinite-Dimensional Non-Mercer Binary Kernel Machines
  Matthew A. Wright, Joseph E. Gonzalez · 20 citations · 02 Jun 2021
Less is More: Pay Less Attention in Vision Transformers
  Zizheng Pan, Bohan Zhuang, Haoyu He, Jing Liu, Jianfei Cai · ViT · 82 citations · 29 May 2021
KVT: k-NN Attention for Boosting Vision Transformers
  Pichao Wang, Xue Wang, F. Wang, Ming Lin, Shuning Chang, Hao Li, R. L. Jin · ViT · 105 citations · 28 May 2021
Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding
  Zizhao Zhang, Han Zhang, Long Zhao, Ting Chen, Sercan Ö. Arik, Tomas Pfister · ViT · 169 citations · 26 May 2021
Are Convolutional Neural Networks or Transformers more like human vision?
  Shikhar Tuli, Ishita Dasgupta, Erin Grant, Thomas L. Griffiths · ViT, FaML · 182 citations · 15 May 2021
Graph Attention Networks with Positional Embeddings
  Liheng Ma, Reihaneh Rabbany, Adriana Romero Soriano · GNN · 235 citations · 09 May 2021
Attention-based Stylisation for Exemplar Image Colourisation
  Marc Górriz Blanch, Issa Khalifeh, Alan F. Smeaton, Noel E. O'Connor, M. Mrak · 4 citations · 04 May 2021
MLP-Mixer: An all-MLP Architecture for Vision
  Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, …, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy · 2,606 citations · 04 May 2021
Deformable TDNN with adaptive receptive fields for speech recognition
  Keyu An, Yi Zhang, Zhijian Ou · 5 citations · 30 Apr 2021
Higher-Order Attribute-Enhancing Heterogeneous Graph Neural Networks
  Jianxin Li, Hao Peng, Yuwei Cao, Yingtong Dou, Hekai Zhang, Philip S. Yu, Lifang He · 79 citations · 16 Apr 2021
GAttANet: Global attention agreement for convolutional neural networks
  R. V. Rullen, A. Alamia · ViT · 2 citations · 12 Apr 2021
Differentiable Patch Selection for Image Recognition
  Jean-Baptiste Cordonnier, Aravindh Mahendran, Alexey Dosovitskiy, Dirk Weissenborn, Jakob Uszkoreit, Thomas Unterthiner · 93 citations · 07 Apr 2021
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference
  Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze · ViT · 770 citations · 02 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
  Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman · VGen · 1,128 citations · 01 Apr 2021
Motion Guided Attention Fusion to Recognize Interactions from Videos
  Tae Soo Kim, Jonathan D. Jones, Gregory Hager · 15 citations · 01 Apr 2021
On the Robustness of Vision Transformers to Adversarial Examples
  Kaleel Mahmood, Rigel Mahmood, Marten van Dijk · ViT · 217 citations · 31 Mar 2021
Understanding Robustness of Transformers for Image Classification
  Srinadh Bhojanapalli, Ayan Chakrabarti, Daniel Glasner, Daliang Li, Thomas Unterthiner, Andreas Veit · ViT · 378 citations · 26 Mar 2021
An Image is Worth 16x16 Words, What is a Video Worth?
  Gilad Sharir, Asaf Noy, Lihi Zelnik-Manor · ViT · 120 citations · 25 Mar 2021
Scaling Local Self-Attention for Parameter Efficient Visual Backbones
  Ashish Vaswani, Prajit Ramachandran, A. Srinivas, Niki Parmar, Blake A. Hechtman, Jonathon Shlens · 395 citations · 23 Mar 2021
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
  Stéphane d'Ascoli, Hugo Touvron, Matthew L. Leavitt, Ari S. Morcos, Giulio Biroli, Levent Sagun · ViT · 805 citations · 19 Mar 2021
Scalable Vision Transformers with Hierarchical Pooling
  Zizheng Pan, Bohan Zhuang, Jing Liu, Haoyu He, Jianfei Cai · ViT · 126 citations · 19 Mar 2021
Involution: Inverting the Inherence of Convolution for Visual Recognition
  Duo Li, Jie Hu, Changhu Wang, Xiangtai Li, Qi She, Lei Zhu, Tong Zhang, Qifeng Chen · BDL · 304 citations · 10 Mar 2021
Lipschitz Normalization for Self-Attention Layers with Application to Graph Neural Networks
  George Dasoulas, Kevin Scaman, Aladin Virmaux · GNN · 40 citations · 08 Mar 2021
Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
  Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas · 373 citations · 05 Mar 2021
Perceiver: General Perception with Iterative Attention
  Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, João Carreira · VLM, ViT, MDE · 976 citations · 04 Mar 2021
Generative Adversarial Transformers
  Drew A. Hudson, C. L. Zitnick · ViT · 179 citations · 01 Mar 2021
Conditional Positional Encodings for Vision Transformers
  Xiangxiang Chu, Zhi Tian, Bo-Wen Zhang, Xinlong Wang, Chunhua Shen · ViT · 605 citations · 22 Feb 2021
UniT: Multimodal Multitask Learning with a Unified Transformer
  Ronghang Hu, Amanpreet Singh · ViT · 295 citations · 22 Feb 2021
LambdaNetworks: Modeling Long-Range Interactions Without Attention
  Irwan Bello · 179 citations · 17 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
  Gedas Bertasius, Heng Wang, Lorenzo Torresani · ViT · 1,984 citations · 09 Feb 2021
Relaxed Transformer Decoders for Direct Action Proposal Generation
  Jing Tan, Jiaqi Tang, Limin Wang, Gangshan Wu · ViT · 178 citations · 03 Feb 2021
CNN with large memory layers
  R. Karimov, Yury Malkov, Karim Iskakov, Victor Lempitsky · 0 citations · 27 Jan 2021
Spectral Leakage and Rethinking the Kernel Size in CNNs
  Nergis Tomen, Jan van Gemert · AAML · 18 citations · 25 Jan 2021
Transformers in Vision: A Survey
  Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, M. Shah · ViT · 2,431 citations · 04 Jan 2021
Training data-efficient image transformers & distillation through attention
  Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou · ViT · 6,567 citations · 23 Dec 2020
ShineOn: Illuminating Design Choices for Practical Video-based Virtual Clothing Try-on
  Gaurav Kuppa, Andrew Jong, Vera Liu, Ziwei Liu, Teng-Sheng Moh · CVBM · 19 citations · 18 Dec 2020
Toward Transformer-Based Object Detection
  Josh Beal, Eric Kim, Eric Tzeng, Dong Huk Park, Andrew Zhai, Dmitry Kislyuk · ViT · 209 citations · 17 Dec 2020
GTA: Global Temporal Attention for Video Action Understanding
  Bo He, Xitong Yang, Zuxuan Wu, Hao Chen, Ser-Nam Lim, Abhinav Shrivastava · ViT · 27 citations · 15 Dec 2020
Convolutional LSTM Neural Networks for Modeling Wildland Fire Dynamics
  J. Burge, M. Bonanni, M. Ihme, Lily Hu · 19 citations · 11 Dec 2020
SAFCAR: Structured Attention Fusion for Compositional Action Recognition
  Tae Soo Kim, Gregory Hager · CoGe · 10 citations · 03 Dec 2020
Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction
  Anzhu Yu, Wenyue Guo, Bing Liu, Xin Chen, Xin Wang, Xuefeng Cao, Bingchuan Jiang · 3DV · 64 citations · 25 Nov 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
  Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, …, Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby · ViT · 39,330 citations · 22 Oct 2020
Group Equivariant Stand-Alone Self-Attention For Vision
  David W. Romero, Jean-Baptiste Cordonnier · MDE · 57 citations · 02 Oct 2020
Multi-timescale Representation Learning in LSTM Language Models
  Shivangi Mahto, Vy A. Vo, Javier S. Turek, Alexander G. Huth · 29 citations · 27 Sep 2020