On the Relationship between Self-Attention and Convolutional Layers

8 November 2019 · Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi
arXiv:1911.03584

Papers citing "On the Relationship between Self-Attention and Convolutional Layers"

Showing 50 of 269 citing papers. Each entry lists title, authors, topic tags (where assigned), citation count, and submission date.

Chasing Sparsity in Vision Transformers: An End-to-End Exploration
Tianlong Chen, Yu Cheng, Zhe Gan, Lu Yuan, Lei Zhang, Zhangyang Wang
ViT · 216 citations · 08 Jun 2021

On the Connection between Local Attention and Dynamic Depth-wise Convolution
Qi Han, Zejia Fan, Qi Dai, Lei-huan Sun, Ming-Ming Cheng, Jiaying Liu, Jingdong Wang
ViT · 105 citations · 08 Jun 2021

On the Expressive Power of Self-Attention Matrices
Valerii Likhosherstov, K. Choromanski, Adrian Weller
34 citations · 07 Jun 2021

Convolutional Neural Networks with Gated Recurrent Connections
Jianfeng Wang, Xiaolin Hu
ObjD · 40 citations · 05 Jun 2021

Detect the Interactions that Matter in Matter: Geometric Attention for Many-Body Systems
Thorben Frank, Stefan Chmiela
3 citations · 04 Jun 2021

X-volution: On the unification of convolution and self-attention
Xuanhong Chen, Hang Wang, Bingbing Ni
ViT · 24 citations · 04 Jun 2021

Transformers are Deep Infinite-Dimensional Non-Mercer Binary Kernel Machines
Matthew A. Wright, Joseph E. Gonzalez
20 citations · 02 Jun 2021

Less is More: Pay Less Attention in Vision Transformers
Zizheng Pan, Bohan Zhuang, Haoyu He, Jing Liu, Jianfei Cai
ViT · 82 citations · 29 May 2021

KVT: k-NN Attention for Boosting Vision Transformers
Pichao Wang, Xue Wang, F. Wang, Ming Lin, Shuning Chang, Hao Li, R. L. Jin
ViT · 105 citations · 28 May 2021

Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding
Zizhao Zhang, Han Zhang, Long Zhao, Ting Chen, Sercan Ö. Arik, Tomas Pfister
ViT · 169 citations · 26 May 2021

Are Convolutional Neural Networks or Transformers more like human vision?
Shikhar Tuli, Ishita Dasgupta, Erin Grant, Thomas L. Griffiths
ViT, FaML · 182 citations · 15 May 2021

Graph Attention Networks with Positional Embeddings
Liheng Ma, Reihaneh Rabbany, Adriana Romero Soriano
GNN · 235 citations · 09 May 2021

Attention-based Stylisation for Exemplar Image Colourisation
Marc Górriz Blanch, Issa Khalifeh, Alan F. Smeaton, Noel E. O'Connor, M. Mrak
4 citations · 04 May 2021

MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy
2,606 citations · 04 May 2021

Deformable TDNN with adaptive receptive fields for speech recognition
Keyu An, Yi Zhang, Zhijian Ou
5 citations · 30 Apr 2021

Higher-Order Attribute-Enhancing Heterogeneous Graph Neural Networks
Jianxin Li, Hao Peng, Yuwei Cao, Yingtong Dou, Hekai Zhang, Philip S. Yu, Lifang He
79 citations · 16 Apr 2021

GAttANet: Global attention agreement for convolutional neural networks
R. V. Rullen, A. Alamia
ViT · 2 citations · 12 Apr 2021

Differentiable Patch Selection for Image Recognition
Jean-Baptiste Cordonnier, Aravindh Mahendran, Alexey Dosovitskiy, Dirk Weissenborn, Jakob Uszkoreit, Thomas Unterthiner
93 citations · 07 Apr 2021

LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference
Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze
ViT · 770 citations · 02 Apr 2021

Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman
VGen · 1,128 citations · 01 Apr 2021

Motion Guided Attention Fusion to Recognize Interactions from Videos
Tae Soo Kim, Jonathan D. Jones, Gregory Hager
15 citations · 01 Apr 2021

On the Robustness of Vision Transformers to Adversarial Examples
Kaleel Mahmood, Rigel Mahmood, Marten van Dijk
ViT · 217 citations · 31 Mar 2021

Understanding Robustness of Transformers for Image Classification
Srinadh Bhojanapalli, Ayan Chakrabarti, Daniel Glasner, Daliang Li, Thomas Unterthiner, Andreas Veit
ViT · 378 citations · 26 Mar 2021

An Image is Worth 16x16 Words, What is a Video Worth?
Gilad Sharir, Asaf Noy, Lihi Zelnik-Manor
ViT · 120 citations · 25 Mar 2021

Scaling Local Self-Attention for Parameter Efficient Visual Backbones
Ashish Vaswani, Prajit Ramachandran, A. Srinivas, Niki Parmar, Blake A. Hechtman, Jonathon Shlens
395 citations · 23 Mar 2021

ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
Stéphane d'Ascoli, Hugo Touvron, Matthew L. Leavitt, Ari S. Morcos, Giulio Biroli, Levent Sagun
ViT · 805 citations · 19 Mar 2021

Scalable Vision Transformers with Hierarchical Pooling
Zizheng Pan, Bohan Zhuang, Jing Liu, Haoyu He, Jianfei Cai
ViT · 126 citations · 19 Mar 2021

Involution: Inverting the Inherence of Convolution for Visual Recognition
Duo Li, Jie Hu, Changhu Wang, Xiangtai Li, Qi She, Lei Zhu, Tong Zhang, Qifeng Chen
BDL · 304 citations · 10 Mar 2021

Lipschitz Normalization for Self-Attention Layers with Application to Graph Neural Networks
George Dasoulas, Kevin Scaman, Aladin Virmaux
GNN · 40 citations · 08 Mar 2021

Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas
373 citations · 05 Mar 2021

Perceiver: General Perception with Iterative Attention
Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, João Carreira
VLM, ViT, MDE · 976 citations · 04 Mar 2021

Generative Adversarial Transformers
Drew A. Hudson, C. L. Zitnick
ViT · 179 citations · 01 Mar 2021

Conditional Positional Encodings for Vision Transformers
Xiangxiang Chu, Zhi Tian, Bo-Wen Zhang, Xinlong Wang, Chunhua Shen
ViT · 605 citations · 22 Feb 2021

UniT: Multimodal Multitask Learning with a Unified Transformer
Ronghang Hu, Amanpreet Singh
ViT · 295 citations · 22 Feb 2021

LambdaNetworks: Modeling Long-Range Interactions Without Attention
Irwan Bello
179 citations · 17 Feb 2021

Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius, Heng Wang, Lorenzo Torresani
ViT · 1,984 citations · 09 Feb 2021

Relaxed Transformer Decoders for Direct Action Proposal Generation
Jing Tan, Jiaqi Tang, Limin Wang, Gangshan Wu
ViT · 178 citations · 03 Feb 2021

CNN with large memory layers
R. Karimov, Yury Malkov, Karim Iskakov, Victor Lempitsky
0 citations · 27 Jan 2021

Spectral Leakage and Rethinking the Kernel Size in CNNs
Nergis Tomen, Jan van Gemert
AAML · 18 citations · 25 Jan 2021

Transformers in Vision: A Survey
Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, M. Shah
ViT · 2,431 citations · 04 Jan 2021

Training data-efficient image transformers & distillation through attention
Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou
ViT · 6,567 citations · 23 Dec 2020

ShineOn: Illuminating Design Choices for Practical Video-based Virtual Clothing Try-on
Gaurav Kuppa, Andrew Jong, Vera Liu, Ziwei Liu, Teng-Sheng Moh
CVBM · 19 citations · 18 Dec 2020

Toward Transformer-Based Object Detection
Josh Beal, Eric Kim, Eric Tzeng, Dong Huk Park, Andrew Zhai, Dmitry Kislyuk
ViT · 209 citations · 17 Dec 2020

GTA: Global Temporal Attention for Video Action Understanding
Bo He, Xitong Yang, Zuxuan Wu, Hao Chen, Ser-Nam Lim, Abhinav Shrivastava
ViT · 27 citations · 15 Dec 2020

Convolutional LSTM Neural Networks for Modeling Wildland Fire Dynamics
J. Burge, M. Bonanni, M. Ihme, Lily Hu
19 citations · 11 Dec 2020

SAFCAR: Structured Attention Fusion for Compositional Action Recognition
Tae Soo Kim, Gregory Hager
CoGe · 10 citations · 03 Dec 2020

Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction
Anzhu Yu, Wenyue Guo, Bing Liu, Xin Chen, Xin Wang, Xuefeng Cao, Bingchuan Jiang
3DV · 64 citations · 25 Nov 2020

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby
ViT · 39,330 citations · 22 Oct 2020

Group Equivariant Stand-Alone Self-Attention For Vision
David W. Romero, Jean-Baptiste Cordonnier
MDE · 57 citations · 02 Oct 2020

Multi-timescale Representation Learning in LSTM Language Models
Shivangi Mahto, Vy A. Vo, Javier S. Turek, Alexander G. Huth
29 citations · 27 Sep 2020