Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2101.11605
Cited By
v1
v2 (latest)
Bottleneck Transformers for Visual Recognition
27 January 2021
A. Srinivas
Nayeon Lee
Niki Parmar
Jonathon Shlens
Pieter Abbeel
Ashish Vaswani
SLR
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Bottleneck Transformers for Visual Recognition"
50 / 339 papers shown
Title
Dynamically Decoding Source Domain Knowledge for Domain Generalization
Cuicui Kang
Karthik Nandakumar
OOD
ViT
131
1
0
06 Oct 2021
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
Sachin Mehta
Mohammad Rastegari
ViT
302
1,300
0
05 Oct 2021
GT U-Net: A U-Net Like Group Transformer Network for Tooth Root Segmentation
Yunxiang Li
Shuai Wang
Jun Wang
G. Zeng
Wenjun Liu
Qianni Zhang
Qun Jin
Yaqi Wang
ViT
MedIm
72
50
0
30 Sep 2021
UFO-ViT: High Performance Linear Vision Transformer without Softmax
Jeonggeun Song
ViT
175
21
0
29 Sep 2021
A hierarchical residual network with compact triplet-center loss for sketch recognition
Lei Wang
Shihui Zhang
Huang He
Xiaoxia Zhang
Y. Sang
63
2
0
28 Sep 2021
DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers
Changlin Li
Guangrun Wang
Bing Wang
Xiaodan Liang
Zhihui Li
Xiaojun Chang
96
9
0
21 Sep 2021
UNetFormer: A UNet-like Transformer for Efficient Semantic Segmentation of Remote Sensing Urban Scene Imagery
Libo Wang
Rui Li
Ce Zhang
Shenghui Fang
Chenxi Duan
Xiaoliang Meng
P. M. Atkinson
ViT
127
683
0
18 Sep 2021
Compute and Energy Consumption Trends in Deep Learning Inference
Radosvet Desislavov
Fernando Martínez-Plumed
José Hernández-Orallo
77
119
0
12 Sep 2021
Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation
Ziniu Wan
Zhengjia Li
Maoqing Tian
Jianbo Liu
Shuai Yi
Hongsheng Li
3DH
80
82
0
06 Sep 2021
Searching for Efficient Multi-Stage Vision Transformers
Yi-Lun Liao
S. Karaman
Vivienne Sze
ViT
78
19
0
01 Sep 2021
Efficient conformer: Progressive downsampling and grouped attention for automatic speech recognition
Maxime Burchi
Valentin Vielzeuf
73
88
0
31 Aug 2021
Discovering Spatial Relationships by Transformers for Domain Generalization
Cuicui Kang
Karthik Nandakumar
ViT
36
0
0
23 Aug 2021
Mobile-Former: Bridging MobileNet and Transformer
Yinpeng Chen
Xiyang Dai
Dongdong Chen
Mengchen Liu
Xiaoyi Dong
Lu Yuan
Zicheng Liu
ViT
278
494
0
12 Aug 2021
Learning Fair Face Representation With Progressive Cross Transformer
Yong Li
Yufei Sun
Zhen Cui
Shiguang Shan
Jian Yang
77
11
0
11 Aug 2021
CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention
Wenxiao Wang
Lulian Yao
Long Chen
Binbin Lin
Deng Cai
Xiaofei He
Wei Liu
209
273
0
31 Jul 2021
DPT: Deformable Patch-based Transformer for Visual Recognition
Zhiyang Chen
Yousong Zhu
Chaoyang Zhao
Guosheng Hu
Wei Zeng
Jinqiao Wang
Ming Tang
ViT
70
101
0
30 Jul 2021
Rethinking and Improving Relative Position Encoding for Vision Transformer
Kan Wu
Houwen Peng
Minghao Chen
Jianlong Fu
Hongyang Chao
ViT
120
340
0
29 Jul 2021
Contextual Transformer Networks for Visual Recognition
Yehao Li
Ting Yao
Yingwei Pan
Tao Mei
ViT
108
494
0
26 Jul 2021
Query2Label: A Simple Transformer Way to Multi-Label Classification
Shilong Liu
Lei Zhang
Xiao Yang
Hang Su
Jun Zhu
73
193
0
22 Jul 2021
EAN: Event Adaptive Network for Enhanced Action Recognition
Yuan Tian
Yichao Yan
Guangtao Zhai
G. Guo
Zhiyong Gao
81
42
0
22 Jul 2021
CycleMLP: A MLP-like Architecture for Dense Prediction
Shoufa Chen
Enze Xie
Chongjian Ge
Runjian Chen
Ding Liang
Ping Luo
151
235
0
21 Jul 2021
A Comparative Study of Deep Learning Classification Methods on a Small Environmental Microorganism Image Dataset (EMDS-6): from Convolutional Neural Networks to Visual Transformers
Penghui Zhao
Chen Li
M. Rahaman
Hao Xu
Hechen Yang
Hongzan Sun
Tao Jiang
M. Grzegorzek
VLM
92
42
0
16 Jul 2021
Visual Parser: Representing Part-whole Hierarchies with Transformers
Shuyang Sun
Xiaoyu Yue
S. Bai
Philip Torr
128
27
0
13 Jul 2021
LANA: Latency Aware Network Acceleration
Pavlo Molchanov
Jimmy Hall
Hongxu Yin
Jan Kautz
Nicolò Fusi
Arash Vahdat
139
10
0
12 Jul 2021
Locally Enhanced Self-Attention: Combining Self-Attention and Convolution as Local and Context Terms
Chenglin Yang
Siyuan Qiao
Adam Kortylewski
Alan Yuille
144
4
0
12 Jul 2021
U-Net with Hierarchical Bottleneck Attention for Landmark Detection in Fundus Images of the Degenerated Retina
Shuyun Tang
Z. Qi
Jacob Granley
M. Beyeler
58
10
0
09 Jul 2021
GLiT: Neural Architecture Search for Global and Local Image Transformer
Boyu Chen
Peixia Li
Chuming Li
Baopu Li
Lei Bai
Chen Lin
Ming Sun
Junjie Yan
Wanli Ouyang
ViT
120
86
0
07 Jul 2021
Gaze Estimation with an Ensemble of Four Architectures
Xin Cai
Boyu Chen
Jiabei Zeng
Jiajun Zhang
Yunjia Sun
X. Wang
Zhilong Ji
Xiao-Chang Liu
Xilin Chen
Shiguang Shan
CVBM
70
13
0
05 Jul 2021
AutoFormer: Searching Transformers for Visual Recognition
Minghao Chen
Houwen Peng
Jianlong Fu
Haibin Ling
ViT
104
268
0
01 Jul 2021
Focal Self-attention for Local-Global Interactions in Vision Transformers
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Xiyang Dai
Bin Xiao
Lu Yuan
Jianfeng Gao
ViT
86
437
0
01 Jul 2021
TransSC: Transformer-based Shape Completion for Grasp Evaluation
Wenkai Chen
Hongzhuo Liang
Zhaopeng Chen
F. Sun
Jianwei Zhang
ViT
49
15
0
01 Jul 2021
ViTAS: Vision Transformer Architecture Search
Xiu Su
Shan You
Jiyang Xie
Mingkai Zheng
Fei Wang
Chao Qian
Changshui Zhang
Xiaogang Wang
Chang Xu
ViT
104
55
0
25 Jun 2021
VOLO: Vision Outlooker for Visual Recognition
Li-xin Yuan
Qibin Hou
Zihang Jiang
Jiashi Feng
Shuicheng Yan
ViT
135
328
0
24 Jun 2021
Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition
Qibin Hou
Zihang Jiang
Li-xin Yuan
Mingg-Ming Cheng
Shuicheng Yan
Jiashi Feng
ViT
MLLM
140
208
0
23 Jun 2021
Probabilistic Attention for Interactive Segmentation
Prasad Gabbur
Manjot Bilkhu
J. Movellan
103
13
0
23 Jun 2021
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
Michael S. Ryoo
A. Piergiovanni
Anurag Arnab
Mostafa Dehghani
A. Angelova
ViT
153
129
0
21 Jun 2021
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
Andreas Steiner
Alexander Kolesnikov
Xiaohua Zhai
Ross Wightman
Jakob Uszkoreit
Lucas Beyer
ViT
169
639
0
18 Jun 2021
Scene Transformer: A unified architecture for predicting multiple agent trajectories
Jiquan Ngiam
Benjamin Caine
Vijay Vasudevan
Zhengdong Zhang
H. Chiang
...
Ashish Venugopal
David J. Weiss
Benjamin Sapp
Zhifeng Chen
Jonathon Shlens
129
168
0
15 Jun 2021
Space-time Mixing Attention for Video Transformer
Adrian Bulat
Juan-Manuel Perez-Rua
Swathikiran Sudhakaran
Brais Martínez
Georgios Tzimiropoulos
ViT
95
127
0
10 Jun 2021
Transformed CNNs: recasting pre-trained convolutional layers with self-attention
Stéphane dÁscoli
Levent Sagun
Giulio Biroli
Ari S. Morcos
ViT
56
6
0
10 Jun 2021
CoAtNet: Marrying Convolution and Attention for All Data Sizes
Zihang Dai
Hanxiao Liu
Quoc V. Le
Mingxing Tan
ViT
153
1,223
0
09 Jun 2021
Scaling Vision Transformers
Xiaohua Zhai
Alexander Kolesnikov
N. Houlsby
Lucas Beyer
ViT
168
1,101
0
08 Jun 2021
Refiner: Refining Self-attention for Vision Transformers
Daquan Zhou
Yujun Shi
Bingyi Kang
Weihao Yu
Zihang Jiang
Yuan Li
Xiaojie Jin
Qibin Hou
Jiashi Feng
ViT
96
62
0
07 Jun 2021
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias
Yufei Xu
Qiming Zhang
Jing Zhang
Dacheng Tao
ViT
213
342
0
07 Jun 2021
Vision Transformers with Hierarchical Attention
Yun-Hai Liu
Yu-Huan Wu
Guolei Sun
Le Zhang
Ajad Chhatkuli
Luc Van Gool
ViT
87
39
0
06 Jun 2021
RegionViT: Regional-to-Local Attention for Vision Transformers
Chun-Fu Chen
Yikang Shen
Quanfu Fan
ViT
148
200
0
04 Jun 2021
X-volution: On the unification of convolution and self-attention
Xuanhong Chen
Hang Wang
Bingbing Ni
ViT
53
25
0
04 Jun 2021
A Comparison for Anti-noise Robustness of Deep Learning Classification Methods on a Tiny Object Image Dataset: from Convolutional Neural Network to Visual Transformer and Performer
Ao Chen
Chen Li
Hao Chen
Hechen Yang
Penghui Zhao
Weiming Hu
Wanli Liu
Shuojia Zou
M. Grzegorzek
42
2
0
03 Jun 2021
Attention mechanisms and deep learning for machine vision: A survey of the state of the art
A. M. Hafiz
S. A. Parah
R. A. Bhat
93
45
0
03 Jun 2021
Container: Context Aggregation Network
Peng Gao
Jiasen Lu
Hongsheng Li
Roozbeh Mottaghi
Aniruddha Kembhavi
ViT
106
72
0
02 Jun 2021
Previous
1
2
3
4
5
6
7
Next