arXiv:2101.01169 (v5, latest)
Transformers in Vision: A Survey
4 January 2021
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
Fahad Shahbaz Khan
M. Shah
ViT
Papers citing "Transformers in Vision: A Survey"
50 / 263 papers shown
Title
Deep Amortized Clustering
Juho Lee
Yoonho Lee
Yee Whye Teh
FedML
53
22
0
30 Sep 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
357
945
0
24 Sep 2019
FreiHAND: A Dataset for Markerless Capture of Hand Pose and Shape from Single RGB Images
Christiane Zimmermann
Duygu Ceylan
Jimei Yang
Bryan C. Russell
Max Argus
Thomas Brox
3DH
242
409
0
10 Sep 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
184
1,668
0
22 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Tan
Mohit Bansal
VLM
MLLM
252
2,493
0
20 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL
VLM
MLLM
211
906
0
16 Aug 2019
Fusion of Detected Objects in Text for Visual Question Answering
Chris Alberti
Jeffrey Ling
Michael Collins
David Reitter
74
173
0
14 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
155
1,965
0
09 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
255
3,699
0
06 Aug 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
697
24,557
0
26 Jul 2019
A Short Note on the Kinetics-700 Human Action Dataset
João Carreira
Eric Noland
Chloe Hillier
Andrew Zisserman
82
457
0
15 Jul 2019
Stand-Alone Self-Attention in Vision Models
Prajit Ramachandran
Niki Parmar
Ashish Vaswani
Irwan Bello
Anselm Levskaya
Jonathon Shlens
VLM
SLR
ViT
107
1,216
0
13 Jun 2019
Contrastive Multiview Coding
Yonglong Tian
Dilip Krishnan
Phillip Isola
SSL
182
2,409
0
13 Jun 2019
Learning Representations by Maximizing Mutual Information Across Views
Philip Bachman
R. Devon Hjelm
William Buchwalter
SSL
195
1,479
0
03 Jun 2019
Generating Diverse High-Fidelity Images with VQ-VAE-2
Ali Razavi
Aaron van den Oord
Oriol Vinyals
DRL
BDL
151
1,828
0
02 Jun 2019
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Mingxing Tan
Quoc V. Le
3DV
MedIm
172
18,193
0
28 May 2019
Cross-Domain Transferability of Adversarial Perturbations
Muzammal Naseer
Salman H. Khan
M. H. Khan
Fahad Shahbaz Khan
Fatih Porikli
AAML
99
145
0
28 May 2019
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Elena Voita
David Talbot
F. Moiseev
Rico Sennrich
Ivan Titov
119
1,148
0
23 May 2019
Data-Efficient Image Recognition with Contrastive Predictive Coding
Olivier J. Hénaff
A. Srinivas
J. Fauw
Ali Razavi
Carl Doersch
S. M. Ali Eslami
Aaron van den Oord
SSL
144
1,432
0
22 May 2019
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
Sangdoo Yun
Dongyoon Han
Seong Joon Oh
Sanghyuk Chun
Junsuk Choe
Y. Yoo
OOD
624
4,809
0
13 May 2019
Video Instance Segmentation
Linjie Yang
Yuchen Fan
N. Xu
VOS
ISeg
88
510
0
12 May 2019
NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding
Jun Liu
Amir Shahroudy
Mauricio Perez
G. Wang
Ling-yu Duan
Alex C. Kot
96
1,294
0
12 May 2019
Local Relation Networks for Image Recognition
Han Hu
Zheng Zhang
Zhenda Xie
Stephen Lin
FAtt
101
501
0
25 Apr 2019
Generating Long Sequences with Sparse Transformers
R. Child
Scott Gray
Alec Radford
Ilya Sutskever
136
1,919
0
23 Apr 2019
Attention Augmented Convolutional Networks
Irwan Bello
Barret Zoph
Ashish Vaswani
Jonathon Shlens
Quoc V. Le
145
1,017
0
22 Apr 2019
Cross-Modal Self-Attention Network for Referring Image Segmentation
Linwei Ye
Mrigank Rochan
Zhi Liu
Yang Wang
EgoV
65
478
0
09 Apr 2019
An Attentive Survey of Attention Models
S. Chaudhari
Varun Mithal
Gungor Polatkan
R. Ramanath
159
662
0
05 Apr 2019
VideoBERT: A Joint Model for Video and Language Representation Learning
Chen Sun
Austin Myers
Carl Vondrick
Kevin Patrick Murphy
Cordelia Schmid
VLM
SSL
82
1,250
0
03 Apr 2019
Single Path One-Shot Neural Architecture Search with Uniform Sampling
Zichao Guo
Xiangyu Zhang
Haoyuan Mu
Wen Heng
Zechun Liu
Yichen Wei
Jian Sun
106
940
0
31 Mar 2019
Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples
Eleni Triantafillou
Tyler Lixuan Zhu
Vincent Dumoulin
Pascal Lamblin
Utku Evci
...
Ross Goroshin
Carles Gelada
Kevin Swersky
Pierre-Antoine Manzagol
Hugo Larochelle
153
620
0
07 Mar 2019
Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey
Longlong Jing
Yingli Tian
SSL
180
1,703
0
16 Feb 2019
The Evolved Transformer
David R. So
Chen Liang
Quoc V. Le
ViT
126
465
0
30 Jan 2019
On the Turing Completeness of Modern Neural Network Architectures
Jorge A. Pérez
Javier Marinkovic
Pablo Barceló
BDL
73
146
0
10 Jan 2019
Residual Dense Network for Image Restoration
Yulun Zhang
Yapeng Tian
Yu Kong
Bineng Zhong
Y. Fu
SupR
83
725
0
25 Dec 2018
Grounded Video Description
Luowei Zhou
Yannis Kalantidis
Xinlei Chen
Jason J. Corso
Marcus Rohrbach
83
193
0
17 Dec 2018
Few-Shot Learning via Embedding Adaptation with Set-to-Set Functions
Han-Jia Ye
Hexiang Hu
De-Chuan Zhan
Fei Sha
138
667
0
10 Dec 2018
Video Action Transformer Network
Rohit Girdhar
João Carreira
Carl Doersch
Andrew Zisserman
ViT
148
709
0
06 Dec 2018
CCNet: Criss-Cross Attention for Semantic Segmentation
Zilong Huang
Xinggang Wang
Yunchao Wei
Lichao Huang
Humphrey Shi
Wenyu Liu
Chang Huang
VOS
212
2,552
0
28 Nov 2018
Self-Supervised Spatiotemporal Feature Learning via Video Rotation Prediction
Longlong Jing
Xiaodong Yang
Jingen Liu
Yingli Tian
71
157
0
28 Nov 2018
From Recognition to Cognition: Visual Commonsense Reasoning
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
LRM
BDL
OCL
ReLM
184
883
0
27 Nov 2018
An Introductory Survey on Attention Mechanisms in NLP Problems
Dichao Hu
AIMat
77
247
0
12 Nov 2018
Cross and Learn: Cross-Modal Self-Supervision
Nawid Sayed
Biagio Brattoli
Bjorn Ommer
SSL
75
78
0
09 Nov 2018
A Corpus for Reasoning About Natural Language Grounded in Photographs
Alane Suhr
Stephanie Zhou
Ally Zhang
Iris Zhang
Huajun Bai
Yoav Artzi
LRM
108
608
0
01 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
95,229
0
11 Oct 2018
Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks
Juho Lee
Yoonho Lee
Jungtaek Kim
Adam R. Kosiorek
Seungjin Choi
Yee Whye Teh
132
275
0
01 Oct 2018
ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks
Xintao Wang
Ke Yu
Shixiang Wu
Jinjin Gu
Yihao Liu
Chao Dong
Chen Change Loy
Yu Qiao
Xiaoou Tang
178
3,740
0
01 Sep 2018
Video Jigsaw: Unsupervised Learning of Spatiotemporal Context for Video Action Recognition
Unaiza Ahsan
Rishi Madhok
Irfan Essa
SSL
58
109
0
22 Aug 2018
Image Super-Resolution Using Very Deep Residual Channel Attention Networks
Yulun Zhang
Kunpeng Li
Kai Li
Lichen Wang
Bineng Zhong
Y. Fu
SupR
105
4,333
0
08 Jul 2018
Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization
Bruno Korbar
Du Tran
Lorenzo Torresani
107
476
0
30 Jun 2018
Scaling Neural Machine Translation
Myle Ott
Sergey Edunov
David Grangier
Michael Auli
AIMat
192
615
0
01 Jun 2018