An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

22 October 2020
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
Thomas Unterthiner
Mostafa Dehghani
Matthias Minderer
Georg Heigold
Sylvain Gelly
Jakob Uszkoreit
Neil Houlsby
    ViT
arXiv: 2010.11929
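
As the title suggests, ViT treats an image as a sequence of fixed-size 16x16 patches ("words") that are linearly embedded and fed to a standard Transformer encoder. The snippet below is a minimal NumPy sketch of that patch-embedding step only, under illustrative assumptions: the dimensions are hypothetical, a random projection stands in for ViT's learned linear embedding, and the class token, position embeddings, and the encoder itself are omitted.

```python
import numpy as np

def image_to_patch_tokens(image, patch_size=16, embed_dim=768, seed=0):
    """Split an (H, W, C) image into non-overlapping patches and embed each patch.

    Illustrative only: each 16x16 patch becomes one token ("word") for a
    Transformer encoder. A random projection stands in for ViT's learned
    linear embedding.
    """
    H, W, C = image.shape
    assert H % patch_size == 0 and W % patch_size == 0, "image size must be a multiple of patch_size"
    n_h, n_w = H // patch_size, W // patch_size
    # (H, W, C) -> (n_h, n_w, patch_size, patch_size, C) -> (num_patches, patch_dim)
    patches = image.reshape(n_h, patch_size, n_w, patch_size, C).transpose(0, 2, 1, 3, 4)
    patches = patches.reshape(n_h * n_w, patch_size * patch_size * C)
    rng = np.random.default_rng(seed)
    projection = rng.standard_normal((patches.shape[1], embed_dim))  # stand-in for the learned weights
    return patches @ projection  # (num_patches, embed_dim) token sequence

# A 224x224 RGB image yields 14 x 14 = 196 patch tokens of dimension 768.
tokens = image_to_patch_tokens(np.zeros((224, 224, 3)))
print(tokens.shape)  # (196, 768)
```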

Papers citing "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"

50 / 1,173 papers shown
Deep Learning for Cross-Domain Few-Shot Visual Recognition: A Survey
Huali Xu
Shuaifeng Zhi
Shuzhou Sun
Vishal M. Patel
Li Liu
81
14
0
15 Mar 2023
Oriented Object Detection in Optical Remote Sensing Images using Deep Learning: A Survey
Kunlin Wang
Zi Wang
Zhang Li
Ang Su
Xichao Teng
Minhao Liu
Qifeng Yu
ObjD
119
9
0
21 Feb 2023
Transformadores: Fundamentos teoricos y Aplicaciones [Transformers: Theoretical Foundations and Applications]
J. D. L. Torre
104
0
0
18 Feb 2023
Less is More: The Influence of Pruning on the Explainability of CNNs
David Weber
F. Merkle
Pascal Schöttle
Stephan Schlögl
Martin Nocker
FAtt
101
1
0
17 Feb 2023
Learning a Consensus Sub-Network with Polarization Regularization and One Pass Training
Xiaoying Zhi
Varun Babbar
P. Sun
Fran Silavong
Ruibo Shi
Sean J. Moran
73
1
0
17 Feb 2023
MVTN: Learning Multi-View Transformations for 3D Understanding
Abdullah Hamdi
Faisal AlZahrani
Silvio Giancola
Guohao Li
3DV
3DPC
72
6
0
27 Dec 2022
SAIF: Sparse Adversarial and Imperceptible Attack Framework
Tooba Imtiaz
Morgan Kohler
Jared Miller
Zifeng Wang
Octavia Camps
Mario Sznaier
Jennifer Dy
AAML
55
0
0
14 Dec 2022
Explaining Deep Convolutional Neural Networks for Image Classification by Evolving Local Interpretable Model-agnostic Explanations
Bin Wang
Wenbin Pei
Bing Xue
Mengjie Zhang
FAtt
102
3
0
28 Nov 2022
Normal Transformer: Extracting Surface Geometry from LiDAR Points Enhanced by Visual Semantics
Ancheng Lin
Jun Yu Li
Yusheng Xiang
Wei Bian
Mukesh Prasad
3DPC
ViT
87
2
0
19 Nov 2022
Transfer-learning for video classification: Video Swin Transformer on multiple domains
Daniel de Oliveira
David Martins de Matos
ViT
43
0
0
18 Oct 2022
ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
Haoran You
Zhanyi Sun
Huihong Shi
Zhongzhi Yu
Yang Zhao
Yongan Zhang
Chaojian Li
Baopu Li
Yingyan Lin
ViT
51
82
0
18 Oct 2022
CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling
Jinchao Zhang
Shuyang Jiang
Jiangtao Feng
Lin Zheng
Dianbo Sui
3DV
91
9
0
14 Oct 2022
A systematic review of the use of Deep Learning in Satellite Imagery for Agriculture
Brandon Victor
Zhen He
Aiden Nibali
65
9
0
03 Oct 2022
On the Shift Invariance of Max Pooling Feature Maps in Convolutional Neural Networks
Hubert Leterme
K. Polisano
V. Perrier
Karteek Alahari
FAtt
79
2
0
19 Sep 2022
Unsupervised Domain Adaptation via Style-Aware Self-intermediate Domain
Lianyu Wang
Meng Wang
Daoqiang Zhang
Huazhu Fu
48
2
0
05 Sep 2022
Toward Interpretable Sleep Stage Classification Using Cross-Modal Transformers
Jathurshan Pradeepkumar
Mithunjha Anandakumar
Vinith Kugathasan
Dhinesh Suntharalingham
S. L. Kappel
A. D. Silva
Chamira U. S. Edussooriya
52
31
0
15 Aug 2022
Masked Autoencoders are Robust Data Augmentors
Haohang Xu
Shuangrui Ding
Xiaopeng Zhang
H. Xiong
72
27
0
10 Jun 2022
Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
Jun Chen
Ming Hu
Boyang Albert Li
Mohamed Elhoseiny
82
36
0
01 Jun 2022
Vision-Language Pre-Training with Triple Contrastive Learning
Jinyu Yang
Jiali Duan
Son N. Tran
Yi Xu
Sampath Chanda
Liqun Chen
Belinda Zeng
Trishul Chilimbi
Junzhou Huang
VLM
83
290
0
21 Feb 2022
Navigating Neural Space: Revisiting Concept Activation Vectors to Overcome Directional Divergence
Frederik Pahde
Maximilian Dreyer
Leander Weber
Moritz Weckbecker
Christopher J. Anders
Thomas Wiegand
Wojciech Samek
Sebastian Lapuschkin
91
9
0
07 Feb 2022
Co-domain Symmetry for Complex-Valued Deep Learning
Utkarsh Singhal
Yifei Xing
Stella X. Yu
58
12
0
02 Dec 2021
CLIP-Adapter: Better Vision-Language Models with Feature Adapters
Peng Gao
Shijie Geng
Renrui Zhang
Teli Ma
Rongyao Fang
Yongfeng Zhang
Hongsheng Li
Yu Qiao
VLM
CLIP
197
1,011
0
09 Oct 2021
Benchmarking the Robustness of Instance Segmentation Models
Said Fahri Altindis
Yusuf Dalva
Hamza Pehlivan
Aysegül Dündar
VLM
OOD
115
12
0
02 Sep 2021
Transformer-based deep imitation learning for dual-arm robot manipulation
Heecheol Kim
Yoshiyuki Ohmura
Yasuo Kuniyoshi
68
48
0
01 Aug 2021
TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up
Yifan Jiang
Shiyu Chang
Zhangyang Wang
ViT
64
389
0
14 Feb 2021
Representation Matters: Offline Pretraining for Sequential Decision Making
Mengjiao Yang
Ofir Nachum
SSL
OffRL
57
119
0
11 Feb 2021
On Robustness and Transferability of Convolutional Neural Networks
Josip Djolonga
Jessica Yung
Michael Tschannen
Rob Romijnders
Lucas Beyer
...
D. Moldovan
Sylvain Gelly
N. Houlsby
Xiaohua Zhai
Mario Lucic
OOD
35
154
0
16 Jul 2020
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin
HyoukJoong Lee
Yuanzhong Xu
Dehao Chen
Orhan Firat
Yanping Huang
M. Krikun
Noam M. Shazeer
Zhifeng Chen
MoE
76
1,142
0
30 Jun 2020
Object-Centric Learning with Slot Attention
Francesco Locatello
Dirk Weissenborn
Thomas Unterthiner
Aravindh Mahendran
G. Heigold
Jakob Uszkoreit
Alexey Dosovitskiy
Thomas Kipf
OCL
158
832
0
26 Jun 2020
Are we done with ImageNet?
Lucas Beyer
Olivier J. Hénaff
Alexander Kolesnikov
Xiaohua Zhai
Aaron van den Oord
VLM
105
398
0
12 Jun 2020
Visual Transformers: Token-based Image Representation and Processing for Computer Vision
Bichen Wu
Chenfeng Xu
Xiaoliang Dai
Alvin Wan
Peizhao Zhang
Zhicheng Yan
Masayoshi Tomizuka
Joseph E. Gonzalez
Kurt Keutzer
Peter Vajda
ViT
65
553
0
05 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
467
41,106
0
28 May 2020
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT
3DV
PINN
275
12,847
0
26 May 2020
Quantifying Attention Flow in Transformers
Samira Abnar
Willem H. Zuidema
100
786
0
02 May 2020
Exploring Self-attention for Image Recognition
Hengshuang Zhao
Jiaya Jia
V. Koltun
SSL
78
778
0
28 Apr 2020
Fixing the train-test resolution discrepancy: FixEfficientNet
Hugo Touvron
Andrea Vedaldi
Matthijs Douze
Hervé Jégou
AAML
223
110
0
18 Mar 2020
Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation
Huiyu Wang
Yukun Zhu
Bradley Green
Hartwig Adam
Alan Yuille
Liang-Chieh Chen
3DPC
93
669
0
17 Mar 2020
A Simple Framework for Contrastive Learning of Visual Representations
Ting Chen
Simon Kornblith
Mohammad Norouzi
Geoffrey E. Hinton
SSL
194
18,523
0
13 Feb 2020
Big Transfer (BiT): General Visual Representation Learning
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
J. Puigcerver
Jessica Yung
Sylvain Gelly
N. Houlsby
MQ
211
1,196
0
24 Dec 2019
Axial Attention in Multidimensional Transformers
Jonathan Ho
Nal Kalchbrenner
Dirk Weissenborn
Tim Salimans
72
525
0
20 Dec 2019
Self-Supervised Learning of Video-Induced Visual Invariances
Michael Tschannen
Josip Djolonga
Marvin Ritter
Aravindh Mahendran
Xiaohua Zhai
N. Houlsby
Sylvain Gelly
Mario Lucic
SSL
74
61
0
05 Dec 2019
Momentum Contrast for Unsupervised Visual Representation Learning
Kaiming He
Haoqi Fan
Yuxin Wu
Saining Xie
Ross B. Girshick
SSL
107
11,959
0
13 Nov 2019
Self-training with Noisy Student improves ImageNet classification
Qizhe Xie
Minh-Thang Luong
Eduard H. Hovy
Quoc V. Le
NoLa
211
2,375
0
11 Nov 2019
On the Relationship between Self-Attention and Convolutional Layers
Jean-Baptiste Cordonnier
Andreas Loukas
Martin Jaggi
87
530
0
08 Nov 2019
A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark
Xiaohua Zhai
J. Puigcerver
Alexander Kolesnikov
P. Ruyssen
C. Riquelme
...
Michael Tschannen
Marcin Michalski
Olivier Bousquet
Sylvain Gelly
N. Houlsby
SSL
60
432
0
01 Oct 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
104
1,939
0
09 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
192
3,659
0
06 Aug 2019
Fixing the train-test resolution discrepancy
Hugo Touvron
Andrea Vedaldi
Matthijs Douze
Hervé Jégou
102
423
0
14 Jun 2019
Stand-Alone Self-Attention in Vision Models
Prajit Ramachandran
Niki Parmar
Ashish Vaswani
Irwan Bello
Anselm Levskaya
Jonathon Shlens
VLM
SLR
ViT
60
1,208
0
13 Jun 2019
Scaling Autoregressive Video Models
Dirk Weissenborn
Oscar Täckström
Jakob Uszkoreit
DiffM
VGen
66
200
0
06 Jun 2019