2010.11929
Cited By
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
22 October 2020
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
Thomas Unterthiner
Mostafa Dehghani
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
Papers citing
"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"
50 / 1,173 papers shown
Title
Subjective Face Transform using Human First Impressions
Chaitanya Roygaga
Joshua Krinsky
Kai Zhang
Kenny Kwok
Aparna Bharati
CVBM
66
0
0
27 Sep 2023
Towards Real-World Test-Time Adaptation: Tri-Net Self-Training with Balanced Normalization
Yongyi Su
Xun Xu
Kui Jia
TTA
111
24
0
26 Sep 2023
TouchUp-G: Improving Feature Representation through Graph-Centric Finetuning
Jing Zhu
Xiang Song
V. Ioannidis
Danai Koutra
Christos Faloutsos
103
14
0
25 Sep 2023
Associative Transformer
Yuwei Sun
H. Ochiai
Zhirong Wu
Stephen Lin
Ryota Kanai
ViT
86
0
0
22 Sep 2023
Ano-SuPs: Multi-size anomaly detection for manufactured products by identifying suspected patches
Hao Xu
Juan Du
Andi Wang
YingCong Chen
49
1
0
20 Sep 2023
PanopticNeRF-360: Panoramic 3D-to-2D Label Transfer in Urban Scenes
Xiao Fu
Shangzhan Zhang
Tianrun Chen
Yichong Lu
Xiaowei Zhou
Andreas Geiger
Yiyi Liao
3DPC
44
8
0
19 Sep 2023
Interpretability-Aware Vision Transformer
Yao Qiang
Chengyin Li
Prashant Khanduri
D. Zhu
ViT
137
7
0
14 Sep 2023
PILOT: A Pre-Trained Model-Based Continual Learning Toolbox
Hai-Long Sun
Da-Wei Zhou
Han-Jia Ye
De-Chuan Zhan
CLL
175
29
0
13 Sep 2023
Training Acceleration of Low-Rank Decomposed Networks using Sequential Freezing and Rank Quantization
Habib Hajimolahoseini
Walid Ahmed
Yang Liu
OffRL
MQ
42
7
0
07 Sep 2023
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
Lin Geng Foo
Hossein Rahmani
Jing Liu
124
31
0
27 Aug 2023
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
Tobias Christian Nauen
Sebastián M. Palacio
Federico Raue
Andreas Dengel
71
4
0
18 Aug 2023
Exploring Part-Informed Visual-Language Learning for Person Re-Identification
Y. Lin
Cong Liu
Yehansen Chen
Jinshui Hu
Bing Yin
Baocai Yin
Zengfu Wang
109
7
0
04 Aug 2023
UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming
Hao Lin
Ke Wu
Jie Li
Jun Yu Li
Wu-Jun Li
58
2
0
31 Jul 2023
An Empirical Study of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration
Hiroki Naganuma
Ryuichiro Hataya
Kotaro Yoshida
Ioannis Mitliagkas
OODD
140
3
0
17 Jul 2023
TVPR: Text-to-Video Person Retrieval and a New Benchmark
Fan Ni
Xu Zhang
Jianhui Wu
Guan-Nan Dong
Aichun Zhu
Hui Liu
Yue Zhang
70
0
0
14 Jul 2023
Complementary Frequency-Varying Awareness Network for Open-Set Fine-Grained Image Recognition
Qiulei Dong
Hong Wang
54
0
0
14 Jul 2023
Linear Alignment of Vision-language Models for Image Captioning
Fabian Paischer
M. Hofmarcher
Sepp Hochreiter
Thomas Adler
CLIP
VLM
95
0
0
10 Jul 2023
All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment
Chunhui Zhang
Xin Sun
Li Liu
Yiqian Yang
Qiong Liu
Xiaoping Zhou
Yanfeng Wang
117
15
0
07 Jul 2023
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
Rui Sun
Zhecan Wang
Haoxuan You
Noel Codella
Kai-Wei Chang
Shih-Fu Chang
CLIP
57
3
0
03 Jul 2023
Review helps learn better: Temporal Supervised Knowledge Distillation
Dongwei Wang
Zhi Han
Yanmei Wang
Xi’ai Chen
Baichen Liu
Yandong Tang
96
1
0
03 Jul 2023
Long-Tailed Continual Learning For Visual Food Recognition
Jiangpeng He
Luotao Lin
Jack Ma
H. Eicher-Miller
Fengqing Zhu
89
14
0
01 Jul 2023
Counting Guidance for High Fidelity Text-to-Image Synthesis
Wonjune Kang
Kevin Galim
H. Koo
Nam Ik Cho
DiffM
67
9
0
30 Jun 2023
When Foundation Model Meets Federated Learning: Motivations, Challenges, and Future Directions
Weiming Zhuang
Chen Chen
Lingjuan Lyu
Chong Chen
Yaochu Jin
AIFin
AI4CE
118
93
0
27 Jun 2023
Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning
Kuan-Fu Ding
Jingyang Li
Kim-Chuan Toh
64
8
0
26 Jun 2023
Quilt-1M: One Million Image-Text Pairs for Histopathology
Wisdom O. Ikezogwo
M. S. Seyfioglu
Fatemeh Ghezloo
Dylan Stefan Chan Geva
Fatwir Sheikh Mohammed
Pavan Kumar Anand
Ranjay Krishna
Linda G. Shapiro
CLIP
VLM
221
122
0
20 Jun 2023
TransRef: Multi-Scale Reference Embedding Transformer for Reference-Guided Image Inpainting
Taorong Liu
Liang Liao
Delin Chen
Jing Xiao
Zheng Wang
Chia-Wen Lin
Shin'ichi Satoh
ViT
DiffM
73
6
0
20 Jun 2023
RedMotion: Motion Prediction via Redundancy Reduction
Royden Wagner
Omer Sahin Tas
Marvin Klemp
Carlos Fernandez Lopez
Christoph Stiller
85
8
0
19 Jun 2023
Variational Positive-incentive Noise: How Noise Benefits Models
Hongyuan Zhang
Si-Ying Huang
Yubin Guo
Xuelong Li
49
9
0
13 Jun 2023
Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training
Lorenzo Baraldi
Roberto Amoroso
Marcella Cornia
Lorenzo Baraldi
Andrea Pilzer
Rita Cucchiara
89
2
0
12 Jun 2023
Valley: Video Assistant with Large Language model Enhanced abilitY
Ruipu Luo
Ziwang Zhao
Min Yang
Junwei Dong
Da Li
Pengcheng Lu
Tao Wang
Linmei Hu
Ming-Hui Qiu
MLLM
78
195
0
12 Jun 2023
Transferring Foundation Models for Generalizable Robotic Manipulation
Jiange Yang
Wenhui Tan
Chuhao Jin
Keling Yao
Bei Liu
Jianlong Fu
Ruihua Song
Gangshan Wu
Limin Wang
LM&Ro
85
7
0
09 Jun 2023
Lightweight Vision Transformer with Bidirectional Interaction
Qihang Fan
Huaibo Huang
Xiaoqiang Zhou
Ran He
ViT
81
28
0
01 Jun 2023
Learning Task-preferred Inference Routes for Gradient De-conflict in Multi-output DNNs
Yi Sun
Xin Xu
Jiaqiang Li
Xiaochang Hu
Yifei Shi
L. Zeng
117
2
0
31 May 2023
Learning without Forgetting for Vision-Language Models
Da-Wei Zhou
Yuanhan Zhang
Jingyi Ning
De-Chuan Zhan
Ziwei Liu
VLM
CLL
102
41
0
30 May 2023
VDD: Varied Drone Dataset for Semantic Segmentation
Wenxiao Cai
Ke Jin
Jinyan Hou
Cong Guo
Letian Wu
Wankou Yang
68
11
0
23 May 2023
Generalizable Synthetic Image Detection via Language-guided Contrastive Learning
Haiwei Wu
Jiantao Zhou
Shile Zhang
139
30
0
23 May 2023
PastNet: Introducing Physical Inductive Biases for Spatio-temporal Video Prediction
Hao Wu
Wei Xiong
Fan Xu
Xian-Sheng Hua
C. L. Philip Chen
AI4TS
114
28
0
19 May 2023
Transformer-Based Model for Monocular Visual Odometry: A Video Understanding Approach
André O. Françani
Marcos R. O. A. Máximo
50
8
0
10 May 2023
Similarity of Neural Network Models: A Survey of Functional and Representational Measures
Max Klabunde
Tobias Schumacher
M. Strohmaier
Florian Lemmerich
97
67
0
10 May 2023
A vector quantized masked autoencoder for audiovisual speech emotion recognition
Samir Sadok
Simon Leglaive
Renaud Séguier
SSL
94
6
0
05 May 2023
Semantically Structured Image Compression via Irregular Group-Based Decoupling
V. Sheoran
Yixin Gao
Shreyansh Joshi
Tanisha R. Bhayani
Zhibo Chen
114
13
0
04 May 2023
An automated end-to-end deep learning-based framework for lung cancer diagnosis by detecting and classifying the lung nodules
Samiul Based Shuvo
Tasnia Binte Mamun
124
3
0
28 Apr 2023
VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs
Moayed Haji-Ali
Andrew Bond
Tolga Birdal
Duygu Ceylan
Levent Karacan
Erkut Erdem
Aykut Erdem
VGen
DiffM
159
2
0
12 Apr 2023
Discriminative Class Tokens for Text-to-Image Diffusion Models
Idan Schwartz
Vésteinn Snaebjarnarson
Hila Chefer
Ryan Cotterell
Serge Belongie
Lior Wolf
Sagie Benaim
53
9
0
30 Mar 2023
If At First You Don't Succeed: Test Time Re-ranking for Zero-shot, Cross-domain Retrieval
Finlay G. C. Hudson
W. Smith
ViT
79
1
0
30 Mar 2023
Visually Wired NFTs: Exploring the Role of Inspiration in Non-Fungible Tokens
Lucio La Cava
Davide Costa
Andrea Tagarelli
54
7
0
29 Mar 2023
InceptionNeXt: When Inception Meets ConvNeXt
Weihao Yu
Pan Zhou
Shuicheng Yan
Xinchao Wang
85
125
0
29 Mar 2023
Boosting Convolution with Efficient MLP-Permutation for Volumetric Medical Image Segmentation
Yi Lin
Xiao Fang
Dong Zhang
Kwang-Ting Cheng
Hao Chen
MedIm
105
3
0
23 Mar 2023
Location-Free Scene Graph Generation
Ege Özsoy
Felix Holm
Tobias Czempiel
Nassir Navab
Benjamin Busam
67
4
0
20 Mar 2023
SwinVFTR: A Novel Volumetric Feature-learning Transformer for 3D OCT Fluid Segmentation
Sharif Amit Kamran
Khondker Fariha Hossain
Alireza Tavakkoli
Salah A. Baker
S. Zuckerbrod
ViT
MedIm
53
1
0
16 Mar 2023