An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

22 October 2020
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
Thomas Unterthiner
Mostafa Dehghani
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
    ViT

Papers citing "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"

50 / 1,173 papers shown
Collaboratively Self-supervised Video Representation Learning for Action Recognition
Jie Zhang
Zhifan Wan
Lanqing Hu
Stephen Lin
Shuzhe Wu
Shiguang Shan
TTA
89
1
0
15 Jan 2024
Always-Sparse Training by Growing Connections with Guided Stochastic Exploration
Mike Heddes
Narayan Srinivasa
T. Givargis
Alexandru Nicolau
162
0
0
12 Jan 2024
Latte: Latent Diffusion Transformer for Video Generation
Xin Ma
Yaohui Wang
Gengyun Jia
Xinyuan Chen
Ziqiang Liu
Yuan-Fang Li
Cunjian Chen
Yu Qiao
DiffM
VGen
163
252
0
05 Jan 2024
Fus-MAE: A cross-attention-based data fusion approach for Masked Autoencoders in remote sensing
Hugo Chan-To-Hing
B. Veeravalli
47
8
0
05 Jan 2024
AliFuse: Aligning and Fusing Multi-modal Medical Data for Computer-Aided Diagnosis
Qiuhui Chen
Yi Hong
MedIm
79
1
0
02 Jan 2024
Beyond Subspace Isolation: Many-to-Many Transformer for Light Field Image Super-resolution
Zeke Zexi Hu
Xiaoming Chen
Yuk Ying Chung
Yiran Shen
76
1
0
01 Jan 2024
Promoting Segment Anything Model towards Highly Accurate Dichotomous Image Segmentation
Xianjie Liu
Keren Fu
Qijun Zhao
VLM
72
1
0
30 Dec 2023
Morphing Tokens Draw Strong Masked Image Models
Taekyung Kim
Byeongho Heo
Dongyoon Han
96
3
0
30 Dec 2023
3VL: Using Trees to Improve Vision-Language Models' Interpretability
Nir Yellinek
Leonid Karlinsky
Raja Giryes
CoGe
VLM
171
4
0
28 Dec 2023
Nighttime Person Re-Identification via Collaborative Enhancement Network with Multi-domain Learning
Andong Lu
Tianrui Zha
Jin Tang
Xiaofeng Wang
Bin Luo
90
2
0
25 Dec 2023
Leveraging Habitat Information for Fine-grained Bird Identification
Tin Nguyen
Peijie Chen
Anh Totti Nguyen
VLM
65
0
0
22 Dec 2023
MGAug: Multimodal Geometric Augmentation in Latent Spaces of Image Deformations
Tonmoy Hossain
Miaomiao Zhang
85
3
0
20 Dec 2023
IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition
Xiaomeng Yang
Zhi Qiao
Yu Zhou
DiffM
95
1
0
19 Dec 2023
Unleashing the Power of CNN and Transformer for Balanced RGB-Event Video Recognition
Tianlin Li
Yao Rong
Shiao Wang
Yuan Chen
Zhe Wu
Bowei Jiang
Yonghong Tian
Jin Tang
ViT
97
3
0
18 Dec 2023
MCANet: Medical Image Segmentation with Multi-Scale Cross-Axis Attention
Hao Shao
Quansheng Zeng
Qibin Hou
Jufeng Yang
79
14
0
14 Dec 2023
MIMIR: Masked Image Modeling for Mutual Information-based Adversarial Robustness
Xiaoyun Xu
Shujian Yu
Jingzheng Wu
S. Picek
AAML
59
0
0
08 Dec 2023
Auto-Vocabulary Semantic Segmentation
Osman Ülger
Maksymilian Kulicki
Yuki M. Asano
Martin R. Oswald
VLM
90
2
0
07 Dec 2023
Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training
Arun V. Reddy
William Paul
Corban Rivera
Ketul Shah
Celso M. de Melo
Rama Chellappa
70
4
0
05 Dec 2023
Visual Encoders for Data-Efficient Imitation Learning in Modern Video Games
Lukas Schäfer
Logan Jones
Anssi Kanervisto
Yuhan Cao
Tabish Rashid
Raluca Georgescu
David Bignell
Siddhartha Sen
Andrea Trevino Gavito
Sam Devlin
104
3
0
04 Dec 2023
Spectral-wise Implicit Neural Representation for Hyperspectral Image Reconstruction
Huan Chen
Wangcai Zhao
Tingfa Xu
Shiyun Zhou
Peifu Liu
Jianan Li
76
21
0
02 Dec 2023
SCHEME: Scalable Channel Mixer for Vision Transformers
Deepak Sridhar
Yunsheng Li
Nuno Vasconcelos
72
0
0
01 Dec 2023
Critical Influence of Overparameterization on Sharpness-aware Minimization
Sungbin Shin
Dongyeop Lee
Maksym Andriushchenko
Namhoon Lee
AAML
77
1
0
29 Nov 2023
StructRe: Rewriting for Structured Shape Modeling
Jiepeng Wang
Hao Pan
Yang Liu
Xin Tong
Taku Komura
Wenping Wang
70
0
0
29 Nov 2023
CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts
Yichao Cai
Yuhang Liu
Zhen Zhang
Javen Qinfeng Shi
CLIP
VLM
70
8
0
28 Nov 2023
SegVol: Universal and Interactive Volumetric Medical Image Segmentation
Yuxin Du
Fan Bai
Tiejun Huang
Bo Zhao
VLM
70
40
0
22 Nov 2023
Inspecting Explainability of Transformer Models with Additional Statistical Information
Hoang C. Nguyen
Haeil Lee
Junmo Kim
ViT
39
3
0
19 Nov 2023
Progressive Feedback-Enhanced Transformer for Image Forgery Localization
Haochen Zhu
Gang Cao
Xianglin Huang
ViT
59
7
0
15 Nov 2023
Pretrain like Your Inference: Masked Tuning Improves Zero-Shot Composed Image Retrieval
Junyang Chen
Hanjiang Lai
VLM
64
15
0
13 Nov 2023
AI-accelerated Discovery of Altermagnetic Materials
Ze-Feng Gao
Shuai Qu
Bocheng Zeng
Yang Liu
Ji-Rong Wen
Hao Sun
Peng-Jie Guo
Zhong-Yi Lu
46
27
0
08 Nov 2023
Uncovering Intermediate Variables in Transformers using Circuit Probing
Michael A. Lepori
Thomas Serre
Ellie Pavlick
100
7
0
07 Nov 2023
CLIP-Motion: Learning Reward Functions for Robotic Actions Using Consecutive Observations
Xuzhe Dang
Stefan Edelkamp
87
4
0
06 Nov 2023
GQKVA: Efficient Pre-training of Transformers by Grouping Queries, Keys, and Values
Farnoosh Javadi
Walid Ahmed
Habib Hajimolahoseini
Foozhan Ataiefard
Mohammad Hassanpour
Saina Asani
Austin Wen
Omar Mohamed Awad
Kangling Liu
Yang Liu
VLM
72
8
0
06 Nov 2023
TinyFormer: Efficient Transformer Design and Deployment on Tiny Devices
Jianlei Yang
Jiacheng Liao
Fanding Lei
Meichen Liu
Junyi Chen
Lingkun Long
Han Wan
Bei Yu
Weisheng Zhao
MoE
67
2
0
03 Nov 2023
Tailoring Mixup to Data for Calibration
Quentin Bouniot
Pavlo Mozharovskyi
Florence d'Alché-Buc
88
1
0
02 Nov 2023
Advances in Embodied Navigation Using Large Language Models: A Survey
Jinzhou Lin
Han Gao
Xuxiang Feng
Rongtao Xu
Changwei Wang
Man Zhang
Li Guo
Shibiao Xu
LM&Ro
LLMAG
105
9
0
01 Nov 2023
TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition
Meng Lou
Hong-Yu Zhou
Sibei Yang
Chuan Wu
Yizhou Yu
ViT
67
37
0
30 Oct 2023
Grid Jigsaw Representation with CLIP: A New Perspective on Image Clustering
Zijie Song
Zhenzhen Hu
Richang Hong
SSL
63
0
0
27 Oct 2023
netFound: Foundation Model for Network Security
Satyandra Guthula
Navya Battula
Roman Beltiukov
Wenbo Guo
Arpit Gupta
Inder Monga
61
16
0
25 Oct 2023
On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts
Yixin Wu
Ning Yu
Michael Backes
Yun Shen
Yang Zhang
DiffM
76
8
0
25 Oct 2023
FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering
Md Rafi Ur Rashid
Vishnu Asutosh Dasu
Kang Gu
Najrin Sultana
Shagufta Mehnaz
AAML
FedML
74
11
0
24 Oct 2023
Tailoring Adversarial Attacks on Deep Neural Networks for Targeted Class Manipulation Using DeepFool Algorithm
S. M. Fazle
J. Mondal
Meem Arafat Manab
Xi Xiao
Sarfaraz Newaz
AAML
59
0
0
18 Oct 2023
From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport
Quentin Bouniot
I. Redko
Anton Mallasto
Charlotte Laclau
Karol Arndt
Oliver Struckmeier
Markus Heinonen
Ville Kyrki
Samuel Kaski
103
2
0
17 Oct 2023
Federated Class-Incremental Learning with Prompting
Jiale Liu
Yu-Wei Zhan
Chong-Yu Zhang
Xin Luo
Zhen-Duo Chen
Yinwei Wei
CLL
FedML
54
2
0
13 Oct 2023
SpikeCLIP: A Contrastive Language-Image Pretrained Spiking Neural Network
Tianlong Li
Changze Lv
Yufei Gu
Jianhan Xu
Cenyuan Zhang
Muling Wu
Xiaoqing Zheng
Xuanjing Huang
CLIP
VLM
69
3
0
10 Oct 2023
Low-Resolution Self-Attention for Semantic Segmentation
Yu-Huan Wu
Shi-Chen Zhang
Yun-Hai Liu
Le Zhang
Xin Zhan
Daquan Zhou
Jiashi Feng
Ming-Ming Cheng
Liangli Zhen
ViT
117
3
0
08 Oct 2023
URLOST: Unsupervised Representation Learning without Stationarity or Topology
Zeyu Yun
Juexiao Zhang
Bruno A. Olshausen
Yann LeCun
101
1
0
06 Oct 2023
PrototypeFormer: Learning to Explore Prototype Relationships for Few-shot Image Classification
Feihong He
Gang Li
Hui Xiong
VLM
ViT
80
2
0
05 Oct 2023
DataDAM: Efficient Dataset Distillation with Attention Matching
A. Sajedi
Samir Khaki
Ehsan Amjadian
Lucy Z. Liu
Y. Lawryshyn
Konstantinos N. Plataniotis
DD
98
64
0
29 Sep 2023
Diverse Target and Contribution Scheduling for Domain Generalization
Shaocong Long
Qianyu Zhou
Soham Dan
Lizhuang Ma
Yuan Luo
97
8
0
28 Sep 2023
Graph-level Representation Learning with Joint-Embedding Predictive Architectures
Geri Skenderi
Hang Li
Jiliang Tang
Marco Cristani
AI4TS
GNN
80
4
0
27 Sep 2023