An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

22 October 2020
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
Thomas Unterthiner
Mostafa Dehghani
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
    ViT

Papers citing "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"

50 / 1,173 papers shown
Title
Anomalies by Synthesis: Anomaly Detection using Generative Diffusion Models for Off-Road Navigation
Siddharth Ancha
Sunshine Jiang
Travis Manderson
Laura Brandt
Yilun Du
Philip R. Osteen
Nicholas Roy
103
0
0
28 May 2025
Occlusion Boundary and Depth: Mutual Enhancement via Multi-Task Learning
Lintao Xu
Yinghao Wang
Chaohui Wang
MDE
98
0
0
27 May 2025
Locality-Aware Zero-Shot Human-Object Interaction Detection
Sanghyun Kim
Deunsol Jung
Minsu Cho
VLM
109
0
0
26 May 2025
A Contrastive Learning Foundation Model Based on Perfectly Aligned Sample Pairs for Remote Sensing Images
Hengtong Shen
Haiyan Gu
Haitao Li
Yi Yang
Agen Qiu
SSL
80
0
0
26 May 2025
Zero-Shot Pseudo Labels Generation Using SAM and CLIP for Semi-Supervised Semantic Segmentation
Nagito Saito
Shintaro Ito
Koichi Ito
T. Aoki
VLM
MedIm
63
0
0
26 May 2025
TESSER: Transfer-Enhancing Adversarial Attacks from Vision Transformers via Spectral and Semantic Regularization
Amira Guesmi
B. Ouni
Muhammad Shafique
AAML
115
0
0
26 May 2025
Absolute Coordinates Make Motion Generation Easy
Zichong Meng
Zeyu Han
Xiaogang Peng
Yiming Xie
Huaizu Jiang
65
0
0
26 May 2025
Holistic White-light Polyp Classification via Alignment-free Dense Distillation of Auxiliary Optical Chromoendoscopy
Qiang Hu
Qimei Wang
Jia Chen
Xuantao Ji
Qiang Li
Zhiwei Wang
93
0
0
25 May 2025
CDPDNet: Integrating Text Guidance with Hybrid Vision Encoders for Medical Image Segmentation
Jiong Wu
Yang Xing
Boxiao Yu
Wei Shao
Kuang Gong
MedIm
64
0
0
25 May 2025
SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards
Chuming Shen
Wei Wei
Xiaoye Qu
Yu Cheng
LRM
97
0
0
25 May 2025
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided Iterative Policy Optimization
Yunxin Li
Xinyu Chen
Zitao Li
Zhenyu Liu
L. Wang
Wenhan Luo
Baotian Hu
Min Zhang
OffRL
LRM
32
0
0
25 May 2025
From Single Images to Motion Policies via Video-Generation Environment Representations
Weiming Zhi
Ziyong Ma
Tianyi Zhang
Matthew Johnson-Roberson
VGen
3DV
42
0
0
25 May 2025
VISTA: Vision-Language Inference for Training-Free Stock Time-Series Analysis
Tina Khezresmaeilzadeh
Parsa Razmara
Seyedarmin Azizi
Mohammad Erfan Sadeghi
Erfan Baghaei Portaghloo
AI4TS
119
0
0
24 May 2025
Grounding Bodily Awareness in Visual Representations for Efficient Policy Learning
Junlin Wang
Zhiyun Lin
500
0
0
24 May 2025
PacTrain: Pruning and Adaptive Sparse Gradient Compression for Efficient Collective Communication in Distributed Deep Learning
Yisu Wang
Ruilong Wu
Xinjiao Li
Dirk Kutscher
93
0
0
24 May 2025
Semantic segmentation with reward
Xie Ting
Ye Huang
Zhilin Liu
Lixin Duan
149
0
0
23 May 2025
FLEX: A Backbone for Diffusion-Based Modeling of Spatio-temporal Physical Systems
N. Benjamin Erichson
Vinicius Mikuni
Dongwei Lyu
Yang Gao
Omri Azencot
Soon Hoe Lim
Michael W. Mahoney
AI4CE
474
0
0
23 May 2025
FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving
Shuang Zeng
Xinyuan Chang
Mengwei Xie
Xinran Liu
Yifan Bai
Zheng Pan
Mu Xu
Xing Wei
LRM
68
0
0
23 May 2025
BOTM: Echocardiography Segmentation via Bi-directional Optimal Token Matching
Zhihua Liu
Lei Tong
Xilin He
Che Liu
Rossella Arcucci
Chen Jin
Huiyu Zhou
66
0
0
23 May 2025
An Attention Infused Deep Learning System with Grad-CAM Visualization for Early Screening of Glaucoma
Ramanathan Swaminathan
18
0
0
23 May 2025
LookWhere? Efficient Visual Recognition by Learning Where to Look and What to See from Self-Supervision
A. Fuller
Yousef Yassin
Junfeng Wen
Daniel G. Kyrollos
Tarek Ibrahim
James R. Green
Evan Shelhamer
ViT
80
0
0
23 May 2025
CONCORD: Concept-Informed Diffusion for Dataset Distillation
Jianyang Gu
Haonan Wang
Ruoxi Jia
Saeed Vahidian
Vyacheslav Kungurtsev
Wei Jiang
Yiran Chen
DiffM
DD
496
0
0
23 May 2025
Are GNNs Worth the Effort for IoT Botnet Detection? A Comparative Study of VAE-GNN vs. ViT-MLP and VAE-MLP Approaches
Hassan Wasswa
Hussein Abbass
Timothy Lynar
18
0
0
23 May 2025
Learning Generalized and Flexible Trajectory Models from Omni-Semantic Supervision
Yuanshao Zhu
James Jianqiao Yu
Xiangyu Zhao
Xiao Han
Qidong Liu
Xuetao Wei
Yuxuan Liang
25
0
0
23 May 2025
Soft-CAM: Making black box models self-explainable for high-stakes decisions
K. Djoumessi
Philipp Berens
FAtt
BDL
93
0
0
23 May 2025
Dual Attention Residual U-Net for Accurate Brain Ultrasound Segmentation in IVH Detection
Dan Yuan
Yi Feng
Ziyun Tang
88
0
0
23 May 2025
CENet: Context Enhancement Network for Medical Image Segmentation
Afshin Bozorgpour
Sina Ghorbani Kolahi
Reza Azad
Ilker Hacihaliloglu
Dorit Merhof
68
0
0
23 May 2025
REN: Fast and Efficient Region Encodings from Patch-Based Image Encoders
Savya Khosla
Sethuraman TV
Barnett Lee
Alexander Schwing
Derek Hoiem
VGen
50
0
0
23 May 2025
Adaptive Semantic Token Communication for Transformer-based Edge Inference
Alessio Devoto
Jary Pomponi
Mattia Merluzzi
Paolo Di Lorenzo
Simone Scardapane
94
0
0
23 May 2025
Taming Diffusion for Dataset Distillation with High Representativeness
Lin Zhao
Yushu Wu
Xinru Jiang
Jianyang Gu
Yanzhi Wang
Xiaolin Xu
Pu Zhao
Xue Lin
DD
114
0
0
23 May 2025
SVL: Spike-based Vision-language Pretraining for Efficient 3D Open-world Understanding
Xuerui Qiu
Peixi Wu
Yaozhi Wen
Shaowei Gu
Yuqi Pan
Xinhao Luo
Bo Xu
Guoqi Li
VLM
95
0
0
23 May 2025
Slot-MLLM: Object-Centric Visual Tokenization for Multimodal LLM
Donghwan Chi
Hyomin Kim
Yoonjin Oh
Yongjin Kim
Donghoon Lee
DaeJin Jo
Jongmin Kim
Junyeob Baek
Sungjin Ahn
Sungwoong Kim
MLLM
VLM
226
0
0
23 May 2025
EVM-Fusion: An Explainable Vision Mamba Architecture with Neural Algorithmic Fusion
Zichuan Yang
118
0
0
23 May 2025
Towards VM Rescheduling Optimization Through Deep Reinforcement Learning
Xianzhong Ding
Yunkai Zhang
Binbin Chen
Donghao Ying
Tieying Zhang
Jianjun Chen
Lei Zhang
Alberto Cerpa
Wan Du
VLM
56
1
0
23 May 2025
Locality-Sensitive Hashing for Efficient Hard Negative Sampling in Contrastive Learning
Fabian Deuser
Philipp Hausenblas
Hannah Schieber
Daniel Roth
Martin Werner
Norbert Oswald
78
0
0
23 May 2025
Evolving Machine Learning: A Survey
Ignacio Cabrera Martin
Subhaditya Mukherjee
Almas Baimagambetov
Joaquin Vanschoren
Nikolaos Polatidis
VLM
102
0
0
23 May 2025
Clip4Retrofit: Enabling Real-Time Image Labeling on Edge Devices via Cross-Architecture CLIP Distillation
Li Zhong
Ahmed Ghazal
Jun-Jun Wan
Frederik Zilly
Patrick Mackens
Joachim E. Vollrath
Bogdan Sorin Coseriu
97
0
0
23 May 2025
PawPrint: Whose Footprints Are These? Identifying Animal Individuals by Their Footprints
Inpyo Song
Hyemin Hwang
Jangwon Lee
72
0
0
23 May 2025
HiLAB: A Hybrid Inverse-Design Framework
Reza Marzban
Hamed Abiri
Raphael Pestourie
Ali Adibi
29
0
0
23 May 2025
Transformer brain encoders explain human high-level visual responses
Hossein Adeli
Minni Sun
N. Kriegeskorte
100
0
0
22 May 2025
Approach to Finding a Robust Deep Learning Model
Alexey Boldyrev
Fedor Ratnikov
Andrey Shevelev
OOD
86
0
0
22 May 2025
SuperPure: Efficient Purification of Localized and Distributed Adversarial Patches via Super-Resolution GAN Models
Hossein Khalili
Seongbin Park
Venkat Bollapragada
Nader Sehatbakhsh
AAML
100
0
0
22 May 2025
Auto-nnU-Net: Towards Automated Medical Image Segmentation
Jannis Becktepe
Leona Hennig
Steffen Oeltze-Jafra
Marius Lindauer
100
0
0
22 May 2025
AnchorFormer: Differentiable Anchor Attention for Efficient Vision Transformer
Jiquan Shan
Junxiao Wang
Lifeng Zhao
Liang Cai
Hongyuan Zhang
Ioannis Liritzis
ViT
98
0
0
22 May 2025
Stronger ViTs With Octic Equivariance
David Nordström
Johan Edstedt
Fredrik Kahl
Georg Bökman
ViT
99
0
0
21 May 2025
TAGS: 3D Tumor-Adaptive Guidance for SAM
Sirui Li
Linkai Peng
Zheyuan Zhang
Gorkem Durak
Ulas Bagci
MedIm
VLM
84
0
0
21 May 2025
Multi-Modality Expansion and Retention for LLMs through Parameter Merging and Decoupling
Junlin Li
Guodong DU
Jing Li
Sim Kuan Goh
Wenya Wang
...
Fangming Liu
Jing Li
Saleh Alharbi
Daojing He
Min Zhang
MoMe
CLL
52
1
0
21 May 2025
HAMF: A Hybrid Attention-Mamba Framework for Joint Scene Context Understanding and Future Motion Representation Learning
Xiaodong Mei
Sheng Wang
Jie Cheng
Yingbing Chen
Dan Xu
Mamba
87
0
0
21 May 2025
Adversarially Pretrained Transformers may be Universally Robust In-Context Learners
Soichiro Kumano
Hiroshi Kera
Toshihiko Yamasaki
AAML
54
0
0
20 May 2025
Large Language Models Implicitly Learn to See and Hear Just By Reading
Prateek Verma
Mert Pilanci
54
0
0
20 May 2025