2010.11929
Cited By
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
22 October 2020
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
Thomas Unterthiner
Mostafa Dehghani
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
Papers citing
"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"
50 / 1,173 papers shown
Title
Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook
Muyi Bao
Shuchang Lyu
Zhaoyang Xu
Huiyu Zhou
Jinchang Ren
Shiming Xiang
Xuelong Li
Guangliang Cheng
Mamba
152
0
0
01 May 2025
UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation
Linshan Wu
Yuxiang Nie
Sunan He
Jiaxin Zhuang
Hao Chen
...
V. Vardhanabhuti
R. Chan
Yifan Peng
Pranav Rajpurkar
Hao Chen
LM&MA
MedIm
113
0
0
30 Apr 2025
Synergy-CLIP: Extending CLIP with Multi-modal Integration for Robust Representation Learning
Sangyeon Cho
Jangyeong Jeon
Mingi Kim
Junyeong Kim
CLIP
VLM
160
0
0
30 Apr 2025
WILD: a new in-the-Wild Image Linkage Dataset for synthetic image attribution
Pietro Bongini
S. Mandelli
Andrea Montibeller
Mirko Casu
Orazio Pontorno
...
Paolo Bestagini
Irene Amerini
F. D. De Natale
Sebastiano Battiato
Mauro Barni
VLM
160
0
0
28 Apr 2025
IoT Botnet Detection: Application of Vision Transformer to Classification of Network Flow Traffic
Hassan Wasswa
Timothy Lynar
Aziida Nanyonga
Hussein Abbass
88
3
0
26 Apr 2025
Low-Rank Matrix Approximation for Neural Network Compression
Kalyan Cherukuri
Aarav Lala
54
0
0
25 Apr 2025
A Large Vision-Language Model based Environment Perception System for Visually Impaired People
Zezhou Chen
Zhaoxiang Liu
Ning Wang
Kohou Wang
Shiguo Lian
140
0
0
25 Apr 2025
DiMeR: Disentangled Mesh Reconstruction Model
Lutao Jiang
Jiantao Lin
Kanghao Chen
Wenhang Ge
Xin Yang
Yifan Jiang
Yuanhuiyi Lyu
Xu Zheng
Yinchuan Li
Yingcong Chen
3DV
101
3
0
24 Apr 2025
A Novel Hybrid Approach Using an Attention-Based Transformer + GRU Model for Predicting Cryptocurrency Prices
Esam Mahdi
C. Martin-Barreiro
X. Cabezas
AI4TS
97
0
0
23 Apr 2025
Hyper-Transforming Latent Diffusion Models
I. Peis
Batuhan Koyuncu
Isabel Valera
J. Frellsen
119
1
0
23 Apr 2025
ForesightNav: Learning Scene Imagination for Efficient Exploration
Hardik Shah
Jiaxu Xing
Nico Messikommer
Boyang Sun
Marc Pollefeys
Davide Scaramuzza
116
1
0
22 Apr 2025
Against Opacity: Explainable AI and Large Language Models for Effective Digital Advertising
Qi Yang
Marlo Ongpin
Sergey I. Nikolenko
Alfred Huang
Aleksandr Farseev
DiffM
OffRL
105
15
0
22 Apr 2025
DINOv2-powered Few-Shot Semantic Segmentation: A Unified Framework via Cross-Model Distillation and 4D Correlation Mining
Wei Zhuo
Zhiyue Tang
Wufeng Xue
Hao Ding
Linlin Shen
47
0
0
22 Apr 2025
PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning
Song Wang
Xiaolu Liu
Lingdong Kong
Jianyun Xu
Chunyong Hu
Gongfan Fang
Wentong Li
Jianke Zhu
Xinchao Wang
68
0
0
22 Apr 2025
An Automated Pipeline for Few-Shot Bird Call Classification: A Case Study with the Tooth-Billed Pigeon
Abhishek Jana
Moeumu Uili
James Atherton
Mark O'Brien
Joe Wood
Leandra Brickson
119
0
0
22 Apr 2025
COBRA: Algorithm-Architecture Co-optimized Binary Transformer Accelerator for Edge Inference
Ye Qiao
Zhiheng Cheng
Yian Wang
Yifan Zhang
Yunzhe Deng
Sitao Huang
137
0
0
22 Apr 2025
Impact of Latent Space Dimension on IoT Botnet Detection Performance: VAE-Encoder Versus ViT-Encoder
Hassan Wasswa
Aziida Nanyonga
Timothy Lynar
DRL
86
4
0
21 Apr 2025
Latent Representations for Visual Proprioception in Inexpensive Robots
Sahara Sheikholeslami
Ladislau Bölöni
115
0
0
20 Apr 2025
Advancing Video Anomaly Detection: A Bi-Directional Hybrid Framework for Enhanced Single- and Multi-Task Approaches
Guodong Shen
Yuqi Ouyang
Junru Lu
Yixuan Yang
Victor Sanchez
137
1
0
20 Apr 2025
ROI-Guided Point Cloud Geometry Compression Towards Human and Machine Vision
Xie Liang
Gao Wei
Zhenghui Ming
Li Ge
3DPC
68
1
0
19 Apr 2025
Towards Accurate and Interpretable Neuroblastoma Diagnosis via Contrastive Multi-scale Pathological Image Analysis
Zhu Zhu
Shuo Jiang
Jingyuan Zheng
Yawen Li
Yifei Chen
Manli Zhao
Weizhong Gu
Feiwei Qin
Jinhu Wang
Gang Yu
MedIm
116
0
0
18 Apr 2025
U-Shape Mamba: State Space Model for faster diffusion
Alex Ergasti
Filippo Botti
Tomaso Fontanini
Claudio Ferrari
Massimo Bertozzi
Andrea Prati
Mamba
106
1
0
18 Apr 2025
Decoding Vision Transformers: the Diffusion Steering Lens
Ryota Takatsuki
Sonia Joseph
Ippei Fujisawa
Ryota Kanai
DiffM
72
0
0
18 Apr 2025
Probabilistic Stability Guarantees for Feature Attributions
Helen Jin
Anton Xue
Weiqiu You
Surbhi Goel
Eric Wong
67
0
0
18 Apr 2025
LimitNet: Progressive, Content-Aware Image Offloading for Extremely Weak Devices & Networks
A. Hojjat
Janek Haberer
Tayyaba Zainab
Olaf Landsiedel
65
3
0
18 Apr 2025
SC3EF: A Joint Self-Correlation and Cross-Correspondence Estimation Framework for Visible and Thermal Image Registration
Xi Tong
Xing Luo
Jiangxin Yang
Yanpeng Cao
66
0
0
17 Apr 2025
Low-hallucination Synthetic Captions for Large-Scale Vision-Language Model Pre-training
Xinsong Zhang
Yarong Zeng
Xinting Huang
Hu Hu
Runquan Xie
Han Hu
Zhanhui Kang
MLLM
VLM
139
1
0
17 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
168
5
0
17 Apr 2025
Human Aligned Compression for Robust Models
Samuel Räber
Andreas Plesner
Till Aczél
Roger Wattenhofer
AAML
82
0
0
16 Apr 2025
How Do I Do That? Synthesizing 3D Hand Motion and Contacts for Everyday Interactions
Aditya Prakash
Benjamin Lundell
Dmitry Andreychuk
David Forsyth
Saurabh Gupta
H. Sawhney
95
2
0
16 Apr 2025
FACT: Foundation Model for Assessing Cancer Tissue Margins with Mass Spectrometry
Mohammad Farahmand
A. Jamzad
Fahimeh Fooladgar
Laura Connolly
Martin Kaufmann
Kevin Yi Mi Ren
John Rudan
Doug McKay
Gabor Fichtinger
P. Mousavi
69
0
0
15 Apr 2025
Enhanced Small Target Detection via Multi-Modal Fusion and Attention Mechanisms: A YOLOv5 Approach
Xiaoxiao Ma
Junxiong Tong
68
0
0
15 Apr 2025
Enhancing Features in Long-tailed Data Using Large Vision Model
Pengxiao Han
Changkun Ye
Jinguang Tong
Cuicui Jiang
Jie Hong
Li Fang
Xuesong Li
VLM
144
0
0
15 Apr 2025
DMPT: Decoupled Modality-aware Prompt Tuning for Multi-modal Object Re-identification
Minghui Lin
Shu Wang
Xiang Wang
Jianhua Tang
Longbin Fu
Zhengrong Zuo
Nong Sang
VLM
125
0
0
15 Apr 2025
AFiRe: Anatomy-Driven Self-Supervised Learning for Fine-Grained Representation in Radiographic Images
Yihang Liu
Lianghua He
Y. Wen
Longzhen Yang
Hongzhou Chen
MedIm
86
0
0
15 Apr 2025
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
Hongkang Li
Yihua Zhang
Shuai Zhang
Ming Wang
Sijia Liu
Pin-Yu Chen
MoMe
127
5
0
15 Apr 2025
Embedding Radiomics into Vision Transformers for Multimodal Medical Image Classification
Zhenyu Yang
Haiming Zhu
Rihui Zhang
Haipeng Zhang
Jianliang Wang
Chunhao Wang
Minbin Chen
F. Yin
MedIm
68
0
0
15 Apr 2025
MSCRS: Multi-modal Semantic Graph Prompt Learning Framework for Conversational Recommender Systems
Yibiao Wei
Jie Zou
Weikang Guo
Guoqing Wang
Xing Xu
Yang Yang
89
1
0
15 Apr 2025
Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling
Michal Balcerak
Tamaz Amiranashvili
Suprosanna Shit
Antonio Terpin
Lea Bogensperger
Sebastian Kaltenbach
Petros Koumoutsakos
Bjoern Menze
DiffM
93
2
0
14 Apr 2025
Resampling Benchmark for Efficient Comprehensive Evaluation of Large Vision-Language Models
Teppei Suzuki
Keisuke Ozawa
VLM
108
0
0
14 Apr 2025
OVERLORD: Ultimate Scaling of DataLoader for Multi-Source Large Foundation Model Training
Juntao Zhao
Qi Lu
Wei Jia
Borui Wan
Lei Zuo
...
Size Zheng
Yanghua Peng
H. Lin
Xin Liu
Chuan Wu
AI4CE
83
0
0
14 Apr 2025
A Model Zoo of Vision Transformers
Damian Falk
Léo Meynent
Florence Pfammatter
Konstantin Schurholt
Damian Borth
129
0
0
14 Apr 2025
Enhancing Wide-Angle Image Using Narrow-Angle View of the Same Scene
Hussain Md. Safwan
Mahbub Islam Mahim
72
0
0
13 Apr 2025
Uncertainty Guided Refinement for Fine-Grained Salient Object Detection
Yao Yuan
Pan Gao
Qun Dai
Jie Qin
Wei Xiang
122
0
0
13 Apr 2025
OmniMamba4D: Spatio-temporal Mamba for longitudinal CT lesion segmentation
Justin Namuk Kim
Yiqiao Liu
R. Soans
Keith Persson
Sarah Halek
M. Tomaszewski
Jianda Yuan
G. Goldmacher
Antong Chen
Mamba
352
0
0
13 Apr 2025
Evolved Hierarchical Masking for Self-Supervised Learning
Zhanzhou Feng
Shiliang Zhang
82
0
0
12 Apr 2025
Multi-modal Reference Learning for Fine-grained Text-to-Image Retrieval
Zehong Ma
Hao Chen
Wei Zeng
Limin Su
Shiliang Zhang
AI4TS
85
0
0
10 Apr 2025
DiverseFlow: Sample-Efficient Diverse Mode Coverage in Flows
Mashrur M. Morshed
Vishnu Boddeti
66
0
0
10 Apr 2025
Learning Optimal Prompt Ensemble for Multi-source Visual Prompt Transfer
Enming Zhang
Liwen Cao
Yanru Wu
Zijie Zhao
Guan Wang
Yang Li
82
0
0
09 Apr 2025
Crafting Query-Aware Selective Attention for Single Image Super-Resolution
Junyoung Kim
Youngrok Kim
Siyeol Jung
Donghyun Min
69
0
0
09 Apr 2025