2010.11929
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
22 October 2020
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
Thomas Unterthiner
Mostafa Dehghani
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
Papers citing
"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"
50 / 1,173 papers shown
KIEval: Evaluation Metric for Document Key Information Extraction
Minsoo Khang
Sang Chul Jung
Sungrae Park
Teakgyu Hong
69
0
0
07 Mar 2025
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
Zhijian Zhuo
Yutao Zeng
Ya Wang
Sijun Zhang
Jian Yang
Xiaoqing Li
Xun Zhou
Jinwen Ma
64
0
0
06 Mar 2025
Fine-Tuning Florence2 for Enhanced Object Detection in Un-constructed Environments: Vision-Language Model Approach
Soumyadeep Ro
Sanapala Satwika
Pamarthi Yasoda Gayathri
Mohmmad Ghaith Balsha
Aysegul Ucar
VLM
ObjD
87
0
0
06 Mar 2025
CLDyB: Towards Dynamic Benchmarking for Continual Learning with Pre-trained Models
Shengzhuang Chen
Yikai Liao
Xiaoxiao Sun
Kede Ma
Ying Wei
98
0
0
06 Mar 2025
ISP-AD: A Large-Scale Real-World Dataset for Advancing Industrial Anomaly Detection with Synthetic and Real Defects
Paul J. Krassnig
Dieter P. Gruber
381
0
0
06 Mar 2025
Self is the Best Learner: CT-free Ultra-Low-Dose PET Organ Segmentation via Collaborating Denoising and Segmentation Learning
Zanting Ye
Xiaolong Niu
Xuanbin Wu
Wantong Lu
Lijun Lu
64
0
0
05 Mar 2025
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction
Huang Huang
Fangchen Liu
Letian Fu
Tingfan Wu
Mustafa Mukadam
Jitendra Malik
Ken Goldberg
Pieter Abbeel
LM&Ro
VLM
103
8
0
05 Mar 2025
Is Pre-training Applicable to the Decoder for Dense Prediction?
Chao Ning
Wanshui Gan
Weihao Xuan
Naoto Yokoya
157
0
0
05 Mar 2025
Exploring Token-Level Augmentation in Vision Transformer for Semi-Supervised Semantic Segmentation
Dengke Zhang
Quan Tang
Fagui Liu
C. L. Philip Chen
Haiqing Mei
ViT
160
0
0
04 Mar 2025
Rapid Bone Scintigraphy Enhancement via Semantic Prior Distillation from Segment Anything Model
Pengchen Liang
Leijun Shi
Huiping Yao
Bin Pu
Jianguo Chen
...
Zheyu Chen
Zhaozhao Xu
Lite Xu
Qing Chang
Yiwei Li
96
0
0
04 Mar 2025
On the Relationship Between Double Descent of CNNs and Shape/Texture Bias Under Learning Process
Shun Iwase
Shuya Takahashi
Nakamasa Inoue
Rio Yokota
Ryo Nakamura
Hirokatsu Kataoka
123
0
0
04 Mar 2025
Creating Sorted Grid Layouts with Gradient-based Optimization
Kai Uwe Barthel
Florian Barthel
Peter Eisert
Nico Hezel
Konstantin Schall
158
1
0
04 Mar 2025
AirRoom: Objects Matter in Room Reidentification
Runmao Yao
Yi Du
Zhuoqun Chen
Haoze Zheng
Chen Wang
69
0
0
03 Mar 2025
Enhancing Retinal Vessel Segmentation Generalization via Layout-Aware Generative Modelling
Jonathan Fhima
Jan Van Eijgen
Lennert Beeckmans
Thomas Jacobs
Moti Freiman
Luis Filipe Nakayama
Ingeborg Stalmans
Chaim Baskin
Joachim A. Behar
MedIm
125
0
0
03 Mar 2025
CADRef: Robust Out-of-Distribution Detection via Class-Aware Decoupled Relative Feature Leveraging
Zhiwei Ling
Yachen Chang
Hailiang Zhao
Xinkui Zhao
Kingsum Chow
Shuiguang Deng
OODD
100
0
0
01 Mar 2025
PaCA: Partial Connection Adaptation for Efficient Fine-Tuning
Sunghyeon Woo
Sol Namkung
Sunwoo Lee
Inho Jeong
Beomseok Kim
Dongsuk Jeon
72
0
0
28 Feb 2025
MFSR-GAN: Multi-Frame Super-Resolution with Handheld Motion Modeling
Fadeel Sher Khan
Joshua Ebenezer
Hamid Sheikh
Seok-Jun Lee
100
0
0
28 Feb 2025
Less is More? Revisiting the Importance of Frame Rate in Real-Time Zero-Shot Surgical Video Segmentation
Utku Ozbulak
Seyed Amir Mousavi
Francesca Tozzi
Nikdokht Rashidian
W. Willaert
W. D. Neve
J. Vankerschaver
58
0
0
28 Feb 2025
Unified Video Action Model
Shuang Li
Yihuai Gao
Dorsa Sadigh
Shuran Song
VGen
98
4
0
28 Feb 2025
Sensor-Invariant Tactile Representation
Harsh Gupta
Yuchen Mo
Shengmiao Jin
Wenzhen Yuan
73
3
0
27 Feb 2025
Adversarial Robustness in Parameter-Space Classifiers
Tamir Shor
Ethan Fetaya
Chaim Baskin
A. Bronstein
AAML
OOD
392
0
0
27 Feb 2025
Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models
Itay Benou
Tammy Riklin-Raviv
96
1
0
27 Feb 2025
OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels
Meng Lou
Yizhou Yu
196
1
0
27 Feb 2025
Training Large Neural Networks With Low-Dimensional Error Feedback
Maher Hanut
Jonathan Kadmon
78
1
0
27 Feb 2025
A Lightweight and Extensible Cell Segmentation and Classification Model for Whole Slide Images
N. Shvetsov
T. Kilvaer
M. Tafavvoghi
Anders Sildnes
Kajsa Møllersen
Lill-Tove Rasmussen Busund
L. A. Bongo
VLM
87
1
0
26 Feb 2025
Learning Structure-Supporting Dependencies via Keypoint Interactive Transformer for General Mammal Pose Estimation
Tianyang Xu
Jiyong Rao
Xiaoning Song
Zhenhua Feng
Xiao Wu
ViT
144
1
0
25 Feb 2025
FLARE: A Framework for Stellar Flare Forecasting using Stellar Physical Properties and Historical Records
Bingke Zhu
Xiaoxiao Wang
Minghui Jia
Yihan Tao
Xiao Kong
Ali Luo
Yingying Chen
Ming Tang
Jinqiao Wang
67
0
0
25 Feb 2025
Examining the Threat Landscape: Foundation Models and Model Stealing
Ankita Raj
Deepankar Varma
Chetan Arora
AAML
178
1
0
25 Feb 2025
ESIQA: Perceptual Quality Assessment of Vision-Pro-based Egocentric Spatial Images
Zhirui Kuai
Liu Yang
Huiyu Duan
Yuxing Han
Guoyu Tang
P. Callet
105
2
0
24 Feb 2025
Magnifier Prompt: Tackling Multimodal Hallucination via Extremely Simple Instructions
Yuhan Fu
Ruobing Xie
Jiazhen Liu
Bangxiang Lan
Xingwu Sun
Zhanhui Kang
Xirong Li
VLM
LRM
MLLM
67
0
0
24 Feb 2025
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
Jiarui Zhang
Mahyar Khayatkhoei
P. Chhikara
Filip Ilievski
LRM
66
10
0
24 Feb 2025
On the Vulnerability of Concept Erasure in Diffusion Models
Lucas Beerens
Alex D. Richardson
Peng Sun
Dongdong Chen
DiffM
125
2
0
24 Feb 2025
Vision-LSTM: xLSTM as Generic Vision Backbone
Benedikt Alkin
M. Beck
Korbinian Poppel
Sepp Hochreiter
Johannes Brandstetter
VLM
133
46
0
24 Feb 2025
Graph Perceiver IO: A General Architecture for Graph Structured Data
Seyun Bae
Hoyoon Byun
Changdae Oh
Yoon-Sik Cho
Kyungwoo Song
GNN
174
2
0
24 Feb 2025
Disentangling Visual Transformers: Patch-level Interpretability for Image Classification
Guillaume Jeanneret
Loïc Simon
F. Jurie
ViT
112
0
0
24 Feb 2025
CalibRefine: Deep Learning-Based Online Automatic Targetless LiDAR-Camera Calibration with Iterative and Attention-Driven Post-Refinement
Lei Cheng
Lihao Guo
Tianya Zhang
Tam Bang
Austin Harris
Mustafa Hajij
Mina Sartipi
Siyang Cao
56
0
0
24 Feb 2025
Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model
Yaxuan Huang
Xili Dai
Jianan Wang
Xianbiao Qi
Yixing Yuan
Xiangyu Yue
74
0
0
24 Feb 2025
MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval
Haoran Tang
Meng Cao
Jinfa Huang
Ruyang Liu
Peng Jin
Ge Li
Xiaodan Liang
Mamba
135
4
0
24 Feb 2025
NEAT: Nonlinear Parameter-efficient Adaptation of Pre-trained Models
Yibo Zhong
Haoxiang Jiang
Lincan Li
Ryumei Nakada
Tianci Liu
Linjun Zhang
Huaxiu Yao
Haoyu Wang
167
2
0
24 Feb 2025
Neural Attention: A Novel Mechanism for Enhanced Expressive Power in Transformer Models
Andrew DiGiugno
Ausif Mahmood
65
0
0
24 Feb 2025
Optimizing Estimators of Squared Calibration Errors in Classification
Sebastian G. Gruber
Francis Bach
152
2
0
24 Feb 2025
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
Benedikt Alkin
Lukas Miklautz
Sepp Hochreiter
Johannes Brandstetter
VLM
155
8
0
24 Feb 2025
FACTR: Force-Attending Curriculum Training for Contact-Rich Policy Learning
Jason Jingzhou Liu
Yulong Li
Kenneth Shaw
Tony Tao
Ruslan Salakhutdinov
Deepak Pathak
OffRL
102
1
0
24 Feb 2025
GS-TransUNet: Integrated 2D Gaussian Splatting and Transformer UNet for Accurate Skin Lesion Analysis
Anand Kumar
Kavinder Roghit Kanthen
Josna John
3DGS
144
0
0
23 Feb 2025
Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning
Yongqi Dong
Xingmin Lu
Ruohan Li
Wei Song
B. Arem
Haneen Farah
ViT
139
1
0
21 Feb 2025
YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection
Yuming Chen
Xinbin Yuan
Ruiqi Wu
Jiabao Wang
Qibin Hou
Ming-Ming Cheng
ObjD
236
52
0
21 Feb 2025
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
Yingyu Liang
Zhizhou Sha
Zhenmei Shi
Zhao Song
Yufa Zhou
114
18
0
21 Feb 2025
Surface Vision Mamba: Leveraging Bidirectional State Space Model for Efficient Spherical Manifold Representation
Rongzhao He
Weihao Zheng
Leilei Zhao
Ying Wang
Dalin Zhu
Dan Wu
Bin Hu
Mamba
120
0
0
21 Feb 2025
Simpler Fast Vision Transformers with a Jumbo CLS Token
A. Fuller
Yousef Yassin
Daniel G. Kyrollos
Evan Shelhamer
James R. Green
131
0
0
20 Feb 2025
DiffGuard: Text-Based Safety Checker for Diffusion Models
Massine El Khader
Elias Al Bouzidi
Abdellah Oumida
Mohammed Sbaihi
Eliott Binard
Jean-Philippe Poli
Wassila Ouerdane
Boussad Addad
Katarzyna Kapusta
DiffM
169
0
0
20 Feb 2025