2010.11929
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
22 October 2020
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
Thomas Unterthiner
Mostafa Dehghani
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
Papers citing
"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"
50 / 1,173 papers shown
Title
Transport-Related Surface Detection with Machine Learning: Analyzing Temporal Trends in Madrid and Vienna
Miguel Ureña Pliego
Rubén Martínez Marín
Nianfang Shi
Takeru Shibayama
Ulrich Leth
Miguel Marchamalo Sacristán
134
0
0
19 Mar 2025
FedLWS: Federated Learning with Adaptive Layer-wise Weight Shrinking
Changlong Shi
Jinmeng Li
He Zhao
D. Guo
Yi Chang
FedML
79
0
0
19 Mar 2025
Conjuring Positive Pairs for Efficient Unification of Representation Learning and Image Synthesis
Imanol G. Estepa
Jesús M. Rodríguez-de-Vera
Ignacio Sarasúa
Bhalaji Nagarajan
Petia Radeva
105
0
0
19 Mar 2025
TULIP: Towards Unified Language-Image Pretraining
Zineng Tang
Long Lian
Seun Eisape
Xudong Wang
Roei Herzig
Adam Yala
Alane Suhr
Trevor Darrell
David M. Chan
VLM
CLIP
MLLM
124
5
0
19 Mar 2025
RFMI: Estimating Mutual Information on Rectified Flow for Text-to-Image Alignment
Chao Wang
Giulio Franzese
A. Finamore
Pietro Michiardi
148
0
0
18 Mar 2025
Learning Shape-Independent Transformation via Spherical Representations for Category-Level Object Pose Estimation
Huan Ren
Wenfei Yang
Xiang Liu
Shifeng Zhang
Tianzhu Zhang
107
2
0
18 Mar 2025
Robust Weight Imprinting: Insights from Neural Collapse and Proxy-Based Aggregation
Justus Westerhoff
Golzar Atefi
Mario Koddenbrock
Alexei Figueroa
Alexander Loser
Erik Rodner
Felix Alexander Gers
OffRL
81
0
0
18 Mar 2025
Improving Generalization of Universal Adversarial Perturbation via Dynamic Maximin Optimization
Yize Zhang
Yingzhe Xu
Junyu Shi
L. Zhang
Shengshan Hu
Minghui Li
Yanjun Zhang
AAML
113
1
0
17 Mar 2025
8-Calves Image dataset
Xuyang Fang
S. Hannuna
Neill D. F. Campbell
334
0
0
17 Mar 2025
Towards Scalable Foundation Model for Multi-modal and Hyperspectral Geospatial Data
Haozhe Si
Yuxuan Wan
Minh Do
Deepak Vasisht
Han Zhao
Hendrik Hamann
103
0
0
17 Mar 2025
MTGS: Multi-Traversal Gaussian Splatting
Tianyu Li
Yihang Qiu
Zhenhua Wu
Carl Lindström
Peng Su
Matthias Nießner
Hongyang Li
3DGS
179
0
0
16 Mar 2025
Segment Any-Quality Images with Generative Latent Space Enhancement
Guangqian Guo
Yoong Guo
Xuehui Yu
Wenbo Li
Yaoxing Wang
Shan Gao
VLM
121
0
0
16 Mar 2025
VeriMind: Agentic LLM for Automated Verilog Generation with a Novel Evaluation Metric
Bardia Nadimi
Ghali Omar Boutaib
Hao Zheng
77
2
0
15 Mar 2025
Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Point Cloud Analysis
Hongyu Sun
Qiuhong Ke
Ming Cheng
Yanjie Wang
Deying Li
Chenhui Gou
Jianfei Cai
3DPC
103
0
0
15 Mar 2025
Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption
Du Chen
Tianhe Wu
Kede Ma
Lei Zhang
52
3
0
14 Mar 2025
Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization
Kyle Sargent
Kyle Hsu
Justin Johnson
L. Fei-Fei
Jiajun Wu
DiffM
MU
99
6
0
14 Mar 2025
Self-Supervised Pretraining for Fine-Grained Plankton Recognition
Joona Kareinen
T. Eerola
K. Kraft
L. Lensu
S. Suikkanen
Heikki Kälviäinen
SSL
363
0
0
14 Mar 2025
GMG: A Video Prediction Method Based on Global Focus and Motion Guided
Yuhao Du
Hui Liu
Haoxiang Peng
Xinyuan Chen
Chenrong Wu
Jiawei Zhang
135
0
0
14 Mar 2025
Direction-Aware Diagonal Autoregressive Image Generation
Yijia Xu
Jianzhong Ju
Jian Luan
J. Cui
117
0
0
14 Mar 2025
APLA: A Simple Adaptation Method for Vision Transformers
Moein Sorkhei
Emir Konuk
Kevin Smith
Christos Matsoukas
79
0
0
14 Mar 2025
ReSi: A Comprehensive Benchmark for Representational Similarity Measures
Max Klabunde
Tassilo Wald
Tobias Schumacher
Klaus H. Maier-Hein
Markus Strohmaier
Adriana Iamnitchi
AI4TS
VLM
152
5
0
13 Mar 2025
Do computer vision foundation models learn the low-level characteristics of the human visual system?
Yancheng Cai
Fei Yin
Dounia Hammou
Rafal Mantiuk
VLM
Presented at ResearchTrend Connect | VLM on 14 Mar 2025
178
1
0
13 Mar 2025
Data-driven tool wear prediction in milling, based on a process-integrated single-sensor approach
Eric Hirsch
Christian Friedrich
103
0
0
13 Mar 2025
Poly-MgNet: Polynomial Building Blocks in Multigrid-Inspired ResNets
Antonia van Betteray
Matthias Rottmann
Karsten Kahl
114
0
0
13 Mar 2025
DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation
Chen Chen
Rui Qian
Wenze Hu
Tsu-Jui Fu
Jialing Tong
...
Lezhi Li
Bowen Zhang
Alex Schwing
Wei Liu
Yue Yang
84
0
0
13 Mar 2025
EFC++: Elastic Feature Consolidation with Prototype Re-balancing for Cold Start Exemplar-free Incremental Learning
Simone Magistri
Tomaso Trinci
Albin Soutif-Cormerais
Joost van de Weijer
Andrew D. Bagdanov
73
0
0
13 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
87
1
0
13 Mar 2025
Masked Sensory-Temporal Attention for Sensor Generalization in Quadruped Locomotion
Dikai Liu
Tianwei Zhang
Jianxiong Yin
Simon See
154
1
0
13 Mar 2025
DitHub: A Modular Framework for Incremental Open-Vocabulary Object Detection
Chiara Cappellino
Gianluca Mancusi
Matteo Mosconi
Angelo Porrello
Simone Calderara
Rita Cucchiara
ObjD
VLM
116
0
0
12 Mar 2025
Bayesian Test-Time Adaptation for Vision-Language Models
Lihua Zhou
Mao Ye
Shuaifeng Li
Nianxin Li
Xiatian Zhu
Lei Deng
Hongbin Liu
Zhen Lei
BDL
VLM
TTA
131
1
0
12 Mar 2025
Discovering Influential Neuron Path in Vision Transformers
Yifan Wang
Yifei Liu
Yingdong Shi
Chong Li
Anqi Pang
Sibei Yang
Jingyi Yu
Kan Ren
ViT
160
0
0
12 Mar 2025
Robust Multimodal Survival Prediction with the Latent Differentiation Conditional Variational AutoEncoder
Junjie Zhou
Jiao Tang
Yingli Zuo
Peng Wan
Daoqiang Zhang
Wei Shao
125
1
0
12 Mar 2025
RFUAV: A Benchmark Dataset for Unmanned Aerial Vehicle Detection and Identification
Rui Shi
Xiaodong Yu
Shengming Wang
Yijia Zhang
Lu Xu
Peng Pan
Chunlai Ma
80
0
0
12 Mar 2025
Evaluating Visual Explanations of Attention Maps for Transformer-based Medical Imaging
Minjae Chung
Jong Bum Won
Ganghyun Kim
Yujin Kim
Utku Ozbulak
MedIm
131
0
0
12 Mar 2025
A Siamese Network to Detect If Two Iris Images Are Monozygotic
Yongle Yuan
Kevin W. Bowyer
73
0
0
12 Mar 2025
MMRL: Multi-Modal Representation Learning for Vision-Language Models
Yuncheng Guo
Xiaodong Gu
VLM
OffRL
364
3
0
11 Mar 2025
HOTFormerLoc: Hierarchical Octree Transformer for Versatile Lidar Place Recognition Across Ground and Aerial Views
Ethan Griffiths
Maryam Haghighat
Simon Denman
Clinton Fookes
Milad Ramezani
3DPC
83
0
0
11 Mar 2025
Attention Hijackers: Detect and Disentangle Attention Hijacking in LVLMs for Hallucination Mitigation
Beitao Chen
Xinyu Lyu
Lianli Gao
Jingkuan Song
Jikang Cheng
110
1
0
11 Mar 2025
Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis
Kai Qiu
Xianrui Li
Jason Kuen
Hong Chen
Xiaohao Xu
Jiuxiang Gu
Yinyi Luo
Bhiksha Raj
Zhe Lin
Marios Savvides
83
1
0
11 Mar 2025
MaRI: Material Retrieval Integration across Domains
Jianhui Wang
Zhifei Yang
Yangfan He
Huixiong Zhang
Yuxuan Chen
Jingwei Huang
122
2
0
11 Mar 2025
ALLVB: All-in-One Long Video Understanding Benchmark
Xichen Tan
Yuanjing Luo
Yunfan Ye
Fang Liu
Zhiping Cai
MLLM
VLM
104
0
0
10 Mar 2025
Universal Incremental Learning: Mitigating Confusion from Inter- and Intra-task Distribution Randomness
Sheng Luo
Yi Zhou
Tao Zhou
CLL
115
0
0
10 Mar 2025
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
Xin Wen
Bingchen Zhao
Yilun Chen
Jiangmiao Pang
Xiaojuan Qi
LM&Ro
120
0
0
10 Mar 2025
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Siyuan Mu
Sen Lin
MoE
333
5
0
10 Mar 2025
A Quantitative Evaluation of the Expressivity of BMI, Pose and Gender in Body Embeddings for Recognition and Identification
Basudha Pal
Siyuan Huang
116
0
0
09 Mar 2025
SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation
Zhenpeng Chen
Chunwei Wang
Xiuwei Chen
Hongbin Xu
Jiawei Han
Xiandan Liang
J. N. Han
Hang Xu
Xiaodan Liang
VLM
90
1
0
09 Mar 2025
Enhancing Layer Attention Efficiency through Pruning Redundant Retrievals
Hanze Li
Xiande Huang
90
0
0
09 Mar 2025
USP: Unified Self-Supervised Pretraining for Image Generation and Understanding
Xiangxiang Chu
Renda Li
Yong Wang
134
0
0
08 Mar 2025
KIEval: Evaluation Metric for Document Key Information Extraction
Minsoo Khang
Sang Chul Jung
Sungrae Park
Teakgyu Hong
69
0
0
07 Mar 2025
Quantum-PEFT: Ultra parameter-efficient fine-tuning
Toshiaki Koike-Akino
F. Tonin
Yongtao Wu
Frank Zhengqing Wu
Leyla Naz Candogan
Volkan Cevher
MQ
128
5
0
07 Mar 2025