2104.10972
Cited By
ImageNet-21K Pretraining for the Masses
22 April 2021
T. Ridnik
Emanuel Ben-Baruch
Asaf Noy
Lihi Zelnik-Manor
SSeg
VLM
CLIP
Github (765★)
Papers citing "ImageNet-21K Pretraining for the Masses"
50 / 427 papers shown
When Model Knowledge meets Diffusion Model: Diffusion-assisted Data-free Image Synthesis with Alignment of Domain and Class
Yujin Kim
H. Kim
Hyunwoo J. Kim
S. Kim
15
0
0
18 Jun 2025
PictSure: Pretraining Embeddings Matters for In-Context Learning Image Classifiers
Lukas Schiesser
Cornelius Wolff
Sophie Haas
Simon Pukrop
VLM
18
0
0
16 Jun 2025
RollingQ: Reviving the Cooperation Dynamics in Multimodal Transformer
Haotian Ni
Yake Wei
Hang Liu
Gong Chen
Chong Peng
Hao Lin
Di Hu
OffRL
70
0
0
13 Jun 2025
Sleep Stage Classification using Multimodal Embedding Fusion from EOG and PSM
Olivier Papillon
Rafik Goubran
James Green
Julien Larivière-Chartier
Caitlin Higginson
Frank Knoefel
Rébecca Robillard
16
0
0
07 Jun 2025
Textile Analysis for Recycling Automation using Transfer Learning and Zero-Shot Foundation Models
Yannis Spyridis
Vasileios Argyriou
18
0
0
06 Jun 2025
PAID: Pairwise Angular-Invariant Decomposition for Continual Test-Time Adaptation
Kunyu Wang
Xueyang Fu
Yunfei Bao
Chengjie Ge
Chengzhi Cao
Wei-dong Zhai
Zheng-jun Zha
71
0
0
03 Jun 2025
Long-Tailed Visual Recognition via Permutation-Invariant Head-to-Tail Feature Fusion
Mengke Li
Zhikai Hu
Yang Lu
Weichao Lan
Y. Cheung
Hui Huang
36
0
0
31 May 2025
Uncertainty Quantification with Proper Scoring Rules: Adjusting Measures to Prediction Tasks
Paul Hofman
Yusuf Sale
Eyke Hüllermeier
UQCV
23
0
0
28 May 2025
From What to How: Attributing CLIP's Latent Components Reveals Unexpected Semantic Reliance
Maximilian Dreyer
Lorenz Hufe
J. Berend
Thomas Wiegand
Sebastian Lapuschkin
Wojciech Samek
42
0
0
26 May 2025
SGD-Mix: Enhancing Domain-Specific Image Classification with Label-Preserving Data Augmentation
Yixuan Dong
Fang-Yi Su
Jung-Hsien Chiang
DiffM
60
0
0
17 May 2025
A Unified and Scalable Membership Inference Method for Visual Self-supervised Encoder via Part-aware Capability
Jie Zhu
Jirong Zha
Ding Li
Leye Wang
131
1
0
15 May 2025
FalconWing: An Open-Source Platform for Ultra-Light Fixed-Wing Aircraft Research
Yan Miao
Will Shen
Hang Cui
Sayan Mitra
126
0
0
02 May 2025
WILD: a new in-the-Wild Image Linkage Dataset for synthetic image attribution
Pietro Bongini
S. Mandelli
Andrea Montibeller
Mirko Casu
Orazio Pontorno
...
Paolo Bestagini
Irene Amerini
F. D. De Natale
Sebastiano Battiato
Mauro Barni
VLM
278
0
0
28 Apr 2025
POET: Prompt Offset Tuning for Continual Human Action Adaptation
Prachi Garg
Joseph K J
V. Balasubramanian
Necati Cihan Camgöz
Chengde Wan
Kenrick Kin
Weiguang Si
Shugao Ma
Fernando de la Torre
131
0
0
25 Apr 2025
Embedding Radiomics into Vision Transformers for Multimodal Medical Image Classification
Zhenyu Yang
Haiming Zhu
Rihui Zhang
Haipeng Zhang
Jianliang Wang
Chunhao Wang
Minbin Chen
F. Yin
MedIm
134
0
0
15 Apr 2025
COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts
Jiansheng Li
Xingxuan Zhang
Hao Zou
Yige Guo
Renzhe Xu
Yilong Liu
Chuzhao Zhu
Yue He
Peng Cui
VLM
93
0
0
14 Apr 2025
Deep Learning Methods for Detecting Thermal Runaway Events in Battery Production Lines
Athanasios Athanasopoulos
Matúš Mihalák
Marcin Pietrasik
85
0
0
11 Apr 2025
Learning Object Focused Attention
Vivek Trivedy
A. Almalki
Longin Jan Latecki
86
0
0
10 Apr 2025
EDIT: Enhancing Vision Transformers by Mitigating Attention Sink through an Encoder-Decoder Architecture
Wenfeng Feng
Guoying Sun
83
0
0
09 Apr 2025
Contour Integration Underlies Human-Like Vision
Ben Lonnqvist
Elsa Scialom
Abdülkadir Gökce
Zehra Merchant
Michael H. Herzog
Martin Schrimpf
VLM
91
1
0
07 Apr 2025
Taxonomy-Aware Evaluation of Vision-Language Models
Vésteinn Snæbjarnarson
Kevin Du
Niklas Stoehr
Serge Belongie
Ryan Cotterell
Nico Lang
Stella Frank
92
2
0
07 Apr 2025
AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing
Niu Lian
Jun Li
Jinpeng Wang
Ruisheng Luo
Yaowei Wang
Shu-Tao Xia
Bin Chen
425
0
0
04 Apr 2025
Scaling Language-Free Visual Representation Learning
David Fan
Shengbang Tong
Jiachen Zhu
Koustuv Sinha
Zhuang Liu
...
Michael G. Rabbat
Nicolas Ballas
Yann LeCun
Amir Bar
Saining Xie
CLIP
VLM
Presented at ResearchTrend Connect | VLM on 04 Jun 2025
172
6
0
01 Apr 2025
MMCR: Advancing Visual Language Model in Multimodal Multi-Turn Contextual Reasoning
Dawei Yan
Yangfu Li
Qing-Guo Chen
Weihua Luo
Peng Wang
Han Zhang
Chunhua Shen
VGen
VLM
LRM
85
1
0
24 Mar 2025
On the Robustness Tradeoff in Fine-Tuning
Kunyang Li
Jean-Charles Noirot Ferrand
Ryan Sheatsley
Blaine Hoak
Yohan Beugin
Eric Pauley
Patrick McDaniel
91
0
0
19 Mar 2025
Revisiting semi-supervised learning in the era of foundation models
Ping Zhang
Zheda Mai
Quang-Huy Nguyen
Wei-Lun Chao
104
1
0
12 Mar 2025
DitHub: A Modular Framework for Incremental Open-Vocabulary Object Detection
Chiara Cappellino
Gianluca Mancusi
Matteo Mosconi
Angelo Porrello
Simone Calderara
Rita Cucchiara
ObjD
VLM
176
0
0
12 Mar 2025
Structural and Statistical Texture Knowledge Distillation and Learning for Segmentation
Deyi Ji
Feng Zhao
Hongtao Lu
Feng Wu
Jieping Ye
127
3
0
11 Mar 2025
Learning and Evaluating Hierarchical Feature Representations
Depanshu Sani
Saket Anand
74
0
0
10 Mar 2025
On the Generalization of Representation Uncertainty in Earth Observation
Spyros Kondylatos
Nikolaos Ioannis Bountos
Dimitrios Michail
Xiao Xiang Zhu
Gustau Camps-Valls
Ioannis Papoutsis
108
1
0
10 Mar 2025
Zero-Shot Sim-to-Real Visual Quadrotor Control with Hard Constraints
Yan Miao
Will Shen
Sayan Mitra
142
1
0
04 Mar 2025
Anatomically-guided masked autoencoder pre-training for aneurysm detection
Alberto Mario Ceballos-Arroyo
Jisoo Kim
Hongpeng Zhou
Lei Qin
Geoffrey S. Young
Huaizu Jiang
ViT
MedIm
58
0
0
28 Feb 2025
Mixtraining: A Better Trade-Off Between Compute and Performance
Zexin Li
Jiancheng Zhang
Yufei Li
Yinglun Zhu
Cong Liu
71
0
0
26 Feb 2025
Simpler Fast Vision Transformers with a Jumbo CLS Token
A. Fuller
Yousef Yassin
Daniel G. Kyrollos
Evan Shelhamer
James R. Green
203
0
0
20 Feb 2025
Detecting Systematic Weaknesses in Vision Models along Predefined Human-Understandable Dimensions
Sujan Sai Gannamaneni
Rohil Prakash Rao
Michael Mock
Maram Akila
Stefan Wrobel
AAML
452
0
0
17 Feb 2025
Towards Virtual Clinical Trials of Radiology AI with Conditional Generative Modeling
Benjamin Killeen
Bohua Wan
Aditya V. Kulkarni
Nathan G. Drenkow
Michael Oberst
Paul H. Yi
Mathias Unberath
MedIm
125
0
0
13 Feb 2025
Composite Sketch+Text Queries for Retrieving Objects with Elusive Names and Complex Interactions
Prajwal Gatti
Kshitij Parikh
Dhriti Prasanna Paul
Manish Gupta
Anand Mishra
221
2
0
12 Feb 2025
Do we really have to filter out random noise in pre-training data for language models?
Jinghan Ru
Yuxin Xie
Xianwei Zhuang
Yuguo Yin
Zhihui Guo
Zhiming Liu
Qianli Ren
Yuexian Zou
193
6
0
10 Feb 2025
Evaluating Vision-Language Models for Emotion Recognition
Sree Bhattacharyya
James Z. Wang
VLM
145
2
0
08 Feb 2025
LoCA: Location-Aware Cosine Adaptation for Parameter-Efficient Fine-Tuning
Zhekai Du
Yinjie Min
Jingjing Li
Ke Lu
Changliang Zou
Liuhua Peng
Tingjin Chu
Mingming Gong
462
2
0
05 Feb 2025
LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation
Can Jin
Ying Li
Mingyu Zhao
Shiyu Zhao
Zhenting Wang
Xiaoxiao He
Ligong Han
Tong Che
Dimitris N. Metaxas
VPVLM
VLM
335
2
0
02 Feb 2025
Rethinking Encoder-Decoder Flow Through Shared Structures
Frederik Laboyrie
M. K. Yucel
Albert Saà-Garriga
AI4CE
80
0
0
24 Jan 2025
How Well Do Supervised 3D Models Transfer to Medical Imaging Tasks?
Wenxuan Li
Alan Yuille
Zongwei Zhou
MedIm
148
10
0
20 Jan 2025
Generalized Single-Image-Based Morphing Attack Detection Using Deep Representations from Vision Transformer
Haoyu Zhang
Raghavendra Ramachandra
Kiran Raja
C. Busch
172
6
0
20 Jan 2025
MoRe: Class Patch Attention Needs Regularization for Weakly Supervised Semantic Segmentation
Zhiwei Yang
Yucong Meng
Kexue Fu
Shuo Wang
Zhijian Song
252
2
0
20 Jan 2025
A Room to Roam: Reset Prediction Based on Physical Object Placement for Redirected Walking
Sulim Chun
Ho Jung Lee
In-Kwon Lee
69
0
0
23 Dec 2024
What makes a good metric? Evaluating automatic metrics for text-to-image consistency
Candace Ross
Melissa Hall
Adriana Romero Soriano
Adina Williams
165
4
0
18 Dec 2024
Real Classification by Description: Extending CLIP's Limits of Part Attributes Recognition
Ethan Baron
Idan Tankel
Peter Tu
Guy Ben-Yosef
VLM
134
0
0
18 Dec 2024
Do Language Models Understand Time?
Xi Ding
Lei Wang
332
2
0
18 Dec 2024
CONCLAD: COntinuous Novel CLAss Detector
Amanda Rios
I. Ndiour
Parual Datta
Omesh Tickoo
Nilesh A. Ahuja
148
0
0
13 Dec 2024