Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2104.14294
Cited By
v1
v2 (latest)
Emerging Properties in Self-Supervised Vision Transformers
29 April 2021
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Emerging Properties in Self-Supervised Vision Transformers"
50 / 4,175 papers shown
Title
SimMIL: A Universal Weakly Supervised Pre-Training Framework for Multi-Instance Learning in Whole Slide Pathology Images
Yicheng Song
Tiancheng Lin
Die Peng
Su Yang
Yi Xu
MedIm
78
0
0
10 May 2025
CGTrack: Cascade Gating Network with Hierarchical Feature Aggregation for UAV Tracking
Weihong Li
Xiaoqiong Liu
Heng Fan
L. Zhang
64
0
0
09 May 2025
Register and CLS tokens yield a decoupling of local and global features in large ViTs
Alexander Lappe
M. Giese
60
1
0
09 May 2025
Towards a Unified Representation Evaluation Framework Beyond Downstream Tasks
Christos Plachouras
Julien Guinot
George Fazekas
Elio Quinton
Emmanouil Benetos
Johan Pauwels
439
1
0
09 May 2025
FLAM: Frame-Wise Language-Audio Modeling
Yusong Wu
Christos Tsirigotis
Ke Chen
Cheng-Zhi Anna Huang
Rameswar Panda
Oriol Nieto
Prem Seetharaman
Justin Salamon
85
1
0
08 May 2025
Learning from Similarity Proportion Loss for Classifying Skeletal Muscle Recovery Stages
Yu Yamaoka
Weng Ian Chan
Shigeto Seno
Soichiro Fukada
Hideo Matsuda
82
0
0
07 May 2025
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
Junjie Wang
Bin Chen
Yulin Li
Bin Kang
Yulin Chen
Zhuotao Tian
VLM
102
0
0
07 May 2025
Balancing Accuracy, Calibration, and Efficiency in Active Learning with Vision Transformers Under Label Noise
Moseli Motsóehli
Hope Mogale
Kyungim Baek
131
0
0
07 May 2025
Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation
Gabriele Rosi
Fabio Cermelli
VLM
169
0
0
06 May 2025
seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models
Hafez Ghaemi
Eilif Muller
Shahab Bakhtiari
158
0
0
06 May 2025
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves
Dengyang Jiang
Mengmeng Wang
Liuzhuozheng Li
Lei Zhang
Haoyu Wang
Wei Wei
Guang Dai
Yanning Zhang
Jingdong Wang
DiffM
107
0
0
05 May 2025
GIF: Generative Inspiration for Face Recognition at Scale
Saeed Ebrahimi
Sahar Rahimi
Ali Dabouei
Srinjoy Das
Jeremy M. Dawson
Nasser M. Nasrabadi
CVBM
543
0
0
05 May 2025
MUSAR: Exploring Multi-Subject Customization from Single-Subject Dataset via Attention Routing
Zinan Guo
Pengze Zhang
Yanze Wu
Chong Mou
Mingcong Liu
Qian He
84
0
0
05 May 2025
Lifelong Whole Slide Image Analysis: Online Vision-Language Adaptation and Past-to-Present Gradient Distillation
Doanh C. Bui
H. Pham
Vu Trung Duong Le
T. Vu
Van Duy Tran
Khang Phuoc-Quy Nguyen
Y. Nakashima
CLL
MedIm
81
0
0
04 May 2025
Hierarchical Compact Clustering Attention (COCA) for Unsupervised Object-Centric Learning
Can Küçüksözen
Yücel Yemez
OCL
169
0
0
04 May 2025
DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization
Wenchuan Wang
Mengqi Huang
Yijing Tu
Zhendong Mao
VGen
126
0
0
04 May 2025
Self-Supervision Enhances Instance-based Multiple Instance Learning Methods in Digital Pathology: A Benchmark Study
Ali Mammadov
Loic Le Folgoc
Julien Adam
Anne Buronfosse
Gilles Hayem
Guillaume Hocquet
Pietro Gori
SSL
73
0
0
02 May 2025
VIDSTAMP: A Temporally-Aware Watermark for Ownership and Integrity in Video Diffusion Models
Mohammadreza Teymoorianfard
Shiqing Ma
Amir Houmansadr
WIGM
130
0
0
02 May 2025
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment
Edson Araujo
Andrew Rouditchenko
Yuan Gong
Saurabhchand Bhati
Samuel Thomas
Brian Kingsbury
Leonid Karlinsky
Rogerio Feris
James Glass
Hilde Kuehne
117
0
0
02 May 2025
Diffusion-based Adversarial Purification from the Perspective of the Frequency Domain
Gaozheng Pei
Ke Ma
Yingfei Sun
Qianqian Xu
Qingming Huang
DiffM
84
0
0
02 May 2025
CostFilter-AD: Enhancing Anomaly Detection through Matching Cost Filtering
Zhe Zhang
Mingxiu Cai
Haoran Wang
Gaochang Wu
Tianyou Chai
Xiatian Zhu
127
0
0
02 May 2025
InstructAttribute: Fine-grained Object Attributes editing with Instruction
Xingxi Yin
Jingfeng Zhang
Zhi Li
You Li
Yanzhe Zhang
Yin Zhang
DiffM
455
1
0
01 May 2025
Common3D: Self-Supervised Learning of 3D Morphable Models for Common Objects in Neural Feature Space
Leonhard Sommer
Olaf Dünkel
Christian Theobalt
Adam Kortylewski
72
1
0
30 Apr 2025
Recursive KL Divergence Optimization: A Dynamic Framework for Representation Learning
Anthony D Martin
125
0
0
30 Apr 2025
OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models
Shengkai Chen
Yifang Yin
Jinming Cao
Shili Xiang
Zhenguang Liu
Roger Zimmermann
VOS
VLM
105
0
0
30 Apr 2025
Investigating Zero-Shot Diagnostic Pathology in Vision-Language Models with Efficient Prompt Design
Vasudev Sharma
Ahmed Alagha
Abdelhakim Khellaf
Vincent Quoc-Huy Trinh
Mahdi S. Hosseini
143
0
0
30 Apr 2025
Enhancing Self-Supervised Fine-Grained Video Object Tracking with Dynamic Memory Prediction
Zihan Zhou
Changrui Dai
Aibo Song
Xiaolin Fang
VOS
94
0
0
30 Apr 2025
Online Federation For Mixtures of Proprietary Agents with Black-Box Encoders
Xuwei Yang
Fatemeh Tavakoli
D. B. Emerson
Anastasis Kratsios
FedML
130
0
0
30 Apr 2025
Adept: Annotation-Denoising Auxiliary Tasks with Discrete Cosine Transform Map and Keypoint for Human-Centric Pretraining
Weizhen He
Yunfeng Yan
Shixiang Tang
Yiheng Deng
Yangyang Zhong
Pengxin Luo
Donglian Qi
VLM
197
1
0
29 Apr 2025
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer
Zechuan Zhang
Ji Xie
Yu Lu
Zongxin Yang
Yue Yang
DiffM
146
11
0
29 Apr 2025
SVD Based Least Squares for X-Ray Pneumonia Classification Using Deep Features
Mete Erdogan
Sebnem Demirtas
79
0
0
29 Apr 2025
PRISM: Projection-based Reward Integration for Scene-Aware Real-to-Sim-to-Real Transfer with Few Demonstrations
Haowen Sun
Haoran Wang
Chengzhong Ma
Shaolong Zhang
Jiawei Ye
Xingyu Chen
Xuguang Lan
OffRL
115
1
0
29 Apr 2025
LIRM: Large Inverse Rendering Model for Progressive Reconstruction of Shape, Materials and View-dependent Radiance Fields
Zhengqin Li
Dilin Wang
Ka Chen
Zhaoyang Lv
Thu Nguyen-Phuoc
...
Yufeng Zhu
Carl S. Marshall
Yufeng Ren
Richard Newcombe
Zhao Dong
3DV
136
1
0
28 Apr 2025
Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video
Sonia Joseph
Praneet Suresh
Lorenz Hufe
Edward Stevinson
Robert Graham
Yash Vadi
Danilo Bzdok
Sebastian Lapuschkin
Lee Sharkey
Blake A. Richards
149
0
0
28 Apr 2025
CompleteMe: Reference-based Human Image Completion
Yu-Ju Tsai
Brian L. Price
Qing Liu
Luis Figueroa
D. Pakhomov
Zhihong Ding
Scott D. Cohen
Ming-Hsuan Yang
3DH
63
0
0
28 Apr 2025
Enhancing breast cancer detection on screening mammogram using self-supervised learning and a hybrid deep model of Swin Transformer and Convolutional Neural Network
Han Chen
Anne L. Martel
79
0
0
28 Apr 2025
Do You Know the Way? Human-in-the-Loop Understanding for Fast Traversability Estimation in Mobile Robotics
Andre Schreiber
Katherine Rose Driggs-Campbell
475
0
0
28 Apr 2025
Taming the Randomness: Towards Label-Preserving Cropping in Contrastive Learning
Mohamed Hassan
Mohammad Wasil
Sebastian Houben
96
0
0
28 Apr 2025
Platonic Grounding for Efficient Multimodal Language Models
Moulik Choraria
Xinbo Wu
Akhil Bhimaraju
Nitesh Sekhar
Yue Wu
Xu Zhang
Prateek Singhal
Lav Varshney
112
0
0
27 Apr 2025
CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis
Alexander Baumann
Leonardo Ayala
Siyang Song
Jan Sellner
Alexander Studier-Fischer
Berkin Özdemir
Lena Maier-Hein
Slobodan Ilic
108
0
0
27 Apr 2025
MERA: Multimodal and Multiscale Self-Explanatory Model with Considerably Reduced Annotation for Lung Nodule Diagnosis
Jiahao Lu
Chong Yin
Silvia Ingala
Kenny Erleben
M. Nielsen
S. Darkner
95
0
0
27 Apr 2025
OpenFusion++: An Open-vocabulary Real-time Scene Understanding System
Xiaofeng Jin
Matteo Frosi
Matteo Matteucci
439
0
0
27 Apr 2025
Multi-Resolution Pathology-Language Pre-training Model with Text-Guided Visual Representation
Shahad Albastaki
Anabia Sohail
I. I. Ganapathi
B. Alawode
Asim Khan
Sajid Javed
Naoufel Werghi
Mohammed Bennamoun
Arif Mahmood
174
0
0
26 Apr 2025
SSL4Eco: A Global Seasonal Dataset for Geospatial Foundation Models in Ecology
Elena Plekhanova
Damien Robert
Johannes Dollinger
Emilia Arens
Philipp Brun
Jan Dirk Wegner
Niklaus Zimmermann
81
0
0
25 Apr 2025
Examining the Impact of Optical Aberrations to Image Classification and Object Detection Models
Patrick Müller
Alexander Braun
Margret Keuper
102
0
0
25 Apr 2025
Fine-tune Smarter, Not Harder: Parameter-Efficient Fine-Tuning for Geospatial Foundation Models
Francesc Marti Escofet
Benedikt Blumenstiel
L. Scheibenreif
P. Fraccaro
Konrad Schindler
91
0
0
24 Apr 2025
A Genealogy of Multi-Sensor Foundation Models in Remote Sensing
Kevin Lane
Morteza Karimzadeh
81
0
0
24 Apr 2025
RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation
Aviv Slobodkin
Hagai Taitelbaum
Yonatan Bitton
Brian Gordon
Michal Sokolik
Nitzan Bitton-Guetta
Almog Gueta
Royi Rassin
Itay Laish
Dani Lischinski
EGVM
VGen
101
0
0
24 Apr 2025
Improving Open-World Object Localization by Discovering Background
Ashish Singh
Michael Jeffrey Jones
Kuan-Chuan Peng
A. Cherian
Moitreya Chatterjee
Erik Learned-Miller
ObjD
OCL
VLM
116
0
0
24 Apr 2025
Facial Foundational Model Advances Early Warning of Coronary Artery Disease from Live Videos with DigitalShadow
Juexiao Zhou
Zhongyi Han
Mankun Xin
Xingwei He
Guotao Wang
...
Xuefei Bi
Lu Liu
Long Feng
Xiaonan He
Xin Gao
13
0
0
23 Apr 2025
Previous
1
2
3
4
5
6
...
82
83
84
Next