2104.14294
Cited By
Emerging Properties in Self-Supervised Vision Transformers
29 April 2021
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
Papers citing
"Emerging Properties in Self-Supervised Vision Transformers"
50 / 4,175 papers shown
Renaissance: Investigating the Pretraining of Vision-Language Encoders
Clayton Fields
C. Kennington
VLM
59
0
0
11 Nov 2024
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
Cong Wei
Zheyang Xiong
Weiming Ren
Xinrun Du
Ge Zhang
Wenhu Chen
176
28
0
11 Nov 2024
Semantic Enhancement for Object SLAM with Heterogeneous Multimodal Large Language Model Agents
Jungseok Hong
Ran Choi
John Leonard
VLM
154
1
0
11 Nov 2024
Understanding the Role of Equivariance in Self-supervised Learning
Yifei Wang
Kaiwen Hu
Sharut Gupta
Ziyu Ye
Yisen Wang
Stefanie Jegelka
SSL
97
2
0
10 Nov 2024
Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing
Kaixuan Lu
Ruiqian Zhang
Xiao Huang
Yuxing Xie
Xiaogang Ning
Hanchao Zhang
Mengke Yuan
Pan Zhang
Tao Wang
Tongkui Liao
84
2
0
09 Nov 2024
GCI-ViTAL: Gradual Confidence Improvement with Vision Transformers for Active Learning on Label Noise
Moseli Motsóehli
Kyungim Baek
90
1
0
08 Nov 2024
Moving Off-the-Grid: Scene-Grounded Video Representations
Sjoerd van Steenkiste
Daniel Zoran
Yi Yang
Yulia Rubanova
Rishabh Kabra
...
Thomas Keck
João Carreira
Alexey Dosovitskiy
Mehdi S. M. Sajjadi
Thomas Kipf
75
4
0
08 Nov 2024
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
David Junhao Zhang
Roni Paiss
Shiran Zada
Nikhil Karnad
David E. Jacobs
Yael Pritch
Inbar Mosseri
Mike Zheng Shou
Neal Wadhwa
Nataniel Ruiz
DiffM
VGen
154
21
0
07 Nov 2024
Image Understanding Makes for A Good Tokenizer for Image Generation
Luting Wang
Yang Zhao
Zijian Zhang
Jiashi Feng
Si Liu
Bingyi Kang
VLM
89
4
0
07 Nov 2024
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning
G. Zhou
Hengkai Pan
Yann LeCun
Lerrel Pinto
VGen
LM&Ro
OffRL
104
32
0
07 Nov 2024
Taming Rectified Flow for Inversion and Editing
Jiangshan Wang
Junfu Pu
Zhongang Qi
Jiayi Guo
Yue Ma
Nisha Huang
Yuxin Chen
Xiu Li
Ying Shan
107
38
0
07 Nov 2024
SA3DIP: Segment Any 3D Instance with Potential 3D Priors
Xi Yang
Xu Gu
Xingyilang Yin
Xinbo Gao
98
0
0
06 Nov 2024
AMNCutter: Affinity-Attention-Guided Multi-View Normalized Cutter for Unsupervised Surgical Instrument Segmentation
Mingyu Sheng
Jianan Fan
Dongnan Liu
Ron Kikinis
Weidong Cai
80
0
0
06 Nov 2024
Classification Done Right for Vision-Language Pre-Training
Zilong Huang
Qinghao Ye
Bingyi Kang
Jiashi Feng
Haoqi Fan
CLIP
VLM
122
4
0
05 Nov 2024
On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models
Tariq Berrada Ifriqi
Pietro Astolfi
Melissa Hall
Reyhane Askari Hemmat
Yohann Benchetrit
...
Matthew Muckley
Karteek Alahari
Adriana Romero Soriano
Jakob Verbeek
M. Drozdzal
AI4CE
VLM
139
4
0
05 Nov 2024
Rethinking Decoders for Transformer-based Semantic Segmentation: A Compression Perspective
Qishuai Wen
Chun-Guang Li
ViT
64
0
0
05 Nov 2024
Multi-modal NeRF Self-Supervision for LiDAR Semantic Segmentation
Xavier Timoneda
Markus Herb
Fabian Duerr
Daniel Goehring
Fisher Yu
SSL
3DPC
61
1
0
05 Nov 2024
TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives
Maitreya Patel
Abhiram Kusumba
Sheng Cheng
Changhoon Kim
Tejas Gokhale
Chitta Baral
Yezhou Yang
CoGe
CLIP
143
14
0
04 Nov 2024
AutoVFX: Physically Realistic Video Editing from Natural Language Instructions
Hao-Yu Hsu
Zhi-Hao Lin
Albert Zhai
Hongchi Xia
Shenlong Wang
VGen
105
11
0
04 Nov 2024
Adaptive Length Image Tokenization via Recurrent Allocation
Shivam Duggal
Phillip Isola
Antonio Torralba
William T. Freeman
VLM
102
9
0
04 Nov 2024
Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs
A. Haliassos
Rodrigo Mira
Honglie Chen
Zoe Landgraf
Stavros Petridis
Maja Pantic
SSL
86
7
0
04 Nov 2024
UnSegMedGAT: Unsupervised Medical Image Segmentation using Graph Attention Networks Clustering
A. M. Adityaja
Saurabh J. Shigwan
Nitin Kumar
MedIm
87
1
0
04 Nov 2024
Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control
Yuxin Xiao
Chaoqun Wan
Yonggang Zhang
Wenxiao Wang
Binbin Lin
Xiaofei He
Xu Shen
Jieping Ye
49
0
0
04 Nov 2024
Bootstrapping Top-down Information for Self-modulating Slot Attention
Dongwon Kim
Seoyeon Kim
Suha Kwak
OCL
ObjD
83
0
0
04 Nov 2024
Breaking the Reclustering Barrier in Centroid-based Deep Clustering
Lukas Miklautz
Timo Klein
Kevin Sidak
Collin Leiber
Thomas Lang
Andrii Shkabrii
Sebastian Tschiatschek
Claudia Plant
149
1
0
04 Nov 2024
Grouped Discrete Representation for Object-Centric Learning
Rongzhen Zhao
V. Wang
Arno Solin
Joni Pajarinen
BDL
OCL
86
1
0
04 Nov 2024
Optical Flow Representation Alignment Mamba Diffusion Model for Medical Video Generation
Zhenbin Wang
Lei Zhang
Lituan Wang
Minjuan Zhu
Zhenwei Zhang
VGen
MedIm
101
3
0
03 Nov 2024
Exploring PCA-based feature representations of image pixels via CNN to enhance food image segmentation
Ying Dai
81
0
0
03 Nov 2024
Task-Oriented Hierarchical Object Decomposition for Visuomotor Control
Jianing Qian
Yunshuang Li
Bernadette Bucher
Dinesh Jayaraman
OCL
91
0
0
02 Nov 2024
Improving Viewpoint-Independent Object-Centric Representations through Active Viewpoint Selection
Yinxuan Huang
Chengmin Gao
Bin Li
Xiangyang Xue
OCL
65
0
0
01 Nov 2024
Preventing Dimensional Collapse in Self-Supervised Learning via Orthogonality Regularization
Junlin He
Jinxiao Du
Wei Ma
SSL
118
1
0
01 Nov 2024
Sparsh: Self-supervised touch representations for vision-based tactile sensing
Carolina Higuera
Akash Sharma
Chaithanya Krishna Bodduluri
Taosha Fan
Patrick E. Lancaster
...
Michael Kaess
Byron Boots
Mike Lambeta
Tingfan Wu
Mustafa Mukadam
85
23
0
31 Oct 2024
Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach
Mathilde Caron
Alireza Fathi
Cordelia Schmid
Ahmet Iscen
67
2
0
31 Oct 2024
Context-Aware Token Selection and Packing for Enhanced Vision Transformer
Tianyi Zhang
B. Li
Jae-sun Seo
Yu Cao
72
0
0
31 Oct 2024
SceneComplete: Open-World 3D Scene Completion in Cluttered Real World Environments for Robot Manipulation
Aditya Agarwal
Gaurav Singh
Bipasha Sen
Tomás Lozano-Pérez
L. Kaelbling
3DV
84
3
0
31 Oct 2024
FRoundation: Are Foundation Models Ready for Face Recognition?
Tahar Chettaoui
Naser Damer
Fadi Boutros
CVBM
94
8
0
31 Oct 2024
Unsupervised Object Discovery: A Comprehensive Survey and Unified Taxonomy
José-Fabian Villa-Vásquez
M. Pedersoli
142
1
0
30 Oct 2024
EchoFM: Foundation Model for Generalizable Echocardiogram Analysis
Sekeun Kim
Pengfei Jin
S. Song
Cheng Chen
Yiwei Li
Hui Ren
Xiang Li
Tianming Liu
Quanzheng Li
107
0
0
30 Oct 2024
Multilingual Vision-Language Pre-training for the Remote Sensing Domain
João Daniel Silva
João Magalhães
D. Tuia
Bruno Martins
CLIP
VLM
74
2
0
30 Oct 2024
Bridging the Human to Robot Dexterity Gap through Object-Oriented Rewards
Irmak Güzey
Yinlong Dai
Georgy Savva
Raunaq M. Bhirangi
Lerrel Pinto
99
11
0
30 Oct 2024
HEX: Hierarchical Emergence Exploitation in Self-Supervised Algorithms
Kiran Kokilepersaud
Seulgi Kim
Mohit Prabhushankar
Ghassan AlRegib
88
2
0
30 Oct 2024
S3PT: Scene Semantics and Structure Guided Clustering to Boost Self-Supervised Pre-Training for Autonomous Driving
Maciej K. Wozniak
Hariprasath Govindarajan
Marvin Klingner
Camille Maurice
B Ravi Kiran
S. Yogamani
3DPC
156
1
0
30 Oct 2024
Neural Attention Field: Emerging Point Relevance in 3D Scenes for One-Shot Dexterous Grasping
Qianxu Wang
Congyue Deng
Tyler Ga Wei Lum
Yuanpei Chen
Yaodong Yang
Jeannette Bohg
Yixin Zhu
Leonidas Guibas
84
4
0
30 Oct 2024
Universality of the $\pi^2/6$ Pathway in Avoiding Model Collapse
Apratim Dey
D. Donoho
132
8
0
30 Oct 2024
NeFF-BioNet: Crop Biomass Prediction from Point Cloud to Drone Imagery
Xuesong Li
Zeeshan Hayder
Ali Zia
Connor Cassidy
Shiming Liu
W. Stiller
Eric A. Stone
Warren C. Conaty
Lars Petersson
V. Rolland
60
0
0
30 Oct 2024
A Fresh Look at Generalized Category Discovery through Non-negative Matrix Factorization
Zhong Ji
Steve Yang
Jingren Liu
Yanwei Pang
Jungong Han
125
1
0
29 Oct 2024
SimSiam Naming Game: A Unified Approach for Representation Learning and Emergent Communication
Nguyen Le Hoang
T. Taniguchi
Fang Tianwei
Akira Taniguchi
93
1
0
29 Oct 2024
DINeuro: Distilling Knowledge from 2D Natural Images via Deformable Tubular Transferring Strategy for 3D Neuron Reconstruction
Yik San Cheng
Runkai Zhao
Heng Wang
Hanchuan Peng
Yui Lo
Yuqian Chen
L. O’Donnell
Weidong Cai
109
0
0
29 Oct 2024
AdaptGCD: Multi-Expert Adapter Tuning for Generalized Category Discovery
Yuxun Qu
Yongqiang Tang
Chenyang Zhang
Wensheng Zhang
179
0
0
29 Oct 2024
IntLoRA: Integral Low-rank Adaptation of Quantized Diffusion Models
Hang Guo
Yawei Li
Tao Dai
Shu-Tao Xia
Luca Benini
MQ
127
2
0
29 Oct 2024