2304.07193
DINOv2: Learning Robust Visual Features without Supervision
14 April 2023
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
Vasil Khalidov
Pierre Fernandez
Daniel Haziza
Francisco Massa
Alaaeldin El-Nouby
Mahmoud Assran
Nicolas Ballas
Wojciech Galuba
Russ Howes
Po-Yao (Bernie) Huang
Shang-Wen Li
Ishan Misra
Michael G. Rabbat
Vasu Sharma
Gabriel Synnaeve
Huijiao Xu
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
Papers citing
"DINOv2: Learning Robust Visual Features without Supervision"
50 / 826 papers shown
Title
Towards Generalizable Scene Change Detection
Jaewoo Kim
Uehwan Kim
131
0
0
10 Sep 2024
DetailCLIP: Detail-Oriented CLIP for Fine-Grained Tasks
Amin Karimi Monsefi
Kishore Prakash Sailaja
Ali Alilooee
Ser-Nam Lim
R. Ramnath
VLM
102
9
0
10 Sep 2024
EndoOmni: Zero-Shot Cross-Dataset Depth Estimation in Endoscopy by Robust Self-Learning from Noisy Labels
Qingyao Tian
Zhen Chen
Huai Liao
Xinyan Huang
Lujie Li
Sebastien Ourselin
Hongbin Liu
229
3
0
09 Sep 2024
CustomContrast: A Multilevel Contrastive Perspective For Subject-Driven Text-to-Image Customization
Nan Chen
Mengqi Huang
Zhuowei Chen
Yang Zheng
Lei Zhang
Zhendong Mao
DiffM
154
6
0
09 Sep 2024
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
Yunze Man
Shuhong Zheng
Zhipeng Bao
M. Hebert
Liang-Yan Gui
Yu-Xiong Wang
142
23
0
05 Sep 2024
DynOMo: Online Point Tracking by Dynamic Online Monocular Gaussian Reconstruction
Jenny Seidenschwarz
Qunjie Zhou
Bardienus Duisterhof
Deva Ramanan
Laura Leal-Taixe
112
7
0
03 Sep 2024
Self-Supervised Vision Transformers for Writer Retrieval
Tim Raven
Arthur Matei
Gernot A. Fink
ViT
71
1
0
01 Sep 2024
DARES: Depth Anything in Robotic Endoscopic Surgery with Self-supervised Vector-LoRA of the Foundation Model
Mona Sheikh Zeinoddin
Chiara Lena
Jiongqi Qu
Luca Carlini
Mattia Magro
...
E. Mazomenos
Daniel C. Alexander
Danail Stoyanov
Matthew J. Clarkson
Mobarakol Islam
81
1
0
30 Aug 2024
Mismatched: Evaluating the Limits of Image Matching Approaches and Benchmarks
Sierra Bonilla
Chiara Di Vece
Rema Daher
Xinwei Ju
Danail Stoyanov
Francisco Vasconcelos
Sophia Bano
3DV
87
1
0
29 Aug 2024
A Simple and Generalist Approach for Panoptic Segmentation
Nedyalko Prisadnikov
Wouter Van Gansbeke
Danda Pani Paudel
Luc Van Gool
VLM
116
0
0
29 Aug 2024
Law of Vision Representation in MLLMs
Shijia Yang
Bohan Zhai
Quanzeng You
Jianbo Yuan
Hongxia Yang
Chenfeng Xu
157
12
0
29 Aug 2024
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Min Shi
Fuxiao Liu
Shihao Wang
Shijia Liao
Subhashree Radhakrishnan
...
Andrew Tao
Zhiding Yu
Guilin Liu
MLLM
155
68
0
28 Aug 2024
Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration
Xu Zhang
Jiaqi Ma
Guoli Wang
Qian Zhang
Huan Zhang
Lefei Zhang
VLM
183
10
0
28 Aug 2024
NeuralOOD: Improving Out-of-Distribution Generalization Performance with Brain-machine Fusion Learning Framework
Shuangchen Zhao
Changde Du
Hui Li
Huiguang He
72
0
0
27 Aug 2024
The Benefits of Balance: From Information Projections to Variance Reduction
Lang Liu
Ronak R. Mehta
Soumik Pal
Zaïd Harchaoui
81
0
0
27 Aug 2024
An Embedding is Worth a Thousand Noisy Labels
Francesco Di Salvo
Sebastian Doerrich
Ines Rieger
Christian Ledig
NoLa
153
0
0
26 Aug 2024
FungiTastic: A multi-modal dataset and benchmark for image categorization
Lukáš Picek
Klára Janoušková
Milan Šulc
Jiří Matas
140
1
0
24 Aug 2024
Segment Any Mesh
George Tang
William Zhao
Logan Ford
David Benhaim
Paul Zhang
97
9
0
24 Aug 2024
Atlas Gaussians Diffusion for 3D Generation
Haitao Yang
Yuan Dong
Hanwen Jiang
Dejia Xu
Georgios Pavlakos
Qixing Huang
3DGS
189
3
0
23 Aug 2024
Enhanced Infield Agriculture with Interpretable Machine Learning Approaches for Crop Classification
Sudi Murindanyi
Joyce Nakatumba-Nabende
Rahman Sanya
Rose Nakibuule
Andrew Katumba
VLM
76
1
0
22 Aug 2024
EmbodiedSAM: Online Segment Any 3D Thing in Real Time
Xiuwei Xu
Huangxing Chen
Linqing Zhao
Ziwei Wang
Jie Zhou
Jiwen Lu
121
16
0
21 Aug 2024
PooDLe: Pooled and dense self-supervised learning from naturalistic videos
Alex N. Wang
Christopher Hoang
Yuwen Xiong
Yann LeCun
Mengye Ren
251
0
0
20 Aug 2024
Retrieval-augmented Few-shot Medical Image Segmentation with Foundation Models
Lin Zhao
Xiao Chen
Eric Z. Chen
Yikang Liu
Terrence Chen
Shanhui Sun
VLM
109
6
0
16 Aug 2024
General-purpose Clothes Manipulation with Semantic Keypoints
Yuhong Deng
David Hsu
123
2
0
15 Aug 2024
BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training
Xuanpu Zhang
Dan Song
Pengxin Zhan
Qingguo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
Anan Liu
DiffM
97
4
0
12 Aug 2024
Toward Attention-based TinyML: A Heterogeneous Accelerated Architecture and Automated Deployment Flow
Philip Wiese
Gamze İslamoğlu
Moritz Scherer
Luka Macan
Victor J. B. Jung
Luca Bompani
Francesco Conti
Luca Benini
71
2
0
05 Aug 2024
Evaluating SAM2's Role in Camouflaged Object Detection: From SAM to SAM2
Lv Tang
Bo Li
VLM
74
7
0
31 Jul 2024
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models
Ming-Kuan Wu
Xinyue Cai
Jiayi Ji
Jiale Li
Oucheng Huang
Gen Luo
Hao Fei
Xiaoshuai Sun
Rongrong Ji
MLLM
158
13
0
31 Jul 2024
Self-supervised Multi-future Occupancy Forecasting for Autonomous Driving
Bernard Lange
Masha Itkina
Jiachen Li
Mykel J. Kochenderfer
107
4
0
30 Jul 2024
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models
Amir Mohammad Karimi Mamaghan
Samuele Papa
Karl Henrik Johansson
Stefan Bauer
Andrea Dittadi
OCL
174
9
0
22 Jul 2024
CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models
Zheng Chong
Xiao Dong
Haoxiang Li
Shiyue Zhang
Wenqing Zhang
Xujie Zhang
Hanqing Zhao
D. Jiang
Xiaodan Liang
DiffM
136
24
0
21 Jul 2024
SpaceJAM: a Lightweight and Regularization-free Method for Fast Joint Alignment of Images
Nir Barel
Ron Shapira Weber
Nir Mualem
Shahaf E. Finder
Oren Freifeld
171
2
0
16 Jul 2024
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
Feng Li
Renrui Zhang
Hao Zhang
Yuanhan Zhang
Bo Li
Wei Li
Zejun Ma
Chunyuan Li
MLLM
VLM
135
233
0
10 Jul 2024
Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images
Kazi Sajeed Mehrab
M. Maruf
Arka Daw
Harish Babu Manogaran
Abhilash Neog
...
Paula Mabee
Wasila Dahdul
Anuj Karpatne
222
4
0
10 Jul 2024
Learning Spatial-Semantic Features for Robust Video Object Segmentation
Xin Li
Deshui Miao
Zhenyu He
Yansen Wang
Huchuan Lu
Ming-Hsuan Yang
VOS
169
4
0
10 Jul 2024
Controlling Space and Time with Diffusion Models
Daniel Watson
Saurabh Saxena
Lala Li
Andrea Tagliasacchi
David J. Fleet
VGen
163
32
0
10 Jul 2024
LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition
Teng Wang
Lingquan Meng
Lei Cheng
Changyin Sun
59
0
0
09 Jul 2024
KidSat: satellite imagery to map childhood poverty dataset and benchmark
Makkunda Sharma
Fan Yang
Duy-Nhat Vo
Esra Suel
Swapnil Mishra
Samir Bhatt
Oliver Fiala
William Rudgard
Seth Flaxman
116
1
0
08 Jul 2024
HiDe-PET: Continual Learning via Hierarchical Decomposition of Parameter-Efficient Tuning
Liyuan Wang
Jingyi Xie
Xingxing Zhang
Hang Su
Jun Zhu
CLL
122
7
0
07 Jul 2024
Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos
Leonhard Sommer
Artur Jesslen
Eddy Ilg
Adam Kortylewski
88
2
0
05 Jul 2024
Learning to Be a Transformer to Pinpoint Anomalies
Alex Costanzino
Pierluigi Zama Ramirez
Giuseppe Lisanti
Luigi Di Stefano
97
0
0
04 Jul 2024
AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents
Yuxiang Chai
Siyuan Huang
Yazhe Niu
Han Xiao
Liang Liu
Dingyu Zhang
Shuai Ren
Hongsheng Li
LLMAG
123
40
0
03 Jul 2024
Why do LLaVA Vision-Language Models Reply to Images in English?
Musashi Hinck
Carolin Holtermann
Matthew Lyle Olson
Florian Schneider
Sungduk Yu
Anahita Bhiwandiwalla
Anne Lauscher
Shaoyen Tseng
Vasudev Lal
VLM
131
7
0
02 Jul 2024
Cross-Architecture Auxiliary Feature Space Translation for Efficient Few-Shot Personalized Object Detection
F. Barbato
Umberto Michieli
J. Moon
Pietro Zanuttigh
Mete Ozay
103
2
0
01 Jul 2024
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
Xiang Li
Cristina Mata
J. Park
Kumara Kahatapitiya
Yoo Sung Jang
...
Kanchana Ranasinghe
R. Burgert
Mu Cai
Yong Jae Lee
Michael S. Ryoo
LM&Ro
174
31
0
28 Jun 2024
Odd-One-Out: Anomaly Detection by Comparing with Neighbors
A. Bhunia
Changjian Li
Hakan Bilen
148
0
0
28 Jun 2024
Foundational Models for Pathology and Endoscopy Images: Application for Gastric Inflammation
H. Kerdegari
Kyle Higgins
Dennis Veselkov
I. Laponogov
I. Poļaka
...
Junior Andrea Pescino
M. Leja
M. Dinis-Ribeiro
T. F. Kanonnikoff
Kirill Veselkov
106
5
0
26 Jun 2024
WARP: On the Benefits of Weight Averaged Rewarded Policies
Alexandre Ramé
Johan Ferret
Nino Vieillard
Robert Dadashi
Léonard Hussenot
Pierre-Louis Cedoz
Pier Giuseppe Sessa
Sertan Girgin
Arthur Douillard
Olivier Bachem
134
23
0
24 Jun 2024
UNICAD: A Unified Approach for Attack Detection, Noise Reduction and Novel Class Identification
Alvaro Lopez Pellicer
Kittipos Giatgong
Yi Li
N. Suri
Plamen Angelov
AAML
61
3
0
24 Jun 2024
LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control
Delin Qu
Qizhi Chen
Pingrui Zhang
Xianqiang Gao
Bin Zhao
Dong Wang
Xuelong Li
AI4CE
119
8
0
23 Jun 2024