2304.07193
Cited By
DINOv2: Learning Robust Visual Features without Supervision
14 April 2023
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
Vasil Khalidov
Pierre Fernandez
Daniel Haziza
Francisco Massa
Alaaeldin El-Nouby
Mahmoud Assran
Nicolas Ballas
Wojciech Galuba
Russ Howes
Po-Yao (Bernie) Huang
Shang-Wen Li
Ishan Misra
Michael G. Rabbat
Vasu Sharma
Gabriel Synnaeve
Huijiao Xu
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
Papers citing
"DINOv2: Learning Robust Visual Features without Supervision"
50 / 2,220 papers shown
Title
Zero-shot Imitation Policy via Search in Demonstration Dataset
Federico Malato
Florian Leopold
Andrew Melnik
Ville Hautamaki
LM&Ro
OffRL
26
6
0
29 Jan 2024
Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors
Shiyin Dong
Mingrui Zhu
Kun Cheng
Nannan Wang
Xinbo Gao
DiffM
30
3
0
29 Jan 2024
TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts
Jingyu Zhuang
Di Kang
Yan-Pei Cao
Guanbin Li
Liang Lin
Ying Shan
DiffM
3DGS
55
38
0
26 Jan 2024
Spatial Transcriptomics Analysis of Zero-shot Gene Expression Prediction
Yan Yang
Md Zakir Hossain
Xuesong Li
Shafin Rahman
Eric A. Stone
25
4
0
26 Jan 2024
Residual Quantization with Implicit Neural Codebooks
Iris A. M. Huijben
Matthijs Douze
Matthew Muckley
Ruud J. G. van Sloun
Jakob Verbeek
MQ
34
11
0
26 Jan 2024
Inconsistency Masks: Removing the Uncertainty from Input-Pseudo-Label Pairs
Michael R. H. Vorndran
Bernhard F. Roeck
VLM
ISeg
UQCV
39
3
0
25 Jan 2024
StyleInject: Parameter Efficient Tuning of Text-to-Image Diffusion Models
Mohan Zhou
Yalong Bai
Qing Yang
Tiejun Zhao
32
0
0
25 Jan 2024
Rethinking Patch Dependence for Masked Autoencoders
Letian Fu
Long Lian
Renhao Wang
Baifeng Shi
Xudong Wang
Adam Yala
Trevor Darrell
Alexei A. Efros
Ken Goldberg
39
14
0
25 Jan 2024
Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All
M. S. Seyfioglu
Karim Bouyarmane
Suren Kumar
Amir Tavanaei
Ismail B. Tutar
DiffM
35
7
0
24 Jan 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang
Yahan Yu
Jiahua Dong
Chenxing Li
Dan Su
Chenhui Chu
Dong Yu
OffRL
LRM
56
183
0
24 Jan 2024
Finetuning Foundation Models for Joint Analysis Optimization
M. Vigl
N. Hartman
L. Heinrich
48
13
0
24 Jan 2024
PlaceFormer: Transformer-based Visual Place Recognition using Multi-Scale Patch Selection and Fusion
S. S. Kannan
Byung-Cheol Min
23
5
0
23 Jan 2024
DatUS^2: Data-driven Unsupervised Semantic Segmentation with Pre-trained Self-supervised Vision Transformer
Sonal Kumar
Arijit Sur
R. Baruah
ViT
45
2
0
23 Jan 2024
UniHDA: A Unified and Versatile Framework for Multi-Modal Hybrid Domain Adaptation
Hengjia Li
Yang Liu
Yuqi Lin
Zhanwei Zhang
Yibo Zhao
...
Tu Zheng
Zheng Yang
Yuchun Jiang
Boxi Wu
Deng Cai
DiffM
38
0
0
23 Jan 2024
Self-Supervised Vision Transformers Are Efficient Segmentation Learners for Imperfect Labels
Seungho Lee
Seoungyoon Kang
Hyunjung Shim
ViT
VLM
36
0
0
23 Jan 2024
Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration
Yifan Zhang
Siyu Ren
Junhui Hou
Jinjian Wu
Guangming Shi
SSL
3DPC
93
3
0
23 Jan 2024
Less Could Be Better: Parameter-efficient Fine-tuning Advances Medical Vision Foundation Models
Chenyu Lian
Hong-Yu Zhou
Yizhou Yu
Liansheng Wang
MedIm
53
6
0
22 Jan 2024
Zoom-shot: Fast and Efficient Unsupervised Zero-Shot Transfer of CLIP to Vision Encoders with Multimodal Loss
Jordan Shipard
Arnold Wiliem
Kien Nguyen Thanh
Wei Xiang
Clinton Fookes
VLM
CLIP
40
2
0
22 Jan 2024
A Novel Benchmark for Few-Shot Semantic Segmentation in the Era of Foundation Models
Reda Bensaid
Vincent Gripon
François Leduc-Primeau
Lukas Mauch
G. B. Hacene
Fabien Cardinaux
VLM
44
7
0
20 Jan 2024
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Lihe Yang
Bingyi Kang
Zilong Huang
Xiaogang Xu
Jiashi Feng
Hengshuang Zhao
VLM
158
721
0
19 Jan 2024
Dense 3D Reconstruction Through Lidar: A Comparative Study on Ex-vivo Porcine Tissue
Guido Caccianiga
Julian Nubert
Marco Hutter
Katherine J. Kuchenbecker
39
1
0
19 Jan 2024
Exploring scalable medical image encoders beyond text supervision
Fernando Pérez-García
Harshita Sharma
Sam Bond-Taylor
Kenza Bouzid
Valentina Salvatelli
...
Maria T. A. Wetscherek
Noel C. F. Codella
Stephanie L. Hyland
Javier Alvarez-Valle
Ozan Oktay
LM&MA
MedIm
59
27
0
19 Jan 2024
A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting
Wouter Van Gansbeke
Bert De Brabandere
DiffM
46
11
0
18 Jan 2024
The Manga Whisperer: Automatically Generating Transcriptions for Comics
Ragav Sachdeva
Andrew Zisserman
36
13
0
18 Jan 2024
Supervised Fine-tuning in turn Improves Visual Foundation Models
Xiaohu Jiang
Yixiao Ge
Yuying Ge
Dachuan Shi
Chun Yuan
Ying Shan
VLM
CLIP
48
8
0
18 Jan 2024
Visual Robotic Manipulation with Depth-Aware Pretraining
Wanying Wang
Jinming Li
Yichen Zhu
Zhiyuan Xu
Zhengping Che
Chaomin Shen
Yaxin Peng
Dong Liu
Feifei Feng
Jian Tang
MDE
37
3
0
17 Jan 2024
Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities
Xu Yan
Haiming Zhang
Yingjie Cai
Jingming Guo
Weichao Qiu
...
Lihui Jiang
Wei Zhang
Hongbo Zhang
Dengxin Dai
Bingbing Liu
66
18
0
16 Jan 2024
The Faiss library
Matthijs Douze
Alexandr Guzhva
Chengqi Deng
Jeff Johnson
Gergely Szilvasy
Pierre-Emmanuel Mazaré
Maria Lomeli
Lucas Hosseini
Hervé Jégou
46
147
0
16 Jan 2024
A Study on Self-Supervised Pretraining for Vision Problems in Gastrointestinal Endoscopy
Edward Sanderson
B. Matuszewski
25
2
0
11 Jan 2024
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Shengbang Tong
Zhuang Liu
Yuexiang Zhai
Yi Ma
Yann LeCun
Saining Xie
VLM
MLLM
41
288
0
11 Jan 2024
Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery
Beilei Cui
Mobarakol Islam
Long Bai
Hongliang Ren
MedIm
31
37
0
11 Jan 2024
Do Vision and Language Encoders Represent the World Similarly?
Mayug Maniparambil
Raiymbek Akshulakov
Y. A. D. Djilali
Sanath Narayan
M. Seddik
K. Mangalam
Noel E. O'Connor
VLM
34
11
0
10 Jan 2024
SOS-Match: Segmentation for Open-Set Robust Correspondence Search and Robot Localization in Unstructured Environments
Annika Thomas
Jouko Kinnari
Parker C. Lusk
Kota Kondo
Jonathan P. How
28
3
0
09 Jan 2024
Low-resource finetuning of foundation models beats state-of-the-art in histopathology
Benedikt Roth
Valentin Koch
S. J. Wagner
Julia A. Schnabel
Carsten Marr
Tingying Peng
MedIm
26
8
0
09 Jan 2024
Low-Resource Vision Challenges for Foundation Models
Yunhua Zhang
Hazel Doughty
Cees G. M. Snoek
VLM
32
5
0
09 Jan 2024
Effective pruning of web-scale datasets based on complexity of concept clusters
Amro Abbas
E. Rusak
Kushal Tirumala
Wieland Brendel
Kamalika Chaudhuri
Ari S. Morcos
VLM
CLIP
36
22
0
09 Jan 2024
Uni3D-LLM: Unifying Point Cloud Perception, Generation and Editing with Large Language Models
Dingning Liu
Xiaoshui Huang
Yuenan Hou
Zhihui Wang
Zhen-fei Yin
Yongshun Gong
Peng Gao
Wanli Ouyang
29
8
0
09 Jan 2024
Representative Feature Extraction During Diffusion Process for Sketch Extraction with One Example
Kwan Yun
Youngseo Kim
Kwanggyoon Seo
Chang Wook Seo
Junyong Noh
DiffM
30
2
0
09 Jan 2024
Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness
Sibo Wang
Jie Zhang
Zheng Yuan
Shiguang Shan
VLM
36
20
0
09 Jan 2024
Memory-Efficient Fine-Tuning for Quantized Diffusion Model
Hyogon Ryu
Seohyun Lim
Hyunjung Shim
DiffM
MQ
29
6
0
09 Jan 2024
Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
Chen Zhao
Shuming Liu
K. Mangalam
Guocheng Qian
Fatimah Zohra
Abdulmohsen Alghannam
Jitendra Malik
Guohao Li
54
3
0
08 Jan 2024
AGG: Amortized Generative 3D Gaussians for Single Image to 3D
Dejia Xu
Ye Yuan
Morteza Mardani
Sifei Liu
Jiaming Song
Zhangyang Wang
Arash Vahdat
3DGS
51
44
0
08 Jan 2024
RudolfV: A Foundation Model by Pathologists for Pathologists
Jonas Dippel
Barbara Feulner
Tobias Winterhoff
Timo Milbich
Stephan Tietz
...
David Horst
Lukas Ruff
Klaus-Robert Müller
Frederick Klauschen
Maximilian Alber
36
29
0
08 Jan 2024
MERBench: A Unified Evaluation Benchmark for Multimodal Emotion Recognition
Zheng Lian
Guoying Zhao
Yong Ren
Hao Gu
Haiyang Sun
Lan Chen
Bin Liu
Jianhua Tao
28
12
0
07 Jan 2024
AccidentGPT: Large Multi-Modal Foundation Model for Traffic Accident Analysis
Kebin Wu
Wenbin Li
Xiaofei Xiao
21
3
0
05 Jan 2024
Denoising Vision Transformers
Jiawei Yang
Katie Z Luo
Jie Li
Kilian Q. Weinberger
Yonglong Tian
Yue Wang
DiffM
32
13
0
05 Jan 2024
VASE: Object-Centric Appearance and Shape Manipulation of Real Videos
E. Peruzzo
Vidit Goel
Dejia Xu
Xingqian Xu
Yi Ding
Zhangyang Wang
Humphrey Shi
N. Sebe
LM&Ro
VGen
DiffM
71
9
0
04 Jan 2024
Learning the 3D Fauna of the Web
Zizhang Li
Dor Litvak
Ruining Li
Yunzhi Zhang
Tomas Jakab
Christian Rupprecht
Shangzhe Wu
Andrea Vedaldi
Jiajun Wu
41
23
0
04 Jan 2024
ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning
Fanqing Meng
Wenqi Shao
Quanfeng Lu
Peng Gao
Kaipeng Zhang
Yu Qiao
Ping Luo
36
46
0
04 Jan 2024
Data-Centric Foundation Models in Computational Healthcare: A Survey
Yunkun Zhang
Jin Gao
Zheling Tan
Lingfeng Zhou
Kexin Ding
Mu Zhou
Shaoting Zhang
Dequan Wang
AI4CE
50
22
0
04 Jan 2024