2304.07193
DINOv2: Learning Robust Visual Features without Supervision
14 April 2023
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
Vasil Khalidov
Pierre Fernandez
Daniel Haziza
Francisco Massa
Alaaeldin El-Nouby
Mahmoud Assran
Nicolas Ballas
Wojciech Galuba
Russ Howes
Po-Yao (Bernie) Huang
Shang-Wen Li
Ishan Misra
Michael G. Rabbat
Vasu Sharma
Gabriel Synnaeve
Huijiao Xu
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
Papers citing "DINOv2: Learning Robust Visual Features without Supervision"
50 / 2,220 papers shown
Title
Decoding fMRI Data into Captions using Prefix Language Modeling
Vyacheslav Shen
Kassymzhomart Kunanbayev
Dae-Shik Kim
40
0
0
07 Jan 2025
Gaussian Masked Autoencoders
Jathushan Rajasegaran
Xinlei Chen
Rulilong Li
Christoph Feichtenhofer
Jitendra Malik
Shiry Ginosar
3DGS
51
1
0
06 Jan 2025
FoundPAD: Foundation Models Reloaded for Face Presentation Attack Detection
Guray Ozgur
Eduarda Caldeira
Tahar Chettaoui
Fadi Boutros
Raghavendra Ramachandra
Naser Damer
AAML
CVBM
42
0
1
06 Jan 2025
Universal Features Guided Zero-Shot Category-Level Object Pose Estimation
Wentian Qu
Chenyu Meng
Heng Li
Jian Cheng
Cuixia Ma
Hongan Wang
Xiao Zhou
Xiaoming Deng
Ping Tan
40
0
0
06 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
91
12
0
06 Jan 2025
MObI: Multimodal Object Inpainting Using Diffusion Models
Alexandru Buburuzan
Anuj Sharma
John Redford
P. Dokania
Romain Mueller
DiffM
99
1
0
06 Jan 2025
ProTracker: Probabilistic Integration for Robust and Accurate Point Tracking
Tingyang Zhang
Chen Wang
Zhiyang Dou
Qingzhe Gao
Jiahui Lei
Baoquan Chen
Lingjie Liu
3DV
51
0
0
06 Jan 2025
Multi-layer Radial Basis Function Networks for Out-of-distribution Detection
Amol Khanna
Chenyi Ling
Derek Everett
Edward Raff
Nathan Inkawhich
OODD
41
0
0
05 Jan 2025
Enhancing Contrastive Learning for Retinal Imaging via Adjusted Augmentation Scales
Zijie Cheng
Yangqiu Song
André Altmann
P. Keane
Yukun Zhou
MedIm
34
0
0
05 Jan 2025
FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance
Haicheng Wang
Zhemeng Yu
Gabriele Spadaro
Chen Ju
Victor Quétu
Enzo Tartaglione
VLM
209
3
0
05 Jan 2025
CorrFill: Enhancing Faithfulness in Reference-based Inpainting with Correspondence Guidance in Diffusion Models
Kuan-Hung Liu
Cheng-Kun Yang
Min-Hung Chen
Yu-Lun Liu
Y. Lin
DiffM
43
1
0
04 Jan 2025
Keypoint Aware Masked Image Modelling
Madhava Krishna
Convin.AI
80
0
0
03 Jan 2025
MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension
Ting Liu
Zunnan Xu
Yue Hu
Liangtao Shi
Zhiqiang Wang
Quanjun Yin
70
2
0
03 Jan 2025
PatchRefiner V2: Fast and Lightweight Real-Domain High-Resolution Metric Depth Estimation
Zhenyu Li
Wenqing Cui
S. Bhat
Peter Wonka
MDE
46
0
0
03 Jan 2025
RORem: Training a Robust Object Remover with Human-in-the-Loop
Ruibin Li
Tao Yang
Song Guo
Lefei Zhang
58
3
0
01 Jan 2025
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
67
4
0
31 Dec 2024
A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine
Hanguang Xiao
Feizhong Zhou
Xianglong Liu
Tianqi Liu
Zhipeng Li
Xin Liu
Xiaoxuan Huang
AILaw
LM&MA
LRM
66
19
0
31 Dec 2024
VersaGen: Unleashing Versatile Visual Control for Text-to-Image Synthesis
Zhipeng Chen
Lan Yang
Yonggang Qi
Honggang Zhang
Kaiyue Pang
Ke Li
Yi-Zhe Song
DiffM
102
0
0
31 Dec 2024
Forensics of Transpiled Quantum Circuits
Rupshali Roy
Archisman Ghosh
Swaroop Ghosh
71
1
0
25 Dec 2024
Personalized Large Vision-Language Models
Chau Pham
Hoang Phan
David Doermann
Yunjie Tian
VLM
54
3
0
23 Dec 2024
VarAD: Lightweight High-Resolution Image Anomaly Detection via Visual Autoregressive Modeling
Yunkang Cao
Haiming Yao
Wei Luo
Weiming Shen
61
5
0
23 Dec 2024
Be More Diverse than the Most Diverse: Optimal Mixtures of Generative Models via Mixture-UCB Bandit Algorithms
Parham Rezaei
Farzan Farnia
Cheuk Ting Li
54
1
0
23 Dec 2024
A Bias-Free Training Paradigm for More General AI-generated Image Detection
Fabrizio Guillaro
Giada Zingarini
Ben Usman
Avneesh Sud
D. Cozzolino
L. Verdoliva
DiffM
76
4
0
23 Dec 2024
GSemSplat: Generalizable Semantic 3D Gaussian Splatting from Uncalibrated Image Pairs
Xingrui Wang
Cuiling Lan
Hanxin Zhu
Zhibo Chen
Yan Lu
3DGS
113
1
0
22 Dec 2024
IV-tuning: Parameter-Efficient Transfer Learning for Infrared-Visible Tasks
Yaming Zhang
Chenqiang Gao
Fangcen Liu
Junjie Guo
Lan Wang
Xinggan Peng
Deyu Meng
109
0
0
21 Dec 2024
DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
Cijo Jose
Théo Moutakanni
Dahyun Kang
Federico Baldassarre
Timothée Darcet
...
Maxime Oquab
Oriane Siméoni
Huy V. Vo
Patrick Labatut
Piotr Bojanowski
CLIP
VLM
111
6
0
20 Dec 2024
Mapping the Mind of an Instruction-based Image Editing using SMILE
Zeinab Dehghani
Koorosh Aslansefat
Adil Khan
Adín Ramirez Rivera
Franky George
Muhammad Khalid
DiffM
98
0
0
20 Dec 2024
Continual Learning Using a Kernel-Based Method Over Foundation Models
Saleh Momeni
Sahisnu Mazumder
Bing-Quan Liu
CLL
90
1
0
20 Dec 2024
Interactive Scene Authoring with Specialized Generative Primitives
Clément Jambon
Changwoon Choi
Dongsu Zhang
Olga Sorkine-Hornung
Young Min Kim
VGen
86
0
0
20 Dec 2024
Scaling 4D Representations
João Carreira
Dilara Gokay
Michael King
Chuhan Zhang
Ignacio Rocco
...
Viorica Patraucean
Dima Damen
Pauline Luc
Mehdi S. M. Sajjadi
Andrew Zisserman
94
3
0
19 Dec 2024
Learning from Massive Human Videos for Universal Humanoid Pose Control
Jiageng Mao
Siheng Zhao
Siqi Song
Tianheng Shi
Junjie Ye
Mingtong Zhang
Haoran Geng
Jitendra Malik
Vitor Campagnolo Guizilini
Yue Wang
110
5
0
18 Dec 2024
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
Jihan Yang
Shusheng Yang
Anjali W. Gupta
Rilyn Han
Li Fei-Fei
Saining Xie
LRM
132
58
0
18 Dec 2024
Retrieval Augmented Image Harmonization
Haolin Wang
Ming-Yu Liu
Zifei Yan
Chao Zhou
Longan Xiao
Wangmeng Zuo
82
0
0
18 Dec 2024
Data-Efficient Inference of Neural Fluid Fields via SciML Foundation Model
Yuqiu Liu
Jingxuan Xu
Mauricio Soroco
Yunchao Wei
Wuyang Chen
AI4CE
84
2
0
18 Dec 2024
ConDo: Continual Domain Expansion for Absolute Pose Regression
Zijun Li
Z. Cai
B. Yang
Xuelun Shen
Siqi Shen
Xiaoliang Fan
Michael Paulitsch
Cheng-Yu Wang
CLL
88
0
0
18 Dec 2024
Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion
Massimiliano Viola
Kevin Qu
Nando Metzger
Bingxin Ke
Alexander Becker
Konrad Schindler
Anton Obukhov
VLM
MDE
98
5
0
18 Dec 2024
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer
Yipeng Zhang
Yi Liu
Zonghao Guo
Yidan Zhang
Xuesong Yang
...
Yuan Yao
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM
VLM
92
0
0
18 Dec 2024
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
Haoyi Jiang
Liu Liu
Tianheng Cheng
Xinjie Wang
Tianwei Lin
Zhizhong Su
Wen Liu
Xinyu Wang
3DGS
ViT
122
5
0
17 Dec 2024
NFL-BA: Improving Endoscopic SLAM with Near-Field Light Bundle Adjustment
Andrea Dunn Beltran
Daniel Rho
Marc Niethammer
Roni Sengupta
106
2
0
17 Dec 2024
SAMIC: Segment Anything with In-Context Spatial Prompt Engineering
S. Nagendra
Kashif Rashid
Chaopeng Shen
Daniel Kifer
VLM
84
2
0
16 Dec 2024
DINO-Foresight: Looking into the Future with DINO
Efstathios Karypidis
Ioannis Kakogeorgiou
Spyros Gidaris
N. Komodakis
AI4CE
99
2
0
16 Dec 2024
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
Ruijie Lu
Yixin Chen
Junfeng Ni
Baoxiong Jia
Yu Liu
Diwen Wan
Gang Zeng
Siyuan Huang
DiffM
135
4
0
16 Dec 2024
Wearable Accelerometer Foundation Models for Health via Knowledge Distillation
Salar Abbaspourazad
Anshuman Mishra
Joseph D. Futoma
Andrew C. Miller
Ian Shapiro
97
0
0
15 Dec 2024
Adaptive Visual Perception for Robotic Construction Process: A Multi-Robot Coordination Framework
Jia Xu
Manish Dixit
Xi Wang
81
0
0
15 Dec 2024
GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control
Mariam Hassan
Sebastian Stapf
Ahmad Rahimi
Pedro M B Rezende
Yasaman Haghighi
...
Mathieu Salzmann
Davide Scaramuzza
Marc Pollefeys
Paolo Favaro
Alexandre Alahi
VLM
VGen
94
5
0
15 Dec 2024
Medical Manifestation-Aware De-Identification
Yuan Tian
Shuo Wang
Guangtao Zhai
MedIm
75
0
0
14 Dec 2024
Learning Visually Grounded Domain Ontologies via Embodied Conversation and Explanation
Jonghyuk Park
A. Lascarides
S. Ramamoorthy
83
0
0
13 Dec 2024
Agtech Framework for Cranberry-Ripening Analysis Using Vision Foundation Models
Faith Johnson
Ryan Meegan
Jack Lowry
Peter Oudemans
Kristin J. Dana
72
0
0
12 Dec 2024
Feat2GS: Probing Visual Foundation Models with Gaussian Splatting
Yue Chen
Xingyu Chen
Anpei Chen
Gerard Pons-Moll
Yuliang Xiu
3DGS
96
3
0
12 Dec 2024
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders
Fiona Ryan
Ajay Bati
Sangmin Lee
Daniel Bolya
Judy Hoffman
James M. Rehg
251
2
0
12 Dec 2024