ResearchTrend.AI
DINOv2: Learning Robust Visual Features without Supervision

14 April 2023
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
Vasil Khalidov
Pierre Fernandez
Daniel Haziza
Francisco Massa
Alaaeldin El-Nouby
Mahmoud Assran
Nicolas Ballas
Wojciech Galuba
Russ Howes
Po-Yao (Bernie) Huang
Shang-Wen Li
Ishan Misra
Michael G. Rabbat
Vasu Sharma
Gabriel Synnaeve
Huijiao Xu
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM, CLIP, SSL

Papers citing "DINOv2: Learning Robust Visual Features without Supervision"

50 / 826 papers shown
Title
Open-Vocabulary Temporal Action Localization using Multimodal Guidance
Akshita Gupta
Aditya Arora
Sanath Narayan
Salman Khan
Fahad Shahbaz Khan
Graham W. Taylor
90
4
0
21 Jun 2024
DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection
Jia Syuen Lim
Zhuoxiao Chen
Mahsa Baktashmotlagh
Zhi Chen
Xin Yu
Zi Huang
Yadan Luo
VLM, ObjD
173
1
0
21 Jun 2024
Is AI fun? HumorDB: a curated dataset and benchmark to investigate graphical humor
Veedant Jain
Felipe dos Santos Alves Feitosa
Gabriel Kreiman
VLM
97
2
0
19 Jun 2024
Large-Scale Dataset Pruning in Adversarial Training through Data Importance Extrapolation
Bjorn Nieth
Thomas Altstidl
Leo Schwinn
Björn Eskofier
AAML
109
3
0
19 Jun 2024
The Wisdom of a Crowd of Brains: A Universal Brain Encoder
Roman Beliy
Navve Wasserman
Amit Zalcher
Michal Irani
100
2
0
18 Jun 2024
DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features
Letian Wang
Seung Wook Kim
Jiawei Yang
Cunjun Yu
Boris Ivanovic
Steven Waslander
Yue Wang
Sanja Fidler
Marco Pavone
Peter Karkus
95
9
0
17 Jun 2024
Duoduo CLIP: Efficient 3D Understanding with Multi-View Images
Han-Hung Lee
Yiming Zhang
Angel X. Chang
3DPC
160
4
0
17 Jun 2024
ExPLoRA: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers under Domain Shifts
Samar Khanna
Medhanie Irgau
David B. Lobell
Stefano Ermon
VLM
150
6
0
16 Jun 2024
Enhancing Anomaly Detection Generalization through Knowledge Exposure: The Dual Effects of Augmentation
Mohammad Akhavan Anvari
Rojina Kashefi
Vahid Reza Khazaie
Mohammad Khalooei
Mohammad Sabokrou
124
0
0
15 Jun 2024
Grounding Image Matching in 3D with MASt3R
Vincent Leroy
Yohann Cabon
Jérôme Revaud
3DGS, 3DV
128
164
0
14 Jun 2024
WildlifeReID-10k: Wildlife re-identification dataset with 10k individual animals
L. Adam
Vojtěch Čermák
Kostas Papafitsoros
Lukás Picek
110
2
0
13 Jun 2024
Zero-shot Image Editing with Reference Imitation
Xi Chen
Yutong Feng
Mengting Chen
Yiyang Wang
Shilong Zhang
Yu Liu
Yujun Shen
Hengshuang Zhao
DiffM
88
27
0
11 Jun 2024
SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale
Shester Gueuwou
Xiaodan Du
G. Shakhnarovich
Karen Livescu
SLR
104
5
0
11 Jun 2024
Active Scout: Multi-Target Tracking Using Neural Radiance Fields in Dense Urban Environments
Christopher D. Hsu
Pratik Chaudhari
108
1
0
11 Jun 2024
HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction
Jikai Wang
Qifan Zhang
Yu-Wei Chao
Bowen Wen
Xiaohu Guo
Yu Xiang
3DH
152
2
0
10 Jun 2024
The 3D-PC: a benchmark for visual perspective taking in humans and machines
Drew Linsley
Peisen Zhou
A. Ashok
Akash Nagaraj
Gaurav Gaonkar
Francis E Lewis
Zygmunt Pizlo
Thomas Serre
135
6
0
06 Jun 2024
An Empirical Study into Clustering of Unseen Datasets with Self-Supervised Encoders
Scott C. Lowe
Joakim Bruslund Haurum
Sageev Oore
T. Moeslund
Graham W. Taylor
SSL
122
4
0
04 Jun 2024
Parrot: Multilingual Visual Instruction Tuning
Hai-Long Sun
Da-Wei Zhou
Yangfu Li
Shiyin Lu
Chao Yi
...
Zhao Xu
Weihua Luo
Kaifu Zhang
De-Chuan Zhan
Han-Jia Ye
MLLM
142
12
0
04 Jun 2024
NuRF: Nudging the Particle Filter in Radiance Fields for Robot Visual Localization
Wugang Meng
Tianfu Wu
Huan Yin
Fumin Zhang
119
1
0
01 Jun 2024
RIGID: A Training-free and Model-Agnostic Framework for Robust AI-Generated Image Detection
Zhiyuan He
Pin-Yu Chen
Tsung-Yi Ho
101
13
0
30 May 2024
PixOOD: Pixel-Level Out-of-Distribution Detection
Tomáš Vojíř
Jan Sochman
Jiří Matas
OODD
92
9
0
30 May 2024
DP-IQA: Utilizing Diffusion Prior for Blind Image Quality Assessment in the Wild
Honghao Fu
Yufei Wang
Wenhan Yang
Alex C. Kot
Bihan Wen
102
3
0
30 May 2024
EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition
Issar Tzachor
Boaz Lerner
Matan Levy
Michael Green
T. Shalev
...
Dvir Samuel
Noam Korngut Zailer
O. Shimshi
N. Darshan
Rami Ben-Ari
86
6
0
28 May 2024
Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation
Ya Lu
Jishnu Jaykumar
Yunhui Guo
Nicholas Ruozzi
Yu Xiang
VLM, ISeg
151
5
0
28 May 2024
DCPI-Depth: Explicitly Infusing Dense Correspondence Prior to Unsupervised Monocular Depth Estimation
Mengtan Zhang
Yi Feng
Qijun Chen
Rui Fan
MDE
145
6
0
27 May 2024
Smoke and Mirrors in Causal Downstream Tasks
Riccardo Cadei
Lukas Lindorfer
Sylvia Cremer
Cordelia Schmid
Francesco Locatello
CML
140
6
0
27 May 2024
ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance
Jiannan Huang
Jun Hao Liew
Hanshu Yan
Yuyang Yin
Yao Zhao
Yunchao Wei
DiffM
207
7
0
27 May 2024
NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer
Meng You
Zhiyu Zhu
Hui Liu
Junhui Hou
VGen, DiffM
108
25
0
24 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
333
54
0
23 May 2024
AnomalyDINO: Boosting Patch-based Few-shot Anomaly Detection with DINOv2
Simon Damm
M. Laszkiewicz
Johannes Lederer
Asja Fischer
128
8
0
23 May 2024
Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection
Jia Guo
Shuai Lu
Weihang Zhang
Huiqi Li
Hongen Liao
ViT
154
13
0
23 May 2024
OmniGlue: Generalizable Feature Matching with Foundation Model Guidance
Hanwen Jiang
Arjun Karpur
Bingyi Cao
Qixing Huang
André Araujo
VLM
90
34
0
21 May 2024
Entropic associative memory for real world images
Noé Hernández
Rafael Morales
L. A. Pineda
86
0
0
21 May 2024
EmoEdit: Evoking Emotions through Image Manipulation
Jingyuan Yang
Jiawei Feng
Weibin Luo
Dani Lischinski
Daniel Cohen-Or
Hui Huang
DiffM
82
2
0
21 May 2024
TinyLLaVA Factory: A Modularized Codebase for Small-scale Large Multimodal Models
Junlong Jia
Ying Hu
Xi Weng
Yiming Shi
Miao Li
...
Baichuan Zhou
Ziyu Liu
Jie Luo
Lei Huang
Ji Wu
95
9
0
20 May 2024
Efficient Multimodal Large Language Models: A Survey
Yizhang Jin
Jian Li
Yexin Liu
Tianjun Gu
Kai Wu
...
Xin Tan
Zhenye Gan
Yabiao Wang
Chengjie Wang
Lizhuang Ma
LRM
113
58
0
17 May 2024
Cross-sensor self-supervised training and alignment for remote sensing
V. Marsocci
Nicolas Audebert
82
1
0
16 May 2024
AnoVox: A Benchmark for Multimodal Anomaly Detection in Autonomous Driving
Daniel Bogdoll
Iramm Hamdard
Lukas Namgyu Rößler
Felix Geisler
Muhammed Bayram
...
Miguel de Campos
Anushervon Tabarov
Yitian Yang
Hanno Gottschalk
J. Marius Zöllner
74
5
0
13 May 2024
Deep Learning-Based Object Pose Estimation: A Comprehensive Survey
Jian Liu
Wei Sun
Hui Yang
Zhiwen Zeng
Chongpei Liu
Jin Zheng
Xingyu Liu
Hossein Rahmani
N. Sebe
Ajmal Mian
136
19
0
13 May 2024
PitcherNet: Powering the Moneyball Evolution in Baseball Video Analytics
Jerrin Bright
Bavesh Balaji
Yuhao Chen
David A Clausi
John S. Zelek
52
0
0
13 May 2024
General Place Recognition Survey: Towards Real-World Autonomy
Peng Yin
Jianhao Jiao
Shiqi Zhao
Lingyun Xu
Guoquan Huang
Howie Choset
Sebastian A. Scherer
Jianda Han
174
6
0
08 May 2024
Leafy Spurge Dataset: Real-world Weed Classification Within Aerial Drone Imagery
Kyle Doherty
Max Gurinas
Erik Samsoe
Charles Casper
Beau Larkin
Philip W. Ramsey
Brandon Trabucco
Ruslan Salakhutdinov
33
4
0
02 May 2024
Domain-Transferred Synthetic Data Generation for Improving Monocular Depth Estimation
Seungyeop Lee
Knut Peterson
Solmaz Arezoomandan
Bill Cai
Peihan Li
Lifeng Zhou
David Han
MDE
39
0
0
02 May 2024
Modeling Caption Diversity in Contrastive Vision-Language Pretraining
Samuel Lavoie
Polina Kirichenko
Mark Ibrahim
Mahmoud Assran
Andrew Gordon Wilson
Aaron Courville
Nicolas Ballas
CLIP, VLM
175
23
0
30 Apr 2024
A Novel Spike Transformer Network for Depth Estimation from Event Cameras via Cross-modality Knowledge Distillation
Xin Zhang
Liangxiu Han
Tam Sobeih
Lianghao Han
Darren Dancey
207
2
0
26 Apr 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
...
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM, VLM
159
644
0
25 Apr 2024
SkinGEN: an Explainable Dermatology Diagnosis-to-Generation Framework with Interactive Vision-Language Models
Bo Lin
Yingjing Xu
Xuanwen Bao
Zhou Zhao
Zuyong Zhang
Zhouyang Wang
132
3
0
23 Apr 2024
360VOTS: Visual Object Tracking and Segmentation in Omnidirectional Videos
Yinzhe Xu
Huajian Huang
Yingshu Chen
Sai-Kit Yeung
VOS
102
2
0
22 Apr 2024
DF-DM: A foundational process model for multimodal data fusion in the artificial intelligence era
David Restrepo
Chenwei Wu
Constanza Vásquez-Venegas
Luis Filipe Nakayama
Leo Anthony Celi
Diego M. Lopez
81
13
0
18 Apr 2024
NeuroHash: A Hyperdimensional Neuro-Symbolic Framework for Spatially-Aware Image Hashing and Retrieval
Sanggeon Yun
Ryozo Masukawa
SungHeon Jeong
Mohsen Imani
62
0
0
17 Apr 2024