DINOv2: Learning Robust Visual Features without Supervision

14 April 2023
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
Vasil Khalidov
Pierre Fernandez
Daniel Haziza
Francisco Massa
Alaaeldin El-Nouby
Mahmoud Assran
Nicolas Ballas
Wojciech Galuba
Russ Howes
Po-Yao (Bernie) Huang
Shang-Wen Li
Ishan Misra
Michael G. Rabbat
Vasu Sharma
Gabriel Synnaeve
Huijiao Xu
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
    VLM
    CLIP
    SSL

Papers citing "DINOv2: Learning Robust Visual Features without Supervision"

50 / 2,222 papers shown
Probing Fine-Grained Action Understanding and Cross-View Generalization of Foundation Models
Thinesh Thiyakesan Ponbagavathi
Kunyu Peng
Alina Roitberg
56
1
0
22 Jul 2024
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models
Amir Mohammad Karimi Mamaghan
Samuele Papa
Karl Henrik Johansson
Stefan Bauer
Andrea Dittadi
OCL
56
5
0
22 Jul 2024
CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models
Zheng Chong
Xiao Dong
Haoxiang Li
Shiyue Zhang
Wenqing Zhang
Xujie Zhang
Hanqing Zhao
D. Jiang
Xiaodan Liang
DiffM
80
18
0
21 Jul 2024
Enhancing Skin Disease Classification Leveraging Transformer-based Deep Learning Architectures and Explainable AI
Jayanth Mohan
Arrun Sivasubramanian
V. Sowmya
Ravi Vinayakumar
MedIm
31
6
0
20 Jul 2024
EarthMarker: Visual Prompt Learning for Region-level and Point-level Remote Sensing Imagery Comprehension
Wei Zhang
Miaoxin Cai
Tong Zhang
Jun Li
Zhuang Yin
Xuerui Mao
82
7
0
18 Jul 2024
DISCOVER: A Data-driven Interactive System for Comprehensive Observation, Visualization, and ExploRation of Human Behaviour
Dominik Schiller
Tobias Hallmen
Daksitha Senel Withanage Don
Elisabeth André
Tobias Baur
28
3
0
18 Jul 2024
Learn to Memorize and to Forget: A Continual Learning Perspective of Dynamic SLAM
Baicheng Li
Zike Yan
Dong Wu
Hanqing Jiang
Hongbin Zha
CLL
32
0
0
18 Jul 2024
General Vision Encoder Features as Guidance in Medical Image Registration
Fryderyk Kogl
Anna Reithmeir
Vasiliki Sideri-Lampretsa
Ines P. Machado
R. Braren
Daniel Rückert
Julia A. Schnabel
Veronika A. Zimmer
MedIm
51
0
0
18 Jul 2024
Make a Strong Teacher with Label Assistance: A Novel Knowledge Distillation Approach for Semantic Segmentation
Shoumeng Qiu
Jie Chen
Xinrun Li
Ru Wan
Xiangyang Xue
Jian Pu
VLM
60
3
0
18 Jul 2024
EchoSight: Advancing Visual-Language Models with Wiki Knowledge
Yibin Yan
Weidi Xie
RALM
37
10
0
17 Jul 2024
MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
Leyang Shen
Gongwei Chen
Rui Shao
Weili Guan
Liqiang Nie
MoE
59
7
0
17 Jul 2024
Global-Local Similarity for Efficient Fine-Grained Image Recognition with Vision Transformers
Edwin Arkel Rios
Min-Chun Hu
Bo-Cheng Lai
ViT
37
2
0
17 Jul 2024
GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features
Luc P.J. Strater
Mohammadreza Salehi
E. Gavves
Cees G. M. Snoek
Yuki M. Asano
48
7
0
17 Jul 2024
DiNO-Diffusion. Scaling Medical Diffusion via Self-Supervised Pre-Training
Guillermo Jiménez-Pérez
Pedro Osório
Josef Cersovsky
Javier Montalt-Tordera
Jens Hooge
Steffen Vogler
Sadegh Mohammadi
MedIm
63
2
0
16 Jul 2024
Beyond Mask: Rethinking Guidance Types in Few-shot Segmentation
Shijie Chang
Youwei Pang
Xiaoqi Zhao
Lihe Zhang
Huchuan Lu
50
1
0
16 Jul 2024
PRET: Planning with Directed Fidelity Trajectory for Vision and Language Navigation
Renjie Lu
Jingke Meng
Wei-Shi Zheng
50
3
0
16 Jul 2024
Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes
Zhi Cai
Yingjie Gao
Yaoyan Zheng
Nan Zhou
Di Huang
VLM
49
5
0
16 Jul 2024
Temporally Grounding Instructional Diagrams in Unconstrained Videos
Jiahao Zhang
Frederic Z. Zhang
Cristian Rodriguez
Yizhak Ben-Shabat
A. Cherian
Stephen Gould
64
2
0
16 Jul 2024
SpaceJAM: a Lightweight and Regularization-free Method for Fast Joint Alignment of Images
Nir Barel
Ron Shapira Weber
Nir Mualem
Shahaf E. Finder
Oren Freifeld
70
1
0
16 Jul 2024
Towards Adversarially Robust Vision-Language Models: Insights from Design Choices and Prompt Formatting Techniques
Rishika Bhagwatkar
Shravan Nayak
Reza Bayat
Alexis Roger
Daniel Z Kaplan
P. Bashivan
Irina Rish
AAML
VLM
63
1
0
15 Jul 2024
WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models
Zijian He
Peixin Chen
Guangrun Wang
Guanbin Li
Philip Torr
Liang Lin
VGen
DiffM
34
6
0
15 Jul 2024
WPS-SAM: Towards Weakly-Supervised Part Segmentation with Foundation Models
Xin-Jian Wu
Rui-Song Zhang
Jie Qin
Shijie Ma
Cheng-Lin Liu
VLM
42
1
0
14 Jul 2024
A Self-Supervised Learning Pipeline for Demographically Fair Facial Attribute Classification
Sreeraj Ramachandran
A. Rattani
42
1
0
14 Jul 2024
CLOVER: Context-aware Long-term Object Viewpoint- and Environment-Invariant Representation Learning
Dongmyeong Lee
Amanda Adkins
Joydeep Biswas
70
0
0
12 Jul 2024
Weakly-supervised Autism Severity Assessment in Long Videos
Abid Ali
Mahmoud Ali
J. Odobez
Camilla Barbini
Séverine Dubuisson
Francois Bremond
Susanne Thümmler
30
0
0
12 Jul 2024
Multi-Modal Dataset Creation for Federated Learning with DICOM Structured Reports
Malte Tolle
L. Burger
Halvar Kelm
Florian André
Peter Bannas
...
Jan Moritz Seliger
Stefan Simm
Tim Friede
Tim Seidler
Sandy Engelhardt
36
0
0
12 Jul 2024
Refusing Safe Prompts for Multi-modal Large Language Models
Zedian Shao
Hongbin Liu
Yuepeng Hu
Neil Zhenqiang Gong
MLLM
LRM
51
1
0
12 Jul 2024
Textual Query-Driven Mask Transformer for Domain Generalized Segmentation
Byeonghyun Pak
Byeongju Woo
Sunghwan Kim
Dae-Hwan Kim
Hoseong Kim
81
4
0
12 Jul 2024
Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data
Cherie Ho
Jiaye Zou
Omar Alama
Sai Mitheran Jagadesh Kumar
Benjamin Chiang
Taneesh Gupta
Chen Wang
Nikhil Varma Keetha
Katia Sycara
Sebastian Scherer
46
2
0
11 Jul 2024
OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects
Akshay Krishnan
Abhijit Kundu
Kevis-Kokitsi Maninis
James Hays
Matthew Brown
30
8
0
11 Jul 2024
WildGaussians: 3D Gaussian Splatting in the Wild
Jonáš Kulhánek
Songyou Peng
Zuzana Kukelova
Marc Pollefeys
Torsten Sattler
3DGS
103
41
0
11 Jul 2024
SRPose: Two-view Relative Pose Estimation with Sparse Keypoints
Rui Yin
Yulun Zhang
Zherong Pan
Jianjun Zhu
Cheng Wang
Biao Jia
47
1
0
11 Jul 2024
AddressCLIP: Empowering Vision-Language Models for City-wide Image Address Localization
Shixiong Xu
Chenghao Zhang
Lubin Fan
Gaofeng Meng
Shiming Xiang
Jieping Ye
VLM
51
5
0
11 Jul 2024
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
Feng Li
Renrui Zhang
Hao Zhang
Yuanhan Zhang
Bo Li
Wei Li
Zejun Ma
Chunyuan Li
MLLM
VLM
60
203
0
10 Jul 2024
Swiss DINO: Efficient and Versatile Vision Framework for On-device Personal Object Search
Kirill Paramonov
Jia-Xing Zhong
Umberto Michieli
J. Moon
Mete Ozay
73
2
0
10 Jul 2024
Pan-cancer Histopathology WSI Pre-training with Position-aware Masked Autoencoder
Kun-Hsuan Wu
Zhiguo Jiang
Kunming Tang
Jun Shi
Fengying Xie
Wei Wang
Haibo Wu
Yushan Zheng
30
1
0
10 Jul 2024
Exploring the Untouched Sweeps for Conflict-Aware 3D Segmentation Pretraining
Tianfang Sun
Zhizhong Zhang
Xin Tan
Yanyun Qu
Yuan Xie
69
0
0
10 Jul 2024
Learning Spatial-Semantic Features for Robust Video Object Segmentation
Xin Li
Deshui Miao
Zhenyu He
Yansen Wang
Huchuan Lu
Ming-Hsuan Yang
VOS
64
4
0
10 Jul 2024
Controlling Space and Time with Diffusion Models
Daniel Watson
Saurabh Saxena
Lala Li
Andrea Tagliasacchi
David J. Fleet
VGen
94
27
0
10 Jul 2024
Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images
Kazi Sajeed Mehrab
M. Maruf
Arka Daw
Harish Babu Manogaran
Abhilash Neog
...
Paula Mabee
Wasila M Dahdul
Anuj Karpatne
52
4
0
10 Jul 2024
ProtoSAM: One-Shot Medical Image Segmentation With Foundational Models
Lev Ayzenberg
Raja Giryes
H. Greenspan
VLM
MedIm
56
4
0
09 Jul 2024
Rethinking Image-to-Video Adaptation: An Object-centric Perspective
Rui Qian
Shuangrui Ding
Dahua Lin
OCL
59
1
0
09 Jul 2024
LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition
Teng Wang
Lingquan Meng
Lei Cheng
Changyin Sun
39
0
0
09 Jul 2024
A Clinical Benchmark of Public Self-Supervised Pathology Foundation Models
Gabriele Campanella
Shengjia Chen
Ruchika Verma
Jennifer Zeng
A. Stock
...
Kuan-lin Huang
Ricky Kwan
Jane Houldsworth
Adam J. Schoenfeld
Chad M. Vanderbilt
AI4MH
OOD
LM&MA
43
16
0
09 Jul 2024
MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions
Xuan Ju
Yiming Gao
Zhaoyang Zhang
Ziyang Yuan
Xintao Wang
Ailing Zeng
Yu Xiong
Qiang Xu
Ying Shan
VGen
77
39
0
08 Jul 2024
Multi-Label Plant Species Classification with Self-Supervised Vision Transformers
Murilo Gustineli
Anthony Miyaguchi
Ian Stalter
33
3
0
08 Jul 2024
Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images
Zhangyang Qi
Yunhan Yang
Mengchen Zhang
Long Xing
Xiaoyang Wu
Tong Wu
Dahua Lin
Xihui Liu
Jiaqi Wang
Hengshuang Zhao
DiffM
54
8
0
08 Jul 2024
4D Contrastive Superflows are Dense 3D Representation Learners
Xiang Xu
Lingdong Kong
Hui Shuai
Wenwei Zhang
Liang Pan
Kai Chen
Ziwei Liu
Qingshan Liu
3DPC
60
7
0
08 Jul 2024
Transfer Learning with Self-Supervised Vision Transformers for Snake Identification
Anthony Miyaguchi
Murilo Gustineli
Austin Fischer
Ryan Lundqvist
29
3
0
08 Jul 2024
KidSat: satellite imagery to map childhood poverty dataset and benchmark
Makkunda Sharma
Fan Yang
Duy-Nhat Vo
Esra Suel
Swapnil Mishra
Samir Bhatt
Oliver Fiala
William Rudgard
Seth Flaxman
84
1
0
08 Jul 2024