2304.07193
DINOv2: Learning Robust Visual Features without Supervision
14 April 2023
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
Vasil Khalidov
Pierre Fernandez
Daniel Haziza
Francisco Massa
Alaaeldin El-Nouby
Mahmoud Assran
Nicolas Ballas
Wojciech Galuba
Russ Howes
Po-Yao (Bernie) Huang
Shang-Wen Li
Ishan Misra
Michael G. Rabbat
Vasu Sharma
Gabriel Synnaeve
Huijiao Xu
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
Papers citing
"DINOv2: Learning Robust Visual Features without Supervision"
50 / 2,194 papers shown
Title
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
Jiaming Liu
Hao Chen
Pengju An
Zhuoyang Liu
Renrui Zhang
...
Chengkai Hou
Mengdi Zhao
KC alex Zhou
Pheng-Ann Heng
Shanghang Zhang
72
8
0
13 Mar 2025
SmartWay: Enhanced Waypoint Prediction and Backtracking for Zero-Shot Vision-and-Language Navigation
Xiangyu Shi
Zerui Li
Wenqi Lyu
Jiatong Xia
Feras Dayoub
Yanyuan Qiao
Qi Wu
57
1
0
13 Mar 2025
Towards Fast, Memory-based and Data-Efficient Vision-Language Policy
Haoxuan Li
Sixu Yan
Yong Li
Xinggang Wang
LM&Ro
64
0
0
13 Mar 2025
One-Shot Federated Unsupervised Domain Adaptation with Scaled Entropy Attention and Multi-Source Smoothed Pseudo Labeling
Ali Abedi
Qiang Wu
Ning Zhang
Farhad Pourpanah
FedML
71
0
0
13 Mar 2025
The Power of One: A Single Example is All it Takes for Segmentation in VLMs
Mir Rayat Imtiaz Hossain
Mennatullah Siam
Leonid Sigal
James J. Little
MLLM
VLM
Presented at ResearchTrend Connect | VLM on 21 May 2025
92
0
0
13 Mar 2025
Studying Classifier(-Free) Guidance From a Classifier-Centric Perspective
Xiaoming Zhao
Alexander Schwing
FaML
63
0
0
13 Mar 2025
Interpretable Image Classification via Non-parametric Part Prototype Learning
Zhijie Zhu
Lei Fan
Maurice Pagnucco
Yang Song
55
0
0
13 Mar 2025
RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models
Yijing Lin
Mengqi Huang
Shuhan Zhuang
Zhendong Mao
VGen
51
0
0
13 Mar 2025
Panopticon: Advancing Any-Sensor Foundation Models for Earth Observation
Leonard Waldmann
Ando Shah
Yi Wang
Nils Lehmann
Adam J. Stewart
Zhitong Xiong
Xiao Xiang Zhu
Stefan Bauer
John Chuang
51
1
0
13 Mar 2025
Object-Aware DINO (Oh-A-Dino): Enhancing Self-Supervised Representations for Multi-Object Instance Retrieval
Stefan Sylvius Wagner
Stefan Harmeling
OCL
76
0
0
12 Mar 2025
Close-up-GS: Enhancing Close-Up View Synthesis in 3D Gaussian Splatting with Progressive Self-Training
Jiatong Xia
Lingqiao Liu
3DGS
60
0
0
12 Mar 2025
Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model
Ali Vosoughi
Dimitra Emmanouilidou
H. Gamper
55
0
0
12 Mar 2025
Isolated Channel Vision Transformers: From Single-Channel Pretraining to Multi-Channel Finetuning
Wenyi Lian
Joakim Lindblad
Patrick Micke
Natasa Sladoje
62
0
0
12 Mar 2025
CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation
Hariprasath Govindarajan
Maciej K. Wozniak
Marvin Klingner
Camille Maurice
B. R. Kiran
S. Yogamani
55
0
0
12 Mar 2025
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
Yucheng Suo
Fan Ma
Kaixin Shen
Linchao Zhu
Yi Yang
VLM
52
0
0
12 Mar 2025
SDD-4DGS: Static-Dynamic Aware Decoupling in Gaussian Splatting for 4D Scene Reconstruction
Dai Sun
Huhao Guan
Kun Zhang
Xike Xie
S.Kevin Zhou
3DGS
63
0
0
12 Mar 2025
InteractEdit: Zero-Shot Editing of Human-Object Interactions in Images
Jiun Tian Hoe
Weipeng Hu
Wei Zhou
Chao Xie
Ziwei Wang
Chee Seng Chan
Xudong Jiang
Y. Tan
61
0
0
12 Mar 2025
Online Language Splatting
Saimouli Katragadda
Cho-Ying Wu
Yuliang Guo
Xinyu Huang
Guoquan Huang
Liu Ren
3DGS
OffRL
65
0
0
12 Mar 2025
Discovering Influential Neuron Path in Vision Transformers
Yifan Wang
Yifei Liu
Yingdong Shi
Chong Li
Anqi Pang
Sibei Yang
Jingyi Yu
Kan Ren
ViT
69
0
0
12 Mar 2025
Towards Robust Multimodal Representation: A Unified Approach with Adaptive Experts and Alignment
Nazanin Moradinasab
S. Sengupta
Jiebei Liu
Sana Syed
Donald Brown
60
0
0
12 Mar 2025
SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation
Junsong Chen
Shuchen Xue
Yuyang Zhao
Jincheng Yu
Sayak Paul
Junyu Chen
Han Cai
E. Xie
Enze Xie
VLM
66
2
0
12 Mar 2025
DRESS: Disentangled Representation-based Self-Supervised Meta-Learning for Diverse Tasks
Wei Cui
Tongzi Wu
Jesse C. Cresswell
Yi Sui
Keyvan Golestan
68
0
0
12 Mar 2025
Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models
Sangwon Jang
June Suk Choi
Jaehyeong Jo
Kimin Lee
Sung Ju Hwang
DiffM
WIGM
87
1
0
12 Mar 2025
Exploring Position Encoding in Diffusion U-Net for Training-free High-resolution Image Generation
Feng Zhou
Pu Cao
Yiyang Ma
Lu Yang
Jianqin Yin
DiffM
51
0
0
12 Mar 2025
Efficient Alignment of Unconditioned Action Prior for Language-conditioned Pick and Place in Clutter
Kechun Xu
Xunlong Xia
Kaixuan Wang
Yifei Yang
Yunxuan Mao
Bing Deng
R. Xiong
Yansen Wang
OffRL
72
0
0
12 Mar 2025
OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models
Jialv Zou
Bencheng Liao
Qian Zhang
Wenyu Liu
Xinggang Wang
Mamba
MLLM
82
1
0
11 Mar 2025
CQVPR: Landmark-aware Contextual Queries for Visual Place Recognition
Dongyue Li
Daisuke Deguchi
Hiroshi Murase
63
0
0
11 Mar 2025
Enhancing Sentiment Analysis through Multimodal Fusion: A BERT-DINOv2 Approach
Taoxu Zhao
Meisi Li
Kehao Chen
Liye Wang
Xucheng Zhou
Kunal Chaturvedi
Mukesh Prasad
Ali Anaissi
Ali Braytee
58
0
0
11 Mar 2025
"Principal Components" Enable A New Language of Images
Xin Wen
Bingchen Zhao
Ismail Elezi
Jiankang Deng
Xiaojuan Qi
66
0
0
11 Mar 2025
Twinner: Shining Light on Digital Twins in a Few Snaps
Jesus Zarzar
Tom Monnier
Roman Shapovalov
Andrea Vedaldi
David Novotny
53
0
0
11 Mar 2025
1LoRA: Summation Compression for Very Low-Rank Adaptation
Alessio Quercia
Zhuo Cao
Arya Bangun
Richard D. Paul
Abigail Morrison
Ira Assent
Hanno Scharr
63
0
0
11 Mar 2025
Pre-trained Models Succeed in Medical Imaging with Representation Similarity Degradation
Wenqiang Zu
Shenghao Xie
Hao Chen
Lei Ma
MedIm
47
0
0
11 Mar 2025
FPGS: Feed-Forward Semantic-aware Photorealistic Style Transfer of Large-Scale Gaussian Splatting
GeonU Kim
Kim Youwang
Lee Hyoseok
Tae-Hyun Oh
3DGS
80
0
0
11 Mar 2025
SARA: Structural and Adversarial Representation Alignment for Training-efficient Diffusion Models
Hesen Chen
Junyan Wang
Zhiyu Tan
Hao Li
58
0
0
11 Mar 2025
MsaMIL-Net: An End-to-End Multi-Scale Aware Multiple Instance Learning Network for Efficient Whole Slide Image Classification
Jiangping Wen
Jinyu Wen
Meie Fang
53
0
0
11 Mar 2025
Toward Stable World Models: Measuring and Addressing World Instability in Generative Environments
Soonwoo Kwon
Jin-Young Kim
Hyojun Go
Kyungjune Baek
58
0
0
11 Mar 2025
MVGSR: Multi-View Consistency Gaussian Splatting for Robust Surface Reconstruction
Chenfeng Hou
Qi Xun Yeo
Mengqi Guo
Yongxin Su
Yanyan Li
G. Lee
3DGS
70
2
0
11 Mar 2025
SignRep: Enhancing Self-Supervised Sign Representations
Ryan Wong
Necati Cihan Camgöz
Richard Bowden
SLR
53
0
0
11 Mar 2025
Seal Your Backdoor with Variational Defense
Ivan Sabolić
Matej Grcić
Sinisa Segvic
AAML
204
0
0
11 Mar 2025
FP3: A 3D Foundation Policy for Robotic Manipulation
Rujia Yang
Geng Chen
Chuan Wen
Yang Gao
LM&Ro
81
1
0
11 Mar 2025
WildSeg3D: Segment Any 3D Objects in the Wild from 2D Images
Yansong Guo
Jie Hu
Yansong Qu
Liujuan Cao
3DGS
211
0
0
11 Mar 2025
MGHanD: Multi-modal Guidance for authentic Hand Diffusion
Taehyeon Eum
Jieun Choi
Tae-Kyun Kim
52
0
0
11 Mar 2025
AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models
Kwan Yun
Seokhyeon Hong
Chaelin Kim
Junyong Noh
DiffM
VGen
48
0
0
11 Mar 2025
Endo-FASt3r: Endoscopic Foundation model Adaptation for Structure from motion
Mona Sheikh Zeinoddin
Mobarakol Islam
Zafer Tandogdu
Greg Shaw
Mathew J. Clarkson
E. Mazomenos
Danail Stoyanov
198
0
0
10 Mar 2025
Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization
Michael Green
Matan Levy
Issar Tzachor
Dvir Samuel
N. Darshan
Rami Ben-Ari
56
0
0
10 Mar 2025
Task-Specific Knowledge Distillation from the Vision Foundation Model for Enhanced Medical Image Segmentation
Pengchen Liang
Haishan Huang
Bin Pu
Jianguo Chen
Xiang Hua
Jing Zhang
Weibo Ma
Z. Chen
Yiwei Li
Qing Chang
48
0
0
10 Mar 2025
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model
Lixue Gong
Xiaoxia Hou
Fanshi Li
Liang Li
Xiaochen Lian
...
Qi Zhang
Yuwei Zhang
Shijia Zhao
Jianchao Yang
Weilin Huang
DiffM
VLM
63
6
0
10 Mar 2025
Visual and Text Prompt Segmentation: A Novel Multi-Model Framework for Remote Sensing
Xing Zi
Kairui Jin
Xian Tao
Jun Li
Ali Braytee
Rajiv Ratn Shah
Mukesh Prasad
VLM
72
0
0
10 Mar 2025
Semi-Supervised Medical Image Segmentation via Knowledge Mining from Large Models
Yuchen Mao
Hongwei Bran Li
Yinyi Lai
G. Papanastasiou
Peng Qi
Yunjie Yang
Chengjia Wang
VLM
55
1
0
10 Mar 2025
Alligat0R: Pre-Training Through Co-Visibility Segmentation for Relative Camera Pose Regression
Thibaut Loiseau
Guillaume Bourmaud
Vincent Lepetit
67
0
0
10 Mar 2025