Vanishing Depth: A Depth Adapter with Positional Depth Encoding for Generalized Image Encoders

25 March 2025

Papers citing "Vanishing Depth: A Depth Adapter with Positional Depth Encoding for Generalized Image Encoders"

Title
Masked Image Modeling: A Survey Vlad Hondru Florinel-Alin Croitoru Shervin Minaee Radu Tudor Ionescu N. Sebe 132 8 0 13 Aug 2024
Fine-tuning can cripple your foundation model; preserving features may be the solution Jishnu Mukhoti Y. Gal Philip Torr P. Dokania CLL 86 40 0 25 Aug 2023
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale Yuxin Fang Wen Wang Binhui Xie Quan-Sen Sun Ledell Yu Wu Xinggang Wang Tiejun Huang Xinlong Wang Yue Cao VLM CLIP 183 718 0 14 Nov 2022
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks Wenhui Wang Hangbo Bao Li Dong Johan Bjorck Zhiliang Peng ... Kriti Aggarwal O. Mohammed Saksham Singhal Subhojit Som Furu Wei MLLM VLM ViT 141 642 0 22 Aug 2022
Multimodal Token Fusion for Vision Transformers Yikai Wang Xinghao Chen Lele Cao Wen-bing Huang Gang Hua Yunhe Wang ViT 84 179 0 19 Apr 2022
Omnivore: A Single Model for Many Visual Modalities Rohit Girdhar Mannat Singh Nikhil Ravi Laurens van der Maaten Armand Joulin Ishan Misra 259 235 0 20 Jan 2022
Are Large-scale Datasets Necessary for Self-Supervised Pre-training? Alaaeldin El-Nouby Gautier Izacard Hugo Touvron Ivan Laptev Hervé Jégou Edouard Grave SSL 80 150 0 20 Dec 2021
Masked-attention Mask Transformer for Universal Image Segmentation Bowen Cheng Ishan Misra Alex Schwing Alexander Kirillov Rohit Girdhar ISeg 248 2,364 0 02 Dec 2021
iBOT: Image BERT Pre-Training with Online Tokenizer Jinghao Zhou Chen Wei Huiyu Wang Wei Shen Cihang Xie Alan Yuille Tao Kong 81 735 0 15 Nov 2021
Masked Autoencoders Are Scalable Vision Learners Kaiming He Xinlei Chen Saining Xie Yanghao Li Piotr Dollár Ross B. Girshick ViT TPM 462 7,757 0 11 Nov 2021
RigNet: Repetitive Image Guided Network for Depth Completion Zhiqiang Yan Kun Wang Xiang Li Zhenyu Zhang Jun Li Jian Yang 3DV VLM 74 119 0 29 Jul 2021
BEiT: BERT Pre-Training of Image Transformers Hangbo Bao Li Dong Songhao Piao Furu Wei ViT 274 2,826 0 15 Jun 2021
Emerging Properties in Self-Supervised Vision Transformers Mathilde Caron Hugo Touvron Ishan Misra Hervé Jégou Julien Mairal Piotr Bojanowski Armand Joulin 688 6,079 0 29 Apr 2021
Vote from the Center: 6 DoF Pose Estimation in RGB-D Images by Radial Keypoint Voting Yangzheng Wu Mohsen Zand Ali Etemad Michael A. Greenspan 3DPC 62 37 0 06 Apr 2021
PENet: Towards Precise and Efficient Image Guided Depth Completion Mu Hu Shuling Wang Bin Li Shiyu Ning Li Fan Xiaojin Gong MDE 125 278 0 01 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision Alec Radford Jong Wook Kim Chris Hallacy Aditya A. Ramesh Gabriel Goh ... Amanda Askell Pamela Mishkin Jack Clark Gretchen Krueger Ilya Sutskever CLIP VLM 929 29,436 0 26 Feb 2021
AdaBins: Depth Estimation using Adaptive Bins S. Bhat Ibraheem Alhashim Peter Wonka 3DV MDE ViT 113 858 0 28 Nov 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai ... Matthias Minderer G. Heigold Sylvain Gelly Jakob Uszkoreit N. Houlsby ViT 654 41,103 0 22 Oct 2020
Non-Local Spatial Propagation Network for Depth Completion Jinsun Park Kyungdon Joo Zhe Hu Chi Liu In So Kweon 3DV MDE 115 325 0 20 Jul 2020
Language Models are Few-Shot Learners Tom B. Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared Kaplan ... Christopher Berner Sam McCandlish Alec Radford Ilya Sutskever Dario Amodei BDL 795 42,055 0 28 May 2020
Unsupervised Depth Completion from Visual Inertial Odometry A. Wong Xiaohan Fei Stephanie Tsuei Stefano Soatto MDE SSL 64 128 0 15 May 2019
Sparse and noisy LiDAR completion with RGB guidance and uncertainty Wouter Van Gansbeke D. Neven Bert De Brabandere Luc Van Gool 3DV 70 251 0 14 Feb 2019
Self-Supervised Model Adaptation for Multimodal Semantic Segmentation Abhinav Valada Rohit Mohan Wolfram Burgard SSL 54 246 0 11 Aug 2018
Sparse and Dense Data with CNNs: Depth Completion and Semantic Segmentation M. Jaritz Raoul de Charette É. Wirbel Xavier Perrotton F. Nashashibi 3DPC 3DV MDE 57 268 0 02 Aug 2018
Squeeze-and-Excitation Networks Jie Hu Li Shen Samuel Albanie Gang Sun Enhua Wu 424 26,500 0 05 Sep 2017
Attention Is All You Need Ashish Vaswani Noam M. Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan Gomez Lukasz Kaiser Illia Polosukhin 3DV 701 131,652 0 12 Jun 2017
ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes Angela Dai Angel X. Chang Manolis Savva Maciej Halber Thomas Funkhouser Matthias Nießner 3DPC 3DV 474 4,062 0 14 Feb 2017
T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-less Objects Tomás Hodan Pavel Haluza Stepán Obdrzálek Jirí Matas Manolis I. A. Lourakis Xenophon Zabulis 69 501 0 19 Jan 2017
COCO-Stuff: Thing and Stuff Classes in Context Holger Caesar J. Uijlings V. Ferrari 132 1,387 0 12 Dec 2016
Feature Pyramid Networks for Object Detection Tsung-Yi Lin Piotr Dollár Ross B. Girshick Kaiming He Bharath Hariharan Serge J. Belongie ObjD 474 22,108 0 09 Dec 2016
Context Encoders: Feature Learning by Inpainting Deepak Pathak Philipp Krahenbuhl Jeff Donahue Trevor Darrell Alexei A. Efros SSL 67 5,297 0 25 Apr 2016
The Cityscapes Dataset for Semantic Urban Scene Understanding Marius Cordts Mohamed Omran Sebastian Ramos Timo Rehfeld Markus Enzweiler Rodrigo Benenson Uwe Franke Stefan Roth Bernt Schiele 1.1K 11,623 0 06 Apr 2016
U-Net: Convolutional Networks for Biomedical Image Segmentation Olaf Ronneberger Philipp Fischer Thomas Brox SSeg 3DV 1.8K 77,196 0 18 May 2015
Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture David Eigen Rob Fergus VLM MDE 209 2,680 0 18 Nov 2014
Learning Rich Features from RGB-D Images for Object Detection and Segmentation Saurabh Gupta Ross B. Girshick Pablo Arbeláez Jitendra Malik ObjD 125 1,561 0 22 Jul 2014
Depth Map Prediction from a Single Image using a Multi-Scale Deep Network David Eigen Christian Puhrsch Rob Fergus MDE 3DPC 3DV 239 4,059 0 09 Jun 2014