Vanishing Depth: A Depth Adapter with Positional Depth Encoding for Generalized Image Encoders

25 March 2025
Paul Koch
Jörg Krüger
Ankit Chowdhury
O. Heimann
    MDE

Papers citing "Vanishing Depth: A Depth Adapter with Positional Depth Encoding for Generalized Image Encoders"

36 / 36 papers shown
Masked Image Modeling: A Survey
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
N. Sebe
132
8
0
13 Aug 2024
Fine-tuning can cripple your foundation model; preserving features may be the solution
Jishnu Mukhoti
Y. Gal
Philip Torr
P. Dokania
CLL
86
40
0
25 Aug 2023
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
183
718
0
14 Nov 2022
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
Wenhui Wang
Hangbo Bao
Li Dong
Johan Bjorck
Zhiliang Peng
...
Kriti Aggarwal
O. Mohammed
Saksham Singhal
Subhojit Som
Furu Wei
MLLM
VLM
ViT
141
642
0
22 Aug 2022
Multimodal Token Fusion for Vision Transformers
Yikai Wang
Xinghao Chen
Lele Cao
Wen-bing Huang
Gang Hua
Yunhe Wang
ViT
84
179
0
19 Apr 2022
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar
Mannat Singh
Nikhil Ravi
Laurens van der Maaten
Armand Joulin
Ishan Misra
259
235
0
20 Jan 2022
Are Large-scale Datasets Necessary for Self-Supervised Pre-training?
Alaaeldin El-Nouby
Gautier Izacard
Hugo Touvron
Ivan Laptev
Hervé Jégou
Edouard Grave
SSL
80
150
0
20 Dec 2021
Masked-attention Mask Transformer for Universal Image Segmentation
Bowen Cheng
Ishan Misra
Alex Schwing
Alexander Kirillov
Rohit Girdhar
ISeg
248
2,364
0
02 Dec 2021
iBOT: Image BERT Pre-Training with Online Tokenizer
Jinghao Zhou
Chen Wei
Huiyu Wang
Wei Shen
Cihang Xie
Alan Yuille
Tao Kong
81
735
0
15 Nov 2021
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
462
7,757
0
11 Nov 2021
RigNet: Repetitive Image Guided Network for Depth Completion
Zhiqiang Yan
Kun Wang
Xiang Li
Zhenyu Zhang
Jun Li
Jian Yang
3DV
VLM
74
119
0
29 Jul 2021
BEiT: BERT Pre-Training of Image Transformers
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
ViT
274
2,826
0
15 Jun 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
688
6,079
0
29 Apr 2021
Vote from the Center: 6 DoF Pose Estimation in RGB-D Images by Radial Keypoint Voting
Yangzheng Wu
Mohsen Zand
Ali Etemad
Michael A. Greenspan
3DPC
62
37
0
06 Apr 2021
PENet: Towards Precise and Efficient Image Guided Depth Completion
Mu Hu
Shuling Wang
Bin Li
Shiyu Ning
Li Fan
Xiaojin Gong
MDE
125
278
0
01 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
929
29,436
0
26 Feb 2021
AdaBins: Depth Estimation using Adaptive Bins
S. Bhat
Ibraheem Alhashim
Peter Wonka
3DV
MDE
ViT
113
858
0
28 Nov 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
654
41,103
0
22 Oct 2020
Non-Local Spatial Propagation Network for Depth Completion
Jinsun Park
Kyungdon Joo
Zhe Hu
Chi Liu
In So Kweon
3DV
MDE
115
325
0
20 Jul 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
795
42,055
0
28 May 2020
Unsupervised Depth Completion from Visual Inertial Odometry
A. Wong
Xiaohan Fei
Stephanie Tsuei
Stefano Soatto
MDE
SSL
64
128
0
15 May 2019
Sparse and noisy LiDAR completion with RGB guidance and uncertainty
Wouter Van Gansbeke
D. Neven
Bert De Brabandere
Luc Van Gool
3DV
70
251
0
14 Feb 2019
Self-Supervised Model Adaptation for Multimodal Semantic Segmentation
Abhinav Valada
Rohit Mohan
Wolfram Burgard
SSL
54
246
0
11 Aug 2018
Sparse and Dense Data with CNNs: Depth Completion and Semantic Segmentation
M. Jaritz
Raoul de Charette
É. Wirbel
Xavier Perrotton
F. Nashashibi
3DPC
3DV
MDE
57
268
0
02 Aug 2018
Squeeze-and-Excitation Networks
Jie Hu
Li Shen
Samuel Albanie
Gang Sun
Enhua Wu
424
26,500
0
05 Sep 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
701
131,652
0
12 Jun 2017
ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes
Angela Dai
Angel X. Chang
Manolis Savva
Maciej Halber
Thomas Funkhouser
Matthias Nießner
3DPC
3DV
474
4,062
0
14 Feb 2017
T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-less Objects
Tomáš Hodaň
Pavel Haluza
Štěpán Obdržálek
Jiří Matas
Manolis I. A. Lourakis
Xenophon Zabulis
69
501
0
19 Jan 2017
COCO-Stuff: Thing and Stuff Classes in Context
Holger Caesar
J. Uijlings
V. Ferrari
132
1,387
0
12 Dec 2016
Feature Pyramid Networks for Object Detection
Tsung-Yi Lin
Piotr Dollár
Ross B. Girshick
Kaiming He
Bharath Hariharan
Serge J. Belongie
ObjD
474
22,108
0
09 Dec 2016
Context Encoders: Feature Learning by Inpainting
Deepak Pathak
Philipp Krähenbühl
Jeff Donahue
Trevor Darrell
Alexei A. Efros
SSL
67
5,297
0
25 Apr 2016
The Cityscapes Dataset for Semantic Urban Scene Understanding
Marius Cordts
Mohamed Omran
Sebastian Ramos
Timo Rehfeld
Markus Enzweiler
Rodrigo Benenson
Uwe Franke
Stefan Roth
Bernt Schiele
1.1K
11,623
0
06 Apr 2016
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger
Philipp Fischer
Thomas Brox
SSeg
3DV
1.8K
77,196
0
18 May 2015
Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture
David Eigen
Rob Fergus
VLM
MDE
209
2,680
0
18 Nov 2014
Learning Rich Features from RGB-D Images for Object Detection and Segmentation
Saurabh Gupta
Ross B. Girshick
Pablo Arbeláez
Jitendra Malik
ObjD
125
1,561
0
22 Jul 2014
Depth Map Prediction from a Single Image using a Multi-Scale Deep Network
David Eigen
Christian Puhrsch
Rob Fergus
MDE
3DPC
3DV
239
4,059
0
09 Jun 2014