2207.01077
Cited By
Can Language Understand Depth?
3 July 2022
Renrui Zhang
Ziyao Zeng
Ziyu Guo
Yafeng Li
VLM
MDE
Papers citing "Can Language Understand Depth?"
50 / 53 papers shown
Title
VGLD: Visually-Guided Linguistic Disambiguation for Monocular Depth Scale Recovery
Bojin Wu
Jing Chen
MDE
46
0
0
05 May 2025
Vision-Language Embodiment for Monocular Depth Estimation
Jinchang Zhang
Guoyu Lu
VLM
MDE
50
0
0
18 Mar 2025
Multi-Modality Driven LoRA for Adverse Condition Depth Estimation
Guanglei Yang
Rui Tian
Yongqiang Zhang
Zhun Zhong
Yongqiang Li
Wangmeng Zuo
33
0
0
31 Dec 2024
Expanding Event Modality Applications through a Robust CLIP-Based Encoder
SungHeon Jeong
Hanning Chen
Sanggeon Yun
Suhyeon Cho
Wenjun Huang
Xiangjian Liu
Mohsen Imani
98
1
0
04 Dec 2024
PriorDiffusion: Leverage Language Prior in Diffusion Models for Monocular Depth Estimation
Ziyao Zeng
Jingcheng Ni
Daniel Wang
Patrick Rim
Younjoon Chung
Fengyu Yang
Byung-Woo Hong
A. Wong
DiffM
MDE
106
2
0
24 Nov 2024
Enhancing Exchange Rate Forecasting with Explainable Deep Learning Models
Shuchen Meng
Andi Chen
Chihang Wang
Mengyao Zheng
Fangyu Wu
Xupeng Chen
Haowei Ni
Panfeng Li
48
2
0
25 Oct 2024
Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models
Ce Zhang
Simon Stepputtis
Katia P. Sycara
Yaqi Xie
VLM
35
5
0
16 Oct 2024
RSA: Resolving Scale Ambiguities in Monocular Depth Estimators through Language Descriptions
Ziyao Zeng
Yangchao Wu
Hyoungseob Park
Daniel Wang
Fengyu Yang
Stefano Soatto
Dong Lao
Byung-Woo Hong
Alex Wong
MDE
20
7
0
03 Oct 2024
Harnessing LLMs for API Interactions: A Framework for Classification and Synthetic Data Generation
Chunliang Tao
Xiaojing Fan
Yahe Yang
28
19
0
18 Sep 2024
NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training
Yiyi Tao
Zhuoyue Wang
Hang Zhang
Lun Wang
VLM
38
13
0
15 Sep 2024
Style Transfer: From Stitching to Neural Networks
Xinhe Xu
Zhuoer Wang
Yihan Zhang
Yizhou Liu
Zhaoyue Wang
Zhihao Xu
Muhan Zhao
Huaiying Luo
28
3
0
01 Sep 2024
Evaluating Modern Approaches in 3D Scene Reconstruction: NeRF vs Gaussian-Based Methods
Yiming Zhou
Zixuan Zeng
Andi Chen
Xiaofan Zhou
Haowei Ni
Shiyao Zhang
Panfeng Li
Liangxi Liu
Mengyao Zheng
Xupeng Chen
3DGS
37
17
0
08 Aug 2024
Teach CLIP to Develop a Number Sense for Ordinal Regression
Yao Du
Qiang Zhai
Weihang Dai
X. Li
46
8
0
07 Aug 2024
NeuroBind: Towards Unified Multimodal Representations for Neural Signals
Fengyu Yang
Chao Feng
Daniel Wang
Tianye Wang
Ziyao Zeng
...
Hyoungseob Park
Pengliang Ji
Han Zhao
Yuanning Li
Alex Wong
33
9
0
19 Jul 2024
Bootstrapping Vision-language Models for Self-supervised Remote Physiological Measurement
Zijie Yue
Miaojing Shi
Hanli Wang
Shuai Ding
Qijun Chen
Shanlin Yang
39
0
0
11 Jul 2024
CoLA: Conditional Dropout and Language-driven Robust Dual-modal Salient Object Detection
Shuang Hao
Chunlin Zhong
He Tang
26
1
0
09 Jul 2024
SpatialBot: Precise Spatial Understanding with Vision Language Models
Wenxiao Cai
Yaroslav Ponomarenko
Jianhao Yuan
Xiaoqi Li
Wankou Yang
Hao Dong
Bo-Lu Zhao
VLM
46
27
0
19 Jun 2024
Enhance Image-to-Image Generation with LLaVA Prompt and Negative Prompt
Zhicheng Ding
Panfeng Li
Qikai Yang
Siyang Li
VLM
MLLM
40
13
0
04 Jun 2024
Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Models
Simon Schrodi
David T. Hoffmann
Max Argus
Volker Fischer
Thomas Brox
VLM
50
0
0
11 Apr 2024
WorDepth: Variational Language Prior for Monocular Depth Estimation
Ziyao Zeng
Daniel Wang
Fengyu Yang
Hyoungseob Park
Yangchao Wu
Stefano Soatto
Byung-Woo Hong
Dong Lao
Alex Wong
MDE
40
26
0
04 Apr 2024
Transfer CLIP for Generalizable Image Denoising
Junting Cheng
Dong Liang
Shan Tan
VLM
35
12
0
22 Mar 2024
Towards Comprehensive Multimodal Perception: Introducing the Touch-Language-Vision Dataset
Ning Cheng
You Li
Jing Gao
Bin Fang
Jinan Xu
Wenjuan Han
46
4
0
14 Mar 2024
POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World
Boshen Xu
Sipeng Zheng
Qin Jin
44
7
0
09 Mar 2024
CLIPose: Category-Level Object Pose Estimation with Pre-trained Vision-Language Knowledge
Xiao Lin
Minghao Zhu
Ronghao Dang
Guangliang Zhou
Shaolong Shu
Feng Lin
Chengju Liu
Qi Chen
CLIP
41
8
0
24 Feb 2024
CLIP Can Understand Depth
Dunam Kim
Seokju Lee
VLM
MDE
38
2
0
05 Feb 2024
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations
Fengyu Yang
Chao Feng
Ziyang Chen
Hyoungseob Park
Daniel Wang
...
Ziyao Zeng
Xien Chen
Rit Gangopadhyay
Andrew Owens
Alex Wong
38
53
0
31 Jan 2024
Text-Driven Traffic Anomaly Detection with Temporal High-Frequency Modeling in Driving Videos
Rongqin Liang
Yuanman Li
Jiantao Zhou
Xia Li
23
6
0
07 Jan 2024
The Potential of Vision-Language Models for Content Moderation of Children's Videos
Syed Hammad Ahmed
Shengnan Hu
G. Sukthankar
VLM
19
3
0
06 Dec 2023
Consistency Prototype Module and Motion Compensation for Few-Shot Action Recognition (CLIP-CP$\mathbf{M^2}$C)
Fei-Yu Guo
Li Zhu
YiKang Wang
Han Qi
17
2
0
02 Dec 2023
Deception Detection from Linguistic and Physiological Data Streams Using Bimodal Convolutional Neural Networks
Panfeng Li
M. Abouelenien
Rada Mihalcea
Zhicheng Ding
Qikai Yang
Yiming Zhou
24
73
0
18 Nov 2023
Learning to Adapt CLIP for Few-Shot Monocular Depth Estimation
Xue-mei Hu
Ce Zhang
Yi Zhang
Bowen Hai
Ke Yu
Zhihai He
MDE
VLM
28
17
0
02 Nov 2023
EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding
Jiazhou Zhou
Xueye Zheng
Yuanhuiyi Lyu
Lin Wang
VLM
17
12
0
06 Aug 2023
LAMP: Leveraging Language Prompts for Multi-person Pose Estimation
Shengnan Hu
Ce Zheng
Zixiang Zhou
C. L. P. Chen
G. Sukthankar
14
3
0
21 Jul 2023
LDP: Language-driven Dual-Pixel Image Defocus Deblurring Network
Hao-Liang Yang
Liyuan Pan
Yan Yang
Richard Hartley
Miaomiao Liu
VLM
29
9
0
19 Jul 2023
Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity
Zhenlin Xu
Yi Zhu
Tiffany Deng
Abhay Mittal
Yanbei Chen
Manchen Wang
Paolo Favaro
Joseph Tighe
Davide Modolo
VLM
CoGe
16
7
0
28 Jun 2023
Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement
Xiang-yu Zhu
Renrui Zhang
Bowei He
A-Long Zhou
Dong Wang
Bingyan Zhao
Peng Gao
VLM
29
79
0
03 Apr 2023
ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance
Zoey Guo
Yiwen Tang
Renrui Zhang
Dong Wang
Zhigang Wang
Bin Zhao
Xuelong Li
33
53
0
29 Mar 2023
Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis
Renrui Zhang
Liuhui Wang
Ziyu Guo
Yali Wang
Peng Gao
Hongsheng Li
Jianbo Shi
3DPC
24
51
0
14 Mar 2023
PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection
Anthony Chen
Kevin Zhang
Renrui Zhang
Zihan Wang
Yuheng Lu
Yandong Guo
Shanghang Zhang
3DPC
70
60
0
14 Mar 2023
Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners
Renrui Zhang
Xiangfei Hu
Bohao Li
Siyuan Huang
Hanqiu Deng
Hongsheng Li
Yu Qiao
Peng Gao
VLM
MLLM
35
170
0
03 Mar 2023
Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training
Ziyu Guo
Renrui Zhang
Longtian Qiu
Xianzhi Li
Pheng-Ann Heng
3DPC
30
52
0
27 Feb 2023
TiG-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning
Pei-Kai Huang
L. Liu
Renrui Zhang
Song Zhang
Xin Xu
Bai-Qi Wang
G. Liu
3DPC
MDE
34
42
0
28 Dec 2022
LidarCLIP or: How I Learned to Talk to Point Clouds
Georg Hess
Adam Tonderski
Christoffer Petersson
Kalle Åström
Lennart Svensson
DiffM
21
22
0
13 Dec 2022
ObjCAViT: Improving Monocular Depth Estimation Using Natural Language Models And Image-Object Cross-Attention
Dylan Auty
K. Mikolajczyk
VLM
15
3
0
30 Nov 2022
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
Yanmin Wu
Xinhua Cheng
Renrui Zhang
Zesen Cheng
Jian Zhang
53
62
0
29 Sep 2022
CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention
Ziyu Guo
Renrui Zhang
Longtian Qiu
Xianzheng Ma
Xupeng Miao
Xuming He
Bin Cui
VLM
AAML
57
109
0
28 Sep 2022
Dual Modality Prompt Tuning for Vision-Language Pre-Trained Model
Yinghui Xing
Qirui Wu
De-Chun Cheng
Shizhou Zhang
Guoqiang Liang
Peng Wang
Yanning Zhang
VLM
VPVLM
54
51
0
17 Aug 2022
PointCLIP: Point Cloud Understanding by CLIP
Renrui Zhang
Ziyu Guo
Wei Zhang
Kunchang Li
Xupeng Miao
Bin Cui
Yu Qiao
Peng Gao
Hongsheng Li
VLM
3DPC
166
435
0
04 Dec 2021
Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling
Renrui Zhang
Rongyao Fang
Wei Zhang
Peng Gao
Kunchang Li
Jifeng Dai
Yu Qiao
Hongsheng Li
VLM
189
385
0
06 Nov 2021
PLNet: Plane and Line Priors for Unsupervised Indoor Depth Estimation
Hualie Jiang
Laiyan Ding
Junjie Hu
Rui Huang
3DPC
SSL
MDE
49
19
0
12 Oct 2021