Can Language Understand Depth?


3 July 2022
Renrui Zhang
Ziyao Zeng
Ziyu Guo
Yafeng Li
    VLM
    MDE

Papers citing "Can Language Understand Depth?"

50 / 53 papers shown
Title
VGLD: Visually-Guided Linguistic Disambiguation for Monocular Depth Scale Recovery
Bojin Wu
Jing Chen
MDE
46
0
0
05 May 2025
Vision-Language Embodiment for Monocular Depth Estimation
Jinchang Zhang
Guoyu Lu
VLM
MDE
50
0
0
18 Mar 2025
Multi-Modality Driven LoRA for Adverse Condition Depth Estimation
Guanglei Yang
Rui Tian
Yongqiang Zhang
Zhun Zhong
Yongqiang Li
Wangmeng Zuo
33
0
0
31 Dec 2024
Expanding Event Modality Applications through a Robust CLIP-Based Encoder
SungHeon Jeong
Hanning Chen
Sanggeon Yun
Suhyeon Cho
Wenjun Huang
Xiangjian Liu
Mohsen Imani
98
1
0
04 Dec 2024
PriorDiffusion: Leverage Language Prior in Diffusion Models for Monocular Depth Estimation
Ziyao Zeng
Jingcheng Ni
Daniel Wang
Patrick Rim
Younjoon Chung
Fengyu Yang
Byung-Woo Hong
A. Wong
DiffM
MDE
106
2
0
24 Nov 2024
Enhancing Exchange Rate Forecasting with Explainable Deep Learning Models
Shuchen Meng
Andi Chen
Chihang Wang
Mengyao Zheng
Fangyu Wu
Xupeng Chen
Haowei Ni
Panfeng Li
48
2
0
25 Oct 2024
Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models
Ce Zhang
Simon Stepputtis
Katia P. Sycara
Yaqi Xie
VLM
35
5
0
16 Oct 2024
RSA: Resolving Scale Ambiguities in Monocular Depth Estimators through Language Descriptions
Ziyao Zeng
Yangchao Wu
Hyoungseob Park
Daniel Wang
Fengyu Yang
Stefano Soatto
Dong Lao
Byung-Woo Hong
Alex Wong
MDE
20
7
0
03 Oct 2024
Harnessing LLMs for API Interactions: A Framework for Classification and Synthetic Data Generation
Chunliang Tao
Xiaojing Fan
Yahe Yang
28
19
0
18 Sep 2024
NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training
Yiyi Tao
Zhuoyue Wang
Hang Zhang
Lun Wang
VLM
38
13
0
15 Sep 2024
Style Transfer: From Stitching to Neural Networks
Xinhe Xu
Zhuoer Wang
Yihan Zhang
Yizhou Liu
Zhaoyue Wang
Zhihao Xu
Muhan Zhao
Huaiying Luo
28
3
0
01 Sep 2024
Evaluating Modern Approaches in 3D Scene Reconstruction: NeRF vs Gaussian-Based Methods
Yiming Zhou
Zixuan Zeng
Andi Chen
Xiaofan Zhou
Haowei Ni
Shiyao Zhang
Panfeng Li
Liangxi Liu
Mengyao Zheng
Xupeng Chen
3DGS
37
17
0
08 Aug 2024
Teach CLIP to Develop a Number Sense for Ordinal Regression
Yao Du
Qiang Zhai
Weihang Dai
X. Li
46
8
0
07 Aug 2024
NeuroBind: Towards Unified Multimodal Representations for Neural Signals
Fengyu Yang
Chao Feng
Daniel Wang
Tianye Wang
Ziyao Zeng
...
Hyoungseob Park
Pengliang Ji
Han Zhao
Yuanning Li
Alex Wong
33
9
0
19 Jul 2024
Bootstrapping Vision-language Models for Self-supervised Remote Physiological Measurement
Zijie Yue
Miaojing Shi
Hanli Wang
Shuai Ding
Qijun Chen
Shanlin Yang
39
0
0
11 Jul 2024
CoLA: Conditional Dropout and Language-driven Robust Dual-modal Salient Object Detection
Shuang Hao
Chunlin Zhong
He Tang
26
1
0
09 Jul 2024
SpatialBot: Precise Spatial Understanding with Vision Language Models
Wenxiao Cai
Yaroslav Ponomarenko
Jianhao Yuan
Xiaoqi Li
Wankou Yang
Hao Dong
Bo-Lu Zhao
VLM
46
27
0
19 Jun 2024
Enhance Image-to-Image Generation with LLaVA Prompt and Negative Prompt
Zhicheng Ding
Panfeng Li
Qikai Yang
Siyang Li
VLM
MLLM
40
13
0
04 Jun 2024
Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Models
Simon Schrodi
David T. Hoffmann
Max Argus
Volker Fischer
Thomas Brox
VLM
50
0
0
11 Apr 2024
WorDepth: Variational Language Prior for Monocular Depth Estimation
Ziyao Zeng
Daniel Wang
Fengyu Yang
Hyoungseob Park
Yangchao Wu
Stefano Soatto
Byung-Woo Hong
Dong Lao
Alex Wong
MDE
40
26
0
04 Apr 2024
Transfer CLIP for Generalizable Image Denoising
Junting Cheng
Dong Liang
Shan Tan
VLM
35
12
0
22 Mar 2024
Towards Comprehensive Multimodal Perception: Introducing the Touch-Language-Vision Dataset
Ning Cheng
You Li
Jing Gao
Bin Fang
Jinan Xu
Wenjuan Han
46
4
0
14 Mar 2024
POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World
Boshen Xu
Sipeng Zheng
Qin Jin
44
7
0
09 Mar 2024
CLIPose: Category-Level Object Pose Estimation with Pre-trained Vision-Language Knowledge
Xiao Lin
Minghao Zhu
Ronghao Dang
Guangliang Zhou
Shaolong Shu
Feng Lin
Chengju Liu
Qi Chen
CLIP
41
8
0
24 Feb 2024
CLIP Can Understand Depth
Dunam Kim
Seokju Lee
VLM
MDE
38
2
0
05 Feb 2024
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations
Fengyu Yang
Chao Feng
Ziyang Chen
Hyoungseob Park
Daniel Wang
...
Ziyao Zeng
Xien Chen
Rit Gangopadhyay
Andrew Owens
Alex Wong
38
53
0
31 Jan 2024
Text-Driven Traffic Anomaly Detection with Temporal High-Frequency Modeling in Driving Videos
Rongqin Liang
Yuanman Li
Jiantao Zhou
Xia Li
23
6
0
07 Jan 2024
The Potential of Vision-Language Models for Content Moderation of Children's Videos
Syed Hammad Ahmed
Shengnan Hu
G. Sukthankar
VLM
19
3
0
06 Dec 2023
Consistency Prototype Module and Motion Compensation for Few-Shot Action Recognition (CLIP-CP$\mathbf{M^2}$C)
Fei-Yu Guo
Li Zhu
YiKang Wang
Han Qi
17
2
0
02 Dec 2023
Deception Detection from Linguistic and Physiological Data Streams Using Bimodal Convolutional Neural Networks
Panfeng Li
M. Abouelenien
Rada Mihalcea
Zhicheng Ding
Qikai Yang
Yiming Zhou
24
73
0
18 Nov 2023
Learning to Adapt CLIP for Few-Shot Monocular Depth Estimation
Xue-mei Hu
Ce Zhang
Yi Zhang
Bowen Hai
Ke Yu
Zhihai He
MDE
VLM
28
17
0
02 Nov 2023
EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding
Jiazhou Zhou
Xueye Zheng
Yuanhuiyi Lyu
Lin Wang
VLM
17
12
0
06 Aug 2023
LAMP: Leveraging Language Prompts for Multi-person Pose Estimation
Shengnan Hu
Ce Zheng
Zixiang Zhou
C. L. P. Chen
G. Sukthankar
14
3
0
21 Jul 2023
LDP: Language-driven Dual-Pixel Image Defocus Deblurring Network
Hao-Liang Yang
Liyuan Pan
Yan Yang
Richard Hartley
Miaomiao Liu
VLM
29
9
0
19 Jul 2023
Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity
Zhenlin Xu
Yi Zhu
Tiffany Deng
Abhay Mittal
Yanbei Chen
Manchen Wang
Paolo Favaro
Joseph Tighe
Davide Modolo
VLM
CoGe
16
7
0
28 Jun 2023
Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement
Xiang-yu Zhu
Renrui Zhang
Bowei He
A-Long Zhou
Dong Wang
Bingyan Zhao
Peng Gao
VLM
29
79
0
03 Apr 2023
ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance
Zoey Guo
Yiwen Tang
Renrui Zhang
Dong Wang
Zhigang Wang
Bin Zhao
Xuelong Li
33
53
0
29 Mar 2023
Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis
Renrui Zhang
Liuhui Wang
Ziyu Guo
Yali Wang
Peng Gao
Hongsheng Li
Jianbo Shi
3DPC
24
51
0
14 Mar 2023
PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection
Anthony Chen
Kevin Zhang
Renrui Zhang
Zihan Wang
Yuheng Lu
Yandong Guo
Shanghang Zhang
3DPC
70
60
0
14 Mar 2023
Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners
Renrui Zhang
Xiangfei Hu
Bohao Li
Siyuan Huang
Hanqiu Deng
Hongsheng Li
Yu Qiao
Peng Gao
VLM
MLLM
35
170
0
03 Mar 2023
Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training
Ziyu Guo
Renrui Zhang
Longtian Qiu
Xianzhi Li
Pheng-Ann Heng
3DPC
30
52
0
27 Feb 2023
TiG-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning
Pei-Kai Huang
L. Liu
Renrui Zhang
Song Zhang
Xin Xu
Bai-Qi Wang
G. Liu
3DPC
MDE
34
42
0
28 Dec 2022
LidarCLIP or: How I Learned to Talk to Point Clouds
Georg Hess
Adam Tonderski
Christoffer Petersson
Kalle Åström
Lennart Svensson
DiffM
21
22
0
13 Dec 2022
ObjCAViT: Improving Monocular Depth Estimation Using Natural Language Models And Image-Object Cross-Attention
Dylan Auty
K. Mikolajczyk
VLM
15
3
0
30 Nov 2022
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
Yanmin Wu
Xinhua Cheng
Renrui Zhang
Zesen Cheng
Jian Zhang
53
62
0
29 Sep 2022
CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention
Ziyu Guo
Renrui Zhang
Longtian Qiu
Xianzheng Ma
Xupeng Miao
Xuming He
Bin Cui
VLM
AAML
57
109
0
28 Sep 2022
Dual Modality Prompt Tuning for Vision-Language Pre-Trained Model
Yinghui Xing
Qirui Wu
De-Chun Cheng
Shizhou Zhang
Guoqiang Liang
Peng Wang
Yanning Zhang
VLM
VPVLM
54
51
0
17 Aug 2022
PointCLIP: Point Cloud Understanding by CLIP
Renrui Zhang
Ziyu Guo
Wei Zhang
Kunchang Li
Xupeng Miao
Bin Cui
Yu Qiao
Peng Gao
Hongsheng Li
VLM
3DPC
166
435
0
04 Dec 2021
Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling
Renrui Zhang
Rongyao Fang
Wei Zhang
Peng Gao
Kunchang Li
Jifeng Dai
Yu Qiao
Hongsheng Li
VLM
189
385
0
06 Nov 2021
PLNet: Plane and Line Priors for Unsupervised Indoor Depth Estimation
Hualie Jiang
Laiyan Ding
Junjie Hu
Rui Huang
3DPC
SSL
MDE
49
19
0
12 Oct 2021