CrossOver: 3D Scene Cross-Modal Alignment

20 February 2025
S. Sarkar
O. Mikšík
Marc Pollefeys
Daniel Barath
Iro Armeni
    3DPC

Papers cited by "CrossOver: 3D Scene Cross-Modal Alignment"

37 / 37 papers shown
SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs
Yang Miao
Francis Engelmann
Olga Vysotska
Federico Tombari
Marc Pollefeys
Daniel Barath
3DPC
86
8
0
30 Mar 2024
SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks
Yaxu Xie
A. Pagani
Didier Stricker
92
4
0
28 Mar 2024
LIP-Loc: LiDAR Image Pretraining for Cross-Modal Localization
Sai Shubodh Puligilla
Mohammad Omama
Husain Zaidi
Udit Singh Parihar
Madhava Krishna
54
14
0
27 Dec 2023
Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments
Liyuan Zhu
Shengyu Huang
Konrad Schindler
Iro Armeni
47
10
0
14 Dec 2023
Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following
Ziyu Guo
Renrui Zhang
Xiangyang Zhu
Yiwen Tang
Xianzheng Ma
...
Ke Chen
Peng Gao
Xianzhi Li
Hongsheng Li
Pheng-Ann Heng
MLLM
73
139
0
01 Sep 2023
3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment
Ziyu Zhu
Xiaojian Ma
Yixin Chen
Zhidong Deng
Siyuan Huang
Qing Li
LM&Ro
65
117
0
08 Aug 2023
ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding
Le Xue
Ning Yu
Shu Zhen Zhang
Artemis Panagopoulou
Junnan Li
...
Jiajun Wu
Caiming Xiong
Ran Xu
Juan Carlos Niebles
Silvio Savarese
86
122
0
14 May 2023
ImageBind: One Embedding Space To Bind Them All
Rohit Girdhar
Alaaeldin El-Nouby
Zhuang Liu
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
VLM
113
918
0
09 May 2023
SGAligner : 3D Scene Alignment with Scene Graphs
S. Sarkar
O. Mikšík
Marc Pollefeys
Dániel Baráth
Iro Armeni
79
17
0
28 Apr 2023
LidarCLIP or: How I Learned to Talk to Point Clouds
Georg Hess
Adam Tonderski
Christoffer Petersson
Kalle Åström
Lennart Svensson
DiffM
45
22
0
13 Dec 2022
Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders
Renrui Zhang
Liuhui Wang
Yu Qiao
Peng Gao
Hongsheng Li
3DPC
72
130
0
13 Dec 2022
Language Conditioned Spatial Relation Reasoning for 3D Object Grounding
Shizhe Chen
Pierre-Louis Guhur
Makarand Tapaswi
Cordelia Schmid
Ivan Laptev
69
83
0
17 Nov 2022
X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval
Yiwei Ma
Guohai Xu
Xiaoshuai Sun
Ming Yan
Ji Zhang
Rongrong Ji
CLIP
VLM
69
282
0
15 Jul 2022
Text2Pos: Text-to-Point-Cloud Cross-Modal Localization
Manuel Kolmet
Qunjie Zhou
Aljosa Osep
Laura Leal-Taixe
52
23
0
28 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Steven Hoi
MLLM
BDL
VLM
CLIP
490
4,324
0
28 Jan 2022
PointCLIP: Point Cloud Understanding by CLIP
Renrui Zhang
Ziyu Guo
Wei Zhang
Kunchang Li
Xupeng Miao
Tengjiao Wang
Yu Qiao
Peng Gao
Hongsheng Li
VLM
3DPC
240
445
0
04 Dec 2021
CLIP-Adapter: Better Vision-Language Models with Feature Adapters
Peng Gao
Shijie Geng
Renrui Zhang
Teli Ma
Rongyao Fang
Yongfeng Zhang
Hongsheng Li
Yu Qiao
VLM
CLIP
241
1,035
0
09 Oct 2021
Vector Neurons: A General Framework for SO(3)-Equivariant Networks
Congyue Deng
Or Litany
Yueqi Duan
A. Poulenard
Andrea Tagliasacchi
Leonidas Guibas
3DPC
170
324
0
25 Apr 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
391
801
0
18 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
808
29,167
0
26 Feb 2021
Mending Neural Implicit Modeling for 3D Vehicle Reconstruction in the Wild
Shivam Duggal
Zihao Wang
Wei-Chiu Ma
S. Manivasagam
Justin Liang
Shenlong Wang
R. Urtasun
3DV
96
25
0
18 Jan 2021
Learning 3D Semantic Scene Graphs from 3D Indoor Reconstructions
Johanna Wald
Helisa Dhamo
Nassir Navab
Federico Tombari
3DV
3DPC
63
217
0
08 Apr 2020
UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation
Huaishao Luo
Lei Ji
Botian Shi
Haoyang Huang
Nan Duan
Tianrui Li
Jason Li
Xilin Chen
Ming Zhou
VLM
85
440
0
15 Feb 2020
3D Dynamic Scene Graphs: Actionable Spatial Perception with Places, Objects, and Humans
Antoni Rosinol
Arjun Gupta
Marcus Abate
Jingnan Shi
Luca Carlone
77
194
0
15 Feb 2020
ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language
Dave Zhenyu Chen
Angel X. Chang
Matthias Nießner
3DPC
79
368
0
18 Dec 2019
SuperGlue: Learning Feature Matching with Graph Neural Networks
Paul-Edouard Sarlin
Daniel DeTone
Tomasz Malisiewicz
Andrew Rabinovich
3DPC
OffRL
97
1,929
0
26 Nov 2019
3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera
Iro Armeni
Zhi-Yang He
JunYoung Gwak
Amir Zamir
Martin Fischer
Jitendra Malik
Silvio Savarese
3DV
3DPC
87
344
0
06 Oct 2019
RIO: 3D Object Instance Re-Localization in Changing Indoor Environments
Johanna Wald
A. Avetisyan
Nassir Navab
Federico Tombari
Matthias Nießner
57
155
0
16 Aug 2019
3-D Scene Graph: A Sparse and Semantic Representation of Physical Environments for Intelligent Agents
Ue-Hwan Kim
Jin-Man Park
Taek-jin Song
Jong-hwan Kim
3DV
49
107
0
14 Aug 2019
4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks
Chris Choy
JunYoung Gwak
Silvio Savarese
3DPC
142
1,780
0
18 Apr 2019
Scan2CAD: Learning CAD Model Alignment in RGB-D Scans
A. Avetisyan
Manuel Dahnert
Angela Dai
Manolis Savva
Angel X. Chang
Matthias Nießner
3DPC
3DV
66
231
0
27 Nov 2018
Representation Learning with Contrastive Predictive Coding
Aaron van den Oord
Yazhe Li
Oriol Vinyals
DRL
SSL
280
10,253
0
10 Jul 2018
Learning Factorized Multimodal Representations
Yao-Hung Hubert Tsai
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
Ruslan Salakhutdinov
DRL
98
407
0
16 Jun 2018
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
77
2,917
0
26 May 2017
ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes
Angela Dai
Angel X. Chang
Manolis Savva
Maciej Halber
Thomas Funkhouser
Matthias Nießner
3DPC
3DV
432
4,040
0
14 Feb 2017
ShapeNet: An Information-Rich 3D Model Repository
Angel X. Chang
Thomas Funkhouser
Leonidas Guibas
Pat Hanrahan
Qi-Xing Huang
...
Shuran Song
Hao Su
Jianxiong Xiao
L. Yi
Feng Yu
3DV
125
5,508
0
09 Dec 2015
NetVLAD: CNN architecture for weakly supervised place recognition
Relja Arandjelović
Petr Gronát
Akihiko Torii
Tomas Pajdla
Josef Sivic
3DV
SSL
116
2,631
0
23 Nov 2015