3D Vision and Language Pretraining with Large-Scale Synthetic Data

8 July 2024
Dejie Yang
Zhu Xu
Wentao Mo
Qingchao Chen
Siyuan Huang
Yang Liu
arXiv: 2407.06084

Papers citing "3D Vision and Language Pretraining with Large-Scale Synthetic Data"

Integrating Chain-of-Thought for Multimodal Alignment: A Study on 3D Vision-Language Learning
Yanjun Chen
Yirong Sun
Xinghao Chen
Jian Wang
Xiaoyu Shen
W. Li
Wei Zhang
3DV, LRM
107
1
0
08 Mar 2025
Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes
Alexandros Delitzas
Maria Parelli
Nikolas Hars
G. Vlassis
Sotiris Anagnostidis
Gregor Bachmann
Thomas Hofmann
CLIP
43
20
0
04 Jun 2023
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Wenliang Dai
Junnan Li
Dongxu Li
A. M. H. Tiong
Junqi Zhao
Weisheng Wang
Boyang Albert Li
Pascale Fung
Steven C. H. Hoi
MLLM, VLM
121
2,067
0
11 May 2023
Language Conditioned Spatial Relation Reasoning for 3D Object Grounding
Shizhe Chen
Pierre-Louis Guhur
Makarand Tapaswi
Cordelia Schmid
Ivan Laptev
87
88
0
17 Nov 2022
Delving into the Continuous Domain Adaptation
Yinsong Xu
Zhuqing Jiang
Aidong Men
Yang Liu
Qingchao Chen
56
4
0
28 Aug 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM, VLM
416
3,585
0
29 Apr 2022
3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection
Jun-Bin Luo
Jiahui Fu
Xianghao Kong
Chen Gao
Haibing Ren
Hao Shen
Huaxia Xia
Si Liu
78
95
0
13 Apr 2022
Multi-View Transformer for 3D Visual Grounding
Shijia Huang
Yilun Chen
Jiaya Jia
Liwei Wang
86
125
0
05 Apr 2022
MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes
Yang Jiao
Shaoxiang Chen
Zequn Jie
Wenke Huang
Lin Ma
Yu-Gang Jiang
3DPC
85
47
0
10 Mar 2022
TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D Visual Grounding
Dailan He
Yusheng Zhao
Junyu Luo
Tianrui Hui
Shaofei Huang
Aixi Zhang
Si Liu
ViT
51
95
0
05 Aug 2021
SAT: 2D Semantics Assisted Training for 3D Visual Grounding
Zhengyuan Yang
Songyang Zhang
Liwei Wang
Jiebo Luo
3DPC
81
126
0
24 May 2021
ePointDA: An End-to-End Simulation-to-Real Domain Adaptation Framework for LiDAR Point Cloud Segmentation
Sicheng Zhao
Yezhen Wang
Yue Liu
Bichen Wu
Yang Gao
Pengfei Xu
Trevor Darrell
Kurt Keutzer
3DPC
74
93
0
07 Sep 2020
Learning 3D Semantic Scene Graphs from 3D Indoor Reconstructions
Johanna Wald
Helisa Dhamo
Nassir Navab
Federico Tombari
3DV, 3DPC
71
218
0
08 Apr 2020
ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language
Dave Zhenyu Chen
Angel X. Chang
Matthias Nießner
3DPC
89
376
0
18 Dec 2019
RIO: 3D Object Instance Re-Localization in Changing Indoor Environments
Johanna Wald
A. Avetisyan
Nassir Navab
Federico Tombari
Matthias Nießner
59
158
0
16 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL, VLM
231
3,693
0
06 Aug 2019