2405.10255
Cited By
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
16 May 2024
Xianzheng Ma
Yash Bhalgat
Brandon Smart
Shuai Chen
Xinghui Li
Jian Ding
Jindong Gu
Dave Zhenyu Chen
Songyou Peng
Jiawang Bian
Philip Torr
Marc Pollefeys
Matthias Nießner
Ian D Reid
Angel X. Chang
Iro Laina
V. Prisacariu
LRM
Github (1689★)
Papers citing
"When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models"
50 / 155 papers shown
Title
Expanding Language-Image Pretrained Models for General Video Recognition
Bolin Ni
Houwen Peng
Minghao Chen
Songyang Zhang
Gaofeng Meng
Jianlong Fu
Shiming Xiang
Haibin Ling
VLM
CLIP
ViT
106
326
0
04 Aug 2022
Prompt-to-Prompt Image Editing with Cross Attention Control
Amir Hertz
Ron Mokady
J. Tenenbaum
Kfir Aberman
Yael Pritch
Daniel Cohen-Or
DiffM
200
1,773
0
02 Aug 2022
Classifier-Free Diffusion Guidance
Jonathan Ho
Tim Salimans
FaML
193
3,898
0
26 Jul 2022
Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models
Huy Ha
Shuran Song
LM&Ro
VLM
92
105
0
23 Jul 2022
Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection
H. Rasheed
Muhammad Maaz
Muhammad Uzair Khattak
Salman Khan
Fahad Shahbaz Khan
ObjD
VLM
104
154
0
07 Jul 2022
VoxGRAF: Fast 3D-Aware Image Synthesis with Sparse Voxel Grids
Katja Schwarz
Axel Sauer
Michael Niemeyer
Yiyi Liao
Andreas Geiger
93
151
0
15 Jun 2022
Emergent Abilities of Large Language Models
Jason W. Wei
Yi Tay
Rishi Bommasani
Colin Raffel
Barret Zoph
...
Tatsunori Hashimoto
Oriol Vinyals
Percy Liang
J. Dean
W. Fedus
ELM
ReLM
LRM
286
2,507
0
15 Jun 2022
Simple Open-Vocabulary Object Detection with Vision Transformers
Matthias Minderer
A. Gritsenko
Austin Stone
Maxim Neumann
Dirk Weissenborn
...
Zhuoran Shen
Xiao Wang
Xiaohua Zhai
Thomas Kipf
N. Houlsby
ObjD
CLIP
VLM
ViT
OCL
94
314
0
12 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
416
3,585
0
29 Apr 2022
GARF: Gaussian Activated Radiance Fields for High Fidelity Reconstruction and Pose Estimation
Shin-Fang Chng
Sameera Ramasinghe
Jamie Sherrah
Simon Lucey
AI4CE
70
88
0
12 Apr 2022
Unified Contrastive Learning in Image-Text-Label Space
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Bin Xiao
Ce Liu
Lu Yuan
Jianfeng Gao
VLM
SSL
132
226
0
07 Apr 2022
AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation
Paritosh Mittal
Y. Cheng
Maneesh Singh
Shubham Tulsiani
66
229
0
17 Mar 2022
GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models
Archiki Prasad
Peter Hase
Xiang Zhou
Mohit Bansal
95
123
0
14 Mar 2022
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
Hao Zhang
Feng Li
Shilong Liu
Lei Zhang
Hang Su
Jun Zhu
L. Ni
H. Shum
ViT
167
1,451
0
07 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
880
13,148
0
04 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Steven Hoi
MLLM
BDL
VLM
CLIP
542
4,398
0
28 Jan 2022
Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
Thomas Müller
Alex Evans
Christoph Schied
A. Keller
330
4,037
0
16 Jan 2022
SLIP: Self-supervision meets Language-Image Pre-training
Norman Mu
Alexander Kirillov
David Wagner
Saining Xie
VLM
CLIP
143
490
0
23 Dec 2021
Scaling Open-Vocabulary Image Segmentation with Image-Level Labels
Golnaz Ghiasi
Xiuye Gu
Yin Cui
Tsung-Yi Lin
VLM
124
382
0
22 Dec 2021
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
460
15,665
0
20 Dec 2021
RegionCLIP: Region-based Language-Image Pretraining
Yiwu Zhong
Jianwei Yang
Pengchuan Zhang
Chunyuan Li
Noel Codella
...
Luowei Zhou
Xiyang Dai
Lu Yuan
Yin Li
Jianfeng Gao
VLM
CLIP
148
577
0
16 Dec 2021
Decoupling Zero-Shot Semantic Segmentation
Jian Ding
Nan Xue
Guisong Xia
Dengxin Dai
VLM
104
195
0
15 Dec 2021
Plenoxels: Radiance Fields without Neural Networks
Alex Yu
Sara Fridovich-Keil
Matthew Tancik
Qinhong Chen
Benjamin Recht
Angjoo Kanazawa
275
1,662
0
09 Dec 2021
FLAVA: A Foundational Language And Vision Alignment Model
Amanpreet Singh
Ronghang Hu
Vedanuj Goswami
Guillaume Couairon
Wojciech Galuba
Marcus Rohrbach
Douwe Kiela
CLIP
VLM
99
710
0
08 Dec 2021
Text2Mesh: Text-Driven Neural Stylization for Meshes
O. Michel
Roi Bar-On
Richard Liu
Sagie Benaim
Rana Hanocka
CLIP
AI4CE
265
360
0
06 Dec 2021
D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding
Dave Zhenyu Chen
Qirui Wu
Matthias Nießner
Angel X. Chang
67
31
0
02 Dec 2021
Zero-Shot Text-Guided Object Generation with Dream Fields
Ajay Jain
B. Mildenhall
Jonathan T. Barron
Pieter Abbeel
Ben Poole
76
567
0
02 Dec 2021
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
Xumin Yu
Lulu Tang
Yongming Rao
Tiejun Huang
Jie Zhou
Jiwen Lu
3DPC
136
682
0
29 Nov 2021
Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields
Jonathan T. Barron
B. Mildenhall
Dor Verbin
Pratul P. Srinivasan
Peter Hedman
196
1,691
0
23 Nov 2021
Neural Fields in Visual Computing and Beyond
Yiheng Xie
Towaki Takikawa
Shunsuke Saito
Or Litany
Shiqin Yan
Numair Khan
Federico Tombari
James Tompkin
Vincent Sitzmann
Srinath Sridhar
3DH
169
630
0
22 Nov 2021
Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction
Cheng Sun
Min Sun
Hwann-Tzong Chen
125
1,079
0
22 Nov 2021
ARKitScenes: A Diverse Real-World Dataset For 3D Indoor Scene Understanding Using Mobile RGB-D Data
Gilad Baruch
Zhuoyuan Chen
Afshin Dehghan
Tal Dimry
Yuri Feigin
...
Thomas Gebauer
Brandon Joffe
Daniel Kurz
Arik Schwartz
Elad Shulman
3DV
3DPC
89
202
0
17 Nov 2021
Deep Marching Tetrahedra: a Hybrid Representation for High-Resolution 3D Shape Synthesis
Tianchang Shen
Jun Gao
K. Yin
Ming-Yu Liu
Sanja Fidler
3DV
91
464
0
08 Nov 2021
A Systematic Investigation of Commonsense Knowledge in Large Language Models
Xiang Lorraine Li
A. Kuncoro
Jordan Hoffmann
Cyprien de Masson d'Autume
Phil Blunsom
Aida Nematzadeh
LRM
75
59
0
31 Oct 2021
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
346
1,706
0
15 Oct 2021
SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer
Tu Vu
Brian Lester
Noah Constant
Rami Al-Rfou
Daniel Cer
VLM
LRM
198
286
0
15 Oct 2021
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
Yangguang Li
Feng Liang
Lichen Zhao
Yufeng Cui
Wanli Ouyang
Jing Shao
F. Yu
Junjie Yan
VLM
CLIP
150
457
0
11 Oct 2021
CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation
Aditya Sanghi
Hang Chu
Joseph G. Lambourne
Ye Wang
Chin-Yi Cheng
Marco Fumero
Kamal Rahimi Malekshan
CLIP
108
294
0
06 Oct 2021
CLIPort: What and Where Pathways for Robotic Manipulation
Mohit Shridhar
Lucas Manuelli
Dieter Fox
LM&Ro
116
657
0
24 Sep 2021
Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI
Santhosh Kumar Ramakrishnan
Aaron Gokaslan
Erik Wijmans
Oleksandr Maksymets
Alexander Clegg
...
Andrew Westbury
Angel X. Chang
Manolis Savva
Yili Zhao
Dhruv Batra
85
390
0
16 Sep 2021
PPT: Pre-trained Prompt Tuning for Few-shot Learning
Yuxian Gu
Xu Han
Zhiyuan Liu
Minlie Huang
VLM
96
416
0
09 Sep 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLM
MLLM
133
800
0
24 Aug 2021
DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization
Cheng Zhang
Zhaopeng Cui
Cai Chen
Shuaicheng Liu
B. Zeng
Hujun Bao
Yinda Zhang
3DPC
3DV
77
37
0
24 Aug 2021
TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D Visual Grounding
Dailan He
Yusheng Zhao
Junyu Luo
Tianrui Hui
Shaofei Huang
Aixi Zhang
Si Liu
ViT
51
95
0
05 Aug 2021
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
Pengfei Liu
Weizhe Yuan
Jinlan Fu
Zhengbao Jiang
Hiroaki Hayashi
Graham Neubig
VLM
SyDa
218
3,977
0
28 Jul 2021
Per-Pixel Classification is Not All You Need for Semantic Segmentation
Bowen Cheng
Alex Schwing
Alexander Kirillov
VLM
ViT
210
1,548
0
13 Jul 2021
LanguageRefer: Spatial-Language Model for 3D Visual Grounding
Junha Roh
Karthik Desingh
Ali Farhadi
Dieter Fox
67
95
0
07 Jul 2021
AudioCLIP: Extending CLIP to Image, Text and Audio
A. Guzhov
Federico Raue
Jörn Hees
Andreas Dengel
CLIP
VLM
122
368
0
24 Jun 2021
NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction
Peng Wang
Lingjie Liu
Yuan Liu
Christian Theobalt
Taku Komura
Wenping Wang
99
1,728
0
20 Jun 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
694
6,121
0
29 Apr 2021