arXiv: 2108.05863
Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision
12 August 2021
Xiaoshi Wu
Hadar Averbuch-Elor
J. Sun
Noah Snavely
Papers citing "Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision"
14 / 14 papers shown
Title
Data-Efficient Inference of Neural Fluid Fields via SciML Foundation Model
Yuqiu Liu
Jingxuan Xu
Mauricio Soroco
Yunchao Wei
Wuyang Chen
AI4CE
84
2
0
18 Dec 2024
HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections
Chen Dudai
Morris Alper
Hana Bezalel
Rana Hanocka
Itai Lang
Hadar Averbuch-Elor
28
2
0
14 Feb 2024
Doppelgangers: Learning to Disambiguate Images of Similar Structures
Ruojin Cai
Joseph Tung
Qianqian Wang
Hadar Averbuch-Elor
Bharath Hariharan
Noah Snavely
3DH
30
20
0
05 Sep 2023
FeatureNeRF: Learning Generalizable NeRFs by Distilling Foundation Models
Jianglong Ye
Naiyan Wang
Xueliang Wang
DiffM
48
41
0
22 Mar 2023
Learning Visual Representations via Language-Guided Sampling
Mohamed El Banani
Karan Desai
Justin Johnson
SSL
VLM
21
28
0
23 Feb 2023
Decomposing NeRF for Editing via Feature Field Distillation
Sosuke Kobayashi
Eiichi Matsumoto
Vincent Sitzmann
184
328
0
31 May 2022
Face2Text revisited: Improved data set and baseline results
Marc Tanti
Shaun Abdilla
A. Muscat
Claudia Borg
R. Farrugia
Albert Gatt
CVBM
10
3
0
24 May 2022
Weakly-Supervised End-to-End CAD Retrieval to Scan Objects
T. Beyer
Angela Dai
3DPC
31
4
0
24 Mar 2022
TriCoLo: Trimodal Contrastive Loss for Text to Shape Retrieval
Yue Ruan
Han-Hung Lee
Yiming Zhang
Ke Zhang
Angel X. Chang
32
22
0
19 Jan 2022
PartGlot: Learning Shape Part Segmentation from Language Reference Games
Juil Koo
Ian Huang
Panos Achlioptas
Leonidas J. Guibas
Minhyuk Sung
3DPC
40
28
0
13 Dec 2021
D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding
Dave Zhenyu Chen
Qirui Wu
Matthias Nießner
Angel X. Chang
21
29
0
02 Dec 2021
Stanza: A Python Natural Language Processing Toolkit for Many Human Languages
Peng Qi
Yuhao Zhang
Yuhui Zhang
Jason Bolton
Christopher D. Manning
AI4TS
213
1,656
0
16 Mar 2020
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning
Jiasen Lu
Caiming Xiong
Devi Parikh
R. Socher
85
1,442
0
06 Dec 2016
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
167
1,464
0
06 Jun 2016