Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision

12 August 2021

Hadar Averbuch-Elor

Papers citing "Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision"

Title
Data-Efficient Inference of Neural Fluid Fields via SciML Foundation Model | Yuqiu Liu, Jingxuan Xu, Mauricio Soroco, Yunchao Wei, Wuyang Chen | AI4CE | 84 2 0 | 18 Dec 2024
HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections | Chen Dudai, Morris Alper, Hana Bezalel, Rana Hanocka, Itai Lang, Hadar Averbuch-Elor | - | 28 2 0 | 14 Feb 2024
Doppelgangers: Learning to Disambiguate Images of Similar Structures | Ruojin Cai, Joseph Tung, Qianqian Wang, Hadar Averbuch-Elor, Bharath Hariharan, Noah Snavely | 3DH | 30 20 0 | 05 Sep 2023
FeatureNeRF: Learning Generalizable NeRFs by Distilling Foundation Models | Jianglong Ye, Naiyan Wang, Xueliang Wang | DiffM | 48 41 0 | 22 Mar 2023
Learning Visual Representations via Language-Guided Sampling | Mohamed El Banani, Karan Desai, Justin Johnson | SSL, VLM | 21 28 0 | 23 Feb 2023
Decomposing NeRF for Editing via Feature Field Distillation | Sosuke Kobayashi, Eiichi Matsumoto, Vincent Sitzmann | - | 184 328 0 | 31 May 2022
Face2Text revisited: Improved data set and baseline results | Marc Tanti, Shaun Abdilla, A. Muscat, Claudia Borg, R. Farrugia, Albert Gatt | CVBM | 10 3 0 | 24 May 2022
Weakly-Supervised End-to-End CAD Retrieval to Scan Objects | T. Beyer, Angela Dai | 3DPC | 31 4 0 | 24 Mar 2022
TriCoLo: Trimodal Contrastive Loss for Text to Shape Retrieval | Yue Ruan, Han-Hung Lee, Yiming Zhang, Ke Zhang, Angel X. Chang | - | 32 22 0 | 19 Jan 2022
PartGlot: Learning Shape Part Segmentation from Language Reference Games | Juil Koo, Ian Huang, Panos Achlioptas, Leonidas J. Guibas, Minhyuk Sung | 3DPC | 40 28 0 | 13 Dec 2021
D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding | Dave Zhenyu Chen, Qirui Wu, Matthias Nießner, Angel X. Chang | - | 21 29 0 | 02 Dec 2021
Stanza: A Python Natural Language Processing Toolkit for Many Human Languages | Peng Qi, Yuhao Zhang, Yuhui Zhang, Jason Bolton, Christopher D. Manning | AI4TS | 213 1,656 0 | 16 Mar 2020
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning | Jiasen Lu, Caiming Xiong, Devi Parikh, R. Socher | - | 85 1,442 0 | 06 Dec 2016
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding | Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, Marcus Rohrbach | - | 167 1,464 0 | 06 Jun 2016