Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision

12 August 2021
Xiaoshi Wu
Hadar Averbuch-Elor
J. Sun
Noah Snavely

Papers citing "Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision"

14 / 14 papers shown
Data-Efficient Inference of Neural Fluid Fields via SciML Foundation Model
Yuqiu Liu
Jingxuan Xu
Mauricio Soroco
Yunchao Wei
Wuyang Chen
AI4CE
84
2
0
18 Dec 2024
HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections
Chen Dudai
Morris Alper
Hana Bezalel
Rana Hanocka
Itai Lang
Hadar Averbuch-Elor
28
2
0
14 Feb 2024
Doppelgangers: Learning to Disambiguate Images of Similar Structures
Ruojin Cai
Joseph Tung
Qianqian Wang
Hadar Averbuch-Elor
Bharath Hariharan
Noah Snavely
3DH
30
20
0
05 Sep 2023
FeatureNeRF: Learning Generalizable NeRFs by Distilling Foundation Models
Jianglong Ye
Naiyan Wang
Xueliang Wang
DiffM
48
41
0
22 Mar 2023
Learning Visual Representations via Language-Guided Sampling
Mohamed El Banani
Karan Desai
Justin Johnson
SSL
VLM
21
28
0
23 Feb 2023
Decomposing NeRF for Editing via Feature Field Distillation
Sosuke Kobayashi
Eiichi Matsumoto
Vincent Sitzmann
184
328
0
31 May 2022
Face2Text revisited: Improved data set and baseline results
Marc Tanti
Shaun Abdilla
A. Muscat
Claudia Borg
R. Farrugia
Albert Gatt
CVBM
10
3
0
24 May 2022
Weakly-Supervised End-to-End CAD Retrieval to Scan Objects
T. Beyer
Angela Dai
3DPC
31
4
0
24 Mar 2022
TriCoLo: Trimodal Contrastive Loss for Text to Shape Retrieval
Yue Ruan
Han-Hung Lee
Yiming Zhang
Ke Zhang
Angel X. Chang
32
22
0
19 Jan 2022
PartGlot: Learning Shape Part Segmentation from Language Reference Games
Juil Koo
Ian Huang
Panos Achlioptas
Leonidas J. Guibas
Minhyuk Sung
3DPC
40
28
0
13 Dec 2021
D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding
Dave Zhenyu Chen
Qirui Wu
Matthias Nießner
Angel X. Chang
21
29
0
02 Dec 2021
Stanza: A Python Natural Language Processing Toolkit for Many Human Languages
Peng Qi
Yuhao Zhang
Yuhui Zhang
Jason Bolton
Christopher D. Manning
AI4TS
213
1,656
0
16 Mar 2020
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning
Jiasen Lu
Caiming Xiong
Devi Parikh
R. Socher
85
1,442
0
06 Dec 2016
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
167
1,464
0
06 Jun 2016