2403.03405
Causality-based Cross-Modal Representation Learning for Vision-and-Language Navigation
6 March 2024
Liuyi Wang
Zongtao He
Ronghao Dang
Huiyi Chen
Chengju Liu
Qi Chen
Papers citing
"Causality-based Cross-Modal Representation Learning for Vision-and-Language Navigation"
4 / 4 papers shown
Title
VISTA: Generative Visual Imagination for Vision-and-Language Navigation
Yanjia Huang
M. Wu
Renjie Li
Zhengzhong Tu
LM&Ro
36
0
0
09 May 2025
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
390
4,125
0
28 Jan 2022
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Mohit Bansal
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
196
405
0
13 Jul 2021
Speaker-Follower Models for Vision-and-Language Navigation
Daniel Fried
Ronghang Hu
Volkan Cirik
Anna Rohrbach
Jacob Andreas
Louis-Philippe Morency
Taylor Berg-Kirkpatrick
Kate Saenko
Dan Klein
Trevor Darrell
LM&Ro
LRM
248
496
0
07 Jun 2018