2003.05078
Cited By
Visual Grounding in Video for Unsupervised Word Translation
11 March 2020
Gunnar A. Sigurdsson
Jean-Baptiste Alayrac
Aida Nematzadeh
Lucas Smaira
Mateusz Malinowski
João Carreira
Phil Blunsom
Andrew Zisserman
VGen
Papers citing
"Visual Grounding in Video for Unsupervised Word Translation"
18 / 18 papers shown
Title
Divert More Attention to Vision-Language Object Tracking
Mingzhe Guo
Zhipeng Zhang
Li Jing
Haibin Ling
Heng Fan
VLM
42
3
0
19 Jul 2023
Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training
Wenliang Dai
Zihan Liu
Ziwei Ji
Dan Su
Pascale Fung
MLLM
VLM
32
62
0
14 Oct 2022
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval
Andrew Rouditchenko
Yung-Sung Chuang
Nina Shvetsova
Samuel Thomas
Rogerio Feris
Brian Kingsbury
Leonid Karlinsky
David Harwath
Hilde Kuehne
James R. Glass
VLM
34
4
0
07 Oct 2022
MuMUR: Multilingual Multimodal Universal Retrieval
Avinash Madasu
Estelle Aflalo
Gabriela Ben-Melech Stan
Shachar Rosenman
Shao-Yen Tseng
Gedas Bertasius
Vasudev Lal
44
3
0
24 Aug 2022
CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations
Jialu Li
Hao Tan
Joey Tianyi Zhou
LM&Ro
64
12
0
05 Jul 2022
VALHALLA: Visual Hallucination for Machine Translation
Yi Li
Yikang Shen
Yoon Kim
Chun-Fu Chen
Rogerio Feris
David D. Cox
Nuno Vasconcelos
MLLM
40
38
0
31 May 2022
Utilizing Language-Image Pretraining for Efficient and Robust Bilingual Word Alignment
Tuan Dinh
Jy-yong Sohn
Shashank Rajput
Timothy Ossowski
Yifei Ming
Junjie Hu
Dimitris Papailiopoulos
Kangwook Lee
28
0
0
23 May 2022
Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation
Wenliang Dai
Lu Hou
Lifeng Shang
Xin Jiang
Qun Liu
Pascale Fung
VLM
22
90
0
12 Mar 2022
SVIP: Sequence VerIfication for Procedures in Videos
Yichen Qian
Weixin Luo
Dongze Lian
Xu Tang
P. Zhao
Shenghua Gao
ViT
29
17
0
13 Dec 2021
Cascaded Multilingual Audio-Visual Learning from Videos
Andrew Rouditchenko
Angie Boggust
David Harwath
Samuel Thomas
Hilde Kuehne
...
Yikang Shen
Rogerio Feris
Brian Kingsbury
M. Picheny
James R. Glass
107
8
0
08 Nov 2021
Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training
Yuqing Song
Shizhe Chen
Qin Jin
Wei Luo
Jun Xie
Fei Huang
24
18
0
25 Aug 2021
Crossing the Conversational Chasm: A Primer on Natural Language Processing for Multilingual Task-Oriented Dialogue Systems
E. Razumovskaia
Goran Glavaš
Olga Majewska
Edoardo Ponti
Anna Korhonen
Ivan Vulić
26
32
0
17 Apr 2021
UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
Mingyang Zhou
Luowei Zhou
Shuohang Wang
Yu Cheng
Linjie Li
Zhou Yu
Jingjing Liu
MLLM
VLM
31
89
0
01 Apr 2021
Broaden Your Views for Self-Supervised Video Learning
Adrià Recasens
Pauline Luc
Jean-Baptiste Alayrac
Luyu Wang
Ross Hemsley
...
Florent Altché
M. Valko
Jean-Bastien Grill
Aaron van den Oord
Andrew Zisserman
SSL
AI4TS
33
127
0
30 Mar 2021
Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models
Po-Yao (Bernie) Huang
Mandela Patrick
Junjie Hu
Graham Neubig
Florian Metze
Alexander G. Hauptmann
MLLM
VLM
24
56
0
16 Mar 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Lisa Anne Hendricks
John F. J. Mellor
R. Schneider
Jean-Baptiste Alayrac
Aida Nematzadeh
79
110
0
31 Jan 2021
Visual Pivoting for (Unsupervised) Entity Alignment
Fangyu Liu
Muhao Chen
Dan Roth
Nigel Collier
OCL
21
117
0
28 Sep 2020
Word Translation Without Parallel Data
Alexis Conneau
Guillaume Lample
Marc'Aurelio Ranzato
Ludovic Denoyer
Hervé Jégou
189
1,639
0
11 Oct 2017