2406.07133
Translating speech with just images
11 June 2024
Dan Oneaţă
Herman Kamper
VLM
Papers citing
"Translating speech with just images"
6 / 6 papers shown
Title
M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval
Layne Berry
Yi-Jen Shih
Hsuan-Fu Wang
Heng-Jui Chang
Hung-yi Lee
David Harwath
VLM
21
9
0
02 Nov 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
230
3,458
0
29 Apr 2022
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
Arun Babu
Changhan Wang
Andros Tjandra
Kushal Lakhotia
Qiantong Xu
...
Yatharth Saraf
J. Pino
Alexei Baevski
Alexis Conneau
Michael Auli
SSL
61
678
0
17 Nov 2021
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko
Angie Boggust
David Harwath
Brian Chen
D. Joshi
...
Rogerio Feris
Brian Kingsbury
M. Picheny
Antonio Torralba
James R. Glass
SSL
37
142
0
16 Jun 2020
A Call for Clarity in Reporting BLEU Scores
Matt Post
75
2,941
0
23 Apr 2018
Deep Multimodal Semantic Embeddings for Speech and Images
David Harwath
James R. Glass
27
156
0
11 Nov 2015