arXiv:1712.00489
Visual Features for Context-Aware Speech Recognition
1 December 2017
Abhinav Gupta
Yajie Miao
Leonardo Neves
Florian Metze
Papers citing "Visual Features for Context-Aware Speech Recognition"
11 / 11 papers shown
Title
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
29
15
0
29 Mar 2023
Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations
Dan Oneaţă
H. Cucu
19
19
0
27 Apr 2022
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
96
40
0
25 Jan 2022
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading
Leyuan Qu
C. Weber
S. Wermter
38
23
0
09 Dec 2021
Fine-Grained Grounding for Multimodal Speech Recognition
Tejas Srinivasan
Ramon Sanabria
Florian Metze
Desmond Elliott
25
11
0
05 Oct 2020
Recurrent Neural Network Transducer for Audio-Visual Speech Recognition
Takaki Makino
H. Liao
Yannis Assael
Brendan Shillingford
Basi García
Otavio Braga
Olivier Siohan
32
129
0
08 Nov 2019
Multimodal Language Analysis with Recurrent Multistage Fusion
Paul Pu Liang
Liu Ziyin
Amir Zadeh
Louis-Philippe Morency
30
198
0
12 Aug 2018
End-to-End Multimodal Speech Recognition
Shruti Palaskar
Ramon Sanabria
Florian Metze
33
41
0
25 Apr 2018
Unspeech: Unsupervised Speech Context Embeddings
Benjamin Milde
Chris Biemann
SSL
27
28
0
18 Apr 2018
Semantic speech retrieval with a visually grounded model of untranscribed speech
Herman Kamper
Gregory Shakhnarovich
Karen Livescu
29
53
0
05 Oct 2017
Visually grounded learning of keyword prediction from untranscribed speech
Herman Kamper
Shane Settle
Gregory Shakhnarovich
Karen Livescu
19
63
0
23 Mar 2017