Visual Features for Context-Aware Speech Recognition

1 December 2017

Papers citing "Visual Features for Context-Aware Speech Recognition"

Title
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR | Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid | 29 15 0 | 29 Mar 2023
Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations | Dan Oneaţă, H. Cucu | 19 19 0 | 27 Apr 2022
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video | Dmitriy Serdyuk, Otavio Braga, Olivier Siohan | ViT 96 40 0 | 25 Jan 2022
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading | Leyuan Qu, C. Weber, S. Wermter | 38 23 0 | 09 Dec 2021
Fine-Grained Grounding for Multimodal Speech Recognition | Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott | 25 11 0 | 05 Oct 2020
Recurrent Neural Network Transducer for Audio-Visual Speech Recognition | Takaki Makino, H. Liao, Yannis Assael, Brendan Shillingford, Basi García, Otavio Braga, Olivier Siohan | 32 129 0 | 08 Nov 2019
Multimodal Language Analysis with Recurrent Multistage Fusion | Paul Pu Liang, Liu Ziyin, Amir Zadeh, Louis-Philippe Morency | 30 198 0 | 12 Aug 2018
End-to-End Multimodal Speech Recognition | Shruti Palaskar, Ramon Sanabria, Florian Metze | 33 41 0 | 25 Apr 2018
Unspeech: Unsupervised Speech Context Embeddings | Benjamin Milde, Chris Biemann | SSL 27 28 0 | 18 Apr 2018
Semantic speech retrieval with a visually grounded model of untranscribed speech | Herman Kamper, Gregory Shakhnarovich, Karen Livescu | 29 53 0 | 05 Oct 2017
Visually grounded learning of keyword prediction from untranscribed speech | Herman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu | 19 63 0 | 23 Mar 2017