End-to-End Multi-Person Audio/Visual Automatic Speech Recognition

11 May 2022

Papers citing "End-to-End Multi-Person Audio/Visual Automatic Speech Recognition"

9 / 9 papers shown

Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition. Guinan Li, Jiajun Deng, Mengzhe Geng, Zengrui Jin, Tianzi Wang, Shujie Hu, Mingyu Cui, Helen M. Meng, Xunying Liu. 06 Jul 2023.
Conformers are All You Need for Visual Speech Recognition. Oscar Chang, H. Liao, Dmitriy Serdyuk, Ankit Parag Shah, Olivier Siohan. 17 Feb 2023.
A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection. Otavio Braga, Olivier Siohan. 11 May 2022.
Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection. Otavio Braga, Olivier Siohan. 10 May 2022.
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video. Dmitriy Serdyuk, Otavio Braga, Olivier Siohan. 25 Jan 2022.
Cross-attention conformer for context modeling in speech enhancement for ASR. A. Narayanan, Chung-Cheng Chiu, Tom O'Malley, Quan Wang, Yanzhang He. 30 Oct 2021.
Audio-Visual Speech Recognition is Worth 32×32×8 Voxels. Dmitriy Serdyuk, Otavio Braga, Olivier Siohan. 20 Sep 2021.
Lip Reading Sentences in the Wild. Joon Son Chung, A. Senior, Oriol Vinyals, Andrew Zisserman. 16 Nov 2016.
Effective Approaches to Attention-based Neural Machine Translation. Thang Luong, Hieu H. Pham, Christopher D. Manning. 17 Aug 2015.