ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2205.05586
  4. Cited By
End-to-End Multi-Person Audio/Visual Automatic Speech Recognition

End-to-End Multi-Person Audio/Visual Automatic Speech Recognition

11 May 2022
Otavio Braga
Takaki Makino
Olivier Siohan
H. Liao
    CVBM
ArXivPDFHTML

Papers citing "End-to-End Multi-Person Audio/Visual Automatic Speech Recognition"

9 / 9 papers shown
Title
Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation
  and Recognition
Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition
Guinan Li
Jiajun Deng
Mengzhe Geng
Zengrui Jin
Tianzi Wang
Shujie Hu
Mingyu Cui
Helen M. Meng
Xunying Liu
37
10
0
06 Jul 2023
Conformers are All You Need for Visual Speech Recognition
Conformers are All You Need for Visual Speech Recognition
Oscar Chang
H. Liao
Dmitriy Serdyuk
Ankit Parag Shah
Olivier Siohan
VLM
50
14
0
17 Feb 2023
A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active
  Speaker Selection
A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection
Otavio Braga
Olivier Siohan
24
7
0
11 May 2022
Best of Both Worlds: Multi-task Audio-Visual Automatic Speech
  Recognition and Active Speaker Detection
Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection
Otavio Braga
Olivier Siohan
CVBM
29
8
0
10 May 2022
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition
  for Single and Multi-Person Video
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
94
40
0
25 Jan 2022
Cross-attention conformer for context modeling in speech enhancement for
  ASR
Cross-attention conformer for context modeling in speech enhancement for ASR
A. Narayanan
Chung-Cheng Chiu
Tom O'Malley
Quan Wang
Yanzhang He
24
14
0
30 Oct 2021
Audio-Visual Speech Recognition is Worth 32$\times$32$\times$8 Voxels
Audio-Visual Speech Recognition is Worth 32×\times×32×\times×8 Voxels
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
31
7
0
20 Sep 2021
Lip Reading Sentences in the Wild
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
185
784
0
16 Nov 2016
Effective Approaches to Attention-based Neural Machine Translation
Effective Approaches to Attention-based Neural Machine Translation
Thang Luong
Hieu H. Pham
Christopher D. Manning
218
7,929
0
17 Aug 2015
1