
A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection

11 May 2022
Otavio Braga
Olivier Siohan

Papers citing "A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection"

Conformers are All You Need for Visual Speech Recognition
Oscar Chang
H. Liao
Dmitriy Serdyuk
Ankit Parag Shah
Olivier Siohan
VLM
17 Feb 2023
Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection
Rahul Sharma
Shrikanth Narayanan
01 Dec 2022
Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection
Otavio Braga
Olivier Siohan
CVBM
10 May 2022
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
25 Jan 2022
Audio-Visual Speech Recognition is Worth 32×32×8 Voxels
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
20 Sep 2021
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
16 Nov 2016