ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2109.09536
  4. Cited By
Audio-Visual Speech Recognition is Worth 32$\times$32$\times$8 Voxels

Audio-Visual Speech Recognition is Worth 32×\times×32×\times×8 Voxels

20 September 2021
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
    ViT
ArXivPDFHTML

Papers citing "Audio-Visual Speech Recognition is Worth 32$\times$32$\times$8 Voxels"

6 / 6 papers shown
Title
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer
  to Unlabeled Modality
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality
Wei-Ning Hsu
Bowen Shi
SSL
VLM
29
42
0
14 Jul 2022
Visual Speech Recognition for Multiple Languages in the Wild
Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma
Stavros Petridis
M. Pantic
VLM
130
145
0
26 Feb 2022
End-to-end Audio-visual Speech Recognition with Conformers
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma
Stavros Petridis
M. Pantic
86
226
0
12 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
283
1,989
0
09 Feb 2021
Video Transformer Network
Video Transformer Network
Daniel Neimark
Omri Bar
Maya Zohar
Dotan Asselmann
ViT
204
422
0
01 Feb 2021
Lip Reading Sentences in the Wild
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
185
784
0
16 Nov 2016
1