2109.09536
Cited By
Audio-Visual Speech Recognition is Worth 32×32×8 Voxels
20 September 2021
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
Papers citing
"Audio-Visual Speech Recognition is Worth 32×32×8 Voxels"
6 / 6 papers shown
Title
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality
Wei-Ning Hsu
Bowen Shi
SSL
VLM
29
41
0
14 Jul 2022
Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma
Stavros Petridis
M. Pantic
VLM
130
145
0
26 Feb 2022
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma
Stavros Petridis
M. Pantic
86
226
0
12 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
283
1,989
0
09 Feb 2021
Video Transformer Network
Daniel Neimark
Omri Bar
Maya Zohar
Dotan Asselmann
ViT
204
422
0
01 Feb 2021
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
185
784
0
16 Nov 2016