Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2309.08535
Cited By
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper
15 September 2023
Jeong Hun Yeo
Minsu Kim
Shinji Watanabe
Y. Ro
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper"
13 / 13 papers shown
Title
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens
Jeong Hun Yeo
Hyeongseop Rha
Se Jin Park
Y. Ro
56
0
0
14 Mar 2025
Large Multimodal Models for Low-Resource Languages: A Survey
Marian Lupascu
Ana-Cristina Rogoz
Mihai-Sorin Stupariu
Radu Tudor Ionescu
69
1
0
08 Feb 2025
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
Andrew Rouditchenko
Saurabhchand Bhati
Samuel Thomas
Hilde Kuehne
Rogerio Feris
116
1
0
03 Feb 2025
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
Jeong Hun Yeo
Chae Won Kim
Hyunjun Kim
Hyeongseop Rha
Seunghee Han
Wen-Huang Cheng
Y. Ro
59
3
0
03 Jan 2025
Large Language Models are Strong Audio-Visual Speech Recognition Learners
Umberto Cappellazzo
Minsu Kim
Honglie Chen
Pingchuan Ma
Stavros Petridis
Daniele Falavigna
Alessio Brutti
Maja Pantic
36
9
0
18 Sep 2024
DCIM-AVSR : Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module
Xinyu Wang
Qian Wang
Haolin Huang
Yu Fang
Mengjie Xu
Qian Wang
31
0
0
31 Aug 2024
Tailored Design of Audio-Visual Speech Recognition Models using Branchformers
David Gimeno-Gómez
Carlos David Martínez Hinarejos
94
2
0
09 Jul 2024
ViSpeR: Multilingual Audio-Visual Speech Recognition
Sanath Narayan
Y. A. D. Djilali
Ankit Singh
Eustache Le Bihan
Hakim Hacid
VLM
30
0
0
27 May 2024
Conformers are All You Need for Visual Speech Recognition
Oscar Chang
H. Liao
Dmitriy Serdyuk
Ankit Parag Shah
Olivier Siohan
VLM
50
14
0
17 Feb 2023
Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma
Stavros Petridis
M. Pantic
VLM
125
144
0
26 Feb 2022
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma
Stavros Petridis
M. Pantic
84
225
0
12 Feb 2021
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
245
2,233
0
14 Jun 2018
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
167
784
0
16 Nov 2016
1