ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2309.08535
  4. Cited By
Visual Speech Recognition for Languages with Limited Labeled Data using
  Automatic Labels from Whisper

Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper

15 September 2023
Jeong Hun Yeo
Minsu Kim
Shinji Watanabe
Y. Ro
    VLM
ArXivPDFHTML

Papers citing "Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper"

13 / 13 papers shown
Title
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens
Jeong Hun Yeo
Hyeongseop Rha
Se Jin Park
Y. Ro
56
0
0
14 Mar 2025
Large Multimodal Models for Low-Resource Languages: A Survey
Large Multimodal Models for Low-Resource Languages: A Survey
Marian Lupascu
Ana-Cristina Rogoz
Mihai-Sorin Stupariu
Radu Tudor Ionescu
69
1
0
08 Feb 2025
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
Andrew Rouditchenko
Saurabhchand Bhati
Samuel Thomas
Hilde Kuehne
Rogerio Feris
116
1
0
03 Feb 2025
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
Jeong Hun Yeo
Chae Won Kim
Hyunjun Kim
Hyeongseop Rha
Seunghee Han
Wen-Huang Cheng
Y. Ro
59
3
0
03 Jan 2025
Large Language Models are Strong Audio-Visual Speech Recognition Learners
Large Language Models are Strong Audio-Visual Speech Recognition Learners
Umberto Cappellazzo
Minsu Kim
Honglie Chen
Pingchuan Ma
Stavros Petridis
Daniele Falavigna
Alessio Brutti
Maja Pantic
36
9
0
18 Sep 2024
DCIM-AVSR : Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module
DCIM-AVSR : Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module
Xinyu Wang
Qian Wang
Haolin Huang
Yu Fang
Mengjie Xu
Qian Wang
31
0
0
31 Aug 2024
Tailored Design of Audio-Visual Speech Recognition Models using Branchformers
Tailored Design of Audio-Visual Speech Recognition Models using Branchformers
David Gimeno-Gómez
Carlos David Martínez Hinarejos
96
2
0
09 Jul 2024
ViSpeR: Multilingual Audio-Visual Speech Recognition
ViSpeR: Multilingual Audio-Visual Speech Recognition
Sanath Narayan
Y. A. D. Djilali
Ankit Singh
Eustache Le Bihan
Hakim Hacid
VLM
33
0
0
27 May 2024
Conformers are All You Need for Visual Speech Recognition
Conformers are All You Need for Visual Speech Recognition
Oscar Chang
H. Liao
Dmitriy Serdyuk
Ankit Parag Shah
Olivier Siohan
VLM
50
14
0
17 Feb 2023
Visual Speech Recognition for Multiple Languages in the Wild
Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma
Stavros Petridis
M. Pantic
VLM
128
144
0
26 Feb 2022
End-to-end Audio-visual Speech Recognition with Conformers
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma
Stavros Petridis
M. Pantic
84
225
0
12 Feb 2021
VoxCeleb2: Deep Speaker Recognition
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
251
2,233
0
14 Jun 2018
Lip Reading Sentences in the Wild
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
167
784
0
16 Nov 2016
1