Visual Speech Recognition for Multiple Languages in the Wild


26 February 2022
Pingchuan Ma
Stavros Petridis
M. Pantic
    VLM

Papers citing "Visual Speech Recognition for Multiple Languages in the Wild"

50 / 75 papers shown
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens
Jeong Hun Yeo
Hyeongseop Rha
Se Jin Park
Y. Ro
53
0
0
14 Mar 2025
DiVISe: Direct Visual-Input Speech Synthesis Preserving Speaker Characteristics And Intelligibility
Yifan Liu
Yu Fang
Zhouhan Lin
40
0
0
07 Mar 2025
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
Andrew Rouditchenko
Saurabhchand Bhati
Samuel Thomas
Hilde Kuehne
Rogerio Feris
113
1
0
03 Feb 2025
DiMoDif: Discourse Modality-information Differentiation for Audio-visual Deepfake Detection and Localization
C. Koutlis
Symeon Papadopoulos
58
2
0
15 Nov 2024
Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs
A. Haliassos
Rodrigo Mira
Honglie Chen
Zoe Landgraf
Stavros Petridis
M. Pantic
SSL
37
5
0
04 Nov 2024
AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition
Ziqiang Liu
Xiaolou Li
Chen Chen
Li Guo
Lantian Li
D. Wang
25
0
0
21 Oct 2024
Full-Rank No More: Low-Rank Weight Training for Modern Speech Recognition Models
Adriana Fernandez-Lopez
Shiwei Liu
L. Yin
Stavros Petridis
Maja Pantic
29
0
0
10 Oct 2024
Diffusion-based Unsupervised Audio-visual Speech Enhancement
Jean-Eudes Ayilo
Mostafa Sadeghi
Romain Serizel
Xavier Alameda-Pineda
DiffM
20
0
0
04 Oct 2024
Large Language Models are Strong Audio-Visual Speech Recognition Learners
Umberto Cappellazzo
Minsu Kim
Honglie Chen
Pingchuan Ma
Stavros Petridis
Daniele Falavigna
Alessio Brutti
Maja Pantic
36
9
0
18 Sep 2024
RAL:Redundancy-Aware Lipreading Model Based on Differential Learning with Symmetric Views
Zejun Gu
Junxia Jiang
31
0
0
09 Sep 2024
Tailored Design of Audio-Visual Speech Recognition Models using Branchformers
David Gimeno-Gómez
Carlos David Martínez Hinarejos
91
2
0
09 Jul 2024
Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert
Han EunGi
Oh Hyun-Bin
Kim Sung-Bin
Corentin Nivelet Etcheberry
Suekyeong Nam
Janghoon Joo
Tae-Hyun Oh
23
5
0
01 Jul 2024
Dynamic Data Pruning for Automatic Speech Recognition
Q. Xiao
Pingchuan Ma
Adriana Fernandez-Lopez
Boqian Wu
Lu Yin
Stavros Petridis
Mykola Pechenizkiy
Maja Pantic
D. Mocanu
Shiwei Liu
33
1
0
26 Jun 2024
MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization
Adriana Fernandez-Lopez
Honglie Chen
Pingchuan Ma
Lu Yin
Q. Xiao
Stavros Petridis
Shiwei Liu
Maja Pantic
46
2
0
25 Jun 2024
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
Young Jin Ahn
Jungwoo Park
Sangha Park
Jonghyun Choi
Kee-Eung Kim
34
7
0
18 Jun 2024
CNVSRC 2023: The First Chinese Continuous Visual Speech Recognition Challenge
Chen Chen
Zehua Liu
Xiaolou Li
Lantian Li
D. Wang
35
2
0
14 Jun 2024
Missingness-resilient Video-enhanced Multimodal Disfluency Detection
Payal Mohapatra
Shamika Likhite
Subrata Biswas
Bashima Islam
Qi Zhu
46
2
0
11 Jun 2024
LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
Sreyan Ghosh
Sonal Kumar
Ashish Seth
Purva Chiniya
Utkarsh Tyagi
R. Duraiswami
Dinesh Manocha
41
0
0
06 Jun 2024
OpFlowTalker: Realistic and Natural Talking Face Generation via Optical Flow Guidance
Shuheng Ge
Haoyu Xing
Li Zhang
Xiangqian Wu
39
0
0
23 May 2024
Learn2Talk: 3D Talking Face Learns from 2D Talking Face
Yixiang Zhuang
Baoping Cheng
Yao Cheng
Yuntao Jin
Renshuai Liu
Chengyang Li
Xuan Cheng
Jing Liao
Juncong Lin
CVBM
3DH
34
6
0
19 Apr 2024
BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition
A. Haliassos
Andreas Zinonos
Rodrigo Mira
Stavros Petridis
Maja Pantic
VLM
SSL
AI4TS
39
12
0
02 Apr 2024
Landmark-Guided Cross-Speaker Lip Reading with Mutual Information Regularization
Linzhi Wu
Xingyu Zhang
Yakun Zhang
Changyan Zheng
Tiejun Liu
Liang Xie
Ye Yan
Erwei Yin
27
1
0
24 Mar 2024
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
Maxime Burchi
Krishna C. Puvvada
Jagadeesh Balam
Boris Ginsburg
Radu Timofte
42
8
0
14 Mar 2024
Neural Additive Image Model: Interpretation through Interpolation
Arik Reuter
Anton Thielmann
Benjamin Saefken
DiffM
34
1
0
06 Mar 2024
JEP-KD: Joint-Embedding Predictive Architecture Based Knowledge Distillation for Visual Speech Recognition
Chang Sun
Hong Yang
Bo Qin
VLM
27
1
0
04 Mar 2024
Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing
Jeong Hun Yeo
Seunghee Han
Minsu Kim
Y. Ro
53
22
0
23 Feb 2024
AnnoTheia: A Semi-Automatic Annotation Toolkit for Audio-Visual Speech Technologies
José-M. Acosta-Triana
David Gimeno-Gómez
Carlos David Martínez Hinarejos
VLM
VGen
41
2
0
20 Feb 2024
Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition
David Gimeno-Gómez
Carlos David Martínez Hinarejos
32
1
0
20 Feb 2024
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation
Minsu Kim
Jeong Hun Yeo
Se Jin Park
J. Choi
Y. Ro
27
5
0
18 Jan 2024
Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation
Qiu-shi Zhu
Jie Zhang
Yu Gu
Yuli Hu
Lirong Dai
SSL
35
11
0
07 Jan 2024
LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data
Hendrik Laux
Emil Mededovic
Ahmed Hallawa
Lukas Martin
A. Peine
Anke Schmeink
VLM
23
4
0
15 Dec 2023
The GUA-Speech System Description for CNVSRC Challenge 2023
Shengqiang Li
Chao Lei
Baozhong Ma
Binbin Zhang
Fuping Pan
21
0
0
12 Dec 2023
Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism
Georgios Milis
P. Filntisis
A. Roussos
Petros Maragos
CVBM
34
2
0
11 Dec 2023
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
J. Choi
Se Jin Park
Minsu Kim
Y. Ro
27
12
0
05 Dec 2023
Do VSR Models Generalize Beyond LRS3?
Y. A. D. Djilali
Sanath Narayan
Eustache Le Bihan
Haithem Boussaid
Ebtesam Almazrouei
Merouane Debbah
35
4
0
23 Nov 2023
Speaker-Adapted End-to-End Visual Speech Recognition for Continuous Spanish
David Gimeno-Gómez
Carlos David Martínez Hinarejos
28
0
0
21 Nov 2023
LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild
David Gimeno-Gómez
Carlos David Martínez Hinarejos
11
8
0
21 Nov 2023
A Perceptual Shape Loss for Monocular 3D Face Reconstruction
Christopher Otto
Prashanth Chandran
Gaspard Zoss
Markus Gross
Paulo F. U. Gotardo
Derek Bradley
3DH
CVBM
38
6
0
30 Oct 2023
AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition
Andrew Rouditchenko
R. Collobert
Tatiana Likhomanenko
VLM
27
3
0
29 Sep 2023
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper
Jeong Hun Yeo
Minsu Kim
Shinji Watanabe
Y. Ro
VLM
26
12
0
15 Sep 2023
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos
Ji-Hoon Kim
Jaehun Kim
Joon Son Chung
30
5
0
29 Aug 2023
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
Minsu Kim
Jeong Hun Yeo
J. Choi
Y. Ro
34
16
0
18 Aug 2023
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
Jeong Hun Yeo
Minsu Kim
J. Choi
Dae Hoe Kim
Y. Ro
26
18
0
15 Aug 2023
Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping
Y. A. D. Djilali
Sanath Narayan
Haithem Boussaid
Ebtesam Almazrouei
Merouane Debbah
29
10
0
11 Aug 2023
Audio-visual video-to-speech synthesis with synthesized input audio
Triantafyllos Kefalas
Yannis Panagakis
M. Pantic
VGen
DiffM
38
1
0
31 Jul 2023
SparseVSR: Lightweight and Noise Robust Visual Speech Recognition
Adriana Fernandez-Lopez
Honglie Chen
Pingchuan Ma
A. Haliassos
Stavros Petridis
M. Pantic
VLM
33
7
0
10 Jul 2023
Large-scale unsupervised audio pre-training for video-to-speech synthesis
Triantafyllos Kefalas
Yannis Panagakis
M. Pantic
VGen
32
3
0
27 Jun 2023
MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recognition
Yuchen Hu
Chen Chen
Ruizhe Li
Heqing Zou
Chng Eng Siong
GAN
42
9
0
18 Jun 2023
Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition
Yuchen Hu
Ruizhe Li
Cheng Chen
Chengwei Qin
Qiu-shi Zhu
E. Chng
29
5
0
18 Jun 2023
Automated Speaker Independent Visual Speech Recognition: A Comprehensive Survey
Praneeth Nemani
G. S. Krishna
Kundrapu Supriya
BDL
24
3
0
14 Jun 2023