1807.05162
Cited By
Large-Scale Visual Speech Recognition
13 July 2018
Brendan Shillingford
Yannis Assael
Matthew W. Hoffman
T. Paine
Cían Hughes
Utsav Prabhu
H. Liao
Hasim Sak
Kanishka Rao
Lorrayne Bennett
Marie Mulville
Ben Coppin
Ben Laurie
A. Senior
Nando de Freitas
Papers citing
"Large-Scale Visual Speech Recognition"
50 / 79 papers shown
Title
Development and evaluation of a deep learning algorithm for German word recognition from lip movements
Dinh Nam Pham
Torsten Rahne
55
2
0
22 Apr 2025
VALLR: Visual ASR Language Model for Lip Reading
Marshall Thomas
Edward Fish
Richard Bowden
41
0
0
27 Mar 2025
Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs
A. Haliassos
Rodrigo Mira
Honglie Chen
Zoe Landgraf
Stavros Petridis
M. Pantic
SSL
37
5
0
04 Nov 2024
AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition
Ziqiang Liu
Xiaolou Li
Chen Chen
Li Guo
Lantian Li
D. Wang
35
0
0
21 Oct 2024
Approaching Metaheuristic Deep Learning Combos for Automated Data Mining
Gustavo Assunção
Paulo Menezes
24
0
0
16 Oct 2024
Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy?
Yiwen Guan
V. Trinh
Vivek Voleti
Jacob Whitehill
42
1
0
13 Sep 2024
CNVSRC 2023: The First Chinese Continuous Visual Speech Recognition Challenge
Chen Chen
Zehua Liu
Xiaolou Li
Lantian Li
D. Wang
35
2
0
14 Jun 2024
BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition
A. Haliassos
Andreas Zinonos
Rodrigo Mira
Stavros Petridis
Maja Pantic
VLM
SSL
AI4TS
47
12
0
02 Apr 2024
Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing
Jeong Hun Yeo
Seunghee Han
Minsu Kim
Y. Ro
56
32
0
23 Feb 2024
SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition
Hao Wang
Shuhei Kurita
Shuichiro Shimizu
Daisuke Kawahara
15
3
0
18 Jan 2024
LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data
Hendrik Laux
Emil Mededovic
Ahmed Hallawa
Lukas Martin
A. Peine
Anke Schmeink
VLM
26
4
0
15 Dec 2023
AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition
Andrew Rouditchenko
R. Collobert
Tatiana Likhomanenko
VLM
27
3
0
29 Sep 2023
A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation
Li Liu
Lufei Gao
Wen-Ling Lei
Fengji Ma
Xiaotian Lin
Jin-Tao Wang
CVBM
27
5
0
17 Aug 2023
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
Jeong Hun Yeo
Minsu Kim
J. Choi
Dae Hoe Kim
Y. Ro
26
18
0
15 Aug 2023
Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping
Y. A. D. Djilali
Sanath Narayan
Haithem Boussaid
Ebtesam Almazrouei
Merouane Debbah
37
10
0
11 Aug 2023
Audio-visual video-to-speech synthesis with synthesized input audio
Triantafyllos Kefalas
Yannis Panagakis
M. Pantic
VGen
DiffM
38
1
0
31 Jul 2023
Cascaded encoders for fine-tuning ASR models on overlapped speech
R. Rose
Oscar Chang
Olivier Siohan
26
1
0
28 Jun 2023
Large-scale unsupervised audio pre-training for video-to-speech synthesis
Triantafyllos Kefalas
Yannis Panagakis
M. Pantic
VGen
37
3
0
27 Jun 2023
Towards Unseen Triples: Effective Text-Image-joint Learning for Scene Graph Generation
Qianji Di
Wenxing Ma
Zhongang Qi
Tianxiang Hou
Ying Shan
Hanzi Wang
14
0
0
23 Jun 2023
SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces
Ziqiao Peng
Yihao Luo
Yue Shi
Hao-Xuan Xu
Xiangyu Zhu
Jun He
Hongyan Liu
Zhaoxin Fan
55
40
0
19 Jun 2023
Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech Based on Metric Learning
Sara Kashiwagi
Keitaro Tanaka
Qi Feng
Shigeo Morishima
19
2
0
23 May 2023
Deep Learning-based Spatio Temporal Facial Feature Visual Speech Recognition
Pangoth Santhosh Kumar
Garika Akshay
17
2
0
30 Apr 2023
SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision
Xubo Liu
Egor Lakomkin
Konstantinos Vougioukas
Pingchuan Ma
Honglie Chen
...
Niko Moritz
J. Kolár
Stavros Petridis
M. Pantic
Christian Fuegen
52
19
0
30 Mar 2023
MMFace4D: A Large-Scale Multi-Modal 4D Face Dataset for Audio-Driven 3D Face Animation
Haozhe Wu
Jia Jia
Junliang Xing
Hongwei Xu
Xiangyuan Wang
Jelo Wang
CVBM
32
7
0
17 Mar 2023
Conformers are All You Need for Visual Speech Recognition
Oscar Chang
H. Liao
Dmitriy Serdyuk
Ankit Parag Shah
Olivier Siohan
VLM
50
14
0
17 Feb 2023
AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations
Jiachen Lian
Alexei Baevski
Wei-Ning Hsu
Michael Auli
SSL
40
34
0
10 Feb 2023
A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset
J. Peymanfard
Samin Heydarian
Ali Lashini
Hossein Zeinali
Mohammad Reza Mohammadi
N. Mozayani
29
10
0
21 Jan 2023
Audio-Visual Efficient Conformer for Robust Speech Recognition
Maxime Burchi
Radu Timofte
VLM
11
33
0
04 Jan 2023
Jointly Learning Visual and Auditory Speech Representations from Raw Data
A. Haliassos
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
M. Pantic
SSL
45
48
0
12 Dec 2022
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
Qiu-shi Zhu
Long Zhou
Zi-Hua Zhang
Shujie Liu
Binxing Jiao
Jie Zhang
Lirong Dai
Daxin Jiang
Jinyu Li
Furu Wei
33
37
0
21 Nov 2022
TVLT: Textless Vision-Language Transformer
Zineng Tang
Jaemin Cho
Yixin Nie
Joey Tianyi Zhou
VLM
51
28
0
28 Sep 2022
Deep Learning for Visual Speech Analysis: A Survey
Changchong Sheng
Gangyao Kuang
L. Bai
Chen Hou
Y. Guo
Xin Xu
M. Pietikäinen
Li Liu
VLM
34
33
0
22 May 2022
End-to-End Multi-Person Audio/Visual Automatic Speech Recognition
Otavio Braga
Takaki Makino
Olivier Siohan
H. Liao
CVBM
16
15
0
11 May 2022
A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection
Otavio Braga
Olivier Siohan
21
7
0
11 May 2022
Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection
Otavio Braga
Olivier Siohan
CVBM
29
8
0
10 May 2022
End-to-end multi-talker audio-visual ASR using an active speaker attention module
R. Rose
Olivier Siohan
13
3
0
01 Apr 2022
Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma
Stavros Petridis
M. Pantic
VLM
128
144
0
26 Feb 2022
Adversarial Attacks on Speech Recognition Systems for Mission-Critical Applications: A Survey
Ngoc Dung Huynh
Mohamed Reda Bouadjenek
Imran Razzak
Kevin Lee
Chetan Arora
Ali Hassani
A. Zaslavsky
AAML
29
6
0
22 Feb 2022
MSTGD: A Memory Stochastic sTratified Gradient Descent Method with an Exponential Convergence Rate
Aixiang Chen
Chen
Jinting Zhang
Zanbo Zhang
Zhihong Li
46
0
0
21 Feb 2022
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
91
40
0
25 Jan 2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction
Bowen Shi
Wei-Ning Hsu
Kushal Lakhotia
Abdel-rahman Mohamed
SSL
46
305
0
05 Jan 2022
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading
Leyuan Qu
C. Weber
S. Wermter
38
23
0
09 Dec 2021
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech
Michael Hassid
Michelle Tadmor Ramanovich
Brendan Shillingford
Miaosen Wang
Ye Jia
Tal Remez
DiffM
19
16
0
19 Nov 2021
Visual Keyword Spotting with Attention
Prajwal K R
Liliane Momeni
Triantafyllos Afouras
Andrew Zisserman
11
13
0
29 Oct 2021
Advances and Challenges in Deep Lip Reading
Marzieh Oghbaie
Arian Sabaghi
Kooshan Hashemifard
Mohammad Akbari
VLM
30
15
0
15 Oct 2021
Sub-word Level Lip Reading With Visual Attention
Prajwal K R
Triantafyllos Afouras
Andrew Zisserman
17
92
0
14 Oct 2021
$\bar{G}_{mst}$: An Unbiased Stratified Statistic and a Fast Gradient Optimization Algorithm Based on It
Aixiang Chen
19
0
0
07 Oct 2021
Audio-Visual Speech Recognition is Worth 32×32×8 Voxels
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
31
7
0
20 Sep 2021
Interactive decoding of words from visual speech recognition models
Brendan Shillingford
Yannis Assael
Misha Denil
15
0
0
01 Jul 2021
Understanding the Design Space of Mouth Microgestures
Victor Chen
Xuhai Xu
Richard Li
Yuanchun Shi
Shwetak N. Patel
Yuntao Wang
11
21
0
02 Jun 2021