Title
Development and evaluation of a deep learning algorithm for German word recognition from lip movements Dinh Nam Pham Torsten Rahne 55 2 0 22 Apr 2025
VALLR: Visual ASR Language Model for Lip Reading Marshall Thomas Edward Fish Richard Bowden 41 0 0 27 Mar 2025
Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs A. Haliassos Rodrigo Mira Honglie Chen Zoe Landgraf Stavros Petridis M. Pantic SSL 37 5 0 04 Nov 2024
AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition Ziqiang Liu Xiaolou Li Chen Chen Li Guo Lantian Li D. Wang 35 0 0 21 Oct 2024
Approaching Metaheuristic Deep Learning Combos for Automated Data Mining Gustavo Assunção Paulo Menezes 24 0 0 16 Oct 2024
Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy? Yiwen Guan V. Trinh Vivek Voleti Jacob Whitehill 42 1 0 13 Sep 2024
CNVSRC 2023: The First Chinese Continuous Visual Speech Recognition Challenge Chen Chen Zehua Liu Xiaolou Li Lantian Li D. Wang 35 2 0 14 Jun 2024
BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition A. Haliassos Andreas Zinonos Rodrigo Mira Stavros Petridis Maja Pantic VLM SSL AI4TS 47 12 0 02 Apr 2024
Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing Jeong Hun Yeo Seunghee Han Minsu Kim Y. Ro 56 32 0 23 Feb 2024
SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition Hao Wang Shuhei Kurita Shuichiro Shimizu Daisuke Kawahara 15 3 0 18 Jan 2024
LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data Hendrik Laux Emil Mededovic Ahmed Hallawa Lukas Martin A. Peine Anke Schmeink VLM 26 4 0 15 Dec 2023
AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition Andrew Rouditchenko R. Collobert Tatiana Likhomanenko VLM 27 3 0 29 Sep 2023
A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation Li Liu Lufei Gao Wen-Ling Lei Fengji Ma Xiaotian Lin Jin-Tao Wang CVBM 27 5 0 17 Aug 2023
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model Jeong Hun Yeo Minsu Kim J. Choi Dae Hoe Kim Y. Ro 26 18 0 15 Aug 2023
Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping Y. A. D. Djilali Sanath Narayan Haithem Boussaid Ebtesam Almazrouei Merouane Debbah 37 10 0 11 Aug 2023
Audio-visual video-to-speech synthesis with synthesized input audio Triantafyllos Kefalas Yannis Panagakis M. Pantic VGen DiffM 38 1 0 31 Jul 2023
Cascaded encoders for fine-tuning ASR models on overlapped speech R. Rose Oscar Chang Olivier Siohan 26 1 0 28 Jun 2023
Large-scale unsupervised audio pre-training for video-to-speech synthesis Triantafyllos Kefalas Yannis Panagakis M. Pantic VGen 37 3 0 27 Jun 2023
Towards Unseen Triples: Effective Text-Image-joint Learning for Scene Graph Generation Qianji Di Wenxing Ma Zhongang Qi Tianxiang Hou Ying Shan Hanzi Wang 14 0 0 23 Jun 2023
SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces Ziqiao Peng Yihao Luo Yue Shi Hao-Xuan Xu Xiangyu Zhu Jun He Hongyan Liu Zhaoxin Fan 55 40 0 19 Jun 2023
Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech Based on Metric Learning Sara Kashiwagi Keitaro Tanaka Qi Feng Shigeo Morishima 19 2 0 23 May 2023
Deep Learning-based Spatio Temporal Facial Feature Visual Speech Recognition Pangoth Santhosh Kumar Garika Akshay 17 2 0 30 Apr 2023
SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision Xubo Liu Egor Lakomkin Konstantinos Vougioukas Pingchuan Ma Honglie Chen ... Niko Moritz J. Kolár Stavros Petridis M. Pantic Christian Fuegen 52 19 0 30 Mar 2023
MMFace4D: A Large-Scale Multi-Modal 4D Face Dataset for Audio-Driven 3D Face Animation Haozhe Wu Jia Jia Junliang Xing Hongwei Xu Xiangyuan Wang Jelo Wang CVBM 32 7 0 17 Mar 2023
Conformers are All You Need for Visual Speech Recognition Oscar Chang H. Liao Dmitriy Serdyuk Ankit Parag Shah Olivier Siohan VLM 50 14 0 17 Feb 2023
AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations Jiachen Lian Alexei Baevski Wei-Ning Hsu Michael Auli SSL 40 34 0 10 Feb 2023
A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset J. Peymanfard Samin Heydarian Ali Lashini Hossein Zeinali Mohammad Reza Mohammadi N. Mozayani 29 10 0 21 Jan 2023
Audio-Visual Efficient Conformer for Robust Speech Recognition Maxime Burchi Radu Timofte VLM 11 33 0 04 Jan 2023
Jointly Learning Visual and Auditory Speech Representations from Raw Data A. Haliassos Pingchuan Ma Rodrigo Mira Stavros Petridis M. Pantic SSL 45 48 0 12 Dec 2022
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning Qiu-shi Zhu Long Zhou Zi-Hua Zhang Shujie Liu Binxing Jiao Jie Zhang Lirong Dai Daxin Jiang Jinyu Li Furu Wei 33 37 0 21 Nov 2022
TVLT: Textless Vision-Language Transformer Zineng Tang Jaemin Cho Yixin Nie Joey Tianyi Zhou VLM 51 28 0 28 Sep 2022
Deep Learning for Visual Speech Analysis: A Survey Changchong Sheng Gangyao Kuang L. Bai Chen Hou Y. Guo Xin Xu M. Pietikäinen Li Liu VLM 34 33 0 22 May 2022
End-to-End Multi-Person Audio/Visual Automatic Speech Recognition Otavio Braga Takaki Makino Olivier Siohan H. Liao CVBM 16 15 0 11 May 2022
A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection Otavio Braga Olivier Siohan 21 7 0 11 May 2022
Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection Otavio Braga Olivier Siohan CVBM 29 8 0 10 May 2022
End-to-end multi-talker audio-visual ASR using an active speaker attention module R. Rose Olivier Siohan 13 3 0 01 Apr 2022
Visual Speech Recognition for Multiple Languages in the Wild Pingchuan Ma Stavros Petridis M. Pantic VLM 128 144 0 26 Feb 2022
Adversarial Attacks on Speech Recognition Systems for Mission-Critical Applications: A Survey Ngoc Dung Huynh Mohamed Reda Bouadjenek Imran Razzak Kevin Lee Chetan Arora Ali Hassani A. Zaslavsky AAML 29 6 0 22 Feb 2022
MSTGD: A Memory Stochastic sTratified Gradient Descent Method with an Exponential Convergence Rate Aixiang Chen Chen Jinting Zhang Zanbo Zhang Zhihong Li 46 0 0 21 Feb 2022
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video Dmitriy Serdyuk Otavio Braga Olivier Siohan ViT 91 40 0 25 Jan 2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction Bowen Shi Wei-Ning Hsu Kushal Lakhotia Abdel-rahman Mohamed SSL 46 305 0 05 Jan 2022
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading Leyuan Qu C. Weber S. Wermter 38 23 0 09 Dec 2021
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech Michael Hassid Michelle Tadmor Ramanovich Brendan Shillingford Miaosen Wang Ye Jia Tal Remez DiffM 19 16 0 19 Nov 2021
Visual Keyword Spotting with Attention Prajwal K R Liliane Momeni Triantafyllos Afouras Andrew Zisserman 11 13 0 29 Oct 2021
Advances and Challenges in Deep Lip Reading Marzieh Oghbaie Arian Sabaghi Kooshan Hashemifard Mohammad Akbari VLM 30 15 0 15 Oct 2021
Sub-word Level Lip Reading With Visual Attention Prajwal K R Triantafyllos Afouras Andrew Zisserman 17 92 0 14 Oct 2021
$\bar{G}_{mst}$: An Unbiased Stratified Statistic and a Fast Gradient Optimization Algorithm Based on It Aixiang Chen 19 0 0 07 Oct 2021
Audio-Visual Speech Recognition is Worth $32 \times 32 \times 8$ Voxels Dmitriy Serdyuk Otavio Braga Olivier Siohan ViT 31 7 0 20 Sep 2021
Interactive decoding of words from visual speech recognition models Brendan Shillingford Yannis Assael Misha Denil 15 0 0 01 Jul 2021
Understanding the Design Space of Mouth Microgestures Victor Chen Xuhai Xu Richard Li Yuanchun Shi Shwetak N. Patel Yuntao Wang 11 21 0 02 Jun 2021