1611.05358
Lip Reading Sentences in the Wild
16 November 2016
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
Papers citing
"Lip Reading Sentences in the Wild"
50 / 340 papers shown
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer
Young-Hu Park
R.-H. Park
Hyung-Min Park
49
0
0
07 May 2025
CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization
Detao Bai
Zhiheng Ma
Xihan Wei
Liefeng Bo
120
0
0
06 May 2025
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
J. Choi
Ji-Hoon Kim
Kim Sung-Bin
Tae-Hyun Oh
Joon Son Chung
DiffM
49
0
0
29 Apr 2025
Development and evaluation of a deep learning algorithm for German word recognition from lip movements
Dinh Nam Pham
Torsten Rahne
32
2
0
22 Apr 2025
Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides
Jinghua Zhao
Yuhang Jia
Shiyao Wang
Jiaming Zhou
Hui Wang
Yong Qin
37
0
0
21 Apr 2025
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation
Fa-Ting Hong
Zunnan Xu
Zixiang Zhou
Zhiqiang Zhang
Xiu Li
Qin Lin
Qinglin Lu
D. Xu
DiffM
VGen
57
2
0
03 Apr 2025
VALLR: Visual ASR Language Model for Lip Reading
Marshall Thomas
Edward Fish
Richard Bowden
41
0
0
27 Mar 2025
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
Ji-Hoon Kim
Jeongsoo Choi
Jaehun Kim
Chaeyoung Jung
Joon Son Chung
CVBM
53
1
0
21 Mar 2025
MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation
Sungwoo Cho
J. Choi
Sungnyun Kim
Se-Young Yun
63
0
0
14 Mar 2025
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens
Jeong Hun Yeo
Hyeongseop Rha
Se Jin Park
Y. Ro
56
0
0
14 Mar 2025
Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs
Umberto Cappellazzo
Minsu Kim
Stavros Petridis
57
0
0
09 Mar 2025
Enhancing Visual Forced Alignment with Local Context-Aware Feature Extraction and Multi-Task Learning
Yi He
Lei Yang
Shilin Wang
56
0
0
05 Mar 2025
Personalized Generation In Large Model Era: A Survey
Yiyan Xu
Jinghao Zhang
Alireza Salemi
Xinting Hu
Luu Anh Tuan
Fuli Feng
Hamed Zamani
Xiangnan He
Tat-Seng Chua
3DV
79
2
0
04 Mar 2025
The Esethu Framework: Reimagining Sustainable Dataset Governance and Curation for Low-Resource Languages
Jenalea Rajab
Anuoluwapo Aremu
Everlyn Asiko Chimoto
Dale Dunbar
Graham Morrissey
...
Onyothi Nekoto
Pelonomi Moiloa
Jade Z. Abbott
Vukosi Marivate
Benjamin Rosman
46
0
0
21 Feb 2025
NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing
Yifan Liang
Fangkun Liu
Andong Li
Xiaodong Li
C. Zheng
49
1
0
17 Feb 2025
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
Sungnyun Kim
Sungwoo Cho
Sangmin Bae
Kangwook Jang
Se-Young Yun
SSL
68
1
0
23 Jan 2025
FlanEC: Exploring Flan-T5 for Post-ASR Error Correction
Moreno La Quatra
Valerio Mario Salerno
Yu Tsao
Sabato Marco Siniscalchi
94
0
0
22 Jan 2025
Uncovering the Visual Contribution in Audio-Visual Speech Recognition
Zhaofeng Lin
Naomi Harte
86
1
0
20 Jan 2025
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
Jeong Hun Yeo
Chae Won Kim
Hyunjun Kim
Hyeongseop Rha
Seunghee Han
Wen-Huang Cheng
Y. Ro
59
3
0
03 Jan 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
40
28
0
02 Jan 2025
Face-StyleSpeech: Enhancing Zero-shot Speech Synthesis from Face Images with Improved Face-to-Speech Mapping
Minki Kang
Wooseok Han
Eunho Yang
CVBM
39
0
0
31 Dec 2024
GLCF: A Global-Local Multimodal Coherence Analysis Framework for Talking Face Generation Detection
Xiaocan Chen
Qilin Yin
Jiarui Liu
Wei Lu
Xiangyang Luo
Jiantao Zhou
CVBM
84
0
0
18 Dec 2024
Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning
Dragos-Alexandru Boldisor
Stefan Smeu
Dan Oneaţă
Elisabeta Oneata
103
1
0
29 Nov 2024
Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs
A. Haliassos
Rodrigo Mira
Honglie Chen
Zoe Landgraf
Stavros Petridis
M. Pantic
SSL
37
5
0
04 Nov 2024
AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition
Ziqiang Liu
Xiaolou Li
Chen Chen
Li Guo
Lantian Li
D. Wang
30
0
0
21 Oct 2024
LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details
Jian Yang
Xukun Wang
Wentao Wang
Guoming Li
Qihang Fang
Ruihong Yuan
Tianyang Wang
Jason Zhaoxin Fan
Yeying Jin
Zhaoxin Fan
VGen
47
1
0
01 Oct 2024
Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective
Chen Chen
Xiaolou Li
Zehua Liu
Lantian Li
D. Wang
31
0
0
29 Sep 2024
You Only Speak Once to See
Wenhao Yang
Jianguo Wei
Wenhuan Lu
Lei Li
VOS
35
1
0
27 Sep 2024
Self-Supervised Audio-Visual Soundscape Stylization
Tingle Li
Renhao Wang
Po-Yao Huang
Andrew Owens
Gopala Anumanchipalli
DiffM
SSL
38
4
0
22 Sep 2024
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts
Yihan Wu
Yifan Peng
Yichen Lu
Xuankai Chang
Ruihua Song
Shinji Watanabe
49
2
0
19 Sep 2024
DreamHead: Learning Spatial-Temporal Correspondence via Hierarchical Diffusion for Audio-driven Talking Head Synthesis
Fa-Ting Hong
Yunfei Liu
Yu Li
Changyin Zhou
Fei Yu
D. Xu
DiffM
35
0
0
16 Sep 2024
Interpretable Convolutional SyncNet
Sungjoon Park
Jaesub Yun
Donggeon Lee
Minsik Park
54
0
0
02 Sep 2024
Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition
Sungnyun Kim
Kangwook Jang
Sangmin Bae
Hoirin Kim
Se-Young Yun
47
3
0
04 Jul 2024
Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert
Han EunGi
Oh Hyun-Bin
Kim Sung-Bin
Corentin Nivelet Etcheberry
Suekyeong Nam
Janghoon Joo
Tae-Hyun Oh
23
5
0
01 Jul 2024
MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization
Adriana Fernandez-Lopez
Honglie Chen
Pingchuan Ma
Lu Yin
Q. Xiao
Stavros Petridis
Shiwei Liu
Maja Pantic
46
2
0
25 Jun 2024
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization
Young Jin Ahn
Jungwoo Park
Sangha Park
Jonghyun Choi
Kee-Eung Kim
34
7
0
18 Jun 2024
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
Rishit Dagli
Shivesh Prakash
Robert Wu
H. Khosravani
39
3
0
06 Jun 2024
OpFlowTalker: Realistic and Natural Talking Face Generation via Optical Flow Guidance
Shuheng Ge
Haoyu Xing
Li Zhang
Xiangqian Wu
39
0
0
23 May 2024
Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models
Yuchen Hu
Chen Chen
Chao-Han Huck Yang
Chengwei Qin
Pin-Yu Chen
Chng Eng Siong
Chao Zhang
VLM
33
3
0
23 May 2024
Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models
Yuchen Hu
Chen Chen
Chengwei Qin
Qiushi Zhu
E. Chng
Ruizhe Li
AuLLM
KELM
49
5
0
16 May 2024
Landmark-Guided Cross-Speaker Lip Reading with Mutual Information Regularization
Linzhi Wu
Xingyu Zhang
Yakun Zhang
Changyan Zheng
Tiejun Liu
Liang Xie
Ye Yan
Erwei Yin
29
1
0
24 Mar 2024
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception
HyoJung Han
Mohamed Anwar
J. Pino
Wei-Ning Hsu
Marine Carpuat
Bowen Shi
Changhan Wang
VLM
37
9
0
21 Mar 2024
Audio-Visual Segmentation via Unlabeled Frame Exploitation
Jinxiang Liu
Yikun Liu
Fei Zhang
Chen Ju
Ya-Qin Zhang
Yanfeng Wang
39
10
0
17 Mar 2024
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
Maxime Burchi
Krishna C. Puvvada
Jagadeesh Balam
Boris Ginsburg
Radu Timofte
42
8
0
14 Mar 2024
A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition
Yusheng Dai
Hang Chen
Jun Du
Ruoyu Wang
Shihao Chen
Jie Ma
Haotian Wang
Chin-Hui Lee
45
4
0
07 Mar 2024
Towards Accurate Lip-to-Speech Synthesis in-the-Wild
Sindhu B. Hegde
Rudrabha Mukhopadhyay
C. V. Jawahar
Vinay P. Namboodiri
27
4
0
02 Mar 2024
Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing
Jeong Hun Yeo
Seunghee Han
Minsu Kim
Y. Ro
56
22
0
23 Feb 2024
Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition
David Gimeno-Gómez
Carlos David Martínez Hinarejos
32
1
0
20 Feb 2024
Cross-Attention Fusion of Visual and Geometric Features for Large Vocabulary Arabic Lipreading
Samar Daou
Ahmed Rekik
A. Ben-Hamadou
Abdelaziz Kallel
31
3
0
18 Feb 2024
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation
Minsu Kim
Jeong Hun Yeo
Se Jin Park
J. Choi
Y. Ro
27
5
0
18 Jan 2024