1611.05358
Lip Reading Sentences in the Wild
16 November 2016
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
Papers citing
"Lip Reading Sentences in the Wild"
50 / 344 papers shown
Title
Word-level Persian Lipreading Dataset
J. Peymanfard
Ali Lashini
Samin Heydarian
Hossein Zeinali
N. Mozayani
68
5
0
08 Apr 2023
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
60
15
0
29 Mar 2023
LIPSFUS: A neuromorphic dataset for audio-visual sensory fusion of lip reading
A. Rios-Navarro
E. Piñero-Fuentes
S. Canas-Moreno
Aqib Javed
Jin Harkin
A. Linares-Barranco
25
4
0
28 Mar 2023
Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels
Pingchuan Ma
A. Haliassos
Adriana Fernandez-Lopez
Honglie Chen
Stavros Petridis
Maja Pantic
96
115
0
25 Mar 2023
MusicFace: Music-driven Expressive Singing Face Synthesis
Peng Liu
W. Deng
Hengda Li
Jintai Wang
Yinglin Zheng
Yiwei Ding
Xiaohu Guo
Ming Zeng
CVBM
75
12
0
24 Mar 2023
Learning Cross-lingual Visual Speech Representations
Andreas Zinonos
A. Haliassos
Pingchuan Ma
Stavros Petridis
Maja Pantic
SSL
48
8
0
14 Mar 2023
WASD: A Wilder Active Speaker Detection Dataset
Tiago Roxo
Joana Cabral Costa
Pedro R. M. Inácio
Hugo Manuel Proença
51
3
0
09 Mar 2023
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
Xize Cheng
Lin Li
Tao Jin
Rongjie Huang
Wang Lin
Zehan Wang
Huangdai Liu
Yejin Wang
Aoxiong Yin
Zhou Zhao
84
25
0
09 Mar 2023
A Light Weight Model for Active Speaker Detection
Junhua Liao
Haihan Duan
Kanghui Feng
Wanbing Zhao
Yanbing Yang
Liangyin Chen
74
43
0
08 Mar 2023
Visuo-Tactile-Based Slip Detection Using A Multi-Scale Temporal Convolution Network
Junli Gao
Zhaoji Huang
Zhao-Li Tang
Haitao Song
Wenyu Liang
68
5
0
27 Feb 2023
Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video
Minsu Kim
Chae Won Kim
Y. Ro
CVBM
DiffM
78
3
0
27 Feb 2023
Lip-to-Speech Synthesis in the Wild with Multi-task Learning
Minsu Kim
Joanna Hong
Y. Ro
101
25
0
17 Feb 2023
Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition
Minsu Kim
Hyungil Kim
Y. Ro
VLM
69
19
0
16 Feb 2023
LipFormer: Learning to Lipread Unseen Speakers based on Visual-Landmark Transformers
Feng Xue
Yu Li
Deyin Liu
Yincen Xie
Lin Wu
Richang Hong
76
14
0
04 Feb 2023
A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset
J. Peymanfard
Samin Heydarian
Ali Lashini
Hossein Zeinali
Mohammad Reza Mohammadi
N. Mozayani
76
11
0
21 Jan 2023
LoCoNet: Long-Short Context Network for Active Speaker Detection
Xizi Wang
Feng Cheng
Gedas Bertasius
David J. Crandall
86
17
0
19 Jan 2023
OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset
J. Park
Jung-Wook Hwang
Kwanghee Choi
Seung-Hyun Lee
Jun-Hwan Ahn
R.-H. Park
Hyung-Min Park
67
3
0
16 Jan 2023
Speech Driven Video Editing via an Audio-Conditioned Diffusion Model
Dan Bigioi
Shubhajit Basak
Michał Stypułkowski
Maciej Ziȩba
H. Jordan
R. Mcdonnell
Peter Corcoran
DiffM
VGen
110
36
0
10 Jan 2023
Audio-Visual Efficient Conformer for Robust Speech Recognition
Maxime Burchi
Radu Timofte
VLM
86
35
0
04 Jan 2023
Jointly Learning Visual and Auditory Speech Representations from Raw Data
A. Haliassos
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Maja Pantic
SSL
92
49
0
12 Dec 2022
Learning to Dub Movies via Hierarchical Prosody Models
Gaoxiang Cong
Liang Li
Yuankai Qi
Zhengjun Zha
Qi Wu
Wen-yu Wang
Bin Jiang
Ming-Hsuan Yang
Qin Huang
141
27
0
08 Dec 2022
LISA: Localized Image Stylization with Audio via Implicit Neural Representation
Seung Hyun Lee
Chanyoung Kim
Wonmin Byeon
Sang Ho Yoon
Jinkyu Kim
Sangpil Kim
60
3
0
21 Nov 2022
AVATAR submission to the Ego4D AV Transcription Challenge
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
40
0
0
18 Nov 2022
An Investigation of Smart Contract for Collaborative Machine Learning Model Training
Sheng Ding
Chenhui Hu
46
2
0
12 Sep 2022
Predict-and-Update Network: Audio-Visual Speech Recognition Inspired by Human Speech Perception
Jiadong Wang
Xinyuan Qian
Haizhou Li
68
14
0
05 Sep 2022
Training Strategies for Improved Lip-reading
Pingchuan Ma
Yujiang Wang
Stavros Petridis
Jie Shen
Maja Pantic
133
49
0
03 Sep 2022
Lip-to-Speech Synthesis for Arbitrary Speakers in the Wild
Sindhu B. Hegde
Prajwal K R
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
116
11
0
01 Sep 2022
Bayesian Neural Network Language Modeling for Speech Recognition
Boyang Xue
Shoukang Hu
Junhao Xu
Mengzhe Geng
Xunying Liu
Helen M. Meng
UQCV
BDL
127
18
0
28 Aug 2022
Speaker-adaptive Lip Reading with User-dependent Padding
Minsu Kim
Hyunjun Kim
Y. Ro
84
21
0
09 Aug 2022
Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition
Joanna Hong
Minsu Kim
Daehun Yoo
Y. Ro
66
21
0
13 Jul 2022
Dual-Path Cross-Modal Attention for better Audio-Visual Speech Extraction
Zhongweiyang Xu
Xulin Fan
M. Hasegawa-Johnson
53
3
0
09 Jul 2022
Show Me Your Face, And I'll Tell You How You Speak
Christen Millerdurai
L. A. Khaliq
Timon Ulrich
CVBM
102
0
0
28 Jun 2022
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Yogesh S Rawat
M. Shah
SSL
132
136
0
18 Jun 2022
AVATAR: Unconstrained Audiovisual Speech Recognition
Valentin Gabeur
Paul Hongsuck Seo
Arsha Nagrani
Chen Sun
Alahari Karteek
Cordelia Schmid
72
11
0
15 Jun 2022
Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos
Alexander Waibel
M. Behr
Fevziye Irem Eyiokur
Dogucan Yaman
Tuan-Nam Nguyen
Carlos Mullov
Mehmet Arif Demirtas
Alperen Kantarci
Stefan Constantin
H. K. Ekenel
CVBM
69
16
0
09 Jun 2022
Lip-Listening: Mixing Senses to Understand Lips using Cross Modality Knowledge Distillation for Word-Based Models
Hadeel Mabrouk
Omar Abugabal
Nourhan Sakr
Hesham M. Eraqi
VLM
68
2
0
05 Jun 2022
Learning Speaker-specific Lip-to-Speech Generation
Munender Varshney
Ravindra Yadav
Vinay P. Namboodiri
R. Hegde
102
7
0
04 Jun 2022
HYCEDIS: HYbrid Confidence Engine for Deep Document Intelligence System
Bao-Sinh Nguyen
Q. Tran
Tuan-Anh Dang Nguyen
D. Nguyen
H. Le
58
0
0
01 Jun 2022
Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts
Debjoy Saha
Shravan Nayak
Timo Baumann
87
3
0
24 May 2022
Deep Learning for Visual Speech Analysis: A Survey
Changchong Sheng
Gangyao Kuang
L. Bai
Chen Hou
Y. Guo
Xin Xu
M. Pietikäinen
Li Liu
VLM
98
36
0
22 May 2022
End-to-End Multi-Person Audio/Visual Automatic Speech Recognition
Otavio Braga
Takaki Makino
Olivier Siohan
H. Liao
CVBM
59
15
0
11 May 2022
A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection
Otavio Braga
Olivier Siohan
78
7
0
11 May 2022
Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection
Otavio Braga
Olivier Siohan
CVBM
63
8
0
10 May 2022
Scaling up sign spotting through sign language dictionaries
Gül Varol
Liliane Momeni
Samuel Albanie
Triantafyllos Afouras
Andrew Zisserman
66
15
0
09 May 2022
Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading
Minsu Kim
Jeong Hun Yeo
Yong Man Ro
95
64
0
04 Apr 2022
A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction
Zexu Pan
Meng Ge
Haizhou Li
72
20
0
31 Mar 2022
Localizing Visual Sounds the Easy Way
Shentong Mo
Pedro Morgado
167
81
0
17 Mar 2022
Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma
Stavros Petridis
Maja Pantic
VLM
230
152
0
26 Feb 2022
Human Detection of Political Speech Deepfakes across Transcripts, Audio, and Video
Matthew Groh
Aruna Sankaranarayanan
Nikhil Singh
Dong Young Kim
A. Lippman
Rosalind W. Picard
112
19
0
25 Feb 2022
Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition
Xichen Pan
Peiyu Chen
Yichen Gong
Helong Zhou
Xinbing Wang
Zhouhan Lin
SSL
86
37
0
24 Feb 2022