Lip Reading Sentences in the Wild

16 November 2016
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
arXiv:1611.05358

Papers citing "Lip Reading Sentences in the Wild"

50 / 340 papers shown
Associative Adversarial Learning Based on Selective Attack
Runqi Wang
Xiaoyue Duan
Baochang Zhang
Shenjun Xue
Wentao Zhu
David Doermann
G. Guo
AAML
34
0
0
28 Dec 2021
Multimodal Image Synthesis and Editing: The Generative AI Era
Fangneng Zhan
Yingchen Yu
Rongliang Wu
Jiahui Zhang
Shijian Lu
Lingjie Liu
Adam Kortylewski
Christian Theobalt
Eric Xing
EGVM
29
48
0
27 Dec 2021
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading
Leyuan Qu
C. Weber
S. Wermter
38
23
0
09 Dec 2021
Audio-Visual Synchronisation in the wild
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
26
37
0
08 Dec 2021
V2C: Visual Voice Cloning
Qi Chen
Yuanqing Li
Yuankai Qi
Jiaqiu Zhou
Mingkui Tan
Qi Wu
VGen
33
23
0
25 Nov 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
Rishabh Garg
Ruohan Gao
Kristen Grauman
15
28
0
21 Nov 2021
Deep Spoken Keyword Spotting: An Overview
Iván López-Espejo
Zheng-Hua Tan
John H. L. Hansen
Jesper Jensen
21
102
0
20 Nov 2021
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech
Michael Hassid
Michelle Tadmor Ramanovich
Brendan Shillingford
Miaosen Wang
Ye Jia
Tal Remez
DiffM
19
16
0
19 Nov 2021
3D Lip Event Detection via Interframe Motion Divergence at Multiple Temporal Resolutions
Jie M. Zhang
Robert B. Fisher
12
1
0
18 Nov 2021
LiMuSE: Lightweight Multi-modal Speaker Extraction
Qinghua Liu
Yating Huang
Yunzhe Hao
Jiaming Xu
Bo Xu
43
6
0
07 Nov 2021
Personalized One-Shot Lipreading for an ALS Patient
Bipasha Sen
Aditya Agarwal
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
LM&MA
11
3
0
02 Nov 2021
Evaluation of Human and Machine Face Detection using a Novel Distinctive Human Appearance Dataset
Necdet Gurkan
Jordan W. Suchow
CVBM
20
3
0
01 Nov 2021
Visual Keyword Spotting with Attention
Prajwal K R
Liliane Momeni
Triantafyllos Afouras
Andrew Zisserman
11
13
0
29 Oct 2021
Neural Dubber: Dubbing for Videos According to Scripts
Chenxu Hu
Qiao Tian
Tingle Li
Yuping Wang
Yuxuan Wang
Hang Zhao
DiffM
VGen
36
39
0
15 Oct 2021
Advances and Challenges in Deep Lip Reading
Marzieh Oghbaie
Arian Sabaghi
Kooshan Hashemifard
Mohammad Akbari
VLM
30
15
0
15 Oct 2021
Sub-word Level Lip Reading With Visual Attention
Prajwal K R
Triantafyllos Afouras
Andrew Zisserman
17
92
0
14 Oct 2021
Audio-Visual Speech Recognition is Worth 32×32×8 Voxels
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
29
7
0
20 Sep 2021
Invertible Frowns: Video-to-Video Facial Emotion Translation
Ian H. Magnusson
Aruna Sankaranarayanan
A. Lippman
VGen
32
6
0
16 Sep 2021
Large-vocabulary Audio-visual Speech Recognition in Noisy Environments
Wentao Yu
Steffen Zeiler
D. Kolossa
64
3
0
10 Sep 2021
SimulLR: Simultaneous Lip Reading Transducer with Attention-Guided Adaptive Memory
Zhijie Lin
Zhou Zhao
Haoyuan Li
Jinglin Liu
Meng Zhang
Xingshan Zeng
Xiaofei He
24
18
0
31 Aug 2021
Look Who's Talking: Active Speaker Detection in the Wild
You Jin Kim
Hee-Soo Heo
Soyeon Choe
Soo-Whan Chung
Yoohwan Kwon
Bong-Jin Lee
Youngki Kwon
Joon Son Chung
44
20
0
17 Aug 2021
AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person
Xinsheng Wang
Qicong Xie
Jihua Zhu
Lei Xie
O. Scharenborg
31
16
0
09 Aug 2021
Spatio-Temporal Attention Mechanism and Knowledge Distillation for Lip Reading
Shahd Elashmawy
Marian M. Ramsis
Hesham M. Eraqi
Farah Eldeshnawy
Hadeel Mabrouk
Omar Abugabal
Nourhan Sakr
35
1
0
07 Aug 2021
A Survey on Audio Synthesis and Audio-Visual Multimodal Processing
Zhaofeng Shi
26
7
0
01 Aug 2021
Facetron: A Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations
Seyun Um
Jihyun Kim
Jihyun Lee
Hong-Goo Kang
CVBM
13
4
0
26 Jul 2021
Parallel and High-Fidelity Text-to-Lip Generation
Jinglin Liu
Zhiying Zhu
Yi Ren
Wencan Huang
Baoxing Huai
N. Yuan
Zhou Zhao
40
10
0
14 Jul 2021
Deep Learning Frameworks Applied For Audio-Visual Scene Classification
L. D. Pham
Alexander Schindler
Mina Schütz
Jasmin Lampert
S. Schlarb
Ross King
32
9
0
12 Jun 2021
Audio-visual scene classification: analysis of DCASE 2021 Challenge submissions
Shanshan Wang
Toni Heittola
A. Mesaros
Tuomas Virtanen
19
18
0
28 May 2021
Improving Sign Language Translation with Monolingual Data by Sign Back-Translation
Hao Zhou
Wen-gang Zhou
Weizhen Qi
Junfu Pu
Houqiang Li
SLR
35
180
0
26 May 2021
Lip reading using external viseme decoding
J. Peymanfard
Mohammad Reza Mohammadi
Hossein Zeinali
N. Mozayani
13
11
0
10 Apr 2021
Context-self contrastive pretraining for crop type semantic segmentation
Michail Tarasiou
R. Güler
S. Zafeiriou
SSL
26
17
0
09 Apr 2021
MPN: Multimodal Parallel Network for Audio-Visual Event Localization
Jiashuo Yu
Ying Cheng
Rui Feng
23
14
0
07 Apr 2021
Contrastive Learning of Global-Local Video Representations
Shuang Ma
Zhaoyang Zeng
Daniel J. McDuff
Yale Song
SSL
32
7
0
07 Apr 2021
Can audio-visual integration strengthen robustness under multimodal attacks?
Yapeng Tian
Chenliang Xu
AAML
36
37
0
05 Apr 2021
Robust Audio-Visual Instance Discrimination
Pedro Morgado
Ishan Misra
Nuno Vasconcelos
SSL
19
110
0
29 Mar 2021
Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation
Jiyoung Lee
Soo-Whan Chung
Sunok Kim
Hong-Goo Kang
Kwanghoon Sohn
4
51
0
25 Mar 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning
Mandela Patrick
Yuki M. Asano
Bernie Huang
Ishan Misra
Florian Metze
Joao Henriques
Andrea Vedaldi
AI4TS
29
33
0
18 Mar 2021
KoDF: A Large-scale Korean DeepFake Detection Dataset
Patrick Kwon
J. You
Gyuhyeon Nam
Sungwoo Park
Gyeongsu Chae
29
100
0
18 Mar 2021
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma
Stavros Petridis
M. Pantic
84
225
0
12 Feb 2021
MAAS: Multi-modal Assignation for Active Speaker Detection
Juan Carlos León Alcázar
Fabian Caba Heilbron
Ali K. Thabet
Guohao Li
65
51
0
11 Jan 2021
Lip-reading with Hierarchical Pyramidal Convolution and Self-Attention
Hang Chen
Jun Du
Yu Hu
Lirong Dai
Chin-Hui Lee
Baocai Yin
22
6
0
28 Dec 2020
SpeakingFaces: A Large-Scale Multimodal Dataset of Voice Commands with Visual and Thermal Video Streams
Madina Abdrakhmanova
Askat Kuzdeuov
Sheikh Jarju
Yerbolat Khassanov
Michael Lewis
H. A. Varol
CVBM
10
58
0
05 Dec 2020
AuthNet: A Deep Learning based Authentication Mechanism using Temporal Facial Feature Movements
M. Raghavendra
P. Omprakash
B. Mukesh
Sowmya Kamath
CVBM
12
2
0
04 Dec 2020
Disentangling Homophemes in Lip Reading using Perplexity Analysis
Souheil Fenghour
Daqing Chen
Kun Guo
Perry Xiao
31
3
0
28 Nov 2020
End-to-end Silent Speech Recognition with Acoustic Sensing
Jian Luo
Jianzong Wang
Ning Cheng
Guilin Jiang
Jing Xiao
6
7
0
23 Nov 2020
TaL: a synchronised multi-speaker corpus of ultrasound tongue imaging, audio, and lip videos
M. Ribeiro
Jennifer Sanger
Jingxuan Zhang
Aciel Eshky
A. Wrench
Korin Richmond
Steve Renals
LM&MA
19
33
0
19 Nov 2020
Learn an Effective Lip Reading Model without Pains
Dalu Feng
Shuang Yang
Shiguang Shan
Xilin Chen
30
61
0
15 Nov 2020
Augmenting Images for ASR and TTS through Single-loop and Dual-loop Multimodal Chain Framework
Johanes Effendi
Andros Tjandra
S. Sakti
Satoshi Nakamura
11
3
0
04 Nov 2020
Watch, read and lookup: learning to spot signs from multiple supervisors
Liliane Momeni
Gül Varol
Samuel Albanie
Triantafyllos Afouras
Andrew Zisserman
21
43
0
08 Oct 2020
Training Strategies to Handle Missing Modalities for Audio-Visual Expression Recognition
Srinivas Parthasarathy
Shiva Sundaram
16
76
0
02 Oct 2020