ResearchTrend.AI
Lip Reading Sentences in the Wild

16 November 2016
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman

Papers citing "Lip Reading Sentences in the Wild"

50 / 344 papers shown
Title
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
191
41
0
25 Jan 2022
Survey on the Convergence of Machine Learning and Blockchain
Sheng Ding
Chenhui Hu
SyDa
93
10
0
04 Jan 2022
DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering
Shunyu Yao
Ruizhe Zhong
Yichao Yan
Guangtao Zhai
Xiaokang Yang
CVBM
78
93
0
03 Jan 2022
Skin feature point tracking using deep feature encodings
J. Chang
Torbjörn E. M. Nordling
70
2
0
28 Dec 2021
Associative Adversarial Learning Based on Selective Attack
Runqi Wang
Xiaoyue Duan
Baochang Zhang
Shenjun Xue
Wentao Zhu
David Doermann
G. Guo
AAML
79
0
0
28 Dec 2021
Multimodal Image Synthesis and Editing: The Generative AI Era
Fangneng Zhan
Yingchen Yu
Rongliang Wu
Jiahui Zhang
Shijian Lu
Lingjie Liu
Adam Kortylewski
Christian Theobalt
Eric Xing
EGVM
200
51
0
27 Dec 2021
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading
Leyuan Qu
C. Weber
S. Wermter
79
23
0
09 Dec 2021
Audio-Visual Synchronisation in the wild
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
124
40
0
08 Dec 2021
V2C: Visual Voice Cloning
Qi Chen
Yuanqing Li
Yuankai Qi
Jiaqiu Zhou
Mingkui Tan
Qi Wu
VGen
81
27
0
25 Nov 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
Rishabh Garg
Ruohan Gao
Kristen Grauman
89
27
0
21 Nov 2021
Deep Spoken Keyword Spotting: An Overview
Iván López-Espejo
Zheng-Hua Tan
John H. L. Hansen
Jesper Jensen
87
107
0
20 Nov 2021
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech
Michael Hassid
Michelle Tadmor Ramanovich
Brendan Shillingford
Miaosen Wang
Ye Jia
Tal Remez
DiffM
72
18
0
19 Nov 2021
3D Lip Event Detection via Interframe Motion Divergence at Multiple Temporal Resolutions
Jie M. Zhang
Robert B. Fisher
22
1
0
18 Nov 2021
LiMuSE: Lightweight Multi-modal Speaker Extraction
Qinghua Liu
Yating Huang
Yunzhe Hao
Jiaming Xu
Bo Xu
105
6
0
07 Nov 2021
Personalized One-Shot Lipreading for an ALS Patient
Bipasha Sen
Aditya Agarwal
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
LM&MA
50
3
0
02 Nov 2021
Evaluation of Human and Machine Face Detection using a Novel Distinctive Human Appearance Dataset
Necdet Gurkan
Jordan W. Suchow
CVBM
57
3
0
01 Nov 2021
Visual Keyword Spotting with Attention
Prajwal K R
Liliane Momeni
Triantafyllos Afouras
Andrew Zisserman
72
13
0
29 Oct 2021
Neural Dubber: Dubbing for Videos According to Scripts
Chenxu Hu
Qiao Tian
Tingle Li
Yuping Wang
Yuxuan Wang
Hang Zhao
DiffM
VGen
99
43
0
15 Oct 2021
Advances and Challenges in Deep Lip Reading
Marzieh Oghbaie
Arian Sabaghi
Kooshan Hashemifard
Mohammad Akbari
VLM
67
15
0
15 Oct 2021
Sub-word Level Lip Reading With Visual Attention
Prajwal K R
Triantafyllos Afouras
Andrew Zisserman
91
93
0
14 Oct 2021
Audio-Visual Speech Recognition is Worth 32×32×8 Voxels
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
87
7
0
20 Sep 2021
Invertible Frowns: Video-to-Video Facial Emotion Translation
Ian H. Magnusson
Aruna Sankaranarayanan
A. Lippman
VGen
76
7
0
16 Sep 2021
Large-vocabulary Audio-visual Speech Recognition in Noisy Environments
Wentao Yu
Steffen Zeiler
D. Kolossa
88
3
0
10 Sep 2021
SimulLR: Simultaneous Lip Reading Transducer with Attention-Guided Adaptive Memory
Zhijie Lin
Zhou Zhao
Haoyuan Li
Jinglin Liu
Meng Zhang
Xingshan Zeng
Xiaofei He
52
18
0
31 Aug 2021
Look Who's Talking: Active Speaker Detection in the Wild
You Jin Kim
Hee-Soo Heo
Soyeon Choe
Soo-Whan Chung
Yoohwan Kwon
Bong-Jin Lee
Youngki Kwon
Joon Son Chung
113
21
0
17 Aug 2021
AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person
Xinsheng Wang
Qicong Xie
Jihua Zhu
Lei Xie
O. Scharenborg
120
19
0
09 Aug 2021
Spatio-Temporal Attention Mechanism and Knowledge Distillation for Lip Reading
Shahd Elashmawy
Marian M. Ramsis
Hesham M. Eraqi
Farah Eldeshnawy
Hadeel Mabrouk
Omar Abugabal
Nourhan Sakr
77
1
0
07 Aug 2021
A Survey on Audio Synthesis and Audio-Visual Multimodal Processing
Zhaofeng Shi
65
7
0
01 Aug 2021
Facetron: A Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations
Seyun Um
Jihyun Kim
Jihyun Lee
Hong-Goo Kang
CVBM
139
4
0
26 Jul 2021
Parallel and High-Fidelity Text-to-Lip Generation
Jinglin Liu
Zhiying Zhu
Yi Ren
Wencan Huang
Baoxing Huai
N. Yuan
Zhou Zhao
55
10
0
14 Jul 2021
Deep Learning Frameworks Applied For Audio-Visual Scene Classification
L. D. Pham
Alexander Schindler
Mina Schütz
Jasmin Lampert
S. Schlarb
Ross King
57
9
0
12 Jun 2021
Audio-visual scene classification: analysis of DCASE 2021 Challenge submissions
Shanshan Wang
Toni Heittola
A. Mesaros
Tuomas Virtanen
32
18
0
28 May 2021
Improving Sign Language Translation with Monolingual Data by Sign Back-Translation
Hao Zhou
Wen-gang Zhou
Weizhen Qi
Junfu Pu
Houqiang Li
SLR
65
194
0
26 May 2021
Lip reading using external viseme decoding
J. Peymanfard
Mohammad Reza Mohammadi
Hossein Zeinali
N. Mozayani
48
11
0
10 Apr 2021
Context-self contrastive pretraining for crop type semantic segmentation
Michail Tarasiou
R. Güler
Stefanos Zafeiriou
SSL
63
17
0
09 Apr 2021
MPN: Multimodal Parallel Network for Audio-Visual Event Localization
Jiashuo Yu
Ying Cheng
Rui Feng
73
14
0
07 Apr 2021
Contrastive Learning of Global-Local Video Representations
Shuang Ma
Zhaoyang Zeng
Daniel J. McDuff
Yale Song
SSL
104
7
0
07 Apr 2021
Can audio-visual integration strengthen robustness under multimodal attacks?
Yapeng Tian
Chenliang Xu
AAML
102
39
0
05 Apr 2021
Robust Audio-Visual Instance Discrimination
Pedro Morgado
Ishan Misra
Nuno Vasconcelos
SSL
115
110
0
29 Mar 2021
Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation
Jiyoung Lee
Soo-Whan Chung
Sunok Kim
Hong-Goo Kang
Kwanghoon Sohn
64
51
0
25 Mar 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning
Mandela Patrick
Yuki M. Asano
Bernie Huang
Ishan Misra
Florian Metze
Joao Henriques
Andrea Vedaldi
AI4TS
100
35
0
18 Mar 2021
KoDF: A Large-scale Korean DeepFake Detection Dataset
Patrick Kwon
J. You
Gyuhyeon Nam
Sungwoo Park
Gyeongsu Chae
112
104
0
18 Mar 2021
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma
Stavros Petridis
Maja Pantic
160
234
0
12 Feb 2021
MAAS: Multi-modal Assignation for Active Speaker Detection
Juan Carlos León Alcázar
Fabian Caba Heilbron
Ali K. Thabet
Guohao Li
130
52
0
11 Jan 2021
Lip-reading with Hierarchical Pyramidal Convolution and Self-Attention
Hang Chen
Jun Du
Yu Hu
Lirong Dai
Chin-Hui Lee
Baocai Yin
64
6
0
28 Dec 2020
SpeakingFaces: A Large-Scale Multimodal Dataset of Voice Commands with Visual and Thermal Video Streams
Madina Abdrakhmanova
Askat Kuzdeuov
Sheikh Jarju
Yerbolat Khassanov
Michael Lewis
H. A. Varol
CVBM
61
58
0
05 Dec 2020
AuthNet: A Deep Learning based Authentication Mechanism using Temporal Facial Feature Movements
M. Raghavendra
P. Omprakash
B. Mukesh
Sowmya Kamath
CVBM
24
2
0
04 Dec 2020
Disentangling Homophemes in Lip Reading using Perplexity Analysis
Souheil Fenghour
Daqing Chen
Kun Guo
Perry Xiao
43
3
0
28 Nov 2020
End-to-end Silent Speech Recognition with Acoustic Sensing
Jian Luo
Jianzong Wang
Ning Cheng
Guilin Jiang
Jing Xiao
16
7
0
23 Nov 2020
TaL: a synchronised multi-speaker corpus of ultrasound tongue imaging, audio, and lip videos
M. Ribeiro
Jennifer Sanger
Jingxuan Zhang
Aciel Eshky
A. Wrench
Korin Richmond
Steve Renals
LM&MA
48
35
0
19 Nov 2020