
Lip Reading Sentences in the Wild

16 November 2016

Joon Son Chung

Papers citing "Lip Reading Sentences in the Wild"

50 / 344 papers shown

Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video Dmitriy Serdyuk Otavio Braga Olivier Siohan ViT 191 41 0 25 Jan 2022
Survey on the Convergence of Machine Learning and Blockchain Sheng Ding Chenhui Hu SyDa 93 10 0 04 Jan 2022
DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering Shunyu Yao Ruizhe Zhong Yichao Yan Guangtao Zhai Xiaokang Yang CVBM 78 93 0 03 Jan 2022
Skin feature point tracking using deep feature encodings J. Chang Torbjörn E. M. Nordling 70 2 0 28 Dec 2021
Associative Adversarial Learning Based on Selective Attack Runqi Wang Xiaoyue Duan Baochang Zhang Shenjun Xue Wentao Zhu David Doermann G. Guo AAML 79 0 0 28 Dec 2021
Multimodal Image Synthesis and Editing: The Generative AI Era Fangneng Zhan Yingchen Yu Rongliang Wu Jiahui Zhang Shijian Lu Lingjie Liu Adam Kortylewski Christian Theobalt Eric Xing EGVM 200 51 0 27 Dec 2021
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading Leyuan Qu C. Weber S. Wermter 79 23 0 09 Dec 2021
Audio-Visual Synchronisation in the wild Honglie Chen Weidi Xie Triantafyllos Afouras Arsha Nagrani Andrea Vedaldi Andrew Zisserman 124 40 0 08 Dec 2021
V2C: Visual Voice Cloning Qi Chen Yuanqing Li Yuankai Qi Jiaqiu Zhou Mingkui Tan Qi Wu VGen 81 27 0 25 Nov 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video Rishabh Garg Ruohan Gao Kristen Grauman 89 27 0 21 Nov 2021
Deep Spoken Keyword Spotting: An Overview Iván López-Espejo Zheng-Hua Tan John H. L. Hansen Jesper Jensen 87 107 0 20 Nov 2021
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech Michael Hassid Michelle Tadmor Ramanovich Brendan Shillingford Miaosen Wang Ye Jia Tal Remez DiffM 72 18 0 19 Nov 2021
3D Lip Event Detection via Interframe Motion Divergence at Multiple Temporal Resolutions Jie M. Zhang Robert B. Fisher 22 1 0 18 Nov 2021
LiMuSE: Lightweight Multi-modal Speaker Extraction Qinghua Liu Yating Huang Yunzhe Hao Jiaming Xu Bo Xu 105 6 0 07 Nov 2021
Personalized One-Shot Lipreading for an ALS Patient Bipasha Sen Aditya Agarwal Rudrabha Mukhopadhyay Vinay P. Namboodiri C. V. Jawahar LM&MA 50 3 0 02 Nov 2021
Evaluation of Human and Machine Face Detection using a Novel Distinctive Human Appearance Dataset Necdet Gurkan Jordan W. Suchow CVBM 57 3 0 01 Nov 2021
Visual Keyword Spotting with Attention Prajwal K R Liliane Momeni Triantafyllos Afouras Andrew Zisserman 72 13 0 29 Oct 2021
Neural Dubber: Dubbing for Videos According to Scripts Chenxu Hu Qiao Tian Tingle Li Yuping Wang Yuxuan Wang Hang Zhao DiffM VGen 99 43 0 15 Oct 2021
Advances and Challenges in Deep Lip Reading Marzieh Oghbaie Arian Sabaghi Kooshan Hashemifard Mohammad Akbari VLM 67 15 0 15 Oct 2021
Sub-word Level Lip Reading With Visual Attention Prajwal K R Triantafyllos Afouras Andrew Zisserman 91 93 0 14 Oct 2021
Audio-Visual Speech Recognition is Worth $32\times 32\times 8$ Voxels Dmitriy Serdyuk Otavio Braga Olivier Siohan ViT 87 7 0 20 Sep 2021
Invertible Frowns: Video-to-Video Facial Emotion Translation Ian H. Magnusson Aruna Sankaranarayanan A. Lippman VGen 76 7 0 16 Sep 2021
Large-vocabulary Audio-visual Speech Recognition in Noisy Environments Wentao Yu Steffen Zeiler D. Kolossa 88 3 0 10 Sep 2021
SimulLR: Simultaneous Lip Reading Transducer with Attention-Guided Adaptive Memory Zhijie Lin Zhou Zhao Haoyuan Li Jinglin Liu Meng Zhang Xingshan Zeng Xiaofei He 52 18 0 31 Aug 2021
Look Who's Talking: Active Speaker Detection in the Wild You Jin Kim Hee-Soo Heo Soyeon Choe Soo-Whan Chung Yoohwan Kwon Bong-Jin Lee Youngki Kwon Joon Son Chung 113 21 0 17 Aug 2021
AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person Xinsheng Wang Qicong Xie Jihua Zhu Lei Xie O. Scharenborg 120 19 0 09 Aug 2021
Spatio-Temporal Attention Mechanism and Knowledge Distillation for Lip Reading Shahd Elashmawy Marian M. Ramsis Hesham M. Eraqi Farah Eldeshnawy Hadeel Mabrouk Omar Abugabal Nourhan Sakr 77 1 0 07 Aug 2021
A Survey on Audio Synthesis and Audio-Visual Multimodal Processing Zhaofeng Shi 65 7 0 01 Aug 2021
Facetron: A Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations Seyun Um Jihyun Kim Jihyun Lee Hong-Goo Kang CVBM 139 4 0 26 Jul 2021
Parallel and High-Fidelity Text-to-Lip Generation Jinglin Liu Zhiying Zhu Yi Ren Wencan Huang Baoxing Huai N. Yuan Zhou Zhao 55 10 0 14 Jul 2021
Deep Learning Frameworks Applied For Audio-Visual Scene Classification L. D. Pham Alexander Schindler Mina Schütz Jasmin Lampert S. Schlarb Ross King 57 9 0 12 Jun 2021
Audio-visual scene classification: analysis of DCASE 2021 Challenge submissions Shanshan Wang Toni Heittola A. Mesaros Tuomas Virtanen 32 18 0 28 May 2021
Improving Sign Language Translation with Monolingual Data by Sign Back-Translation Hao Zhou Wen-gang Zhou Weizhen Qi Junfu Pu Houqiang Li SLR 65 194 0 26 May 2021
Lip reading using external viseme decoding J. Peymanfard Mohammad Reza Mohammadi Hossein Zeinali N. Mozayani 48 11 0 10 Apr 2021
Context-self contrastive pretraining for crop type semantic segmentation Michail Tarasiou R. Güler Stefanos Zafeiriou SSL 63 17 0 09 Apr 2021
MPN: Multimodal Parallel Network for Audio-Visual Event Localization Jiashuo Yu Ying Cheng Rui Feng 73 14 0 07 Apr 2021
Contrastive Learning of Global-Local Video Representations Shuang Ma Zhaoyang Zeng Daniel J. McDuff Yale Song SSL 104 7 0 07 Apr 2021
Can audio-visual integration strengthen robustness under multimodal attacks? Yapeng Tian Chenliang Xu AAML 102 39 0 05 Apr 2021
Robust Audio-Visual Instance Discrimination Pedro Morgado Ishan Misra Nuno Vasconcelos SSL 115 110 0 29 Mar 2021
Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation Jiyoung Lee Soo-Whan Chung Sunok Kim Hong-Goo Kang Kwanghoon Sohn 64 51 0 25 Mar 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning Mandela Patrick Yuki M. Asano Bernie Huang Ishan Misra Florian Metze Joao Henriques Andrea Vedaldi AI4TS 100 35 0 18 Mar 2021
KoDF: A Large-scale Korean DeepFake Detection Dataset Patrick Kwon J. You Gyuhyeon Nam Sungwoo Park Gyeongsu Chae 112 104 0 18 Mar 2021
End-to-end Audio-visual Speech Recognition with Conformers Pingchuan Ma Stavros Petridis Maja Pantic 160 234 0 12 Feb 2021
MAAS: Multi-modal Assignation for Active Speaker Detection Juan Carlos León Alcázar Fabian Caba Heilbron Ali K. Thabet Guohao Li 130 52 0 11 Jan 2021
Lip-reading with Hierarchical Pyramidal Convolution and Self-Attention Hang Chen Jun Du Yu Hu Lirong Dai Chin-Hui Lee Baocai Yin 64 6 0 28 Dec 2020
SpeakingFaces: A Large-Scale Multimodal Dataset of Voice Commands with Visual and Thermal Video Streams Madina Abdrakhmanova Askat Kuzdeuov Sheikh Jarju Yerbolat Khassanov Michael Lewis H. A. Varol CVBM 61 58 0 05 Dec 2020
AuthNet: A Deep Learning based Authentication Mechanism using Temporal Facial Feature Movements M. Raghavendra P. Omprakash B. Mukesh Sowmya Kamath CVBM 24 2 0 04 Dec 2020
Disentangling Homophemes in Lip Reading using Perplexity Analysis Souheil Fenghour Daqing Chen Kun Guo Perry Xiao 43 3 0 28 Nov 2020
End-to-end Silent Speech Recognition with Acoustic Sensing Jian Luo Jianzong Wang Ning Cheng Guilin Jiang Jing Xiao 16 7 0 23 Nov 2020
TaL: a synchronised multi-speaker corpus of ultrasound tongue imaging, audio, and lip videos M. Ribeiro Jennifer Sanger Jingxuan Zhang Aciel Eshky A. Wrench Korin Richmond Steve Renals LM&MA 48 35 0 19 Nov 2020