Visual Keyword Spotting with Attention

29 October 2021

Triantafyllos Afouras

Andrew Zisserman

Papers citing "Visual Keyword Spotting with Attention"

Title | Authors | Tags | Counts | Date
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation | J. Choi, Ji-Hoon Kim, Kim Sung-Bin, Tae-Hyun Oh, Joon Son Chung | DiffM | 49 0 0 | 29 Apr 2025
Enhancing Visual Forced Alignment with Local Context-Aware Feature Extraction and Multi-Task Learning | Yi He, Lei Yang, Shilin Wang | | 61 0 0 | 05 Mar 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation | Akam Rahimi, Triantafyllos Afouras, Andrew Zisserman | | 42 28 0 | 02 Jan 2025
Towards Accurate Lip-to-Speech Synthesis in-the-Wild | Sindhu B. Hegde, Rudrabha Mukhopadhyay, C. V. Jawahar, Vinay P. Namboodiri | | 27 4 0 | 02 Mar 2024
Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition | David Gimeno-Gómez, Carlos David Martínez Hinarejos | | 32 1 0 | 20 Feb 2024
PhonMatchNet: Phoneme-Guided Zero-Shot Keyword Spotting for User-Defined Keywords | Yong-Hyeok Lee, Namhyun Cho | | 24 18 0 | 31 Aug 2023
WhisperX: Time-Accurate Speech Transcription of Long-Form Audio | Max Bain, Jaesung Huh, Tengda Han, Andrew Zisserman | | 45 210 0 | 01 Mar 2023
Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video | Minsu Kim, Chae Won Kim, Y. Ro | CVBM, DiffM | 38 3 0 | 27 Feb 2023
Weakly-supervised Fingerspelling Recognition in British Sign Language Videos | Prajwal K R, Hannah Bull, Liliane Momeni, Samuel Albanie, Gül Varol, Andrew Zisserman | | 29 14 0 | 16 Nov 2022
Automatic dense annotation of large-vocabulary sign language videos | Liliane Momeni, Hannah Bull, Prajwal K R, Samuel Albanie, Gül Varol, Andrew Zisserman | SLR | 32 18 0 | 04 Aug 2022
Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language | Otniel-Bogdan Mercea, Lukas Riesch, A. Sophia Koepke, Zeynep Akata | | 33 48 0 | 07 Mar 2022
Is Space-Time Attention All You Need for Video Understanding? | Gedas Bertasius, Heng Wang, Lorenzo Torresani | ViT | 283 1,989 0 | 09 Feb 2021
Lip Reading Sentences in the Wild | Joon Son Chung, A. Senior, Oriol Vinyals, Andrew Zisserman | | 185 784 0 | 16 Nov 2016
Trainable Frontend For Robust and Far-Field Keyword Spotting | Yuxuan Wang, Pascal Getreuer, Thad Hughes, R. Lyon, Rif A. Saurous | | 61 142 0 | 19 Jul 2016