ResearchTrend.AI
Visual Keyword Spotting with Attention

29 October 2021
Prajwal K R
Liliane Momeni
Triantafyllos Afouras
Andrew Zisserman
arXiv: 2110.15957

Papers citing "Visual Keyword Spotting with Attention"

14 / 14 papers shown
Title
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
J. Choi
Ji-Hoon Kim
Kim Sung-Bin
Tae-Hyun Oh
Joon Son Chung
DiffM
49
0
0
29 Apr 2025
Enhancing Visual Forced Alignment with Local Context-Aware Feature Extraction and Multi-Task Learning
Yi He
Lei Yang
Shilin Wang
61
0
0
05 Mar 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
42
28
0
02 Jan 2025
Towards Accurate Lip-to-Speech Synthesis in-the-Wild
Sindhu B. Hegde
Rudrabha Mukhopadhyay
C. V. Jawahar
Vinay P. Namboodiri
27
4
0
02 Mar 2024
Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition
David Gimeno-Gómez
Carlos David Martínez Hinarejos
32
1
0
20 Feb 2024
PhonMatchNet: Phoneme-Guided Zero-Shot Keyword Spotting for User-Defined Keywords
Yong-Hyeok Lee
Namhyun Cho
24
18
0
31 Aug 2023
WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
Max Bain
Jaesung Huh
Tengda Han
Andrew Zisserman
45
210
0
01 Mar 2023
Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video
Minsu Kim
Chae Won Kim
Y. Ro
CVBM
DiffM
38
3
0
27 Feb 2023
Weakly-supervised Fingerspelling Recognition in British Sign Language Videos
Prajwal K R
Hannah Bull
Liliane Momeni
Samuel Albanie
Gül Varol
Andrew Zisserman
29
14
0
16 Nov 2022
Automatic dense annotation of large-vocabulary sign language videos
Liliane Momeni
Hannah Bull
Prajwal K R
Samuel Albanie
Gül Varol
Andrew Zisserman
SLR
32
18
0
04 Aug 2022
Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language
Otniel-Bogdan Mercea
Lukas Riesch
A. Sophia Koepke
Zeynep Akata
33
48
0
07 Mar 2022
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
283
1,989
0
09 Feb 2021
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
185
784
0
16 Nov 2016
Trainable Frontend For Robust and Far-Field Keyword Spotting
Yuxuan Wang
Pascal Getreuer
Thad Hughes
R. Lyon
Rif A. Saurous
61
142
0
19 Jul 2016