Looking Enhances Listening: Recovering Missing Speech Using Images

13 February 2020

Papers citing "Looking Enhances Listening: Recovering Missing Speech Using Images"


VHASR: A Multimodal Speech Recognition System With Vision Hotwords; Jiliang Hu, Zuchao Li, Ping Wang, Haojun Ai, Lefei Zhang, Hai Zhao; 62 1 0; 01 Oct 2024
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR; Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid; 58 15 0; 29 Mar 2023
Multimodal Speech Recognition for Language-Guided Embodied Agents; Allen Chang, Xiaoyuan Zhu, Aarav Monga, Seoho Ahn, Tejas Srinivasan, Jesse Thomason; AuLLM; 105 3 0; 27 Feb 2023
AVATAR: Unconstrained Audiovisual Speech Recognition; Valentin Gabeur, Paul Hongsuck Seo, Arsha Nagrani, Chen Sun, Alahari Karteek, Cordelia Schmid; 72 11 0; 15 Jun 2022
Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations; Dan Oneaţă, H. Cucu; 51 19 0; 27 Apr 2022
Adversarial Attacks on Speech Recognition Systems for Mission-Critical Applications: A Survey; Ngoc Dung Huynh, Mohamed Reda Bouadjenek, Imran Razzak, Kevin Lee, Chetan Arora, Ali Hassani, A. Zaslavsky; AAML; 65 6 0; 22 Feb 2022
Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations; Shahram Ghorbani, Yashesh Gaur, Yu Shi, Jinyu Li; 75 14 0; 08 Nov 2020
Multimodal Speech Recognition with Unstructured Audio Masking; Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott; CVBM; 48 10 0; 16 Oct 2020
Fine-Grained Grounding for Multimodal Speech Recognition; Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott; 76 11 0; 05 Oct 2020
Experience Grounds Language; Yonatan Bisk, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, ..., Angeliki Lazaridou, Jonathan May, Aleksandr Nisnevich, Nicolas Pinto, Joseph P. Turian; 126 361 0; 21 Apr 2020