ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2002.05639
  4. Cited By
Looking Enhances Listening: Recovering Missing Speech Using Images

Looking Enhances Listening: Recovering Missing Speech Using Images

13 February 2020
Tejas Srinivasan
Ramon Sanabria
Florian Metze
ArXiv (abs)PDFHTML

Papers citing "Looking Enhances Listening: Recovering Missing Speech Using Images"

10 / 10 papers shown
Title
VHASR: A Multimodal Speech Recognition System With Vision Hotwords
VHASR: A Multimodal Speech Recognition System With Vision Hotwords
Jiliang Hu
Zuchao Li
Ping Wang
Haojun Ai
Lefei Zhang
Hai Zhao
62
1
0
01 Oct 2024
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot
  AV-ASR
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
58
15
0
29 Mar 2023
Multimodal Speech Recognition for Language-Guided Embodied Agents
Multimodal Speech Recognition for Language-Guided Embodied Agents
Allen Chang
Xiaoyuan Zhu
Aarav Monga
Seoho Ahn
Tejas Srinivasan
Jesse Thomason
AuLLM
105
3
0
27 Feb 2023
AVATAR: Unconstrained Audiovisual Speech Recognition
AVATAR: Unconstrained Audiovisual Speech Recognition
Valentin Gabeur
Paul Hongsuck Seo
Arsha Nagrani
Chen Sun
Alahari Karteek
Cordelia Schmid
72
11
0
15 Jun 2022
Improving Multimodal Speech Recognition by Data Augmentation and Speech
  Representations
Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations
Dan Oneaţă
H. Cucu
51
19
0
27 Apr 2022
Adversarial Attacks on Speech Recognition Systems for Mission-Critical
  Applications: A Survey
Adversarial Attacks on Speech Recognition Systems for Mission-Critical Applications: A Survey
Ngoc Dung Huynh
Mohamed Reda Bouadjenek
Imran Razzak
Kevin Lee
Chetan Arora
Ali Hassani
A. Zaslavsky
AAML
65
6
0
22 Feb 2022
Listen, Look and Deliberate: Visual context-aware speech recognition
  using pre-trained text-video representations
Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations
Shahram Ghorbani
Yashesh Gaur
Yu Shi
Jinyu Li
75
14
0
08 Nov 2020
Multimodal Speech Recognition with Unstructured Audio Masking
Multimodal Speech Recognition with Unstructured Audio Masking
Tejas Srinivasan
Ramon Sanabria
Florian Metze
Desmond Elliott
CVBM
48
10
0
16 Oct 2020
Fine-Grained Grounding for Multimodal Speech Recognition
Fine-Grained Grounding for Multimodal Speech Recognition
Tejas Srinivasan
Ramon Sanabria
Florian Metze
Desmond Elliott
76
11
0
05 Oct 2020
Experience Grounds Language
Experience Grounds Language
Yonatan Bisk
Ari Holtzman
Jesse Thomason
Jacob Andreas
Yoshua Bengio
...
Angeliki Lazaridou
Jonathan May
Aleksandr Nisnevich
Nicolas Pinto
Joseph P. Turian
126
361
0
21 Apr 2020
1