Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations

8 November 2020

Papers citing "Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations"


Title
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR | Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid | 29 15 0 | 29 Mar 2023
Can Visual Context Improve Automatic Speech Recognition for an Embodied Agent? | Pradip Pramanick, Chayan Sarkar | 24 7 0 | 21 Oct 2022
Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations | Dan Oneaţă, H. Cucu | 19 19 0 | 27 Apr 2022
Recent Advances in End-to-End Automatic Speech Recognition | Jinyu Li [VLM] | 37 363 0 | 02 Nov 2021
OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation | Jing Liu, Xinxin Zhu, Fei Liu, Longteng Guo, Zijia Zhao, ..., Weining Wang, Hanqing Lu, Shiyu Zhou, Jiajun Zhang, Jinqiao Wang | 39 37 0 | 01 Jul 2021