2011.04084
Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations
8 November 2020
Shahram Ghorbani
Yashesh Gaur
Yu Shi
Jinyu Li
Papers citing "Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations"
5 / 5 papers shown
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
29
15
0
29 Mar 2023
Can Visual Context Improve Automatic Speech Recognition for an Embodied Agent?
Pradip Pramanick
Chayan Sarkar
24
7
0
21 Oct 2022
Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations
Dan Oneaţă
H. Cucu
19
19
0
27 Apr 2022
Recent Advances in End-to-End Automatic Speech Recognition
Jinyu Li
VLM
37
363
0
02 Nov 2021
OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation
Jing Liu
Xinxin Zhu
Fei Liu
Longteng Guo
Zijia Zhao
...
Weining Wang
Hanqing Lu
Shiyu Zhou
Jiajun Zhang
Jinqiao Wang
39
37
0
01 Jul 2021