AVATAR: Unconstrained Audiovisual Speech Recognition

15 June 2022

Valentin Gabeur

Paul Hongsuck Seo

Karteek Alahari

Cordelia Schmid

Papers citing "AVATAR: Unconstrained Audiovisual Speech Recognition"


Title
Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides | Jinghua Zhao, Yuhang Jia, Shiyao Wang, Jiaming Zhou, Hui Wang, Yong Qin | 39 0 0 | 21 Apr 2025
Character-aware audio-visual subtitling in context | Jaesung Huh, Andrew Zisserman | 41 0 0 | 14 Oct 2024
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts | Yihan Wu, Yifan Peng, Yichen Lu, Xuankai Chang, Ruihua Song, Shinji Watanabe | 52 2 0 | 19 Sep 2024
SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data | Yichen Lu, Álvaro Huertas-García, Xuankai Chang, Hengwei Bian, Soumi Maiti, Shinji Watanabe | 46 2 0 | 01 Aug 2024
SlideSpeech: A Large-Scale Slide-Enriched Audio-Visual Corpus | Haoxu Wang, Fan Yu, Xian Shi, Yuezhang Wang, Shiliang Zhang, Ming Li | 37 11 0 | 11 Sep 2023
An Outlook into the Future of Egocentric Vision | Chiara Plizzari, Gabriele Goletto, Antonino Furnari, Siddhant Bansal, Francesco Ragusa, G. Farinella, Dima Damen, Tatiana Tommasi | EgoV 40 38 0 | 14 Aug 2023
VILAS: Exploring the Effects of Vision and Language Context in Automatic Speech Recognition | Ziyi Ni, Minglun Han, Feilong Chen, Linghui Meng, Jing Shi, Shuang Xu, Bo Xu | 42 0 0 | 31 May 2023
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization | Puyuan Peng, Brian Yan, Shinji Watanabe, David Harwath | VLM LRM 40 46 0 | 18 May 2023
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR | Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid | 29 15 0 | 29 Mar 2023
AVATAR submission to the Ego4D AV Transcription Challenge | Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid | 30 0 0 | 18 Nov 2022
End-to-end Audio-visual Speech Recognition with Conformers | Pingchuan Ma, Stavros Petridis, M. Pantic | 86 225 0 | 12 Feb 2021
Lip Reading Sentences in the Wild | Joon Son Chung, A. Senior, Oriol Vinyals, Andrew Zisserman | 185 784 0 | 16 Nov 2016
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation | Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean | AIMat 716 6,748 0 | 26 Sep 2016