ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2206.07684
  4. Cited By
AVATAR: Unconstrained Audiovisual Speech Recognition

AVATAR: Unconstrained Audiovisual Speech Recognition

15 June 2022
Valentin Gabeur
Paul Hongsuck Seo
Arsha Nagrani
Chen Sun
Alahari Karteek
Cordelia Schmid
ArXivPDFHTML

Papers citing "AVATAR: Unconstrained Audiovisual Speech Recognition"

13 / 13 papers shown
Title
Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides
Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides
Jinghua Zhao
Yuhang Jia
Shiyao Wang
Jiaming Zhou
Hui Wang
Yong Qin
39
0
0
21 Apr 2025
Character-aware audio-visual subtitling in context
Character-aware audio-visual subtitling in context
Jaesung Huh
Andrew Zisserman
41
0
0
14 Oct 2024
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts
Yihan Wu
Yifan Peng
Yichen Lu
Xuankai Chang
Ruihua Song
Shinji Watanabe
52
2
0
19 Sep 2024
SynesLM: A Unified Approach for Audio-visual Speech Recognition and
  Translation via Language Model and Synthetic Data
SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data
Yichen Lu
Álvaro Huertas-García
Xuankai Chang
Hengwei Bian
Soumi Maiti
Shinji Watanabe
46
2
0
01 Aug 2024
SlideSpeech: A Large-Scale Slide-Enriched Audio-Visual Corpus
SlideSpeech: A Large-Scale Slide-Enriched Audio-Visual Corpus
Haoxu Wang
Fan Yu
Xian Shi
Yuezhang Wang
Shiliang Zhang
Ming Li
37
11
0
11 Sep 2023
An Outlook into the Future of Egocentric Vision
An Outlook into the Future of Egocentric Vision
Chiara Plizzari
Gabriele Goletto
Antonino Furnari
Siddhant Bansal
Francesco Ragusa
G. Farinella
Dima Damen
Tatiana Tommasi
EgoV
40
38
0
14 Aug 2023
VILAS: Exploring the Effects of Vision and Language Context in Automatic
  Speech Recognition
VILAS: Exploring the Effects of Vision and Language Context in Automatic Speech Recognition
Ziyi Ni
Minglun Han
Feilong Chen
Linghui Meng
Jing Shi
Shuang Xu
Bo Xu
42
0
0
31 May 2023
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot
  Task Generalization
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization
Puyuan Peng
Brian Yan
Shinji Watanabe
David Harwath
VLM
LRM
40
46
0
18 May 2023
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot
  AV-ASR
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
29
15
0
29 Mar 2023
AVATAR submission to the Ego4D AV Transcription Challenge
AVATAR submission to the Ego4D AV Transcription Challenge
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
30
0
0
18 Nov 2022
End-to-end Audio-visual Speech Recognition with Conformers
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma
Stavros Petridis
M. Pantic
86
225
0
12 Feb 2021
Lip Reading Sentences in the Wild
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
185
784
0
16 Nov 2016
Google's Neural Machine Translation System: Bridging the Gap between
  Human and Machine Translation
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,748
0
26 Sep 2016
1