arXiv 2310.14946
Intuitive Multilingual Audio-Visual Speech Recognition with a Single-Trained Model
23 October 2023
Joanna Hong
Se Jin Park
Y. Ro
VLM
Papers citing "Intuitive Multilingual Audio-Visual Speech Recognition with a Single-Trained Model"
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
Andrew Rouditchenko
Saurabhchand Bhati
Samuel Thomas
Hilde Kuehne
Rogerio Feris
191
1
0
03 Feb 2025
Tailored Design of Audio-Visual Speech Recognition Models using Branchformers
David Gimeno-Gómez
Carlos David Martínez Hinarejos
141
2
0
09 Jul 2024
Learning Cross-lingual Visual Speech Representations
Andreas Zinonos
A. Haliassos
Pingchuan Ma
Stavros Petridis
Maja Pantic
SSL
45
8
0
14 Mar 2023
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
Xize Cheng
Lin Li
Tao Jin
Rongjie Huang
Wang Lin
Zehan Wang
Huangdai Liu
Yejin Wang
Aoxiong Yin
Zhou Zhao
79
25
0
09 Mar 2023
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
Mohamed Anwar
Bowen Shi
Vedanuj Goswami
Wei-Ning Hsu
J. Pino
Changhan Wang
71
39
0
01 Mar 2023
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
Hao Li
Jinguo Zhu
Xiaohu Jiang
Xizhou Zhu
Hongsheng Li
...
Xiaohua Wang
Yu Qiao
Xiaogang Wang
Wenhai Wang
Jifeng Dai
MLLM
77
57
0
17 Nov 2022
Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading
Minsu Kim
Jeong Hun Yeo
Yong Man Ro
67
64
0
04 Apr 2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction
Bowen Shi
Wei-Ning Hsu
Kushal Lakhotia
Abdel-rahman Mohamed
SSL
113
321
0
05 Jan 2022
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma
Stavros Petridis
Maja Pantic
143
234
0
12 Feb 2021
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
301
5,849
0
20 Jun 2020
Deep Audio-Visual Speech Recognition
Triantafyllos Afouras
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
98
710
0
06 Sep 2018
LRS3-TED: a large-scale dataset for visual speech recognition
Triantafyllos Afouras
Joon Son Chung
Andrew Zisserman
67
445
0
03 Sep 2018
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
356
2,287
0
14 Jun 2018
Deep Multimodal Learning for Audio-Visual Speech Recognition
Youssef Mroueh
E. Marcheret
Vaibhava Goel
62
227
0
22 Jan 2015