arXiv: 2309.17395
AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition
29 September 2023
Andrew Rouditchenko
R. Collobert
Tatiana Likhomanenko
VLM
Papers citing or cited by "AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition"
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition
Andrew Rouditchenko
Saurabhchand Bhati
Samuel Thomas
Hilde Kuehne
Rogerio Feris
03 Feb 2025
dMel: Speech Tokenization made Simple
Richard He Bai
Tatiana Likhomanenko
Ruixiang Zhang
Zijin Gu
Zakaria Aldeneh
Navdeep Jaitly
22 Jul 2024
Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels
Pingchuan Ma
A. Haliassos
Adriana Fernandez-Lopez
Honglie Chen
Stavros Petridis
Maja Pantic
25 Mar 2023
Conformers are All You Need for Visual Speech Recognition
Oscar Chang
H. Liao
Dmitriy Serdyuk
Ankit Parag Shah
Olivier Siohan
VLM
17 Feb 2023
AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations
Jiachen Lian
Alexei Baevski
Wei-Ning Hsu
Michael Auli
SSL
10 Feb 2023
Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
06 Dec 2022
Self-Supervised Audio-Visual Speech Representations Learning By Multimodal Self-Distillation
Jing-Xuan Zhang
Genshun Wan
Zhenhua Ling
Jia Pan
Jianqing Gao
Cong Liu
SSL
06 Dec 2022
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality
Wei-Ning Hsu
Bowen Shi
SSL
VLM
14 Jul 2022
Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma
Stavros Petridis
Maja Pantic
VLM
26 Feb 2022
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL
VLM
ViT
07 Feb 2022
Self-supervised Learning with Random-projection Quantizer for Speech Recognition
Chung-Cheng Chiu
James Qin
Yu Zhang
Jiahui Yu
Yonghui Wu
SSL
03 Feb 2022
Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
ViT
25 Jan 2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction
Bowen Shi
Wei-Ning Hsu
Kushal Lakhotia
Abdel-rahman Mohamed
SSL
05 Jan 2022
LiRA: Learning Visual Speech Representations from Audio through Self-supervision
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Björn W. Schuller
Maja Pantic
SSL
16 Jun 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
14 Jun 2021
CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings
Tatiana Likhomanenko
Qiantong Xu
Gabriel Synnaeve
R. Collobert
A. Rogozhnikov
OOD
ViT
06 Jun 2021
End-to-end Audio-visual Speech Recognition with Conformers
Pingchuan Ma
Stavros Petridis
Maja Pantic
12 Feb 2021
Rethinking Evaluation in ASR: Are Our Models Robust Enough?
Tatiana Likhomanenko
Qiantong Xu
Vineel Pratap
Paden Tomasello
Jacob Kahn
Gilad Avidov
R. Collobert
Gabriel Synnaeve
22 Oct 2020
Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition
Yu Zhang
James Qin
Daniel S. Park
Wei Han
Chung-Cheng Chiu
Ruoming Pang
Quoc V. Le
Yonghui Wu
VLM
SSL
20 Oct 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
20 Jun 2020
Self-Training for End-to-End Speech Translation
J. Pino
Qiantong Xu
Xutai Ma
M. Dousti
Yun Tang
03 Jun 2020
Iterative Pseudo-Labeling for Speech Recognition
Qiantong Xu
Tatiana Likhomanenko
Jacob Kahn
Awni Y. Hannun
Gabriel Synnaeve
R. Collobert
VLM
19 May 2020
ASR is all you need: cross-modal distillation for lip reading
Triantafyllos Afouras
Joon Son Chung
Andrew Zisserman
28 Nov 2019
ReMixMatch: Semi-Supervised Learning with Distribution Alignment and Augmentation Anchoring
David Berthelot
Nicholas Carlini
E. D. Cubuk
Alexey Kurakin
Kihyuk Sohn
Han Zhang
Colin Raffel
21 Nov 2019
Recurrent Neural Network Transducer for Audio-Visual Speech Recognition
Takaki Makino
H. Liao
Yannis Assael
Brendan Shillingford
Basi García
Otavio Braga
Olivier Siohan
08 Nov 2019
Revisiting Self-Training for Neural Sequence Generation
Junxian He
Jiatao Gu
Jiajun Shen
Marc'Aurelio Ranzato
SSL
LRM
30 Sep 2019
Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan
Edouard Grave
Armand Joulin
25 Sep 2019
Self-Training for End-to-End Speech Recognition
Jacob Kahn
Ann Lee
Awni Y. Hannun
SSL
19 Sep 2019
Deep Audio-Visual Speech Recognition
Triantafyllos Afouras
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
06 Sep 2018
LRS3-TED: a large-scale dataset for visual speech recognition
Triantafyllos Afouras
Joon Son Chung
Andrew Zisserman
03 Sep 2018
Large-Scale Visual Speech Recognition
Brendan Shillingford
Yannis Assael
Matthew W. Hoffman
T. Paine
Cían Hughes
...
Marie Mulville
Ben Coppin
Ben Laurie
A. Senior
Nando de Freitas
13 Jul 2018
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
14 Jun 2018
Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates
Taku Kudo
29 Apr 2018
Combining Residual Networks with LSTMs for Lipreading
Themos Stafylakis
Georgios Tzimiropoulos
VLM
12 Mar 2017
ModDrop: adaptive multi-modal gesture recognition
Natalia Neverova
Christian Wolf
Graham W. Taylor
Florian Nebout
31 Dec 2014