ResearchTrend.AI
vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations

12 October 2019
Alexei Baevski
Steffen Schneider
Michael Auli
    SSL

Papers citing "vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations"

50 / 182 papers shown
Exploring WavLM on Speech Enhancement
Hyungchan Song
Sanyuan Chen
Zhuo Chen
Yu-Huan Wu
Takuya Yoshioka
M. Tang
Jong Won Shin
Shujie Liu
19
17
0
18 Nov 2022
MelHuBERT: A simplified HuBERT on Mel spectrograms
Tzu-Quan Lin
Hung-yi Lee
Hao Tang
SSL
32
13
0
17 Nov 2022
Introducing Semantics into Speech Encoders
Derek Xu
Shuyan Dong
Changhan Wang
Suyoun Kim
Zhaojiang Lin
...
Alexei Baevski
Guan-Ting Lin
Hung-yi Lee
Yizhou Sun
Wei Wang
SSL
36
3
0
15 Nov 2022
Investigating Enhancements to Contrastive Predictive Coding for Human Activity Recognition
H. Haresamudram
Irfan Essa
Thomas Ploetz
AI4TS
30
15
0
11 Nov 2022
Biased Self-supervised learning for ASR
Florian Kreyssig
Yangyang Shi
Jinxi Guo
Leda Sari
Abdel-rahman Mohamed
P. Woodland
SSL
30
2
0
04 Nov 2022
Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation
Liyong Guo
Xiaoyu Yang
Quandong Wang
Yuxiang Kong
Zengwei Yao
...
Wei Kang
Long Lin
Mingshuang Luo
Piotr Żelasko
Daniel Povey
VLM
38
7
0
31 Oct 2022
Improved acoustic-to-articulatory inversion using representations from pretrained self-supervised learning models
Sathvik Udupa
Siddarth C
P. Ghosh
27
7
0
30 Oct 2022
token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text
Xianghu Yue
Junyi Ao
Xiaoxue Gao
Haizhou Li
SSL
26
8
0
30 Oct 2022
Learning Dependencies of Discrete Speech Representations with Neural Hidden Markov Models
Sung-Lin Yeh
Hao Tang
SSL
BDL
35
1
0
29 Oct 2022
Fast Yet Effective Speech Emotion Recognition with Self-distillation
Zhao Ren
Thanh Tam Nguyen
Yi Chang
Björn W. Schuller
23
11
0
26 Oct 2022
Multilevel Transformer For Multimodal Emotion Recognition
Junyi He
Meimei Wu
Meng Li
Xiaobo Zhu
Feng Ye
15
6
0
26 Oct 2022
Audio MFCC-gram Transformers for respiratory insufficiency detection in COVID-19
M. Gauy
Marcelo Finger
24
7
0
25 Oct 2022
Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation
Chen Wang
Yuchen Liu
Boxing Chen
Jiajun Zhang
Wei Luo
Zhongqiang Huang
Chengqing Zong
39
10
0
18 Oct 2022
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
Tzu-hsun Feng
Annie Dong
Ching-Feng Yeh
Shu-Wen Yang
Tzu-Quan Lin
...
Xuankai Chang
Shinji Watanabe
Abdel-rahman Mohamed
Shang-Wen Li
Hung-yi Lee
ELM
SSL
36
33
0
16 Oct 2022
Learning Invariant Representation and Risk Minimized for Unsupervised Accent Domain Adaptation
Chendong Zhao
Jianzong Wang
Xiaoyang Qu
Haoqian Wang
Jing Xiao
SSL
38
1
0
15 Oct 2022
JOIST: A Joint Speech and Text Streaming Model For ASR
Tara N. Sainath
Rohit Prabhavalkar
Ankur Bapna
Yu Zhang
Zhouyuan Huo
Zhehuai Chen
Bo-wen Li
Weiran Wang
Trevor Strohman
RALM
AuLLM
53
35
0
13 Oct 2022
On the Utility of Self-supervised Models for Prosody-related Tasks
Guan-Ting Lin
Chiyu Feng
Wei-Ping Huang
Yuan Tseng
Tzu-Han Lin
Chen An Li
Hung-yi Lee
Nigel G. Ward
23
48
0
13 Oct 2022
A context-aware knowledge transferring strategy for CTC-based ASR
Keda Lu
Kuan-Yu Chen
15
15
0
12 Oct 2022
CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning
Chutong Meng
Junyi Ao
Tom Ko
Mingxuan Wang
Haizhou Li
SSL
47
6
0
08 Oct 2022
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training
Zi-Hua Zhang
Long Zhou
Junyi Ao
Shujie Liu
Lirong Dai
Jinyu Li
Furu Wei
61
57
0
07 Oct 2022
Rhythmic Gesticulator: Rhythm-Aware Co-Speech Gesture Synthesis with Hierarchical Neural Embeddings
Tenglong Ao
Qingzhe Gao
Yuke Lou
Baoquan Chen
Libin Liu
SLR
32
59
0
04 Oct 2022
The Efficacy of Self-Supervised Speech Models for Audio Representations
Tung-Yu Wu
Chen An Li
Tzu-Han Lin
Tsung-Yuan Hsu
Hung-yi Lee
32
5
0
26 Sep 2022
AudioLM: a Language Modeling Approach to Audio Generation
Zalan Borsos
Raphaël Marinier
Damien Vincent
Eugene Kharitonov
Olivier Pietquin
...
Dominik Roblek
O. Teboul
David Grangier
Marco Tagliasacchi
Neil Zeghidour
AuLLM
73
573
0
07 Sep 2022
Equivariant Self-Supervision for Musical Tempo Estimation
Elio Quinton
37
9
0
03 Sep 2022
Learning Phone Recognition from Unpaired Audio and Phone Sequences Based on Generative Adversarial Network
Da-Rong Liu
Po-Chun Hsu
Yi-Chen Chen
Sung-Feng Huang
Shun-Po Chuang
Da-Yi Wu
Hung-yi Lee
GAN
28
7
0
29 Jul 2022
Domain Specific Wav2vec 2.0 Fine-tuning For The SE&R 2022 Challenge
A. I. S. Ferreira
Gustavo dos Reis Oliveira
27
3
0
29 Jul 2022
Towards Proper Contrastive Self-supervised Learning Strategies For Music Audio Representation
Jeong-Eun Choi
Seongwon Jang
Hyunsouk Cho
Sehee Chung
SSL
16
6
0
10 Jul 2022
Cross-Scale Vector Quantization for Scalable Neural Speech Coding
Xue Jiang
Xiulian Peng
Huaying Xue
Yuan Zhang
Yan Lu
MQ
39
9
0
07 Jul 2022
Wav2Vec-Aug: Improved self-supervised training with limited data
Anuroop Sriram
Michael Auli
Alexei Baevski
SSL
VLM
22
15
0
27 Jun 2022
Distilling a Pretrained Language Model to a Multilingual ASR Model
Kwanghee Choi
Hyung-Min Park
VLM
31
11
0
25 Jun 2022
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training
Chengyi Wang
Yiming Wang
Yu Wu
Sanyuan Chen
Jinyu Li
Shujie Liu
Furu Wei
SSL
30
18
0
21 Jun 2022
Investigation of Ensemble features of Self-Supervised Pretrained Models for Automatic Speech Recognition
Anjana Arunkumar
Vrunda N. Sukhadia
S. Umesh
30
10
0
11 Jun 2022
Joint Encoder-Decoder Self-Supervised Pre-training for ASR
Arunkumar A
S. Umesh
SSL
34
8
0
09 Jun 2022
Speaker Identification using Speech Recognition
Syeda Rabia Arshad
Syed Mujtaba Haider
Abdul Basit Mughal
28
1
0
29 May 2022
Contrastive Siamese Network for Semi-supervised Speech Recognition
S. Khorram
Jaeyoung Kim
Anshuman Tripathi
Han Lu
Qian Zhang
Hasim Sak
SSL
31
11
0
27 May 2022
Joint Training of Speech Enhancement and Self-supervised Model for Noise-robust ASR
Qiu-shi Zhu
Jie Zhang
Zitian Zhang
Lirong Dai
43
15
0
26 May 2022
Self-Supervised Speech Representation Learning: A Review
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
137
352
0
21 May 2022
Clinical outcome prediction under hypothetical interventions -- a representation learning framework for counterfactual reasoning
Yikuan Li
M. Mamouei
Shishir Rao
A. Hassaine
D. Canoy
Thomas Lukasiewicz
K. Rahimi
G. Salimi-Khorshidi
OOD
CML
AI4CE
31
1
0
15 May 2022
Silence is Sweeter Than Speech: Self-Supervised Model Using Silence to Store Speaker Information
Chiyu Feng
Po-Chun Hsu
Hung-yi Lee
SSL
31
8
0
08 May 2022
Quantifying Language Variation Acoustically with Few Resources
Martijn Bartelds
Martijn B. Wieling
27
11
0
05 May 2022
Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages
Felix Wu
Kwangyoun Kim
Shinji Watanabe
Kyu Jeong Han
Ryan T. McDonald
Kilian Q. Weinberger
Yoav Artzi
SyDa
48
38
0
02 May 2022
ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers
Kaizhi Qian
Yang Zhang
Heting Gao
Junrui Ni
Cheng-I Jeff Lai
David D. Cox
M. Hasegawa-Johnson
Shiyu Chang
DRL
30
110
0
20 Apr 2022
How Does Pre-trained Wav2Vec 2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications
Juan Pablo Zuluaga
Amrutha Prasad
Iuliia Nigmatulina
Seyyed Saeed Sarfjoo
P. Motlícek
Matthias Kleinert
H. Helmke
Oliver Ohneiser
Qingran Zhan
29
44
0
31 Mar 2022
Autoregressive Co-Training for Learning Discrete Speech Representations
Sung-Lin Yeh
Hao Tang
SSL
27
6
0
29 Mar 2022
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities
Hsiang-Sheng Tsai
Heng-Jui Chang
Wen-Chin Huang
Zili Huang
Kushal Lakhotia
...
Hsuan-Jui Chen
Shang-Wen Li
Shinji Watanabe
Abdel-rahman Mohamed
Hung-yi Lee
26
109
0
14 Mar 2022
Audio Self-supervised Learning: A Survey
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabaleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
40
106
0
02 Mar 2022
A Brief Overview of Unsupervised Neural Speech Representation Learning
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
Lars Maaløe
Christian Igel
BDL
AI4TS
SSL
19
11
0
01 Mar 2022
Word Segmentation on Discovered Phone Units with Dynamic Programming and Self-Supervised Scoring
Herman Kamper
34
25
0
24 Feb 2022
Improving Automatic Speech Recognition for Non-Native English with Transfer Learning and Language Model Decoding
Peter Sullivan
Toshiko Shibano
Muhammad Abdul-Mageed
44
11
0
10 Feb 2022
Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling
Puyuan Peng
David Harwath
SSL
43
26
0
07 Feb 2022