ResearchTrend.AI
wav2vec: Unsupervised Pre-training for Speech Recognition

11 April 2019
Steffen Schneider
Alexei Baevski
R. Collobert
Michael Auli
    SSL

Papers citing "wav2vec: Unsupervised Pre-training for Speech Recognition"

50 / 88 papers shown
Evaluation of Deep Audio Representations for Hearables
Fabian Gröger
Pascal Baumann
Ludovic Amruthalingam
Laurent Simon
Ruksana Giurda
Simone Lionetti
93
0
0
10 Feb 2025
Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling
Jakob Poncelet
Hugo Van hamme
90
0
0
05 Feb 2025
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
Gaojie Lin
Jianwen Jiang
Jiaqi Yang
Zerong Zheng
Chao Liang
DiffM
VGen
194
12
0
03 Feb 2025
Enhancing and Exploring Mild Cognitive Impairment Detection with W2V-BERT-2.0
Yueguan Wang
Tatsunari Matsushima
Soichiro Matsushima
Toshimitsu Sakai
43
0
0
28 Jan 2025
A Probabilistic Model for Self-Supervised Learning
Maximilian Fleissner
P. Esser
D. Ghoshdastidar
SSL
BDL
104
1
0
22 Jan 2025
WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning
Rajath Rao
Adithya Ganesan
Oscar Kjell
Jonah Luby
Akshay Raghavan
...
B. Luft
Camilo Ruggero
Neville Ryant
R. Kotov
H. Andrew Schwartz
42
0
0
15 Jan 2025
Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads
Federico Nocentini
T. Besnier
Claudio Ferrari
Sylvain Arguillere
Stefano Berretti
Mohamed Daoudi
59
1
0
14 Oct 2024
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation
Jiahao Cui
Hui Li
Yao Yao
Hao Zhu
Hanlin Shang
Kaihui Cheng
Hang Zhou
Siyu Zhu
Jingdong Wang
DiffM
VGen
51
22
0
10 Oct 2024
InfantCryNet: A Data-driven Framework for Intelligent Analysis of Infant Cries
Mengze Hong
Chen Jason Zhang
Lingxiao Yang
Yuanfeng Song
Di Jiang
44
2
0
29 Sep 2024
M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses
Yufeng Yang
Desh Raj
Ju Lin
Niko Moritz
Junteng Jia
...
Egor Lakomkin
Yiteng Huang
Jacob Donley
Jay Mahadeokar
Ozlem Kalinli
39
2
0
17 Sep 2024
Universal Pooling Method of Multi-layer Features from Pretrained Models for Speaker Verification
Jin Sob Kim
Hyun Joon Park
Wooseok Shin
Sung Won Han
SLR
52
0
0
12 Sep 2024
What is lost in Normalization? Exploring Pitfalls in Multilingual ASR Model Evaluations
Kavya Manohar
Leena G Pillai
39
3
0
04 Sep 2024
Speech Representation Learning Revisited: The Necessity of Separate Learnable Parameters and Robust Data Augmentation
Hemant Yadav
Sunayana Sitaram
R. Shah
SSL
54
0
0
20 Aug 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
47
4
0
21 Jul 2024
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions
Zhiyuan Chen
Jiajiong Cao
Zhiquan Chen
Yuming Li
Chenguang Ma
VGen
45
49
0
11 Jul 2024
STONE: Self-supervised Tonality Estimator
Yuexuan Kong
Vincent Lostanlen
Gabriel Meseguer-Brocal
Stella Wong
Mathieu Lagrange
Romain Hennequin
47
1
0
10 Jul 2024
Refining Self-Supervised Learnt Speech Representation using Brain Activations
Hengyu Li
Kangdi Mei
Zhaoci Liu
Yang Ai
Liping Chen
Jie Zhang
Zhenhua Ling
SSL
29
1
0
12 Jun 2024
PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models
Runyan Yang
Huibao Yang
Xiqing Zhang
Tiantian Ye
Ying Liu
Yingying Gao
Shilei Zhang
Chao Deng
Junlan Feng
36
0
0
12 Jun 2024
MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations
Hemant Yadav
Sunayana Sitaram
R. Shah
SSL
62
1
0
09 Jun 2024
Towards Supervised Performance on Speaker Verification with Self-Supervised Learning by Leveraging Large-Scale ASR Models
Victor Miara
Theo Lepage
Reda Dehak
37
1
0
04 Jun 2024
SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model
Siavash Shams
Sukru Samet Dindar
Xilin Jiang
N. Mesgarani
Mamba
74
19
0
20 May 2024
AdaKD: Dynamic Knowledge Distillation of ASR models using Adaptive Loss Weighting
Shreyan Ganguly
Roshan Nayak
Rakshith Rao
Ujan Deb
AP Prathosh
32
1
0
11 May 2024
Active Dendrites Enable Efficient Continual Learning in Time-To-First-Spike Neural Networks
Lorenzo Pes
Rick Luiken
Federico Corradi
Charlotte Frenkel
40
5
0
30 Apr 2024
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition
Chen Chen
Ruizhe Li
Yuchen Hu
Sabato Marco Siniscalchi
Pin-Yu Chen
Ensiong Chng
Chao-Han Huck Yang
41
20
0
08 Feb 2024
Self-supervised Learning for Electroencephalogram: A Systematic Survey
Weining Weng
Yang Gu
Shuai Guo
Yuan Ma
Zhaohua Yang
Yuchen Liu
Yiqiang Chen
38
12
0
09 Jan 2024
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Ziyang Ma
Zhisheng Zheng
Jiaxin Ye
Jinchao Li
Zhifu Gao
Shiliang Zhang
Xie Chen
MDE
SLR
SSL
27
92
0
23 Dec 2023
Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo
Valentin Vielzeuf
SSL
38
1
0
18 Dec 2023
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
Shaojin Ding
David Qiu
David Rim
Yanzhang He
Oleg Rybakov
...
Tara N. Sainath
Zhonglin Han
Jian Li
Amir Yazdanbakhsh
Shivani Agrawal
MQ
34
9
0
13 Dec 2023
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
35
17
0
27 Nov 2023
Indonesian Automatic Speech Recognition with XLSR-53
Panji Arisaputra
Amalia Zahra
21
6
0
20 Aug 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yiitan Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
47
224
0
10 Aug 2023
Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition
Weidong Chen
Xiaofen Xing
Peihao Chen
Xiangmin Xu
VLM
38
35
0
20 Jul 2023
Multimodal Audio-textual Architecture for Robust Spoken Language Understanding
Anderson R. Avila
Mehdi Rezagholizadeh
Chao Xing
21
1
0
12 Jun 2023
Universal Source Separation with Weakly Labelled Data
Qiuqiang Kong
K. Chen
Haohe Liu
Xingjian Du
Taylor Berg-Kirkpatrick
Shlomo Dubnov
Mark D. Plumbley
18
17
0
11 May 2023
LABO: Towards Learning Optimal Label Regularization via Bi-level Optimization
Peng Lu
Ahmad Rashid
I. Kobyzev
Mehdi Rezagholizadeh
Philippe Langlais
13
0
0
08 May 2023
A multimodal dynamical variational autoencoder for audiovisual speech representation learning
Samir Sadok
Simon Leglaive
Laurent Girin
Xavier Alameda-Pineda
Renaud Séguier
38
11
0
05 May 2023
Exploring Representation Learning for Small-Footprint Keyword Spotting
Fan Cui
Liyong Guo
Quandong Wang
Peng Gao
Yujun Wang
SSL
22
3
0
20 Mar 2023
Knowledge-aware Bayesian Co-attention for Multimodal Emotion Recognition
Zihan Zhao
Yu Wang
Yanfeng Wang
22
18
0
20 Feb 2023
Imitator: Personalized Speech-driven 3D Facial Animation
Balamurugan Thambiraja
I. Habibie
S. Aliakbarian
Darren Cosker
Christian Theobalt
Justus Thies
CVBM
47
49
0
30 Dec 2022
Biased Self-supervised learning for ASR
Florian Kreyssig
Yangyang Shi
Jinxi Guo
Leda Sari
Abdel-rahman Mohamed
P. Woodland
SSL
35
2
0
04 Nov 2022
CTCBERT: Advancing Hidden-unit BERT with CTC Objectives
Ruchao Fan
Yiming Wang
Yashesh Gaur
Jinyu Li
43
7
0
16 Oct 2022
AudioGen: Textually Guided Audio Generation
Felix Kreuk
Gabriel Synnaeve
Adam Polyak
Uriel Singer
Alexandre Défossez
Jade Copet
Devi Parikh
Yaniv Taigman
Yossi Adi
DiffM
27
290
0
30 Sep 2022
Improving the Cross-Lingual Generalisation in Visual Question Answering
Farhad Nooralahzadeh
Rico Sennrich
37
5
0
07 Sep 2022
Equivariant Self-Supervision for Musical Tempo Estimation
Elio Quinton
42
9
0
03 Sep 2022
SampleMatch: Drum Sample Retrieval by Musical Context
Stefan Lattner
32
7
0
01 Aug 2022
Domain Specific Wav2vec 2.0 Fine-tuning For The SE&R 2022 Challenge
A. I. S. Ferreira
Gustavo dos Reis Oliveira
27
3
0
29 Jul 2022
Comparison of Speech Representations for the MOS Prediction System
A. Kunikoshi
Jaebok Kim
Won-Suk Jun
K. Sjölander
16
1
0
28 Jun 2022
Distilling a Pretrained Language Model to a Multilingual ASR Model
Kwanghee Choi
Hyung-Min Park
VLM
33
11
0
25 Jun 2022
Self-supervised models of audio effectively explain human cortical responses to speech
Aditya R. Vaidya
Shailee Jain
Alexander G. Huth
33
42
0
27 May 2022
Self-Supervised Speech Representation Learning: A Review
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
137
354
0
21 May 2022