1904.05862
Cited By
wav2vec: Unsupervised Pre-training for Speech Recognition
11 April 2019
Steffen Schneider
Alexei Baevski
R. Collobert
Michael Auli
SSL
Papers citing
"wav2vec: Unsupervised Pre-training for Speech Recognition"
50 / 106 papers shown
Title
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
Gaojie Lin
Jianwen Jiang
Jiaqi Yang
Zerong Zheng
Chao Liang
DiffM
VGen
385
29
0
01 Jul 2025
SALF-MOS: Speaker Agnostic Latent Features Downsampled for MOS Prediction
Saurabh Agrawal
Raj Gohil
Gopal Kumar Agrawal
Vikram C M
Kushal Verma
31
0
0
02 Jun 2025
Exploring the Potential of SSL Models for Sound Event Detection
Hanfang Cui
Longfei Song
Li Li
Dongxing Xu
Yanhua Long
93
0
0
17 May 2025
StableQuant: Layer Adaptive Post-Training Quantization for Speech Foundation Models
Yeona Hong
Hyewon Han
Woo-Jin Chung
Hong-Goo Kang
MQ
128
0
0
21 Apr 2025
Dimitra: Audio-driven Diffusion model for Expressive Talking Head Generation
Baptiste Chopin
Tashvik Dhamija
P. Balaji
Yaohui Wang
A. Dantcheva
DiffM
VGen
113
0
0
24 Feb 2025
Provable Benefits of Unsupervised Pre-training and Transfer Learning via Single-Index Models
Taj Jones-McCormick
Aukosh Jagannath
S. Sen
127
0
0
24 Feb 2025
On the Robust Approximation of ASR Metrics
Abdul Waheed
Hanin Atwany
Rita Singh
Bhiksha Raj
32
0
0
18 Feb 2025
Evaluation of Deep Audio Representations for Hearables
Fabian Gröger
Pascal Baumann
Ludovic Amruthalingam
Laurent Simon
Ruksana Giurda
Simone Lionetti
123
0
0
10 Feb 2025
WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning
Rajath Rao
Adithya Ganesan
Oscar Kjell
Jonah Luby
Akshay Raghavan
...
B. Luft
Camilo Ruggero
Neville Ryant
R. Kotov
H. Andrew Schwartz
129
0
0
15 Jan 2025
Memory-Centric Computing: Recent Advances in Processing-in-DRAM
O. Mutlu
Ataberk Olgun
Geraldo F. Oliveira
Ismail Emir Yüksel
117
4
0
26 Dec 2024
Detecting Adversarial Examples
Furkan Mumcu
Yasin Yilmaz
AAML
60
2
0
22 Oct 2024
Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads
Federico Nocentini
T. Besnier
Claudio Ferrari
Sylvain Arguillere
Stefano Berretti
Mohamed Daoudi
117
1
0
14 Oct 2024
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation
Jiahao Cui
Hui Li
Yao Yao
Hao Zhu
Hanlin Shang
Kaihui Cheng
Hang Zhou
Siyu Zhu
Jingdong Wang
DiffM
VGen
108
29
0
10 Oct 2024
InfantCryNet: A Data-driven Framework for Intelligent Analysis of Infant Cries
Mengze Hong
Chen Jason Zhang
Lingxiao Yang
Yuanfeng Song
Di Jiang
86
2
0
29 Sep 2024
DreamHead: Learning Spatial-Temporal Correspondence via Hierarchical Diffusion for Audio-driven Talking Head Synthesis
Fa-Ting Hong
Yunfei Liu
Yu Li
Changyin Zhou
Fei Yu
D. Xu
DiffM
68
0
0
16 Sep 2024
What is lost in Normalization? Exploring Pitfalls in Multilingual ASR Model Evaluations
Kavya Manohar
Leena G Pillai
73
3
0
04 Sep 2024
CyberHost: Taming Audio-driven Avatar Diffusion Model with Region Codebook Attention
Gaojie Lin
Jianwen Jiang
Chao Liang
Tianyun Zhong
Jiaqi Yang
Yanbo Zheng
VGen
DiffM
142
19
0
03 Sep 2024
GSIFN: A Graph-Structured and Interlaced-Masked Multimodal Transformer-based Fusion Network for Multimodal Sentiment Analysis
Yijie Jin
60
0
0
27 Aug 2024
Speech Representation Learning Revisited: The Necessity of Separate Learnable Parameters and Robust Data Augmentation
Hemant Yadav
Sunayana Sitaram
R. Shah
SSL
101
0
0
20 Aug 2024
ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks
Nakamasa Inoue
Shinta Otake
Takumi Hirose
Masanari Ohi
Rei Kawakami
74
2
0
28 Jul 2024
STONE: Self-supervised Tonality Estimator
Yuexuan Kong
Vincent Lostanlen
Gabriel Meseguer-Brocal
Stella Wong
Mathieu Lagrange
Romain Hennequin
110
1
0
10 Jul 2024
Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation
Eungbeom Kim
Hantae Kim
Kyogu Lee
79
2
0
12 Jun 2024
MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations
Hemant Yadav
Sunayana Sitaram
R. Shah
SSL
80
1
0
09 Jun 2024
SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model
Siavash Shams
Sukru Samet Dindar
Xilin Jiang
N. Mesgarani
Mamba
124
22
0
20 May 2024
Learning to Generate Context-Sensitive Backchannel Smiles for Embodied AI Agents with Applications in Mental Health Dialogues
Maneesh Bilalpur
Mert Inan
Dorsa Zeinali
Jeffrey F. Cohn
Malihe Alikhani
103
1
0
13 Feb 2024
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition
Chen Chen
Ruizhe Li
Yuchen Hu
Sabato Marco Siniscalchi
Pin-Yu Chen
Ensiong Chng
Chao-Han Huck Yang
99
22
0
08 Feb 2024
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
Shaojin Ding
David Qiu
David Rim
Yanzhang He
Oleg Rybakov
...
Tara N. Sainath
Zhonglin Han
Jian Li
Amir Yazdanbakhsh
Shivani Agrawal
MQ
103
12
0
13 Dec 2023
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
153
20
0
27 Nov 2023
Improving Speech Inversion Through Self-Supervised Embeddings and Enhanced Tract Variables
Ahmed Adel Attia
Yashish M. Siriwardena
Carol Espy-Wilson
SSL
65
8
0
17 Sep 2023
Indonesian Automatic Speech Recognition with XLSR-53
Panji Arisaputra
Amalia Zahra
45
8
0
20 Aug 2023
Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition
Weidong Chen
Xiaofen Xing
Peihao Chen
Xiangmin Xu
VLM
111
40
0
20 Jul 2023
Multimodal Audio-textual Architecture for Robust Spoken Language Understanding
Anderson R. Avila
Mehdi Rezagholizadeh
Chao Xing
58
1
0
12 Jun 2023
PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Speech Models
Tiantian Feng
Shrikanth Narayanan
105
31
0
08 Jun 2023
HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders
Doyeon Kim
Soo-Whan Chung
Hyewon Han
Youna Ji
Hong-Goo Kang
66
7
0
02 Jun 2023
Duplex Diffusion Models Improve Speech-to-Speech Translation
Xianchao Wu
DiffM
83
5
0
22 May 2023
TrustSER: On the Trustworthiness of Fine-tuning Pre-trained Speech Embeddings For Speech Emotion Recognition
Tiantian Feng
Rajat Hebbar
Shrikanth Narayanan
69
7
0
18 May 2023
A multimodal dynamical variational autoencoder for audiovisual speech representation learning
Samir Sadok
Simon Leglaive
Laurent Girin
Xavier Alameda-Pineda
Renaud Séguier
102
13
0
05 May 2023
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR
Yuchen Hu
Cheng Chen
Qiu-shi Zhu
Eng Siong Chng
122
16
0
11 Apr 2023
Exploring Representation Learning for Small-Footprint Keyword Spotting
Fan Cui
Liyong Guo
Quandong Wang
Peng Gao
Yujun Wang
SSL
96
3
0
20 Mar 2023
Knowledge-aware Bayesian Co-attention for Multimodal Emotion Recognition
Zihan Zhao
Yu Wang
Yanfeng Wang
61
18
0
20 Feb 2023
Imitator: Personalized Speech-driven 3D Facial Animation
Balamurugan Thambiraja
I. Habibie
S. Aliakbarian
Darren Cosker
Christian Theobalt
Justus Thies
CVBM
79
52
0
30 Dec 2022
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric
Mingda Chen
Paul-Ambroise Duquenne
Pierre Yves Andrews
Justine T. Kao
Alexandre Mourachko
Holger Schwenk
Marta R. Costa-jussá
65
18
0
16 Dec 2022
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
Qiu-shi Zhu
Long Zhou
Zi-Hua Zhang
Shujie Liu
Binxing Jiao
Jie Zhang
Lirong Dai
Daxin Jiang
Jinyu Li
Furu Wei
111
38
0
21 Nov 2022
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Yonggan Fu
Yang Zhang
Kaizhi Qian
Zhifan Ye
Zhongzhi Yu
Cheng-I Jeff Lai
Yingyan Lin
168
9
0
02 Nov 2022
Simple and Effective Unsupervised Speech Translation
Changhan Wang
Hirofumi Inaguma
Peng-Jen Chen
Ilia Kulikov
Yun Tang
Wei-Ning Hsu
Michael Auli
J. Pino
SSL
97
14
0
18 Oct 2022
CTCBERT: Advancing Hidden-unit BERT with CTC Objectives
Ruchao Fan
Yiming Wang
Yashesh Gaur
Jinyu Li
103
8
0
16 Oct 2022
AudioGen: Textually Guided Audio Generation
Felix Kreuk
Gabriel Synnaeve
Adam Polyak
Uriel Singer
Alexandre Défossez
Jade Copet
Devi Parikh
Yaniv Taigman
Yossi Adi
DiffM
127
309
0
30 Sep 2022
Improving the Cross-Lingual Generalisation in Visual Question Answering
Farhad Nooralahzadeh
Rico Sennrich
89
6
0
07 Sep 2022
Equivariant Self-Supervision for Musical Tempo Estimation
Elio Quinton
85
9
0
03 Sep 2022
SampleMatch: Drum Sample Retrieval by Musical Context
Stefan Lattner
56
7
0
01 Aug 2022