Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2006.11477
Cited By
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
20 June 2020
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations"
50 / 187 papers shown
Title
An End-to-End Approach for Child Reading Assessment in the Xhosa Language
Sergio Chevtchenko
Nikhil Navas
Rafaella Vale
Franco Ubaudi
Sipumelele Lucwaba
Cally Ardington
Soheil Afshar
Mark Antoniou
Saeed Afshar
33
0
0
23 May 2025
Audio-to-Audio Emotion Conversion With Pitch And Duration Style Transfer
Soumya Dutta
Avni Jain
Sriram Ganapathy
95
0
0
23 May 2025
X-ARES: A Comprehensive Framework for Assessing Audio Encoder Performance
Junbo Zhang
Heinrich Dinkel
Yadong Niu
Chenyu Liu
Si Cheng
Anbei Zhao
Jian Luan
136
0
0
22 May 2025
LLM as Effective Streaming Processor: Bridging Streaming-Batch Mismatches with Group Position Encoding
Junlong Tong
Jinlan Fu
Zixuan Lin
Yingqi Fan
Anhao Zhao
Hui Su
Xiaoyu Shen
72
0
0
22 May 2025
Dysfluent WFST: A Framework for Zero-Shot Speech Dysfluency Transcription and Detection
Chenxu Guo
Jiachen Lian
Xuanru Zhou
Jinming Zhang
Shuhe Li
...
Rian Bogley
Lisa Wauters
Zachary Miller
M. G. Tempini
Gopala Anumanchipalli
60
0
0
22 May 2025
"Alexa, can you forget me?" Machine Unlearning Benchmark in Spoken Language Understanding
Alkis Koudounas
Claudio Savelli
Flavio Giobergia
Elena Baralis
MU
85
0
0
21 May 2025
Predicting Turn-Taking and Backchannel in Human-Machine Conversations Using Linguistic, Acoustic, and Visual Signals
Yuxin Lin
Yinglin Zheng
Ming Zeng
Wangzheng Shi
66
0
0
19 May 2025
Unveil Multi-Picture Descriptions for Multilingual Mild Cognitive Impairment Detection via Contrastive Learning
Kristin Qi
Jiali Cheng
Youxiang Zhu
Hadi Amiri
Xiaohui Liang
95
0
0
19 May 2025
Exploring the Potential of SSL Models for Sound Event Detection
Hanfang Cui
Longfei Song
Li Li
Dongxing Xu
Yanhua Long
59
0
0
17 May 2025
Multi-Stage Speaker Diarization for Noisy Classrooms
Ali Sartaz Khan
Tolulope Ogunremi
Ahmed Adel Attia
Dorottya Demszky
65
0
0
16 May 2025
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
Biel Tura Vecino
Adam Gabry's
Daniel Mątwicki
Andrzej Pomirski
Tom Iddon
Marius Cotescu
Jaime Lorenzo-Trueba
175
3
0
12 May 2025
TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models
Junyi Peng
Takanori Ashihara
Marc Delcroix
Tsubasa Ochiai
Oldrich Plchot
Shoko Araki
J. Černocký
ELM
87
0
0
10 May 2025
Miipher-2: A Universal Speech Restoration Model for Million-Hour Scale Data Restoration
Shigeki Karita
Yuma Koizumi
Heiga Zen
Haruko Ishikawa
Robin Scheibler
M. Bacchiani
VLM
341
1
0
07 May 2025
OT-Talk: Animating 3D Talking Head with Optimal Transportation
Xinmu Wang
Xiang Gao
Xiyun Song
Heather Yu
Zongfang Lin
Liang Peng
Xianfeng Gu
70
0
0
03 May 2025
Adaptive Token Boundaries: Integrating Human Chunking Mechanisms into Multimodal LLMs
Dongxing Yu
89
0
0
03 May 2025
StableQuant: Layer Adaptive Post-Training Quantization for Speech Foundation Models
Yeona Hong
Hyewon Han
Woo-Jin Chung
Hong-Goo Kang
MQ
85
0
0
21 Apr 2025
Supervising 3D Talking Head Avatars with Analysis-by-Audio-Synthesis
Radek Daněček
Carolin Schmitt
Senya Polikovsky
Michael J. Black
83
0
0
18 Apr 2025
Can Masked Autoencoders Also Listen to Birds?
Lukas Rauch
Ilyass Moummad
René Heinrich
Alexis Joly
Bernhard Sick
Christoph Scholz
102
0
0
17 Apr 2025
Dysarthria Normalization via Local Lie Group Transformations for Robust ASR
Mikhail Osipov
104
1
0
16 Apr 2025
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
Liang-Hsuan Tseng
Yi-Chang Chen
Kuan-Yi Lee
Da-shan Shiu
Hung-yi Lee
AuLLM
115
0
0
09 Apr 2025
Exploring Local Interpretable Model-Agnostic Explanations for Speech Emotion Recognition with Distribution-Shift
Maja J. Hjuler
Line H. Clemmensen
Sneha Das
FAtt
100
1
0
07 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Kai Zhang
MGen
VGen
233
1
0
01 Apr 2025
Exploring In-Context Learning Capabilities of ChatGPT for Pathological Speech Detection
Mahdi Amiri
Hatef Otroshi Shahreza
Ina Kodrasi
87
0
0
31 Mar 2025
Speculative End-Turn Detector for Efficient Speech Chatbot Assistant
Hyunjong Ok
Suho Yoo
Jaeho Lee
143
0
0
30 Mar 2025
STSA: Spatial-Temporal Semantic Alignment for Visual Dubbing
Zijun Ding
Mingdie Xiong
Congcong Zhu
Jingrun Chen
DiffM
105
0
0
29 Mar 2025
Dual Audio-Centric Modality Coupling for Talking Head Generation
Ao Fu
Ziqi Ni
Yi Zhou
109
1
0
26 Mar 2025
MAVEN: Multi-modal Attention for Valence-Arousal Emotion Network
Vrushank Ahire
Kunal Shah
Mudasir Nazir Khan
Nikhil Pakhale
L. Sookha
M. A. Ganaie
Abhinav Dhall
106
0
0
16 Mar 2025
Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations
Xue Jiang
Xiulian Peng
Yuan Zhang
Yan Lu
SSL
116
1
0
15 Mar 2025
Lightweight Models for Emotional Analysis in Video
Quoc-Tien Nguyen
H. Nguyen
V. Huynh
83
0
0
13 Mar 2025
Heterogeneous bimodal attention fusion for speech emotion recognition
Jiachen Luo
Huy Phan
Lin Wang
Joshua Reiss
87
0
0
09 Mar 2025
Bimodal Connection Attention Fusion for Speech Emotion Recognition
Jiachen Luo
Huy Phan
Lin Wang
Joshua D. Reiss
89
0
0
08 Mar 2025
DGFM: Full Body Dance Generation Driven by Music Foundation Models
Xinran Liu
Zhenhua Feng
Diptesh Kanojia
Wenwu Wang
DiffM
124
1
0
27 Feb 2025
TimePFN: Effective Multivariate Time Series Forecasting with Synthetic Data
Ege Onur Taga
M. E. Ildiz
Samet Oymak
AI4TS
111
2
0
22 Feb 2025
NEAR: A Training-Free Pre-Estimator of Machine Learning Model Performance
Raphael T. Husistein
Markus Reiher
Marco Eckhoff
204
1
0
20 Feb 2025
FlexDuo: A Pluggable System for Enabling Full-Duplex Capabilities in Speech Dialogue Systems
Borui Liao
Yulong Xu
Jiao Ou
Kaiyuan Yang
Weihua Jian
Pengfei Wan
Di Zhang
AuLLM
115
0
0
19 Feb 2025
Learn2Mix: Training Neural Networks Using Adaptive Data Integration
Shyam Venkatasubramanian
Vahid Tarokh
140
0
0
17 Feb 2025
CR-CTC: Consistency regularization on CTC for improved speech recognition
Zengwei Yao
Wei Kang
Xiaoyu Yang
Fangjun Kuang
Liyong Guo
Han Zhu
Zengrui Jin
Zhaoqing Li
Long Lin
Daniel Povey
107
4
0
17 Feb 2025
Conformal Prediction Sets Can Cause Disparate Impact
Jesse C. Cresswell
Bhargava Kumar
Yi Sui
Mouloud Belbahri
FaML
494
1
0
17 Feb 2025
DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities
Xiangyu Lu
Wang Xu
Haoyu Wang
Hongyun Zhou
Haiyan Zhao
Conghui Zhu
Tiejun Zhao
M. Yang
Mamba
AuLLM
82
0
0
16 Feb 2025
On the Promise for Assurance of Differentiable Neurosymbolic Reasoning Paradigms
Luke E. Richards
Jessie Yaros
Jasen Babcock
Coung Ly
Robin Cosbey
Timothy Doster
Cynthia Matuszek
NAI
100
0
0
13 Feb 2025
Towards a Robust Framework for Multimodal Hate Detection: A Study on Video vs. Image-based Content
Girish A. Koushik
Diptesh Kanojia
Helen Treharne
100
2
0
11 Feb 2025
Comprehensive Layer-wise Analysis of SSL Models for Audio Deepfake Detection
Yassine El Kheir
Youness Samih
Suraj Maharjan
Tim Polzehl
Sebastian Möller
123
1
0
05 Feb 2025
Adapter-Based Multi-Agent AVSR Extension for Pre-Trained ASR Models
Christopher Simic
Korbinian Riedhammer
Tobias Bocklet
137
0
0
03 Feb 2025
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
Gaojie Lin
Jianwen Jiang
Jiaqi Yang
Zerong Zheng
Chao Liang
DiffM
VGen
271
22
0
03 Feb 2025
Summary of the NOTSOFAR-1 Challenge: Highlights and Learnings
Igor Abramovski
Alon Vinnikov
Shalev Shaer
Naoyuki Kanda
Xiaofei Wang
Amir Ivry
Eyal Krupka
92
0
0
28 Jan 2025
Boli: A dataset for understanding stuttering experience and analyzing stuttered speech
Ashita Batra
Mannas narang
Neeraj Kumar Sharma
Pradip K Das
76
0
0
27 Jan 2025
Everyone-Can-Sing: Zero-Shot Singing Voice Synthesis and Conversion with Speech Reference
Shuqi Dai
Yunyun Wang
Roger B. Dannenberg
Zeyu Jin
DiffM
95
0
0
23 Jan 2025
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
Sungnyun Kim
Sungwoo Cho
Sangmin Bae
Kangwook Jang
Se-Young Yun
SSL
113
1
0
23 Jan 2025
LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations
Soumya Dutta
Sriram Ganapathy
86
5
0
20 Jan 2025
How Redundant Is the Transformer Stack in Speech Representation Models?
Teresa Dorszewski
Albert Kjøller Jacobsen
Lenka Tětková
Lars Kai Hansen
145
0
0
20 Jan 2025
1
2
3
4
Next