Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2408.16221
Cited By
v1
v2 (latest)
SSDM: Scalable Speech Dysfluency Modeling
29 August 2024
Jiachen Lian
Xuanru Zhou
Z. Ezzes
Jet M J Vonk
Brittany Morin
D. Baquirin
Zachary Mille
M. G. Tempini
Gopala Anumanchipalli
AuLLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"SSDM: Scalable Speech Dysfluency Modeling"
39 / 39 papers shown
Title
Dysfluent WFST: A Framework for Zero-Shot Speech Dysfluency Transcription and Detection
Chenxu Guo
Jiachen Lian
Xuanru Zhou
Jinming Zhang
Shuhe Li
...
Rian Bogley
Lisa Wauters
Zachary Miller
M. G. Tempini
Gopala Anumanchipalli
72
1
0
22 May 2025
SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT
Cheol Jun Cho
Abdelrahman Mohamed
Shang-Wen Li
Alan W. Black
Gopala K. Anumanchipalli
89
9
0
16 Oct 2023
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction
Jiatong Shi
Hirofumi Inaguma
Xutai Ma
Ilia Kulikov
Anna Y. Sun
81
27
0
04 Oct 2023
SLM: Bridge the thin gap between speech and text foundation models
Mingqiu Wang
Wei Han
Izhak Shafran
Zelin Wu
Chung-Cheng Chiu
...
Zhong Meng
Golan Pundak
Nikhil Siddhartha
J. Schalkwyk
Yonghui Wu
AuLLM
97
58
0
30 Sep 2023
Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech
Chien-yu Huang
Ke-Han Lu
Shi Wang
Chi-Yuan Hsiao
Chun-Yi Kuan
...
Roshan S. Sharma
Shinji Watanabe
Bhiksha Ramakrishnan
Shady Shehata
Hung-yi Lee
AuLLM
59
63
0
18 Sep 2023
High-Fidelity Audio Compression with Improved RVQGAN
Rithesh Kumar
Prem Seetharaman
Alejandro Luebs
I. Kumar
Kundan Kumar
94
333
0
11 Jun 2023
A Stutter Seldom Comes Alone -- Cross-Corpus Stuttering Detection as a Multi-label Problem
Sebastian P. Bayerl
Dominik Wagner
Ilja Baumann
Florian Honig
Tobias Bocklet
Elmar Nöth
Korbinian Riedhammer
77
12
0
30 May 2023
Weakly-supervised forced alignment of disfluent speech using phoneme-level modeling
Theodoros Kouzelis
Georgios Paraskevopoulos
Athanasios Katsamanis
Vassilis Katsouros
74
10
0
30 May 2023
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
Alexander H. Liu
Heng-Jui Chang
Michael Auli
Wei-Ning Hsu
James R. Glass
49
26
0
17 May 2023
AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations
Jiachen Lian
Alexei Baevski
Wei-Ning Hsu
Michael Auli
SSL
108
34
0
10 Feb 2023
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
Hyeong-Seok Choi
Jinhyeok Yang
Juheon Lee
Hyeongju Kim
70
46
0
17 Nov 2022
AudioGen: Textually Guided Audio Generation
Felix Kreuk
Gabriel Synnaeve
Adam Polyak
Uriel Singer
Alexandre Défossez
Jade Copet
Devi Parikh
Yaniv Taigman
Yossi Adi
DiffM
82
309
0
30 Sep 2022
AudioLM: a Language Modeling Approach to Audio Generation
Zalan Borsos
Raphaël Marinier
Damien Vincent
Eugene Kharitonov
Olivier Pietquin
...
Dominik Roblek
O. Teboul
David Grangier
Marco Tagliasacchi
Neil Zeghidour
AuLLM
157
608
0
07 Sep 2022
FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech
Alexis Conneau
Min Ma
Simran Khanuja
Yu Zhang
Vera Axelrod
Siddharth Dalmia
Jason Riesa
Clara E. Rivera
Ankur Bapna
VLM
138
328
0
25 May 2022
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
Xu Tan
Jiawei Chen
Haohe Liu
Jian Cong
Chen Zhang
...
Lei He
Frank Soong
Tao Qin
Sheng Zhao
Tie-Yan Liu
107
221
0
09 May 2022
ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers
Kaizhi Qian
Yang Zhang
Heting Gao
Junrui Ni
Cheng-I Jeff Lai
David D. Cox
M. Hasegawa-Johnson
Shiyu Chang
DRL
61
112
0
20 Apr 2022
Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition
Jiachen Lian
A. Black
Louis Goldstein
Gopala Krishna Anumanchipalli
67
19
0
01 Apr 2022
Enhancing ASR for Stuttered Speech with Limited Data Using Detect and Pass
Olabanji Shonibare
Xiaosu Tong
Venkatesh Ravichandran
68
28
0
08 Feb 2022
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL
VLM
ViT
97
859
0
07 Feb 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Edresson Casanova
Julian Weber
C. Shulby
Arnaldo Cândido Júnior
Eren Golge
M. Ponti
228
412
0
04 Dec 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Micheal Zeng
Xiangzhan Yu
Furu Wei
SSL
252
1,896
0
26 Oct 2021
DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT
Heng-Jui Chang
Shu-Wen Yang
Hung-yi Lee
SSL
104
174
0
05 Oct 2021
BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition
Yu Zhang
Daniel S. Park
Wei Han
James Qin
Anmol Gulati
...
Zhifeng Chen
Quoc V. Le
Chung-Cheng Chiu
Ruoming Pang
Yonghui Wu
SSL
67
175
0
27 Sep 2021
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
477
10,367
0
17 Jun 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
180
2,989
0
14 Jun 2021
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Jaehyeon Kim
Jungil Kong
Juhee Son
DRL
128
894
0
11 Jun 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
694
6,121
0
29 Apr 2021
AST: Audio Spectrogram Transformer
Yuan Gong
Yu-An Chung
James R. Glass
ViT
125
880
0
05 Apr 2021
Bootstrap your own latent: A new approach to self-supervised Learning
Jean-Bastien Grill
Florian Strub
Florent Altché
Corentin Tallec
Pierre Harvey Richemond
...
M. G. Azar
Bilal Piot
Koray Kavukcuoglu
Rémi Munos
Michal Valko
SSL
371
6,833
0
13 Jun 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
105
1,401
0
08 Jun 2020
End-to-End Adversarial Text-to-Speech
Jeff Donahue
Sander Dieleman
Mikolaj Binkowski
Erich Elsen
Karen Simonyan
70
186
0
05 Jun 2020
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Jaehyeon Kim
Sungwon Kim
Jungil Kong
Sungroh Yoon
100
492
0
22 May 2020
VFlow: More Expressive Generative Flows with Variational Data Augmentation
Jianfei Chen
Cheng Lu
Biqi Chenli
Jun Zhu
Tian Tian
DRL
60
63
0
22 Feb 2020
A Simple Framework for Contrastive Learning of Visual Representations
Ting-Li Chen
Simon Kornblith
Mohammad Norouzi
Geoffrey E. Hinton
SSL
372
18,778
0
13 Feb 2020
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
Heiga Zen
Viet Dang
R. Clark
Yu Zhang
Ron J. Weiss
Ye Jia
Zhiwen Chen
Yonghui Wu
104
956
0
05 Apr 2019
Glow: Generative Flow with Invertible 1x1 Convolutions
Diederik P. Kingma
Prafulla Dhariwal
BDL
DRL
295
3,138
0
09 Jul 2018
Soft-DTW: a Differentiable Loss Function for Time-Series
Marco Cuturi
Mathieu Blondel
AI4TS
169
627
0
05 Mar 2017
Categorical Reparameterization with Gumbel-Softmax
Eric Jang
S. Gu
Ben Poole
BDL
342
5,372
0
03 Nov 2016
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Ramprasaath R. Selvaraju
Michael Cogswell
Abhishek Das
Ramakrishna Vedantam
Devi Parikh
Dhruv Batra
FAtt
321
20,070
0
07 Oct 2016
1