Comparative layer-wise analysis of self-supervised speech models


8 November 2022
Ankita Pasad
Bowen Shi
Karen Livescu
    SSL
arXiv: 2211.03929

Papers citing "Comparative layer-wise analysis of self-supervised speech models"

50 / 87 papers shown
On The Landscape of Spoken Language Models: A Comprehensive Survey
Siddhant Arora
Kai-Wei Chang
Chung-Ming Chien
Yifan Peng
Haibin Wu
Yossi Adi
Emmanuel Dupoux
Hung-yi Lee
Karen Livescu
Shinji Watanabe
52
2
0
11 Apr 2025
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
Ji-Hoon Kim
Jeongsoo Choi
Jaehun Kim
Chaeyoung Jung
Joon Son Chung
CVBM
50
1
0
21 Mar 2025
Context-Aware Two-Step Training Scheme for Domain Invariant Speech Separation
Wupeng Wang
Zexu Pan
Jingru Lin
Shuai Wang
Haizhou Li
53
0
0
16 Mar 2025
From TOWER to SPIRE: Adding the Speech Modality to a Text-Only LLM
Kshitij Ambilduke
Ben Peters
Sonal Sannigrahi
Anil Keshwani
Tsz Kin Lam
Bruno Martins
Marcely Zanon Boito
André F. T. Martins
52
0
0
13 Mar 2025
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
Xinbing Wang
Mingqi Jiang
Z. Ma
Ziyu Zhang
S. Liu
...
Zhifei Li
Xie Chen
Lei Xie
Y. Guo
Wei Xue
81
12
0
03 Mar 2025
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
Alexander H. Liu
Sang-gil Lee
Chao-Han Huck Yang
Yuan Gong
Yu-Chun Wang
James Glass
Rafael Valle
Bryan Catanzaro
SSL
52
0
0
02 Mar 2025
Why disentanglement-based speaker anonymization systems fail at preserving emotions?
Ünal Ege Gaznepoglu
Nils Peters
83
0
0
22 Jan 2025
How Redundant Is the Transformer Stack in Speech Representation Models?
Teresa Dorszewski
Albert Kjøller Jacobsen
Lenka Tětková
Lars Kai Hansen
107
0
0
20 Jan 2025
Discrete Speech Unit Extraction via Independent Component Analysis
Tomohiko Nakamura
Kwanghee Choi
Keigo Hojo
Yoshiaki Bando
Satoru Fukayama
Shinji Watanabe
43
0
0
11 Jan 2025
Towards Unsupervised Speech Recognition Without Pronunciation Models
Junrui Ni
Liming Wang
Yang Zhang
Kaizhi Qian
Heting Gao
Mark Hasegawa-Johnson
Chang D. Yoo
SSL
OffRL
88
0
0
10 Jan 2025
An Empirical Analysis of Speech Self-Supervised Learning at Multiple Resolutions
Theo Clark
Benedetta Cevoli
Eloy de Jong
Timofey Abramski
Jamie Dougherty
SSL
38
0
0
31 Oct 2024
MusicFlow: Cascaded Flow Matching for Text Guided Music Generation
K R Prajwal
Bowen Shi
Matthew Lee
Apoorv Vyas
Andros Tjandra
...
Baishan Guo
Huiyu Wang
Triantafyllos Afouras
David Kant
Wei-Ning Hsu
43
5
0
27 Oct 2024
JOOCI: a Framework for Learning Comprehensive Speech Representations
Hemant Yadav
R. Shah
Sunayana Sitaram
28
0
0
14 Oct 2024
Music Genre Classification using Large Language Models
Mohamed El Amine Meguenani
Alceu de Souza Britto Jr.
A. L. Koerich
31
0
0
10 Oct 2024
Exploring ASR-Based Wav2Vec2 for Automated Speech Disorder Assessment: Insights and Analysis
Tuan Nguyen
C. Fredouille
A. Ghio
M. Balaguer
Virginie Woisard
16
0
0
10 Oct 2024
Learn from Real: Reality Defender's Submission to ASVspoof5 Challenge
Yi Zhu
C. Goel
Surya Koppisetti
Trang Tran
Ankur Kumar
Gaurav Bharaj
AAML
28
0
0
09 Oct 2024
Mitigation of gender bias in automatic facial non-verbal behaviors generation
Alice Delbosc
M. Ochs
Nicolas Sabouret
Brian Ravenet
Stéphane Ayache
29
0
0
09 Oct 2024
SyllableLM: Learning Coarse Semantic Units for Speech Language Models
Alan Baade
Puyuan Peng
David Harwath
50
3
0
05 Oct 2024
Adaptive Large Language Models By Layerwise Attention Shortcuts
Prateek Verma
Mert Pilanci
KELM
OffRL
52
0
0
17 Sep 2024
Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models
Li-Wei Chen
Takuya Higuchi
He Bai
Ahmed Hussen Abdelaziz
Alexander Rudnicky
Shinji Watanabe
Tatiana Likhomanenko
B. Theobald
Zakaria Aldeneh
49
0
0
16 Sep 2024
Connecting Concept Convexity and Human-Machine Alignment in Deep Neural Networks
Teresa Dorszewski
Lenka Tětková
Lorenz Linhardt
Lars Kai Hansen
HAI
36
0
0
10 Sep 2024
Property Neurons in Self-Supervised Speech Transformers
T. Lin
Guan-Ting Lin
Hung-yi Lee
Hao Tang
MILM
27
0
0
07 Sep 2024
Probing self-attention in self-supervised speech models for cross-linguistic differences
Sai Gopinath
Joselyn Rodriguez
MILM
56
0
0
04 Sep 2024
Convexity-based Pruning of Speech Representation Models
Teresa Dorszewski
Lenka Tětková
Lars Kai Hansen
25
2
0
16 Aug 2024
SLIM: Style-Linguistics Mismatch Model for Generalized Audio Deepfake Detection
Yi Zhu
Surya Koppisetti
Trang Tran
Gaurav Bharaj
52
9
0
26 Jul 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
39
4
0
21 Jul 2024
Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation
J. Duret
Yannick Esteve
Titouan Parcollet
41
0
0
08 Jul 2024
Improving Self-supervised Pre-training using Accent-Specific Codebooks
Darshan Prabhu
Abhishek Gupta
Omkar Nitsure
P. Jyothi
Sriram Ganapathy
SSL
44
0
0
04 Jul 2024
Cross-Lingual Transfer Learning for Speech Translation
Rao Ma
Yassir Fathullah
Mengjie Qian
Siyuan Tang
Mark J. F. Gales
Kate Knill
20
1
0
01 Jul 2024
Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study
Peikun Chen
Sining Sun
Changhao Shan
Qing Yang
Lei Xie
42
2
0
27 Jun 2024
WavRx: a Disease-Agnostic, Generalizable, and Privacy-Preserving Speech Health Diagnostic Model
Yi Zhu
Tiago H. Falk
MedIm
41
0
0
26 Jun 2024
AND: Audio Network Dissection for Interpreting Deep Acoustic Models
Tung-Yu Wu
Yu-Xiang Lin
Tsui-Wei Weng
52
1
0
24 Jun 2024
Speech Representation Analysis based on Inter- and Intra-Model Similarities
Yassine El Kheir
Ahmed M. Ali
Shammur A. Chowdhury
SSL
43
2
0
23 Jun 2024
Articulatory Encodec: Coding Speech through Vocal Tract Kinematics
Cheol Jun Cho
Peter Wu
Tejas S. Prabhune
Dhruv Agarwal
Gopala K. Anumanchipalli
36
1
0
18 Jun 2024
Interface Design for Self-Supervised Speech Models
Yi-Jen Shih
David Harwath
54
1
0
18 Jun 2024
Orthogonality and isotropy of speaker and phonetic information in self-supervised speech representations
Mukhtar Mohamed
Oli Danyi Liu
Hao Tang
Sharon Goldwater
SSL
44
2
0
13 Jun 2024
Self-Supervised Speech Representations are More Phonetic than Semantic
Kwanghee Choi
Ankita Pasad
Tomohiko Nakamura
Satoru Fukayama
Karen Livescu
Shinji Watanabe
31
14
0
12 Jun 2024
SCDNet: Self-supervised Learning Feature-based Speaker Change Detection
Yue Li
Xinsheng Wang
Li Zhang
Lei Xie
42
1
0
12 Jun 2024
Sustainable self-supervised learning for speech representations
Luis Lugo
Valentin Vielzeuf
31
2
0
11 Jun 2024
MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations
Hemant Yadav
Sunayana Sitaram
R. Shah
SSL
49
1
0
09 Jun 2024
Towards objective and interpretable speech disorder assessment: a comparative analysis of CNN and transformer-based models
Malo Maisonneuve
C. Fredouille
M. Lalain
A. Ghio
Virginie Woisard
48
0
0
07 Jun 2024
Fill in the Gap! Combining Self-supervised Representation Learning with Neural Audio Synthesis for Speech Inpainting
Ihab Asaad
Maxime Jacquelin
Olivier Perrotin
Laurent Girin
Thomas Hueber
33
0
0
30 May 2024
Crossmodal ASR Error Correction with Discrete Speech Units
Yuanchao Li
Pinzhen Chen
Peter Bell
Catherine Lai
36
6
0
26 May 2024
Investigating the 'Autoencoder Behavior' in Speech Self-Supervised Models: a focus on HuBERT's Pretraining
Valentin Vielzeuf
SSL
44
0
0
14 May 2024
A predictive learning model can simulate temporal dynamics and context effects found in neural representations of continuous speech
Oli Danyi Liu
Hao Tang
Naomi H Feldman
Sharon Goldwater
24
1
0
13 May 2024
A Large-Scale Evaluation of Speech Foundation Models
Shu-Wen Yang
Heng-Jui Chang
Zili Huang
Andy T. Liu
Cheng-I Jeff Lai
...
Kushal Lakhotia
Shang-Wen Li
Abdelrahman Mohamed
Shinji Watanabe
Hung-yi Lee
38
19
0
15 Apr 2024
Compact Speech Translation Models via Discrete Speech Units Pretraining
Tsz Kin Lam
Alexandra Birch
Barry Haddow
53
2
0
29 Feb 2024
Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing
Jeong Hun Yeo
Seunghee Han
Minsu Kim
Y. Ro
53
22
0
23 Feb 2024
Establishing degrees of closeness between audio recordings along different dimensions using large-scale cross-lingual models
Maxime Fily
Guillaume Wisniewski
Severine Guillaume
Gilles Adda
Alexis Michaud
22
1
0
08 Feb 2024
Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study on Speech Emotion Recognition
Alexandra Saliba
Yuanchao Li
Ramon Sanabria
Catherine Lai
38
8
0
04 Feb 2024