ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2110.05752
  4. Cited By
UniSpeech-SAT: Universal Speech Representation Learning with Speaker
  Aware Pre-Training

UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training

12 October 2021
Sanyuan Chen
Yu Wu
Chengyi Wang
Zhengyang Chen
Zhuo Chen
Shujie Liu
Jian Wu
Yao Qian
Furu Wei
Jinyu Li
Xiangzhan Yu
    SSL
ArXivPDFHTML

Papers citing "UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training"

49 / 49 papers shown
Title
Causal Self-supervised Pretrained Frontend with Predictive Code for Speech Separation
Causal Self-supervised Pretrained Frontend with Predictive Code for Speech Separation
Wupeng Wang
Zexu Pan
X. Li
Shuai Wang
Haizhou Li
AI4TS
34
0
0
03 Apr 2025
Safe Gradient Flow for Bilevel Optimization
Safe Gradient Flow for Bilevel Optimization
Sina Sharifi
Nazanin Abolfazli
E. Y. Hamedani
Mahyar Fazlyab
36
0
0
27 Jan 2025
Speech Separation with Pretrained Frontend to Minimize Domain Mismatch
Speech Separation with Pretrained Frontend to Minimize Domain Mismatch
Wupeng Wang
Zexu Pan
X. Li
Shuai Wang
H. Li
34
4
0
05 Nov 2024
An Empirical Analysis of Speech Self-Supervised Learning at Multiple
  Resolutions
An Empirical Analysis of Speech Self-Supervised Learning at Multiple Resolutions
Theo Clark
Benedetta Cevoli
Eloy de Jong
Timofey Abramski
Jamie Dougherty
SSL
36
0
0
31 Oct 2024
On the Use of Audio to Improve Dialogue Policies
On the Use of Audio to Improve Dialogue Policies
Daniel Roncel
Federico Costa
Javier Hernando
26
0
0
17 Oct 2024
Multi-View Multi-Task Modeling with Speech Foundation Models for Speech
  Forensic Tasks
Multi-View Multi-Task Modeling with Speech Foundation Models for Speech Forensic Tasks
Orchid Chetia Phukan
Devyani Koshal
Swarup Ranjan Behera
Arun Balaji Buduru
Rajesh Sharma
21
0
0
16 Oct 2024
SeQuiFi: Mitigating Catastrophic Forgetting in Speech Emotion
  Recognition with Sequential Class-Finetuning
SeQuiFi: Mitigating Catastrophic Forgetting in Speech Emotion Recognition with Sequential Class-Finetuning
Sarthak Jain
Orchid Chetia Phukan
Swarup Ranjan Behera
Arun Balaji Buduru
Rajesh Sharma
CLL
19
0
0
16 Oct 2024
JOOCI: a Framework for Learning Comprehensive Speech Representations
JOOCI: a Framework for Learning Comprehensive Speech Representations
Hemant Yadav
R. Shah
Sunayana Sitaram
23
0
0
14 Oct 2024
Representation Loss Minimization with Randomized Selection Strategy for
  Efficient Environmental Fake Audio Detection
Representation Loss Minimization with Randomized Selection Strategy for Efficient Environmental Fake Audio Detection
Orchid Chetia Phukan
Girish
Mohd Mujtaba Akhtar
Swarup Ranjan Behera
Nitin Choudhury
Arun Balaji Buduru
Rajesh Sharma
S. R Mahadeva Prasanna
32
0
0
24 Sep 2024
Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation
  Models with Optimal Transport for Non-Verbal Emotion Recognition
Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation Models with Optimal Transport for Non-Verbal Emotion Recognition
Orchid Chetia Phukan
Mohd Mujtaba Akhtar
Girish
Swarup Ranjan Behera
Sishir Kalita
Arun Balaji Buduru
Rajesh Sharma
S. R Mahadeva Prasanna
EgoV
26
0
0
21 Sep 2024
Are Music Foundation Models Better at Singing Voice Deepfake Detection?
  Far-Better Fuse them with Speech Foundation Models
Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models
Orchid Chetia Phukan
Sarthak Jain
Swarup Ranjan Behera
Arun Balaji Buduru
Rajesh Sharma
S. R Mahadeva Prasanna
26
0
0
21 Sep 2024
Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models
Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models
Li-Wei Chen
Takuya Higuchi
He Bai
Ahmed Hussen Abdelaziz
Alexander Rudnicky
Shinji Watanabe
Tatiana Likhomanenko
B. Theobald
Zakaria Aldeneh
49
0
0
16 Sep 2024
Progressive Residual Extraction based Pre-training for Speech
  Representation Learning
Progressive Residual Extraction based Pre-training for Speech Representation Learning
Tianrui Wang
Jin Li
Ziyang Ma
Rui Cao
Xie Chen
...
Meng Ge
Xiaobao Wang
Yuguang Wang
Jianwu Dang
Nyima Tashi
SSL
43
0
0
31 Aug 2024
The VoxCeleb Speaker Recognition Challenge: A Retrospective
The VoxCeleb Speaker Recognition Challenge: A Retrospective
Jaesung Huh
Joon Son Chung
Arsha Nagrani
A. Brown
Jee-weon Jung
Daniel Garcia-Romero
Andrew Zisserman
36
3
0
27 Aug 2024
Speech Representation Learning Revisited: The Necessity of Separate Learnable Parameters and Robust Data Augmentation
Speech Representation Learning Revisited: The Necessity of Separate Learnable Parameters and Robust Data Augmentation
Hemant Yadav
Sunayana Sitaram
R. Shah
SSL
47
0
0
20 Aug 2024
Temporal Variability and Multi-Viewed Self-Supervised Representations to
  Tackle the ASVspoof5 Deepfake Challenge
Temporal Variability and Multi-Viewed Self-Supervised Representations to Tackle the ASVspoof5 Deepfake Challenge
Yuankun Xie
Xiaopeng Wang
Zhiyong Wang
Ruibo Fu
Zhengqi Wen
Haonan Cheng
Long Ye
38
1
0
13 Aug 2024
ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech
  Processing Tasks
ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks
Nakamasa Inoue
Shinta Otake
Takumi Hirose
Masanari Ohi
Rei Kawakami
34
1
0
28 Jul 2024
Sentiment Reasoning for Healthcare
Sentiment Reasoning for Healthcare
Khai Le-Duc
Khai-Nguyen Nguyen
Bach Phan Tat
Duy Le
Jerry Ngo
Long Vo-Dang
Anh Totti Nguyen
Truong Son-Hy
LRM
33
0
0
24 Jul 2024
The Reasonable Effectiveness of Speaker Embeddings for Violence
  Detection
The Reasonable Effectiveness of Speaker Embeddings for Violence Detection
Sarthak Jain
Orchid Chetia Phukan
Arun Balaji Buduru
Rajesh Sharma
21
0
0
10 Jun 2024
A Large-Scale Evaluation of Speech Foundation Models
A Large-Scale Evaluation of Speech Foundation Models
Shu-Wen Yang
Heng-Jui Chang
Zili Huang
Andy T. Liu
Cheng-I Jeff Lai
...
Kushal Lakhotia
Shang-Wen Li
Abdelrahman Mohamed
Shinji Watanabe
Hung-yi Lee
38
19
0
15 Apr 2024
A Backdoor Approach with Inverted Labels Using Dirty Label-Flipping
  Attacks
A Backdoor Approach with Inverted Labels Using Dirty Label-Flipping Attacks
Orson Mengara
AAML
30
4
0
29 Mar 2024
Efficient Adapter Tuning of Pre-trained Speech Models for Automatic
  Speaker Verification
Efficient Adapter Tuning of Pre-trained Speech Models for Automatic Speaker Verification
Mufan Sang
John H. L. Hansen
41
6
0
01 Mar 2024
CLN-VC: Text-Free Voice Conversion Based on Fine-Grained Style Control
  and Contrastive Learning with Negative Samples Augmentation
CLN-VC: Text-Free Voice Conversion Based on Fine-Grained Style Control and Contrastive Learning with Negative Samples Augmentation
Yimin Deng
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
25
3
0
15 Nov 2023
Yet Another Model for Arabic Dialect Identification
Yet Another Model for Arabic Dialect Identification
Ajinkya Kulkarni
Hanan Aldarmaki
21
1
0
20 Oct 2023
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised
  Learning with Masked Unit Prediction
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction
Jiatong Shi
H. Inaguma
Xutai Ma
Ilia Kulikov
Anna Y. Sun
45
24
0
04 Oct 2023
VoicePAT: An Efficient Open-source Evaluation Toolkit for Voice Privacy
  Research
VoicePAT: An Efficient Open-source Evaluation Toolkit for Voice Privacy Research
Sarina Meyer
Xiaoxiao Miao
Ngoc Thang Vu
29
6
0
14 Sep 2023
A Comprehensive Survey on Applications of Transformers for Deep Learning
  Tasks
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks
Saidul Islam
Hanae Elmekki
Ahmed Elsebai
Jamal Bentahar
Najat Drawel
Gaith Rjoub
Witold Pedrycz
ViT
MedIm
21
171
0
11 Jun 2023
ChatVideo: A Tracklet-centric Multimodal and Versatile Video
  Understanding System
ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System
Junke Wang
Dongdong Chen
Chong Luo
Xiyang Dai
Lu Yuan
Zuxuan Wu
Yu-Gang Jiang
95
54
0
27 Apr 2023
Transformers in Speech Processing: A Survey
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
42
47
0
21 Mar 2023
Self-supervised speech representation learning for keyword-spotting with
  light-weight transformers
Self-supervised speech representation learning for keyword-spotting with light-weight transformers
Chenyang Gao
Yue Gu
Francesco Calivá
Yuzong Liu
OffRL
24
4
0
07 Mar 2023
Robust Speech Recognition via Large-Scale Weak Supervision
Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
44
3,283
0
06 Dec 2022
Exploring WavLM on Speech Enhancement
Exploring WavLM on Speech Enhancement
Hyungchan Song
Sanyuan Chen
Zhuo Chen
Yu-Huan Wu
Takuya Yoshioka
M. Tang
Jong Won Shin
Shujie Liu
11
16
0
18 Nov 2022
Self-Supervised Learning for Speech Enhancement through Synthesis
Self-Supervised Learning for Speech Enhancement through Synthesis
Bryce Irvin
Marko Stamenovic
M. Kegler
Li-Chia Yang
35
18
0
04 Nov 2022
Parameter-efficient transfer learning of pre-trained Transformer models
  for speaker verification using adapters
Parameter-efficient transfer learning of pre-trained Transformer models for speaker verification using adapters
Junyi Peng
Themos Stafylakis
Rongzhi Gu
Oldvrich Plchot
Ladislav Movsner
Lukávs Burget
JanHonza'' vCernocký
34
22
0
28 Oct 2022
Spectral Clustering-aware Learning of Embeddings for Speaker Diarisation
Spectral Clustering-aware Learning of Embeddings for Speaker Diarisation
Evonne Lee
Guangzhi Sun
C. Zhang
P. Woodland
17
1
0
24 Oct 2022
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of
  Self-Supervised Speech Representation Learning
SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
Tzu-hsun Feng
Annie Dong
Ching-Feng Yeh
Shu-Wen Yang
Tzu-Quan Lin
...
Xuankai Chang
Shinji Watanabe
Abdel-rahman Mohamed
Shang-Wen Li
Hung-yi Lee
ELM
SSL
26
33
0
16 Oct 2022
CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised
  learning of speech representations
CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations
Vasista Sai Lodagala
Sreyan Ghosh
S. Umesh
SSL
38
18
0
05 Oct 2022
Watch What You Pretrain For: Targeted, Transferable Adversarial Examples
  on Self-Supervised Speech Recognition models
Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models
R. Olivier
H. Abdullah
Bhiksha Raj
AAML
24
1
0
17 Sep 2022
Joint Training of Speech Enhancement and Self-supervised Model for
  Noise-robust ASR
Joint Training of Speech Enhancement and Self-supervised Model for Noise-robust ASR
Qiu-shi Zhu
Jie M. Zhang
Zitian Zhang
Lirong Dai
40
15
0
26 May 2022
Self-Supervised Speech Representation Learning: A Review
Self-Supervised Speech Representation Learning: A Review
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
128
349
0
21 May 2022
Silence is Sweeter Than Speech: Self-Supervised Model Using Silence to
  Store Speaker Information
Silence is Sweeter Than Speech: Self-Supervised Model Using Silence to Store Speaker Information
Chiyu Feng
Po-Chun Hsu
Hung-yi Lee
SSL
20
8
0
08 May 2022
i-Code: An Integrative and Composable Multimodal Learning Framework
i-Code: An Integrative and Composable Multimodal Learning Framework
Ziyi Yang
Yuwei Fang
Chenguang Zhu
Reid Pryzant
Dongdong Chen
...
Bin Xiao
Yuanxun Lu
Takuya Yoshioka
Michael Zeng
Xuedong Huang
40
45
0
03 May 2022
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker
  Recognition?
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
Sanyuan Chen
Yu Wu
Chengyi Wang
Shujie Liu
Zhuo Chen
...
Gang Liu
Jinyu Li
Jian Wu
Xiangzhan Yu
Furu Wei
SSL
16
39
0
27 Apr 2022
Self-Supervised Speech Representations Preserve Speech Characteristics
  while Anonymizing Voices
Self-Supervised Speech Representations Preserve Speech Characteristics while Anonymizing Voices
Abner Hernandez
Paula Andrea Pérez-Toro
Juan Camilo Vásquez-Correa
J. Orozco-Arroyave
Andreas K. Maier
S. Yang
19
1
0
04 Apr 2022
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech
  Processing
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Micheal Zeng
Xiangzhan Yu
Furu Wei
SSL
80
1,700
0
26 Oct 2021
Large-scale Self-Supervised Speech Representation Learning for Automatic
  Speaker Verification
Large-scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification
Zhengyang Chen
Sanyuan Chen
Yu-Huan Wu
Yao Qian
Chengyi Wang
Shujie Liu
Y. Qian
Michael Zeng
SSL
26
124
0
12 Oct 2021
Pretext Tasks selection for multitask self-supervised speech
  representation learning
Pretext Tasks selection for multitask self-supervised speech representation learning
Salah Zaiem
Titouan Parcollet
S. Essid
Abdel Heba
SSL
14
12
0
01 Jul 2021
PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition
PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition
Cheng-I Jeff Lai
Yang Zhang
Alexander H. Liu
Shiyu Chang
Yi-Lun Liao
Yung-Sung Chuang
Kaizhi Qian
Sameer Khurana
David D. Cox
James R. Glass
VLM
49
70
0
10 Jun 2021
Multi-task self-supervised learning for Robust Speech Recognition
Multi-task self-supervised learning for Robust Speech Recognition
Mirco Ravanelli
Jianyuan Zhong
Santiago Pascual
P. Swietojanski
João Monteiro
J. Trmal
Yoshua Bengio
SSL
189
288
0
25 Jan 2020
1