ResearchTrend.AI
SUPERB: Speech processing Universal PERformance Benchmark

3 May 2021
Shu-Wen Yang
Po-Han Chi
Yung-Sung Chuang
Cheng-I Jeff Lai
Kushal Lakhotia
Yist Y. Lin
Andy T. Liu
Jiatong Shi
Xuankai Chang
Guan-Ting Lin
Tzu-hsien Huang
Wei-Cheng Tseng
Ko-tik Lee
Da-Rong Liu
Zili Huang
Shuyan Dong
Shang-Wen Li
Shinji Watanabe
Abdel-rahman Mohamed
Hung-yi Lee
    SSL

Papers citing "SUPERB: Speech processing Universal PERformance Benchmark"

50 / 212 papers shown
What is different between these datasets?
Varun Babbar
Zhicheng Guo
Cynthia Rudin
59
1
0
08 Mar 2024
Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations
Guan-Ting Lin
Cheng-Han Chiang
Hung-yi Lee
34
22
0
20 Feb 2024
Cross-Modal Coordination Across a Diverse Set of Input Modalities
Jorge Sánchez
Rodrigo Laguna
VLM
44
0
0
29 Jan 2024
Speech foundation models on intelligibility prediction for hearing-impaired listeners
Santiago Cuervo
R. Marxer
30
6
0
24 Jan 2024
MERBench: A Unified Evaluation Benchmark for Multimodal Emotion Recognition
Zheng Lian
Guoying Zhao
Yong Ren
Hao Gu
Haiyang Sun
Lan Chen
Bin Liu
Jianhua Tao
21
12
0
07 Jan 2024
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Ziyang Ma
Zhisheng Zheng
Jiaxin Ye
Jinchao Li
Zhifu Gao
Shiliang Zhang
Xie Chen
MDE
SLR
SSL
25
86
0
23 Dec 2023
On the choice of the optimal temporal support for audio classification with Pre-trained embeddings
Aurian Quélennec
Michel Olvera
Geoffroy Peeters
S. Essid
27
2
0
21 Dec 2023
Efficiency-oriented approaches for self-supervised speech representation learning
Luis Lugo
Valentin Vielzeuf
SSL
26
1
0
18 Dec 2023
Fine-Tuned Self-Supervised Speech Representations for Language Diarization in Multilingual Code-Switched Speech
Geoffrey T. Frost
Emily Morris
Joshua Jansen van Vüren
T. Niesler
26
2
0
15 Dec 2023
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
25
17
0
27 Nov 2023
R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces
Heng-Jui Chang
James R. Glass
33
3
0
15 Nov 2023
SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT
Cheol Jun Cho
Abdelrahman Mohamed
Shang-Wen Li
Alan W. Black
Gopala K. Anumanchipalli
33
8
0
16 Oct 2023
Toward Joint Language Modeling for Speech Units and Text
Ju-Chieh Chou
Chung-Ming Chien
Wei-Ning Hsu
Karen Livescu
Arun Babu
Alexis Conneau
Alexei Baevski
Michael Auli
VLM
26
20
0
12 Oct 2023
Fast Word Error Rate Estimation Using Self-Supervised Representations for Speech and Text
Chanho Park
Chengsong Lu
Mingjie Chen
Thomas Hain
28
3
0
12 Oct 2023
Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding
Pavel Denisov
Ngoc Thang Vu
29
1
0
09 Oct 2023
Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond
Jiatong Shi
William Chen
Dan Berrebbi
Hsiu-Hsuan Wang
Wei-Ping Huang
...
Yuxun Tang
Shang-Wen Li
Abdelrahman Mohamed
Hung-yi Lee
Shinji Watanabe
LRM
ELM
36
15
0
09 Oct 2023
XLS-R fine-tuning on noisy word boundaries for unsupervised speech segmentation into words
Robin Algayres
Pablo Diego-Simon
Benoît Sagot
Emmanuel Dupoux
36
1
0
08 Oct 2023
Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-supervised setting
Hemant Yadav
Erica Cooper
Junichi Yamagishi
Sunayana Sitaram
R. Shah
11
0
0
08 Oct 2023
LanSER: Language-Model Supported Speech Emotion Recognition
Taesik Gong
Joshua Belanich
Krishna Somandepalli
Arsha Nagrani
B. Eoff
Brendan Jou
33
10
0
07 Sep 2023
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition
Zhisheng Zheng
Ziyang Ma
Yu Wang
Xie Chen
26
2
0
28 Aug 2023
An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification
Harunori Kawano
Sota Shimizu
30
1
0
22 Aug 2023
Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations
Wen Wu
C. Zhang
P. Woodland
31
3
0
14 Aug 2023
Improving Joint Speech-Text Representations Without Alignment
Cal Peyser
Zhong Meng
Ke Hu
Rohit Prabhavalkar
Andrew Rosenberg
Tara N. Sainath
M. Picheny
Kyunghyun Cho
VLM
31
4
0
11 Aug 2023
Joint speech and overlap detection: a benchmark over multiple audio setup and speech domains
Martin Lebourdais
Théo Mariotte
Marie Tahon
Anthony Larcher
Antoine Laurent
Silvio Montrésor
S. Meignier
Jean-Hugh Thomas
VLM
25
5
0
24 Jul 2023
Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition
Weidong Chen
Xiaofen Xing
Peihao Chen
Xiangmin Xu
VLM
28
35
0
20 Jul 2023
On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis
Siyang Wang
G. Henter
Joakim Gustafson
Éva Székely
44
5
0
11 Jul 2023
On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation
Gene-Ping Yang
Yue Gu
Qingming Tang
Dongsu Du
Yuzong Liu
22
5
0
06 Jul 2023
Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers
Yineng Chen
Z. Li
Lefei Zhang
Bo Du
Hai Zhao
33
4
0
02 Jul 2023
When to Use Efficient Self Attention? Profiling Text, Speech and Image Transformer Variants
Anuj Diwan
Eunsol Choi
David Harwath
41
0
0
14 Jun 2023
Estimating the Uncertainty in Emotion Attributes using Deep Evidential Regression
Wen Wu
C. Zhang
P. Woodland
UQCV
UD
EDL
19
11
0
11 Jun 2023
Simultaneous or Sequential Training? How Speech Representations Cooperate in a Multi-Task Self-Supervised Learning System
Khazar Khorrami
María Andrea Cruz Blandón
Tuomas Virtanen
Okko Rasanen
SSL
27
1
0
05 Jun 2023
On the Robustness of Arabic Speech Dialect Identification
Peter Sullivan
AbdelRahim Elmadany
Muhammad Abdul-Mageed
23
8
0
01 Jun 2023
MiniSUPERB: Lightweight Benchmark for Self-supervised Speech Models
Yu-Hsiang Wang
Huan Chen
Kai-Wei Chang
Winston H. Hsu
Hung-yi Lee
24
6
0
30 May 2023
Improving Textless Spoken Language Understanding with Discrete Units as Intermediate Target
Guanyong Wu
Guan-Ting Lin
Shang-Wen Li
Hung-yi Lee
26
5
0
29 May 2023
Investigating Pre-trained Audio Encoders in the Low-Resource Condition
Haomiao Yang
Jinming Zhao
Gholamreza Haffari
Ehsan Shareghi
19
6
0
28 May 2023
One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification
Ju-Sung Heo
Chan-yeong Lim
Ju-ho Kim
Hyun-Seo Shin
Ha-Jin Yu
26
2
0
27 May 2023
Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?
Eklavya Sarkar
Mathew Magimai.-Doss
21
11
0
23 May 2023
Self-supervised representations in speech-based depression detection
Wen Wu
C. Zhang
P. Woodland
14
23
0
20 May 2023
Scaling laws for language encoding models in fMRI
Richard Antonello
Aditya R. Vaidya
Alexander G. Huth
MedIm
30
55
0
19 May 2023
A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning
Jiyang Tang
William Chen
Xuankai Chang
Shinji Watanabe
B. MacWhinney
24
10
0
19 May 2023
Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation
Kangwook Jang
Sungnyun Kim
Se-Young Yun
Hoi-Rim Kim
29
5
0
19 May 2023
Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model
Puyuan Peng
Shang-Wen Li
Okko Rasanen
Abdel-rahman Mohamed
David Harwath
SSL
VLM
26
7
0
19 May 2023
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark
Jiatong Shi
Dan Berrebbi
William Chen
Ho-Lam Chung
En-Pei Hu
...
Xuankai Chang
Shang-Wen Li
Abdel-rahman Mohamed
Hung-yi Lee
Shinji Watanabe
ELM
55
58
0
18 May 2023
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
Alexander H. Liu
Heng-Jui Chang
Michael Auli
Wei-Ning Hsu
James R. Glass
24
25
0
17 May 2023
Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
25
3
0
09 May 2023
A Comparative Study of Pre-trained Speech and Audio Embeddings for Speech Emotion Recognition
Orchid Chetia Phukan
Arun Balaji Buduru
Rajesh Sharma
28
6
0
22 Apr 2023
Computational modeling of semantic change
Nina Tahmasebi
Haim Dubossarsky
34
6
0
13 Apr 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
23
2
0
12 Apr 2023
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit
Brian Yan
Jiatong Shi
Yun Tang
H. Inaguma
Yifan Peng
...
Zhaoheng Ni
Moto Hira
Soumi Maiti
J. Pino
Shinji Watanabe
19
20
0
10 Apr 2023
Designing and Evaluating Speech Emotion Recognition Systems: A reality check case study with IEMOCAP
Nikolaos Antoniou
Athanasios Katsamanis
Theodoros Giannakopoulos
Shrikanth Narayanan
23
17
0
03 Apr 2023