ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2310.02720
  4. Cited By
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised
  Learning with Masked Unit Prediction
v1v2 (latest)

Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction

4 October 2023
Jiatong Shi
Hirofumi Inaguma
Xutai Ma
Ilia Kulikov
Anna Y. Sun
ArXiv (abs)PDFHTML

Papers citing "Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction"

50 / 56 papers shown
Title
Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models
Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models
Li-Wei Chen
Takuya Higuchi
He Bai
Ahmed Hussen Abdelaziz
Alexander Rudnicky
Shinji Watanabe
Tatiana Likhomanenko
B. Theobald
Zakaria Aldeneh
88
0
0
16 Sep 2024
Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond
Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond
Jiatong Shi
William Chen
Dan Berrebbi
Hsiu-Hsuan Wang
Wei-Ping Huang
...
Yuxun Tang
Shang-Wen Li
Abdelrahman Mohamed
Hung-yi Lee
Shinji Watanabe
LRMELM
111
16
0
09 Oct 2023
SeamlessM4T: Massively Multilingual & Multimodal Machine Translation
SeamlessM4T: Massively Multilingual & Multimodal Machine Translation
Seamless Communication
Loïc Barrault
Yu-An Chung
Mariano Cora Meglioli
David Dale
...
Holger Schwenk
Paden Tomasello
Changhan Wang
Jeff Wang
Skyler Wang
86
91
0
22 Aug 2023
EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech
  Resynthesis
EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis
Tu Nguyen
Wei-Ning Hsu
Antony DÁvirro
Bowen Shi
Itai Gat
...
Gabriel Synnaeve
Michael Hassid
Felix Kreuk
Yossi Adi
Emmanuel Dupoux
71
60
0
10 Aug 2023
Accelerating Transducers through Adjacent Token Merging
Accelerating Transducers through Adjacent Token Merging
Yuang Li
Yu-Huan Wu
Jinyu Li
Shujie Liu
60
4
0
28 Jun 2023
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark
ML-SUPERB: Multilingual Speech Universal PERformance Benchmark
Jiatong Shi
Dan Berrebbi
William Chen
Ho-Lam Chung
En-Pei Hu
...
Xuankai Chang
Shang-Wen Li
Abdel-rahman Mohamed
Hung-yi Lee
Shinji Watanabe
ELM
97
69
0
18 May 2023
Parameter Efficient Transfer Learning for Various Speech Processing
  Tasks
Parameter Efficient Transfer Learning for Various Speech Processing Tasks
Shinta Otake
Rei Kawakami
Nakamasa Inoue
31
17
0
06 Dec 2022
MelHuBERT: A simplified HuBERT on Mel spectrograms
MelHuBERT: A simplified HuBERT on Mel spectrograms
Tzu-Quan Lin
Hung-yi Lee
Hao Tang
SSL
68
16
0
17 Nov 2022
Efficient Transformers with Dynamic Token Pooling
Efficient Transformers with Dynamic Token Pooling
Piotr Nawrot
J. Chorowski
Adrian Lañcucki
Edoardo Ponti
57
43
0
17 Nov 2022
Bridging Speech and Textual Pre-trained Models with Unsupervised ASR
Bridging Speech and Textual Pre-trained Models with Unsupervised ASR
Jiatong Shi
Chan-Jan Hsu
Ho-Lam Chung
Dongji Gao
Leibny Paola García-Perera
Shinji Watanabe
Ann Lee
Hung-yi Lee
69
12
0
06 Nov 2022
End-to-End Integration of Speech Recognition, Dereverberation,
  Beamforming, and Self-Supervised Learning Representation
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation
Yoshiki Masuyama
Xuankai Chang
Samuele Cornell
Shinji Watanabe
Nobutaka Ono
57
19
0
19 Oct 2022
On Compressing Sequences for Self-Supervised Speech Models
On Compressing Sequences for Self-Supervised Speech Models
Yen Meng
Hsuan-Jui Chen
Jiatong Shi
Shinji Watanabe
Paola García
Hung-yi Lee
Hao Tang
SSL
52
15
0
13 Oct 2022
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker
  Recognition?
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
Sanyuan Chen
Yu Wu
Chengyi Wang
Shujie Liu
Zhuo Chen
...
Gang Liu
Jinyu Li
Jian Wu
Xiangzhan Yu
Furu Wei
SSL
81
42
0
27 Apr 2022
ContentVec: An Improved Self-Supervised Speech Representation by
  Disentangling Speakers
ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers
Kaizhi Qian
Yang Zhang
Heting Gao
Junrui Ni
Cheng-I Jeff Lai
David D. Cox
M. Hasegawa-Johnson
Shiyu Chang
DRL
61
112
0
20 Apr 2022
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark
  for Semantic and Generative Capabilities
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities
Hsiang-Sheng Tsai
Heng-Jui Chang
Wen-Chin Huang
Zili Huang
Kushal Lakhotia
...
Hsuan-Jui Chen
Shang-Wen Li
Shinji Watanabe
Abdel-rahman Mohamed
Hung-yi Lee
85
111
0
14 Mar 2022
HEAR: Holistic Evaluation of Audio Representations
HEAR: Holistic Evaluation of Audio Representations
Joseph P. Turian
Jordie Shier
H. Khan
Bhiksha Raj
Björn W. Schuller
...
P. Esling
Pranay Manocha
Shinji Watanabe
Zeyu Jin
Yonatan Bisk
77
106
0
06 Mar 2022
DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised
  Learning
DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised Learning
Qiqi Wang
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
DRL
97
23
0
22 Feb 2022
data2vec: A General Framework for Self-supervised Learning in Speech,
  Vision and Language
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSLVLMViT
97
858
0
07 Feb 2022
Self-supervised Learning with Random-projection Quantizer for Speech
  Recognition
Self-supervised Learning with Random-projection Quantizer for Speech Recognition
Chung-Cheng Chiu
James Qin
Yu Zhang
Jiahui Yu
Yonghui Wu
SSL
87
169
0
03 Feb 2022
Discretization and Re-synthesis: an alternative method to solve the
  Cocktail Party Problem
Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem
Jing Shi
Xuankai Chang
Tomoki Hayashi
Yen-Ju Lu
Shinji Watanabe
Bo Xu
66
19
0
17 Dec 2021
Textless Speech-to-Speech Translation on Real Data
Textless Speech-to-Speech Translation on Real Data
Ann Lee
Hongyu Gong
Paul-Ambroise Duquenne
Holger Schwenk
Peng-Jen Chen
...
Sravya Popuri
Yossi Adi
J. Pino
Jiatao Gu
Wei-Ning Hsu
67
148
0
15 Dec 2021
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at
  Scale
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
Arun Babu
Changhan Wang
Andros Tjandra
Kushal Lakhotia
Qiantong Xu
...
Yatharth Saraf
J. Pino
Alexei Baevski
Alexis Conneau
Michael Auli
SSL
110
704
0
17 Nov 2021
A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion
  Recognition, Speaker Verification and Spoken Language Understanding
A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding
Yingzhi Wang
Abdelmoumene Boumadane
A. Heba
58
151
0
04 Nov 2021
Neural Analysis and Synthesis: Reconstructing Speech from
  Self-Supervised Representations
Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations
Hyeong-Seok Choi
Juheon Lee
W. Kim
Jie Hwan Lee
Hoon Heo
Kyogu Lee
81
158
0
27 Oct 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech
  Processing
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Micheal Zeng
Xiangzhan Yu
Furu Wei
SSL
250
1,873
0
26 Oct 2021
Hierarchical Transformers Are More Efficient Language Models
Hierarchical Transformers Are More Efficient Language Models
Piotr Nawrot
Szymon Tworkowski
Michał Tyrolski
Lukasz Kaiser
Yuhuai Wu
Christian Szegedy
Henryk Michalewski
71
65
0
26 Oct 2021
S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised
  Speech Representations
S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations
Wen-Chin Huang
Shu-Wen Yang
Tomoki Hayashi
Hung-yi Lee
Shinji Watanabe
Tomoki Toda
71
40
0
12 Oct 2021
UniSpeech-SAT: Universal Speech Representation Learning with Speaker
  Aware Pre-Training
UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training
Sanyuan Chen
Yu Wu
Chengyi Wang
Zhengyang Chen
Zhuo Chen
...
Jian Wu
Yao Qian
Furu Wei
Jinyu Li
Xiangzhan Yu
SSL
60
93
0
12 Oct 2021
An Exploration of Self-Supervised Pretrained Representations for
  End-to-End Speech Recognition
An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition
Xuankai Chang
Takashi Maekaku
Pengcheng Guo
Jing Shi
Yen-Ju Lu
...
Tianzi Wang
Shu-Wen Yang
Yu Tsao
Hung-yi Lee
Shinji Watanabe
SSLAI4TS
54
81
0
09 Oct 2021
Efficient conformer: Progressive downsampling and grouped attention for
  automatic speech recognition
Efficient conformer: Progressive downsampling and grouped attention for automatic speech recognition
Maxime Burchi
Valentin Vielzeuf
61
86
0
31 Aug 2021
W2v-BERT: Combining Contrastive Learning and Masked Language Modeling
  for Self-Supervised Speech Pre-Training
W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training
Yu-An Chung
Yu Zhang
Wei Han
Chung-Cheng Chiu
James Qin
Ruoming Pang
Yonghui Wu
SSLVLM
56
427
0
07 Aug 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked
  Prediction of Hidden Units
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
180
2,966
0
14 Jun 2021
LeBenchmark: A Reproducible Framework for Assessing Self-Supervised
  Representation Learning from Speech
LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech
Solène Evain
H. Nguyen
Hang Le
Marcely Zanon Boito
Salima Mdhaffar
...
François Portet
Solange Rossato
Fabien Ringeval
D. Schwab
Laurent Besacier
SSL
64
70
0
23 Apr 2021
CTC-based Compression for Direct Speech Translation
CTC-based Compression for Direct Speech Translation
Marco Gaido
Mauro Cettolo
Matteo Negri
Marco Turchi
72
58
0
02 Feb 2021
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation
  Learning, Semi-Supervised Learning and Interpretation
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation
Changhan Wang
M. Rivière
Ann Lee
Anne Wu
Chaitanya Talnikar
Daniel Haziza
Mary Williamson
J. Pino
Emmanuel Dupoux
SSL
100
488
0
02 Jan 2021
SLURP: A Spoken Language Understanding Resource Package
SLURP: A Spoken Language Understanding Resource Package
E. Bastianelli
Andrea Vanzo
P. Swietojanski
Verena Rieser
VLM
88
229
0
26 Nov 2020
Any-to-One Sequence-to-Sequence Voice Conversion using Self-Supervised
  Discrete Speech Representations
Any-to-One Sequence-to-Sequence Voice Conversion using Self-Supervised Discrete Speech Representations
Wen-Chin Huang
Yi-Chiao Wu
Tomoki Hayashi
Tomoki Toda
BDL
89
38
0
23 Oct 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High
  Fidelity Speech Synthesis
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
177
1,936
0
12 Oct 2020
Voice Conversion Challenge 2020: Intra-lingual semi-parallel and
  cross-lingual voice conversion
Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion
Yi Zhao
Wen-Chin Huang
Xiaohai Tian
Junichi Yamagishi
Rohan Kumar Das
Tomi Kinnunen
Zhenhua Ling
Tomoki Toda
81
207
0
28 Aug 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech
  Representations
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
285
5,801
0
20 Jun 2020
Multistream CNN for Robust Acoustic Modeling
Multistream CNN for Robust Acoustic Modeling
Kyu Jeong Han
Jing Pan
Venkata Krishna Naveen Tadala
T. Ma
Daniel Povey
61
34
0
21 May 2020
SpEx: Multi-Scale Time Domain Speaker Extraction Network
SpEx: Multi-Scale Time Domain Speaker Extraction Network
Chenglin Xu
Wei Rao
Eng Siong Chng
Haizhou Li
50
170
0
17 Apr 2020
Libri-Light: A Benchmark for ASR with Limited or No Supervision
Libri-Light: A Benchmark for ASR with Limited or No Supervision
Jacob Kahn
M. Rivière
Weiyi Zheng
Evgeny Kharitonov
Qiantong Xu
...
Tatiana Likhomanenko
Gabriel Synnaeve
Armand Joulin
Abdel-rahman Mohamed
Emmanuel Dupoux
AuLLM
70
672
0
17 Dec 2019
Mockingjay: Unsupervised Speech Representation Learning with Deep
  Bidirectional Transformer Encoders
Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders
Andy T. Liu
Shu-Wen Yang
Po-Han Chi
Po-Chun Hsu
Hung-yi Lee
SSL
146
374
0
25 Oct 2019
Parallel WaveGAN: A fast waveform generation model based on generative
  adversarial networks with multi-resolution spectrogram
Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram
Ryuichi Yamamoto
Eunwoo Song
Jae-Min Kim
56
818
0
25 Oct 2019
SpanBERT: Improving Pre-training by Representing and Predicting Spans
SpanBERT: Improving Pre-training by Representing and Predicting Spans
Mandar Joshi
Danqi Chen
Yinhan Liu
Daniel S. Weld
Luke Zettlemoyer
Omer Levy
147
1,965
0
24 Jul 2019
Multi-Stream End-to-End Speech Recognition
Multi-Stream End-to-End Speech Recognition
Ruizhi Li
Xiaofei Wang
Sri Harish Reddy Mallidi
Shinji Watanabe
Takaaki Hori
H. Hermansky
45
21
0
17 Jun 2019
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
Myle Ott
Sergey Edunov
Alexei Baevski
Angela Fan
Sam Gross
Nathan Ng
David Grangier
Michael Auli
VLMFaML
111
3,151
0
01 Apr 2019
Bytes are All You Need: End-to-End Multilingual Speech Recognition and
  Synthesis with Bytes
Bytes are All You Need: End-to-End Multilingual Speech Recognition and Synthesis with Bytes
Yue Liu
Yu Zhang
Tara N. Sainath
Yonghui Wu
William Chan
AuLLM
77
130
0
22 Nov 2018
Big-Little Net: An Efficient Multi-Scale Feature Representation for
  Visual and Speech Recognition
Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition
Chun-Fu Chen
Quanfu Fan
Neil Rohit Mallinar
Tom Sercu
Rogerio Feris
42
96
0
10 Jul 2018
12
Next