MeWEHV: Mel and Wave Embeddings for Human Voice Tasks

28 September 2022
Andrés Vasco-Carofilis
Laura Fernández-Robles
Enrique Alegre
Eduardo Fidalgo

Papers cited by "MeWEHV: Mel and Wave Embeddings for Human Voice Tasks"

37 / 37 papers shown
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation
Dan Berrebbi
Jiatong Shi
Brian Yan
Osbel López-Francisco
Jonathan D. Amith
Shinji Watanabe
56
27
0
05 Apr 2022
BERT-LID: Leveraging BERT to Improve Spoken Language Identification
Yuting Nie
Junhong Zhao
Weiqiang Zhang
Jinfeng Bai
VLM
61
5
0
01 Mar 2022
Attentive Temporal Pooling for Conformer-based Streaming Language Identification in Long-form Speech
Quan Wang
Yang Yu
Jason W. Pelecanos
Yiling Huang
Ignacio López Moreno
54
14
0
24 Feb 2022
Emotional Speaker Identification using a Novel Capsule Nets Model
Ali Bou Nassif
I. Shahin
A. Elnagar
Divya Velayudhan
A. Alhudhaif
K. Polat
57
28
0
09 Jan 2022
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
265
1,905
0
26 Oct 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
184
3,003
0
14 Jun 2021
SUPERB: Speech processing Universal PERformance Benchmark
Shu-Wen Yang
Po-Han Chi
Yung-Sung Chuang
Cheng-I Jeff Lai
Kushal Lakhotia
...
Shuyan Dong
Shang-Wen Li
Shinji Watanabe
Abdel-rahman Mohamed
Hung-yi Lee
SSL
108
942
0
03 May 2021
The Accented English Speech Recognition Challenge 2020: Open Datasets, Tracks, Baselines, Results and Methods
Xian Shi
Fan Yu
Yizhou Lu
Yuhao Liang
Qiangze Feng
Daliang Wang
Y. Qian
Lei Xie
55
67
0
20 Feb 2021
AISPEECH-SJTU accent identification system for the Accented English Speech Recognition Challenge
Houjun Huang
Xu Xiang
Yexin Yang
Rao Ma
Y. Qian
70
25
0
19 Feb 2021
CASA-Based Speaker Identification Using Cascaded GMM-CNN Classifier in Noisy and Emotional Talking Conditions
Ali Bou Nassif
I. Shahin
Shibani Hamsa
Nawel Nemmour
K. Hirose
46
58
0
11 Feb 2021
MLS: A Large-Scale Multilingual Dataset for Speech Research
Vineel Pratap
Qiantong Xu
Anuroop Sriram
Gabriel Synnaeve
R. Collobert
AuLLM
99
511
0
07 Dec 2020
Audio Tagging by Cross Filtering Noisy Labels
Boqing Zhu
Kele Xu
Qiuqiang Kong
Huaimin Wang
Yuxing Peng
NoLa
298
16
0
16 Jul 2020
NISP: A Multi-lingual Multi-accent Dataset for Speaker Profiling
Shareef Babu Kalluri
Deepu Vijayasenan
Sriram Ganapathy
M. RageshRajan
Prashant Krishnan
44
18
0
12 Jul 2020
Unsupervised Cross-lingual Representation Learning for Speech Recognition
Alexis Conneau
Alexei Baevski
R. Collobert
Abdel-rahman Mohamed
Michael Auli
SSL
154
782
0
24 Jun 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
299
5,849
0
20 Jun 2020
AP20-OLR Challenge: Three Tasks and Their Baselines
Zheng Li
Miao Zhao
Q. Hong
Lin Li
Zhiyuan Tang
Dong Wang
Liming Song
Cheng Yang
67
34
0
04 Jun 2020
AccentDB: A Database of Non-Native English Accents to Assist Neural Speech Recognition
Afroz Ahamad
Ankit Anand
Pranesh Bhargava
34
23
0
16 May 2020
Towards Learning a Universal Non-Semantic Representation of Speech
Joel Shor
A. Jansen
Ronnie Maor
Oran Lang
Omry Tuval
Félix de Chaumont Quitry
Marco Tagliasacchi
Ira Shavitt
Dotan Emanuel
Yinnon A. Haviv
SSL
130
158
0
25 Feb 2020
Multi-Representation Knowledge Distillation For Audio Classification
Liang Gao
Kele Xu
Huaimin Wang
Yuxing Peng
110
26
0
22 Feb 2020
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
VLM
SSL
194
1,084
0
21 Dec 2019
Libri-Light: A Benchmark for ASR with Limited or No Supervision
Jacob Kahn
M. Rivière
Weiyi Zheng
Evgeny Kharitonov
Qiantong Xu
...
Tatiana Likhomanenko
Gabriel Synnaeve
Armand Joulin
Abdel-rahman Mohamed
Emmanuel Dupoux
AuLLM
75
673
0
17 Dec 2019
Common Voice: A Massively-Multilingual Speech Corpus
Rosana Ardila
Megan Branson
Kelly Davis
Michael Henretty
M. Kohler
Josh Meyer
Reuben Morais
Lindsay Saunders
Francis M. Tyers
Gregor Weber
VLM
93
1,620
0
13 Dec 2019
A Comprehensive Survey on Transfer Learning
Fuzhen Zhuang
Zhiyuan Qi
Keyu Duan
Dongbo Xi
Yongchun Zhu
Hengshu Zhu
Hui Xiong
Qing He
188
4,474
0
07 Nov 2019
Spoken Language Identification using ConvNets
Sarthak
Shikhar Shukla
Govind Mittal
39
28
0
09 Oct 2019
MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible
Marcely Zanon Boito
William N. Havard
Mahault Garnerin
Éric Le Ferrand
Laurent Besacier
80
47
0
30 Jul 2019
Speech Model Pre-training for End-to-End Spoken Language Understanding
Loren Lugosch
Mirco Ravanelli
Patrick Ignoto
Vikrant Singh Tomar
Yoshua Bengio
SyDa
AuLLM
70
355
0
07 Apr 2019
A Survey of the Recent Architectures of Deep Convolutional Neural Networks
Asifullah Khan
A. Sohail
Umme Zahoora
Aqsa Saeed Qureshi
OOD
114
2,310
0
17 Jan 2019
Novel Cascaded Gaussian Mixture Model-Deep Neural Network Classifier for Speaker Identification in Emotional Talking Environments
I. Shahin
Ali Bou Nassif
Shibani Hamsa
27
49
0
11 Oct 2018
General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline
Eduardo Fonseca
Manoj Plakal
F. Font
D. Ellis
Xavier Favory
Jordi Pons
Xavier Serra
91
149
0
26 Jul 2018
A multi-device dataset for urban acoustic scene classification
A. Mesaros
Toni Heittola
Tuomas Virtanen
35
381
0
25 Jul 2018
From Word to Sense Embeddings: A Survey on Vector Representations of Meaning
Jose Camacho-Collados
Mohammad Taher Pilehvar
80
341
0
10 May 2018
Negative Log Likelihood Ratio Loss for Deep Neural Network Classification
Donglai Zhu
Hengshuai Yao
Bei Jiang
Yu Peng
54
78
0
27 Apr 2018
Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition
Pete Warden
97
1,627
0
09 Apr 2018
VoxCeleb: a large-scale speaker identification dataset
Arsha Nagrani
Joon Son Chung
Andrew Zisserman
127
2,283
0
26 Jun 2017
Learning without Forgetting
Zhizhong Li
Derek Hoiem
CLL
OOD
SSL
308
4,432
0
29 Jun 2016
THCHS-30 : A Free Chinese Speech Corpus
Dong Wang
Xuewei Zhang
84
233
0
07 Dec 2015
Semi-Supervised Learning with Deep Generative Models
Diederik P. Kingma
Danilo Jimenez Rezende
S. Mohamed
Max Welling
GAN
SSL
BDL
100
2,742
0
20 Jun 2014