2312.15185
Cited By
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
23 December 2023
Ziyang Ma
Zhisheng Zheng
Jiaxin Ye
Jinchao Li
Zhifu Gao
Shiliang Zhang
Xie Chen
MDE
SLR
SSL
Papers citing
"emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation"
35 / 35 papers shown
Title
Audio-to-Audio Emotion Conversion With Pitch And Duration Style Transfer
Soumya Dutta
Avni Jain
Sriram Ganapathy
115
0
0
23 May 2025
EmoSign: A Multimodal Dataset for Understanding Emotions in American Sign Language
Phoebe Chua
Cathy Mengying Fang
Takehiko Ohkawa
Raja Kushalnagar
Suranga Nanayakkara
Pattie Maes
SLR
59
0
0
20 May 2025
EmoHead: Emotional Talking Head via Manipulating Semantic Expression Parameters
Xuli Shen
Hua Cai
Dingding Yu
Weilin Shen
Qing-Song Xu
Xiangyang Xue
88
0
0
25 Mar 2025
DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
Ke-Han Lu
Zhehuai Chen
Szu-Wei Fu
Chao-Han Huck Yang
Jagadeesh Balam
Boris Ginsburg
Yu-Te Wang
Hung-yi Lee
AuLLM
SyDa
148
16
0
28 Jan 2025
OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia
Xuelong Geng
Kun Wei
Qijie Shao
Shuiyun Liu
Zhennan Lin
...
Yuhang Dai
Xinfa Zhu
Yue Li
Li Zhang
Lei Xie
117
5
0
23 Jan 2025
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
Junyi Ao
Yuancheng Wang
Xiaohai Tian
Dekun Chen
Jing Zhang
Lu Lu
Yansen Wang
Haizhou Li
Zhikai Wu
AuLLM
152
24
0
17 Jan 2025
ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training
Xinfa Zhu
Lei He
Yujia Xiao
Xi Wang
Xu Tan
Sheng Zhao
Lei Xie
DiffM
78
2
0
08 Jan 2025
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Seong-Whan Lee
115
8
0
04 Nov 2024
Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention
Yuzhe Weng
Haotian Wang
Tian Gao
Kewei Li
Shutong Niu
Jun Du
81
0
0
19 Oct 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Kai Chen
Yunhao Gou
Runhui Huang
Zhili Liu
Daxin Tan
...
Qun Liu
Jun Yao
Lu Hou
Hang Xu
AuLLM
MLLM
VLM
147
29
0
26 Sep 2024
Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions
Kun Zhou
You Zhang
Shengkui Zhao
Hao Wang
Zexu Pan
...
Chongjia Ni
Yukun Ma
Trung Hieu Nguyen
J. Yip
Bin Ma
106
7
0
25 Sep 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Shengpeng Ji
Ziyue Jiang
Xize Cheng
Yifu Chen
Minghui Fang
...
Rongjie Huang
Yidi Jiang
Qian Chen
Zhou Zhao
VLM
124
44
0
29 Aug 2024
Adapting General Disentanglement-Based Speaker Anonymization for Enhanced Emotion Preservation
Xiaoxiao Miao
Yuxiang Zhang
Xin Wang
N. Tomashenko
D. Soh
Ian Mcloughlin
69
2
0
12 Aug 2024
DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation
Jisoo Kim
Jungbin Cho
Joonho Park
Soonmin Hwang
Da Eun Kim
Geon Kim
Youngjae Yu
102
1
0
12 Aug 2024
Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
201
3,732
0
06 Dec 2022
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets
Ziyang Ma
Zhisheng Zheng
Changli Tang
Yujin Wang
Xie Chen
86
20
0
14 Nov 2022
Exploration of A Self-Supervised Speech Model: A Study on Emotional Corpora
Yuanchao Li
Yumnah Mohamied
P. Bell
Catherine Lai
SSL
79
47
0
05 Oct 2022
Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training
Chengyi Wang
Yiming Wang
Yu Wu
Sanyuan Chen
Jinyu Li
Shujie Liu
Furu Wei
SSL
75
20
0
21 Jun 2022
M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database
Jinming Zhao
Tenggan Zhang
Jingwen Hu
Yuchen Liu
Qin Jin
Xinchao Wang
Haizhou Li
62
56
0
09 May 2022
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL
VLM
ViT
97
859
0
07 Feb 2022
Speech Emotion Recognition using Self-Supervised Features
E. Morais
R. Hoory
Weizhong Zhu
Itai Gat
Matheus Damasceno
Hagai Aronowitz
SSL
MDE
54
118
0
07 Feb 2022
A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding
Yingzhi Wang
Abdelmoumene Boumadane
A. Heba
68
152
0
04 Nov 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
259
1,898
0
26 Oct 2021
BEiT: BERT Pre-Training of Image Transformers
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
ViT
289
2,841
0
15 Jun 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
182
2,993
0
14 Jun 2021
Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings
L. Pepino
Pablo Riera
Luciana Ferrer
73
365
0
08 Apr 2021
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
297
5,837
0
20 Jun 2020
Bootstrap your own latent: A new approach to self-supervised Learning
Jean-Bastien Grill
Florian Strub
Florent Altché
Corentin Tallec
Pierre Harvey Richemond
...
M. G. Azar
Bilal Piot
Koray Kavukcuoglu
Rémi Munos
Michal Valko
SSL
395
6,837
0
13 Jun 2020
Momentum Contrast for Unsupervised Visual Representation Learning
Kaiming He
Haoqi Fan
Yuxin Wu
Saining Xie
Ross B. Girshick
SSL
213
12,124
0
13 Nov 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
677
24,541
0
26 Jul 2019
ShEMO -- A Large-Scale Validated Database for Persian Speech Emotion Detection
Omid Mohamad Nezami
Paria Jamshid Lou
Mansoureh Karami
CVBM
52
74
0
04 Jun 2019
MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations
Soujanya Poria
Devamanyu Hazarika
Navonil Majumder
Gautam Naik
Min Zhang
Rada Mihalcea
109
1,077
0
05 Oct 2018
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction
Leland McInnes
John Healy
James Melville
199
9,473
0
09 Feb 2018
Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm
Bjarke Felbo
A. Mislove
Anders Søgaard
Iyad Rahwan
Sune Lehmann
87
744
0
01 Aug 2017
MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos
Amir Zadeh
Rowan Zellers
Eli Pincus
Louis-Philippe Morency
78
455
0
20 Jun 2016