Multimodal Emotion Recognition with High-level Speech and Text Features

29 September 2021

Papers citing "Multimodal Emotion Recognition with High-level Speech and Text Features"

"Yeah Right!" -- Do LLMs Exhibit Multimodal Feature Transfer? Benjamin Z. Reichman Kartik Talamadupula 84 0 0 07 Jan 2025
Fusion approaches for emotion recognition from speech using acoustic and text-based features L. Pepino Pablo Riera Luciana Ferrer Agustin Gravano 70 49 0 27 Mar 2024
VISTANet: VIsual Spoken Textual Additive Net for Interpretable Multimodal Emotion Recognition Puneet Kumar Sarthak Malik Balasubramanian Raman Xiaobai Li 113 2 0 24 Aug 2022
Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings L. Pepino Pablo Riera Luciana Ferrer 67 363 0 08 Apr 2021
On the use of Self-supervised Pre-trained Acoustic and Linguistic Features for Continuous Speech Emotion Recognition Manon Macary Marie Tahon Yannick Esteve Anthony Rousseau SSL 53 55 0 18 Nov 2020
Emotion recognition by fusing time synchronous and time asynchronous representations Wen Wu Chao Zhang P. Woodland 56 67 0 27 Oct 2020
Jointly Fine-Tuning "BERT-like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition Shamane Siriwardhana Andrew Reis Rivindu Weerasekera Suranga Nanayakkara 63 112 0 15 Aug 2020
Advancing Multiple Instance Learning with Attention Modeling for Categorical Speech Emotion Recognition Shuiyang Mao P. Ching C.-C. Jay Kuo Tan Lee 32 11 0 15 Aug 2020
Transformer based unsupervised pre-training for acoustic representation learning Ruixiong Zhang Haiwei Wu Wubo Li Dongwei Jiang Wei Zou Xiangang Li SSL ViT 56 27 0 29 Jul 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations Alexei Baevski Henry Zhou Abdel-rahman Mohamed Michael Auli SSL 282 5,801 0 20 Jun 2020
Unsupervised Speech Decomposition via Triple Information Bottleneck Kaizhi Qian Yang Zhang Shiyu Chang David D. Cox M. Hasegawa-Johnson 82 184 0 23 Apr 2020
Speaker-invariant Affective Representation Learning via Adversarial Training Haoqi Li Ming Tu Jing-ling Huang Shrikanth Narayanan P. Georgiou 66 56 0 04 Nov 2019
XLNet: Generalized Autoregressive Pretraining for Language Understanding Zhilin Yang Zihang Dai Yiming Yang J. Carbonell Ruslan Salakhutdinov Quoc V. Le AI4CE 232 8,433 0 19 Jun 2019
AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss Kaizhi Qian Yang Zhang Shiyu Chang Xuesong Yang M. Hasegawa-Johnson 81 465 0 14 May 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova VLM SSL SSeg 1.8K 94,891 0 11 Oct 2018
VoxCeleb2: Deep Speaker Recognition Joon Son Chung Arsha Nagrani Andrew Zisserman 353 2,279 0 14 Jun 2018
Exploring Disentangled Feature Representation Beyond Face Identification Yu Liu Fangyin Wei Jing Shao Lu Sheng Junjie Yan Xiaogang Wang CoGe CVBM 53 156 0 10 Apr 2018
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron RJ Skerry-Ryan Eric Battenberg Y. Xiao Yuxuan Wang Daisy Stanton Joel Shor Ron J. Weiss R. Clark Rif A. Saurous 54 554 0 24 Mar 2018
Generalized End-to-End Loss for Speaker Verification Li Wan Quan Wang Alan Papir Ignacio López Moreno VLM 68 927 0 28 Oct 2017
VoxCeleb: a large-scale speaker identification dataset Arsha Nagrani Joon Son Chung Andrew Zisserman 125 2,274 0 26 Jun 2017
Unsupervised Learning of Disentangled Representations from Video Emily L. Denton Vighnesh Birodkar DRL CoGe OOD 76 552 0 31 May 2017
WaveNet: A Generative Model for Raw Audio Aaron van den Oord Sander Dieleman Heiga Zen Karen Simonyan Oriol Vinyals Alex Graves Nal Kalchbrenner A. Senior Koray Kavukcuoglu DiffM 406 7,399 0 12 Sep 2016
Listen, Attend and Spell William Chan Navdeep Jaitly Quoc V. Le Oriol Vinyals RALM 156 2,266 0 05 Aug 2015
Explaining and Harnessing Adversarial Examples Ian Goodfellow Jonathon Shlens Christian Szegedy AAML GAN 277 19,066 0 20 Dec 2014