2404.18327
Cited By
MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion Recognition
28 April 2024
Peihao Xiang
Chaohao Lin
Kaida Wu
Ou Bai
Papers citing
"MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion Recognition"
27 / 27 papers shown
Title
Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention
Joe Dhanith
Shravan Venkatraman
Modigari Narendra
Vigya Sharma
Santhosh Malarvannan
111
0
0
20 Feb 2025
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Yuan Tseng
Layne Berry
Yi-Ting Chen
I-Hsiang Chiu
Hsuan-Hao Lin
...
Yu Tsao
Shinji Watanabe
Abdel-rahman Mohamed
Chi-Luen Feng
Hung-yi Lee
VLM
SSL
74
14
0
19 Sep 2023
MAE-DFER: Efficient Masked Autoencoder for Self-supervised Dynamic Facial Expression Recognition
Guoying Zhao
Zheng Lian
B. Liu
Jianhua Tao
50
17
0
05 Jul 2023
PAtt-Lite: Lightweight Patch and Attention MobileNet for Challenging Facial Expression Recognition
Jia Le Ngwe
K. Lim
C. Lee
T. Ong
CVBM
52
12
0
16 Jun 2023
A vector quantized masked autoencoder for speech emotion recognition
Samir Sadok
Simon Leglaive
Renaud Séguier
79
20
0
21 Apr 2023
MAViL: Masked Audio-Video Learners
Po-Yao (Bernie) Huang
Vasu Sharma
Hu Xu
Chaitanya K. Ryali
Haoqi Fan
Yanghao Li
Shang-Wen Li
Gargi Ghosh
Jitendra Malik
Christoph Feichtenhofer
37
53
0
15 Dec 2022
Masked Autoencoders that Listen
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
55
276
0
13 Jul 2022
Spatio-Temporal Transformer for Dynamic Facial Expression Recognition in the Wild
Fuyan Ma
Bin Sun
Shutao Li
ViT
31
31
0
10 May 2022
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
191
1,156
0
23 Mar 2022
Self-attention fusion for audiovisual emotion recognition with incomplete data
K. Chumachenko
Alexandros Iosifidis
Moncef Gabbouj
101
38
0
26 Jan 2022
A Pre-trained Audio-Visual Transformer for Emotion Recognition
Minh Tran
M. Soleymani
70
25
0
23 Jan 2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction
Bowen Shi
Wei-Ning Hsu
Kushal Lakhotia
Abdel-rahman Mohamed
SSL
69
310
0
05 Jan 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
352
7,600
0
11 Nov 2021
A cross-modal fusion network based on self-attention and residual structure for multimodal emotion recognition
Ziwang Fu
Feng Liu
Hanyang Wang
Jiayin Qi
Xiangling Fu
Aimin Zhou
Zhibin Li
43
30
0
03 Nov 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
110
2,879
0
14 Jun 2021
ERANNs: Efficient Residual Audio Neural Networks for Audio Pattern Recognition
S. Verbitskiy
Vladimir Berikov
Viacheslav Vyshegorodtsev
35
73
0
03 Jun 2021
MSAF: Multimodal Split Attention Fusion
Lang Su
Chuqing Hu
Guofa Li
Dongpu Cao
49
37
0
13 Dec 2020
Parameter Efficient Multimodal Transformers for Video Representation Learning
Sangho Lee
Youngjae Yu
Gunhee Kim
Thomas Breuel
Jan Kautz
Yale Song
ViT
41
77
0
08 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
185
40,217
0
22 Oct 2020
MMTM: Multimodal Transfer Module for CNN Fusion
Hamid Reza Vaezi Joze
Amirreza Shaban
Michael L. Iuzzolino
K. Koishida
34
278
0
20 Nov 2019
Multimodal Transformer for Unaligned Multimodal Language Sequences
Yao-Hung Hubert Tsai
Shaojie Bai
Paul Pu Liang
J. Zico Kolter
Louis-Philippe Morency
Ruslan Salakhutdinov
54
1,280
0
01 Jun 2019
Deep-Emotion: Facial Expression Recognition Using Attentional Convolutional Network
Shervin Minaee
AmirAli Abdolrashidi
CVBM
109
561
0
04 Feb 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
815
93,936
0
11 Oct 2018
Real-time Convolutional Neural Networks for Emotion and Gender Classification
Octavio Arriaga
Matias Valdenegro-Toro
Paul G. Plöger
3DH
25
295
0
20 Oct 2017
Tensor Fusion Network for Multimodal Sentiment Analysis
Amir Zadeh
Minghai Chen
Soujanya Poria
Min Zhang
Louis-Philippe Morency
43
1,221
0
23 Jul 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
324
129,831
0
12 Jun 2017
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
253
1,466
0
06 Jun 2016