ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2404.18327
  4. Cited By
MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion
  Recognition

MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion Recognition

28 April 2024
Peihao Xiang
Chaohao Lin
Kaida Wu
Ou Bai
ArXivPDFHTML

Papers citing "MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion Recognition"

27 / 27 papers shown
Title
Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention
Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention
Joe Dhanith
Shravan Venkatraman
Modigari Narendra
Vigya Sharma
Santhosh Malarvannan
111
0
0
20 Feb 2025
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual
  Representation Models
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Yuan Tseng
Layne Berry
Yi-Ting Chen
I-Hsiang Chiu
Hsuan-Hao Lin
...
Yu Tsao
Shinji Watanabe
Abdel-rahman Mohamed
Chi-Luen Feng
Hung-yi Lee
VLM
SSL
74
14
0
19 Sep 2023
MAE-DFER: Efficient Masked Autoencoder for Self-supervised Dynamic
  Facial Expression Recognition
MAE-DFER: Efficient Masked Autoencoder for Self-supervised Dynamic Facial Expression Recognition
Guoying Zhao
Zheng Lian
B. Liu
Jianhua Tao
50
17
0
05 Jul 2023
PAtt-Lite: Lightweight Patch and Attention MobileNet for Challenging
  Facial Expression Recognition
PAtt-Lite: Lightweight Patch and Attention MobileNet for Challenging Facial Expression Recognition
Jia Le Ngwe
K. Lim
C. Lee
T. Ong
CVBM
52
12
0
16 Jun 2023
A vector quantized masked autoencoder for speech emotion recognition
A vector quantized masked autoencoder for speech emotion recognition
Samir Sadok
Simon Leglaive
Renaud Séguier
79
20
0
21 Apr 2023
MAViL: Masked Audio-Video Learners
MAViL: Masked Audio-Video Learners
Po-Yao (Bernie) Huang
Vasu Sharma
Hu Xu
Chaitanya K. Ryali
Haoqi Fan
Yanghao Li
Shang-Wen Li
Gargi Ghosh
Jitendra Malik
Christoph Feichtenhofer
37
53
0
15 Dec 2022
Masked Autoencoders that Listen
Masked Autoencoders that Listen
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
55
276
0
13 Jul 2022
Spatio-Temporal Transformer for Dynamic Facial Expression Recognition in
  the Wild
Spatio-Temporal Transformer for Dynamic Facial Expression Recognition in the Wild
Fuyan Ma
Bin Sun
Shutao Li
ViT
31
31
0
10 May 2022
VideoMAE: Masked Autoencoders are Data-Efficient Learners for
  Self-Supervised Video Pre-Training
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
191
1,156
0
23 Mar 2022
Self-attention fusion for audiovisual emotion recognition with
  incomplete data
Self-attention fusion for audiovisual emotion recognition with incomplete data
K. Chumachenko
Alexandros Iosifidis
Moncef Gabbouj
101
38
0
26 Jan 2022
A Pre-trained Audio-Visual Transformer for Emotion Recognition
A Pre-trained Audio-Visual Transformer for Emotion Recognition
Minh Tran
M. Soleymani
70
25
0
23 Jan 2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster
  Prediction
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction
Bowen Shi
Wei-Ning Hsu
Kushal Lakhotia
Abdel-rahman Mohamed
SSL
69
310
0
05 Jan 2022
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
352
7,600
0
11 Nov 2021
A cross-modal fusion network based on self-attention and residual
  structure for multimodal emotion recognition
A cross-modal fusion network based on self-attention and residual structure for multimodal emotion recognition
Ziwang Fu
Feng Liu
Hanyang Wang
Jiayin Qi
Xiangling Fu
Aimin Zhou
Zhibin Li
43
30
0
03 Nov 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked
  Prediction of Hidden Units
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
110
2,879
0
14 Jun 2021
ERANNs: Efficient Residual Audio Neural Networks for Audio Pattern
  Recognition
ERANNs: Efficient Residual Audio Neural Networks for Audio Pattern Recognition
S. Verbitskiy
Vladimir Berikov
Viacheslav Vyshegorodtsev
35
73
0
03 Jun 2021
MSAF: Multimodal Split Attention Fusion
MSAF: Multimodal Split Attention Fusion
Lang Su
Chuqing Hu
Guofa Li
Dongpu Cao
49
37
0
13 Dec 2020
Parameter Efficient Multimodal Transformers for Video Representation
  Learning
Parameter Efficient Multimodal Transformers for Video Representation Learning
Sangho Lee
Youngjae Yu
Gunhee Kim
Thomas Breuel
Jan Kautz
Yale Song
ViT
41
77
0
08 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at
  Scale
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
185
40,217
0
22 Oct 2020
MMTM: Multimodal Transfer Module for CNN Fusion
MMTM: Multimodal Transfer Module for CNN Fusion
Hamid Reza Vaezi Joze
Amirreza Shaban
Michael L. Iuzzolino
K. Koishida
34
278
0
20 Nov 2019
Multimodal Transformer for Unaligned Multimodal Language Sequences
Multimodal Transformer for Unaligned Multimodal Language Sequences
Yao-Hung Hubert Tsai
Shaojie Bai
Paul Pu Liang
J. Zico Kolter
Louis-Philippe Morency
Ruslan Salakhutdinov
54
1,280
0
01 Jun 2019
Deep-Emotion: Facial Expression Recognition Using Attentional
  Convolutional Network
Deep-Emotion: Facial Expression Recognition Using Attentional Convolutional Network
Shervin Minaee
AmirAli Abdolrashidi
CVBM
109
561
0
04 Feb 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language
  Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
815
93,936
0
11 Oct 2018
Real-time Convolutional Neural Networks for Emotion and Gender
  Classification
Real-time Convolutional Neural Networks for Emotion and Gender Classification
Octavio Arriaga
Matias Valdenegro-Toro
Paul G. Plöger
3DH
25
295
0
20 Oct 2017
Tensor Fusion Network for Multimodal Sentiment Analysis
Tensor Fusion Network for Multimodal Sentiment Analysis
Amir Zadeh
Minghai Chen
Soujanya Poria
Min Zhang
Louis-Philippe Morency
43
1,221
0
23 Jul 2017
Attention Is All You Need
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
324
129,831
0
12 Jun 2017
Multimodal Compact Bilinear Pooling for Visual Question Answering and
  Visual Grounding
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
253
1,466
0
06 Jun 2016
1