2107.00135
Attention Bottlenecks for Multimodal Fusion
30 June 2021
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
Papers citing "Attention Bottlenecks for Multimodal Fusion"
50 / 285 papers shown
Title
A Survey on Side Information-driven Session-based Recommendation: From a Data-centric Perspective
Xiaokun Zhang
Bo Xu
Chenliang Li
Bowei He
Hongfei Lin
Chen Ma
Fenglong Ma
9
0
0
18 May 2025
Multi-modal contrastive learning adapts to intrinsic dimensions of shared latent variables
Yu Gui
Cong Ma
Zongming Ma
SSL
26
0
0
18 May 2025
Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence
Xiang He
Dongcheng Zhao
Yang Li
Qingqun Kong
Xin Yang
Yi Zeng
26
0
0
15 May 2025
TACFN: Transformer-based Adaptive Cross-modal Fusion Network for Multimodal Emotion Recognition
Feng Liu
Ziwang Fu
Yansen Wang
Qijian Zheng
40
4
0
10 May 2025
PREMISE: Matching-based Prediction for Accurate Review Recommendation
Wei Han
Hui Chen
Soujanya Poria
52
0
0
02 May 2025
Learning Universal User Representations Leveraging Cross-domain User Intent at Snapchat
Clark Mingxuan Ju
Leonardo Neves
Bhuvesh Kumar
Liam Collins
Tong Zhao
...
Rengim Ozturk
Yong-Jin Liu
Sen Yang
Manish Malik
Neil Shah
41
0
0
30 Apr 2025
Synergy-CLIP: Extending CLIP with Multi-modal Integration for Robust Representation Learning
Sangyeon Cho
Jangyeong Jeon
Mingi Kim
Junyeong Kim
CLIP
VLM
87
0
0
30 Apr 2025
4D Multimodal Co-attention Fusion Network with Latent Contrastive Alignment for Alzheimer's Diagnosis
Yuxiang Wei
Wenjie Qu
Xi Xiao
Tianyang Wang
Xuben Wang
Vince D. Calhoun
181
0
0
23 Apr 2025
DeepMLF: Multimodal language model with learnable tokens for deep fusion in sentiment analysis
Efthymios Georgiou
Vassilis Katsouros
Yannis Avrithis
Alexandros Potamianos
28
1
0
15 Apr 2025
Knowledge Distillation for Multimodal Egocentric Action Recognition Robust to Missing Modalities
Maria Santos-Villafranca
Dustin Carrión-Ojeda
Alejandro Pérez-Yus
J. Bermudez-Cameo
Jose J. Guerrero
Simone Schaub-Meyer
EgoV
VLM
39
0
0
11 Apr 2025
MultiTSF: Transformer-based Sensor Fusion for Human-Centric Multi-view and Multi-modal Action Recognition
Trung Thanh Nguyen
Yasutomo Kawanishi
Vijay John
Takahiro Komamizu
Ichiro Ide
ViT
41
0
0
03 Apr 2025
MultiSensor-Home: A Wide-area Multi-modal Multi-view Dataset for Action Recognition and Transformer-based Sensor Fusion
Trung Thanh Nguyen
Yasutomo Kawanishi
Vijay John
Takahiro Komamizu
Ichiro Ide
43
0
0
03 Apr 2025
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition
Jongseo Lee
Joohyun Chang
Dongho Lee
Jinwoo Choi
56
0
0
30 Mar 2025
Structured-Noise Masked Modeling for Video, Audio and Beyond
Aritra Bhowmik
Fida Mohammad Thoker
Carlos Hinojosa
Bernard Ghanem
Cees G. M. Snoek
VGen
59
0
0
20 Mar 2025
FDCT: Frequency-Aware Decomposition and Cross-Modal Token-Alignment for Multi-Sensor Target Classification
S. Sami
Md Golam Moula Mehedi Hasan
Nasser M. Nasrabadi
Raghuveer Rao
53
0
0
12 Mar 2025
EPR-GAIL: An EPR-Enhanced Hierarchical Imitation Learning Framework to Simulate Complex User Consumption Behaviors
Tao Feng
Yunke Zhang
Huandong Wang
Yong Li
AI4TS
60
0
0
09 Mar 2025
Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation
Jie Xu
Na Zhao
Gang Niu
Masashi Sugiyama
Xiaofeng Zhu
92
0
0
06 Mar 2025
Attention Bootstrapping for Multi-Modal Test-Time Adaptation
Yusheng Zhao
Junyu Luo
Xiao Luo
Jinsheng Huang
Jingyang Yuan
Zhiping Xiao
M. Zhang
TTA
92
0
0
04 Mar 2025
HOP: Heterogeneous Topology-based Multimodal Entanglement for Co-Speech Gesture Generation
Hongye Cheng
Tianyu Wang
Guangsi Shi
Zexing Zhao
Yanwei Fu
SLR
50
1
0
03 Mar 2025
Cross-Attention Fusion of MRI and Jacobian Maps for Alzheimer's Disease Diagnosis
Shijia Zhang
Xiyu Ding
Brian Caffo
Junyu Chen
Cindy Zhang
Hadi Kharrazi
Zheyu Wang
MedIm
31
0
0
01 Mar 2025
Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention
Joe Dhanith
Shravan Venkatraman
Modigari Narendra
Vigya Sharma
Santhosh Malarvannan
84
0
0
20 Feb 2025
QCS: Feature Refining from Quadruplet Cross Similarity for Facial Expression Recognition
Cong Wang
Li Chen
Lili Wang
Zhaofan Li
Xuebin Lv
86
1
0
28 Jan 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
40
28
0
02 Jan 2025
Multimodal Fusion and Coherence Modeling for Video Topic Segmentation
Hai Yu
Chong Deng
Qinglin Zhang
Jiaqing Liu
Qian Chen
Wen Wang
67
0
0
31 Dec 2024
Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models
Xinghang Li
Peiyan Li
Minghuan Liu
Dong Wang
Jirong Liu
Bingyi Kang
Xiao Ma
Tao Kong
Hanbo Zhang
Huaping Liu
LM&Ro
99
18
0
18 Dec 2024
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts
Taein Son
Soo Won Seo
Jisong Kim
S. Lee
Jun Won Choi
VGen
79
0
0
18 Dec 2024
PT: A Plain Transformer is Good Hospital Readmission Predictor
Zhenyi Fan
Jiaqi Li
Dongyu Luo
Yuqi Yuan
75
0
0
17 Dec 2024
MEATRD: Multimodal Anomalous Tissue Region Detection Enhanced with Spatial Transcriptomics
Kaichen Xu
Qilong Wu
Yan Lu
Yinan Zheng
W. Li
Xingjie Tang
Jun Wang
Xiaobo Sun
79
0
0
14 Dec 2024
On Moving Object Segmentation from Monocular Video with Transformers
Christian Homeyer
Christoph Schnörr
102
3
0
28 Nov 2024
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
Luis Vilaca
Yi Yu
Paula Vinan
75
0
0
24 Nov 2024
Relevance-guided Audio Visual Fusion for Video Saliency Prediction
Li Yu
Xuanzhe Sun
Pan Gao
Moncef Gabbouj
71
1
0
18 Nov 2024
HIST-AID: Leveraging Historical Patient Reports for Enhanced Multi-Modal Automatic Diagnosis
Haoxu Huang
Cem M. Deniz
K. Cho
S. Chopra
Divyam Madaan
42
1
0
16 Nov 2024
Multimodal Fusion Balancing Through Game-Theoretic Regularization
Konstantinos Kontras
Thomas Strypsteen
Christos Chatzichristos
Paul P. Liang
Matthew Blaschko
M. D. Vos
36
0
0
11 Nov 2024
Enhancing SNN-based Spatio-Temporal Learning: A Benchmark Dataset and Cross-Modality Attention Model
Shibo Zhou
Bo Yang
Mengwen Yuan
Runhao Jiang
Rui Yan
Gang Pan
Huajin Tang
37
4
0
21 Oct 2024
Generalized Multimodal Fusion via Poisson-Nernst-Planck Equation
Jiayu Xiong
Jing Wang
Hengjing Xiang
Jun Xue
Chen Xu
Zhouqiang Jiang
35
0
0
20 Oct 2024
An Eye for an Ear: Zero-shot Audio Description Leveraging an Image Captioner using Audiovisual Distribution Alignment
Hugo Malard
Michel Olvera
Stéphane Lathuilière
S. Essid
VLM
39
0
0
08 Oct 2024
Bridging the Gap between Text, Audio, Image, and Any Sequence: A Novel Approach using Gloss-based Annotation
Sen Fang
Sizhou Chen
Yalin Feng
Xiaofeng Zhang
T. Teoh
28
0
0
04 Oct 2024
Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations
Minoh Jeong
Min Namgung
Zae Myung Kim
Dongyeop Kang
Yao-Yi Chiang
Alfred Hero
30
0
0
02 Oct 2024
From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation
Kun Su
Xiulong Liu
Eli Shlizerman
VGen
45
6
0
27 Sep 2024
DETECLAP: Enhancing Audio-Visual Representation Learning with Object Information
Shota Nakada
Taichi Nishimura
Hokuto Munakata
Masayoshi Kondo
Tatsuya Komatsu
CLIP
VLM
36
0
0
18 Sep 2024
Towards Social AI: A Survey on Understanding Social Interactions
Sangmin Lee
Minzhi Li
Bolin Lai
Wenqi Jia
Fiona Ryan
...
Ozgur Kara
Bikram Boote
Weiyan Shi
Diyi Yang
James M. Rehg
39
4
0
05 Sep 2024
Attend-Fusion: Efficient Audio-Visual Fusion for Video Classification
Mahrukh Awan
Asmar Nadeem
Muhammad Junaid Awan
Armin Mustafa
Syed Sameed Husain
28
1
0
26 Aug 2024
SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition
Zebang Cheng
Shuyuan Tu
Dawei Huang
Minghan Li
Xiaojiang Peng
Zhi-Qi Cheng
Alexander G. Hauptmann
53
2
0
20 Aug 2024
Advancements in Molecular Property Prediction: A Survey of Single and Multimodal Approaches
Tanya Liyaqat
T. Ahmad
Chandni Saxena
32
2
0
18 Aug 2024
Multi Teacher Privileged Knowledge Distillation for Multimodal Expression Recognition
Muhammad Haseeb Aslam
M. Pedersoli
Alessandro Lameiras Koerich
Eric Granger
34
1
0
16 Aug 2024
End-to-end Semantic-centric Video-based Multimodal Affective Computing
Ronghao Lin
Ying Zeng
Sijie Mai
Haifeng Hu
VGen
48
0
0
14 Aug 2024
FuXi Weather: An end-to-end machine learning weather data assimilation and forecasting system
Xiuyu Sun
Xiaohui Zhong
Xiaoze Xu
Yuanqing Huang
Hao Li
Jie Feng
Wei Han
Libo Wu
Yuan Qi
AI4Cl
38
4
0
10 Aug 2024
Play to the Score: Stage-Guided Dynamic Multi-Sensory Fusion for Robotic Manipulation
Runze Yuan
Tao Liu
Wenke Ma
Xuelong Li
36
8
0
02 Aug 2024
Large-vocabulary forensic pathological analyses via prototypical cross-modal contrastive learning
Chen Shen
Chunfeng Lian
Wanqing Zhang
Fan Wang
Jianhua Zhang
...
Hongshu Mu
Hao Wu
Xinggong Liang
Jianhua Ma
Zhenyuan Wang
36
1
0
20 Jul 2024
ElasticAST: An Audio Spectrogram Transformer for All Length and Resolutions
Jiu Feng
Mehmet Hamza Erol
Joon Son Chung
Arda Senocak
33
1
0
11 Jul 2024