Attention Bottlenecks for Multimodal Fusion

30 June 2021

Papers citing "Attention Bottlenecks for Multimodal Fusion"

50 / 285 papers shown

4M: Massively Multimodal Masked Modeling David Mizrahi Roman Bachmann Oğuzhan Fatih Kar Teresa Yeo Mingfei Gao Afshin Dehghan Amir Zamir MLLM 50 64 0 11 Dec 2023
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling Shentong Mo Pedro Morgado 27 13 0 02 Dec 2023
CAST: Cross-Attention in Space and Time for Video Action Recognition Dongho Lee Jongseo Lee Jinwoo Choi EgoV 35 12 0 30 Nov 2023
A-JEPA: Joint-Embedding Predictive Architecture Can Listen Zhengcong Fei Mingyuan Fan Junshi Huang 25 17 0 27 Nov 2023
Joyful: Joint Modality Fusion and Graph Contrastive Learning for Multimodal Emotion Recognition Dongyuan Li Yusong Wang Kotaro Funakoshi Manabu Okumura 26 12 0 18 Nov 2023
Accommodating Missing Modalities in Time-Continuous Multimodal Emotion Recognition Juan Vazquez-Rodriguez G. Lefebvre Julien Cumin James L. Crowley 34 2 0 16 Nov 2023
Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing Yating Xu Conghui Hu Gim Hee Lee 22 2 0 14 Nov 2023
Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities A. Piergiovanni Isaac Noble Dahun Kim Michael S. Ryoo Victor Gomes A. Angelova 43 19 0 09 Nov 2023
MOSEL: Inference Serving Using Dynamic Modality Selection Bodun Hu Le Xu Jeongyoon Moon N. Yadwadkar Aditya Akella 13 4 0 27 Oct 2023
SynergyNet: Bridging the Gap between Discrete and Continuous Representations for Precise Medical Image Segmentation Vandan Gorade Sparsh Mittal Debesh Jha Ulas Bagci 33 11 0 26 Oct 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA Asmar Nadeem Adrian Hilton R. Dawes Graham A. Thomas A. Mustafa 33 9 0 25 Oct 2023
GraFT: Gradual Fusion Transformer for Multimodal Re-Identification Haoli Yin Jiayao Li Eva Schiller Luke McDermott Daniel Cummings 29 6 0 25 Oct 2023
Multimodal Transformer Using Cross-Channel attention for Object Detection in Remote Sensing Images Bissmella Bahaduri Zuheng Ming Fangchen Feng Anissa Mokraou 32 1 0 21 Oct 2023
Advancing Perception in Artificial Intelligence through Principles of Cognitive Science Palaash Agrawal Cheston Tan Heena Rathore 54 1 0 13 Oct 2023
Multimodal Variational Auto-encoder based Audio-Visual Segmentation Yuxin Mao Jing Zhang Mochu Xiang Yiran Zhong Yuchao Dai 40 34 0 12 Oct 2023
STELLA: Continual Audio-Video Pre-training with Spatio-Temporal Localized Alignment Jaewoo Lee Jaehong Yoon Wonjae Kim Yunji Kim Sung Ju Hwang CLL 19 1 0 12 Oct 2023
Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models Chenzhuang Du Yue Zhao Chonghua Liao Jiacheng You Jie Fu Hang Zhao 47 2 0 08 Oct 2023
Multi-Resolution Audio-Visual Feature Fusion for Temporal Action Localization Edward Fish Jon Weinbren Andrew Gilbert 36 0 0 05 Oct 2023
RegBN: Batch Normalization of Multimodal Data with Regularization Morteza Ghahremani Christian Wachinger 30 6 0 01 Oct 2023
Audio-Visual Speaker Verification via Joint Cross-Attention R Gnana Praveen Jahangir Alam 34 6 0 28 Sep 2023
M$^{3}$3D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding Muhammad Abdullah Jamal Omid Mohareri 3DPC 24 1 0 26 Sep 2023
Mitigating Adversarial Attacks in Federated Learning with Trusted Execution Environments Simon Queyrut V. Schiavoni Pascal Felber AAML FedML 32 6 0 13 Sep 2023
Harmonic-NAS: Hardware-Aware Multimodal Neural Architecture Search on Resource-constrained Devices Mohamed Imed Eddine Ghebriout Halima Bouzidi Smail Niar Hamza Ouarnoughi 19 3 0 12 Sep 2023
Enhancing multimodal cooperation via sample-level modality valuation Yake Wei Ruoxuan Feng Zihe Wang Di Hu 38 11 0 12 Sep 2023
Multimodal Fish Feeding Intensity Assessment in Aquaculture Meng Cui Xubo Liu Haohe Liu Zhuangzhuang Du Tao Chen Guoping Lian Daoliang Li Wenwu Wang 34 5 0 10 Sep 2023
Multi-modal Extreme Classification Anshul Mittal Kunal Dahiya Shreya Malani Janani Ramaswamy Seba Kuruvilla Jitendra Ajmera Keng-hao Chang Sumeet Agarwal Purushottam Kar Manik Varma 34 8 0 10 Sep 2023
Text-to-feature diffusion for audio-visual few-shot learning Otniel-Bogdan Mercea Thomas Hummel A. Sophia Koepke Zeynep Akata VLM 27 2 0 07 Sep 2023
Exchanging-based Multimodal Fusion with Transformer Renyu Zhu Chengcheng Han Yong Qian Qiushi Sun Xiang Li Ming Gao Xuezhi Cao Yunsen Xian 40 2 0 05 Sep 2023
Extract-and-Adaptation Network for 3D Interacting Hand Mesh Recovery J. Park Daniel Sungho Jung Gyeongsik Moon Kyoung Mu Lee 30 6 0 05 Sep 2023
LoRA-like Calibration for Multimodal Deception Detection using ATSFace Data Shun-Wen Hsiao Chengbin Sun CVBM 18 1 0 04 Sep 2023
MM-AU: Towards Multimodal Understanding of Advertisement Videos Digbalay Bose Rajat Hebbar Tiantian Feng Krishna Somandepalli Anfeng Xu Shrikanth Narayanan 32 5 0 27 Aug 2023
MOFO: MOtion FOcused Self-Supervision for Video Understanding Mona Ahmadian Frank Guerin Andrew Gilbert 44 2 0 23 Aug 2023
Learning Bottleneck Transformer for Event Image-Voxel Feature Fusion based Classification Chengguo Yuan Yu Jin Zong-Yao Wu Fanting Wei Yangzirui Wang Langlang Chen Tianlin Li ViT 100 7 0 23 Aug 2023
MEGA: Multimodal Alignment Aggregation and Distillation For Cinematic Video Segmentation Najmeh Sadoughi Xinyu Li Avijit Vajpayee D. Fan Bing Shuai H. Santos-Villalobos Vimal Bhat M. Rohith 31 4 0 22 Aug 2023
Diffusion Models for Image Restoration and Enhancement -- A Comprehensive Survey Xin Li Yulin Ren Xin Jin Cuiling Lan X. Wang Wenjun Zeng Xinchao Wang Zhibo Chen 43 86 0 18 Aug 2023
AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes Zhaohui Li Haitao Wang Xinghua Jiang 40 1 0 14 Aug 2023
Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder Yusheng Dai Hang Chen Jun Du Xiao-ying Ding Ning Ding Feijun Jiang Chin-Hui Lee 32 7 0 14 Aug 2023
Pelta: Shielding Transformers to Mitigate Evasion Attacks in Federated Learning Simon Queyrut Yérom-David Bromberg V. Schiavoni FedML AAML 11 1 0 08 Aug 2023
SSTFormer: Bridging Spiking Neural Network and Memory Support Transformer for Frame-Event based Recognition Tianlin Li Zong-Yao Wu Yao Rong Lin Zhu Bowei Jiang Jin Tang Yonghong Tian ViT 77 18 0 08 Aug 2023
DCTM: Dilated Convolutional Transformer Model for Multimodal Engagement Estimation in Conversation Vu Ngoc Tu V. Huynh Hyung-Jeong Yang M. Zaheer Shah Nawaz Karthik Nandakumar Soo-Hyung Kim 22 4 0 31 Jul 2023
Visual Prompt Flexible-Modal Face Anti-Spoofing Zitong Yu Rizhao Cai Yawen Cui Ajian Liu Changsheng Chen 38 6 0 26 Jul 2023
FedMEKT: Distillation-based Embedding Knowledge Transfer for Multimodal Federated Learning Huy Q. Le Minh N. H. Nguyen Chu Myaet Thwal Yu Qiao Chao Zhang Choong Seon Hong 16 13 0 25 Jul 2023
Towards a performance analysis on pre-trained Visual Question Answering models for autonomous driving Kaavya Rekanar Ciarán Eising Ganesh Sistu Martin Hayes 8 3 0 18 Jul 2023
Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media Liam Hebert Gaurav Sahu Yuxuan Guo Nanda Kishore Sreenivas Lukasz Golab Robin Cohen 23 10 0 18 Jul 2023
FlexiAST: Flexibility is What AST Needs Jiu Feng Mehmet Hamza Erol Joon Son Chung Arda Senocak 23 3 0 18 Jul 2023
CoTracker: It is Better to Track Together Nikita Karaev Ignacio Rocco Benjamin Graham Natalia Neverova Andrea Vedaldi Christian Rupprecht VOT ViT 51 246 0 14 Jul 2023
Multimodal Distillation for Egocentric Action Recognition Gorjan Radevski Dusan Grujicic Marie-Francine Moens Matthew Blaschko Tinne Tuytelaars EgoV 30 23 0 14 Jul 2023
One-Versus-Others Attention: Scalable Multimodal Integration for Clinical Data Michal Golovanevsky Eva Schiller Akira Nair Ritambhara Singh Carsten Eickhoff 23 2 0 11 Jul 2023
Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers Yuan Gong Sameer Khurana Leonid Karlinsky James R. Glass 27 68 0 06 Jul 2023
Deep Equilibrium Multimodal Fusion Jinhong Ni Yalong Bai Wei Zhang Ting Yao Tao Mei 28 1 0 29 Jun 2023