Attention Bottlenecks for Multimodal Fusion

30 June 2021
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun

Papers citing "Attention Bottlenecks for Multimodal Fusion"

50 / 285 papers shown
Zorro: the masked multimodal transformer
Adrià Recasens
Jason Lin
João Carreira
Drew Jaegle
Luyu Wang
...
Pauline Luc
Antoine Miech
Lucas Smaira
Ross Hemsley
Andrew Zisserman
39
20
0
23 Jan 2023
What You Say Is What You Show: Visual Narration Detection in Instructional Videos
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
26
4
0
05 Jan 2023
Learning Multimodal Data Augmentation in Feature Space
Zichang Liu
Zhiqiang Tang
Xingjian Shi
Aston Zhang
Mu Li
Anshumali Shrivastava
A. Wilson
39
19
0
29 Dec 2022
Open-Vocabulary Temporal Action Detection with Off-the-Shelf Image-Text Features
V. Rathod
Bryan Seybold
Sudheendra Vijayanarasimhan
Austin Myers
Xiuye Gu
Vighnesh Birodkar
David A. Ross
VLM
ObjD
15
7
0
20 Dec 2022
BEATs: Audio Pre-Training with Acoustic Tokenizers
Sanyuan Chen
Yu-Huan Wu
Chengyi Wang
Shujie Liu
Daniel C. Tompkins
Zhuo Chen
Furu Wei
41
258
0
18 Dec 2022
MAViL: Masked Audio-Video Learners
Po-Yao (Bernie) Huang
Vasu Sharma
Hu Xu
Chaitanya K. Ryali
Haoqi Fan
Yanghao Li
Shang-Wen Li
Gargi Ghosh
Jitendra Malik
Christoph Feichtenhofer
26
51
0
15 Dec 2022
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Yan-Bo Lin
Yi-Lin Sung
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
34
73
0
15 Dec 2022
Phases, Modalities, Temporal and Spatial Locality: Domain Specific ML Prefetcher for Accelerating Graph Analytics
Pengmiao Zhang
Rajgopal Kannan
Viktor K. Prasanna
18
2
0
10 Dec 2022
Audiovisual Masked Autoencoders
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
39
43
0
09 Dec 2022
See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation
Hao Li
Yizhi Zhang
Junzhe Zhu
Shaoxiong Wang
Michelle A. Lee
Huazhe Xu
Edward H. Adelson
Li Fei-Fei
Ruohan Gao
Jiajun Wu
32
60
0
07 Dec 2022
On the Importance of Clinical Notes in Multi-modal Learning for EHR Data
Severin Husmann
Hugo Yèche
Gunnar Rätsch
Rita Kuznetsova
HAI
16
10
0
06 Dec 2022
Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight
Yunhua Zhang
Hazel Doughty
Cees G. M. Snoek
VLM
47
0
0
05 Dec 2022
Multimodal Query-guided Object Localization
Aditay Tripathi
Rajath R Dani
Anand Mishra
Anirban Chakraborty
29
0
0
01 Dec 2022
A Dual-scale Lead-seperated Transformer With Lead-orthogonal Attention And Meta-information For Ecg Classification
Heng Chang
Guijin Wang
Zhourui Xia
Wenming Yang
Li Sun
MedIm
37
1
0
23 Nov 2022
AVATAR submission to the Ego4D AV Transcription Challenge
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
30
0
0
18 Nov 2022
Token Turing Machines
Michael S. Ryoo
K. Gopalakrishnan
Kumara Kahatapitiya
Ted Xiao
Kanishka Rao
Austin Stone
Yao Lu
Julian Ibarz
Anurag Arnab
27
21
0
16 Nov 2022
Real Estate Attribute Prediction from Multiple Visual Modalities with Missing Data
Eric Stumpe
Miroslav Despotovic
Zedong Zhang
Matthias Zeppelzauer
20
0
0
16 Nov 2022
Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation
Yusong Wu
K. Chen
Tianyu Zhang
Yuchen Hui
Marianna Nezhurina
Taylor Berg-Kirkpatrick
Shlomo Dubnov
CLIP
39
490
0
12 Nov 2022
Self-Supervised Predictive Coding with Multimodal Fusion for Patient Deterioration Prediction in Fine-grained Time Resolution
Kwanhyung Lee
John Won
Heejung Hyun
Sangchul Hahn
Edward Choi
Joohyung Lee
28
3
0
29 Oct 2022
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention
Xubo Liu
Qiushi Huang
Xinhao Mei
Haohe Liu
Qiuqiang Kong
...
Yu Zhang
Lilian H. Y. Tang
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
43
18
0
28 Oct 2022
Anticipative Feature Fusion Transformer for Multi-Modal Action Anticipation
Zeyun Zhong
David Schneider
Michael Voit
Rainer Stiefelhagen
Jürgen Beyerer
74
44
0
23 Oct 2022
Play It Back: Iterative Attention for Audio Recognition
Alexandros Stergiou
Dima Damen
37
4
0
20 Oct 2022
Spatio-channel Attention Blocks for Cross-modal Crowd Counting
Youjia Zhang
Soyun Choi
Sungeun Hong
26
25
0
19 Oct 2022
Students taught by multimodal teachers are superior action recognizers
Gorjan Radevski
Dusan Grujicic
Matthew Blaschko
Marie-Francine Moens
Tinne Tuytelaars
24
1
0
09 Oct 2022
Contrastive Audio-Visual Masked Autoencoder
Yuan Gong
Andrew Rouditchenko
Alexander H. Liu
David Harwath
Leonid Karlinsky
Hilde Kuehne
James R. Glass
37
120
0
02 Oct 2022
Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention
R Gnana Praveen
Eric Granger
P. Cardinal
CVBM
56
31
0
19 Sep 2022
Distribution Aware Metrics for Conditional Natural Language Generation
David M. Chan
Yiming Ni
David A. Ross
Sudheendra Vijayanarasimhan
Austin Myers
John F. Canny
48
4
0
15 Sep 2022
Fusion of Satellite Images and Weather Data with Transformer Networks for Downy Mildew Disease Detection
William Maillet
Maryam Ouhami
A. Hafiane
ViT
MedIm
22
6
0
06 Sep 2022
Features Fusion Framework for Multimodal Irregular Time-series Events
Peiwang Tang
Xianchao Zhang
AI4TS
26
2
0
05 Sep 2022
Progressive Fusion for Multimodal Integration
Shiv Shankar
Laure Thompson
M. Fiterau
36
3
0
01 Sep 2022
Efficient Multimodal Transformer with Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis
Guoying Zhao
Zheng Lian
B. Liu
Jianhua Tao
43
47
0
16 Aug 2022
UAVM: Towards Unifying Audio and Visual Models
Yuan Gong
Alexander H. Liu
Andrew Rouditchenko
James R. Glass
33
21
0
29 Jul 2022
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
Haoxuan You
Luowei Zhou
Bin Xiao
Noel Codella
Yu Cheng
Ruochen Xu
Shih-Fu Chang
Lu Yuan
CLIP
VLM
27
47
0
26 Jul 2022
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
Grant Van Horn
Rui Qian
Kimberly Wilber
Hartwig Adam
Oisin Mac Aodha
Serge Belongie
29
10
0
21 Jul 2022
Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models
Rui Qian
Yeqing Li
Zheng Xu
Ming Yang
Serge Belongie
Huayu Chen
VLM
41
22
0
15 Jul 2022
MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images
Nasir Hayat
Krzysztof J. Geras
Farah E. Shamout
MedIm
27
41
0
14 Jul 2022
Masked Autoencoders that Listen
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
21
268
0
13 Jul 2022
MM-ALT: A Multimodal Automatic Lyric Transcription System
Xiangming Gu
Longshen Ou
Danielle Ong
Ye Wang
19
13
0
13 Jul 2022
Radiomics-Guided Global-Local Transformer for Weakly Supervised Pathology Localization in Chest X-Rays
Yan Han
G. Holste
Ying Ding
Ahmed H. Tewfik
Yifan Peng
Zhangyang Wang
LM&MA
ViT
39
15
0
10 Jul 2022
Dual-Path Cross-Modal Attention for better Audio-Visual Speech Extraction
Zhongweiyang Xu
Xulin Fan
M. Hasegawa-Johnson
19
2
0
09 Jul 2022
M&M Mix: A Multimodal Multiview Transformer Ensemble
Xuehan Xiong
Anurag Arnab
Arsha Nagrani
Cordelia Schmid
ViT
23
19
0
20 Jun 2022
Bear the Query in Mind: Visual Grounding with Query-conditioned Convolution
Chonghan Chen
Qi Jiang
Chih-Hao Wang
Noel Chen
Haohan Wang
Xiang Li
Bhiksha Raj
ObjD
21
0
0
18 Jun 2022
BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning
Xiao Xu
Chenfei Wu
Shachar Rosenman
Vasudev Lal
Wanxiang Che
Nan Duan
51
64
0
17 Jun 2022
AVATAR: Unconstrained Audiovisual Speech Recognition
Valentin Gabeur
Paul Hongsuck Seo
Arsha Nagrani
Chen Sun
Alahari Karteek
Cordelia Schmid
31
11
0
15 Jun 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
72
530
0
13 Jun 2022
AFNet-M: Adaptive Fusion Network with Masks for 2D+3D Facial Expression Recognition
Ming-Fa Sui
Hanting Li
Zhaoqing Zhu
Feng Zhao
3DPC
3DH
CVBM
36
3
0
24 May 2022
Multimodal Token Fusion for Vision Transformers
Yikai Wang
Xinghao Chen
Lele Cao
Wen-bing Huang
Gang Hua
Yunhe Wang
ViT
44
168
0
19 Apr 2022
Probabilistic Compositional Embeddings for Multimodal Image Retrieval
Andrei Neculai
Yanbei Chen
Zeynep Akata
CoGe
27
31
0
12 Apr 2022
Are Multimodal Transformers Robust to Missing Modality?
Mengmeng Ma
Jian Ren
Long Zhao
Davide Testuggine
Xi Peng
ViT
52
148
0
12 Apr 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
Yan-Bo Lin
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
54
39
0
06 Apr 2022