arXiv:2408.14441
Attend-Fusion: Efficient Audio-Visual Fusion for Video Classification
26 August 2024
Mahrukh Awan
Asmar Nadeem
Muhammad Junaid Awan
Armin Mustafa
Syed Sameed Husain
Papers citing
"Attend-Fusion: Efficient Audio-Visual Fusion for Video Classification"
27 / 27 papers shown
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
Asmar Nadeem
Faegheh Sardari
R. Dawes
Syed Sameed Husain
Adrian Hilton
Armin Mustafa
92
4
0
10 Jun 2024
SEM-POS: Grammatically and Semantically Correct Video Captioning
Asmar Nadeem
A. Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
49
8
0
26 Mar 2023
Facial Affect Recognition based on Transformer Encoder and Audiovisual Fusion for the ABAW5 Challenge
Ziyang Zhang
Liuwei An
Zishun Cui
Ao Xu
Tengteng Dong
Yueqi Jiang
Jingyi Shi
Xin Liu
Xiao Sun
Meng Wang
CVBM
54
20
0
16 Mar 2023
Temporal and cross-modal attention for audio-visual zero-shot learning
Otniel-Bogdan Mercea
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
77
26
0
20 Jul 2022
A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition
R Gnana Praveen
W. Melo
Nasib Ullah
Haseeb Aslam
Osama Zeeshan
...
M. Pedersoli
Alessandro Lameiras Koerich
Simon L Bacon
P. Cardinal
Eric Granger
73
70
0
28 Mar 2022
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
100
567
0
30 Jun 2021
Multi-level Attention Fusion Network for Audio-visual Event Recognition
Mathilde Brousmiche
Jean Rouat
Stéphane Dupont
130
11
0
12 Jun 2021
Distilling Knowledge via Knowledge Review
Pengguang Chen
Shu Liu
Hengshuang Zhao
Jiaya Jia
189
442
0
19 Apr 2021
Human Action Recognition from Various Data Modalities: A Review
Zehua Sun
Qiuhong Ke
Hossein Rahmani
Mohammed Bennamoun
Gang Wang
Jun Liu
MU
138
525
0
22 Dec 2020
Learning Representations from Audio-Visual Spatial Alignment
Pedro Morgado
Yi Li
Nuno Vasconcelos
SSL
74
123
0
03 Nov 2020
Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation
D. Powers
173
5,281
0
11 Oct 2020
Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning
Ying Cheng
Ruize Wang
Zhihao Pan
Rui Feng
Yuejie Zhang
SSL
135
108
0
13 Aug 2020
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
537
610
0
21 Jul 2020
Audio-Visual Instance Discrimination with Cross-Modal Agreement
Pedro Morgado
Nuno Vasconcelos
Ishan Misra
SSL
80
273
0
27 Apr 2020
Multi-modal Dense Video Captioning
Vladimir E. Iashin
Esa Rahtu
53
170
0
17 Mar 2020
Self-supervised learning for audio-visual speaker diarization
Yifan Ding
Yong-mei Xu
Shi-Xiong Zhang
Yahuan Cong
Liqiang Wang
VLM
53
29
0
13 Feb 2020
Deep Audio-Visual Learning: A Survey
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
Ran He
69
159
0
14 Jan 2020
Self-Supervised Learning of Video-Induced Visual Invariances
Michael Tschannen
Josip Djolonga
Marvin Ritter
Aravindh Mahendran
Xiaohua Zhai
N. Houlsby
Sylvain Gelly
Mario Lucic
SSL
105
61
0
05 Dec 2019
MMTM: Multimodal Transfer Module for CNN Fusion
Hamid Reza Vaezi Joze
Amirreza Shaban
Michael L. Iuzzolino
K. Koishida
85
282
0
20 Nov 2019
Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization
Bruno Korbar
Du Tran
Lorenzo Torresani
97
476
0
30 Jun 2018
Learnable pooling with Context Gating for video classification
Antoine Miech
Ivan Laptev
Josef Sivic
74
327
0
21 Jun 2017
Look, Listen and Learn
Relja Arandjelović
Andrew Zisserman
SSL
115
905
0
23 May 2017
CNN Architectures for Large-Scale Audio Classification
Shawn Hershey
Sourish Chaudhuri
D. Ellis
J. Gemmeke
A. Jansen
...
Rif A. Saurous
Bryan Seybold
M. Slaney
Ron J. Weiss
K. Wilson
123
2,506
0
29 Sep 2016
YouTube-8M: A Large-Scale Video Classification Benchmark
Sami Abu-El-Haija
Nisarg Kothari
Joonseok Lee
Apostol Natsev
G. Toderici
Balakrishnan Varadarajan
Sudheendra Vijayanarasimhan
VLM
151
1,270
0
27 Sep 2016
Two-Stream Convolutional Networks for Action Recognition in Videos
Karen Simonyan
Andrew Zisserman
247
7,535
0
09 Jun 2014
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
K. Soomro
Amir Zamir
M. Shah
CLIP
VGen
155
6,162
0
03 Dec 2012
Speech Recognition by Machine, A Review
M. Anusuya
S. Katti
89
393
0
13 Jan 2010