MAiVAR-T: Multimodal Audio-image and Video Action Recognizer using
Transformers

1 August 2023

Muhammad Bilal Shaikh

Syed Mohammed Shamsul Islam

Papers cited by "MAiVAR-T: Multimodal Audio-image and Video Action Recognizer using Transformers"

MAiVAR: Multimodal Audio-Image and Video Action Recognizer Muhammad Bilal Shaikh Douglas Chai S. Islam Naveed Akhtar 73 5 0 11 Sep 2022
An Overview of Human Activity Recognition Using Wearable Sensors: Healthcare and Artificial Intelligence Rex Liu Albara Ah Ramli Huan Zhang Esha Datta Xin Liu 35 48 0 29 Mar 2021
ViViT: A Video Vision Transformer Anurag Arnab Mostafa Dehghani G. Heigold Chen Sun Mario Lucic Cordelia Schmid ViT 225 2,168 0 29 Mar 2021
SkeletonVis: Interactive Visualization for Understanding Adversarial Attacks on Human Action Recognition Models Haekyu Park Zijie J. Wang Nilaksh Das Anindya Paul Pruthvi Perumalla Zhiyan Zhou Duen Horng Chau AAML 27 4 0 26 Jan 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai ... Matthias Minderer G. Heigold Sylvain Gelly Jakob Uszkoreit N. Houlsby ViT 682 41,483 0 22 Oct 2020
Listen to Look: Action Recognition by Previewing Audio Ruohan Gao Tae-Hyun Oh Kristen Grauman Lorenzo Torresani VLM 83 253 0 10 Dec 2019
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition Evangelos Kazakos Arsha Nagrani Andrew Zisserman Dima Damen EgoV 65 339 0 22 Aug 2019
Collaborative Spatio-temporal Feature Learning for Video Action Recognition Chong Li Qiaoyong Zhong Di Xie Shiliang Pu 75 82 0 04 Mar 2019
TSM: Temporal Shift Module for Efficient Video Understanding Ji Lin Chuang Gan Song Han 98 1,694 0 20 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova VLM SSL SSeg 1.8K 95,229 0 11 Oct 2018
Audio-Visual Event Localization in Unconstrained Videos Yapeng Tian Jing Shi Bochen Li Zhiyao Duan Chenliang Xu 101 439 0 23 Mar 2018
Attention Is All You Need Ashish Vaswani Noam M. Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan Gomez Lukasz Kaiser Illia Polosukhin 3DV 803 132,454 0 12 Jun 2017
AENet: Learning Deep Audio Features for Video Analysis Naoya Takahashi Michael Gygli Luc Van Gool 66 150 0 03 Jan 2017
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition Limin Wang Yuanjun Xiong Zhe Wang Yu Qiao Dahua Lin Xiaoou Tang Luc Van Gool ViT 120 3,841 0 02 Aug 2016
Deep Residual Learning for Image Recognition Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun MedIm 2.2K 194,510 0 10 Dec 2015
Adam: A Method for Stochastic Optimization Diederik P. Kingma Jimmy Ba ODL 2.1K 150,364 0 22 Dec 2014
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild K. Soomro Amir Zamir M. Shah CLIP VGen 163 6,170 0 03 Dec 2012