
MAiVAR-T: Multimodal Audio-image and Video Action Recognizer using Transformers
Papers citing "MAiVAR-T: Multimodal Audio-image and Video Action Recognizer using Transformers"
Title |
---|
Audio-Visual Event Localization in Unconstrained Videos. Yapeng Tian, Jing Shi, Bochen Li, Zhiyao Duan, Chenliang Xu |