ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2106.06736
  4. Cited By
Multi-level Attention Fusion Network for Audio-visual Event Recognition

Multi-level Attention Fusion Network for Audio-visual Event Recognition

12 June 2021
Mathilde Brousmiche
Jean Rouat
Stéphane Dupont
ArXivPDFHTML

Papers citing "Multi-level Attention Fusion Network for Audio-visual Event Recognition"

8 / 8 papers shown
Title
SaSR-Net: Source-Aware Semantic Representation Network for Enhancing
  Audio-Visual Question Answering
SaSR-Net: Source-Aware Semantic Representation Network for Enhancing Audio-Visual Question Answering
Tianyu Yang
Yiyang Nan
Lisen Dai
Zhenwen Liang
Yapeng Tian
Xuzhi Zhang
39
0
0
07 Nov 2024
Attend-Fusion: Efficient Audio-Visual Fusion for Video Classification
Attend-Fusion: Efficient Audio-Visual Fusion for Video Classification
Mahrukh Awan
Asmar Nadeem
Muhammad Junaid Awan
Armin Mustafa
Syed Sameed Husain
25
1
0
26 Aug 2024
Progressive Spatio-temporal Perception for Audio-Visual Question
  Answering
Progressive Spatio-temporal Perception for Audio-Visual Question Answering
Guangyao Li
Wenxuan Hou
Di Hu
29
26
0
10 Aug 2023
Towards Continual Egocentric Activity Recognition: A Multi-modal
  Egocentric Activity Dataset for Continual Learning
Towards Continual Egocentric Activity Recognition: A Multi-modal Egocentric Activity Dataset for Continual Learning
Linfeng Xu
Qingbo Wu
Lili Pan
Fanman Meng
Hongliang Li
Chiyuan He
Hanxin Wang
Shaoxu Cheng
Yunshu Dai
EgoV
HAI
28
23
0
26 Jan 2023
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Guangyao Li
Yake Wei
Yapeng Tian
Chenliang Xu
Ji-Rong Wen
Di Hu
29
136
0
26 Mar 2022
Deep soccer captioning with transformer: dataset, semantics-related
  losses, and multi-level evaluation
Deep soccer captioning with transformer: dataset, semantics-related losses, and multi-level evaluation
Ahmad Hammoudeh
Bastein Vanderplaetse
Stéphane Dupont
ViT
21
6
0
11 Feb 2022
Human Action Recognition from Various Data Modalities: A Review
Human Action Recognition from Various Data Modalities: A Review
Zehua Sun
Qiuhong Ke
Hossein Rahmani
Mohammed Bennamoun
Gang Wang
Jun Liu
MU
45
504
0
22 Dec 2020
Audiovisual SlowFast Networks for Video Recognition
Audiovisual SlowFast Networks for Video Recognition
Fanyi Xiao
Yong Jae Lee
Kristen Grauman
Jitendra Malik
Christoph Feichtenhofer
197
206
0
23 Jan 2020
1