ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2308.05421
  4. Cited By
Progressive Spatio-temporal Perception for Audio-Visual Question
  Answering

Progressive Spatio-temporal Perception for Audio-Visual Question Answering

10 August 2023
Guangyao Li
Wenxuan Hou
Di Hu
ArXivPDFHTML

Papers citing "Progressive Spatio-temporal Perception for Audio-Visual Question Answering"

17 / 17 papers shown
Title
PAVE: Patching and Adapting Video Large Language Models
PAVE: Patching and Adapting Video Large Language Models
Zhuoming Liu
Yiquan Li
Khoi Duc Nguyen
Yiwu Zhong
Yin Li
KELM
LRM
89
0
0
25 Mar 2025
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
Henghui Du
Guangyao Li
Chang Zhou
Chunjie Zhang
Alan Zhao
D. Hu
59
0
0
17 Mar 2025
Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics
Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics
Chen Liu
Liying Yang
Peike Li
Dadong Wang
Lincheng Li
Xin Yu
VOS
99
0
0
17 Mar 2025
Question-Aware Gaussian Experts for Audio-Visual Question Answering
Hongyeob Kim
Inyoung Jung
Dayoon Suh
Youjia Zhang
Sangmin Lee
Sungeun Hong
61
0
0
06 Mar 2025
Towards Open-Vocabulary Audio-Visual Event Localization
Jinxing Zhou
Dan Guo
Ruohao Guo
Yuxin Mao
Jingjing Hu
Yiran Zhong
Xiaojun Chang
Hao Wu
VLM
58
4
0
18 Nov 2024
Multi-Source Spatial Knowledge Understanding for Immersive Visual
  Text-to-Speech
Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech
Shuwei He
Rui Liu
Hong Li
32
4
0
18 Oct 2024
Boosting Audio Visual Question Answering via Key Semantic-Aware Cues
Boosting Audio Visual Question Answering via Key Semantic-Aware Cues
Guangyao Li
Henghui Du
Di Hu
32
4
0
30 Jul 2024
Learning Trimodal Relation for AVQA with Missing Modality
Learning Trimodal Relation for AVQA with Missing Modality
Kyu Ri Park
Hong Joo Lee
Jung Uk Kim
42
1
0
23 Jul 2024
Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes
Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes
Yaoting Wang
Peiwen Sun
Dongzhan Zhou
Guangyao Li
Honggang Zhang
Di Hu
VOS
49
5
0
15 Jul 2024
SHMamba: Structured Hyperbolic State Space Model for Audio-Visual
  Question Answering
SHMamba: Structured Hyperbolic State Space Model for Audio-Visual Question Answering
Zhe Yang
Wenrui Li
Guanghui Cheng
Mamba
28
0
0
14 Jun 2024
Towards Multilingual Audio-Visual Question Answering
Towards Multilingual Audio-Visual Question Answering
Orchid Chetia Phukan
Priyabrata Mallick
Swarup Ranjan Behera
Aalekhya Satya Narayani
Arun Balaji Buduru
Rajesh Sharma
49
0
0
13 Jun 2024
CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual
  Question Answering
CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering
Yuanyuan Jiang
Jianqin Yin
45
1
0
13 May 2024
Answering Diverse Questions via Text Attached with Key Audio-Visual
  Clues
Answering Diverse Questions via Text Attached with Key Audio-Visual Clues
Qilang Ye
Zitong Yu
Xin Liu
38
1
0
11 Mar 2024
CAT: Enhancing Multimodal Large Language Model to Answer Questions in
  Dynamic Audio-Visual Scenarios
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
Qilang Ye
Zitong Yu
Rui Shao
Xinyu Xie
Philip Torr
Xiaochun Cao
MLLM
56
24
0
07 Mar 2024
Disentangled Counterfactual Learning for Physical Audiovisual
  Commonsense Reasoning
Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning
Changsheng Lv
Shuai Zhang
Yapeng Tian
Mengshi Qi
Huadong Ma
CML
44
16
0
30 Oct 2023
Vision+X: A Survey on Multimodal Learning in the Light of Data
Vision+X: A Survey on Multimodal Learning in the Light of Data
Ye Zhu
Yuehua Wu
N. Sebe
Yan Yan
33
16
0
05 Oct 2022
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
196
199
0
08 Jan 2021
1