Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2308.05421
Cited By
Progressive Spatio-temporal Perception for Audio-Visual Question Answering
10 August 2023
Guangyao Li
Wenxuan Hou
Di Hu
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Progressive Spatio-temporal Perception for Audio-Visual Question Answering"
17 / 17 papers shown
Title
PAVE: Patching and Adapting Video Large Language Models
Zhuoming Liu
Yiquan Li
Khoi Duc Nguyen
Yiwu Zhong
Yin Li
KELM
LRM
89
0
0
25 Mar 2025
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
Henghui Du
Guangyao Li
Chang Zhou
Chunjie Zhang
Alan Zhao
D. Hu
59
0
0
17 Mar 2025
Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics
Chen Liu
Liying Yang
Peike Li
Dadong Wang
Lincheng Li
Xin Yu
VOS
99
0
0
17 Mar 2025
Question-Aware Gaussian Experts for Audio-Visual Question Answering
Hongyeob Kim
Inyoung Jung
Dayoon Suh
Youjia Zhang
Sangmin Lee
Sungeun Hong
61
0
0
06 Mar 2025
Towards Open-Vocabulary Audio-Visual Event Localization
Jinxing Zhou
Dan Guo
Ruohao Guo
Yuxin Mao
Jingjing Hu
Yiran Zhong
Xiaojun Chang
Hao Wu
VLM
58
4
0
18 Nov 2024
Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech
Shuwei He
Rui Liu
Hong Li
32
4
0
18 Oct 2024
Boosting Audio Visual Question Answering via Key Semantic-Aware Cues
Guangyao Li
Henghui Du
Di Hu
32
4
0
30 Jul 2024
Learning Trimodal Relation for AVQA with Missing Modality
Kyu Ri Park
Hong Joo Lee
Jung Uk Kim
42
1
0
23 Jul 2024
Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes
Yaoting Wang
Peiwen Sun
Dongzhan Zhou
Guangyao Li
Honggang Zhang
Di Hu
VOS
49
5
0
15 Jul 2024
SHMamba: Structured Hyperbolic State Space Model for Audio-Visual Question Answering
Zhe Yang
Wenrui Li
Guanghui Cheng
Mamba
28
0
0
14 Jun 2024
Towards Multilingual Audio-Visual Question Answering
Orchid Chetia Phukan
Priyabrata Mallick
Swarup Ranjan Behera
Aalekhya Satya Narayani
Arun Balaji Buduru
Rajesh Sharma
49
0
0
13 Jun 2024
CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering
Yuanyuan Jiang
Jianqin Yin
45
1
0
13 May 2024
Answering Diverse Questions via Text Attached with Key Audio-Visual Clues
Qilang Ye
Zitong Yu
Xin Liu
38
1
0
11 Mar 2024
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
Qilang Ye
Zitong Yu
Rui Shao
Xinyu Xie
Philip Torr
Xiaochun Cao
MLLM
56
24
0
07 Mar 2024
Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning
Changsheng Lv
Shuai Zhang
Yapeng Tian
Mengshi Qi
Huadong Ma
CML
44
16
0
30 Oct 2023
Vision+X: A Survey on Multimodal Learning in the Light of Data
Ye Zhu
Yuehua Wu
N. Sebe
Yan Yan
33
16
0
05 Oct 2022
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
196
199
0
08 Jan 2021
1