arXiv: 1803.08842
Audio-Visual Event Localization in Unconstrained Videos
23 March 2018
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
Papers citing
"Audio-Visual Event Localization in Unconstrained Videos"
50 / 252 papers shown
TIM: A Time Interval Machine for Audio-Visual Action Recognition
Jacob Chalk
Jaesung Huh
Evangelos Kazakos
Andrew Zisserman
Dima Damen
46
9
0
08 Apr 2024
UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization
Tiantian Geng
Teng Wang
Yanfu Zhang
Jinming Duan
Weili Guan
Feng Zheng
36
0
0
04 Apr 2024
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin
Gedas Bertasius
39
5
0
28 Mar 2024
Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge
Dongjin Kim
Sung-Jin Um
Sangmin Lee
Jung Uk Kim
46
4
0
26 Mar 2024
Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition
Shijian Deng
Erin E. Kosloski
Siddhi Patel
Zeke A. Barnett
Yiyang Nan
...
William T. Doan
Matthew Wang
Harsh Singh
P. Rollins
Yapeng Tian
39
4
0
22 Mar 2024
Answering Diverse Questions via Text Attached with Key Audio-Visual Clues
Qilang Ye
Zitong Yu
Xin Liu
41
1
0
11 Mar 2024
Reframe Anything: LLM Agent for Open World Video Reframing
Jiawang Cao
Yongliang Wu
Weiheng Chi
Wenbo Zhu
Ziyue Su
Jay Wu
37
3
0
10 Mar 2024
Text-to-Audio Generation Synchronized with Videos
Shentong Mo
Jing Shi
Yapeng Tian
DiffM
VGen
42
18
0
08 Mar 2024
Enhancing Multimodal Unified Representations for Cross Modal Generalization
Hai Huang
Yan Xia
Shengpeng Ji
Shulei Wang
Hanting Wang
Minghui Fang
Jieming Zhu
Zhenhua Dong
Sashuai Zhou
Zhou Zhao
37
6
0
08 Mar 2024
SPICA: Interactive Video Content Exploration through Augmented Audio Descriptions for Blind or Low-Vision Viewers
Zheng Ning
Brianna L Wimer
Kaiwen Jiang
Keyi Chen
Jerrick Ban
Yapeng Tian
Yuhang Zhao
T. Li
47
15
0
11 Feb 2024
Multimodal Action Quality Assessment
Ling-an Zeng
Wei-Shi Zheng
43
13
0
31 Jan 2024
Audio-Infused Automatic Image Colorization by Exploiting Audio Scene Semantics
Pengcheng Zhao
Yanxiang Chen
Yang Zhao
Wei Jia
Zhao Zhang
Ronggang Wang
Richang Hong
DiffM
22
1
0
24 Jan 2024
On the Audio Hallucinations in Large Audio-Video Language Models
Taichi Nishimura
Shota Nakada
Masayoshi Kondo
VLM
30
5
0
18 Jan 2024
Hierarchical Augmentation and Distillation for Class Incremental Audio-Visual Video Recognition
Yukun Zuo
Hantao Yao
Liansheng Zhuang
Changsheng Xu
15
2
0
11 Jan 2024
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild
Zhi-Song Liu
Robin Courant
Vicky Kalogeiton
42
6
0
08 Jan 2024
Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video Classification
Wentao Zhu
41
5
0
08 Jan 2024
Leveraging Visual Supervision for Array-based Active Speaker Detection and Localization
Davide Berghi
Philip J. B. Jackson
48
5
0
21 Dec 2023
Object-aware Adaptive-Positivity Learning for Audio-Visual Question Answering
Zhangbin Li
Dan Guo
Jinxing Zhou
Jing Zhang
Meng Wang
32
11
0
20 Dec 2023
Segment Beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation
Renjie Wu
Hu Wang
Feras Dayoub
Hsiang-Ting Chen
30
5
0
14 Dec 2023
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling
Shentong Mo
Pedro Morgado
29
13
0
02 Dec 2023
Centre Stage: Centricity-based Audio-Visual Temporal Action Detection
Hanyuan Wang
Majid Mirmehdi
Dima Damen
Toby Perrett
57
2
0
28 Nov 2023
Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing
Yating Xu
Conghui Hu
Gim Hee Lee
22
2
0
14 Nov 2023
Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks
Haoyi Duan
Yan Xia
Mingze Zhou
Li Tang
Jieming Zhu
Zhou Zhao
VLM
27
17
0
09 Nov 2023
Can CLIP Help Sound Source Localization?
Sooyoung Park
Arda Senocak
Joon Son Chung
35
7
0
07 Nov 2023
Magmaw: Modality-Agnostic Adversarial Attacks on Machine Learning-Based Wireless Communication Systems
Jung-Woo Chang
Ke Sun
Nasimeh Heydaribeni
Seira Hidano
Xinyu Zhang
F. Koushanfar
AAML
17
1
0
01 Nov 2023
LAVSS: Location-Guided Audio-Visual Spatial Audio Separation
Yuxin Ye
Wenming Yang
Yapeng Tian
34
10
0
31 Oct 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
33
9
0
25 Oct 2023
Extending Multi-modal Contrastive Representations
Zehan Wang
Ziang Zhang
Luping Liu
Yang Zhao
Haifeng Huang
Tao Jin
Zhou Zhao
31
5
0
13 Oct 2023
Multimodal Variational Auto-encoder based Audio-Visual Segmentation
Yuxin Mao
Jing Zhang
Mochu Xiang
Yiran Zhong
Yuchao Dai
40
34
0
12 Oct 2023
STELLA: Continual Audio-Video Pre-training with Spatio-Temporal Localized Alignment
Jaewoo Lee
Jaehong Yoon
Wonjae Kim
Yunji Kim
Sung Ju Hwang
CLL
19
1
0
12 Oct 2023
Deep Video Inpainting Guided by Audio-Visual Self-Supervision
Kyuyeon Kim
Junsik Jung
Woo Jae Kim
Sung-eui Yoon
SSL
31
1
0
11 Oct 2023
CM-PIE: Cross-modal perception for interactive-enhanced audio-visual video parsing
Yaru Chen
Ruohao Guo
Xubo Liu
Peipei Wu
Guangyao Li
Zhenbo Li
Wenwu Wang
34
7
0
11 Oct 2023
What Makes for Robust Multi-Modal Models in the Face of Missing Modalities?
Siting Li
Chenzhuang Du
Yue Zhao
Yu Huang
Hang Zhao
24
4
0
10 Oct 2023
Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for Unbiased Question-Answering
Xiulong Liu
Zhikang Dong
Peng Zhang
27
21
0
10 Oct 2023
Multi-Resolution Audio-Visual Feature Fusion for Temporal Action Localization
Edward Fish
Jon Weinbren
Andrew Gilbert
36
0
0
05 Oct 2023
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
Bin Zhu
Bin Lin
Munan Ning
Yang Yan
Jiaxi Cui
...
Zongwei Li
Wancai Zhang
Zhifeng Li
Wei Liu
Liejie Yuan
VLM
MLLM
32
207
0
03 Oct 2023
Enabling Resource-efficient AIoT System with Cross-level Optimization: A survey
Sicong Liu
Bin Guo
Cheng Fang
Ziqi Wang
Shiyan Luo
Zimu Zhou
Zhiwen Yu
AI4CE
37
22
0
27 Sep 2023
Sound Source Localization is All about Cross-Modal Alignment
Arda Senocak
H. Ryu
Junsik Kim
Tae-Hyun Oh
Hanspeter Pfister
Joon Son Chung
36
18
0
19 Sep 2023
CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation
Kexin Li
Zongxin Yang
Lei Chen
Yezhou Yang
Jun Xiao
VOS
41
51
0
18 Sep 2023
Class-Incremental Grouping Network for Continual Audio-Visual Learning
Shentong Mo
Weiguo Pian
Yapeng Tian
CLL
VLM
43
23
0
11 Sep 2023
Text-to-feature diffusion for audio-visual few-shot learning
Otniel-Bogdan Mercea
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
VLM
27
2
0
07 Sep 2023
Audio-Visual Class-Incremental Learning
Weiguo Pian
Shentong Mo
Yunhui Guo
Yapeng Tian
CLL
VLM
33
28
0
21 Aug 2023
Audiovisual Moments in Time: A Large-Scale Annotated Dataset of Audiovisual Actions
Michael Joannou
P. Rotshtein
U. Noppeney
21
0
0
18 Aug 2023
Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries
J. Wilkins
Justin Salamon
Magdalena Fuentes
J. P. Bello
Oriol Nieto
CLIP
14
5
0
17 Aug 2023
Boosting Multi-modal Model Performance with Adaptive Gradient Modulation
Hong Li
Xingyu Li
Pengbo Hu
Yinuo Lei
Chunxiao Li
Yi Zhou
49
22
0
15 Aug 2023
Progressive Spatio-temporal Perception for Audio-Visual Question Answering
Guangyao Li
Wenxuan Hou
Di Hu
37
26
0
10 Aug 2023
Induction Network: Audio-Visual Modality Gap-Bridging for Self-Supervised Sound Source Localization
Tianyu Liu
Peng Zhang
Wei Huang
Yufei Zha
Tao You
Yanni Zhang
SSL
25
2
0
09 Aug 2023
MAiVAR-T: Multimodal Audio-image and Video Action Recognizer using Transformers
Muhammad Bilal Shaikh
Douglas Chai
Syed Mohammed Shamsul Islam
Naveed Akhtar
30
5
0
01 Aug 2023
PEANUT: A Human-AI Collaborative Tool for Annotating Audio-Visual Data
Zheng Zhang
Zheng Ning
Chenliang Xu
Yapeng Tian
Toby Jia-Jun Li
64
6
0
27 Jul 2023
Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model
Peng Wu
Jing Liu
Xiangteng He
Yuxin Peng
Peng Wang
Yanning Zhang
48
30
0
24 Jul 2023