ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1803.08842
  4. Cited By
Audio-Visual Event Localization in Unconstrained Videos

Audio-Visual Event Localization in Unconstrained Videos

23 March 2018
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
ArXiv (abs)PDFHTML

Papers citing "Audio-Visual Event Localization in Unconstrained Videos"

50 / 121 papers shown
Title
Action Dubber: Timing Audible Actions via Inflectional Flow
Action Dubber: Timing Audible Actions via Inflectional Flow
Wenlong Wan
Weiying Zheng
Tianyi Xiang
Guiqing Li
Shengfeng He
29
0
0
16 Jun 2025
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs
Lidong Lu
Guo Chen
Z. Li
Yicheng Liu
Tong Lu
VLMLRM
101
0
0
05 Jun 2025
MokA: Multimodal Low-Rank Adaptation for MLLMs
Yake Wei
Yu Miao
Dongzhan Zhou
Di Hu
99
0
0
05 Jun 2025
AVadCLIP: Audio-Visual Collaboration for Robust Video Anomaly Detection
AVadCLIP: Audio-Visual Collaboration for Robust Video Anomaly Detection
Peng Wu
Wanshun Su
Guansong Pang
Yujia Sun
Qingsen Yan
Peng Wang
Yize Zhang
VLM
106
1
0
06 Apr 2025
Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds
Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds
E. Shaar
Ariel Shaulov
Gal Chechik
Lior Wolf
VLM
74
0
0
17 Mar 2025
Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model
Ali Vosoughi
Dimitra Emmanouilidou
H. Gamper
131
1
0
12 Mar 2025
Audio-Language Datasets of Scenes and Events: A Survey
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
187
3
0
10 Jan 2025
SoundLoc3D: Invisible 3D Sound Source Localization and Classification Using a Multimodal RGB-D Acoustic Camera
SoundLoc3D: Invisible 3D Sound Source Localization and Classification Using a Multimodal RGB-D Acoustic Camera
Yuhang He
Sangyun Shin
Anoop Cherian
Niki Trigoni
Andrew Markham
114
0
0
31 Dec 2024
Towards Open-Vocabulary Audio-Visual Event Localization
Jinxing Zhou
Dan Guo
Ruohao Guo
Yuxin Mao
Jingjing Hu
Yiran Zhong
Xiaojun Chang
Ming Wang
VLM
127
5
0
18 Nov 2024
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent
  Approach
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
Rory Young
Nicolas Pugeault
AAML
136
5
0
14 Oct 2024
Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization
Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization
Ling Xing
Hongyu Qu
Rui Yan
Xiangbo Shu
Jinhui Tang
161
2
0
12 Sep 2024
SOAF: Scene Occlusion-aware Neural Acoustic Field
SOAF: Scene Occlusion-aware Neural Acoustic Field
Huiyu Gao
Jiahao Ma
David Ahmedt-Aristizabal
Chuong H. Nguyen
Miaomiao Liu
133
2
0
02 Jul 2024
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
Asmar Nadeem
Faegheh Sardari
R. Dawes
Syed Sameed Husain
Adrian Hilton
Armin Mustafa
111
4
0
10 Jun 2024
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
Rishit Dagli
Shivesh Prakash
Robert Wu
H. Khosravani
141
6
0
06 Jun 2024
Progressive Confident Masking Attention Network for Audio-Visual Segmentation
Progressive Confident Masking Attention Network for Audio-Visual Segmentation
Yuxuan Wang
Feng Dong
Jinchao Zhu
Shuyue Zhu
VOS
161
0
0
04 Jun 2024
CoLeaF: A Contrastive-Collaborative Learning Framework for Weakly
  Supervised Audio-Visual Video Parsing
CoLeaF: A Contrastive-Collaborative Learning Framework for Weakly Supervised Audio-Visual Video Parsing
Faegheh Sardari
A. Mustafa
Philip J. B. Jackson
Adrian Hilton
99
4
0
17 May 2024
Multimodal Action Quality Assessment
Multimodal Action Quality Assessment
Ling-an Zeng
Wei-Shi Zheng
113
16
0
31 Jan 2024
Object-aware Adaptive-Positivity Learning for Audio-Visual Question
  Answering
Object-aware Adaptive-Positivity Learning for Audio-Visual Question Answering
Zhangbin Li
Dan Guo
Jinxing Zhou
Jing Zhang
Meng Wang
106
15
0
20 Dec 2023
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense
  Interactions through Masked Modeling
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling
Shentong Mo
Pedro Morgado
80
14
0
02 Dec 2023
Bridging High-Quality Audio and Video via Language for Sound Effects
  Retrieval from Visual Queries
Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries
J. Wilkins
Justin Salamon
Magdalena Fuentes
J. P. Bello
Oriol Nieto
CLIP
55
5
0
17 Aug 2023
Induction Network: Audio-Visual Modality Gap-Bridging for
  Self-Supervised Sound Source Localization
Induction Network: Audio-Visual Modality Gap-Bridging for Self-Supervised Sound Source Localization
Tianyu Liu
Peng Zhang
Wei Huang
Yufei Zha
Tao You
Yanni Zhang
SSL
67
2
0
09 Aug 2023
MAiVAR-T: Multimodal Audio-image and Video Action Recognizer using
  Transformers
MAiVAR-T: Multimodal Audio-image and Video Action Recognizer using Transformers
Muhammad Bilal Shaikh
Douglas Chai
Syed Mohammed Shamsul Islam
Naveed Akhtar
112
5
0
01 Aug 2023
A Unified Audio-Visual Learning Framework for Localization, Separation,
  and Recognition
A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition
Shentong Mo
Pedro Morgado
74
22
0
30 May 2023
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
Chunhui Zhang
Li Liu
Yawen Cui
Guanjie Huang
Weilin Lin
Yiqian Yang
Yuehong Hu
VLM
102
100
0
14 May 2023
Transavs: End-To-End Audio-Visual Segmentation With Transformer
Transavs: End-To-End Audio-Visual Segmentation With Transformer
Yuhang Ling
Yuxi Li
Zhenye Gan
Jiangning Zhang
M. Chi
Yabiao Wang
VOSViT
72
1
0
12 May 2023
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale
  Benchmark and Baseline
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
Tiantian Geng
Teng Wang
Jinming Duan
Runmin Cong
Feng Zheng
81
35
0
22 Mar 2023
Improving Audio-Visual Video Parsing with Pseudo Visual Labels
Improving Audio-Visual Video Parsing with Pseudo Visual Labels
Jinxing Zhou
Dan Guo
Yiran Zhong
Meng Wang
VLM
89
14
0
04 Mar 2023
Adapter Incremental Continual Learning of Efficient Audio Spectrogram
  Transformers
Adapter Incremental Continual Learning of Efficient Audio Spectrogram Transformers
Nithish Muthuchamy Selvaraj
Xiaobao Guo
A. Kong
Bingquan Shen
Alex C. Kot
CLL
49
8
0
28 Feb 2023
Context Understanding in Computer Vision: A Survey
Context Understanding in Computer Vision: A Survey
Xuan Wang
Zhigang Zhu
98
52
0
10 Feb 2023
Revisiting Pre-training in Audio-Visual Learning
Revisiting Pre-training in Audio-Visual Learning
Ruoxuan Feng
Wenke Xia
Di Hu
61
1
0
07 Feb 2023
Epic-Sounds: A Large-scale Dataset of Actions That Sound
Epic-Sounds: A Large-scale Dataset of Actions That Sound
Jaesung Huh
Jacob Chalk
Evangelos Kazakos
Dima Damen
Andrew Zisserman
EgoV
97
43
0
01 Feb 2023
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Yan-Bo Lin
Yi-Lin Sung
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
116
78
0
15 Dec 2022
Audiovisual Masked Autoencoders
Audiovisual Masked Autoencoders
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
118
45
0
09 Dec 2022
iQuery: Instruments as Queries for Audio-Visual Sound Separation
iQuery: Instruments as Queries for Audio-Visual Sound Separation
Jiaben Chen
Renrui Zhang
Dongze Lian
Jiaqi Yang
Ziyao Zeng
Jianbo Shi
123
29
0
07 Dec 2022
Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight
Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight
Yunhua Zhang
Hazel Doughty
Cees G. M. Snoek
VLM
127
0
0
05 Dec 2022
Mix and Localize: Localizing Sound Sources in Mixtures
Mix and Localize: Localizing Sound Sources in Mixtures
Xixi Hu
Ziyang Chen
Andrew Owens
96
52
0
28 Nov 2022
LISA: Localized Image Stylization with Audio via Implicit Neural
  Representation
LISA: Localized Image Stylization with Audio via Implicit Neural Representation
Seung Hyun Lee
Chanyoung Kim
Wonmin Byeon
Sang Ho Yoon
Jinkyu Kim
Sangpil Kim
56
3
0
21 Nov 2022
The Lean Data Scientist: Recent Advances towards Overcoming the Data
  Bottleneck
The Lean Data Scientist: Recent Advances towards Overcoming the Data Bottleneck
Chen Shani
Jonathan Zarecki
Dafna Shahaf
41
6
0
15 Nov 2022
PMR: Prototypical Modal Rebalance for Multimodal Learning
PMR: Prototypical Modal Rebalance for Multimodal Learning
Yunfeng Fan
Wenchao Xu
Yining Qi
Junxiao Wang
Song Guo
74
72
0
14 Nov 2022
Complete Cross-triplet Loss in Label Space for Audio-visual Cross-modal
  Retrieval
Complete Cross-triplet Loss in Label Space for Audio-visual Cross-modal Retrieval
Donghuo Zeng
Yanan Wang
Jianming Wu
K. Ikeda
123
4
0
07 Nov 2022
Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source
  Separation
Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source Separation
Moitreya Chatterjee
Narendra Ahuja
A. Cherian
85
12
0
29 Oct 2022
AVE-CLIP: AudioCLIP-based Multi-window Temporal Transformer for Audio
  Visual Event Localization
AVE-CLIP: AudioCLIP-based Multi-window Temporal Transformer for Audio Visual Event Localization
Tanvir Mahmud
Diana Marculescu
CLIP
83
34
0
11 Oct 2022
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
Grant Van Horn
Rui Qian
Kimberly Wilber
Hartwig Adam
Oisin Mac Aodha
Serge Belongie
103
10
0
21 Jul 2022
Temporal and cross-modal attention for audio-visual zero-shot learning
Temporal and cross-modal attention for audio-visual zero-shot learning
Otniel-Bogdan Mercea
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
100
27
0
20 Jul 2022
Online Video Instance Segmentation via Robust Context Fusion
Online Video Instance Segmentation via Robust Context Fusion
Xiang Li
Jinglu Wang
Xiaohao Xu
Bhiksha Raj
Yan Lu
70
5
0
12 Jul 2022
Audio-Visual Segmentation
Audio-Visual Segmentation
Jinxing Zhou
Jianyuan Wang
Jing Zhang
Weixuan Sun
Jing Zhang
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
93
116
0
11 Jul 2022
A Comprehensive Survey on Video Saliency Detection with Auditory
  Information: the Audio-visual Consistency Perceptual is the Key!
A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!
Chenglizhao Chen
Mengke Song
Wenfeng Song
Li Guo
Muwei Jian
107
28
0
20 Jun 2022
ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
Ruohan Gao
Zilin Si
Yen-Yu Chang
Samuel Clarke
Jeannette Bohg
Li Fei-Fei
Wenzhen Yuan
Jiajun Wu
82
90
0
05 Apr 2022
Quantized GAN for Complex Music Generation from Dance Videos
Quantized GAN for Complex Music Generation from Dance Videos
Ye Zhu
Kyle Olszewski
Yuehua Wu
Panos Achlioptas
Menglei Chai
Yan Yan
Sergey Tulyakov
MGen
116
46
0
01 Apr 2022
The Sound of Bounding-Boxes
The Sound of Bounding-Boxes
Takashi Oya
Shohei Iwase
Shigeo Morishima
45
2
0
30 Mar 2022
123
Next