ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1803.08842
  4. Cited By
Audio-Visual Event Localization in Unconstrained Videos

Audio-Visual Event Localization in Unconstrained Videos

23 March 2018
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
ArXivPDFHTML

Papers citing "Audio-Visual Event Localization in Unconstrained Videos"

50 / 252 papers shown
Title
Temporal Label-Refinement for Weakly-Supervised Audio-Visual Event
  Localization
Temporal Label-Refinement for Weakly-Supervised Audio-Visual Event Localization
K. Ramakrishnan
15
0
0
12 Jul 2023
FTFDNet: Learning to Detect Talking Face Video Manipulation with
  Tri-Modality Interaction
FTFDNet: Learning to Detect Talking Face Video Manipulation with Tri-Modality Interaction
Gang Wang
Peng Zhang
Jun Xiong
Fei Yang
Wei Huang
Yufei Zha
CVBM
25
1
0
08 Jul 2023
Multimodal Imbalance-Aware Gradient Modulation for Weakly-supervised
  Audio-Visual Video Parsing
Multimodal Imbalance-Aware Gradient Modulation for Weakly-supervised Audio-Visual Video Parsing
Jie Fu
Junyu Gao
Changsheng Xu
39
6
0
05 Jul 2023
AVSegFormer: Audio-Visual Segmentation with Transformer
AVSegFormer: Audio-Visual Segmentation with Transformer
Sheng Gao
Zhe Chen
Guo Chen
Wenhai Wang
Tong Lu
VOS
37
46
0
03 Jul 2023
Sonicverse: A Multisensory Simulation Platform for Embodied Household
  Agents that See and Hear
Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear
Ruohan Gao
Hao Li
Gokul Dharan
Zhuzhu Wang
Chengshu Li
Fei Xia
Silvio Savarese
Li Fei-Fei
Jiajun Wu
39
11
0
01 Jun 2023
Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language
  Perspective
Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective
Yingying Fan
Yu Wu
Bo Du
Yutian Lin
34
8
0
01 Jun 2023
A Unified Audio-Visual Learning Framework for Localization, Separation,
  and Recognition
A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition
Shentong Mo
Pedro Morgado
38
21
0
30 May 2023
Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event
  Parser
Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser
Yun-hsuan Lai
Yen-Chun Chen
Y. Wang
26
10
0
27 May 2023
DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment
DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment
Shentong Mo
Jing Shi
Yapeng Tian
20
17
0
22 May 2023
Connecting Multi-modal Contrastive Representations
Connecting Multi-modal Contrastive Representations
Zehan Wang
Yang Zhao
Xize Cheng
Haifeng Huang
Jiageng Liu
...
Lin Li
Yongqiang Wang
Aoxiong Yin
Ziang Zhang
Zhou Zhao
30
22
0
22 May 2023
Target-Aware Spatio-Temporal Reasoning via Answering Questions in
  Dynamics Audio-Visual Scenarios
Target-Aware Spatio-Temporal Reasoning via Answering Questions in Dynamics Audio-Visual Scenarios
Yuanyuan Jiang
Jianqin Yin
24
7
0
21 May 2023
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
Chunhui Zhang
Li Liu
Yawen Cui
Guanjie Huang
Weilin Lin
Yiqian Yang
Yuehong Hu
VLM
45
90
0
14 May 2023
Transavs: End-To-End Audio-Visual Segmentation With Transformer
Transavs: End-To-End Audio-Visual Segmentation With Transformer
Yuhang Ling
Yuxi Li
Zhenye Gan
Jiangning Zhang
M. Chi
Yabiao Wang
VOS
ViT
37
1
0
12 May 2023
Listen to Look into the Future: Audio-Visual Egocentric Gaze
  Anticipation
Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
Bolin Lai
Fiona Ryan
Wenqi Jia
Miao Liu
James M. Rehg
EgoV
32
8
0
06 May 2023
AV-SAM: Segment Anything Model Meets Audio-Visual Localization and
  Segmentation
AV-SAM: Segment Anything Model Meets Audio-Visual Localization and Segmentation
Shentong Mo
Yapeng Tian
VLM
90
49
0
03 May 2023
Audio-Visual Grouping Network for Sound Localization from Mixtures
Audio-Visual Grouping Network for Sound Localization from Mixtures
Shentong Mo
Yapeng Tian
45
42
0
29 Mar 2023
Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
Kun Su
Kaizhi Qian
Eli Shlizerman
Antonio Torralba
Chuang Gan
VGen
AI4CE
35
20
0
29 Mar 2023
Egocentric Audio-Visual Object Localization
Egocentric Audio-Visual Object Localization
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
EgoV
29
30
0
23 Mar 2023
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale
  Benchmark and Baseline
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
Tiantian Geng
Teng Wang
Jinming Duan
Runmin Cong
Feng Zheng
30
28
0
22 Mar 2023
Learning Audio-Visual Source Localization via False Negative Aware
  Contrastive Learning
Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning
Weixuan Sun
Jiayi Zhang
Jianyuan Wang
Zheyuan Liu
Yiran Zhong
Tianpeng Feng
Yandong Guo
Yanhao Zhang
Nick Barnes
SSL
27
44
0
20 Mar 2023
Improving Audio-Visual Video Parsing with Pseudo Visual Labels
Improving Audio-Visual Video Parsing with Pseudo Visual Labels
Jinxing Zhou
Dan Guo
Yiran Zhong
Meng Wang
VLM
44
13
0
04 Mar 2023
Adapter Incremental Continual Learning of Efficient Audio Spectrogram
  Transformers
Adapter Incremental Continual Learning of Efficient Audio Spectrogram Transformers
Nithish Muthuchamy Selvaraj
Xiaobao Guo
A. Kong
Bingquan Shen
Alex C. Kot
CLL
25
8
0
28 Feb 2023
Context Understanding in Computer Vision: A Survey
Context Understanding in Computer Vision: A Survey
Xuan Wang
Zhigang Zhu
27
46
0
10 Feb 2023
Revisiting Pre-training in Audio-Visual Learning
Revisiting Pre-training in Audio-Visual Learning
Ruoxuan Feng
Wenke Xia
Di Hu
39
1
0
07 Feb 2023
AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene
  Synthesis
AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis
Susan Liang
Chao Huang
Yapeng Tian
Anurag Kumar
Chenliang Xu
VGen
40
30
0
04 Feb 2023
Epic-Sounds: A Large-scale Dataset of Actions That Sound
Epic-Sounds: A Large-scale Dataset of Actions That Sound
Jaesung Huh
Jacob Chalk
Evangelos Kazakos
Dima Damen
Andrew Zisserman
EgoV
24
41
0
01 Feb 2023
Audio-Visual Segmentation with Semantics
Audio-Visual Segmentation with Semantics
Jinxing Zhou
Xuyang Shen
Jianyuan Wang
Jiayi Zhang
Weixuan Sun
...
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
61
37
0
30 Jan 2023
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Yan-Bo Lin
Yi-Lin Sung
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
34
73
0
15 Dec 2022
Audiovisual Masked Autoencoders
Audiovisual Masked Autoencoders
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
39
43
0
09 Dec 2022
iQuery: Instruments as Queries for Audio-Visual Sound Separation
iQuery: Instruments as Queries for Audio-Visual Sound Separation
Jiaben Chen
Renrui Zhang
Dongze Lian
Jiaqi Yang
Ziyao Zeng
Jianbo Shi
34
27
0
07 Dec 2022
Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight
Day2Dark: Pseudo-Supervised Activity Recognition beyond Silent Daylight
Yunhua Zhang
Hazel Doughty
Cees G. M. Snoek
VLM
50
0
0
05 Dec 2022
Mix and Localize: Localizing Sound Sources in Mixtures
Mix and Localize: Localizing Sound Sources in Mixtures
Xixi Hu
Ziyang Chen
Andrew Owens
33
51
0
28 Nov 2022
LISA: Localized Image Stylization with Audio via Implicit Neural
  Representation
LISA: Localized Image Stylization with Audio via Implicit Neural Representation
Seung Hyun Lee
Chanyoung Kim
Wonmin Byeon
Sang Ho Yoon
Jinkyu Kim
Sangpil Kim
35
3
0
21 Nov 2022
Contrastive Positive Sample Propagation along the Audio-Visual Event
  Line
Contrastive Positive Sample Propagation along the Audio-Visual Event Line
Jinxing Zhou
Dan Guo
Meng Wang
32
53
0
18 Nov 2022
The Lean Data Scientist: Recent Advances towards Overcoming the Data
  Bottleneck
The Lean Data Scientist: Recent Advances towards Overcoming the Data Bottleneck
Chen Shani
Jonathan Zarecki
Dafna Shahaf
29
6
0
15 Nov 2022
PMR: Prototypical Modal Rebalance for Multimodal Learning
PMR: Prototypical Modal Rebalance for Multimodal Learning
Yunfeng Fan
Wenchao Xu
Yining Qi
Junxiao Wang
Song Guo
32
62
0
14 Nov 2022
Complete Cross-triplet Loss in Label Space for Audio-visual Cross-modal
  Retrieval
Complete Cross-triplet Loss in Label Space for Audio-visual Cross-modal Retrieval
Donghuo Zeng
Yanan Wang
Jianming Wu
K. Ikeda
27
4
0
07 Nov 2022
Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source
  Separation
Learning Audio-Visual Dynamics Using Scene Graphs for Audio Source Separation
Moitreya Chatterjee
Narendra Ahuja
A. Cherian
38
12
0
29 Oct 2022
Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using
  Permutation-Free Loss Function
Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using Permutation-Free Loss Function
Qing Wang
Hang Chen
Yannan Jiang
Zhe Wang
Yuyang Wang
Jun Du
Chin-Hui Lee
19
4
0
26 Oct 2022
AVE-CLIP: AudioCLIP-based Multi-window Temporal Transformer for Audio
  Visual Event Localization
AVE-CLIP: AudioCLIP-based Multi-window Temporal Transformer for Audio Visual Event Localization
Tanvir Mahmud
Diana Marculescu
CLIP
19
31
0
11 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
Vision+X: A Survey on Multimodal Learning in the Light of Data
Ye Zhu
Yuehua Wu
N. Sebe
Yan Yan
35
16
0
05 Oct 2022
Foundations and Trends in Multimodal Machine Learning: Principles,
  Challenges, and Open Questions
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
18
62
0
07 Sep 2022
Learning in Audio-visual Context: A Review, Analysis, and New
  Perspective
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
46
55
0
20 Aug 2022
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
Grant Van Horn
Rui Qian
Kimberly Wilber
Hartwig Adam
Oisin Mac Aodha
Serge Belongie
31
10
0
21 Jul 2022
Temporal and cross-modal attention for audio-visual zero-shot learning
Temporal and cross-modal attention for audio-visual zero-shot learning
Otniel-Bogdan Mercea
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
38
25
0
20 Jul 2022
Online Video Instance Segmentation via Robust Context Fusion
Online Video Instance Segmentation via Robust Context Fusion
Xiang Li
Jinglu Wang
Xiaohao Xu
Bhiksha Raj
Yan Lu
43
5
0
12 Jul 2022
Audio-Visual Segmentation
Audio-Visual Segmentation
Jinxing Zhou
Jianyuan Wang
Jingyang Zhang
Weixuan Sun
Jing Zhang
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
33
109
0
11 Jul 2022
A Comprehensive Survey on Video Saliency Detection with Auditory
  Information: the Audio-visual Consistency Perceptual is the Key!
A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!
Chenglizhao Chen
Mengke Song
Wenfeng Song
Li Guo
Muwei Jian
40
26
0
20 Jun 2022
Discrete Contrastive Diffusion for Cross-Modal Music and Image
  Generation
Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation
Ye Zhu
Yuehua Wu
Kyle Olszewski
Jian Ren
Sergey Tulyakov
Yan Yan
DiffM
30
47
0
15 Jun 2022
Past and Future Motion Guided Network for Audio Visual Event
  Localization
Past and Future Motion Guided Network for Audio Visual Event Localization
Ting-Yen Chen
Jianqin Yin
Jin Tang
26
2
0
08 May 2022
Previous
123456
Next