Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1803.08842
Cited By
Audio-Visual Event Localization in Unconstrained Videos
23 March 2018
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Audio-Visual Event Localization in Unconstrained Videos"
50 / 121 papers shown
Title
Audio-Adaptive Activity Recognition Across Video Domains
Yun C. Zhang
Hazel Doughty
Ling Shao
Cees G. M. Snoek
71
42
0
27 Mar 2022
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Guangyao Li
Yake Wei
Yapeng Tian
Chenliang Xu
Ji-Rong Wen
Di Hu
131
153
0
26 Mar 2022
Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation
Xian Liu
Qianyi Wu
Hang Zhou
Yinghao Xu
Rui Qian
Xinyi Lin
Xiaowei Zhou
Wayne Wu
Bo Dai
Bolei Zhou
SLR
108
105
0
24 Mar 2022
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing
Xian Liu
Rui Qian
Hang Zhou
Di Hu
Weiyao Lin
Ziwei Liu
Bolei Zhou
Xiaowei Zhou
62
26
0
13 Feb 2022
Multimodal data matters: language model pre-training over structured and unstructured electronic health records
Sicen Liu
Xiaolong Wang
Yongshuai Hou
Ge Li
Hui Wang
Huiqin Xu
Yang Xiang
Buzhou Tang
115
34
0
25 Jan 2022
Class-aware Sounding Objects Localization via Audiovisual Correspondence
Di Hu
Yake Wei
Rui Qian
Weiyao Lin
Ruihua Song
Ji-Rong Wen
72
42
0
22 Dec 2021
Soundify: Matching Sound Effects to Video
David Chuan-En Lin
Anastasis Germanidis
Cristobal Valenzuela
Yining Shi
Nikolas Martelaro
77
16
0
17 Dec 2021
Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection
Jiaqi Tang
Zhaoyang Liu
Chao Qian
Wayne Wu
Limin Wang
96
18
0
09 Dec 2021
Audio-Visual Synchronisation in the wild
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
119
40
0
08 Dec 2021
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing
Jiashuo Yu
Ying Cheng
Ruiwei Zhao
Rui Feng
Yuejie Zhang
99
61
0
24 Nov 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
Rishabh Garg
Ruohan Gao
Kristen Grauman
84
27
0
21 Nov 2021
Structure from Silence: Learning Scene Structure from Ambient Sound
Ziyang Chen
Xixi Hu
Andrew Owens
119
26
0
10 Nov 2021
Domain Generalization through Audio-Visual Relative Norm Alignment in First Person Action Recognition
M. Planamente
Chiara Plizzari
Emanuele Alberti
Barbara Caputo
EgoV
116
35
0
19 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
422
1,115
0
13 Oct 2021
V-SlowFast Network for Efficient Visual Sound Separation
Lingyu Zhu
Esa Rahtu
116
10
0
18 Sep 2021
Audio-Visual Transformer Based Crowd Counting
Usman Sajid
Xiangyu Chen
Hasan Sajid
Taejoon Kim
Guanghui Wang
ViT
98
22
0
04 Sep 2021
Binaural Audio Generation via Multi-task Learning
Sijia Li
Shiguang Liu
Tianyi Zhou
51
12
0
02 Sep 2021
The Right to Talk: An Audio-Visual Transformer Approach
Thanh-Dat Truong
C. Duong
T. D. Vu
H. Pham
Bhiksha Raj
Ngan Le
Khoa Luu
120
36
0
06 Aug 2021
Multi-level Attention Fusion Network for Audio-visual Event Recognition
Mathilde Brousmiche
Jean Rouat
Stéphane Dupont
149
11
0
12 Jun 2021
Rethinking the constraints of multimodal fusion: case study in Weakly-Supervised Audio-Visual Video Parsing
Jianning Wu
Zhuqing Jiang
S. Wen
Aidong Men
Haiying Wang
84
1
0
30 May 2021
Multi-target DoA Estimation with an Audio-visual Fusion Mechanism
Xinyuan Qian
Maulik C. Madhavi
Zexu Pan
Jiadong Wang
Haizhou Li
79
44
0
13 May 2021
Where and When: Space-Time Attention for Audio-Visual Explanations
Yanbei Chen
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
50
3
0
04 May 2021
Distilling Audio-Visual Knowledge by Compositional Contrastive Learning
Yanbei Chen
Yongqin Xian
A. Sophia Koepke
Ying Shan
Zeynep Akata
147
83
0
22 Apr 2021
HCMS: Hierarchical and Conditional Modality Selection for Efficient Video Recognition
Zejia Weng
Zuxuan Wu
Hengduo Li
Jingjing Chen
Yu-Gang Jiang
80
4
0
20 Apr 2021
Visually Guided Sound Source Separation and Localization using Self-Supervised Motion Representations
Lingyu Zhu
Esa Rahtu
81
27
0
17 Apr 2021
Self-supervised object detection from audio-visual correspondence
Triantafyllos Afouras
Yuki M. Asano
Francois Fagan
Andrea Vedaldi
Florian Metze
SSL
110
47
0
13 Apr 2021
Visually Informed Binaural Audio Generation without Binaural Audios
Xudong Xu
Hang Zhou
Ziwei Liu
Bo Dai
Xiaogang Wang
Dahua Lin
DiffM
49
59
0
13 Apr 2021
Localizing Visual Sounds the Hard Way
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
ObjD
90
191
0
06 Apr 2021
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation
Yapeng Tian
Di Hu
Chenliang Xu
ObjD
90
88
0
05 Apr 2021
Unsupervised Sound Localization via Iterative Contrastive Learning
Yan-Bo Lin
Hung-Yu Tseng
Hsin-Ying Lee
Yen-Yu Lin
Ming-Hsuan Yang
SSL
90
36
0
01 Apr 2021
Positive Sample Propagation along the Audio-Visual Event Line
Jinxing Zhou
Liang Zheng
Yiran Zhong
Shijie Hao
Meng Wang
120
104
0
01 Apr 2021
Learning Audio-Visual Correlations from Variational Cross-Modal Generation
Ye Zhu
Yu Wu
Hugo Latapie
Yi Yang
Yan Yan
SSL
115
21
0
05 Feb 2021
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning
Sangho Lee
Jiwan Chung
Youngjae Yu
Gunhee Kim
Thomas Breuel
Gal Chechik
Yale Song
167
47
0
26 Jan 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
247
202
0
08 Jan 2021
Multi-Instrumentalist Net: Unsupervised Generation of Music from Body Movements
Kun Su
Xiulong Liu
Eli Shlizerman
91
29
0
07 Dec 2020
iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering
Aman Chadha
Gurneet Arora
Navpreet Kaloty
66
37
0
16 Nov 2020
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
Efthymios Tzinis
Scott Wisdom
A. Jansen
Shawn Hershey
Tal Remez
D. Ellis
J. Hershey
83
71
0
02 Nov 2020
Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching
Di Hu
Rui Qian
Minyue Jiang
Xiao Tan
Shilei Wen
Errui Ding
Weiyao Lin
Dejing Dou
80
137
0
12 Oct 2020
AVECL-UMONS database for audio-visual event classification and localization
Mathilde Brousmiche
Stéphane Dupont
Jean Rouat
37
2
0
02 Oct 2020
Learning to Set Waypoints for Audio-Visual Navigation
Changan Chen
Sagnik Majumder
Ziad Al-Halah
Ruohan Gao
Santhosh Kumar Ramakrishnan
Kristen Grauman
SSL
94
5
0
21 Aug 2020
Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents
Ye Zhu
Yu Wu
Yi Yang
Yan Yan
82
13
0
18 Aug 2020
Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention
Bin Duan
Hao Tang
Wei Wang
Ziliang Zong
Guowei Yang
Yan Yan
96
61
0
14 Aug 2020
Foley Music: Learning to Generate Music from Videos
Chuang Gan
Deng Huang
Peihao Chen
J. Tenenbaum
Antonio Torralba
VGen
75
139
0
21 Jul 2020
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing
Yapeng Tian
Dingzeyu Li
Chenliang Xu
127
185
0
21 Jul 2020
Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation
Hang Zhou
Xudong Xu
Dahua Lin
Xiaogang Wang
Ziwei Liu
DiffM
80
84
0
20 Jul 2020
Talking-head Generation with Rhythmic Head Motion
Lele Chen
Guofeng Cui
Celong Liu
Zhong Li
Ziyi Kou
Yi Tian Xu
Chenliang Xu
77
183
0
16 Jul 2020
Leveraging Category Information for Single-Frame Visual Sound Source Separation
Lingyu Zhu
Esa Rahtu
74
9
0
15 Jul 2020
Multiple Sound Sources Localization from Coarse to Fine
Rui Qian
Di Hu
Heinrich Dinkel
Mengyue Wu
N. Xu
Weiyao Lin
69
157
0
13 Jul 2020
Do We Need Sound for Sound Source Localization?
Takashi Oya
Shohei Iwase
Ryota Natsume
Takahiro Itazuri
Shugo Yamaguchi
Shigeo Morishima
41
22
0
11 Jul 2020
Labelling unlabelled videos from scratch with multi-modal self-supervision
Yuki M. Asano
Mandela Patrick
Christian Rupprecht
Andrea Vedaldi
SSL
124
152
0
24 Jun 2020
Previous
1
2
3
Next