1803.08842
Cited By
Audio-Visual Event Localization in Unconstrained Videos
23 March 2018
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
Papers citing
"Audio-Visual Event Localization in Unconstrained Videos"
50 / 252 papers shown
How to Listen? Rethinking Visual Sound Localization
Ho-Hsiang Wu
Magdalena Fuentes
Prem Seetharaman
J. P. Bello
ObjD
30
4
0
11 Apr 2022
ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
Ruohan Gao
Zilin Si
Yen-Yu Chang
Samuel Clarke
Jeannette Bohg
Li Fei-Fei
Wenzhen Yuan
Jiajun Wu
32
83
0
05 Apr 2022
Quantized GAN for Complex Music Generation from Dance Videos
Ye Zhu
Kyle Olszewski
Yuehua Wu
Panos Achlioptas
Menglei Chai
Yan Yan
Sergey Tulyakov
MGen
33
44
0
01 Apr 2022
Investigating Modality Bias in Audio Visual Video Parsing
Piyush Singh Pasi
Shubham Nemani
Preethi Jyothi
Ganesh Ramakrishnan
13
4
0
31 Mar 2022
The Sound of Bounding-Boxes
Takashi Oya
Shohei Iwase
Shigeo Morishima
19
2
0
30 Mar 2022
Audio-Adaptive Activity Recognition Across Video Domains
Yun C. Zhang
Hazel Doughty
Ling Shao
Cees G. M. Snoek
17
38
0
27 Mar 2022
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Guangyao Li
Yake Wei
Yapeng Tian
Chenliang Xu
Ji-Rong Wen
Di Hu
39
136
0
26 Mar 2022
Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation
Xian Liu
Qianyi Wu
Hang Zhou
Yinghao Xu
Rui Qian
Xinyi Lin
Xiaowei Zhou
Wayne Wu
Bo Dai
Bolei Zhou
SLR
37
99
0
24 Mar 2022
Towards Inadequately Pre-trained Models in Transfer Learning
Andong Deng
Xingjian Li
Di Hu
Tianyang Wang
Haoyi Xiong
Chengzhong Xu
19
6
0
09 Mar 2022
Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language
Otniel-Bogdan Mercea
Lukas Riesch
A. Sophia Koepke
Zeynep Akata
33
48
0
07 Mar 2022
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing
Xian Liu
Rui Qian
Hang Zhou
Di Hu
Weiyao Lin
Ziwei Liu
Bolei Zhou
Xiaowei Zhou
18
25
0
13 Feb 2022
Audio-Visual Fusion Layers for Event Type Aware Video Recognition
Arda Senocak
Junsik Kim
Tae-Hyun Oh
H. Ryu
Dingzeyu Li
In So Kweon
27
1
0
12 Feb 2022
OWL (Observe, Watch, Listen): Audiovisual Temporal Context for Localizing Actions in Egocentric Videos
Merey Ramazanova
Victor Escorcia
Fabian Caba Heilbron
Chen Zhao
Guohao Li
28
3
0
10 Feb 2022
Learning Sound Localization Better From Semantically Similar Samples
Arda Senocak
H. Ryu
Junsik Kim
In So Kweon
SSL
6
33
0
07 Feb 2022
Multimodal data matters: language model pre-training over structured and unstructured electronic health records
Sicen Liu
Xiaolong Wang
Yongshuai Hou
Ge Li
Hui Wang
Huiqin Xu
Yang Xiang
Buzhou Tang
52
30
0
25 Jan 2022
Temporal Sentence Grounding in Videos: A Survey and Future Directions
Hao Zhang
Aixin Sun
Wei Jing
Qiufeng Wang
3DGS
38
38
0
20 Jan 2022
Weakly Supervised Visual-Auditory Fixation Prediction with Multigranularity Perception
Guotao Wang
Chenglizhao Chen
Deng-Ping Fan
Aimin Hao
Hong Qin
28
2
0
27 Dec 2021
Class-aware Sounding Objects Localization via Audiovisual Correspondence
Di Hu
Yake Wei
Rui Qian
Weiyao Lin
Ruihua Song
Ji-Rong Wen
24
41
0
22 Dec 2021
Decompose the Sounds and Pixels, Recompose the Events
Varshanth R. Rao
Md Ibrahim Khalil
Haoda Li
Peng Dai
Juwei Lu
27
5
0
21 Dec 2021
Soundify: Matching Sound Effects to Video
David Chuan-En Lin
Anastasis Germanidis
Cristobal Valenzuela
Yining Shi
Nikolas Martelaro
30
16
0
17 Dec 2021
Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection
Jiaqi Tang
Zhaoyang Liu
Chao Qian
Wayne Wu
Limin Wang
17
17
0
09 Dec 2021
Audio-Visual Synchronisation in the wild
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
26
37
0
08 Dec 2021
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing
Jiashuo Yu
Ying Cheng
Ruiwei Zhao
Rui Feng
Yuejie Zhang
32
53
0
24 Nov 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
Rishabh Garg
Ruohan Gao
Kristen Grauman
15
28
0
21 Nov 2021
Structure from Silence: Learning Scene Structure from Ambient Sound
Ziyang Chen
Xixi Hu
Andrew Owens
31
26
0
10 Nov 2021
Space-Time Memory Network for Sounding Object Localization in Videos
Sizhe Li
Yapeng Tian
Chenliang Xu
26
10
0
10 Nov 2021
Joint Learning of Visual-Audio Saliency Prediction and Sound Source Localization on Multi-face Videos
Minglang Qiao
Yufan Liu
Mai Xu
Xin Deng
Bing Li
Weiming Hu
Ali Borji
CVBM
29
5
0
05 Nov 2021
Domain Generalization through Audio-Visual Relative Norm Alignment in First Person Action Recognition
M. Planamente
Chiara Plizzari
Emanuele Alberti
Barbara Caputo
EgoV
22
33
0
19 Oct 2021
The Impact of Spatiotemporal Augmentations on Self-Supervised Audiovisual Representation Learning
Haider Al-Tahan
Y. Mohsenzadeh
SSL
AI4TS
34
0
0
13 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
278
1,026
0
13 Oct 2021
V-SlowFast Network for Efficient Visual Sound Separation
Lingyu Zhu
Esa Rahtu
52
10
0
18 Sep 2021
Audio-Visual Collaborative Representation Learning for Dynamic Saliency Prediction
Hailong Ning
Bin Zhao
Zhanxuan Hu
Lang He
Ercheng Pei
32
10
0
17 Sep 2021
Audio-Visual Transformer Based Crowd Counting
Usman Sajid
Xiangyu Chen
Hasan Sajid
Taejoon Kim
Guanghui Wang
ViT
50
22
0
04 Sep 2021
Binaural Audio Generation via Multi-task Learning
Sijia Li
Shiguang Liu
Tianyi Zhou
15
12
0
02 Sep 2021
Drop-DTW: Aligning Common Signal Between Sequences While Dropping Outliers
Nikita Dvornik
Isma Hadji
Konstantinos G. Derpanis
Animesh Garg
Allan D. Jepson
19
50
0
26 Aug 2021
Multi-Modulation Network for Audio-Visual Event Localization
Hao Wang
Zhengjun Zha
Liang Li
Xuejin Chen
Jiebo Luo
30
2
0
26 Aug 2021
The Right to Talk: An Audio-Visual Transformer Approach
Thanh-Dat Truong
C. Duong
T. D. Vu
H. Pham
Bhiksha Raj
Ngan Le
Khoa Luu
63
36
0
06 Aug 2021
Hear Me Out: Fusional Approaches for Audio Augmented Temporal Action Localization
Anurag Bagchi
Jazib Mahmood
Dolton Fernandes
Ravi Kiran Sarvadevabhatla
27
21
0
27 Jun 2021
Saying the Unseen: Video Descriptions via Dialog Agents
Ye Zhu
Yu Wu
Yi Yang
Yan Yan
22
6
0
26 Jun 2021
Multi-level Attention Fusion Network for Audio-visual Event Recognition
Mathilde Brousmiche
Jean Rouat
Stéphane Dupont
27
11
0
12 Jun 2021
Dual Normalization Multitasking for Audio-Visual Sounding Object Localization
Tokuhiro Nishikawa
Daiki Shimada
Jerry Jun Yokono
15
0
0
01 Jun 2021
Rethinking the constraints of multimodal fusion: case study in Weakly-Supervised Audio-Visual Video Parsing
Jianning Wu
Zhuqing Jiang
S. Wen
Aidong Men
Haiying Wang
47
1
0
30 May 2021
Multi-target DoA Estimation with an Audio-visual Fusion Mechanism
Xinyuan Qian
Maulik C. Madhavi
Zexu Pan
Jiadong Wang
Haizhou Li
27
44
0
13 May 2021
Where and When: Space-Time Attention for Audio-Visual Explanations
Yanbei Chen
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
14
3
0
04 May 2021
Exploiting Audio-Visual Consistency with Partial Supervision for Spatial Audio Generation
Yan-Bo Lin
Y. Wang
53
21
0
03 May 2021
Distilling Audio-Visual Knowledge by Compositional Contrastive Learning
Yanbei Chen
Yongqin Xian
A. Sophia Koepke
Ying Shan
Zeynep Akata
80
82
0
22 Apr 2021
HCMS: Hierarchical and Conditional Modality Selection for Efficient Video Recognition
Zejia Weng
Zuxuan Wu
Hengduo Li
Jingjing Chen
Yu-Gang Jiang
32
4
0
20 Apr 2021
Visually Guided Sound Source Separation and Localization using Self-Supervised Motion Representations
Lingyu Zhu
Esa Rahtu
29
25
0
17 Apr 2021
Self-supervised object detection from audio-visual correspondence
Triantafyllos Afouras
Yuki M. Asano
Francois Fagan
Andrea Vedaldi
Florian Metze
SSL
29
46
0
13 Apr 2021
Visually Informed Binaural Audio Generation without Binaural Audios
Xudong Xu
Hang Zhou
Ziwei Liu
Bo Dai
Xiaogang Wang
Dahua Lin
DiffM
13
55
0
13 Apr 2021