Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing

21 July 2020
Yapeng Tian
Dingzeyu Li
Chenliang Xu

Papers citing "Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing"

29 / 129 papers shown
A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!
Chenglizhao Chen
Mengke Song
Wenfeng Song
Li Guo
Muwei Jian
35
25
0
20 Jun 2022
Investigating Modality Bias in Audio Visual Video Parsing
Piyush Singh Pasi
Shubham Nemani
P. Jyothi
Ganesh Ramakrishnan
11
4
0
31 Mar 2022
Audio-Adaptive Activity Recognition Across Video Domains
Yun C. Zhang
Hazel Doughty
Ling Shao
Cees G. M. Snoek
17
38
0
27 Mar 2022
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Guangyao Li
Yake Wei
Yapeng Tian
Chenliang Xu
Ji-Rong Wen
Di Hu
29
136
0
26 Mar 2022
Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation
Xian Liu
Qianyi Wu
Hang Zhou
Yinghao Xu
Rui Qian
Xinyi Lin
Xiaowei Zhou
Wayne Wu
Bo Dai
Bolei Zhou
SLR
34
99
0
24 Mar 2022
Localizing Visual Sounds the Easy Way
Shentong Mo
Pedro Morgado
24
78
0
17 Mar 2022
Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language
Otniel-Bogdan Mercea
Lukas Riesch
A. Sophia Koepke
Zeynep Akata
30
48
0
07 Mar 2022
Audio-Visual Fusion Layers for Event Type Aware Video Recognition
Arda Senocak
Junsik Kim
Tae-Hyun Oh
H. Ryu
Dingzeyu Li
In So Kweon
21
1
0
12 Feb 2022
Audio-Visual Synchronisation in the wild
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
23
37
0
08 Dec 2021
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing
Jiashuo Yu
Ying Cheng
Ruiwei Zhao
Rui Feng
Yuejie Zhang
26
53
0
24 Nov 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
Rishabh Garg
Ruohan Gao
Kristen Grauman
15
28
0
21 Nov 2021
Space-Time Memory Network for Sounding Object Localization in Videos
Sizhe Li
Yapeng Tian
Chenliang Xu
26
10
0
10 Nov 2021
TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation
Tanzila Rahman
Mengyu Yang
Leonid Sigal
ViT
29
8
0
26 Oct 2021
Domain Generalization through Audio-Visual Relative Norm Alignment in First Person Action Recognition
M. Planamente
Chiara Plizzari
Emanuele Alberti
Barbara Caputo
EgoV
19
33
0
19 Oct 2021
Rethinking the constraints of multimodal fusion: case study in Weakly-Supervised Audio-Visual Video Parsing
Jianning Wu
Zhuqing Jiang
S. Wen
Aidong Men
Haiying Wang
36
1
0
30 May 2021
Where and When: Space-Time Attention for Audio-Visual Explanations
Yanbei Chen
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
6
3
0
04 May 2021
Exploiting Audio-Visual Consistency with Partial Supervision for Spatial Audio Generation
Yan-Bo Lin
Y. Wang
48
21
0
03 May 2021
Visually Informed Binaural Audio Generation without Binaural Audios
Xudong Xu
Hang Zhou
Ziwei Liu
Bo Dai
Xiaogang Wang
Dahua Lin
DiffM
13
53
0
13 Apr 2021
Localizing Visual Sounds the Hard Way
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
ObjD
13
184
0
06 Apr 2021
Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation
Yapeng Tian
Di Hu
Chenliang Xu
ObjD
18
86
0
05 Apr 2021
Can audio-visual integration strengthen robustness under multimodal attacks?
Yapeng Tian
Chenliang Xu
AAML
31
37
0
05 Apr 2021
Cross-Modal learning for Audio-Visual Video Parsing
Jatin Lamba
Abhishek
Jayaprakash Akula
Rishabh Dabral
P. Jyothi
Ganesh Ramakrishnan
13
7
0
03 Apr 2021
Unsupervised Sound Localization via Iterative Contrastive Learning
Yan-Bo Lin
Hung-Yu Tseng
Hsin-Ying Lee
Yen-Yu Lin
Ming-Hsuan Yang
SSL
27
34
0
01 Apr 2021
Positive Sample Propagation along the Audio-Visual Event Line
Jinxing Zhou
Liang Zheng
Yiran Zhong
Shijie Hao
Meng Wang
22
99
0
01 Apr 2021
Parameter Efficient Multimodal Transformers for Video Representation Learning
Sangho Lee
Youngjae Yu
Gunhee Kim
Thomas Breuel
Jan Kautz
Yale Song
ViT
29
76
0
08 Dec 2020
Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation
Hang Zhou
Xudong Xu
Dahua Lin
Xiaogang Wang
Ziwei Liu
DiffM
29
80
0
20 Jul 2020
Cross modal video representations for weakly supervised active speaker localization
Rahul Sharma
Krishna Somandepalli
Shrikanth Narayanan
9
8
0
09 Mar 2020
Gaussian Temporal Awareness Networks for Action Localization
Fuchen Long
Ting Yao
Zhaofan Qiu
Xinmei Tian
Jiebo Luo
Tao Mei
148
319
0
09 Sep 2019
BSN: Boundary Sensitive Network for Temporal Action Proposal Generation
Tianwei Lin
Xu Zhao
Haisheng Su
Chongjing Wang
Ming Yang
139
700
0
08 Jun 2018