ResearchTrend.AI

Joint Learning of Visual-Audio Saliency Prediction and Sound Source Localization on Multi-face Videos

5 November 2021
Minglang Qiao
Yufan Liu
Mai Xu
Xin Deng
Bing Li
Weiming Hu
Ali Borji
    CVBM

Papers citing "Joint Learning of Visual-Audio Saliency Prediction and Sound Source Localization on Multi-face Videos"

23 / 23 papers shown
DTFSal: Audio-Visual Dynamic Token Fusion for Video Saliency Prediction
Kiana Hoshanfar
Alireza Hosseini
Ahmad Kalhor
Babak N. Araabi
451
0
0
14 Apr 2025
Learning to Predict Salient Faces: A Novel Visual-Audio Saliency Model
Yufan Liu
Minglang Qiao
Mai Xu
Bing Li
Weiming Hu
Ali Borji
CVBM
79
14
0
29 Mar 2021
Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching
Di Hu
Rui Qian
Minyue Jiang
Xiao Tan
Shilei Wen
Errui Ding
Weiyao Lin
Dejing Dou
66
135
0
12 Oct 2020
Active Speakers in Context
Juan Carlos León Alcázar
Fabian Caba Heilbron
Long Mai
Federico Perazzi
Joon-Young Lee
Pablo Arbelaez
Guohao Li
56
62
0
20 May 2020
Unified Image and Video Saliency Modeling
Richard Droste
Jianbo Jiao
J. A. Noble
85
160
0
11 Mar 2020
Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
SSL
64
55
0
20 Nov 2019
TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection
Kyle Min
Jason J. Corso
54
153
0
15 Aug 2019
Multi-Label Image Recognition with Graph Convolutional Networks
Zhao-Min Chen
Xiu-Shen Wei
Peng Wang
Yanwen Guo
113
1,001
0
07 Apr 2019
AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection
Joseph Roth
Sourish Chaudhuri
Ondřej Klejch
Radhika Marvin
Andrew C. Gallagher
...
S. Ramaswamy
Arkadiusz Stopczynski
Cordelia Schmid
Zhonghua Xi
C. Pantofaru
55
145
0
05 Jan 2019
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Andrew Owens
Alexei A. Efros
SSL
98
753
0
10 Apr 2018
The Sound of Pixels
Hang Zhao
Chuang Gan
Andrew Rouditchenko
Carl Vondrick
Josh H. McDermott
Antonio Torralba
VLM
102
536
0
09 Apr 2018
Learning to Localize Sound Source in Visual Scenes
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
SSL
66
345
0
10 Mar 2018
Revisiting Video Saliency: A Large-scale Benchmark and a New Model
Wenguan Wang
Jianbing Shen
Fang Guo
Ming-Ming Cheng
Ali Borji
VLM
55
266
0
23 Jan 2018
Objects that Sound
Relja Arandjelović
Andrew Zisserman
ObjD, VOS
113
530
0
18 Dec 2017
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
3DH
155
1,333
0
13 Dec 2017
Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?
Kensho Hara
Hirokatsu Kataoka
Y. Satoh
3DPC
126
1,935
0
27 Nov 2017
Deep Visual Attention Prediction
Wenguan Wang
Jianbing Shen
MDE
78
587
0
07 May 2017
SalGAN: Visual Saliency Prediction with Generative Adversarial Networks
Junting Pan
Cristian Canton Ferrer
Kevin McGuinness
Noel E. O'Connor
Jordi Torres
E. Sayrol
Xavier Giró-i-Nieto
GAN
84
398
0
04 Jan 2017
Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model
Marcella Cornia
Lorenzo Baraldi
G. Serra
Rita Cucchiara
90
550
0
29 Nov 2016
Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks
Kaipeng Zhang
Zhanpeng Zhang
Zhifeng Li
Yu Qiao
CVBM
178
4,974
0
11 Apr 2016
Learning Deep Features for Discriminative Localization
Bolei Zhou
A. Khosla
Àgata Lapedriza
A. Oliva
Antonio Torralba
SSL, SSeg, FAtt
253
9,338
0
14 Dec 2015
Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting
Xingjian Shi
Zhourong Chen
Hao Wang
Dit-Yan Yeung
W. Wong
W. Woo
568
8,005
0
13 Jun 2015
FlowNet: Learning Optical Flow with Convolutional Networks
Philipp Fischer
Alexey Dosovitskiy
Eddy Ilg
Philip Häusser
C. Hazirbas
Vladimir Golkov
Patrick van der Smagt
Daniel Cremers
Thomas Brox
3DPC
314
4,180
0
26 Apr 2015