2111.08567
Cited By
Joint Learning of Visual-Audio Saliency Prediction and Sound Source Localization on Multi-face Videos
5 November 2021
Minglang Qiao
Yufan Liu
Mai Xu
Xin Deng
Bing Li
Weiming Hu
Ali Borji
CVBM
Papers citing
"Joint Learning of Visual-Audio Saliency Prediction and Sound Source Localization on Multi-face Videos"
23 / 23 papers shown
Title
DTFSal: Audio-Visual Dynamic Token Fusion for Video Saliency Prediction
Kiana Hoshanfar
Alireza Hosseini
Ahmad Kalhor
Babak N. Araabi
451
0
0
14 Apr 2025
Learning to Predict Salient Faces: A Novel Visual-Audio Saliency Model
Yufan Liu
Minglang Qiao
Mai Xu
Bing Li
Weiming Hu
Ali Borji
CVBM
79
14
0
29 Mar 2021
Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching
Di Hu
Rui Qian
Minyue Jiang
Xiao Tan
Shilei Wen
Errui Ding
Weiyao Lin
Dejing Dou
66
135
0
12 Oct 2020
Active Speakers in Context
Juan Carlos León Alcázar
Fabian Caba Heilbron
Long Mai
Federico Perazzi
Joon-Young Lee
Pablo Arbelaez
Guohao Li
56
62
0
20 May 2020
Unified Image and Video Saliency Modeling
Richard Droste
Jianbo Jiao
J. A. Noble
85
160
0
11 Mar 2020
Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
SSL
64
55
0
20 Nov 2019
TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection
Kyle Min
Jason J. Corso
54
153
0
15 Aug 2019
Multi-Label Image Recognition with Graph Convolutional Networks
Zhao-Min Chen
Xiu-Shen Wei
Peng Wang
Yanwen Guo
113
1,001
0
07 Apr 2019
AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection
Joseph Roth
Sourish Chaudhuri
Ondřej Klejch
Radhika Marvin
Andrew C. Gallagher
...
S. Ramaswamy
Arkadiusz Stopczynski
Cordelia Schmid
Zhonghua Xi
C. Pantofaru
55
145
0
05 Jan 2019
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Andrew Owens
Alexei A. Efros
SSL
98
753
0
10 Apr 2018
The Sound of Pixels
Hang Zhao
Chuang Gan
Andrew Rouditchenko
Carl Vondrick
Josh H. McDermott
Antonio Torralba
VLM
102
536
0
09 Apr 2018
Learning to Localize Sound Source in Visual Scenes
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
SSL
66
345
0
10 Mar 2018
Revisiting Video Saliency: A Large-scale Benchmark and a New Model
Wenguan Wang
Jianbing Shen
Fang Guo
Ming-Ming Cheng
Ali Borji
VLM
55
266
0
23 Jan 2018
Objects that Sound
Relja Arandjelović
Andrew Zisserman
ObjD
VOS
113
530
0
18 Dec 2017
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
Saining Xie
Chen Sun
Jonathan Huang
Zhuowen Tu
Kevin Patrick Murphy
3DH
155
1,333
0
13 Dec 2017
Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?
Kensho Hara
Hirokatsu Kataoka
Y. Satoh
3DPC
126
1,935
0
27 Nov 2017
Deep Visual Attention Prediction
Wenguan Wang
Jianbing Shen
MDE
78
587
0
07 May 2017
SalGAN: Visual Saliency Prediction with Generative Adversarial Networks
Junting Pan
Cristian Canton Ferrer
Kevin McGuinness
Noel E. O'Connor
Jordi Torres
E. Sayrol
Xavier Giró-i-Nieto
GAN
84
398
0
04 Jan 2017
Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model
Marcella Cornia
Lorenzo Baraldi
G. Serra
Rita Cucchiara
90
550
0
29 Nov 2016
Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks
Kaipeng Zhang
Zhanpeng Zhang
Zhifeng Li
Yu Qiao
CVBM
178
4,974
0
11 Apr 2016
Learning Deep Features for Discriminative Localization
Bolei Zhou
A. Khosla
Àgata Lapedriza
A. Oliva
Antonio Torralba
SSL
SSeg
FAtt
253
9,338
0
14 Dec 2015
Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting
Xingjian Shi
Zhourong Chen
Hao Wang
Dit-Yan Yeung
W. Wong
W. Woo
568
8,005
0
13 Jun 2015
FlowNet: Learning Optical Flow with Convolutional Networks
Philipp Fischer
Alexey Dosovitskiy
Eddy Ilg
Philip Häusser
C. Hazirbas
Vladimir Golkov
Patrick van der Smagt
Daniel Cremers
Thomas Brox
3DPC
314
4,180
0
26 Apr 2015