ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1712.06651
  4. Cited By
Objects that Sound

Objects that Sound

18 December 2017
Relja Arandjelović
Andrew Zisserman
    ObjD
    VOS
ArXivPDFHTML

Papers citing "Objects that Sound"

36 / 136 papers shown
Title
Audio-Visual Event Localization via Recursive Fusion by Joint
  Co-Attention
Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention
Bin Duan
Hao Tang
Wei Wang
Ziliang Zong
Guowei Yang
Yan Yan
33
59
0
14 Aug 2020
Look, Listen, and Attend: Co-Attention Network for Self-Supervised
  Audio-Visual Representation Learning
Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning
Ying Cheng
Ruize Wang
Zhihao Pan
Rui Feng
Yuejie Zhang
SSL
30
106
0
13 Aug 2020
Self-Supervised Learning of Audio-Visual Objects from Video
Self-Supervised Learning of Audio-Visual Objects from Video
Triantafyllos Afouras
Andrew Owens
Joon Son Chung
Andrew Zisserman
SSL
19
252
0
10 Aug 2020
Multiple Sound Sources Localization from Coarse to Fine
Multiple Sound Sources Localization from Coarse to Fine
Rui Qian
Di Hu
Heinrich Dinkel
Mengyue Wu
N. Xu
Weiyao Lin
28
155
0
13 Jul 2020
Self-Supervised MultiModal Versatile Networks
Self-Supervised MultiModal Versatile Networks
Jean-Baptiste Alayrac
Adrià Recasens
R. Schneider
Relja Arandjelović
Jason Ramapuram
J. Fauw
Lucas Smaira
Sander Dieleman
Andrew Zisserman
SSL
40
371
0
29 Jun 2020
AVLnet: Learning Audio-Visual Language Representations from
  Instructional Videos
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko
Angie Boggust
David Harwath
Brian Chen
D. Joshi
...
Rogerio Feris
Brian Kingsbury
M. Picheny
Antonio Torralba
James R. Glass
SSL
22
141
0
16 Jun 2020
Telling Left from Right: Learning Spatial Correspondence of Sight and
  Sound
Telling Left from Right: Learning Spatial Correspondence of Sight and Sound
Karren D. Yang
Bryan C. Russell
Justin Salamon
SSL
24
75
0
11 Jun 2020
Visually Guided Sound Source Separation using Cascaded Opponent Filter
  Network
Visually Guided Sound Source Separation using Cascaded Opponent Filter Network
Lingyu Zhu
Esa Rahtu
22
23
0
04 Jun 2020
What Makes for Good Views for Contrastive Learning?
What Makes for Good Views for Contrastive Learning?
Yonglong Tian
Chen Sun
Ben Poole
Dilip Krishnan
Cordelia Schmid
Phillip Isola
SSL
39
1,305
0
20 May 2020
VisualEchoes: Spatial Image Representation Learning through Echolocation
VisualEchoes: Spatial Image Representation Learning through Echolocation
Ruohan Gao
Changan Chen
Ziad Al-Halah
Carl Schissler
Kristen Grauman
MDE
SSL
171
83
0
04 May 2020
Conditioned Source Separation for Music Instrument Performances
Conditioned Source Separation for Music Instrument Performances
Olga Slizovskaia
G. Haro
E. Gómez
30
38
0
08 Apr 2020
Learning word-referent mappings and concepts from raw inputs
Learning word-referent mappings and concepts from raw inputs
Wai Keen Vong
Brenden Lake
17
5
0
12 Mar 2020
Evolving Losses for Unsupervised Video Representation Learning
Evolving Losses for Unsupervised Video Representation Learning
A. Piergiovanni
A. Angelova
Michael S. Ryoo
SSL
24
138
0
26 Feb 2020
Audiovisual SlowFast Networks for Video Recognition
Audiovisual SlowFast Networks for Video Recognition
Fanyi Xiao
Yong Jae Lee
Kristen Grauman
Jitendra Malik
Christoph Feichtenhofer
197
206
0
23 Jan 2020
Deep Audio-Visual Learning: A Survey
Deep Audio-Visual Learning: A Survey
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
Ran He
31
156
0
14 Jan 2020
STAViS: Spatio-Temporal AudioVisual Saliency Network
STAViS: Spatio-Temporal AudioVisual Saliency Network
A. Tsiami
Petros Koutras
Petros Maragos
24
73
0
09 Jan 2020
Listen to Look: Action Recognition by Previewing Audio
Listen to Look: Action Recognition by Previewing Audio
Ruohan Gao
Tae-Hyun Oh
Kristen Grauman
Lorenzo Torresani
VLM
29
251
0
10 Dec 2019
Learning to Localize Sound Sources in Visual Scenes: Analysis and
  Applications
Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
SSL
33
52
0
20 Nov 2019
Vision-Infused Deep Audio Inpainting
Vision-Infused Deep Audio Inpainting
Hang Zhou
Ziwei Liu
Lingfeng Guo
Ping Luo
Dahua Lin
27
88
0
24 Oct 2019
Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual
  Zeroshot Classification and Retrieval of Videos
Coordinated Joint Multimodal Embeddings for Generalized Audio-Visual Zeroshot Classification and Retrieval of Videos
Kranti K. Parida
Neeraj Matiyali
T. Guha
Gaurav Sharma
VLM
27
41
0
19 Oct 2019
Learning to Have an Ear for Face Super-Resolution
Learning to Have an Ear for Face Super-Resolution
Givi Meishvili
Simon Jenni
Paolo Favaro
SupR
CVBM
33
23
0
27 Sep 2019
Recursive Visual Sound Separation Using Minus-Plus Net
Recursive Visual Sound Separation Using Minus-Plus Net
Xudong Xu
Bo Dai
Dahua Lin
32
91
0
30 Aug 2019
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action
  Recognition
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition
Evangelos Kazakos
Arsha Nagrani
Andrew Zisserman
Dima Damen
EgoV
16
332
0
22 Aug 2019
Multi-task Self-Supervised Learning for Human Activity Detection
Multi-task Self-Supervised Learning for Human Activity Detection
Aaqib Saeed
T. Ozcelebi
J. Lukkien
SSL
21
268
0
27 Jul 2019
Learning Representations by Maximizing Mutual Information Across Views
Learning Representations by Maximizing Mutual Information Across Views
Philip Bachman
R. Devon Hjelm
William Buchwalter
SSL
67
1,452
0
03 Jun 2019
Data-Efficient Image Recognition with Contrastive Predictive Coding
Data-Efficient Image Recognition with Contrastive Predictive Coding
Olivier J. Hénaff
A. Srinivas
J. Fauw
Ali Razavi
Carl Doersch
S. M. Ali Eslami
Aaron van den Oord
SSL
58
1,415
0
22 May 2019
Scaling and Benchmarking Self-Supervised Visual Representation Learning
Scaling and Benchmarking Self-Supervised Visual Representation Learning
Priya Goyal
D. Mahajan
Abhinav Gupta
Ishan Misra
SSL
24
396
0
03 May 2019
Audio-Visual Model Distillation Using Acoustic Images
Audio-Visual Model Distillation Using Acoustic Images
Andrés F. Pérez
Valentina Sanguineti
Pietro Morerio
Vittorio Murino
VLM
15
27
0
16 Apr 2019
Co-Separating Sounds of Visual Objects
Co-Separating Sounds of Visual Objects
Ruohan Gao
Kristen Grauman
27
205
0
16 Apr 2019
The Sound of Motions
The Sound of Motions
Hang Zhao
Chuang Gan
Wei-Chiu Ma
Antonio Torralba
17
250
0
11 Apr 2019
An Attempt towards Interpretable Audio-Visual Video Captioning
An Attempt towards Interpretable Audio-Visual Video Captioning
Yapeng Tian
Chenxiao Guan
Justin Goodman
Marc Moore
Chenliang Xu
36
20
0
07 Dec 2018
Learnable PINs: Cross-Modal Embeddings for Person Identity
Learnable PINs: Cross-Modal Embeddings for Person Identity
Arsha Nagrani
Samuel Albanie
Andrew Zisserman
SSL
26
140
0
02 May 2018
Weakly Supervised Representation Learning for Unsynchronized
  Audio-Visual Events
Weakly Supervised Representation Learning for Unsynchronized Audio-Visual Events
Sanjeel Parekh
S. Essid
A. Ozerov
Ngoc Q. K. Duong
P. Pérez
G. Richard
SSL
8
19
0
19 Apr 2018
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Andrew Owens
Alexei A. Efros
SSL
51
743
0
10 Apr 2018
The Sound of Pixels
The Sound of Pixels
Hang Zhao
Chuang Gan
Andrew Rouditchenko
Carl Vondrick
Josh H. McDermott
Antonio Torralba
VLM
22
527
0
09 Apr 2018
Audio-Visual Event Localization in Unconstrained Videos
Audio-Visual Event Localization in Unconstrained Videos
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
31
425
0
23 Mar 2018
Previous
123