Objects that Sound

18 December 2017

Papers citing "Objects that Sound"

50 / 136 papers shown

Title
Weakly-Supervised Action Detection Guided by Audio Narration Keren Ye Adriana Kovashka 30 0 0 12 May 2022
Self-supervised Contrastive Learning for Audio-Visual Action Recognition Yang Liu Y. Tan Haoyu Lan SSL 47 6 0 28 Apr 2022
Sound Localization by Self-Supervised Time Delay Estimation Ziyang Chen David Fouhey Andrew Owens SSL 24 19 0 26 Apr 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound Yan-Bo Lin Jie Lei Joey Tianyi Zhou Gedas Bertasius 43 39 0 06 Apr 2022
The Sound of Bounding-Boxes Takashi Oya Shohei Iwase Shigeo Morishima 19 2 0 30 Mar 2022
Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos Saghir Alfasly Jian Lu C. Xu Yuru Zou 39 18 0 06 Mar 2022
Audio Self-supervised Learning: A Survey Shuo Liu Adria Mallol-Ragolta Emilia Parada-Cabeleiro Kun Qian Xingshuo Jing Alexander Kathan Bin Hu Bjoern W. Schuller SSL 35 106 0 02 Mar 2022
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing Xian Liu Rui Qian Hang Zhou Di Hu Weiyao Lin Ziwei Liu Bolei Zhou Xiaowei Zhou 18 25 0 13 Feb 2022
Self-Supervised Moving Vehicle Detection from Audio-Visual Cues Jannik Zürn Wolfram Burgard SSL 34 8 0 30 Jan 2022
Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection A. Haliassos Rodrigo Mira Stavros Petridis M. Pantic CVBM 40 126 0 18 Jan 2022
Progressive Video Summarization via Multimodal Self-supervised Learning Haopeng Li Qiuhong Ke Mingming Gong Tom Drummond AI4TS 39 18 0 07 Jan 2022
Sound and Visual Representation Learning with Multiple Pretraining Tasks A. Vasudevan Dengxin Dai Luc Van Gool SSL 33 6 0 04 Jan 2022
Cross Modal Retrieval with Querybank Normalisation Simion-Vlad Bogolin Ioana Croitoru Hailin Jin Yang Liu Samuel Albanie 27 84 0 23 Dec 2021
Class-aware Sounding Objects Localization via Audiovisual Correspondence Di Hu Yake Wei Rui Qian Weiyao Lin Ruihua Song Ji-Rong Wen 24 41 0 22 Dec 2021
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading Leyuan Qu C. Weber S. Wermter 38 23 0 09 Dec 2021
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval Nina Shvetsova Brian Chen Andrew Rouditchenko Samuel Thomas Brian Kingsbury Rogerio Feris David Harwath James R. Glass Hilde Kuehne ViT 34 129 0 08 Dec 2021
ViewCLR: Learning Self-supervised Video Representation for Unseen Viewpoints Srijan Das Michael S. Ryoo SSL 37 17 0 07 Dec 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video Rishabh Garg Ruohan Gao Kristen Grauman 15 28 0 21 Nov 2021
Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with Depth and Cross Modal Attention Kranti K. Parida Siddharth Srivastava Gaurav Sharma MDE 36 20 0 15 Nov 2021
TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation Tanzila Rahman Mengyu Yang Leonid Sigal ViT 29 8 0 26 Oct 2021
Constrained Mean Shift for Representation Learning Ajinkya Tejankar Soroush Abbasi Koohpayegani Hamed Pirsiavash SSL 36 0 0 19 Oct 2021
The Impact of Spatiotemporal Augmentations on Self-Supervised Audiovisual Representation Learning Haider Al-Tahan Y. Mohsenzadeh SSL AI4TS 34 0 0 13 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video Kristen Grauman Andrew Westbury Eugene Byrne Zachary Chavis Antonino Furnari ... Mike Zheng Shou Antonio Torralba Lorenzo Torresani Mingfei Yan Jitendra Malik EgoV 238 1,024 0 13 Oct 2021
$Pano-AVQA: Grounded Audio-Visual Question Answering on 360$^\circ$ Videos$ Pano-AVQA: Grounded Audio-Visual Question Answering on 360 $^\circ$ Videos Heeseung Yun Youngjae Yu Wonsuk Yang Kangil Lee Gunhee Kim 25 78 0 11 Oct 2021
How You Move Your Head Tells What You Do: Self-supervised Video Representation Learning with Egocentric Cameras and IMU Sensors Satoshi Tsutsui Ruta Desai Karl Ridgeway SSL 41 4 0 04 Oct 2021
Audio-to-Image Cross-Modal Generation Maciej Żelaszczyk Jacek Mańdziuk DiffM 53 15 0 27 Sep 2021
Visual Scene Graphs for Audio Source Separation Moitreya Chatterjee Jonathan Le Roux Narendra Ahuja A. Cherian 20 36 0 24 Sep 2021
Audio-Visual Collaborative Representation Learning for Dynamic Saliency Prediction Hailong Ning Bin Zhao Zhanxuan Hu Lang He Ercheng Pei 32 10 0 17 Sep 2021
Parsing Birdsong with Deep Audio Embeddings Irina Tolkova Brian Chu Marcel Hedman Stefan Kahl Holger Klinck 36 10 0 20 Aug 2021
Attention Bottlenecks for Multimodal Fusion Arsha Nagrani Shan Yang Anurag Arnab A. Jansen Cordelia Schmid Chen Sun 25 543 0 30 Jun 2021
Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design Yue Cao Payel Das Vijil Chenthamarakshan Pin-Yu Chen Igor Melnyk Yang Shen 24 45 0 24 Jun 2021
Multi-level Attention Fusion Network for Audio-visual Event Recognition Mathilde Brousmiche Jean Rouat Stéphane Dupont 27 11 0 12 Jun 2021
Unsupervised Discriminative Learning of Sounds for Audio Event Classification Sascha Hornauer Ke Li Stella X. Yu Shabnam Ghaffarzadegan Liu Ren SSL 26 5 0 19 May 2021
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning Christoph Feichtenhofer Haoqi Fan Bo Xiong Ross B. Girshick Kaiming He SSL AI4TS 39 257 0 29 Apr 2021
Visually Informed Binaural Audio Generation without Binaural Audios Xudong Xu Hang Zhou Ziwei Liu Bo Dai Xiaogang Wang Dahua Lin DiffM 13 53 0 13 Apr 2021
Can audio-visual integration strengthen robustness under multimodal attacks? Yapeng Tian Chenliang Xu AAML 31 37 0 05 Apr 2021
Unsupervised Sound Localization via Iterative Contrastive Learning Yan-Bo Lin Hung-Yu Tseng Hsin-Ying Lee Yen-Yu Lin Ming-Hsuan Yang SSL 24 34 0 01 Apr 2021
Broaden Your Views for Self-Supervised Video Learning Adrià Recasens Pauline Luc Jean-Baptiste Alayrac Luyu Wang Ross Hemsley ... Florent Altché M. Valko Jean-Bastien Grill Aaron van den Oord Andrew Zisserman SSL AI4TS 33 127 0 30 Mar 2021
Read and Attend: Temporal Localisation in Sign Language Videos Gül Varol Liliane Momeni Samuel Albanie Triantafyllos Afouras Andrew Zisserman SLR 24 40 0 30 Mar 2021
Supervised Contrastive Replay: Revisiting the Nearest Class Mean Classifier in Online Class-Incremental Continual Learning Zheda Mai Ruiwen Li Hyunwoo J. Kim Scott Sanner CLL 23 181 0 22 Mar 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning Mandela Patrick Yuki M. Asano Bernie Huang Ishan Misra Florian Metze Joao Henriques Andrea Vedaldi AI4TS 29 33 0 18 Mar 2021
Beyond Image to Depth: Improving Depth Prediction using Echoes Kranti K. Parida Siddharth Srivastava Gaurav Sharma MDE 45 37 0 15 Mar 2021
Perceiver: General Perception with Iterative Attention Andrew Jaegle Felix Gimeno Andrew Brock Andrew Zisserman Oriol Vinyals João Carreira VLM ViT MDE 82 973 0 04 Mar 2021
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge Francisco Rivera Valverde Juana Valeria Hurtado Abhinav Valada 26 72 0 01 Mar 2021
Learning Audio-Visual Correlations from Variational Cross-Modal Generation Ye Zhu Yu Wu Hugo Latapie Yi Yang Yan Yan SSL 29 20 0 05 Feb 2021
ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency Prediction Samyak Jain P. Yarlagadda Shreyank Jyoti Shyamgopal Karthik Subramanian Ramanathan Vineet Gandhi ViT 29 66 0 11 Dec 2020
Learning to dance: A graph convolutional adversarial network to generate realistic dance motions from audio João P. Ferreira Thiago M. Coutinho Thiago L. Gomes J. F. Neto Rafael Azevedo Renato Martins Erickson R. Nascimento GAN 36 68 0 25 Nov 2020
ActBERT: Learning Global-Local Video-Text Representations Linchao Zhu Yi Yang ViT 46 417 0 14 Nov 2020
Learning Representations from Audio-Visual Spatial Alignment Pedro Morgado Yi Li Nuno Vasconcelos SSL 27 121 0 03 Nov 2020
Listening to Sounds of Silence for Speech Denoising Ruilin Xu Rundi Wu Y. Ishiwaka Carl Vondrick Changxi Zheng 25 32 0 22 Oct 2020