Text-to-feature diffusion for audio-visual few-shot learning

7 September 2023

A. Sophia Koepke

Papers citing "Text-to-feature diffusion for audio-visual few-shot learning"

Title
A Comprehensive Review of Few-shot Action Recognition | Yuyang Wanyan, Xiaoshan Yang, Weiming Dong, Changsheng Xu | VLM | 118 3 0 | 20 Jul 2024
Temporal and cross-modal attention for audio-visual zero-shot learning | Otniel-Bogdan Mercea, Thomas Hummel, A. Sophia Koepke, Zeynep Akata | | 58 26 0 | 20 Jul 2022
Generative Adversarial Networks | Gilad Cohen, Raja Giryes | GAN | 151 30,069 0 | 01 Mar 2022
HiP: Hierarchical Perceiver | João Carreira, Skanda Koppula, Daniel Zoran, Adrià Recasens, Catalin Ionescu, ..., M. Botvinick, Oriol Vinyals, Karen Simonyan, Andrew Zisserman, Andrew Jaegle | VLM | 76 14 0 | 22 Feb 2022
Attention Bottlenecks for Multimodal Fusion | Arsha Nagrani, Shan Yang, Anurag Arnab, A. Jansen, Cordelia Schmid, Chen Sun | | 81 553 0 | 30 Jun 2021
Score-based Generative Modeling in Latent Space | Arash Vahdat, Karsten Kreis, Jan Kautz | DiffM | 35 667 0 | 10 Jun 2021
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval | Xiaohan Wang, Linchao Zhu, Yi Yang | | 170 172 0 | 20 Apr 2021
Localizing Visual Sounds the Hard Way | Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman | ObjD | 44 185 0 | 06 Apr 2021
Perceiver: General Perception with Iterative Attention | Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, João Carreira | VLM ViT MDE | 135 1,003 0 | 04 Mar 2021
Temporal-Relational CrossTransformers for Few-Shot Action Recognition | Toby Perrett, A. Masullo, T. Burghardt, Majid Mirmehdi, Dima Damen | ViT | 76 147 0 | 15 Jan 2021
Multi-modal Transformer for Video Retrieval | Valentin Gabeur, Chen Sun, Alahari Karteek, Cordelia Schmid | ViT | 504 602 0 | 21 Jul 2020
Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation | Yongqin Xian, Bruno Korbar, Matthijs Douze, Lorenzo Torresani, Bernt Schiele, Zeynep Akata | VGen | 48 18 0 | 09 Jul 2020
Domain-aware Visual Bias Eliminating for Generalized Zero-Shot Learning | Shaobo Min, Hantao Yao, Hongtao Xie, Chaoqun Wang, Zhengjun Zha, Yongdong Zhang | | 26 141 0 | 30 Mar 2020
Audiovisual Transformer Architectures for Large-Scale Classification and Synchronization of Weakly Labeled Audio Events | Wim Boes, Hugo Van hamme | | 30 17 0 | 02 Dec 2019
Self-Supervised Learning by Cross-Modal Audio-Video Clustering | Humam Alwassel, D. Mahajan, Bruno Korbar, Lorenzo Torresani, Guohao Li, Du Tran | SSL | 73 429 0 | 28 Nov 2019
Vision-Infused Deep Audio Inpainting | Hang Zhou, Ziwei Liu, Lingfeng Guo, Ping Luo, Dahua Lin | | 110 88 0 | 24 Oct 2019
Co-Separating Sounds of Visual Objects | Ruohan Gao, Kristen Grauman | | 103 208 0 | 16 Apr 2019
Few-Shot Learning via Embedding Adaptation with Set-to-Set Functions | Han-Jia Ye, Hexiang Hu, De-Chuan Zhan, Fei Sha | | 112 659 0 | 10 Dec 2018
Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization | Bruno Korbar, Du Tran, Lorenzo Torresani | | 78 473 0 | 30 Jun 2018
Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning | Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba | SSL | 63 177 0 | 20 Dec 2017
Objects that Sound | Relja Arandjelović, Andrew Zisserman | ObjD VOS | 74 529 0 | 18 Dec 2017
Learning to Compare: Relation Network for Few-Shot Learning | Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip Torr, Timothy M. Hospedales | | 203 4,035 0 | 16 Nov 2017
CNN Architectures for Large-Scale Audio Classification | Shawn Hershey, Sourish Chaudhuri, D. Ellis, J. Gemmeke, A. Jansen, ..., Rif A. Saurous, Bryan Seybold, M. Slaney, Ron J. Weiss, K. Wilson | | 92 2,488 0 | 29 Sep 2016
Matching Networks for One Shot Learning | Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Koray Kavukcuoglu, Daan Wierstra | VLM | 293 7,299 0 | 13 Jun 2016
An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild | Wei-Lun Chao, Soravit Changpinyo, Boqing Gong, Fei Sha | | 49 563 0 | 13 May 2016
Efficient Estimation of Word Representations in Vector Space | Tomas Mikolov, Kai Chen, G. Corrado, J. Dean | 3DV | 552 31,406 0 | 16 Jan 2013