Connecting the Dots between Audio and Text without Parallel Data through
Visual Knowledge Transfer

Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer

16 December 2021

Yejin Choi

Papers citing "Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer"

13 / 13 papers shown

Title
Audio-Language Datasets of Scenes and Events: A Survey Gijs Wijngaard Elia Formisano Michele Esposito M. Dumontier 81 2 0 10 Jan 2025
Gramian Multimodal Representation Learning and Alignment Giordano Cicchetti Eleonora Grassucci Luigi Sigillo Danilo Comminiello 96 1 0 16 Dec 2024
Acoustic Identification of Ae. aegypti Mosquitoes using Smartphone Apps and Residual Convolutional Neural Networks K. Paim Ricardo Rohweder M. R. Mendoza R. Mansilha Weverton Cordeiro 27 2 0 16 Jun 2023
Harvesting Event Schemas from Large Language Models Jialong Tang Hongyu Lin Zhuoqun Li Yaojie Lu Xianpei Han Le Sun 26 4 0 12 May 2023
Prefix tuning for automated audio captioning Minkyu Kim Kim Sung-Bin Tae-Hyun Oh 21 43 0 30 Mar 2023
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data Xuenan Xu Zhiling Zhang Zelin Zhou Pingyue Zhang Zeyu Xie Mengyue Wu Ke Zhu CLIP 71 14 0 14 Mar 2023
CAT: Causal Audio Transformer for Audio Classification Xiaoyu Liu Hanlin Lu Jianbo Yuan Xinyu Li ViT 28 22 0 14 Mar 2023
Contrastive Audio-Language Learning for Music Ilaria Manco Emmanouil Benetos Elio Quinton Gyorgy Fazekas 27 44 0 25 Aug 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound Yan-Bo Lin Jie Lei Joey Tianyi Zhou Gedas Bertasius 54 39 0 06 Apr 2022
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language Andy Zeng Maria Attarian Brian Ichter K. Choromanski Adrian S. Wong ... Michael S. Ryoo Vikas Sindhwani Johnny Lee Vincent Vanhoucke Peter R. Florence ReLM LRM 47 574 0 01 Apr 2022
Multimodal Self-Supervised Learning of General Audio Representations Luyu Wang Pauline Luc Adrià Recasens Jean-Baptiste Alayrac Aaron van den Oord SSL 78 41 0 26 Apr 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text Hassan Akbari Liangzhe Yuan Rui Qian Wei-Hong Chuang Shih-Fu Chang Huayu Chen Boqing Gong ViT 251 577 0 22 Apr 2021
Scaling Laws for Neural Language Models Jared Kaplan Sam McCandlish T. Henighan Tom B. Brown B. Chess R. Child Scott Gray Alec Radford Jeff Wu Dario Amodei 264 4,505 0 23 Jan 2020