SpeechCLIP+: Self-supervised multi-task representation learning for
speech via CLIP and speech-image data

10 February 2024

Hung-yi Lee

Papers citing "SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data"

Title | Authors | Tags | Counts | Date
Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2 | Chun Xu, En-Wei Sun | | 33 0 0 | 19 Jul 2024
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model | Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-yi Lee, David Harwath | VLM, CLIP | 46 32 0 | 03 Oct 2022
Self-Supervised Speech Representation Learning: A Review | Abdel-rahman Mohamed, Hung-yi Lee, Lasse Borgholt, Jakob Drachmann Havtorn, Joakim Edin, ..., Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe | SSL, AI4TS | 128 349 0 | 21 May 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | Junnan Li, Dongxu Li, Caiming Xiong, S. Hoi | MLLM, BDL, VLM, CLIP | 392 4,137 0 | 28 Jan 2022