WhisBERT: Multimodal Text-Audio Language Modeling on 100M Words

5 December 2023

Papers citing "WhisBERT: Multimodal Text-Audio Language Modeling on 100M Words"


Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Authors: Alex Warstadt, Aaron Mueller, Leshem Choshen, E. Wilcox, Chengxu Zhuang, ..., Rafael Mosquera, Bhargavi Paranjape, Adina Williams, Tal Linzen, Ryan Cotterell
54 113 0
Date: 10 Apr 2025

Quantifying the redundancy between prosody and text
Authors: Lukas Wolf, Tiago Pimentel, Evelina Fedorenko, Ryan Cotterell, Alex Warstadt, Ethan Gotlieb Wilcox, Tamar I. Regev
46 11 0
Date: 28 Nov 2023

Generative Spoken Language Modeling from Raw Audio
Authors: Kushal Lakhotia, Evgeny Kharitonov, Wei-Ning Hsu, Yossi Adi, Adam Polyak, ..., Tu Nguyen, Jade Copet, Alexei Baevski, A. Mohamed, Emmanuel Dupoux
AuLLM 201 348 0
Date: 01 Feb 2021

pyannote.audio: neural building blocks for speaker diarization
Authors: H. Bredin, Ruiqing Yin, Juan Manuel Coria, G. Gelly, Pavel Korshunov, Marvin Lavechin, D. Fustes, Hadrien Titeux, Wassim Bouaziz, Marie-Philippe Gill
204 318 0
Date: 04 Nov 2019

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Authors: Yonghui Wu, M. Schuster, Zhiwen Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean
AIMat 723 6,756 0
Date: 26 Sep 2016