2203.15081
Word Discovery in Visually Grounded, Self-Supervised Speech Models
28 March 2022
Puyuan Peng
David Harwath
SSL
Github (26★)
Papers citing "Word Discovery in Visually Grounded, Self-Supervised Speech Models"
37 / 37 papers shown
Title
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation
Ziqiao Ma
Jing Ding
Xuejun Zhang
Dezhi Luo
Jiahe Ding
Sihan Xu
Yuchen Huang
Run Peng
Joyce Chai
216
0
0
22 Apr 2025
Towards Unsupervised Speech Recognition Without Pronunciation Models
Junrui Ni
Liming Wang
Yang Zhang
Kaizhi Qian
Heting Gao
Mark Hasegawa-Johnson
Chang D. Yoo
SSL
OffRL
146
0
0
10 Jan 2025
Unsupervised Word Discovery: Boundary Detection with Clustering vs. Dynamic Programming
Simon Malan
Benjamin van Niekerk
Herman Kamper
105
0
0
22 Sep 2024
SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT
Cheol Jun Cho
Abdelrahman Mohamed
Shang-Wen Li
Alan W. Black
Gopala K. Anumanchipalli
103
9
0
16 Oct 2023
Learning English with Peppa Pig
Mitja Nikolaus
Afra Alishahi
Grzegorz Chrupała
60
14
0
25 Feb 2022
Word Segmentation on Discovered Phone Units with Dynamic Programming and Self-Supervised Scoring
Herman Kamper
91
26
0
24 Feb 2022
Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling
Puyuan Peng
David Harwath
SSL
87
26
0
07 Feb 2022
Can phones, syllables, and words emerge as side-products of cross-situational audiovisual learning? -- A computational investigation
Khazar Khorrami
Okko Räsänen
77
20
0
29 Sep 2021
Fast-Slow Transformer for Visually Grounding Speech
Puyuan Peng
David Harwath
139
30
0
16 Sep 2021
Layer-wise Analysis of a Self-supervised Speech Representation Model
Ankita Pasad
Ju-Chieh Chou
Karen Livescu
SSL
88
308
0
10 Jul 2021
Attention-Based Keyword Localisation in Speech using Visual Grounding
Kayode Olaleye
Herman Kamper
61
13
0
16 Jun 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
188
3,004
0
14 Jun 2021
Cross-Modal Discrete Representation Learning
Alexander H. Liu
SouYoung Jin
Cheng-I Jeff Lai
Andrew Rouditchenko
A. Oliva
James R. Glass
SSL
73
41
0
10 Jun 2021
Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation
Saurabhchand Bhati
Jesús Villalba
Piotr Żelasko
Laureano Moro-Velazquez
Najim Dehak
SSL
78
37
0
03 Jun 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
743
6,139
0
29 Apr 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units
Wei-Ning Hsu
David Harwath
Christopher Song
James R. Glass
CLIP
85
67
0
31 Dec 2020
Towards unsupervised phone and word segmentation using self-supervised vector-quantized neural networks
Herman Kamper
Benjamin van Niekerk
SSL
MQ
86
36
0
14 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
684
41,563
0
22 Oct 2020
Unsupervised Discovery of Recurring Speech Patterns Using Probabilistic Adaptive Metrics
Okko Räsänen
María Andrea Cruz Blandón
73
25
0
03 Aug 2020
Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery
Saurabhchand Bhati
Jesús Villalba
Piotr Żelasko
Najim Dehak
SSL
79
16
0
26 Jul 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
303
5,853
0
20 Jun 2020
Learning to Recognise Words using Visually Grounded Speech
Sebastiaan Scholten
Danny Merkx
O. Scharenborg
57
13
0
31 May 2020
Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech
David Harwath
Wei-Ning Hsu
James R. Glass
86
85
0
21 Nov 2019
Large-scale representation learning from visually grounded untranscribed speech
Gabriel Ilharco
Yuan Zhang
Jason Baldridge
SSL
79
61
0
19 Sep 2019
Word Recognition, Competition, and Activation in a Model of Visually Grounded Speech
William N. Havard
Jean-Pierre Chevrot
Laurent Besacier
60
21
0
18 Sep 2019
Transfer Learning from Audio-Visual Grounding to Speech Recognition
Wei-Ning Hsu
David Harwath
James R. Glass
SSL
59
32
0
09 Jul 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
95,324
0
11 Oct 2018
Representation Learning with Contrastive Predictive Coding
Aaron van den Oord
Yazhe Li
Oriol Vinyals
DRL
SSL
356
10,369
0
10 Jul 2018
Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input
David Harwath
Adrià Recasens
Dídac Surís
Galen Chuang
Antonio Torralba
James R. Glass
104
201
0
04 Apr 2018
The Zero Resource Speech Challenge 2017
Maarten Versteegh
Xuan-Nga Cao
Roland Thiollière
Thomas Schatz
Mathieu Bernard
A. Jansen
Xavier Anguera Miró
Emmanuel Dupoux
81
204
0
12 Dec 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
819
132,725
0
12 Jun 2017
An embedded segmental K-means model for unsupervised segmentation and clustering of speech
Herman Kamper
Karen Livescu
Sharon Goldwater
60
98
0
23 Mar 2017
Cognitive Science in the era of Artificial Intelligence: A roadmap for reverse-engineering the infant language-learner
Emmanuel Dupoux
77
158
0
29 Jul 2016
A segmental framework for fully-unsupervised large-vocabulary speech recognition
Herman Kamper
A. Jansen
Sharon Goldwater
84
104
0
22 Jun 2016
Deep Multimodal Semantic Embeddings for Speech and Images
David Harwath
James R. Glass
73
157
0
11 Nov 2015
Deep Visual-Semantic Alignments for Generating Image Descriptions
A. Karpathy
Li Fei-Fei
154
5,595
0
07 Dec 2014
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
442
43,875
0
01 May 2014