ResearchTrend.AI
Learning Word-Like Units from Joint Audio-Visual Analysis

25 January 2017
David Harwath
James R. Glass
arXiv: 1701.07481

Papers citing "Learning Word-Like Units from Joint Audio-Visual Analysis"

20 / 20 papers shown
RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection
Liting Huang
Zhihao Zhang
Yiran Zhang
Xiyue Zhou
Shoujin Wang
NoLa
46
2
0
07 Jun 2024
Simultaneous or Sequential Training? How Speech Representations Cooperate in a Multi-Task Self-Supervised Learning System
Khazar Khorrami
María Andrea Cruz Blandón
Tuomas Virtanen
Okko Rasanen
SSL
27
1
0
05 Jun 2023
Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples
H. Ryu
Arda Senocak
In So Kweon
Joon Son Chung
VLM
26
8
0
30 Mar 2023
Towards visually prompted keyword localisation for zero-resource spoken languages
Leanne Nortje
Herman Kamper
29
6
0
12 Oct 2022
TVLT: Textless Vision-Language Transformer
Zineng Tang
Jaemin Cho
Yixin Nie
Joey Tianyi Zhou
VLM
51
28
0
28 Sep 2022
Keyword localisation in untranscribed speech using visually grounded speech models
Kayode Olaleye
Dan Oneaţă
Herman Kamper
32
7
0
02 Feb 2022
Unsupervised Multimodal Word Discovery based on Double Articulation Analysis with Co-occurrence cues
Akira Taniguchi
Hiroaki Murakami
Ryo Ozaki
T. Taniguchi
21
2
0
18 Jan 2022
Multimodal Image Synthesis and Editing: The Generative AI Era
Fangneng Zhan
Yingchen Yu
Rongliang Wu
Jiahui Zhang
Shijian Lu
Lingjie Liu
Adam Kortylewski
Christian Theobalt
Eric Xing
EGVM
29
48
0
27 Dec 2021
Cascaded Multilingual Audio-Visual Learning from Videos
Andrew Rouditchenko
Angie Boggust
David Harwath
Samuel Thomas
Hilde Kuehne
...
Yikang Shen
Rogerio Feris
Brian Kingsbury
M. Picheny
James R. Glass
104
8
0
08 Nov 2021
What do End-to-End Speech Models Learn about Speaker, Language and Channel Information? A Layer-wise and Neuron-level Analysis
Shammur A. Chowdhury
Nadir Durrani
Ahmed M. Ali
41
12
0
01 Jul 2021
What all do audio transformer models hear? Probing Acoustic Representations for Language Delivery and its Structure
Jui Shah
Yaman Kumar Singla
Changyou Chen
R. Shah
25
81
0
02 Jan 2021
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko
Angie Boggust
David Harwath
Brian Chen
D. Joshi
...
Rogerio Feris
Brian Kingsbury
M. Picheny
Antonio Torralba
James R. Glass
SSL
22
141
0
16 Jun 2020
Direct Speech-to-image Translation
Jiguo Li
Xinfeng Zhang
Chuanmin Jia
Jizheng Xu
Li Zhang
Y. Wang
Siwei Ma
Wen Gao
36
29
0
07 Apr 2020
Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition
Yonatan Belinkov
Ahmed M. Ali
James R. Glass
28
32
0
09 Jul 2019
Multimodal Language Analysis with Recurrent Multistage Fusion
Paul Pu Liang
Liu Ziyin
Amir Zadeh
Louis-Philippe Morency
30
198
0
12 Aug 2018
Semantic speech retrieval with a visually grounded model of untranscribed speech
Herman Kamper
Gregory Shakhnarovich
Karen Livescu
29
53
0
05 Oct 2017
Learning Latent Representations for Speech Generation and Transformation
Wei-Ning Hsu
Yu Zhang
James R. Glass
DRL
BDL
SSL
20
145
0
13 Apr 2017
Visually grounded learning of keyword prediction from untranscribed speech
Herman Kamper
Shane Settle
Gregory Shakhnarovich
Karen Livescu
19
63
0
23 Mar 2017
Representations of language in a model of visually grounded speech signal
Grzegorz Chrupała
Lieke Gelderloos
A. Alishahi
41
131
0
07 Feb 2017
Multi-view Recurrent Neural Acoustic Word Embeddings
Wanjia He
Weiran Wang
Karen Livescu
21
84
0
14 Nov 2016