ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Deep Multimodal Semantic Embeddings for Speech and Images

11 November 2015
David Harwath
James R. Glass
arXiv: 1511.03690

Papers citing "Deep Multimodal Semantic Embeddings for Speech and Images"

23 papers
Measuring Sound Symbolism in Audio-visual Models
Wei-Cheng Tseng
Yi-Jen Shih
David Harwath
Raymond Mooney
37
0
0
18 Sep 2024
RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection
Liting Huang
Zhihao Zhang
Yiran Zhang
Xiyue Zhou
Shoujin Wang
NoLa
46
2
0
07 Jun 2024
Cross-Modal Coordination Across a Diverse Set of Input Modalities
Jorge Sánchez
Rodrigo Laguna
VLM
44
0
0
29 Jan 2024
Leveraging multilingual transfer for unsupervised semantic acoustic word embeddings
C. Jacobs
Herman Kamper
32
1
0
05 Jul 2023
Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples
H. Ryu
Arda Senocak
In So Kweon
Joon Son Chung
VLM
26
8
0
30 Mar 2023
Towards visually prompted keyword localisation for zero-resource spoken languages
Leanne Nortje
Herman Kamper
29
6
0
12 Oct 2022
Self-Supervised Speech Representation Learning: A Review
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
137
350
0
21 May 2022
Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations
Dan Oneaţă
H. Cucu
19
19
0
27 Apr 2022
WAVPROMPT: Towards Few-Shot Spoken Language Understanding with Frozen Language Models
Heting Gao
Junrui Ni
Kaizhi Qian
Yang Zhang
Shiyu Chang
M. Hasegawa-Johnson
VLM
14
31
0
29 Mar 2022
Word Discovery in Visually Grounded, Self-Supervised Speech Models
Puyuan Peng
David Harwath
SSL
20
39
0
28 Mar 2022
Adversarial Attacks on Speech Recognition Systems for Mission-Critical Applications: A Survey
Ngoc Dung Huynh
Mohamed Reda Bouadjenek
Imran Razzak
Kevin Lee
Chetan Arora
Ali Hassani
A. Zaslavsky
AAML
29
6
0
22 Feb 2022
Keyword localisation in untranscribed speech using visually grounded speech models
Kayode Olaleye
Dan Oneaţă
Herman Kamper
32
7
0
02 Feb 2022
Timbre Transfer with Variational Auto Encoding and Cycle-Consistent Adversarial Networks
Russell Sammut Bonnici
C. Saitis
Martin Benning
GAN
36
15
0
05 Sep 2021
Can You Hear It? Backdoor Attacks via Ultrasonic Triggers
Stefanos Koffas
Jing Xu
Mauro Conti
S. Picek
AAML
22
66
0
30 Jul 2021
Unsupervised Automatic Speech Recognition: A Review
Hanan Aldarmaki
Asad Ullah
Nazar Zaki
VLM
SSL
39
56
0
09 Jun 2021
Fine-Grained Grounding for Multimodal Speech Recognition
Tejas Srinivasan
Ramon Sanabria
Florian Metze
Desmond Elliott
23
11
0
05 Oct 2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko
Angie Boggust
David Harwath
Brian Chen
D. Joshi
...
Rogerio Feris
Brian Kingsbury
M. Picheny
Antonio Torralba
James R. Glass
SSL
22
141
0
16 Jun 2020
Direct Speech-to-image Translation
Jiguo Li
Xinfeng Zhang
Chuanmin Jia
Jizheng Xu
Li Zhang
Y. Wang
Siwei Ma
Wen Gao
36
29
0
07 Apr 2020
Captioning Images Taken by People Who Are Blind
Danna Gurari
Yinan Zhao
Meng Zhang
Nilavra Bhattacharya
22
181
0
20 Feb 2020
Semantic speech retrieval with a visually grounded model of untranscribed speech
Herman Kamper
Gregory Shakhnarovich
Karen Livescu
29
53
0
05 Oct 2017
Visually grounded learning of keyword prediction from untranscribed speech
Herman Kamper
Shane Settle
Gregory Shakhnarovich
Karen Livescu
19
63
0
23 Mar 2017
Representations of language in a model of visually grounded speech signal
Grzegorz Chrupała
Lieke Gelderloos
A. Alishahi
41
131
0
07 Feb 2017
Multi-view Recurrent Neural Acoustic Word Embeddings
Wanjia He
Weiran Wang
Karen Livescu
18
84
0
14 Nov 2016