Transfer Learning from Audio-Visual Grounding to Speech Recognition


9 July 2019
Wei-Ning Hsu
David Harwath
James R. Glass
    SSL

Papers citing "Transfer Learning from Audio-Visual Grounding to Speech Recognition"

29 / 29 papers shown
An Unsupervised Autoregressive Model for Speech Representation Learning
Yu-An Chung
Wei-Ning Hsu
Hao Tang
James R. Glass
SSL
78
408
0
05 Apr 2019
Towards Visually Grounded Sub-Word Speech Unit Discovery
David Harwath
James R. Glass
42
35
0
21 Feb 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
94,891
0
11 Oct 2018
Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition
Wei-Ning Hsu
Hao Tang
James R. Glass
SSL
48
22
0
13 Jun 2018
Training Augmentation with Adversarial Examples for Robust Speech Recognition
Sining Sun
Ching-Feng Yeh
Mari Ostendorf
M. Hwang
Lei Xie
AAML
55
63
0
07 Jun 2018
Scalable Factorized Hierarchical Variational Autoencoder Training
Wei-Ning Hsu
James R. Glass
BDL
51
24
0
09 Apr 2018
Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input
David Harwath
Adrià Recasens
Dídac Surís
Galen Chuang
Antonio Torralba
James R. Glass
72
201
0
04 Apr 2018
Unsupervised Textual Grounding: Linking Words to Image Concepts
Raymond A. Yeh
Minh Do
Alex Schwing
42
40
0
29 Mar 2018
A Multi-Discriminator CycleGAN for Unsupervised Non-Parallel Speech Domain Adaptation
Ehsan Hosseini-Asl
Yingbo Zhou
Caiming Xiong
R. Socher
34
56
0
27 Mar 2018
Extracting Domain Invariant Features by Unsupervised Learning for Robust Automatic Speech Recognition
Wei-Ning Hsu
James R. Glass
46
43
0
07 Mar 2018
Deep contextualized word representations
Matthew E. Peters
Mark Neumann
Mohit Iyyer
Matt Gardner
Christopher Clark
Kenton Lee
Luke Zettlemoyer
NAI
214
11,556
0
15 Feb 2018
State-of-the-art Speech Recognition With Sequence-to-Sequence Models
Chung-Cheng Chiu
Tara N. Sainath
Yonghui Wu
Rohit Prabhavalkar
Patrick Nguyen
...
Katya Gonina
Navdeep Jaitly
Yue Liu
J. Chorowski
M. Bacchiani
AI4TS
89
1,153
0
05 Dec 2017
Unsupervised Learning of Semantic Audio Representations
A. Jansen
Manoj Plakal
R. Pandya
D. Ellis
Shawn Hershey
Jiayang Liu
R. C. Moore
Rif A. Saurous
SSL
86
131
0
06 Nov 2017
Semantic speech retrieval with a visually grounded model of untranscribed speech
Herman Kamper
Gregory Shakhnarovich
Karen Livescu
62
53
0
05 Oct 2017
Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data
Wei-Ning Hsu
Yu Zhang
James R. Glass
BDL
SSL
78
353
0
22 Sep 2017
Unsupervised Domain Adaptation for Robust Speech Recognition via Variational Autoencoder-Based Data Augmentation
Wei-Ning Hsu
Yu Zhang
James R. Glass
61
129
0
19 Jul 2017
Encoding of phonology in a recurrent neural model of grounded speech
Afra Alishahi
Marie Barking
Grzegorz Chrupała
50
58
0
12 Jun 2017
Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks
Tanmay Gupta
Kevin J. Shih
Saurabh Singh
Derek Hoiem
71
26
0
02 Apr 2017
Representations of language in a model of visually grounded speech signal
Grzegorz Chrupała
Lieke Gelderloos
Afra Alishahi
73
131
0
07 Feb 2017
Learning Word-Like Units from Joint Audio-Visual Analysis
David Harwath
James R. Glass
68
106
0
25 Jan 2017
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
2.2K
194,020
0
10 Dec 2015
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
Dario Amodei
Rishita Anubhai
Eric Battenberg
Carl Case
Jared Casper
...
Chong Wang
Bo Xiao
Dani Yogatama
J. Zhan
Zhenyao Zhu
129
2,973
0
08 Dec 2015
Highway Long Short-Term Memory RNNs for Distant Speech Recognition
Yu Zhang
Guoguo Chen
Dong Yu
Kaisheng Yao
Sanjeev Khudanpur
James R. Glass
3DV
AI4TS
66
291
0
30 Oct 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMat
ObjD
514
62,294
0
04 Jun 2015
Object Detectors Emerge in Deep Scene CNNs
Bolei Zhou
A. Khosla
Àgata Lapedriza
A. Oliva
Antonio Torralba
ObjD
145
1,283
0
22 Dec 2014
Deep metric learning using Triplet network
Elad Hoffer
Nir Ailon
SSL
DML
192
1,998
0
20 Dec 2014
Two-Stream Convolutional Networks for Action Recognition in Videos
Karen Simonyan
Andrew Zisserman
244
7,535
0
09 Jun 2014
CNN Features off-the-shelf: an Astounding Baseline for Recognition
A. Razavian
Hossein Azizpour
Josephine Sullivan
S. Carlsson
157
4,940
0
23 Mar 2014
Distributed Representations of Words and Phrases and their Compositionality
Tomas Mikolov
Ilya Sutskever
Kai Chen
G. Corrado
J. Dean
NAI
OCL
394
33,529
0
16 Oct 2013