Transfer Learning from Audio-Visual Grounding to Speech Recognition

9 July 2019

Papers citing "Transfer Learning from Audio-Visual Grounding to Speech Recognition"

An Unsupervised Autoregressive Model for Speech Representation Learning Yu-An Chung Wei-Ning Hsu Hao Tang James R. Glass SSL 78 408 0 05 Apr 2019
Towards Visually Grounded Sub-Word Speech Unit Discovery David Harwath James R. Glass 42 35 0 21 Feb 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova VLM SSL SSeg 1.8K 94,891 0 11 Oct 2018
Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition Wei-Ning Hsu Hao Tang James R. Glass SSL 48 22 0 13 Jun 2018
Training Augmentation with Adversarial Examples for Robust Speech Recognition Sining Sun Ching-Feng Yeh Mari Ostendorf M. Hwang Lei Xie AAML 55 63 0 07 Jun 2018
Scalable Factorized Hierarchical Variational Autoencoder Training Wei-Ning Hsu James R. Glass BDL 51 24 0 09 Apr 2018
Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input David Harwath Adrià Recasens Dídac Surís Galen Chuang Antonio Torralba James R. Glass 72 201 0 04 Apr 2018
Unsupervised Textual Grounding: Linking Words to Image Concepts Raymond A. Yeh Minh Do Alex Schwing 42 40 0 29 Mar 2018
A Multi-Discriminator CycleGAN for Unsupervised Non-Parallel Speech Domain Adaptation Ehsan Hosseini-Asl Yingbo Zhou Caiming Xiong R. Socher 34 56 0 27 Mar 2018
Extracting Domain Invariant Features by Unsupervised Learning for Robust Automatic Speech Recognition Wei-Ning Hsu James R. Glass 46 43 0 07 Mar 2018
Deep contextualized word representations Matthew E. Peters Mark Neumann Mohit Iyyer Matt Gardner Christopher Clark Kenton Lee Luke Zettlemoyer NAI 214 11,556 0 15 Feb 2018
State-of-the-art Speech Recognition With Sequence-to-Sequence Models Chung-Cheng Chiu Tara N. Sainath Yonghui Wu Rohit Prabhavalkar Patrick Nguyen ... Katya Gonina Navdeep Jaitly Bo Li J. Chorowski M. Bacchiani AI4TS 89 1,153 0 05 Dec 2017
Unsupervised Learning of Semantic Audio Representations A. Jansen Manoj Plakal R. Pandya D. Ellis Shawn Hershey Jiayi Liu R. C. Moore Rif A. Saurous SSL 86 131 0 06 Nov 2017
Semantic speech retrieval with a visually grounded model of untranscribed speech Herman Kamper Gregory Shakhnarovich Karen Livescu 62 53 0 05 Oct 2017
Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data Wei-Ning Hsu Yu Zhang James R. Glass BDL SSL 78 353 0 22 Sep 2017
Unsupervised Domain Adaptation for Robust Speech Recognition via Variational Autoencoder-Based Data Augmentation Wei-Ning Hsu Yu Zhang James R. Glass 61 129 0 19 Jul 2017
Encoding of phonology in a recurrent neural model of grounded speech Afra Alishahi Marie Barking Grzegorz Chrupała 50 58 0 12 Jun 2017
Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks Tanmay Gupta Kevin J. Shih Saurabh Singh Derek Hoiem 71 26 0 02 Apr 2017
Representations of language in a model of visually grounded speech signal Grzegorz Chrupała Lieke Gelderloos Afra Alishahi 73 131 0 07 Feb 2017
Learning Word-Like Units from Joint Audio-Visual Analysis David Harwath James R. Glass 68 106 0 25 Jan 2017
Deep Residual Learning for Image Recognition Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun MedIm 2.2K 194,020 0 10 Dec 2015
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin Dario Amodei Rishita Anubhai Eric Battenberg Carl Case Jared Casper ... Chong Wang Bo Xiao Dani Yogatama J. Zhan Zhenyao Zhu 129 2,973 0 08 Dec 2015
Highway Long Short-Term Memory RNNs for Distant Speech Recognition Yu Zhang Guoguo Chen Dong Yu Kaisheng Yao Sanjeev Khudanpur James R. Glass 3DV AI4TS 66 291 0 30 Oct 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren Kaiming He Ross B. Girshick Jian Sun AIMat ObjD 514 62,294 0 04 Jun 2015
Object Detectors Emerge in Deep Scene CNNs Bolei Zhou A. Khosla Àgata Lapedriza A. Oliva Antonio Torralba ObjD 145 1,283 0 22 Dec 2014
Deep metric learning using Triplet network Elad Hoffer Nir Ailon SSL DML 192 1,998 0 20 Dec 2014
Two-Stream Convolutional Networks for Action Recognition in Videos Karen Simonyan Andrew Zisserman 244 7,535 0 09 Jun 2014
CNN Features off-the-shelf: an Astounding Baseline for Recognition A. Razavian Hossein Azizpour Josephine Sullivan S. Carlsson 157 4,940 0 23 Mar 2014
Distributed Representations of Words and Phrases and their Compositionality Tomas Mikolov Ilya Sutskever Kai Chen G. Corrado J. Dean NAI OCL 394 33,529 0 16 Oct 2013