
DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling

27 November 2019

Sachin Mehta

Rik Koncel-Kedziorski

Papers cited by "DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling"


Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context Zihang Dai Zhilin Yang Yiming Yang J. Carbonell Quoc V. Le Ruslan Salakhutdinov VLM 257 3,745 0 09 Jan 2019
Online Embedding Compression for Text Classification using Low Rank Matrix Factorization Anish Acharya Rahul Goel A. Metallinou Inderjit Dhillon 73 62 0 01 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova VLM SSL SSeg 1.8K 95,175 0 11 Oct 2018
Adaptive Input Representations for Neural Language Modeling Alexei Baevski Michael Auli 106 390 0 28 Sep 2018
Pyramidal Recurrent Unit for Language Modeling Sachin Mehta Rik Koncel-Kedziorski Mohammad Rastegari Hannaneh Hajishirzi 63 10 0 27 Aug 2018
GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking Patrick H. Chen Si Si Yang Li Ciprian Chelba Cho-Jui Hsieh 59 70 0 18 Jun 2018
An Analysis of Neural Language Modeling at Multiple Scales Stephen Merity N. Keskar R. Socher 59 171 0 22 Mar 2018
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling Shaojie Bai J. Zico Kolter V. Koltun DRL 97 4,845 0 04 Mar 2018
Deep contextualized word representations Matthew E. Peters Mark Neumann Mohit Iyyer Matt Gardner Christopher Clark Kenton Lee Luke Zettlemoyer NAI 233 11,566 0 15 Feb 2018
Slim Embedding Layers for Recurrent Neural Language Models Zhongliang Li Raymond Kulhanek Shaojun Wang Yunxin Zhao Shuang Wu KELM 57 23 0 27 Nov 2017
Breaking the Softmax Bottleneck: A High-Rank RNN Language Model Zhilin Yang Zihang Dai Ruslan Salakhutdinov William W. Cohen BDL 71 372 0 10 Nov 2017
Compressing Word Embeddings via Deep Compositional Code Learning Raphael Shu Hideki Nakayama 79 129 0 03 Nov 2017
Simple Recurrent Units for Highly Parallelizable Recurrence Tao Lei Yu Zhang Sida I. Wang Hui Dai Yoav Artzi LRM 115 276 0 08 Sep 2017
Regularizing and Optimizing LSTM Language Models Stephen Merity N. Keskar R. Socher 169 1,096 0 07 Aug 2017
Learned in Translation: Contextualized Word Vectors Bryan McCann James Bradbury Caiming Xiong R. Socher 121 909 0 01 Aug 2017
On the State of the Art of Evaluation in Neural Language Models Gábor Melis Chris Dyer Phil Blunsom 68 536 0 18 Jul 2017
Attention Is All You Need Ashish Vaswani Noam M. Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan Gomez Lukasz Kaiser Illia Polosukhin 3DV 786 132,363 0 12 Jun 2017
Factorization tricks for LSTM networks Oleksii Kuchaiev Boris Ginsburg 63 113 0 31 Mar 2017
Massive Exploration of Neural Machine Translation Architectures D. Britz Anna Goldie Minh-Thang Luong Quoc V. Le 63 519 0 11 Mar 2017
OpenNMT: Open-Source Toolkit for Neural Machine Translation Guillaume Klein Yoon Kim Yuntian Deng Jean Senellart Alexander M. Rush 330 1,900 0 10 Jan 2017
Language Modeling with Gated Convolutional Networks Yann N. Dauphin Angela Fan Michael Auli David Grangier 242 2,404 0 23 Dec 2016
Improving Neural Language Models with a Continuous Cache Edouard Grave Armand Joulin Nicolas Usunier KELM 60 301 0 13 Dec 2016
Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling Hakan Inan Khashayar Khosravi R. Socher 128 385 0 04 Nov 2016
Why Deep Neural Networks for Function Approximation? Shiyu Liang R. Srikant 135 385 0 13 Oct 2016
Compressing Neural Language Models by Sparse Word Representations Yunchuan Chen Lili Mou Yan Xu Ge Li Zhi Jin MoE 35 30 0 13 Oct 2016
Pointer Sentinel Mixture Models Stephen Merity Caiming Xiong James Bradbury R. Socher RALM 338 2,898 0 26 Sep 2016
Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations Itay Hubara Matthieu Courbariaux Daniel Soudry Ran El-Yaniv Yoshua Bengio MQ 155 1,868 0 22 Sep 2016
Efficient softmax approximation for GPUs Edouard Grave Armand Joulin Moustapha Cissé David Grangier Hervé Jégou 95 272 0 14 Sep 2016
Using the Output Embedding to Improve Language Models Ofir Press Lior Wolf 92 736 0 20 Aug 2016
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks Mohammad Rastegari Vicente Ordonez Joseph Redmon Ali Farhadi MQ 175 4,369 0 16 Mar 2016
Exploring the Limits of Language Modeling Rafal Jozefowicz Oriol Vinyals M. Schuster Noam M. Shazeer Yonghui Wu 201 1,145 0 07 Feb 2016
Deep Residual Learning for Image Recognition Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun MedIm 2.2K 194,426 0 10 Dec 2015
Neural Machine Translation of Rare Words with Subword Units Rico Sennrich Barry Haddow Alexandra Birch 228 7,757 0 31 Aug 2015
Character-Aware Neural Language Models Yoon Kim Yacine Jernite David Sontag Alexander M. Rush 107 1,669 0 26 Aug 2015
Effective Approaches to Attention-based Neural Machine Translation Thang Luong Hieu H. Pham Christopher D. Manning 413 7,969 0 17 Aug 2015
Neural Machine Translation by Jointly Learning to Align and Translate Dzmitry Bahdanau Kyunghyun Cho Yoshua Bengio AIMat 578 27,327 0 01 Sep 2014
One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling Ciprian Chelba Tomas Mikolov M. Schuster Qi Ge T. Brants P. Koehn T. Robinson 190 1,109 0 11 Dec 2013
Distributed Representations of Words and Phrases and their Compositionality Tomas Mikolov Ilya Sutskever Kai Chen G. Corrado J. Dean NAI OCL 402 33,560 0 16 Oct 2013