Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1911.12385
Cited By
v1
v2 (latest)
DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling
27 November 2019
Sachin Mehta
Rik Koncel-Kedziorski
Mohammad Rastegari
Hannaneh Hajishirzi
AI4TS
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling"
38 / 38 papers shown
Title
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
VLM
257
3,745
0
09 Jan 2019
Online Embedding Compression for Text Classification using Low Rank Matrix Factorization
Anish Acharya
Rahul Goel
A. Metallinou
Inderjit Dhillon
73
62
0
01 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
95,175
0
11 Oct 2018
Adaptive Input Representations for Neural Language Modeling
Alexei Baevski
Michael Auli
106
390
0
28 Sep 2018
Pyramidal Recurrent Unit for Language Modeling
Sachin Mehta
Rik Koncel-Kedziorski
Mohammad Rastegari
Hannaneh Hajishirzi
63
10
0
27 Aug 2018
GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking
Patrick H. Chen
Si Si
Yang Li
Ciprian Chelba
Cho-Jui Hsieh
59
70
0
18 Jun 2018
An Analysis of Neural Language Modeling at Multiple Scales
Stephen Merity
N. Keskar
R. Socher
59
171
0
22 Mar 2018
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
Shaojie Bai
J. Zico Kolter
V. Koltun
DRL
97
4,845
0
04 Mar 2018
Deep contextualized word representations
Matthew E. Peters
Mark Neumann
Mohit Iyyer
Matt Gardner
Christopher Clark
Kenton Lee
Luke Zettlemoyer
NAI
233
11,566
0
15 Feb 2018
Slim Embedding Layers for Recurrent Neural Language Models
Zhongliang Li
Raymond Kulhanek
Shaojun Wang
Yunxin Zhao
Shuang Wu
KELM
57
23
0
27 Nov 2017
Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
Zhilin Yang
Zihang Dai
Ruslan Salakhutdinov
William W. Cohen
BDL
71
372
0
10 Nov 2017
Compressing Word Embeddings via Deep Compositional Code Learning
Raphael Shu
Hideki Nakayama
79
129
0
03 Nov 2017
Simple Recurrent Units for Highly Parallelizable Recurrence
Tao Lei
Yu Zhang
Sida I. Wang
Huijing Dai
Yoav Artzi
LRM
115
276
0
08 Sep 2017
Regularizing and Optimizing LSTM Language Models
Stephen Merity
N. Keskar
R. Socher
169
1,096
0
07 Aug 2017
Learned in Translation: Contextualized Word Vectors
Bryan McCann
James Bradbury
Caiming Xiong
R. Socher
121
909
0
01 Aug 2017
On the State of the Art of Evaluation in Neural Language Models
Gábor Melis
Chris Dyer
Phil Blunsom
68
536
0
18 Jul 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
786
132,363
0
12 Jun 2017
Factorization tricks for LSTM networks
Oleksii Kuchaiev
Boris Ginsburg
63
113
0
31 Mar 2017
Massive Exploration of Neural Machine Translation Architectures
D. Britz
Anna Goldie
Minh-Thang Luong
Quoc V. Le
63
519
0
11 Mar 2017
OpenNMT: Open-Source Toolkit for Neural Machine Translation
Guillaume Klein
Yoon Kim
Yuntian Deng
Jean Senellart
Alexander M. Rush
330
1,900
0
10 Jan 2017
Language Modeling with Gated Convolutional Networks
Yann N. Dauphin
Angela Fan
Michael Auli
David Grangier
242
2,404
0
23 Dec 2016
Improving Neural Language Models with a Continuous Cache
Edouard Grave
Armand Joulin
Nicolas Usunier
KELM
60
301
0
13 Dec 2016
Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling
Hakan Inan
Khashayar Khosravi
R. Socher
128
385
0
04 Nov 2016
Why Deep Neural Networks for Function Approximation?
Shiyu Liang
R. Srikant
135
385
0
13 Oct 2016
Compressing Neural Language Models by Sparse Word Representations
Yunchuan Chen
Lili Mou
Yan Xu
Ge Li
Zhi Jin
MoE
35
30
0
13 Oct 2016
Pointer Sentinel Mixture Models
Stephen Merity
Caiming Xiong
James Bradbury
R. Socher
RALM
338
2,898
0
26 Sep 2016
Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
Itay Hubara
Matthieu Courbariaux
Daniel Soudry
Ran El-Yaniv
Yoshua Bengio
MQ
155
1,868
0
22 Sep 2016
Efficient softmax approximation for GPUs
Edouard Grave
Armand Joulin
Moustapha Cissé
David Grangier
Hervé Jégou
95
272
0
14 Sep 2016
Using the Output Embedding to Improve Language Models
Ofir Press
Lior Wolf
92
736
0
20 Aug 2016
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
Mohammad Rastegari
Vicente Ordonez
Joseph Redmon
Ali Farhadi
MQ
175
4,369
0
16 Mar 2016
Exploring the Limits of Language Modeling
Rafal Jozefowicz
Oriol Vinyals
M. Schuster
Noam M. Shazeer
Yonghui Wu
201
1,145
0
07 Feb 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
MedIm
2.2K
194,426
0
10 Dec 2015
Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich
Barry Haddow
Alexandra Birch
228
7,757
0
31 Aug 2015
Character-Aware Neural Language Models
Yoon Kim
Yacine Jernite
David Sontag
Alexander M. Rush
107
1,669
0
26 Aug 2015
Effective Approaches to Attention-based Neural Machine Translation
Thang Luong
Hieu H. Pham
Christopher D. Manning
413
7,969
0
17 Aug 2015
Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
AIMat
578
27,327
0
01 Sep 2014
One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling
Ciprian Chelba
Tomas Mikolov
M. Schuster
Qi Ge
T. Brants
P. Koehn
T. Robinson
190
1,109
0
11 Dec 2013
Distributed Representations of Words and Phrases and their Compositionality
Tomas Mikolov
Ilya Sutskever
Kai Chen
G. Corrado
J. Dean
NAI
OCL
402
33,560
0
16 Oct 2013
1