DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling

27 November 2019
Sachin Mehta
Rik Koncel-Kedziorski
Mohammad Rastegari
Hannaneh Hajishirzi
    AI4TS

Papers cited by "DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling"

38 / 38 papers shown
Title
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
VLM
257
3,745
0
09 Jan 2019
Online Embedding Compression for Text Classification using Low Rank Matrix Factorization
Anish Acharya
Rahul Goel
A. Metallinou
Inderjit Dhillon
73
62
0
01 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM SSL SSeg
1.8K
95,175
0
11 Oct 2018
Adaptive Input Representations for Neural Language Modeling
Alexei Baevski
Michael Auli
106
390
0
28 Sep 2018
Pyramidal Recurrent Unit for Language Modeling
Sachin Mehta
Rik Koncel-Kedziorski
Mohammad Rastegari
Hannaneh Hajishirzi
63
10
0
27 Aug 2018
GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking
Patrick H. Chen
Si Si
Yang Li
Ciprian Chelba
Cho-Jui Hsieh
59
70
0
18 Jun 2018
An Analysis of Neural Language Modeling at Multiple Scales
Stephen Merity
N. Keskar
R. Socher
59
171
0
22 Mar 2018
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
Shaojie Bai
J. Zico Kolter
V. Koltun
DRL
97
4,845
0
04 Mar 2018
Deep contextualized word representations
Matthew E. Peters
Mark Neumann
Mohit Iyyer
Matt Gardner
Christopher Clark
Kenton Lee
Luke Zettlemoyer
NAI
233
11,566
0
15 Feb 2018
Slim Embedding Layers for Recurrent Neural Language Models
Zhongliang Li
Raymond Kulhanek
Shaojun Wang
Yunxin Zhao
Shuang Wu
KELM
57
23
0
27 Nov 2017
Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
Zhilin Yang
Zihang Dai
Ruslan Salakhutdinov
William W. Cohen
BDL
71
372
0
10 Nov 2017
Compressing Word Embeddings via Deep Compositional Code Learning
Raphael Shu
Hideki Nakayama
79
129
0
03 Nov 2017
Simple Recurrent Units for Highly Parallelizable Recurrence
Tao Lei
Yu Zhang
Sida I. Wang
Huijing Dai
Yoav Artzi
LRM
115
276
0
08 Sep 2017
Regularizing and Optimizing LSTM Language Models
Stephen Merity
N. Keskar
R. Socher
169
1,096
0
07 Aug 2017
Learned in Translation: Contextualized Word Vectors
Bryan McCann
James Bradbury
Caiming Xiong
R. Socher
121
909
0
01 Aug 2017
On the State of the Art of Evaluation in Neural Language Models
Gábor Melis
Chris Dyer
Phil Blunsom
68
536
0
18 Jul 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
786
132,363
0
12 Jun 2017
Factorization tricks for LSTM networks
Oleksii Kuchaiev
Boris Ginsburg
63
113
0
31 Mar 2017
Massive Exploration of Neural Machine Translation Architectures
D. Britz
Anna Goldie
Minh-Thang Luong
Quoc V. Le
63
519
0
11 Mar 2017
OpenNMT: Open-Source Toolkit for Neural Machine Translation
Guillaume Klein
Yoon Kim
Yuntian Deng
Jean Senellart
Alexander M. Rush
330
1,900
0
10 Jan 2017
Language Modeling with Gated Convolutional Networks
Yann N. Dauphin
Angela Fan
Michael Auli
David Grangier
242
2,404
0
23 Dec 2016
Improving Neural Language Models with a Continuous Cache
Edouard Grave
Armand Joulin
Nicolas Usunier
KELM
60
301
0
13 Dec 2016
Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling
Hakan Inan
Khashayar Khosravi
R. Socher
128
385
0
04 Nov 2016
Why Deep Neural Networks for Function Approximation?
Shiyu Liang
R. Srikant
135
385
0
13 Oct 2016
Compressing Neural Language Models by Sparse Word Representations
Yunchuan Chen
Lili Mou
Yan Xu
Ge Li
Zhi Jin
MoE
35
30
0
13 Oct 2016
Pointer Sentinel Mixture Models
Stephen Merity
Caiming Xiong
James Bradbury
R. Socher
RALM
338
2,898
0
26 Sep 2016
Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
Itay Hubara
Matthieu Courbariaux
Daniel Soudry
Ran El-Yaniv
Yoshua Bengio
MQ
155
1,868
0
22 Sep 2016
Efficient softmax approximation for GPUs
Edouard Grave
Armand Joulin
Moustapha Cissé
David Grangier
Hervé Jégou
95
272
0
14 Sep 2016
Using the Output Embedding to Improve Language Models
Ofir Press
Lior Wolf
92
736
0
20 Aug 2016
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
Mohammad Rastegari
Vicente Ordonez
Joseph Redmon
Ali Farhadi
MQ
175
4,369
0
16 Mar 2016
Exploring the Limits of Language Modeling
Rafal Jozefowicz
Oriol Vinyals
M. Schuster
Noam M. Shazeer
Yonghui Wu
201
1,145
0
07 Feb 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
2.2K
194,426
0
10 Dec 2015
Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich
Barry Haddow
Alexandra Birch
228
7,757
0
31 Aug 2015
Character-Aware Neural Language Models
Yoon Kim
Yacine Jernite
David Sontag
Alexander M. Rush
107
1,669
0
26 Aug 2015
Effective Approaches to Attention-based Neural Machine Translation
Thang Luong
Hieu H. Pham
Christopher D. Manning
413
7,969
0
17 Aug 2015
Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
AIMat
578
27,327
0
01 Sep 2014
One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling
Ciprian Chelba
Tomas Mikolov
M. Schuster
Qi Ge
T. Brants
P. Koehn
T. Robinson
190
1,109
0
11 Dec 2013
Distributed Representations of Words and Phrases and their Compositionality
Tomas Mikolov
Ilya Sutskever
Kai Chen
G. Corrado
J. Dean
NAI OCL
402
33,560
0
16 Oct 2013