ResearchTrend.AI
Adaptive Input Representations for Neural Language Modeling
Alexei Baevski, Michael Auli
arXiv:1809.10853, 28 September 2018

Papers citing "Adaptive Input Representations for Neural Language Modeling"

50 / 269 papers shown
Automated Source Code Generation and Auto-completion Using Deep Learning: Comparing and Discussing Current Language-Model-Related Approaches
Juan Cruz-Benito, Sanjay Vishwakarma, Francisco Martín-Fernández, Ismael Faro (IBM Quantum)
16 Sep 2020

PopMAG: Pop Music Accompaniment Generation
Yi Ren, Jinzheng He, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu
18 Aug 2020

DeLighT: Deep and Light-weight Transformer
Sachin Mehta, Marjan Ghazvininejad, Srini Iyer, Luke Zettlemoyer, Hannaneh Hajishirzi
03 Aug 2020 · VLM

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski, Henry Zhou, Abdel-rahman Mohamed, Michael Auli
20 Jun 2020 · SSL

Latent Video Transformer
Ruslan Rakhimov, Denis Volkhonskiy, Alexey Artemov, Denis Zorin, Evgeny Burnaev
18 Jun 2020 · VGen

MLE-guided parameter search for task loss minimization in neural sequence modeling
Sean Welleck, Kyunghyun Cho
04 Jun 2020
MicroNet for Efficient Language Modeling
Zhongxia Yan, Hanrui Wang, Demi Guo, Song Han
16 May 2020

A Mixture of $h-1$ Heads is Better than $h$ Heads
Hao Peng, Roy Schwartz, Dianqi Li, Noah A. Smith
13 May 2020 · MoE

Multi-scale Transformer Language Models
Sandeep Subramanian, R. Collobert, Marc'Aurelio Ranzato, Y-Lan Boureau
01 May 2020

Segatron: Segment-Aware Transformer for Language Modeling and Understanding
Richard He Bai, Peng Shi, Jimmy J. Lin, Yuqing Xie, Luchen Tan, Kun Xiong, Wen Gao, Ming Li
30 Apr 2020

Lite Transformer with Long-Short Range Attention
Zhanghao Wu, Zhijian Liu, Ji Lin, Chengyue Wu, Song Han
24 Apr 2020

A Generic Network Compression Framework for Sequential Recommender Systems
Yang Sun, Fajie Yuan, Ming Yang, Guoao Wei, Zhou Zhao, Duo Liu
21 Apr 2020
Understanding the Difficulty of Training Transformers
Liyuan Liu, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Jiawei Han
17 Apr 2020 · AI4CE

Training with Quantization Noise for Extreme Model Compression
Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Hervé Jégou, Armand Joulin
15 Apr 2020 · MQ

Residual Energy-Based Models for Text
A. Bakhtin, Yuntian Deng, Sam Gross, Myle Ott, Marc'Aurelio Ranzato, Arthur Szlam
06 Apr 2020

TNT-KID: Transformer-based Neural Tagger for Keyword Identification
Matej Martinc, Blaž Škrlj, Senja Pollak
20 Mar 2020

Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies
Paul Pu Liang, Manzil Zaheer, Yuan Wang, Amr Ahmed
18 Mar 2020 · BDL

PowerNorm: Rethinking Batch Normalization in Transformers
Sheng Shen, Z. Yao, A. Gholami, Michael W. Mahoney, Kurt Keutzer
17 Mar 2020 · BDL
Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier
12 Mar 2020 · MoE

Sparse Sinkhorn Attention
Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan
26 Feb 2020

Addressing Some Limitations of Transformers with Feedback Memory
Angela Fan, Thibaut Lavril, Edouard Grave, Armand Joulin, Sainbayar Sukhbaatar
21 Feb 2020

On Layer Normalization in the Transformer Architecture
Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu
12 Feb 2020 · AI4CE

Time-aware Large Kernel Convolutions
Vasileios Lioutas, Yuhong Guo
08 Feb 2020 · AI4TS

Blank Language Models
T. Shen, Victor Quach, Regina Barzilay, Tommi Jaakkola
08 Feb 2020
Normalization of Input-output Shared Embeddings in Text Generation Models
Jinyang Liu, Yujia Zhai, Zizhong Chen
22 Jan 2020

Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation
Fajie Yuan, Xiangnan He, Alexandros Karatzoglou, Liguang Zhang
13 Jan 2020

DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling
Sachin Mehta, Rik Koncel-Kedziorski, Mohammad Rastegari, Hannaneh Hajishirzi
27 Nov 2019 · AI4TS

End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures
Gabriel Synnaeve, Qiantong Xu, Jacob Kahn, Tatiana Likhomanenko, Edouard Grave, Vineel Pratap, Anuroop Sriram, Vitaliy Liptchinsky, R. Collobert
19 Nov 2019 · SSL, AI4TS

Compressive Transformers for Long-Range Sequence Modelling
Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy Lillicrap
13 Nov 2019 · RALM, VLM, KELM

Improving Transformer Models by Reordering their Sublayers
Ofir Press, Noah A. Smith, Omer Levy
10 Nov 2019
Unsupervised Cross-lingual Representation Learning at Scale
Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov
05 Nov 2019

Generalization through Memorization: Nearest Neighbor Language Models
Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, M. Lewis
01 Nov 2019 · RALM

Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model
Oleksii Hrinchuk, Mariya Popova, Boris Ginsburg
23 Oct 2019 · VLM

Depth-Adaptive Transformer
Maha Elbayad, Jiatao Gu, Edouard Grave, Michael Auli
22 Oct 2019

Stabilizing Transformers for Reinforcement Learning
Emilio Parisotto, H. F. Song, Jack W. Rae, Razvan Pascanu, Çağlar Gülçehre, ..., Aidan Clark, Seb Noury, M. Botvinick, N. Heess, R. Hadsell
13 Oct 2019 · OffRL
Structured Pruning of Large Language Models
Ziheng Wang, Jeremy Wohlwend, Tao Lei
10 Oct 2019

On the adequacy of untuned warmup for adaptive optimization
Jerry Ma, Denis Yarats
09 Oct 2019

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
26 Sep 2019 · SSL, AIMat

Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan, Edouard Grave, Armand Joulin
25 Sep 2019

Deep Equilibrium Models
Shaojie Bai, J. Zico Kolter, V. Koltun
03 Sep 2019

Towards Understanding Neural Machine Translation with Word Importance
Shilin He, Zhaopeng Tu, Xing Wang, Longyue Wang, Michael R. Lyu, Shuming Shi
01 Sep 2019 · AAML

Bridging the Gap for Tokenizer-Free Language Models
Dokook Choe, Rami Al-Rfou, Mandy Guo, Heeyoung Lee, Noah Constant
27 Aug 2019
Latent Relation Language Models
Hiroaki Hayashi, Zecong Hu, Chenyan Xiong, Graham Neubig
21 Aug 2019 · KELM

Simple and Effective Noisy Channel Modeling for Neural Machine Translation
Kyra Yee, Nathan Ng, Yann N. Dauphin, Michael Auli
15 Aug 2019

On The Evaluation of Machine Translation Systems Trained With Back-Translation
Sergey Edunov, Myle Ott, Marc'Aurelio Ranzato, Michael Auli
14 Aug 2019

Neural Text Generation with Unlikelihood Training
Sean Welleck, Ilia Kulikov, Stephen Roller, Emily Dinan, Kyunghyun Cho, Jason Weston
12 Aug 2019 · MU

Large Memory Layers with Product Keys
Guillaume Lample, Alexandre Sablayrolles, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou
10 Jul 2019 · MoE

Augmenting Self-attention with Persistent Memory
Sainbayar Sukhbaatar, Edouard Grave, Guillaume Lample, Hervé Jégou, Armand Joulin
02 Jul 2019 · RALM, KELM
A Tensorized Transformer for Language Modeling
Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, D. Song, M. Zhou
24 Jun 2019

XLNet: Generalized Autoregressive Pretraining for Language Understanding
Zhilin Yang, Zihang Dai, Yiming Yang, J. Carbonell, Ruslan Salakhutdinov, Quoc V. Le
19 Jun 2019 · AI4CE