Adaptive Input Representations for Neural Language Modeling
Alexei Baevski, Michael Auli
28 September 2018 · arXiv: 1809.10853

Papers citing "Adaptive Input Representations for Neural Language Modeling"

Showing 50 of 269 citing papers.
• RankNAS: Efficient Neural Architecture Search by Pairwise Ranking
  Chi Hu, Chenglong Wang, Xiangnan Ma, Xia Meng, Yinqiao Li, Tong Xiao, Jingbo Zhu, Changliang Li · 15 Sep 2021
• Efficient Nearest Neighbor Language Models
  Junxian He, Graham Neubig, Taylor Berg-Kirkpatrick · 09 Sep 2021 · Tags: RALM
• Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
  Ofir Press, Noah A. Smith, M. Lewis · 27 Aug 2021
• FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention
  T. Nguyen, Vai Suliafu, Stanley J. Osher, Long Chen, Bao Wang · 05 Aug 2021
• H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences
  Zhenhai Zhu, Radu Soricut · 25 Jul 2021
• Brazilian Portuguese Speech Recognition Using Wav2vec 2.0
  L. Gris, Edresson Casanova, F. S. Oliveira, A. S. Soares, A. Júnior · 23 Jul 2021
• Transformer Network for Significant Stenosis Detection in CCTA of Coronary Arteries
  Xin Ma, Gongning Luo, Wei Wang, Kuanquan Wang · 07 Jul 2021 · Tags: ViT, MedIm
• R-Drop: Regularized Dropout for Neural Networks
  Xiaobo Liang, Lijun Wu, Juntao Li, Yue Wang, Qi Meng, Tao Qin, Wei Chen, Hao Fei, Tie-Yan Liu · 28 Jun 2021
• Stabilizing Equilibrium Models by Jacobian Regularization
  Shaojie Bai, V. Koltun, J. Zico Kolter · 28 Jun 2021
• Multi-head or Single-head? An Empirical Comparison for Transformer Training
  Liyuan Liu, Jialu Liu, Jiawei Han · 17 Jun 2021
• A Survey of Transformers
  Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu · 08 Jun 2021 · Tags: ViT
• Top-KAST: Top-K Always Sparse Training
  Siddhant M. Jayakumar, Razvan Pascanu, Jack W. Rae, Simon Osindero, Erich Elsen · 07 Jun 2021
• You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection
  Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu · 01 Jun 2021 · Tags: ViT
• Language Model Evaluation Beyond Perplexity
  Clara Meister, Ryan Cotterell · 31 May 2021
• Cascaded Head-colliding Attention
  Lin Zheng, Zhiyong Wu, Lingpeng Kong · 31 May 2021
• Unsupervised Speech Recognition
  Alexei Baevski, Wei-Ning Hsu, Alexis Conneau, Michael Auli · 24 May 2021 · Tags: SSL
• Not All Memories are Created Equal: Learning to Forget by Expiring
  Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason Weston, Angela Fan · 13 May 2021 · Tags: CLL
• Differentiable Model Compression via Pseudo Quantization Noise
  Alexandre Défossez, Yossi Adi, Gabriel Synnaeve · 20 Apr 2021 · Tags: DiffM, MQ
• Go Forth and Prosper: Language Modeling with Ancient Textual History
  Rik Koncel-Kedziorski, Noah A. Smith · 18 Apr 2021 · Tags: KELM
• Large-Scale Self- and Semi-Supervised Learning for Speech Translation
  Changhan Wang, Anne Wu, J. Pino, Alexei Baevski, Michael Auli, Alexis Conneau · 14 Apr 2021 · Tags: SSL
• Lessons on Parameter Sharing across Layers in Transformers
  Sho Takase, Shun Kiyono · 13 Apr 2021
• Evaluating Saliency Methods for Neural Language Models
  Shuoyang Ding, Philipp Koehn · 12 Apr 2021 · Tags: FAtt, XAI
• Revisiting Simple Neural Probabilistic Language Models
  Simeng Sun, Mohit Iyyer · 08 Apr 2021
• Attention, please! A survey of Neural Attention Models in Deep Learning
  Alana de Santana Correia, Esther Luna Colombini · 31 Mar 2021 · Tags: HAI
• Finetuning Pretrained Transformers into RNNs
  Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith · 24 Mar 2021
• 3D Human Pose Estimation with Spatial and Temporal Transformers
  Ce Zheng, Sijie Zhu, Matías Mendieta, Taojiannan Yang, Chong Chen, Zhengming Ding · 18 Mar 2021 · Tags: ViT
• Variable-rate discrete representation learning
  Sander Dieleman, C. Nash, Jesse Engel, Karen Simonyan · 10 Mar 2021 · Tags: BDL, DRL
• Random Feature Attention
  Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong · 03 Mar 2021
• OmniNet: Omnidirectional Representations from Transformers
  Yi Tay, Mostafa Dehghani, V. Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Donald Metzler · 01 Mar 2021
• When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute
  Tao Lei · 24 Feb 2021 · Tags: RALM, VLM
• Do Transformer Modifications Transfer Across Implementations and Applications?
  Sharan Narang, Hyung Won Chung, Yi Tay, W. Fedus, Thibault Févry, ..., Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel · 23 Feb 2021
• Linear Transformers Are Secretly Fast Weight Programmers
  Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber · 22 Feb 2021
• Adaptive Semiparametric Language Models
  Dani Yogatama, Cyprien de Masson d'Autume, Lingpeng Kong · 04 Feb 2021 · Tags: KELM, RALM
• A Comparison of Approaches to Document-level Machine Translation
  Zhiyi Ma, Sergey Edunov, Michael Auli · 26 Jan 2021
• Compound Word Transformer: Learning to Compose Full-Song Music over Dynamic Directed Hypergraphs
  Wen-Yi Hsiao, Jen-Yu Liu, Yin-Cheng Yeh, Yi-Hsuan Yang · 07 Jan 2021
• An Efficient Transformer Decoder with Compressed Sub-layers
  Yanyang Li, Ye Lin, Tong Xiao, Jingbo Zhu · 03 Jan 2021
• Subformer: Exploring Weight Sharing for Parameter Efficiency in Generative Transformers
  Machel Reid, Edison Marrese-Taylor, Y. Matsuo · 01 Jan 2021 · Tags: MoE
• Shortformer: Better Language Modeling using Shorter Inputs
  Ofir Press, Noah A. Smith, M. Lewis · 31 Dec 2020
• ERNIE-Doc: A Retrospective Long-Document Modeling Transformer
  Siyu Ding, Junyuan Shang, Shuohuan Wang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang · 31 Dec 2020
• Transformer Feed-Forward Layers Are Key-Value Memories
  Mor Geva, R. Schuster, Jonathan Berant, Omer Levy · 29 Dec 2020 · Tags: KELM
• A Survey on Visual Transformer
  Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, ..., Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, Dacheng Tao · 23 Dec 2020 · Tags: ViT
• Cross-lingual Transfer of Abstractive Summarizer to Less-resource Language
  Aleš Žagar, Marko Robnik-Šikonja · 08 Dec 2020
• Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
  Minjia Zhang, Yuxiong He · 26 Oct 2020 · Tags: AI4CE
• An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
  Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby · 22 Oct 2020 · Tags: ViT
• Rethinking Evaluation in ASR: Are Our Models Robust Enough?
  Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Paden Tomasello, Jacob Kahn, Gilad Avidov, R. Collobert, Gabriel Synnaeve · 22 Oct 2020
• Self-training and Pre-training are Complementary for Speech Recognition
  Qiantong Xu, Alexei Baevski, Tatiana Likhomanenko, Paden Tomasello, Alexis Conneau, R. Collobert, Gabriel Synnaeve, Michael Auli · 22 Oct 2020 · Tags: SSL, VLM
22 Oct 2020
ChrEn: Cherokee-English Machine Translation for Endangered Language
  Revitalization
ChrEn: Cherokee-English Machine Translation for Endangered Language Revitalization
Shiyue Zhang
B. Frey
Joey Tianyi Zhou
89
31
0
09 Oct 2020
CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial
  Text Generation
CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation
Tianlu Wang
Xuezhi Wang
Yao Qin
Ben Packer
Kang Li
Jilin Chen
Alex Beutel
Ed H. Chi
SILM
81
84
0
05 Oct 2020
Grounded Compositional Outputs for Adaptive Language Modeling
Grounded Compositional Outputs for Adaptive Language Modeling
Nikolaos Pappas
Phoebe Mulcaire
Noah A. Smith
KELM
80
7
0
24 Sep 2020
Towards Fully 8-bit Integer Inference for the Transformer Model
Towards Fully 8-bit Integer Inference for the Transformer Model
Ye Lin
Yanyang Li
Tengbo Liu
Tong Xiao
Tongran Liu
Jingbo Zhu
MQ
78
63
0
17 Sep 2020