1809.10853
Adaptive Input Representations for Neural Language Modeling
28 September 2018
Alexei Baevski
Michael Auli
Papers citing "Adaptive Input Representations for Neural Language Modeling"
50 / 269 papers shown
High-resolution power equipment recognition based on improved self-attention
Siyi Zhang
Cheng Liu
Xiang Li
Xin Zhai
Zhen Wei
Sizhe Li
Xun Ma
21
0
0
06 Nov 2023
PartialFormer: Modeling Part Instead of Whole for Machine Translation
Tong Zheng
Bei Li
Huiwen Bao
Jiale Wang
Weiqiao Shan
Tong Xiao
Jingbo Zhu
MoE
AI4CE
47
0
0
23 Oct 2023
Extending Input Contexts of Language Models through Training on Segmented Sequences
Petros Karypis
Julian McAuley
George Karypis
56
0
0
23 Oct 2023
Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems
David T. Hoffmann
Simon Schrodi
Jelena Bratulić
Nadine Behrmann
Volker Fischer
Thomas Brox
116
8
0
19 Oct 2023
Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective
Huayang Li
Tian Lan
Z. Fu
Deng Cai
Lemao Liu
Nigel Collier
Taro Watanabe
Yixuan Su
88
18
0
16 Oct 2023
How Do Large Language Models Capture the Ever-changing World Knowledge? A Review of Recent Advances
Zihan Zhang
Meng Fang
Lingxi Chen
Mohammad-Reza Namazi-Rad
Jun Wang
KELM
96
24
0
11 Oct 2023
Large-Scale OD Matrix Estimation with A Deep Learning Method
Zheli Xiong
Defu Lian
Enhong Chen
Gang Chen
Xiaomin Cheng
40
0
0
09 Oct 2023
LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression
Ayush Kaushal
Tejas Vaidhya
Irina Rish
121
16
0
25 Sep 2023
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
Sewon Min
Suchin Gururangan
Eric Wallace
Hannaneh Hajishirzi
Noah A. Smith
Luke Zettlemoyer
AILaw
105
67
0
08 Aug 2023
A Comprehensive Overview of Large Language Models
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Ajmal Mian
OffRL
255
622
0
12 Jul 2023
MATNet: Multi-Level Fusion Transformer-Based Model for Day-Ahead PV Generation Forecasting
M. Tortora
F. Conte
G. Natrella
Paolo Soda
158
2
0
17 Jun 2023
Understanding Parameter Sharing in Transformers
Ye Lin
Mingxuan Wang
Zhexi Zhang
Xiaohui Wang
Tong Xiao
Jingbo Zhu
MoE
77
2
0
15 Jun 2023
Tagged End-to-End Simultaneous Speech Translation Training using Simultaneous Interpretation Data
Yuka Ko
Ryo Fukuda
Yuta Nishikawa
Yasumasa Kano
Katsuhito Sudoh
Satoshi Nakamura
63
6
0
14 Jun 2023
MobileNMT: Enabling Translation in 15MB and 30ms
Ye Lin
Xiaohui Wang
Zhexi Zhang
Mingxuan Wang
Tong Xiao
Jingbo Zhu
MQ
63
2
0
07 Jun 2023
The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles
Md Shamim Hussain
Mohammed J Zaki
D. Subramanian
168
3
0
02 Jun 2023
Sampling and Ranking for Digital Ink Generation on a tight computational budget
A. Afonin
Andrii Maksai
A. Timofeev
C. Musat
DiffM
76
0
0
02 Jun 2023
Coneheads: Hierarchy Aware Attention
Albert Tseng
Tao Yu
Toni J.B. Liu
Chris De Sa
3DPC
102
6
0
01 Jun 2023
A Quantitative Review on Language Model Efficiency Research
Meng Jiang
Hy Dang
Lingbo Tong
76
0
0
28 May 2023
Geometric Algebra Transformer
Johann Brehmer
P. de Haan
S. Behrends
Taco S. Cohen
102
32
0
28 May 2023
Finding the Pillars of Strength for Multi-Head Attention
Jinjie Ni
Rui Mao
Zonglin Yang
Han Lei
Min Zhang
36
5
0
22 May 2023
Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation
Zhuoyuan Mao
Raj Dabre
Qianying Liu
Haiyue Song
Chenhui Chu
Sadao Kurohashi
40
7
0
16 May 2023
Salient Mask-Guided Vision Transformer for Fine-Grained Classification
Dmitry Demidov
M.H. Sharif
Aliakbar Abdurahimov
Hisham Cholakkal
Fahad Shahbaz Khan
108
11
0
11 May 2023
Multi-Path Transformer is Better: A Case Study on Neural Machine Translation
Ye Lin
Shuhan Zhou
Yanyang Li
Anxiang Ma
Tong Xiao
Jingbo Zhu
67
0
0
10 May 2023
Tensor Decomposition for Model Reduction in Neural Networks: A Review
Xingyi Liu
Keshab K. Parhi
82
14
0
26 Apr 2023
Improving Autoregressive NLP Tasks via Modular Linearized Attention
Victor Agostinelli
Lizhong Chen
58
1
0
17 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
105
43
0
07 Apr 2023
Stabilizing Transformer Training by Preventing Attention Entropy Collapse
Shuangfei Zhai
Tatiana Likhomanenko
Etai Littwin
Dan Busbridge
Jason Ramapuram
Yizhe Zhang
Jiatao Gu
J. Susskind
AAML
114
78
0
11 Mar 2023
Policy Dispersion in Non-Markovian Environment
B. Qu
Xiaofeng Cao
Jielong Yang
Hechang Chen
Chang Yi
Ivor W. Tsang
Yew-Soon Ong
56
0
0
28 Feb 2023
Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation
Bobby He
James Martens
Guodong Zhang
Aleksandar Botev
Andy Brock
Samuel L. Smith
Yee Whye Teh
85
30
0
20 Feb 2023
Efficient Attention via Control Variates
Lin Zheng
Jianbo Yuan
Chong-Jun Wang
Lingpeng Kong
133
20
0
09 Feb 2023
Dual PatchNorm
Manoj Kumar
Mostafa Dehghani
N. Houlsby
UQCV
ViT
97
12
0
02 Feb 2023
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Max Ryabinin
Tim Dettmers
Michael Diskin
Alexander Borzunov
MoE
111
38
0
27 Jan 2023
XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models
Davis Liang
Hila Gonen
Yuning Mao
Rui Hou
Naman Goyal
Marjan Ghazvininejad
Luke Zettlemoyer
Madian Khabsa
87
80
0
25 Jan 2023
Why do Nearest Neighbor Language Models Work?
Frank F. Xu
Uri Alon
Graham Neubig
RALM
71
23
0
07 Jan 2023
Is word segmentation necessary for Vietnamese sentiment classification?
Duc-Vu Nguyen
Ngan Luu-Thuy Nguyen
34
0
0
01 Jan 2023
On Transforming Reinforcement Learning by Transformer: The Development Trajectory
Shengchao Hu
Li Shen
Ya Zhang
Yixin Chen
Dacheng Tao
OffRL
148
30
0
29 Dec 2022
Cramming: Training a Language Model on a Single GPU in One Day
Jonas Geiping
Tom Goldstein
MoE
117
91
0
28 Dec 2022
Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Daniel Y. Fu
Tri Dao
Khaled Kamal Saab
A. Thomas
Atri Rudra
Christopher Ré
157
404
0
28 Dec 2022
EIT: Enhanced Interactive Transformer
Tong Zheng
Bei Li
Huiwen Bao
Tong Xiao
Jingbo Zhu
112
2
0
20 Dec 2022
Efficient Long Sequence Modeling via State Space Augmented Transformer
Simiao Zuo
Xiaodong Liu
Jian Jiao
Denis Xavier Charles
Eren Manavoglu
Tuo Zhao
Jianfeng Gao
175
37
0
15 Dec 2022
A Neural ODE Interpretation of Transformer Layers
Yaofeng Desmond Zhong
Tongtao Zhang
Amit Chakraborty
Biswadip Dey
139
10
0
12 Dec 2022
Masked Reconstruction Contrastive Learning with Information Bottleneck Principle
Ziwen Liu
Bonan Li
Congying Han
Tiande Guo
Xuecheng Nie
SSL
64
2
0
15 Nov 2022
You can't pick your neighbors, or can you? When and how to rely on retrieval in the kNN-LM
Andrew Drozdov
Shufan Wang
Razieh Rahimi
Andrew McCallum
Hamed Zamani
Mohit Iyyer
RALM
202
17
0
28 Oct 2022
N-gram Is Back: Residual Learning of Neural Text Generation with n-gram Language Model
Huayang Li
Deng Cai
J. Xu
Taro Watanabe
VLM
67
1
0
26 Oct 2022
Mutual Information Alleviates Hallucinations in Abstractive Summarization
Liam van der Poel
Ryan Cotterell
Clara Meister
HILM
109
61
0
24 Oct 2022
Feature-Proxy Transformer for Few-Shot Segmentation
Jianwei Zhang
Yifan Sun
Yi Yang
Wei Chen
ViT
75
63
0
13 Oct 2022
Designing Robust Transformers using Robust Kernel Density Estimation
Xing Han
Zhaolin Ren
T. Nguyen
Khai Nguyen
Joydeep Ghosh
Nhat Ho
110
6
0
11 Oct 2022
Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
H. H. Mao
114
22
0
09 Oct 2022
Small Character Models Match Large Word Models for Autocomplete Under Memory Constraints
Ganesh Jawahar
Subhabrata Mukherjee
Debadeepta Dey
Muhammad Abdul-Mageed
L. Lakshmanan
C. C. T. Mendes
Gustavo de Rosa
S. Shah
44
0
0
06 Oct 2022
Mega: Moving Average Equipped Gated Attention
Xuezhe Ma
Chunting Zhou
Xiang Kong
Junxian He
Liangke Gui
Graham Neubig
Jonathan May
Luke Zettlemoyer
143
185
0
21 Sep 2022