Adaptive Input Representations for Neural Language Modeling
Alexei Baevski, Michael Auli
arXiv:1809.10853 · 28 September 2018
Papers citing "Adaptive Input Representations for Neural Language Modeling" (50 of 111 shown)
Transformer Meets Twicing: Harnessing Unattended Residual Information
Laziz U. Abdullaev, Tan M. Nguyen (02 Mar 2025)

The Curse of Depth in Large Language Models
Wenfang Sun, Xinyuan Song, Pengxiang Li, Lu Yin, Yefeng Zheng, Shiwei Liu (09 Feb 2025)

Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures
Gabriel Lindenmaier, Sean Papay, Sebastian Padó (02 Feb 2025)

NeuralDEM -- Real-time Simulation of Industrial Particulate Flows [PINN, AI4CE]
Benedikt Alkin, Tobias Kronlachner, Samuele Papa, Stefan Pirker, Thomas Lichtenegger, Johannes Brandstetter (14 Nov 2024)

What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec, Felix Dangel, Sidak Pal Singh (14 Oct 2024)

Trans2Unet: Neural fusion for Nuclei Semantic Segmentation [ViT, MedIm]
Dinh-Phu Tran, Quoc-Anh Nguyen, Van-Truong Pham, Thi-Thao Tran (24 Jul 2024)

Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences
Zicheng Liu, Siyuan Li, Li Wang, Zedong Wang, Yunfan Liu, Stan Z. Li (12 Jun 2024)

Understanding and Minimising Outlier Features in Neural Network Training
Bobby He, Lorenzo Noci, Daniele Paliotta, Imanol Schlag, Thomas Hofmann (29 May 2024)

SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization [ViT]
Jialong Guo, Xinghao Chen, Yehui Tang, Yunhe Wang (19 May 2024)

Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
Jiacheng Liu, Sewon Min, Luke Zettlemoyer, Yejin Choi, Hannaneh Hajishirzi (30 Jan 2024)

Setting the Record Straight on Transformer Oversmoothing
G. Dovonon, M. Bronstein, Matt J. Kusner (09 Jan 2024)

Plug-and-Play Transformer Modules for Test-Time Adaptation
Xiangyu Chang, Sk. Miraj Ahmed, S. Krishnamurthy, Başak Güler, A. Swami, Samet Oymak, Amit K. Roy-Chowdhury (06 Jan 2024)

Early and Accurate Detection of Tomato Leaf Diseases Using TomFormer [MedIm]
Asim Khan, Umair Nawaz, K. Lochan, Lakmal D. Seneviratne, Irfan Hussain (26 Dec 2023)

Large-Scale OD Matrix Estimation with A Deep Learning Method
Zheli Xiong, Defu Lian, Enhong Chen, Gang Chen, Xiaomin Cheng (09 Oct 2023)

SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore [AILaw]
Sewon Min, Suchin Gururangan, Eric Wallace, Hannaneh Hajishirzi, Noah A. Smith, Luke Zettlemoyer (08 Aug 2023)

MATNet: Multi-Level Fusion Transformer-Based Model for Day-Ahead PV Generation Forecasting
M. Tortora, F. Conte, G. Natrella, Paolo Soda (17 Jun 2023)

Tagged End-to-End Simultaneous Speech Translation Training using Simultaneous Interpretation Data
Yuka Ko, Ryo Fukuda, Yuta Nishikawa, Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura (14 Jun 2023)

MobileNMT: Enabling Translation in 15MB and 30ms [MQ]
Ye Lin, Xiaohui Wang, Zhexi Zhang, Mingxuan Wang, Tong Xiao, Jingbo Zhu (07 Jun 2023)

Multi-Path Transformer is Better: A Case Study on Neural Machine Translation
Ye Lin, Shuhan Zhou, Yanyang Li, Anxiang Ma, Tong Xiao, Jingbo Zhu (10 May 2023)

Improving Autoregressive NLP Tasks via Modular Linearized Attention
Victor Agostinelli, Lizhong Chen (17 Apr 2023)

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review [VLM]
Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao (07 Apr 2023)

Efficient Attention via Control Variates
Lin Zheng, Jianbo Yuan, Chong-Jun Wang, Lingpeng Kong (09 Feb 2023)

SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient [MoE]
Max Ryabinin, Tim Dettmers, Michael Diskin, Alexander Borzunov (27 Jan 2023)

Why do Nearest Neighbor Language Models Work? [RALM]
Frank F. Xu, Uri Alon, Graham Neubig (07 Jan 2023)

Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Daniel Y. Fu, Tri Dao, Khaled Kamal Saab, A. Thomas, Atri Rudra, Christopher Ré (28 Dec 2022)

EIT: Enhanced Interactive Transformer
Tong Zheng, Bei Li, Huiwen Bao, Tong Xiao, Jingbo Zhu (20 Dec 2022)

Efficient Long Sequence Modeling via State Space Augmented Transformer
Simiao Zuo, Xiaodong Liu, Jian Jiao, Denis Xavier Charles, Eren Manavoglu, Tuo Zhao, Jianfeng Gao (15 Dec 2022)

A Neural ODE Interpretation of Transformer Layers
Yaofeng Desmond Zhong, Tongtao Zhang, Amit Chakraborty, Biswadip Dey (12 Dec 2022)

Masked Reconstruction Contrastive Learning with Information Bottleneck Principle [SSL]
Ziwen Liu, Bonan Li, Congying Han, Tiande Guo, Xuecheng Nie (15 Nov 2022)

Mutual Information Alleviates Hallucinations in Abstractive Summarization [HILM]
Liam van der Poel, Ryan Cotterell, Clara Meister (24 Oct 2022)

Mega: Moving Average Equipped Gated Attention
Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, Luke Zettlemoyer (21 Sep 2022)

Self-Attentive Pooling for Efficient Deep Learning
Fang Chen, Gourav Datta, Souvik Kundu, P. Beerel (16 Sep 2022)

Your ViT is Secretly a Hybrid Discriminative-Generative Diffusion Model [DiffM]
Xiulong Yang, Sheng-Min Shih, Yinlin Fu, Xiaoting Zhao, Shihao Ji (16 Aug 2022)

Stable Invariant Models via Koopman Spectra
Takuya Konishi, Yoshinobu Kawahara (15 Jul 2022)

Scene Text Recognition with Permuted Autoregressive Sequence Models
Darwin Bautista, Rowel Atienza (14 Jul 2022)

Deep Transformer Model with Pre-Layer Normalization for COVID-19 Growth Prediction
Rizki Ramadhan Fitra, N. Yudistira, W. Mahmudy (10 Jul 2022)

Training Language Models with Memory Augmentation [RALM]
Zexuan Zhong, Tao Lei, Danqi Chen (25 May 2022)

Learning to Model Editing Processes [KELM, BDL]
Machel Reid, Graham Neubig (24 May 2022)

A Call for Clarity in Beam Search: How It Works and When It Stops
Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Dragomir R. Radev, Yejin Choi, Noah A. Smith (11 Apr 2022)

TANet: Thread-Aware Pretraining for Abstractive Conversational Summarization
Ze Yang, Liran Wang, Zhoujin Tian, Wei Wu, Zhoujun Li (09 Apr 2022)

Parameter-efficient Model Adaptation for Vision Transformers
Xuehai He, Chunyuan Li, Pengchuan Zhang, Jianwei Yang, Junfeng Fang (29 Mar 2022)

Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space [KELM]
Mor Geva, Avi Caciularu, Ke Wang, Yoav Goldberg (28 Mar 2022)

Linearizing Transformer with Key-Value Memory
Yizhe Zhang, Deng Cai (23 Mar 2022)

cosFormer: Rethinking Softmax in Attention
Zhen Qin, Weixuan Sun, Huicai Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong (17 Feb 2022)

General-purpose, long-context autoregressive modeling with Perceiver AR
Curtis Hawthorne, Andrew Jaegle, Cătălina Cangea, Sebastian Borgeaud, C. Nash, ..., Hannah R. Sheahan, Neil Zeghidour, Jean-Baptiste Alayrac, João Carreira, Jesse Engel (15 Feb 2022)

ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning [VLM, ViT]
J. Tan, Y. Tan, C. Chan, Joon Huang Chuah (11 Feb 2022)

How to Understand Masked Autoencoders
Shuhao Cao, Peng Xu, David Clifton (08 Feb 2022)

Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval [RALM]
Uri Alon, Frank F. Xu, Junxian He, Sudipta Sengupta, Dan Roth, Graham Neubig (28 Jan 2022)

Can Wikipedia Help Offline Reinforcement Learning? [3DV, RALM, OffRL]
Machel Reid, Yutaro Yamada, S. Gu (28 Jan 2022)

SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training
Wenyong Huang, Zhenhe Zhang, Y. Yeung, Xin Jiang, Qun Liu (25 Jan 2022)