Adaptive Input Representations for Neural Language Modeling

28 September 2018
Alexei Baevski
Michael Auli

Papers citing "Adaptive Input Representations for Neural Language Modeling"

50 / 101 papers shown
Transformer Meets Twicing: Harnessing Unattended Residual Information
Laziz U. Abdullaev
Tan M. Nguyen
41
2
0
02 Mar 2025
The Curse of Depth in Large Language Models
Wenfang Sun
Xinyuan Song
Pengxiang Li
Lu Yin
Yefeng Zheng
Shiwei Liu
75
4
0
09 Feb 2025
NeuralDEM -- Real-time Simulation of Industrial Particulate Flows
Benedikt Alkin
Tobias Kronlachner
Samuele Papa
Stefan Pirker
Thomas Lichtenegger
Johannes Brandstetter
PINN
AI4CE
57
1
1
14 Nov 2024
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec
Felix Dangel
Sidak Pal Singh
38
7
0
14 Oct 2024
Trans2Unet: Neural fusion for Nuclei Semantic Segmentation
Dinh-Phu Tran
Quoc-Anh Nguyen
Van-Truong Pham
Thi-Thao Tran
ViT
MedIm
29
5
0
24 Jul 2024
Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences
Zicheng Liu
Siyuan Li
Li Wang
Zedong Wang
Yunfan Liu
Stan Z. Li
35
7
0
12 Jun 2024
Understanding and Minimising Outlier Features in Neural Network Training
Bobby He
Lorenzo Noci
Daniele Paliotta
Imanol Schlag
Thomas Hofmann
39
3
0
29 May 2024
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
Jialong Guo
Xinghao Chen
Yehui Tang
Yunhe Wang
ViT
49
9
0
19 May 2024
Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
Jiacheng Liu
Sewon Min
Luke Zettlemoyer
Yejin Choi
Hannaneh Hajishirzi
51
50
0
30 Jan 2024
Setting the Record Straight on Transformer Oversmoothing
G. Dovonon
M. Bronstein
Matt J. Kusner
28
5
0
09 Jan 2024
Plug-and-Play Transformer Modules for Test-Time Adaptation
Xiangyu Chang
Sk. Miraj Ahmed
S. Krishnamurthy
Başak Güler
A. Swami
Samet Oymak
A. Roy-Chowdhury
33
0
0
06 Jan 2024
Early and Accurate Detection of Tomato Leaf Diseases Using TomFormer
Asim Khan
Umair Nawaz
K. Lochan
Lakmal D. Seneviratne
Irfan Hussain
MedIm
30
4
0
26 Dec 2023
Large-Scale OD Matrix Estimation with A Deep Learning Method
Zheli Xiong
Defu Lian
Enhong Chen
Gang Chen
Xiaomin Cheng
13
0
0
09 Oct 2023
MATNet: Multi-Level Fusion Transformer-Based Model for Day-Ahead PV Generation Forecasting
M. Tortora
F. Conte
G. Natrella
Paolo Soda
14
1
0
17 Jun 2023
Tagged End-to-End Simultaneous Speech Translation Training using Simultaneous Interpretation Data
Yuka Ko
Ryo Fukuda
Yuta Nishikawa
Yasumasa Kano
Katsuhito Sudoh
Satoshi Nakamura
29
6
0
14 Jun 2023
MobileNMT: Enabling Translation in 15MB and 30ms
Ye Lin
Xiaohui Wang
Zhexi Zhang
Mingxuan Wang
Tong Xiao
Jingbo Zhu
MQ
30
1
0
07 Jun 2023
Multi-Path Transformer is Better: A Case Study on Neural Machine Translation
Ye Lin
Shuhan Zhou
Yanyang Li
Anxiang Ma
Tong Xiao
Jingbo Zhu
38
0
0
10 May 2023
Improving Autoregressive NLP Tasks via Modular Linearized Attention
Victor Agostinelli
Lizhong Chen
24
1
0
17 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
30
41
0
07 Apr 2023
Efficient Attention via Control Variates
Lin Zheng
Jianbo Yuan
Chong-Jun Wang
Lingpeng Kong
34
18
0
09 Feb 2023
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Max Ryabinin
Tim Dettmers
Michael Diskin
Alexander Borzunov
MoE
30
31
0
27 Jan 2023
Why do Nearest Neighbor Language Models Work?
Frank F. Xu
Uri Alon
Graham Neubig
RALM
27
21
0
07 Jan 2023
Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Daniel Y. Fu
Tri Dao
Khaled Kamal Saab
A. Thomas
Atri Rudra
Christopher Ré
73
370
0
28 Dec 2022
EIT: Enhanced Interactive Transformer
Tong Zheng
Bei Li
Huiwen Bao
Tong Xiao
Jingbo Zhu
32
2
0
20 Dec 2022
Efficient Long Sequence Modeling via State Space Augmented Transformer
Simiao Zuo
Xiaodong Liu
Jian Jiao
Denis Xavier Charles
Eren Manavoglu
Tuo Zhao
Jianfeng Gao
125
36
0
15 Dec 2022
A Neural ODE Interpretation of Transformer Layers
Yaofeng Desmond Zhong
Tongtao Zhang
Amit Chakraborty
Biswadip Dey
28
9
0
12 Dec 2022
Masked Reconstruction Contrastive Learning with Information Bottleneck Principle
Ziwen Liu
Bonan Li
Congying Han
Tiande Guo
Xuecheng Nie
SSL
34
2
0
15 Nov 2022
Mutual Information Alleviates Hallucinations in Abstractive Summarization
Liam van der Poel
Ryan Cotterell
Clara Meister
HILM
16
57
0
24 Oct 2022
Mega: Moving Average Equipped Gated Attention
Xuezhe Ma
Chunting Zhou
Xiang Kong
Junxian He
Liangke Gui
Graham Neubig
Jonathan May
Luke Zettlemoyer
33
183
0
21 Sep 2022
Self-Attentive Pooling for Efficient Deep Learning
Fang Chen
Gourav Datta
Souvik Kundu
P. Beerel
82
6
0
16 Sep 2022
Stable Invariant Models via Koopman Spectra
Takuya Konishi
Yoshinobu Kawahara
23
3
0
15 Jul 2022
Scene Text Recognition with Permuted Autoregressive Sequence Models
Darwin Bautista
Rowel Atienza
26
169
0
14 Jul 2022
Deep Transformer Model with Pre-Layer Normalization for COVID-19 Growth Prediction
Rizki Ramadhan Fitra
N. Yudistira
W. Mahmudy
24
1
0
10 Jul 2022
Training Language Models with Memory Augmentation
Zexuan Zhong
Tao Lei
Danqi Chen
RALM
239
128
0
25 May 2022
Learning to Model Editing Processes
Machel Reid
Graham Neubig
KELM
BDL
113
35
0
24 May 2022
A Call for Clarity in Beam Search: How It Works and When It Stops
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Dragomir R. Radev
Yejin Choi
Noah A. Smith
26
6
0
11 Apr 2022
TANet: Thread-Aware Pretraining for Abstractive Conversational Summarization
Ze Yang
Liran Wang
Zhoujin Tian
Wei Wu
Zhoujun Li
27
4
0
09 Apr 2022
Parameter-efficient Model Adaptation for Vision Transformers
Xuehai He
Chunyuan Li
Pengchuan Zhang
Jianwei Yang
Qing Guo
30
84
0
29 Mar 2022
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
Mor Geva
Avi Caciularu
Ke Wang
Yoav Goldberg
KELM
69
336
0
28 Mar 2022
Linearizing Transformer with Key-Value Memory
Yizhe Zhang
Deng Cai
20
5
0
23 Mar 2022
cosFormer: Rethinking Softmax in Attention
Zhen Qin
Weixuan Sun
Huicai Deng
Dongxu Li
Yunshen Wei
Baohong Lv
Junjie Yan
Lingpeng Kong
Yiran Zhong
26
212
0
17 Feb 2022
General-purpose, long-context autoregressive modeling with Perceiver AR
Curtis Hawthorne
Andrew Jaegle
Cătălina Cangea
Sebastian Borgeaud
C. Nash
...
Hannah R. Sheahan
Neil Zeghidour
Jean-Baptiste Alayrac
João Carreira
Jesse Engel
43
65
0
15 Feb 2022
ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning
J. Tan
Y. Tan
C. Chan
Joon Huang Chuah
VLM
ViT
26
15
0
11 Feb 2022
How to Understand Masked Autoencoders
Shuhao Cao
Peng Xu
David A. Clifton
29
40
0
08 Feb 2022
Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval
Uri Alon
Frank F. Xu
Junxian He
Sudipta Sengupta
Dan Roth
Graham Neubig
RALM
77
62
0
28 Jan 2022
Can Wikipedia Help Offline Reinforcement Learning?
Machel Reid
Yutaro Yamada
S. Gu
3DV
RALM
OffRL
140
95
0
28 Jan 2022
SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training
Wenyong Huang
Zhenhe Zhang
Y. Yeung
Xin Jiang
Qun Liu
35
23
0
25 Jan 2022
Artefact Retrieval: Overview of NLP Models with Knowledge Base Access
Vilém Zouhar
Marius Mosbach
Debanjali Biswas
Dietrich Klakow
KELM
27
4
0
24 Jan 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
22
103
0
16 Jan 2022
Efficient Large Scale Language Modeling with Mixtures of Experts
Mikel Artetxe
Shruti Bhosale
Naman Goyal
Todor Mihaylov
Myle Ott
...
Jeff Wang
Luke Zettlemoyer
Mona T. Diab
Zornitsa Kozareva
Ves Stoyanov
MoE
61
188
0
20 Dec 2021