Adaptive Input Representations for Neural Language Modeling
Alexei Baevski, Michael Auli
28 September 2018 · arXiv:1809.10853

Papers citing "Adaptive Input Representations for Neural Language Modeling"
(50 of 269 citing papers shown)
• Context-aware Biases for Length Extrapolation (Ali Veisi, Hamidreza Amirzadeh, Amir Mansourian; 11 Mar 2025)
• Transformer Meets Twicing: Harnessing Unattended Residual Information (Laziz U. Abdullaev, Tan M. Nguyen; 02 Mar 2025)
• Revisiting Kernel Attention with Correlated Gaussian Process Representation (Long Minh Bui, Tho Tran Huu, Duy-Tung Dinh, T. Nguyen, Trong Nghia Hoang; 27 Feb 2025)
• The Curse of Depth in Large Language Models (Wenfang Sun, Xinyuan Song, Pengxiang Li, Lu Yin, Yefeng Zheng, Shiwei Liu; 09 Feb 2025)
• Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures (Gabriel Lindenmaier, Sean Papay, Sebastian Padó; 02 Feb 2025)
• Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN (Pengxiang Li, Lu Yin, Shiwei Liu; 18 Dec 2024)
• Can Reasons Help Improve Pedestrian Intent Estimation? A Cross-Modal Approach (Vaishnavi Khindkar, V. Balasubramanian, Chetan Arora, A. Subramanian, C. V. Jawahar; 20 Nov 2024)
• NeuralDEM -- Real-time Simulation of Industrial Particulate Flows (Benedikt Alkin, Tobias Kronlachner, Samuele Papa, Stefan Pirker, Thomas Lichtenegger, Johannes Brandstetter; 14 Nov 2024) [PINN, AI4CE]
• Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning (Yangqiu Song, Tong Zheng, Ran Wang, Jiahao Liu, Qingyan Guo, ..., Xu Tan, Tong Xiao, Jingbo Zhu, Jiadong Wang, Xunliang Cai; 05 Nov 2024)
• Adaptive Length Image Tokenization via Recurrent Allocation (Shivam Duggal, Phillip Isola, Antonio Torralba, William T. Freeman; 04 Nov 2024) [VLM]
• Does equivariance matter at scale? (Johann Brehmer, S. Behrends, P. D. Haan, Taco S. Cohen; 30 Oct 2024)
• Scalable Message Passing Neural Networks: No Need for Attention in Large Graph Representation Learning (Haitz Sáez de Ocáriz Borde, Artem Lukoianov, Anastasis Kratsios, Michael M. Bronstein, Xiaowen Dong; 29 Oct 2024) [GNN]
• PESFormer: Boosting Macro- and Micro-expression Spotting with Direct Timestamp Encoding (Wang-Wang Yu, Kai-Fu Yang, Xiangrui Hu, Jingwen Jiang, Hong-Mei Yan, Yong-Jie Li; 24 Oct 2024)
• MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers (Zebin Yang, Renze Chen, Taiqiang Wu, Ngai Wong, Yun Liang, Runsheng Wang, R. Huang, Meng Li; 23 Oct 2024) [MQ]
• SeisLM: a Foundation Model for Seismic Waveforms (Tianlin Liu, Jannes Münchmeyer, Laura Laurenti, C. Marone, Maarten V. de Hoop, Ivan Dokmanić; 21 Oct 2024) [VLM]
• How much do contextualized representations encode long-range context? (Simeng Sun, Cheng-Ping Hsieh; 16 Oct 2024)
• What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis (Weronika Ormaniec, Felix Dangel, Sidak Pal Singh; 14 Oct 2024)
• Deep Transfer Learning for Breast Cancer Classification (Prudence Djagba, J. K. Buwa Mbouobda; 05 Sep 2024)
• Trans2Unet: Neural fusion for Nuclei Semantic Segmentation (Dinh-Phu Tran, Quoc-Anh Nguyen, Van-Truong Pham, Thi-Thao Tran; 24 Jul 2024) [ViT, MedIm]
• Scaling Retrieval-Based Language Models with a Trillion-Token Datastore (Rulin Shao, Jacqueline He, Akari Asai, Weijia Shi, Tim Dettmers, Sewon Min, Luke Zettlemoyer, Pang Wei Koh; 09 Jul 2024) [RALM]
• A Primal-Dual Framework for Transformers and Neural Networks (Tan M. Nguyen, Tam Nguyen, Nhat Ho, Andrea L. Bertozzi, Richard G. Baraniuk, Stanley J. Osher; 19 Jun 2024) [ViT]
• Elliptical Attention (Stefan K. Nielsen, Laziz U. Abdullaev, R. Teo, Tan M. Nguyen; 19 Jun 2024)
• Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis (R. Teo, Tan M. Nguyen; 19 Jun 2024)
• Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences (Zicheng Liu, Siyuan Li, Li Wang, Zedong Wang, Yunfan Liu, Stan Z. Li; 12 Jun 2024)
• On Limitation of Transformer for Learning HMMs (Jiachen Hu, Qinghua Liu, Chi Jin; 06 Jun 2024)
• Query2CAD: Generating CAD models using natural language queries (Akshay Badagabettu, Sai Sravan Yarlagadda, A. Farimani; 31 May 2024)
• Understanding and Minimising Outlier Features in Neural Network Training (Bobby He, Lorenzo Noci, Daniele Paliotta, Imanol Schlag, Thomas Hofmann; 29 May 2024)
• Mitigating Quantization Errors Due to Activation Spikes in GLU-Based LLMs (Jaewoo Yang, Hayun Kim, Younghoon Kim; 23 May 2024)
• SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization (Jialong Guo, Xinghao Chen, Yehui Tang, Yunhe Wang; 19 May 2024) [ViT]
• LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions (Victor Agostinelli, Sanghyun Hong, Lizhong Chen; 18 May 2024) [KELM]
• State-Free Inference of State-Space Models: The Transfer Function Approach (Rom N. Parnichkun, Stefano Massaroli, Alessandro Moro, Jimmy T.H. Smith, Ramin Hasani, ..., Hajime Asama, Stefano Ermon, Taiji Suzuki, Atsushi Yamashita, Michael Poli; 10 May 2024)
• SpaceByte: Towards Deleting Tokenization from Large Language Modeling (Kevin Slagle; 22 Apr 2024)
• MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts (Dengchun Li, Yingzi Ma, Naizheng Wang, Zhengmao Ye, Zhiyuan Cheng, ..., Yan Zhang, Lei Duan, Jie Zuo, Cal Yang, Mingjie Tang; 22 Apr 2024) [MoE]
• LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory (Zicheng Liu, Li Wang, Siyuan Li, Zedong Wang, Haitao Lin, Stan Z. Li; 17 Apr 2024) [VLM]
• Compression Represents Intelligence Linearly (Yuzhen Huang, Jinghan Zhang, Zifei Shan, Junxian He; 15 Apr 2024)
• Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length (Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou; 12 Apr 2024)
• Algorithmic progress in language models (Anson Ho, T. Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan Carl Guo, David Atkinson, Neil Thompson, J. Sevilla; 09 Mar 2024)
• Bridging Associative Memory and Probabilistic Modeling (Rylan Schaeffer, Nika Zahedi, Mikail Khona, Dhruv Pai, Sang T. Truong, ..., Sarthak Chandra, Andres Carranza, Ila Rani Fiete, Andrey Gromov, Oluwasanmi Koyejo; 15 Feb 2024) [DiffM]
• Low-rank Attention Side-Tuning for Parameter-Efficient Fine-Tuning (Ningyuan Tang, Minghao Fu, Ke Zhu, Jianxin Wu; 06 Feb 2024)
• Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens (Jiacheng Liu, Sewon Min, Luke Zettlemoyer, Yejin Choi, Hannaneh Hajishirzi; 30 Jan 2024)
• MoE-LLaVA: Mixture of Experts for Large Vision-Language Models (Bin Lin, Zhenyu Tang, Yang Ye, Jiaxi Cui, Bin Zhu, ..., Jinfa Huang, Junwu Zhang, Yatian Pang, Munan Ning, Li-ming Yuan; 29 Jan 2024) [VLM, MLLM, MoE]
• Reconstructing the Invisible: Video Frame Restoration through Siamese Masked Conditional Variational Autoencoder (Yongchen Zhou, Richard Jiang; 18 Jan 2024)
• Setting the Record Straight on Transformer Oversmoothing (G. Dovonon, M. Bronstein, Matt J. Kusner; 09 Jan 2024)
• Plug-and-Play Transformer Modules for Test-Time Adaptation (Xiangyu Chang, Sk. Miraj Ahmed, S. Krishnamurthy, Başak Güler, A. Swami, Samet Oymak, Amit K. Roy-Chowdhury; 06 Jan 2024)
• Early and Accurate Detection of Tomato Leaf Diseases Using TomFormer (Asim Khan, Umair Nawaz, K. Lochan, Lakmal D. Seneviratne, Irfan Hussain; 26 Dec 2023) [MedIm]
• Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals (Tam Nguyen, Tan-Minh Nguyen, Richard G. Baraniuk; 01 Dec 2023)
• Advancing State of the Art in Language Modeling (David Herel, Tomas Mikolov; 28 Nov 2023)
• Who is leading in AI? An analysis of industry AI research (Ben Cottier, T. Besiroglu, David Owen; 24 Nov 2023)
• Memory-efficient Stochastic methods for Memory-based Transformers (Vishwajit Kumar Vishnu, C. Sekhar; 14 Nov 2023)
• DACBERT: Leveraging Dependency Agreement for Cost-Efficient Bert Pretraining (Martin Kuo, Jianyi Zhang, Yiran Chen; 08 Nov 2023)