Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.19466
Cited By
The Impact of Positional Encoding on Length Generalization in Transformers
31 May 2023
Amirhossein Kazemnejad
Inkit Padhi
K. Ramamurthy
Payel Das
Siva Reddy
Re-assign community
ArXiv
PDF
HTML
Papers citing
"The Impact of Positional Encoding on Length Generalization in Transformers"
50 / 137 papers shown
Title
Dyadic Mamba: Long-term Dyadic Human Motion Synthesis
Julian Tanke
Takashi Shibuya
Kengo Uchida
Koichi Saito
Yuki Mitsufuji
Mamba
47
0
0
14 May 2025
Overflow Prevention Enhances Long-Context Recurrent LLMs
Assaf Ben-Kish
Itamar Zimerman
M. Jehanzeb Mirza
James R. Glass
Leonid Karlinsky
Raja Giryes
LRM
32
0
0
12 May 2025
Rethinking Invariance in In-context Learning
Lizhe Fang
Yifei Wang
Khashayar Gatmiry
Lei Fang
Yishuo Wang
54
2
0
08 May 2025
Intra-Layer Recurrence in Transformers for Language Modeling
Anthony Nguyen
Wenjun Lin
29
0
0
03 May 2025
A Comparative Study on Positional Encoding for Time-frequency Domain Dual-path Transformer-based Source Separation Models
Kohei Saijo
Tetsuji Ogawa
52
1
0
28 Apr 2025
Random Long-Context Access for Mamba via Hardware-aligned Hierarchical Sparse Attention
Xiang Hu
Jiaqi Leng
Jun Zhao
Kewei Tu
Wei Wu
Mamba
50
0
0
23 Apr 2025
LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers
M. Chowdhury
Md Rifat Ur Rahman
Akil Ahmad Taki
27
0
0
19 Apr 2025
SWAN-GPT: An Efficient and Scalable Approach for Long-Context Language Modeling
Krishna C. Puvvada
Faisal Ladhak
Santiago Akle Serrano
Cheng-Ping Hsieh
Shantanu Acharya
...
Fei Jia
Samuel Kriman
Simeng Sun
Dima Rekesh
Boris Ginsburg
RALM
60
0
0
11 Apr 2025
Between Linear and Sinusoidal: Rethinking the Time Encoder in Dynamic Graph Learning
Hsing-Huan Chung
Shravan Chaudhari
Xing Han
Yoav Wald
S. Saria
Joydeep Ghosh
AI4TS
36
0
0
10 Apr 2025
On Vanishing Variance in Transformer Length Generalization
Ruining Li
Gabrijel Boduljak
Jensen
Zhou
41
0
0
03 Apr 2025
Spline-based Transformers
Prashanth Chandran
Agon Serifi
Markus Gross
Moritz Bächer
41
0
0
03 Apr 2025
TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper
Roland Fernandez
P. Smolensky
Jianfeng Gao
46
0
0
29 Mar 2025
SkyLadder: Better and Faster Pretraining via Context Window Scheduling
Tongyao Zhu
Qian Liu
Haonan Wang
Shiqi Chen
Xiangming Gu
Tianyu Pang
Min-Yen Kan
44
0
0
19 Mar 2025
Depth-Aware Range Image-Based Model for Point Cloud Segmentation
Bike Chen
Antti Tikänmaki
Juha Roning
3DPC
3DV
55
0
0
19 Mar 2025
A Survey on Transformer Context Extension: Approaches and Evaluation
Yijun Liu
Jinzheng Yu
Yang Xu
Zhongyang Li
Qingfu Zhu
LLMAG
68
0
0
17 Mar 2025
Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More
Arvid Frydenlund
LRM
48
0
0
13 Mar 2025
EnergyFormer: Energy Attention with Fourier Embedding for Hyperspectral Image Classification
Shri Kiran Srinivasan
Muhammad Usama
Usman Ghous
Manuel Mazzara
Salvatore Distefano
Muhammad Ahmad
64
1
0
11 Mar 2025
Context-aware Biases for Length Extrapolation
Ali Veisi
Amir Mansourian
55
0
0
11 Mar 2025
LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding
Shen Zhang
Yaning Tan
Siyuan Liang
Zhaowei Chen
Linze Li
...
Shuheng Li
Zhenyu Zhao
Caihua Chen
Jiajun Liang
Yao Tang
51
0
0
06 Mar 2025
Conformal Transformations for Symmetric Power Transformers
Saurabh Kumar
Jacob Buckman
Carles Gelada
Sean Zhang
70
0
0
05 Mar 2025
Finite State Automata Inside Transformers with Chain-of-Thought: A Mechanistic Study on State Tracking
Yifan Zhang
Wenyu Du
Dongming Jin
Jie Fu
Zhi Jin
LRM
53
0
0
27 Feb 2025
Policy-as-Prompt: Rethinking Content Moderation in the Age of Large Language Models
Konstantina Palla
José Luis Redondo García
C. Hauff
Francesco Fabbri
Henrik Lindström
Daniel R. Taber
Andreas Damianou
M. Lalmas
AILaw
67
0
0
25 Feb 2025
Distributional Scaling Laws for Emergent Capabilities
Rosie Zhao
Tian Qin
David Alvarez-Melis
Sham Kakade
Naomi Saphra
LRM
39
0
0
24 Feb 2025
The Role of Sparsity for Length Generalization in Transformers
Noah Golowich
Samy Jelassi
David Brandfonbrener
Sham Kakade
Eran Malach
42
0
0
24 Feb 2025
Solving Empirical Bayes via Transformers
Anzo Teh
Mark Jabbour
Yury Polyanskiy
93
0
0
17 Feb 2025
The Rotary Position Embedding May Cause Dimension Inefficiency in Attention Heads for Long-Distance Retrieval
Ting-Rui Chiang
Dani Yogatama
41
0
0
16 Feb 2025
Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning
Jean Vassoyan
Nathanaël Beau
Roman Plaud
OffRL
100
1
0
10 Feb 2025
Learning the RoPEs: Better 2D and 3D Position Encodings with STRING
Connor Schenck
Isaac Reid
M. Jacob
Alex Bewley
Joshua Ainslie
...
Matthias Minderer
Dmitry Kalashnikov
Jonathan Tompson
Vikas Sindhwani
Krzysztof Choromanski
66
1
0
04 Feb 2025
Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges
Nayoung Lee
Ziyang Cai
Avi Schwarzschild
Kangwook Lee
Dimitris Papailiopoulos
ReLM
VLM
LRM
AI4CE
83
4
0
03 Feb 2025
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum
Hadi Pouransari
Chun-Liang Li
Jen-Hao Rick Chang
Pavan Kumar Anasosalu Vasu
Cem Koc
Vaishaal Shankar
Oncel Tuzel
42
8
0
08 Jan 2025
Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding
Jiajun Zhu
Peihao Wang
Ruisi Cai
Jason D. Lee
Pan Li
Zhilin Wang
KELM
51
1
0
03 Jan 2025
Out-of-distribution generalization via composition: a lens through induction heads in Transformers
Jiajun Song
Zhuoyan Xu
Yiqiao Zhong
85
4
0
31 Dec 2024
Precise Length Control in Large Language Models
Bradley Butcher
Michael O'Keefe
James Titchener
KELM
80
4
0
16 Dec 2024
Understanding Knowledge Hijack Mechanism in In-context Learning through Associative Memory
Shuo Wang
Issei Sato
76
0
0
16 Dec 2024
HIST-AID: Leveraging Historical Patient Reports for Enhanced Multi-Modal Automatic Diagnosis
Haoxu Huang
Cem M. Deniz
K. Cho
S. Chopra
Divyam Madaan
34
1
0
16 Nov 2024
Quantifying artificial intelligence through algebraic generalization
Takuya Ito
Murray Campbell
L. Horesh
Tim Klinger
Parikshit Ram
ELM
51
0
0
08 Nov 2024
Number Cookbook: Number Understanding of Language Models and How to Improve It
Haotong Yang
Yi Hu
Shijia Kang
Zhouchen Lin
Muhan Zhang
LRM
46
2
0
06 Nov 2024
Provable Length Generalization in Sequence Prediction via Spectral Filtering
Annie Marsden
Evan Dogariu
Naman Agarwal
Xinyi Chen
Daniel Suo
Elad Hazan
34
1
0
01 Nov 2024
NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs
Nursena Köprücü
Destiny Okpekpe
Antonio Orvieto
Mamba
44
1
0
31 Oct 2024
HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation
Yuhan Chen
Ang Lv
Jian Luan
Bin Wang
Wei Liu
33
4
0
28 Oct 2024
Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study
Shawn Tan
Yikang Shen
Songlin Yang
Aaron C. Courville
Rameswar Panda
30
4
0
23 Oct 2024
Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs
Xin Ma
Yang Liu
Jiaheng Liu
Xiaoxu Ma
28
1
0
21 Oct 2024
The Mystery of the Pathological Path-star Task for Language Models
Arvid Frydenlund
LRM
27
4
0
17 Oct 2024
ControlMM: Controllable Masked Motion Generation
Ekkasit Pinyoanuntapong
Muhammad Usama Saleem
Korrawe Karunratanakul
Pu Wang
Hongfei Xue
Cheng Chen
Chuan Guo
Junli Cao
J. Ren
Sergey Tulyakov
VGen
37
4
0
14 Oct 2024
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers
Enze Xie
Junsong Chen
Junyu Chen
Han Cai
Haotian Tang
...
Zhekai Zhang
Muyang Li
Ligeng Zhu
Yaojie Lu
Song Han
VLM
46
49
0
14 Oct 2024
Low-Dimension-to-High-Dimension Generalization And Its Implications for Length Generalization
Yang Chen
Yitao Liang
Zhouchen Lin
37
1
0
11 Oct 2024
Visual Scratchpads: Enabling Global Reasoning in Vision
Aryo Lotfi
Enrico Fini
Samy Bengio
Moin Nabi
Emmanuel Abbe
LRM
39
0
0
10 Oct 2024
MLissard: Multilingual Long and Simple Sequential Reasoning Benchmarks
M. Bueno
R. Lotufo
Rodrigo Nogueira
LRM
31
0
0
08 Oct 2024
Round and Round We Go! What makes Rotary Positional Encodings useful?
Federico Barbero
Alex Vitvitskyi
Christos Perivolaropoulos
Razvan Pascanu
Petar Velickovic
83
16
0
08 Oct 2024
DAPE V2: Process Attention Score as Feature Map for Length Extrapolation
Chuanyang Zheng
Yihang Gao
Han Shi
Jing Xiong
Jiankai Sun
...
Xiaozhe Ren
Michael Ng
Xin Jiang
Zhenguo Li
Yu Li
36
2
0
07 Oct 2024
1
2
3
Next