ResearchTrend.AI


The Impact of Positional Encoding on Length Generalization in Transformers
Amirhossein Kazemnejad, Inkit Padhi, K. Ramamurthy, Payel Das, Siva Reddy
arXiv:2305.19466 · 31 May 2023

Papers citing "The Impact of Positional Encoding on Length Generalization in Transformers"

50 / 137 papers shown
InfiniPot: Infinite Context Processing on Memory-Constrained LLMs
Minsoo Kim, Kyuhong Shim, Jungwook Choi, Simyung Chang
02 Oct 2024

Extending Context Window of Large Language Models from a Distributional Perspective
Yingsheng Wu, Yuxuan Gu, Xiaocheng Feng, Weihong Zhong, Dongliang Xu, Qing Yang, Hongtao Liu, Bing Qin
02 Oct 2024

Positional Attention: Expressivity and Learnability of Algorithmic Computation
Artur Back de Luca, George Giapitzakis, Shenghao Yang, Petar Veličković, K. Fountoulakis
02 Oct 2024

Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs
Mehdi Ali, Michael Fromm, Klaudia Thellmann, Jan Ebert, Alexander Arno Weber, ..., René Jäkel, Georg Rehm, Stefan Kesselheim, Joachim Köhler, Nicolas Flores-Herr
30 Sep 2024

Scaling Behavior for Large Language Models regarding Numeral Systems: An Example using Pythia
Zhejian Zhou, Jiayu Wang, Dahua Lin, Kai Chen
25 Sep 2024

TeXBLEU: Automatic Metric for Evaluate LaTeX Format
Kyudan Jung, N. Kim, Hyongon Ryu, Sieun Hyeon, Seung-jun Lee, Hyeok-jae Lee
10 Sep 2024

Your Context Is Not an Array: Unveiling Random Access Limitations in Transformers
MohammadReza Ebrahimi, Sunny Panchal, Roland Memisevic
10 Aug 2024

Representing Rule-based Chatbots with Transformers
Dan Friedman, Abhishek Panigrahi, Danqi Chen
15 Jul 2024

Human-like Episodic Memory for Infinite Context LLMs
Z. Fountas, Martin A Benfeghoul, Adnan Oomerjee, Fenia Christopoulou, Gerasimos Lampouras, Haitham Bou-Ammar, Jun Wang
12 Jul 2024
Teaching Transformers Causal Reasoning through Axiomatic Training
Aniket Vashishtha, Abhinav Kumar, Abbavaram Gowtham Reddy, Vineeth N. Balasubramanian, Amit Sharma
10 Jul 2024
On the Power of Convolution Augmented Transformer
Mingchen Li, Xuechen Zhang, Yixiao Huang, Samet Oymak
08 Jul 2024

Universal Length Generalization with Turing Programs
Kaiying Hou, David Brandfonbrener, Sham Kakade, Samy Jelassi, Eran Malach
03 Jul 2024

Multi-State-Action Tokenisation in Decision Transformers for Multi-Discrete Action Spaces
Perusha Moodley, Pramod S. Kaushik, Dhillu Thambi, Mark Trovinger, Praveen Paruchuri, Xia Hong, Benjamin Rosman
01 Jul 2024

Eliminating Position Bias of Language Models: A Mechanistic Approach
Ziqi Wang, Hanlin Zhang, Xiner Li, Kuan-Hao Huang, Chi Han, Shuiwang Ji, Sham Kakade, Hao Peng, Heng Ji
01 Jul 2024

DeciMamba: Exploring the Length Extrapolation Potential of Mamba
Assaf Ben-Kish, Itamar Zimerman, Shady Abu Hussein, Nadav Cohen, Amir Globerson, Lior Wolf, Raja Giryes
20 Jun 2024

What Can We Learn from State Space Models for Machine Learning on Graphs?
Yinan Huang, Siqi Miao, Pan Li
09 Jun 2024

Explicitly Encoding Structural Symmetry is Key to Length Generalization in Arithmetic Tasks
Mahdi Sabbaghi, George Pappas, Hamed Hassani, Surbhi Goel
04 Jun 2024

Contextual Counting: A Mechanistic Study of Transformers on a Quantitative Task
Siavash Golkar, Alberto Bietti, Mariel Pettee, Michael Eickenberg, M. Cranmer, ..., Ruben Ohana, Liam Parker, Bruno Régaldo-Saint Blancard, Kyunghyun Cho, Shirley Ho
30 May 2024

Language Models Need Inductive Biases to Count Inductively
Yingshan Chang, Yonatan Bisk
30 May 2024

XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference
Shengnan Wang, Youhui Bai, Lin Zhang, Pingyi Zhou, Shixiong Zhao, Gong Zhang, Sen Wang, Renhai Chen, Hua Xu, Hongwei Sun
28 May 2024

Transformers Can Do Arithmetic with the Right Embeddings
Sean McLeish, Arpit Bansal, Alex Stein, Neel Jain, John Kirchenbauer, ..., B. Kailkhura, A. Bhatele, Jonas Geiping, Avi Schwarzschild, Tom Goldstein
27 May 2024

Base of RoPE Bounds Context Length
Xin Men, Mingyu Xu, Bingning Wang, Qingyu Zhang, Hongyu Lin, Xianpei Han, Weipeng Chen
23 May 2024

LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
A. Fuller, Daniel G. Kyrollos, Yousef Yassin, James R. Green
22 May 2024
Transforming the Bootstrap: Using Transformers to Compute Scattering Amplitudes in Planar N = 4 Super Yang-Mills Theory
Tianji Cai, G. W. Merz, François Charton, Niklas Nolte, Matthias Wilhelm, K. Cranmer, Lance J. Dixon
09 May 2024
Towards Less Biased Data-driven Scoring with Deep Learning-Based End-to-end Database Search in Tandem Mass Spectrometry
Yonghan Yu, Ming Li
08 May 2024

Philosophy of Cognitive Science in the Age of Deep Learning
Raphaël Millière
07 May 2024

PoPE: Legendre Orthogonal Polynomials Based Position Encoding for Large Language Models
Arpit Aggarwal
29 Apr 2024

Length Generalization of Causal Transformers without Position Encoding
Jie Wang, Tao Ji, Yuanbin Wu, Hang Yan, Tao Gui, Qi Zhang, Xuanjing Huang, Xiaoling Wang
18 Apr 2024

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal
10 Apr 2024

Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation
Thomas Merth, Qichen Fu, Mohammad Rastegari, Mahyar Najibi
10 Apr 2024

A Theory for Length Generalization in Learning to Reason
Changnan Xiao, Bing Liu
31 Mar 2024

MEP: Multiple Kernel Learning Enhancing Relative Positional Encoding Length Extrapolation
Weiguo Gao
26 Mar 2024

VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
Yumeng Li, William H. Beluch, M. Keuper, Dan Zhang, Anna Khoreva
20 Mar 2024
Larimar: Large Language Models with Episodic Memory Control
Payel Das, Subhajit Chaudhury, Elliot Nelson, Igor Melnyk, Sarath Swaminathan, ..., Vijil Chenthamarakshan, Jiří Navrátil, Soham Dan, Pin-Yu Chen
18 Mar 2024
Denoising Autoregressive Representation Learning
Yazhe Li, J. Bornschein, Ting Chen
08 Mar 2024

HeAR -- Health Acoustic Representations
Sebastien Baur, Zaid Nabulsi, Wei-Hung Weng, Jake Garrison, Louis Blankemeier, ..., Shwetak N. Patel, S. Shetty, Shruthi Prabhakara, Monde Muyoyeta, Diego Ardila
04 Mar 2024

Resonance RoPE: Improving Context Length Generalization of Large Language Models
Suyuchen Wang, I. Kobyzev, Peng Lu, Mehdi Rezagholizadeh, Bang Liu
29 Feb 2024

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Soham De, Samuel L. Smith, Anushan Fernando, Aleksandar Botev, George-Christian Muraru, ..., David Budden, Yee Whye Teh, Razvan Pascanu, Nando de Freitas, Çağlar Gülçehre
29 Feb 2024

Case-Based or Rule-Based: How Do Transformers Do the Math?
Yi Hu, Xiaojuan Tang, Haotong Yang, Muhan Zhang
27 Feb 2024

Training-Free Long-Context Scaling of Large Language Models
Chen An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, Lingpeng Kong
27 Feb 2024

Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies
Flavio Petruzzellis, Alberto Testolin, A. Sperduti
27 Feb 2024

Seamless Human Motion Composition with Blended Positional Encodings
Germán Barquero, Sergio Escalera, Cristina Palmero
23 Feb 2024

Structure-informed Positional Encoding for Music Generation
Manvi Agarwal, Changhong Wang, Gaël Richard
20 Feb 2024

Transformers Can Achieve Length Generalization But Not Robustly
Yongchao Zhou, Uri Alon, Xinyun Chen, Xuezhi Wang, Rishabh Agarwal, Denny Zhou
14 Feb 2024

Lissard: Long and Simple Sequential Reasoning Datasets
M. Bueno, R. Lotufo, Rodrigo Nogueira
12 Feb 2024

How do Transformers perform In-Context Autoregressive Learning?
Michael E. Sander, Raja Giryes, Taiji Suzuki, Mathieu Blondel, Gabriel Peyré
08 Feb 2024

On Provable Length and Compositional Generalization
Kartik Ahuja, Amin Mansouri
07 Feb 2024

A phase transition between positional and semantic learning in a solvable model of dot-product attention
Hugo Cui, Freya Behrens, Florent Krzakala, Lenka Zdeborová
06 Feb 2024

Breaking Symmetry When Training Transformers
Chunsheng Zuo, Michael Guerzhoy
06 Feb 2024

Repeat After Me: Transformers are Better than State Space Models at Copying
Samy Jelassi, David Brandfonbrener, Sham Kakade, Eran Malach
01 Feb 2024