The Impact of Positional Encoding on Length Generalization in Transformers
arXiv:2305.19466 · 31 May 2023
Amirhossein Kazemnejad, Inkit Padhi, K. Ramamurthy, Payel Das, Siva Reddy
Papers citing "The Impact of Positional Encoding on Length Generalization in Transformers" (37 of 137 shown)
- Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation. Zhenyu He, Guhao Feng, Shengjie Luo, Kai-Bo Yang, Liwei Wang, Jingjing Xu, Zhi Zhang, Hongxia Yang, Di He. 29 Jan 2024.
- Carrying over algorithm in transformers. J. Kruthoff. 15 Jan 2024.
- Extending LLMs' Context Window with 100 Samples. Yikai Zhang, Junlong Li, Pengfei Liu. 13 Jan 2024.
- Mission: Impossible Language Models. Julie Kallini, Isabel Papadimitriou, Richard Futrell, Kyle Mahowald, Christopher Potts. Tags: ELM, LRM. 12 Jan 2024.
- Attendre: Wait To Attend By Retrieval With Evicted Queries in Memory-Based Transformers for Long Context Processing. Zi Yang, Nan Hua. Tags: RALM. 10 Jan 2024.
- Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention. Kaiqiang Song, Xiaoyang Wang, Sangwoo Cho, Xiaoman Pan, Dong Yu. 14 Dec 2023.
- Interpretability Illusions in the Generalization of Simplified Models. Dan Friedman, Andrew Kyle Lampinen, Lucas Dixon, Danqi Chen, Asma Ghandeharioun. 06 Dec 2023.
- The Efficiency Spectrum of Large Language Models: An Algorithmic Survey. Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang. 01 Dec 2023.
- Positional Description Matters for Transformers Arithmetic. Ruoqi Shen, Sébastien Bubeck, Ronen Eldan, Yin Tat Lee, Yuanzhi Li, Yi Zhang. 22 Nov 2023.
- Automatic Restoration of Diacritics for Speech Data Sets. Sara Shatnawi, Sawsan Alqahtani, Hanan Aldarmaki. 15 Nov 2023.
- Navigating Scaling Laws: Compute Optimality in Adaptive Model Training. Sotiris Anagnostidis, Gregor Bachmann, Imanol Schlag, Thomas Hofmann. 06 Nov 2023.
- Quantum circuit synthesis with diffusion models. Florian Fürrutter, Gorka Muñoz-Gil, H. Briegel. Tags: AI4CE, DiffM. 03 Nov 2023.
- What Algorithms can Transformers Learn? A Study in Length Generalization. Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran. 24 Oct 2023.
- Extending Input Contexts of Language Models through Training on Segmented Sequences. Petros Karypis, Julian McAuley, George Karypis. 23 Oct 2023.
- The Locality and Symmetry of Positional Encodings. Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek. 19 Oct 2023.
- From Interpolation to Extrapolation: Complete Length Generalization for Arithmetic Transformers. Shaoxiong Duan, Yining Shi, Wei Xu. 18 Oct 2023.
- Functional Invariants to Watermark Large Transformers. Pierre Fernandez, Guillaume Couairon, Teddy Furon, Matthijs Douze. 17 Oct 2023.
- Uncovering hidden geometry in Transformers via disentangling position and context. Jiajun Song, Yiqiao Zhong. 07 Oct 2023.
- GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length. Hongye Jin, Xiaotian Han, Jingfeng Yang, Zhimeng Jiang, Chia-Yuan Chang, Xia Hu. 01 Oct 2023.
- In-Context Learning for Text Classification with Many Labels. Aristides Milios, Siva Reddy, Dzmitry Bahdanau. 19 Sep 2023.
- Neurons in Large Language Models: Dead, N-gram, Positional. Elena Voita, Javier Ferrando, Christoforos Nalmpantis. Tags: MILM. 09 Sep 2023.
- YaRN: Efficient Context Window Extension of Large Language Models. Bowen Peng, Jeffrey Quesnelle, Honglu Fan, Enrico Shippole. Tags: OSLM. 31 Aug 2023.
- LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models. Chi Han, Qifan Wang, Hao Peng, Wenhan Xiong, Yu Chen, Heng Ji, Sinong Wang. 30 Aug 2023.
- Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers. Jiawen Xie, Pengyu Cheng, Xiao Liang, Yong Dai, Nan Du. 25 Aug 2023.
- Code Llama: Open Foundation Models for Code. Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, ..., Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve. Tags: ELM, ALM. 24 Aug 2023.
- A Comprehensive Overview of Large Language Models. Humza Naveed, Asad Ullah Khan, Shi Qiu, Muhammad Saqib, Saeed Anwar, Muhammad Usman, Naveed Akhtar, Nick Barnes, Ajmal Mian. Tags: OffRL. 12 Jul 2023.
- Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer. Yuandong Tian, Yiping Wang, Beidi Chen, S. Du. Tags: MLT. 25 May 2023.
- MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation. Marco Bellagente, Manuel Brack, H. Teufel, Felix Friedrich, Bjorn Deiseroth, ..., Koen Oostermeijer, Andres Felipe Cruz Salinas, P. Schramowski, Kristian Kersting, Samuel Weinbach. 24 May 2023.
- Abstractors and relational cross-attention: An inductive bias for explicit relational reasoning in Transformers. Awni Altabaa, Taylor Webb, Jonathan D. Cohen, John Lafferty. 01 Apr 2023.
- What Language Model to Train if You Have One Million GPU Hours? Teven Le Scao, Thomas Wang, Daniel Hesslow, Lucile Saulnier, Stas Bekman, ..., Lintang Sutawika, Jaesung Tae, Zheng-Xin Yong, Julien Launay, Iz Beltagy. Tags: MoE, AI4CE. 27 Oct 2022.
- The Curious Case of Absolute Position Embeddings. Koustuv Sinha, Amirhossein Kazemnejad, Siva Reddy, J. Pineau, Dieuwke Hupkes, Adina Williams. 23 Oct 2022.
- Neural Networks and the Chomsky Hierarchy. Grégoire Delétang, Anian Ruoss, Jordi Grau-Moya, Tim Genewein, L. Wenliang, ..., Chris Cundy, Marcus Hutter, Shane Legg, Joel Veness, Pedro A. Ortega. Tags: UQCV. 05 Jul 2022.
- Training language models to follow instructions with human feedback. Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe. Tags: OSLM, ALM. 04 Mar 2022.
- Pretrained Language Models for Text Generation: A Survey. Junyi Li, Tianyi Tang, Wayne Xin Zhao, J. Nie, Ji-Rong Wen. Tags: AI4CE. 14 Jan 2022.
- SHAPE: Shifted Absolute Position Embedding for Transformers. Shun Kiyono, Sosuke Kobayashi, Jun Suzuki, Kentaro Inui. 13 Sep 2021.
- Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation. Ofir Press, Noah A. Smith, M. Lewis. 27 Aug 2021.
- From Local Structures to Size Generalization in Graph Neural Networks. Gilad Yehudai, Ethan Fetaya, E. Meirom, Gal Chechik, Haggai Maron. Tags: GNN, AI4CE. 17 Oct 2020.