arXiv: 2305.16843
Randomized Positional Encodings Boost Length Generalization of Transformers
26 May 2023
Anian Ruoss
Grégoire Delétang
Tim Genewein
Jordi Grau-Moya
Róbert Csordás
Mehdi Abbana Bennani
Shane Legg
J. Veness
LLMAG
Papers citing "Randomized Positional Encodings Boost Length Generalization of Transformers" (22 papers)
Spline-based Transformers
Prashanth Chandran
Agon Serifi
Markus Gross
Moritz Bächer
46
0
0
03 Apr 2025
TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper
Roland Fernandez
P. Smolensky
Jianfeng Gao
48
0
0
29 Mar 2025
Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More
Arvid Frydenlund
LRM
63
0
0
13 Mar 2025
Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges
Nayoung Lee
Ziyang Cai
Avi Schwarzschild
Kangwook Lee
Dimitris Papailiopoulos
ReLM
VLM
LRM
AI4CE
85
4
0
03 Feb 2025
Generative Retrieval for Book Search
Yubao Tang
Ruqing Zhang
Jiafeng Guo
Maarten de Rijke
Shihao Liu
Shuaiqiang Wang
Dawei Yin
Xueqi Cheng
RALM
35
0
0
19 Jan 2025
Investigating Length Issues in Document-level Machine Translation
Ziqian Peng
Rachel Bawden
François Yvon
77
1
0
23 Dec 2024
Training Neural Networks as Recognizers of Formal Languages
Alexandra Butoi
Ghazal Khalighinejad
Anej Svete
Josef Valvoda
Ryan Cotterell
Brian DuSell
NAI
44
2
0
11 Nov 2024
LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation
Mufei Li
Viraj Shitole
Eli Chien
Changhai Man
Zhaodong Wang
Srinivas Sridharan
Ying Zhang
Tushar Krishna
P. Li
53
0
0
04 Nov 2024
TULIP: Token-length Upgraded CLIP
Ivona Najdenkoska
Mohammad Mahdi Derakhshani
Yuki M. Asano
Nanne van Noord
Marcel Worring
Cees G. M. Snoek
VLM
53
3
0
13 Oct 2024
Round and Round We Go! What makes Rotary Positional Encodings useful?
Federico Barbero
Alex Vitvitskyi
Christos Perivolaropoulos
Razvan Pascanu
Petar Veličković
85
19
0
08 Oct 2024
The CLRS-Text Algorithmic Reasoning Language Benchmark
Larisa Markeeva
Sean McLeish
Borja Ibarz
Wilfried Bounsi
Olga Kozlova
Alex Vitvitskyi
Charles Blundell
Tom Goldstein
Avi Schwarzschild
Petar Veličković
LRM
43
12
0
06 Jun 2024
Explicitly Encoding Structural Symmetry is Key to Length Generalization in Arithmetic Tasks
Mahdi Sabbaghi
George Pappas
Hamed Hassani
Surbhi Goel
45
4
0
04 Jun 2024
A Large Language Model Enhanced Sequential Recommender for Joint Video and Comment Recommendation
Bowen Zheng
Zihan Lin
Enze Liu
Chen Yang
Enyang Bai
Cheng Ling
Wayne Xin Zhao
Ji-Rong Wen
40
4
0
20 Mar 2024
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
Yumeng Li
William H. Beluch
Margret Keuper
Dan Zhang
Anna Khoreva
DiffM
VGen
89
5
0
20 Mar 2024
Positional Encoding Helps Recurrent Neural Networks Handle a Large Vocabulary
Takashi Morita
29
3
0
31 Jan 2024
Positional Description Matters for Transformers Arithmetic
Ruoqi Shen
Sébastien Bubeck
Ronen Eldan
Yin Tat Lee
Yuanzhi Li
Yi Zhang
47
38
0
22 Nov 2023
Language Modeling Is Compression
Grégoire Delétang
Anian Ruoss
Paul-Ambroise Duquenne
Elliot Catt
Tim Genewein
...
Wenliang Kevin Li
Matthew Aitchison
Laurent Orseau
Marcus Hutter
J. Veness
AI4CE
53
133
0
19 Sep 2023
Giraffe: Adventures in Expanding Context Lengths in LLMs
Arka Pal
Deep Karkhanis
Manley Roberts
Samuel Dooley
Arvind Sundararajan
Siddartha Naidu
39
40
0
21 Aug 2023
Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers
K. Choromanski
Shanda Li
Valerii Likhosherstov
Kumar Avinava Dubey
Shengjie Luo
Di He
Yiming Yang
Tamás Sarlós
Thomas Weingarten
Adrian Weller
39
8
0
03 Feb 2023
Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks
Yuxuan Li
James L. McClelland
55
17
0
02 Oct 2022
Neural Networks and the Chomsky Hierarchy
Grégoire Delétang
Anian Ruoss
Jordi Grau-Moya
Tim Genewein
L. Wenliang
...
Chris Cundy
Marcus Hutter
Shane Legg
Joel Veness
Pedro A. Ortega
UQCV
109
133
0
05 Jul 2022
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press
Noah A. Smith
M. Lewis
253
710
0
27 Aug 2021