arXiv:2311.00684
Attention Alignment and Flexible Positional Embeddings Improve Transformer Length Extrapolation
1 November 2023
Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky
Papers citing "Attention Alignment and Flexible Positional Embeddings Improve Transformer Length Extrapolation"
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, ..., Jidong Zhai, Wenguang Chen, Peng Zhang, Yuxiao Dong, Jie Tang
05 Oct 2022
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
Ofir Press, Noah A. Smith, M. Lewis
27 Aug 2021