LSG Attention: Extrapolation of pretrained Transformers to long sequences
Charles Condevaux, S. Harispe
13 October 2022 · arXiv: 2210.15497
Papers citing "LSG Attention: Extrapolation of pretrained Transformers to long sequences" (6 of 6 shown):
1. Wormhole Memory: A Rubik's Cube for Cross-Dialogue Retrieval
   Libo Wang · 24 Jan 2025

2. Fovea Transformer: Efficient Long-Context Modeling with Structured Fine-to-Coarse Attention
   Ziwei He, Jian Yuan, Le Zhou, Jingwen Leng, Bo Jiang · 13 Nov 2023

3. PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization
   Wen Xiao, Iz Beltagy, Giuseppe Carenini, Arman Cohan · 16 Oct 2021

4. LexGLUE: A Benchmark Dataset for Legal Language Understanding in English
   Ilias Chalkidis, Abhik Jana, D. Hartung, M. Bommarito, Ion Androutsopoulos, Daniel Martin Katz, Nikolaos Aletras · 03 Oct 2021

5. Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
   Ofir Press, Noah A. Smith, M. Lewis · 27 Aug 2021

6. Big Bird: Transformers for Longer Sequences
   Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed · 28 Jul 2020