Manifold-Preserving Transformers are Effective for Short-Long Range Encoding

22 October 2023

Papers citing "Manifold-Preserving Transformers are Effective for Short-Long Range Encoding"

5 / 5 papers shown

Title
Shortformer: Better Language Modeling using Shorter Inputs Ofir Press Noah A. Smith M. Lewis 230 89 0 31 Dec 2020
Big Bird: Transformers for Longer Sequences Manzil Zaheer Guru Guruganesh Kumar Avinava Dubey Joshua Ainslie Chris Alberti ... Philip Pham Anirudh Ravula Qifan Wang Li Yang Amr Ahmed VLM 288 2,017 0 28 Jul 2020
Similarity Analysis of Contextual Word Representation Models John M. Wu Yonatan Belinkov Hassan Sajjad Nadir Durrani Fahim Dalvi James R. Glass 51 73 0 03 May 2020
The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives Elena Voita Rico Sennrich Ivan Titov 207 181 0 03 Sep 2019
Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks Lechao Xiao Yasaman Bahri Jascha Narain Sohl-Dickstein S. Schoenholz Jeffrey Pennington 244 348 0 14 Jun 2018