A Unified View of Long-Sequence Models towards Modeling Million-Scale Dependencies

Hongyu Hè, Marko Kabić
arXiv:2302.06218 · 13 February 2023

Papers citing "A Unified View of Long-Sequence Models towards Modeling Million-Scale Dependencies"

15 of 15 citing papers shown. Each entry lists the title, authors, topic tags where assigned, the site's three activity counters as displayed on the listing, and the publication date.

Lambda-Skip Connections: the architectural component that prevents Rank Collapse
Federico Arangath Joseph, Jerome Sieber, Melanie Zeilinger, Carmen Amo Alonso
14 Oct 2024 · 118 / 0 / 0

Learning to Drop Out: An Adversarial Approach to Training Sequence VAEs
Đorđe Miladinović, Kumar Shridhar, Kushal Kumar Jain, Max B. Paulus, J. M. Buhmann, Mrinmaya Sachan, Carl Allen
DRL · 26 Sep 2022 · 67 / 5 / 0

On the Parameterization and Initialization of Diagonal State Space Models
Albert Gu, Ankit Gupta, Karan Goel, Christopher Ré
23 Jun 2022 · 51 / 308 / 0

Memorizing Transformers
Yuhuai Wu, M. Rabe, DeLesley S. Hutchins, Christian Szegedy
RALM · 16 Mar 2022 · 54 / 175 / 0

Mixture-of-Experts with Expert Choice Routing
Yanqi Zhou, Tao Lei, Hanxiao Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M. Dai, Zhifeng Chen, Quoc V. Le, James Laudon
MoE · 18 Feb 2022 · 231 / 341 / 0

Combiner: Full Attention Transformer with Sparse Computation Cost
Hongyu Ren, H. Dai, Zihang Dai, Mengjiao Yang, J. Leskovec, Dale Schuurmans, Bo Dai
12 Jul 2021 · 94 / 79 / 0

CoAtNet: Marrying Convolution and Attention for All Data Sizes
Zihang Dai, Hanxiao Liu, Quoc V. Le, Mingxing Tan
ViT · 09 Jun 2021 · 91 / 1,188 / 0

MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy
04 May 2021 · 381 / 2,638 / 0

Rethinking Attention with Performers
K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, ..., Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller
30 Sep 2020 · 144 / 1,548 / 0

Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
VLM · 28 Jul 2020 · 484 / 2,051 / 0

Linformer: Self-Attention with Linear Complexity
Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma
08 Jun 2020 · 170 / 1,678 / 0

Longformer: The Long-Document Transformer
Iz Beltagy, Matthew E. Peters, Arman Cohan
RALM, VLM · 10 Apr 2020 · 95 / 3,996 / 0

Axial Attention in Multidimensional Transformers
Jonathan Ho, Nal Kalchbrenner, Dirk Weissenborn, Tim Salimans
20 Dec 2019 · 78 / 525 / 0

Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel
Yao-Hung Hubert Tsai, Shaojie Bai, M. Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov
30 Aug 2019 · 91 / 251 / 0

Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
Junyoung Chung, Çağlar Gülçehre, Kyunghyun Cho, Yoshua Bengio
11 Dec 2014 · 293 / 12,662 / 0