Finetuning Pretrained Transformers into RNNs
arXiv:2103.13076

24 March 2021
Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith

Papers citing "Finetuning Pretrained Transformers into RNNs"

22 papers shown
Conformal Transformations for Symmetric Power Transformers
Saurabh Kumar, Jacob Buckman, Carles Gelada, Sean Zhang
05 Mar 2025

Liger: Linearizing Large Language Models to Gated Recurrent Structures
Disen Lan, Weigao Sun, Jiaxi Hu, Jusen Du, Yu-Xi Cheng
03 Mar 2025

Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
Mutian He, Philip N. Garner
09 Oct 2024

Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
Aviv Bick, Kevin Y. Li, Eric P. Xing, J. Zico Kolter, Albert Gu
Topics: Mamba
19 Aug 2024

Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR
Rami Botros, Anmol Gulati, Tara N. Sainath, K. Choromanski, Ruoming Pang, Trevor Strohman, Weiran Wang, Jiahui Yu
Topics: MQ
31 Mar 2023

Efficient Attention via Control Variates
Lin Zheng, Jianbo Yuan, Chong-Jun Wang, Lingpeng Kong
09 Feb 2023

Training Integer-Only Deep Recurrent Neural Networks
V. Nia, Eyyub Sari, Vanessa Courville, M. Asgharian
Topics: MQ
22 Dec 2022

How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers
Michael Hassid, Hao Peng, Daniel Rotem, Jungo Kasai, Ivan Montero, Noah A. Smith, Roy Schwartz
07 Nov 2022

Transformers Learn Shortcuts to Automata
Bingbin Liu, Jordan T. Ash, Surbhi Goel, A. Krishnamurthy, Cyril Zhang
Topics: OffRL, LRM
19 Oct 2022

Recurrent Convolutional Neural Networks Learn Succinct Learning Algorithms
Surbhi Goel, Sham Kakade, Adam Tauman Kalai, Cyril Zhang
01 Sep 2022

Twist Decoding: Diverse Generators Guide Each Other
Jungo Kasai, Keisuke Sakaguchi, Ronan Le Bras, Hao Peng, Ximing Lu, Dragomir R. Radev, Yejin Choi, Noah A. Smith
Topics: SyDa
19 May 2022

Sequencer: Deep LSTM for Image Classification
Yuki Tatsunami, Masato Taki
Topics: VLM, ViT
04 May 2022

Linearizing Transformer with Key-Value Memory
Yizhe Zhang, Deng Cai
23 Mar 2022

SOFT: Softmax-free Transformer with Linear Complexity
Jiachen Lu, Jinghan Yao, Junge Zhang, Martin Danelljan, Hang Xu, Weiguo Gao, Chunjing Xu, Thomas B. Schön, Li Zhang
22 Oct 2021

Ripple Attention for Visual Perception with Sub-quadratic Complexity
Lin Zheng, Huijie Pan, Lingpeng Kong
06 Oct 2021

iRNN: Integer-only Recurrent Neural Network
Eyyub Sari, Vanessa Courville, V. Nia
Topics: MQ
20 Sep 2021

PermuteFormer: Efficient Relative Position Encoding for Long Sequences
Peng-Jen Chen
06 Sep 2021

Zero-Shot Text-to-Image Generation
Aditya A. Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever
Topics: VLM
24 Feb 2021

Shortformer: Better Language Modeling using Shorter Inputs
Ofir Press, Noah A. Smith, M. Lewis
31 Dec 2020

Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
Topics: VLM
28 Jul 2020

Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier
Topics: MoE
12 Mar 2020

Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Z. Yao, A. Gholami, Michael W. Mahoney, Kurt Keutzer
Topics: MQ
12 Sep 2019