SMYRF: Efficient Attention using Asymmetric Clustering

11 October 2020

Papers citing "SMYRF: Efficient Attention using Asymmetric Clustering"

Title
Accelerating Transformers with Spectrum-Preserving Token Merging Hoai-Chau Tran D. M. Nguyen Duy M. Nguyen Trung Thanh Nguyen Ngan Le Pengtao Xie Daniel Sonntag James Y. Zou Binh T. Nguyen Mathias Niepert 39 8 0 25 May 2024
An Evaluation of Memory Optimization Methods for Training Neural Networks Xiaoxuan Liu Siddharth Jha Alvin Cheung 26 0 0 26 Mar 2023
Hungry Hungry Hippos: Towards Language Modeling with State Space Models Daniel Y. Fu Tri Dao Khaled Kamal Saab A. Thomas Atri Rudra Christopher Ré 73 370 0 28 Dec 2022
Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost Sungjun Cho Seonwoo Min Jinwoo Kim Moontae Lee Honglak Lee Seunghoon Hong 38 3 0 27 Oct 2022
Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning Weicong Liang Yuhui Yuan Henghui Ding Xiao Luo Weihong Lin Ding Jia Zheng-Wei Zhang Chao Zhang Hanhua Hu 29 25 0 03 Oct 2022
Efficient Methods for Natural Language Processing: A Survey Marcos Vinícius Treviso Ji-Ung Lee Tianchu Ji Betty van Aken Qingqing Cao ... Emma Strubell Niranjan Balasubramanian Leon Derczynski Iryna Gurevych Roy Schwartz 28 109 0 31 Aug 2022
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness Tri Dao Daniel Y. Fu Stefano Ermon Atri Rudra Christopher Ré VLM 63 2,024 0 27 May 2022
cosFormer: Rethinking Softmax in Attention Zhen Qin Weixuan Sun Huicai Deng Dongxu Li Yunshen Wei Baohong Lv Junjie Yan Lingpeng Kong Yiran Zhong 24 212 0 17 Feb 2022
Efficient Content-Based Sparse Attention with Routing Transformers Aurko Roy M. Saffar Ashish Vaswani David Grangier MoE 243 580 0 12 Mar 2020
What is the State of Neural Network Pruning? Davis W. Blalock Jose Javier Gonzalez Ortiz Jonathan Frankle John Guttag 191 1,027 0 06 Mar 2020
A Style-Based Generator Architecture for Generative Adversarial Networks Tero Karras S. Laine Timo Aila 282 10,354 0 12 Dec 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 297 6,959 0 20 Apr 2018
Effective Approaches to Attention-based Neural Machine Translation Thang Luong Hieu H. Pham Christopher D. Manning 218 7,926 0 17 Aug 2015