The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry

6 February 2024
Michael Zhang
Kush S. Bhatia
Hermann Kumbong
Christopher Ré
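The paper's title refers to linear attention: replacing the quadratic softmax attention computation with a kernelized form whose feature map is trained to mimic softmax attention weights. As rough orientation only, the NumPy sketch below shows the generic linear-attention reordering that such methods rely on; the exp-based feature map and the toy dimensions are illustrative assumptions, not the paper's learned parameterization.

```python
# Sketch of the linear-attention idea: softmax(Q K^T) V costs O(n^2) in the
# sequence length, while a positive feature map phi lets us reorder the
# computation to phi(Q) (phi(K)^T V), which is O(n).
import numpy as np

def feature_map(x):
    # Illustrative positive feature map (not the paper's learned map):
    # concatenating exp(x) and exp(-x) keeps all entries positive.
    return np.concatenate([np.exp(x), np.exp(-x)], axis=-1)

def linear_attention(Q, K, V):
    """O(n) attention: phi(Q) @ (phi(K)^T @ V), row-normalized."""
    phi_q, phi_k = feature_map(Q), feature_map(K)   # (n, 2d)
    kv = phi_k.T @ V                                # (2d, d_v), built once
    z = phi_q @ phi_k.sum(axis=0)                   # per-query normalizer
    return (phi_q @ kv) / z[:, None]

def softmax_attention(Q, K, V):
    """Reference quadratic attention for comparison."""
    scores = Q @ K.T
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
print(linear_attention(Q, K, V).shape, softmax_attention(Q, K, V).shape)
```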

Papers citing "The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry"

37 / 37 papers shown
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Junxiong Wang
Wen-Ding Li
Daniele Paliotta
Daniel Ritter
Alexander M. Rush
Tri Dao
LRM
33
0
0
14 Apr 2025
CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers
Yoshihiro Yamada
ViT
30
0
0
09 Apr 2025
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
Guoyizhe Wei
Rama Chellappa
40
0
0
30 Mar 2025
Theoretical Foundation of Flow-Based Time Series Generation: Provable Approximation, Generalization, and Efficiency
Jiangxuan Long
Zhao-quan Song
Chiwun Yang
AI4TS
162
0
0
18 Mar 2025
X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression
Guihong Li
Mehdi Rezagholizadeh
Mingyu Yang
Vikram Appia
Emad Barsoum
VLM
55
0
0
14 Mar 2025
Liger: Linearizing Large Language Models to Gated Recurrent Structures
Disen Lan
Weigao Sun
Jiaxi Hu
Jusen Du
Yu-Xi Cheng
66
0
0
03 Mar 2025
Attention Condensation via Sparsity Induced Regularized Training
Eli Sason
Darya Frolova
Boris Nazarov
Felix Goldberd
181
0
0
03 Mar 2025
Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners
Daniele Paliotta
Junxiong Wang
Matteo Pagliardini
Kevin Y. Li
Aviv Bick
J. Zico Kolter
Albert Gu
F. Fleuret
Tri Dao
ReLM
LRM
51
7
0
27 Feb 2025
PolaFormer: Polarity-aware Linear Attention for Vision Transformers
Weikang Meng
Yadan Luo
Xin Li
D. Jiang
Zheng Zhang
156
0
0
25 Jan 2025
ZETA: Leveraging Z-order Curves for Efficient Top-k Attention
Qiuhao Zeng
Jerry Huang
Peng Lu
Gezheng Xu
Boxing Chen
Charles Ling
Boyu Wang
49
1
0
24 Jan 2025
Tensor Product Attention Is All You Need
Yifan Zhang
Yifeng Liu
Huizhuo Yuan
Zhen Qin
Yang Yuan
Q. Gu
Andrew Chi-Chih Yao
77
9
0
11 Jan 2025
CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction
Yuan Zhou
Qingshan Xu
Jiequan Cui
Junbao Zhou
Jing Zhang
Richang Hong
H. Zhang
ViT
78
0
0
25 Nov 2024
MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map
Yuhong Chou
Man Yao
Kexin Wang
Yuqi Pan
Ruijie Zhu
Yiran Zhong
Yu Qiao
Jian Wu
Bo Xu
Guoqi Li
51
4
0
16 Nov 2024
CaTs and DAGs: Integrating Directed Acyclic Graphs with Transformers and Fully-Connected Neural Networks for Causally Constrained Predictions
M. Vowels
Mathieu Rochat
S. Akbari
CML
GNN
OOD
27
0
0
18 Oct 2024
AERO: Softmax-Only LLMs for Efficient Private Inference
N. Jha
Brandon Reagen
32
1
0
16 Oct 2024
Mimetic Initialization Helps State Space Models Learn to Recall
Asher Trockman
Hrayr Harutyunyan
J. Zico Kolter
Sanjiv Kumar
Srinadh Bhojanapalli
Mamba
21
3
0
14 Oct 2024
HSR-Enhanced Sparse Attention Acceleration
Bo Chen
Yingyu Liang
Zhizhou Sha
Zhenmei Shi
Zhao-quan Song
93
18
0
14 Oct 2024
ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models
N. Jha
Brandon Reagen
OffRL
AI4CE
33
0
0
12 Oct 2024
Fine-grained Attention I/O Complexity: Comprehensive Analysis for Backward Passes
Xiaoyu Li
Yingyu Liang
Zhenmei Shi
Zhao-quan Song
Yufa Zhou
54
15
0
12 Oct 2024
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
Mutian He
Philip N. Garner
82
0
0
09 Oct 2024
Gated Slot Attention for Efficient Linear-Time Sequence Modeling
Yu Zhang
Songlin Yang
Ruijie Zhu
Yue Zhang
Leyang Cui
...
Freda Shi
Bailin Wang
Wei Bi
P. Zhou
Guohong Fu
65
17
0
11 Sep 2024
A Tighter Complexity Analysis of SparseGPT
Xiaoyu Li
Yingyu Liang
Zhenmei Shi
Zhao-quan Song
74
22
0
22 Aug 2024
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
Aviv Bick
Kevin Y. Li
Eric P. Xing
J. Zico Kolter
Albert Gu
Mamba
53
24
0
19 Aug 2024
Keep the Cost Down: A Review on Methods to Optimize LLM's KV-Cache Consumption
Shi Luohe
Hongyi Zhang
Yao Yao
Z. Li
Zhao Hai
31
33
0
25 Jul 2024
M5: A Whole Genome Bacterial Encoder at Single Nucleotide Resolution
Agust Egilsson
31
0
0
03 Jul 2024
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
Sean Welleck
Amanda Bertsch
Matthew Finlayson
Hailey Schoelkopf
Alex Xie
Graham Neubig
Ilia Kulikov
Zaid Harchaoui
33
50
0
24 Jun 2024
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers
Chao Lou
Zixia Jia
Zilong Zheng
Kewei Tu
ODL
35
18
0
24 Jun 2024
Training of Physical Neural Networks
Ali Momeni
Babak Rahmani
B. Scellier
Logan G. Wright
Peter L. McMahon
...
Julie Grollier
Andrea J. Liu
D. Psaltis
Andrea Alù
Romain Fleury
PINN
AI4CE
51
9
0
05 Jun 2024
Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers
Brian K Chen
Tianyang Hu
Hui Jin
Hwee Kuan Lee
Kenji Kawaguchi
50
0
0
05 Jun 2024
Pretrained Hybrids with MAD Skills
Nicholas Roberts
Samuel Guo
Zhiqi Gao
Satya Sai Srinath Namburi
Sonia Cromp
Chengjun Wu
Chengyu Duan
Frederic Sala
Mamba
40
0
0
02 Jun 2024
Linearizing Large Language Models
Jean-Pierre Mercat
Igor Vasiljevic
Sedrick Scott Keh
Kushal Arora
Achal Dave
Adrien Gaidon
Thomas Kollar
38
19
0
10 May 2024
Cross-Architecture Transfer Learning for Linear-Cost Inference Transformers
Sehyun Choi
26
3
0
03 Apr 2024
Gated Linear Attention Transformers with Hardware-Efficient Training
Songlin Yang
Bailin Wang
Yikang Shen
Rameswar Panda
Yoon Kim
45
142
0
11 Dec 2023
PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels
Praneeth Kacham
Vahab Mirrokni
Peilin Zhong
36
7
0
02 Oct 2023
Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
H. H. Mao
66
20
0
09 Oct 2022
On The Computational Complexity of Self-Attention
Feyza Duman Keles
Pruthuvi Maheshakya Wijewardena
C. Hegde
68
108
0
11 Sep 2022
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
317
5,785
0
29 Apr 2021