QuickLLaMA: Query-aware Inference Acceleration for Large Language Models


Papers citing "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"

17 / 17 papers shown
Linformer: Self-Attention with Linear Complexity
Sinong Wang
Belinda Z. Li
Madian Khabsa
Han Fang
Hao Ma
183
1,694
0
08 Jun 2020
Neural Turing Machines
93
2,325
0
20 Oct 2014
Memory Networks
143
1,705
0
15 Oct 2014
