NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention
Tianyi Zhang, Jonah Yi, Bowen Yao, Zhaozhuo Xu, Anshumali Shrivastava
arXiv:2403.01273, 2 March 2024
Papers citing "NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention" (5 papers):
Towards a Middleware for Large Language Models (21 Nov 2024)
Narcisa Guran, Florian Knauf, Man Ngo, Stefan Petrescu, Jan S. Rellermeyer
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective (06 Oct 2024)
Jinhao Li, Jiaming Xu, Shan Huang, Yonghua Chen, Wen Li, ..., Jiayi Pan, Li Ding, Hao Zhou, Yu Wang, Guohao Dai
KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization (07 May 2024)
Tianyi Zhang, Jonah Yi, Zhaozhuo Xu, Anshumali Shrivastava
Language Model Crossover: Variation through Few-Shot Prompting (23 Feb 2023)
Elliot Meyerson, M. Nelson, Herbie Bradley, Adam Gaier, Arash Moradi, Amy K. Hoover, Joel Lehman
Big Bird: Transformers for Longer Sequences (28 Jul 2020)
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed