ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2403.01273
  4. Cited By
NoMAD-Attention: Efficient LLM Inference on CPUs Through
  Multiply-add-free Attention

NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention

2 March 2024
Tianyi Zhang
Jonah Yi
Bowen Yao
Zhaozhuo Xu
Anshumali Shrivastava
    MQ
ArXivPDFHTML

Papers citing "NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention"

5 / 5 papers shown
Title
Towards a Middleware for Large Language Models
Towards a Middleware for Large Language Models
Narcisa Guran
Florian Knauf
Man Ngo
Stefan Petrescu
Jan S. Rellermeyer
73
1
0
21 Nov 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
62
16
0
06 Oct 2024
KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference
  with Coupled Quantization
KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization
Tianyi Zhang
Jonah Yi
Zhaozhuo Xu
Anshumali Shrivastava
MQ
29
26
0
07 May 2024
Language Model Crossover: Variation through Few-Shot Prompting
Language Model Crossover: Variation through Few-Shot Prompting
Elliot Meyerson
M. Nelson
Herbie Bradley
Adam Gaier
Arash Moradi
Amy K. Hoover
Joel Lehman
VLM
31
79
0
23 Feb 2023
Big Bird: Transformers for Longer Sequences
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
282
2,015
0
28 Jul 2020
1