ResearchTrend.AI
A Hardware Evaluation Framework for Large Language Model Inference

5 December 2023
Hengrui Zhang, August Ning, R. Prabhakar, D. Wentzlaff
Tags: ELM

Papers citing "A Hardware Evaluation Framework for Large Language Model Inference"

15 / 15 papers shown
Position: Enough of Scaling LLMs! Lets Focus on Downscaling (02 May 2025)
Ayan Sengupta, Yash Goel, Tanmoy Chakraborty
34 · 0 · 0

Understanding and Optimizing Multi-Stage AI Inference Pipelines (14 Apr 2025)
Abhimanyu Bambhaniya, Hanjiang Wu, Suvinay Subramanian, Sudarshan Srinivasan, Souvik Kundu, Amir Yazdanbakhsh, Madhu Kumar, Tushar Krishna
135 · 0 · 0

Hysteresis Activation Function for Efficient Inference (15 Nov 2024)
Moshe Kimhi, Idan Kashani, A. Mendelson, Chaim Baskin
Tags: LLMSV
40 · 0 · 0

Achieving Peak Performance for Large Language Models: A Systematic Review (07 Sep 2024)
Z. R. K. Rostam, Sándor Szénási, Gábor Kertész
37 · 3 · 0

Performance Modeling and Workload Analysis of Distributed Large Language Model Training and Inference (19 Jul 2024)
Joyjit Kundu, Wenzhe Guo, Ali BanaGozar, Udari De Alwis, Sourav Sengupta, Puneet Gupta, Arindam Mallik
45 · 3 · 0

Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference (12 Jun 2024)
Christopher Wolters, Xiaoxuan Yang, Ulf Schlichtmann, Toyotaro Suzumura
39 · 11 · 0

Towards Greener LLMs: Bringing Energy-Efficiency to the Forefront of LLM Inference (29 Mar 2024)
Jovan Stojkovic, Esha Choukse, Chaojie Zhang, Inigo Goiri, Josep Torrellas
43 · 36 · 0

NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention (02 Mar 2024)
Tianyi Zhang, Jonah Yi, Bowen Yao, Zhaozhuo Xu, Anshumali Shrivastava
Tags: MQ
27 · 6 · 0

Scaling Down to Scale Up: A Cost-Benefit Analysis of Replacing OpenAI's LLM with Open Source SLMs in Production (20 Dec 2023)
Chandra Irugalbandara, Ashish Mahendra, Roland Daynauth, T. Arachchige, Jayanaka L. Dantanarayana, K. Flautner, Lingjia Tang, Yiping Kang, Jason Mars
Tags: ELM
28 · 14 · 0

LLM in a flash: Efficient Large Language Model Inference with Limited Memory (12 Dec 2023)
Keivan Alizadeh-Vahid, Iman Mirzadeh, Dmitry Belenko, Karen Khatamifard, Minsik Cho, C. C. D. Mundo, Mohammad Rastegari, Mehrdad Farajtabar
77 · 112 · 0

SySMOL: Co-designing Algorithms and Hardware for Neural Networks with Heterogeneous Precisions (23 Nov 2023)
Cyrus Zhou, Pedro H. P. Savarese, Vaughn Richard, Zack Hassman, Xin Yuan, Michael Maire, Michael DiBrino, Yanjing Li
Tags: MQ
23 · 0 · 0

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU (13 Mar 2023)
Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, ..., Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang
149 · 369 · 0

Can Foundation Models Wrangle Your Data? (20 May 2022)
A. Narayan, Ines Chami, Laurel J. Orr, Simran Arora, Christopher Ré
Tags: LMTD, AI4CE
181 · 214 · 0

Scaling Laws for Neural Language Models (23 Jan 2020)
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
261 · 4,489 · 0

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (17 Sep 2019)
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
Tags: MoE
245 · 1,821 · 0