Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference
arXiv:2312.15159 · 23 December 2023
Hongzheng Chen, Jiahao Zhang, Yixiao Du, Shaojie Xiang, Zichao Yue, Niansong Zhang, Yaohui Cai, Zhiru Zhang

Papers citing "Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference" (18 papers)

NSFlow: An End-to-End FPGA Framework with Scalable Dataflow Architecture for Neuro-Symbolic AI
Hanchen Yang, Zishen Wan, Ritik Raj, Joongun Park, Ziwei Li, A. Samajdar, A. Raychowdhury, Tushar Krishna
27 Apr 2025

On-Device Qwen2.5: Efficient LLM Inference with Model Compression and Hardware Acceleration
Maoyang Xiang, Ramesh Fernando, Bo Wang
24 Apr 2025

LightMamba: Efficient Mamba Acceleration on FPGA with Quantization and Hardware Co-design
Renjie Wei, Songqiang Xu, Linfeng Zhong, Zebin Yang, Qingyu Guo, Yidan Wang, Runsheng Wang, Meng Li
24 Feb 2025

LUTMUL: Exceed Conventional FPGA Roofline Limit by LUT-based Efficient Multiplication for Neural Network Inference
Yanyue Xie, Zhengang Li, Dana Diaconu, Suranga Handagala, M. Leeser, Xue Lin
01 Nov 2024

Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective
Shenghao Xie, Wenqiang Zu, Mingyang Zhao, Duo Su, Shilong Liu, Ruohua Shi, Guoqi Li, Shanghang Zhang, Lei Ma
29 Oct 2024

RapidStream IR: Infrastructure for FPGA High-Level Physical Synthesis
Jason Lau, Yuanlong Xiao, Yutong Xie, Yuze Chi, Linghao Song, Shaojie Xiang, Michael Lo, Zhiru Zhang, Jason Cong, Licheng Guo
16 Oct 2024

Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li, Jiaming Xu, Shan Huang, Yonghua Chen, Wen Li, ..., Jiayi Pan, Li Ding, Hao Zhou, Yu Wang, Guohao Dai
06 Oct 2024

HG-PIPE: Vision Transformer Acceleration with Hybrid-Grained Pipeline
Qingyu Guo, Jiayong Wan, Songqiang Xu, Meng Li, Yuan Wang
25 Jul 2024

The Feasibility of Implementing Large-Scale Transformers on Multi-FPGA Platforms
Yu Gao, Juan Camilo Vega, Paul Chow
24 Apr 2024

A Survey on Efficient Inference for Large Language Models
Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, ..., Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu-Xiang Wang
22 Apr 2024

Allo: A Programming Model for Composable Accelerator Design
Hongzheng Chen, Niansong Zhang, Shaojie Xiang, Zhichen Zeng, Mengjia Dai, Zhiru Zhang
07 Apr 2024

FastDecode: High-Throughput GPU-Efficient LLM Serving using Heterogeneous Pipelines
Jiaao He, Jidong Zhai
18 Mar 2024

EdgeQAT: Entropy and Distribution Guided Quantization-Aware Training for the Acceleration of Lightweight LLMs on the Edge
Xuan Shen, Zhenglun Kong, Changdi Yang, Zhaoyang Han, Lei Lu, ..., Zhihao Shu, Wei Niu, Miriam Leeser, Pu Zhao, Yanzhi Wang
16 Feb 2024

DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation
Seongmin Hong, Seungjae Moon, Junsoo Kim, Sungjae Lee, Minsub Kim, Dongsoo Lee, Joo-Young Kim
22 Sep 2022

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
28 Jan 2022

I-BERT: Integer-only BERT Quantization
Sehoon Kim, A. Gholami, Z. Yao, Michael W. Mahoney, Kurt Keutzer
05 Jan 2021

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019

Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Z. Yao, A. Gholami, Michael W. Mahoney, Kurt Keutzer
12 Sep 2019