ResearchTrend.AI

LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference

12 August 2024
Zhiwen Mo, Lei Wang, Jianyu Wei, Zhichen Zeng, Shijie Cao, Lingxiao Ma, Naifeng Jing, Ting Cao, Jilong Xue, Fan Yang, Mao Yang

Papers citing "LUT Tensor Core: A Software-Hardware Co-Design for LUT-Based Low-Bit LLM Inference"

17 / 17 papers shown

Yi: Open Foundation Models by 01.AI
01.AI: Alex Young, Bei Chen, Chao Li, ..., Yue Wang, Yuxuan Cai, Zhenyu Gu, Zhiyuan Liu, Zonghong Dai
OSLM, LRM · 269 · 570 · 0 · 07 Mar 2024

Microscaling Data Formats for Deep Learning
B. Rouhani, Ritchie Zhao, Ankit More, Mathew Hall, Alireza Khodamoradi, ..., Maxim Naumov, Colin Verrilli, Ralph Wittig, Doug Burger, Eric S. Chung
MQ · 84 · 63 · 0 · 16 Oct 2023

QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models
Jing Liu, Ruihao Gong, Xiuying Wei, Zhiwei Dong, Jianfei Cai, Bohan Zhuang
MQ · 54 · 52 · 0 · 12 Oct 2023

YaRN: Efficient Context Window Extension of Large Language Models
Bowen Peng, Jeffrey Quesnelle, Honglu Fan, Enrico Shippole
OSLM · 72 · 261 · 0 · 31 Aug 2023

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, ..., Zhongli Xie, Zifan Ye, M. Bras, Younes Belkada, Thomas Wolf
VLM · 394 · 2,388 · 0 · 09 Nov 2022

FP8 Formats for Deep Learning
Paulius Micikevicius, Dusan Stosic, N. Burgess, Marius Cornea, Pradeep Dubey, ..., Naveen Mellempudi, S. Oberman, Mohammad Shoeybi, Michael Siu, Hao Wu
BDL, VLM, MQ · 131 · 138 · 0 · 12 Sep 2022

ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization
Cong Guo, Chen Zhang, Jingwen Leng, Zihan Liu, Fan Yang, Yun-Bo Liu, Minyi Guo, Yuhao Zhu
MQ · 48 · 60 · 0 · 30 Aug 2022

LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
Gunho Park, Baeseong Park, Minsub Kim, Sungjae Lee, Jeonghoon Kim, Beomseok Kwon, S. Kwon, Byeongwook Kim, Youngjoo Lee, Dongsoo Lee
MQ · 60 · 83 · 0 · 20 Jun 2022

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Z. Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He
VLM, MQ · 122 · 479 · 0 · 04 Jun 2022

The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink
David A. Patterson, Joseph E. Gonzalez, Urs Hölzle, Quoc V. Le, Chen Liang, Lluís-Miquel Munguía, D. Rothchild, David R. So, Maud Texier, J. Dean
AI4CE · 78 · 246 · 0 · 11 Apr 2022

Training Compute-Optimal Large Language Models
Jordan Hoffmann, Sebastian Borgeaud, A. Mensch, Elena Buchatskaya, Trevor Cai, ..., Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre
AI4TS · 208 · 1,949 · 0 · 29 Mar 2022

Mokey: Enabling Narrow Fixed-Point Inference for Out-of-the-Box Floating-Point Transformer Models
Ali Hadi Zadeh, Mostafa Mahmoud, Ameer Abdelhadi, Andreas Moshovos
MQ · 62 · 33 · 0 · 23 Mar 2022

Ansor: Generating High-Performance Tensor Programs for Deep Learning
Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, ..., Jun Yang, Danyang Zhuo, Koushik Sen, Joseph E. Gonzalez, Ion Stoica
140 · 399 · 0 · 11 Jun 2020

GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference
Ali Hadi Zadeh, Isak Edo, Omar Mohamed Awad, Andreas Moshovos
MQ · 65 · 188 · 0 · 08 May 2020

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
608 · 4,893 · 0 · 23 Jan 2020

BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, Kristina Toutanova
227 · 1,549 · 0 · 24 May 2019

Pointer Sentinel Mixture Models
Stephen Merity, Caiming Xiong, James Bradbury, R. Socher
RALM · 328 · 2,895 · 0 · 26 Sep 2016