1712.05877
Cited By
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
15 December 2017
Benoit Jacob
S. Kligys
Bo Chen
Menglong Zhu
Matthew Tang
Andrew G. Howard
Hartwig Adam
Dmitry Kalenichenko
MQ
Papers citing
"Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"
50 / 1,298 papers shown
Title
Stacking Small Language Models for Generalizability
Laurence Liang
LRM
36
0
0
21 Oct 2024
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
Tianyu Guo
Druv Pai
Yu Bai
Jiantao Jiao
Michael I. Jordan
Song Mei
77
14
0
17 Oct 2024
Large Language Models as Narrative-Driven Recommenders
Lukas Eberhard
Thorsten Ruprechter
Denis Helic
LRM
124
0
0
17 Oct 2024
Error Diffusion: Post Training Quantization with Block-Scaled Number Formats for Neural Networks
Alireza Khodamoradi
K. Denolf
Eric Dellinger
MQ
74
0
0
15 Oct 2024
Reducing Data Bottlenecks in Distributed, Heterogeneous Neural Networks
Ruhai Lin
Rui-Jie Zhu
Jason K. Eshraghian
74
1
0
12 Oct 2024
FlatQuant: Flatness Matters for LLM Quantization
Yuxuan Sun
Ruikang Liu
Haoli Bai
Han Bao
Kang Zhao
...
Lu Hou
Chun Yuan
Xin Jiang
Wen Liu
Jun Yao
MQ
176
11
0
12 Oct 2024
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
Cong Guo
Feng Cheng
Zhixu Du
James Kiessling
Jonathan Ku
...
Qilin Zheng
Guanglei Zhou
Hai Li
Yiran Chen
60
7
0
08 Oct 2024
Synthesizing Interpretable Control Policies through Large Language Model Guided Search
Carlo Bosio
Mark W. Mueller
59
0
0
07 Oct 2024
Continuous Approximations for Improving Quantization Aware Training of LLMs
He Li
Jianhang Hong
Yuanzhuo Wu
Snehal Adbol
Zonglin Li
MQ
65
1
0
06 Oct 2024
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Jintao Zhang
Jia Wei
Pengle Zhang
Jun-Jie Zhu
Jun Zhu
Jianfei Chen
VLM
MQ
186
39
0
03 Oct 2024
Constraint Guided Model Quantization of Neural Networks
Quinten Van Baelen
P. Karsmakers
MQ
61
0
0
30 Sep 2024
InfantCryNet: A Data-driven Framework for Intelligent Analysis of Infant Cries
Mengze Hong
Chen Jason Zhang
Lingxiao Yang
Yuanfeng Song
Di Jiang
86
2
0
29 Sep 2024
MicroFlow: An Efficient Rust-Based Inference Engine for TinyML
Matteo Carnelos
Francesco Pasti
Nicola Bellotto
73
1
0
28 Sep 2024
Analog In-Memory Computing Attention Mechanism for Fast and Energy-Efficient Large Language Models
Nathan Leroux
Paul-Philipp Manea
Chirag Sudarshan
Jan Finkbeiner
Sebastian Siegel
J. Strachan
Emre Neftci
51
1
0
28 Sep 2024
A method of using RSVD in residual calculation of LowBit GEMM
Hongyaoxing Gu
MQ
99
0
0
27 Sep 2024
Efficient Noise Mitigation for Enhancing Inference Accuracy in DNNs on Mixed-Signal Accelerators
Seyedarmin Azizi
Mohammad Erfan Sadeghi
M. Kamal
Massoud Pedram
63
2
0
27 Sep 2024
P4Q: Learning to Prompt for Quantization in Visual-language Models
H. Sun
Runqi Wang
Yanjing Li
Xianbin Cao
Xiaolong Jiang
Feng-Long Xie
Baochang Zhang
MQ
VLM
74
0
0
26 Sep 2024
Towards Sub-millisecond Latency Real-Time Speech Enhancement Models on Hearables
Artem Dementyev
Chandan K. A. Reddy
Scott Wisdom
Navin Chatlani
J. Hershey
R. Lyon
111
0
0
26 Sep 2024
Lessons and Insights from a Unifying Study of Parameter-Efficient Fine-Tuning (PEFT) in Visual Recognition
Zheda Mai
Ping Zhang
Cheng-Hao Tu
Hong-You Chen
Li Zhang
Wei-Lun Chao
52
1
0
24 Sep 2024
SPAQ-DL-SLAM: Towards Optimizing Deep Learning-based SLAM for Resource-Constrained Embedded Platforms
Niraj Pudasaini
Muhammad Abdullah Hanif
Mohamed Bennai
53
0
0
22 Sep 2024
Bilateral Sharpness-Aware Minimization for Flatter Minima
Jiaxin Deng
Junbiao Pang
Baochang Zhang
Qingming Huang
AAML
454
0
0
20 Sep 2024
Less Memory Means smaller GPUs: Backpropagation with Compressed Activations
Daniel Barley
Holger Froning
124
0
0
18 Sep 2024
Art and Science of Quantizing Large-Scale Models: A Comprehensive Overview
Yanshu Wang
Tong Yang
Xiyan Liang
Guoan Wang
Hanning Lu
Xu Zhe
Yaoming Li
Li Weitao
MQ
92
3
0
18 Sep 2024
Robust Training of Neural Networks at Arbitrary Precision and Sparsity
Chengxi Ye
Grace Chu
Yanfeng Liu
Yichi Zhang
Lukasz Lew
Andrew G. Howard
MQ
56
2
0
14 Sep 2024
Efficient and Reliable Vector Similarity Search Using Asymmetric Encoding with NAND-Flash for Many-Class Few-Shot Learning
Hao-Wei Chiang
Chi-Tse Huang
Hsiang-Yun Cheng
P. Tseng
Ming-Hsiu Lee
An-Yeu Wu
53
0
0
12 Sep 2024
Creating a Gen-AI based Track and Trace Assistant MVP (SuperTracy) for PostNL
Mohammad Reshadati
78
0
0
04 Sep 2024
Foundations of Large Language Model Compression -- Part 1: Weight Quantization
Sean I. Young
MQ
70
1
0
03 Sep 2024
Evaluating the Performance of Large Language Models in Competitive Programming: A Multi-Year, Multi-Grade Analysis
Adrian Marius Dumitran
Adrian Catalin Badea
Stefan-Gabriel Muscalu
ELM
LRM
76
3
0
31 Aug 2024
1-Bit FQT: Pushing the Limit of Fully Quantized Training to 1-bit
Chang Gao
Jianfei Chen
Kang Zhao
Jiaqi Wang
Liping Jing
MQ
70
2
0
26 Aug 2024
Infrared Domain Adaptation with Zero-Shot Quantization
Burak Sevsay
Erdem Akagündüz
VLM
MQ
134
1
0
25 Aug 2024
A Web-Based Solution for Federated Learning with LLM-Based Automation
Chamith Mawela
Chaouki Ben Issaid
Mehdi Bennis
FedML
29
0
0
23 Aug 2024
Practical token pruning for foundation models in few-shot conversational virtual assistant systems
Haode Qi
Cheng Qian
Jian Ni
Pratyush Singh
Reza Fazeli
Gengyu Wang
Zhongzheng Shu
Eric Wayne
Juergen Bross
60
0
0
21 Aug 2024
PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars
Sumanth Prabhu
89
1
0
16 Aug 2024
FourierKAN outperforms MLP on Text Classification Head Fine-tuning
Abdullah Al Imran
Md Farhan Ishmam
VLM
56
1
0
16 Aug 2024
Computer Vision Model Compression Techniques for Embedded Systems: A Survey
Alexandre Lopes
Fernando Pereira dos Santos
D. Oliveira
Mauricio Schiezaro
Hélio Pedrini
84
10
0
15 Aug 2024
Efficient Edge AI: Deploying Convolutional Neural Networks on FPGA with the Gemmini Accelerator
Federico Nicolás Peccia
Svetlana Pavlitska
Tobias Fleck
Oliver Bringmann
63
0
0
14 Aug 2024
Large Investment Model
Jian Guo
H. Shum
AIFin
133
0
0
12 Aug 2024
Combining Neural Architecture Search and Automatic Code Optimization: A Survey
Inas Bachiri
Hadjer Benmeziane
Smail Niar
Riyadh Baghdadi
Hamza Ouarnoughi
Abdelkrime Aries
128
0
0
07 Aug 2024
A Metric Driven Approach to Mixed Precision Training
M. Rasquinha
Gil Tabak
35
0
0
06 Aug 2024
An approach to optimize inference of the DIART speaker diarization pipeline
Roman Aperdannier
Sigurd Schacht
Alexander Piazza
83
0
0
05 Aug 2024
Reclaiming Residual Knowledge: A Novel Paradigm to Low-Bit Quantization
Róisín Luo
Alexandru Drimbarean
Walsh Simon
Colm O'Riordan
MQ
81
1
0
01 Aug 2024
TinyChirp: Bird Song Recognition Using TinyML Models on Low-power Wireless Acoustic Sensors
Zhaolan Huang
Adrien Tousnakhoff
Polina Kozyr
Roman Rehausen
Felix Biessmann
Robert Lachlan
C. Adjih
Emmanuel Baccelli
114
2
0
31 Jul 2024
Model Agnostic Hybrid Sharding For Heterogeneous Distributed Inference
Claudio Angione
Yue Zhao
Harry Yang
Ahmad Farhan
Fielding Johnston
James Buban
Patrick Colangelo
90
1
0
29 Jul 2024
MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity
Kanghyun Choi
Hyeyoon Lee
Dain Kwon
Sunjong Park
Kyuyeun Kim
Noseong Park
Jinho Lee
MQ
129
2
0
29 Jul 2024
Temporal Feature Matters: A Framework for Diffusion Model Quantization
Yushi Huang
Ruihao Gong
Xianglong Liu
Jing Liu
Yuhang Li
Jiwen Lu
Dacheng Tao
DiffM
MQ
119
0
0
28 Jul 2024
Mixed Non-linear Quantization for Vision Transformers
Gihwan Kim
Jemin Lee
Sihyeong Park
Yongin Kwon
Hyungshin Kim
MQ
91
0
0
26 Jul 2024
Comprehensive Study on Performance Evaluation and Optimization of Model Compression: Bridging Traditional Deep Learning and Large Language Models
Aayush Saxena
Arit Kumar Bishwas
Ayush Ashok Mishra
Ryan Armstrong
44
2
0
22 Jul 2024
Inverted Activations
Georgii Sergeevich Novikov
Ivan Oseledets
35
0
0
22 Jul 2024
StreamTinyNet: video streaming analysis with spatial-temporal TinyML
Hazem Hesham Yousef Shalby
Massimo Pavan
Manuel Roveri
80
1
0
22 Jul 2024
Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners
Yifei Gao
Jie Ou
Lei Wang
Fanhua Shang
Jaji Wu
MQ
107
0
0
22 Jul 2024