I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference

4 July 2022
Zhikai Li, Qingyi Gu
MQ

Papers citing "I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference"

50 of 59 citing papers shown

NeuroSim V1.5: Improved Software Backbone for Benchmarking Compute-in-Memory Accelerators with Device and Circuit-level Non-idealities
James Read, Ming-Yen Lee, Wei-Hsing Huang, Yuan-Chun Luo, A. Lu, Shimeng Yu
05 May 2025

Low-Bit Integerization of Vision Transformers using Operand Reordering for Efficient Hardware
Ching-Yi Lin, Sahil Shah
MQ
11 Apr 2025

Post-Training Quantization for Diffusion Transformer via Hierarchical Timestep Grouping
Ning Ding, Jing Han, Yuchuan Tian, Chao Xu, Kai Han, Yehui Tang
MQ
10 Mar 2025

SAQ-SAM: Semantically-Aligned Quantization for Segment Anything Model
Jing Zhang, Z. Li, Qingyi Gu
MQ, VLM
09 Mar 2025

AHCPTQ: Accurate and Hardware-Compatible Post-Training Quantization for Segment Anything Model
Wenlun Zhang, Shimpei Ando, Kentaro Yoshioka
VLM, MQ
05 Mar 2025

CacheQuant: Comprehensively Accelerated Diffusion Models
Xuewen Liu, Zhikai Li, Qingyi Gu
DiffM
03 Mar 2025

Continual Quantization-Aware Pre-Training: When to transition from 16-bit to 1.58-bit pre-training for BitNet language models?
Jacob Nielsen, Peter Schneider-Kamp, Lukas Galke
MQ
17 Feb 2025

UAV-Assisted Real-Time Disaster Detection Using Optimized Transformer Model
Branislava Jankovic, Sabina Jangirova, Waseem Ullah, Latif U. Khan, Mohsen Guizani
21 Jan 2025

Semantics Prompting Data-Free Quantization for Low-Bit Vision Transformers
Yunshan Zhong, Yuyao Zhou, Yuxin Zhang, Shen Li, Yong Li, Fei Chao, Zhanpeng Zeng, Rongrong Ji
MQ
31 Dec 2024

Slicing Vision Transformer for Flexible Inference
Yitian Zhang, Huseyin Coskun, Xu Ma, Huan Wang, Ke Ma, Xi Chen, Derek Hao Hu, Y. Fu
ViT
06 Dec 2024

FLARE: FP-Less PTQ and Low-ENOB ADC Based AMS-PiM for Error-Resilient, Fast, and Efficient Transformer Acceleration
Donghyeon Yi, Seoyoung Lee, Jongho Kim, Junyoung Kim, Sohmyung Ha, Ik Joon Chang, Minkyu Je
22 Nov 2024

Quantization without Tears
Minghao Fu, Hao Yu, Jie Shao, Junjie Zhou, Ke Zhu, Jianxin Wu
MQ
21 Nov 2024

MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices
Mohammadali Shakerdargah, Shan Lu, Chao Gao, Di Niu
20 Nov 2024

When are 1.58 bits enough? A Bottom-up Exploration of BitNet Quantization
Jacob Nielsen, Lukas Galke, Peter Schneider-Kamp
MQ
08 Nov 2024

M$^2$-ViT: Accelerating Hybrid Vision Transformers with Two-Level Mixed Quantization
Yanbiao Liang, Huihong Shi, Zhongfeng Wang
MQ
10 Oct 2024

DilateQuant: Accurate and Efficient Diffusion Quantization via Weight Dilation
Xuewen Liu, Zhikai Li, Qingyi Gu
MQ
22 Sep 2024

Privacy-Preserving SAM Quantization for Efficient Edge Intelligence in Healthcare
Zhikai Li, Jing Zhang, Qingyi Gu
MedIm
14 Sep 2024

VLTP: Vision-Language Guided Token Pruning for Task-Oriented Segmentation
Hanning Chen, Yang Ni, Wenjun Huang, Yezi Liu, SungHeon Jeong, Fei Wen, Nathaniel Bastian, Hugo Latapie, Mohsen Imani
VLM
13 Sep 2024

Sorbet: A Neuromorphic Hardware-Compatible Transformer-Based Spiking Language Model
Kaiwen Tang, Zhanglu Yan, Weng-Fai Wong
04 Sep 2024

K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences
Zhikai Li, Xuewen Liu, Dongrong Fu, Jianquan Li, Qingyi Gu, Kurt Keutzer, Zhen Dong
EGVM, VGen, DiffM
26 Aug 2024

PQV-Mobile: A Combined Pruning and Quantization Toolkit to Optimize Vision Transformers for Mobile Applications
Kshitij Bhardwaj
ViT, MQ
15 Aug 2024

DopQ-ViT: Towards Distribution-Friendly and Outlier-Aware Post-Training Quantization for Vision Transformers
Lianwei Yang, Haisong Gong, Qingyi Gu
MQ
06 Aug 2024

MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity
Kanghyun Choi, Hyeyoon Lee, Dain Kwon, Sunjong Park, Kyuyeun Kim, Noseong Park, Jinho Lee
MQ
29 Jul 2024

Mixed Non-linear Quantization for Vision Transformers
Gihwan Kim, Jemin Lee, Sihyeong Park, Yongin Kwon, Hyungshin Kim
MQ
26 Jul 2024

Co-Designing Binarized Transformer and Hardware Accelerator for Efficient End-to-End Edge Deployment
Yuhao Ji, Chao Fang, Shaobo Ma, Haikuo Shao, Zhongfeng Wang
MQ
16 Jul 2024

CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs
Akshat Ramachandran, Souvik Kundu, Tushar Krishna
MQ
07 Jul 2024

Integer-only Quantized Transformers for Embedded FPGA-based Time-series Forecasting in AIoT
Tianheng Ling, Chao Qian, Gregor Schiele
AI4TS, MQ
06 Jul 2024

ViT-1.58b: Mobile Vision Transformers in the 1-bit Era
Zhengqing Yuan, Rong-Er Zhou, Hongyi Wang, Lifang He, Yanfang Ye, Lichao Sun
MQ
26 Jun 2024

BitNet b1.58 Reloaded: State-of-the-art Performance Also on Smaller Networks
Jacob Nielsen, Peter Schneider-Kamp
MQ
24 Jun 2024

DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
Jordy Van Landeghem, Subhajit Maity, Ayan Banerjee, Matthew Blaschko, Marie-Francine Moens, Josep Lladós, Sanket Biswas
12 Jun 2024

P$^2$-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer
Huihong Shi, Xin Cheng, Wendong Mao, Zhongfeng Wang
MQ
30 May 2024

I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models
Xing Hu, Yuan Cheng, Dawei Yang, Zhihang Yuan, Jiangyong Yu, Chen Xu, Sifan Zhou
MQ
28 May 2024

Trio-ViT: Post-Training Quantization and Acceleration for Softmax-Free Efficient Vision Transformer
Huihong Shi, Haikuo Shao, Wendong Mao, Zhongfeng Wang
ViT, MQ
06 May 2024

Torch2Chip: An End-to-end Customizable Deep Neural Network Compression and Deployment Toolkit for Prototype Hardware Accelerator Design
Jian Meng, Yuan Liao, Anupreetham Anupreetham, Ahmed Hassan, Shixing Yu, Han-Sok Suh, Xiaofeng Hu, Jae-sun Seo
MQ
02 May 2024

Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey
Dayou Du, Gu Gong, Xiaowen Chu
MQ
01 May 2024

IM-Unpack: Training and Inference with Arbitrarily Low Precision Integers
Zhanpeng Zeng, Karthikeyan Sankaralingam, Vikas Singh
12 Mar 2024

A Comprehensive Survey of Convolutions in Deep Learning: Applications, Challenges, and Future Trends
Abolfazl Younesi, Mohsen Ansari, Mohammadamin Fazli, A. Ejlali, Muhammad Shafique, Joerg Henkel
3DV
23 Feb 2024

RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization
Zhikai Li, Xuewen Liu, Jing Zhang, Qingyi Gu
MQ
08 Feb 2024

A Survey on Transformer Compression
Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, Dacheng Tao
05 Feb 2024

LRP-QViT: Mixed-Precision Vision Transformer Quantization via Layer-wise Relevance Propagation
Navin Ranjan, Andreas E. Savakis
MQ
20 Jan 2024

Enhanced Distribution Alignment for Post-Training Quantization of Diffusion Models
Xuewen Liu, Zhikai Li, Junrui Xiao, Qingyi Gu
MQ
09 Jan 2024

I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
Yunshan Zhong, Jiawei Hu, Mingbao Lin, Mengzhao Chen, Rongrong Ji
MQ
16 Nov 2023

QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources
Zhikai Li, Xiaoxuan Liu, Banghua Zhu, Zhen Dong, Qingyi Gu, Kurt Keutzer
MQ
11 Oct 2023

Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers
N. Frumkin, Dibakar Gope, Diana Marculescu
MQ
21 Aug 2023

A Model for Every User and Budget: Label-Free and Personalized Mixed-Precision Quantization
Edward Fish, Umberto Michieli, Mete Ozay
MQ
24 Jul 2023

A Survey of Techniques for Optimizing Transformer Inference
Krishna Teja Chitty-Venkata, Sparsh Mittal, M. Emani, V. Vishwanath, Arun Somani
16 Jul 2023

Vision Transformers for Mobile Applications: A Short Survey
Nahid Alam, Steven Kolawole, S. Sethi, Nishant Bansali, Karina Nguyen
ViT
30 May 2023

Transformer-based models and hardware acceleration analysis in autonomous driving: A survey
J. Zhong, Zheng Liu, Xiangshan Chen
ViT
21 Apr 2023

SwiftTron: An Efficient Hardware Accelerator for Quantized Transformers
Alberto Marchisio, David Durà, Maurizio Capra, Maurizio Martina, Guido Masera, Muhammad Shafique
08 Apr 2023

Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, ..., Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Y. Shao, A. Gholami
MQ
27 Feb 2023