ITA: An Energy-Efficient Attention and Softmax Accelerator for Quantized Transformers
arXiv:2307.03493 · 7 Jul 2023 · MQ
Gamze Islamoglu, Moritz Scherer, G. Paulin, Tim Fischer, Victor J. B. Jung, Angelo Garofalo, Luca Benini

Papers citing "ITA: An Energy-Efficient Attention and Softmax Accelerator for Quantized Transformers" (8 of 8 papers shown)

VEXP: A Low-Cost RISC-V ISA Extension for Accelerated Softmax Computation in Transformers
Run Wang, Gamze Islamoglu, Andrea Belano, Viviane Potocnik, Francesco Conti, Angelo Garofalo, Luca Benini
15 Apr 2025

EXAQ: Exponent Aware Quantization For LLMs Acceleration
Moran Shkolnik, Maxim Fishman, Brian Chmiel, Hilla Ben-Yaacov, Ron Banner, Kfir Y. Levy
04 Oct 2024 · MQ

Toward Attention-based TinyML: A Heterogeneous Accelerated Architecture and Automated Deployment Flow
Philip Wiese, Gamze İslamoğlu, Moritz Scherer, Luka Macan, Victor J. B. Jung, Alessio Burrello, Francesco Conti, Luca Benini
05 Aug 2024

Reusing Softmax Hardware Unit for GELU Computation in Transformers
C. Peltekis, K. Alexandridis, G. Dimitrakopoulos
15 Feb 2024

BETA: Binarized Energy-Efficient Transformer Accelerator at the Edge
Yuhao Ji, Chao Fang, Zhongfeng Wang
22 Jan 2024

Ultra-Efficient On-Device Object Detection on AI-Integrated Smart Glasses with TinyissimoYOLO
Julian Moosmann, Pietro Bonazzi, Yawei Li, Sizhen Bian, Philipp Mayer, Luca Benini, Michele Magno
02 Nov 2023

Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius, Heng Wang, Lorenzo Torresani
09 Feb 2021 · ViT

I-BERT: Integer-only BERT Quantization
Sehoon Kim, A. Gholami, Z. Yao, Michael W. Mahoney, Kurt Keutzer
05 Jan 2021 · MQ