ResearchTrend.AI

Towards Cheaper Inference in Deep Networks with Lower Bit-Width Accumulators
arXiv:2401.14110
25 January 2024
Yaniv Blumenfeld
Itay Hubara
Daniel Soudry

Papers citing "Towards Cheaper Inference in Deep Networks with Lower Bit-Width Accumulators"

5 of 5 citing papers shown.

Exploring the Performance Improvement of Tensor Processing Engines through Transformation in the Bit-weight Dimension of MACs
  Qizhe Wu, Huawen Liang, Yuchen Gui, Zhichen Zeng, Z. He, ..., Letian Zhao, Zhaoxi Zeng, W. Yuan, Wei Wu, Xi Jin
  08 Mar 2025

Accumulator-Aware Post-Training Quantization
  Ian Colbert, Fabian Grob, Giuseppe Franco, Jinjie Zhang, Rayan Saab
  25 Sep 2024 (MQ)

A2Q+: Improving Accumulator-Aware Weight Quantization
  Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig, Yaman Umuroglu
  19 Jan 2024 (MQ)

Overcoming Oscillations in Quantization-Aware Training
  Markus Nagel, Marios Fournarakis, Yelysei Bondarenko, Tijmen Blankevoort
  21 Mar 2022 (MQ)

Pruning and Quantization for Deep Neural Network Acceleration: A Survey
  Tailin Liang, C. Glossner, Lei Wang, Shaobo Shi, Xiaotong Zhang
  24 Jan 2021 (MQ)