Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration

arXiv:1911.09925 · 22 November 2019

Hasan Genç, Seah Kim, Alon Amid, Ameer Haj-Ali, Vighnesh Iyer, P. Prakash, Jerry Zhao, D. Grubb, Harrison Liew, Howard Mao, Albert J. Ou, Colin Schmidt, Samuel Steffl, J. Wright, Ion Stoica, Jonathan Ragan-Kelley, Krste Asanović, B. Nikolić, Y. Shao

Papers citing "Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration" (33 papers shown)

1. HDLxGraph: Bridging Large Language Models and HDL Repositories via HDL Graph Databases
   Pingqing Zheng, Jiayin Qin, Fuqi Zhang, Shang Wu, Yu Cao, Caiwen Ding, Yang Zhao
   21 May 2025

2. CaMDN: Enhancing Cache Efficiency for Multi-tenant DNNs on Integrated NPUs
   Tianhao Cai, Liang Wang, Limin Xiao, Meng Han, Zeyu Wang, Lin Sun, Xiaojian Liao
   10 May 2025

3. TPU-Gen: LLM-Driven Custom Tensor Processing Unit Generator
   Deepak Vungarala, Mohammed E. Elbtity, Sumiya Syed, Sakila Alam, Kartik Pandit, Arnob Ghosh, Ramtin Zand, Shaahin Angizi
   07 Mar 2025

4. FORTALESA: Fault-Tolerant Reconfigurable Systolic Array for DNN Inference
   N. Cherezova, Artur Jutman, M. Jenihhin
   06 Mar 2025

5. Leveraging Compute-in-Memory for Efficient Generative Model Inference in TPUs
   Zhantong Zhu, Hongou Li, Wenjie Ren, Meng Wu, Le Ye, Ru Huang, Tianyu Jia
   01 Mar 2025

6. LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator
   Guoyu Li, Shengyu Ye, Chong Chen, Yang Wang, Fan Yang, Ting Cao, Cheng Liu, Mohamed M. Sabry, Mao Yang
   18 Jan 2025 · MQ

7. MATCH: Model-Aware TVM-based Compilation for Heterogeneous Edge Devices
   Mohamed Amine Hamdi, Francesco Daghero, G. M. Sarda, Josse Van Delm, Arne Symons, Luca Benini, Marian Verhelst, Daniele Jahier Pagliari, Luca Bompani
   11 Oct 2024

8. HAPM -- Hardware Aware Pruning Method for CNN hardware accelerators in resource constrained devices
   Federico Nicolás Peccia, Luciano Ferreyro, Alejandro Furfaro
   26 Aug 2024

9. Efficient Edge AI: Deploying Convolutional Neural Networks on FPGA with the Gemmini Accelerator
   Federico Nicolás Peccia, Svetlana Pavlitska, Tobias Fleck, Oliver Bringmann
   14 Aug 2024

10. LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
    Jaehong Cho, Minsu Kim, Hyunmin Choi, Guseul Heo, Jongse Park
    10 Aug 2024

11. LLM-Aided Compilation for Tensor Accelerators
    Charles Hong, Sahil Bhatia, Altan Haan, Shengjun Kris Dong, Dima Nikiforov, Alvin Cheung, Y. Shao
    06 Aug 2024

12. HTVM: Efficient Neural Network Deployment On Heterogeneous TinyML Platforms
    Josse Van Delm, Maarten Vandersteegen, Luca Bompani, G. M. Sarda, Francesco Conti, Daniele Jahier Pagliari, Luca Benini, Marian Verhelst
    11 Jun 2024

13. Iterative Filter Pruning for Concatenation-based CNN Architectures
    Svetlana Pavlitska, Oliver Bagge, Federico Nicolás Peccia, Toghrul Mammadov, J. Marius Zöllner
    04 May 2024 · VLM, 3DPC

14. Allo: A Programming Model for Composable Accelerator Design
    Hongzheng Chen, Niansong Zhang, Shaojie Xiang, Zhichen Zeng, Mengjia Dai, Zhiru Zhang
    07 Apr 2024

15. Data-Oblivious ML Accelerators using Hardware Security Extensions
    Hossam ElAtali, John Z. Jekel, Lachlan J. Gunn, N. Asokan
    29 Jan 2024

16. DEAP: Design Space Exploration for DNN Accelerator Parallelism
    Ekansh Agrawal, Xiangyu Sam Xu
    24 Dec 2023

17. A Hardware Evaluation Framework for Large Language Model Inference
    Hengrui Zhang, August Ning, R. Prabhakar, D. Wentzlaff
    05 Dec 2023 · ELM

18. A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures
    Serena Curzel, Fabrizio Ferrandi, Leandro Fiorin, Daniele Ielmini, Cristina Silvano, ..., Enrico Russo, Valeria Cardellini, Salvatore Filippone, Francesco Lo Presti, Stefania Perri
    29 Nov 2023

19. Tackling the Matrix Multiplication Micro-kernel Generation with Exo
    Adrián Castelló, Julian Bellavita, Grace Dinh, Yuka Ikarashi, Héctor J. Martínez
    26 Oct 2023

20. An Open-Source ML-Based Full-Stack Optimization Framework for Machine Learning Accelerators
    H. Esmaeilzadeh, Soroush Ghodrati, A. Kahng, Joo-Young Kim, Sean Kinzer, ..., R. Mahapatra, Susmita Dey Manasi, S. Sapatnekar, Zhiang Wang, Ziqing Zeng
    23 Aug 2023

21. Performance Analysis of DNN Inference/Training with Convolution and non-Convolution Operations
    H. Esmaeilzadeh, Soroush Ghodrati, A. Kahng, Sean Kinzer, Susmita Dey Manasi, S. Sapatnekar, Zhiang Wang
    29 Jun 2023

22. DiffRate: Differentiable Compression Rate for Efficient Vision Transformers
    Yonghong Tian, Wenqi Shao, Peng Xu, Mingbao Lin, Kaipeng Zhang, Rongrong Ji, Yu Qiao, Ping Luo
    29 May 2023 · ViT

23. MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks
    Seah Kim, Hasan Genç, Vadim Nikiforov, Krste Asanović, B. Nikolić, Y. Shao
    10 May 2023

24. Full Stack Optimization of Transformer Inference: a Survey
    Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, ..., Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Y. Shao, A. Gholami
    27 Feb 2023 · MQ

25. Integration of a systolic array based hardware accelerator into a DNN operator auto-tuning framework
    Federico Nicolás Peccia, Oliver Bringmann
    06 Dec 2022

26. QUIDAM: A Framework for Quantization-Aware DNN Accelerator and Model Co-Exploration
    A. Inci, Siri Garudanagiri Virupaksha, Aman Jain, Ting-Wu Chin, Venkata Vivek Thallam, Ruizhou Ding, Diana Marculescu
    30 Jun 2022 · MQ

27. Machine Learning Sensors
    Pete Warden, Matthew P. Stewart, Brian Plancher, Colby R. Banbury, Shvetank Prakash, Emma Chen, Zain Asgar, Sachin Katti, Vijay Janapa Reddi
    07 Jun 2022

28. Communication Bounds for Convolutional Neural Networks
    An Chen, J. Demmel, Grace Dinh, Mason Haberle, Olga Holtz
    18 Apr 2022

29. Hardware/Software Co-Programmable Framework for Computational SSDs to Accelerate Deep Learning Service on Large-Scale Graphs
    Miryeong Kwon, Donghyun Gouk, Sangwon Lee, Myoungsoo Jung
    23 Jan 2022 · GNN

30. CFU Playground: Full-Stack Open-Source Framework for Tiny Machine Learning (tinyML) Acceleration on FPGAs
    Shvetank Prakash, Tim Callahan, Joseph R. Bushagour, Colby R. Banbury, Alan V. Green, Pete Warden, Tim Ansell, Vijay Janapa Reddi
    05 Jan 2022

31. Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication
    Linghao Song, Yuze Chi, Atefeh Sohrabizadeh, Young-kyu Choi, Jason Lau, Jason Cong
    22 Sep 2021 · GNN

32. A Survey on Domain-Specific Memory Architectures
    Stephanie Soldavini, C. Pilato
    19 Aug 2021

33. GNNIE: GNN Inference Engine with Load-balancing and Graph-Specific Caching
    Sudipta Mondal, Susmita Dey Manasi, K. Kunal, S. Ramprasath, S. Sapatnekar
    21 May 2021 · GNN