T-REX: A 68-567 μs/token, 0.41-3.95 μJ/token Transformer Accelerator with Reduced External Memory Access and Enhanced Hardware Utilization in 16nm FinFET

1 March 2025
Seunghyun Moon, Mao Li, Gregory K. Chen, Phil Knag, Ram Krishnamurthy, Mingoo Seok
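
For a sense of scale, the per-token figures in the title convert directly into throughput and average power. The pairing of latency and energy endpoints below is an assumption (the title alone does not say which endpoints belong to the same operating point); the arithmetic itself is plain unit conversion.

# Convert the title's per-token latency (us) and energy (uJ) into
# throughput and average power. Endpoint pairing is assumed, not stated.
for latency_us, energy_uj in [(68, 0.41), (567, 3.95)]:
    tokens_per_s = 1e6 / latency_us                 # tokens per second
    avg_power_mw = energy_uj * tokens_per_s * 1e-3  # uJ/token * tokens/s -> mW
    print(f"{latency_us} us/token -> {tokens_per_s:,.0f} tokens/s, "
          f"~{avg_power_mw:.1f} mW at {energy_uj} uJ/token")

Under that pairing assumption, both endpoints work out to roughly 6-7 mW of average power.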
Abstract

This work introduces novel training and post-training compression schemes to reduce external memory access during transformer model inference. Additionally, a new control flow mechanism, called dynamic batching, and a novel buffer architecture, termed a two-direction accessible register file, further reduce external memory access while improving hardware utilization.
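
The abstract does not spell out how the two-direction accessible register file works. As a rough software analogy (the class name and access pattern below are illustrative assumptions, not the paper's hardware design), the useful property is that a tile fetched once from external memory can be read both row-wise and column-wise, so an operand and its transpose, e.g. K and K^T in attention, never need a second fetch or an explicit transpose pass.

import numpy as np

class TwoDirectionRegisterFile:
    """Software analogy of a buffer readable along rows and columns
    (illustrative only; the paper's register file is a hardware design)."""

    def __init__(self, rows: int, cols: int):
        self.data = np.zeros((rows, cols), dtype=np.float32)

    def write_row(self, r: int, values) -> None:
        # One external-memory fetch fills each row exactly once.
        self.data[r, :] = values

    def read_row(self, r: int):
        # Row-wise access: the tile as stored, e.g. K[r, :].
        return self.data[r, :]

    def read_col(self, c: int):
        # Column-wise access: the transposed view, e.g. K^T[c, :],
        # with no second fetch and no explicit transpose pass.
        return self.data[:, c]

# Toy usage: load a 4x4 K tile once, then read it as K and as K^T.
rf = TwoDirectionRegisterFile(4, 4)
for r in range(4):
    rf.write_row(r, np.arange(4, dtype=np.float32) + r)
assert (rf.read_col(1) == np.array([1.0, 2.0, 3.0, 4.0])).all()

Dynamic batching is likewise only named in the abstract; in the generic inference-serving sense it means grouping pending token computations so the datapath stays busy, but the paper's hardware mechanism may differ.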

@article{moon2025_2503.00322,
  title={T-REX: A 68-567 μs/token, 0.41-3.95 μJ/token Transformer Accelerator with Reduced External Memory Access and Enhanced Hardware Utilization in 16nm FinFET},
  author={Seunghyun Moon and Mao Li and Gregory K. Chen and Phil Knag and Ram Krishnamurthy and Mingoo Seok},
  journal={arXiv preprint arXiv:2503.00322},
  year={2025}
}