arXiv:1903.03640

Analyzing GPU Tensor Core Potential for Fast Reductions

8 March 2019
R. Carrasco, R. Vega, C. Navarro
Abstract

The Nvidia GPU architecture has introduced new computing elements such as \textit{tensor cores}, which are special processing units dedicated to performing fast matrix-multiply-accumulate (MMA) operations that accelerate \textit{Deep Learning} applications. In this work we present the idea of using tensor cores for a different purpose: the parallel arithmetic reduction problem. We propose a new GPU tensor-core based algorithm and analyze its potential performance benefits in comparison to a traditional GPU-based one. The proposed method encodes the reduction of $n$ numbers as a set of $m \times m$ MMA tensor-core operations (for Nvidia's Volta architecture, $m = 16$) and takes advantage of the fact that each MMA operation takes just one GPU cycle. When the cost is analyzed under a simplified GPU computing model, the new algorithm reduces a problem of $n$ numbers in $T(n) = 5\log_{m^2}(n)$ steps, achieving a speedup of $S = \frac{4}{5}\log_2(m^2)$.
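As a concrete instance of these formulas: on Volta, $m = 16$, so each reduction step collapses $m^2 = 256$ values at once, giving $T(n) = 5\log_{256}(n)$ steps and a predicted speedup of $S = \frac{4}{5}\log_2(256) = \frac{4}{5} \cdot 8 = 6.4$ over the traditional log-based GPU reduction.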

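To make the encoding concrete, below is a minimal CUDA sketch of one plausible version of the trick the abstract describes; it is not the authors' implementation, and the kernel name warp_reduce_256 and the scratch buffer are illustrative. A single warp reduces a 16x16 tile of 256 half-precision values with two chained tensor-core MMAs through the WMMA API: first data * ones collects the row sums, then ones * (data * ones) collects the total. Compiling requires nvcc -arch=sm_70 or newer.

#include <cuda_fp16.h>
#include <mma.h>
#include <cstdio>

using namespace nvcuda;

// One warp sums a 16x16 tile of 256 half values with two MMAs.
// Step 1: acc = data * ones   -> entry (i,j) holds the sum of row i.
// Step 2: acc = ones * step1  -> every entry holds the total sum.
__global__ void warp_reduce_256(const half *data, half *out) {
    __shared__ half scratch[256];  // staging for the intermediate product

    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, half> acc;

    // Step 1: load the data tile as A and multiply by an all-ones B.
    wmma::load_matrix_sync(a, data, 16);
    wmma::fill_fragment(b, __float2half(1.0f));
    wmma::fill_fragment(acc, __float2half(0.0f));
    wmma::mma_sync(acc, a, b, acc);

    // An accumulator fragment cannot feed an MMA directly, so the
    // intermediate result is staged through shared memory.
    wmma::store_matrix_sync(scratch, acc, 16, wmma::mem_row_major);
    __syncwarp();

    // Step 2: multiply an all-ones A by the staged intermediate.
    wmma::fill_fragment(a, __float2half(1.0f));
    wmma::load_matrix_sync(b, scratch, 16);
    wmma::fill_fragment(acc, __float2half(0.0f));
    wmma::mma_sync(acc, a, b, acc);

    wmma::store_matrix_sync(scratch, acc, 16, wmma::mem_row_major);
    __syncwarp();
    if (threadIdx.x == 0) *out = scratch[0];  // all 256 entries are equal
}

int main() {
    half h_in[256], h_out;
    for (int i = 0; i < 256; ++i) h_in[i] = __float2half(1.0f);

    half *d_in, *d_out;
    cudaMalloc(&d_in, sizeof(h_in));
    cudaMalloc(&d_out, sizeof(half));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    warp_reduce_256<<<1, 32>>>(d_in, d_out);  // exactly one warp

    cudaMemcpy(&h_out, d_out, sizeof(half), cudaMemcpyDeviceToHost);
    printf("sum = %.1f (expected 256.0)\n", __half2float(h_out));

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

Scaling this to $n$ inputs means applying such tiles in a tree, each level shrinking the problem by a factor of $m^2 = 256$, which is where the $\log_{m^2}(n)$ term in the cost model comes from.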